halk/item-similarity

Composer install command:

composer require halk/item-similarity

Package description

Content-based, schema-less recommendation service

README documentation

Item Similarity: content-based, schema-less recommendation service

A simple recommendation service which computes the similarity of items.

Since this is part of my ongoing MSc project, the README will be improved by October.

Concept

Similarity Computation

The similarity between two items is computed as follows:

Given the following two JSON documents:

a = {
    "brand": "Addi",
    "model": "Speedy",
    "colors": ["black", "white"],
    "category": "Shoes",
    "size": 42
}
b = {
    "brand": "Prima",
    "model": "Kazak",
    "colors": ["red", "white"],
    "category": "Sweater",
    "sleeves": "long"
}

First, any item features which are not in both documents are discarded:

a = {
    "brand": "Addi",
    "model": "Speedy",
    "colors": "black,white",
    "category": "Shoes"
}
b = {
    "brand": "Prima",
    "model": "Kazak",
    "colors": "red,white",
    "category": "Sweater"
}

Second, the documents are converted into lists with the keys as a prefix to the values:

a = ["brand_Addi", "model_Speedy", "colors_black", "colors_white", "category_Shoes"]
b = ["brand_Prima", "model_Kazak", "colors_red", "colors_white", "category_Sweater"]

Finally, a variant of the Tanimoto coefficient is calculated:

nA = number of features in A
nB = number of features in B
nAB = number of intersecting features
score = nAB / (nA + nB - nAB)
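
The three steps above can be sketched in a few lines. This is a minimal Python illustration, not the service's PHP implementation: the function names `to_features` and `similarity` are hypothetical, and using sets (so duplicate features count once) and returning 0.0 when no keys are shared are assumptions.

```python
def to_features(doc, shared_keys):
    """Flatten a document into "key_value" strings for the shared keys."""
    features = set()
    for key in shared_keys:
        value = doc[key]
        # List values (e.g. colors) yield one feature per element.
        values = value if isinstance(value, list) else [value]
        for v in values:
            features.add(f"{key}_{v}")
    return features


def similarity(a, b):
    """score = nAB / (nA + nB - nAB) over the features of shared keys."""
    shared = a.keys() & b.keys()      # step 1: keep keys present in both
    fa = to_features(a, shared)       # step 2: prefix values with their key
    fb = to_features(b, shared)
    n_ab = len(fa & fb)               # step 3: count intersecting features
    denom = len(fa) + len(fb) - n_ab
    return n_ab / denom if denom else 0.0  # assumption: no shared keys -> 0.0


a = {"brand": "Addi", "model": "Speedy", "colors": ["black", "white"],
     "category": "Shoes", "size": 42}
b = {"brand": "Prima", "model": "Kazak", "colors": ["red", "white"],
     "category": "Sweater", "sleeves": "long"}

score = similarity(a, b)
print(round(score, 3))
```

For the two example documents, nA = 5, nB = 5 and nAB = 1 (only `colors_white` is shared), so the score is 1 / (5 + 5 - 1) = 1/9, roughly 0.111.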

Similarity index

The index is kept in a MongoDB collection with a document for each feature. This document also keeps track of its similarity score against other documents. Every time a new record is processed, the similarity to other documents is computed and stored. This score is then added to the other document as well. Thus when a similarity score is requested for a document, the end result is already pre-computed.
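
The idea of storing scores on both sides at write time can be illustrated with an in-memory stand-in. This is only a sketch under assumptions: the class `SimilarityIndex`, its methods, and the `overlap_score` helper are hypothetical, a plain dictionary replaces the MongoDB collection, and skipping zero scores is a choice made for the example.

```python
from collections import defaultdict


def overlap_score(a, b):
    """Stand-in score: Tanimoto coefficient over (key, value) pairs of shared keys."""
    shared = a.keys() & b.keys()
    fa = {(k, a[k]) for k in shared}
    fb = {(k, b[k]) for k in shared}
    n_ab = len(fa & fb)
    denom = len(fa) + len(fb) - n_ab
    return n_ab / denom if denom else 0.0


class SimilarityIndex:
    """Keeps pre-computed pairwise scores so that reads are simple lookups."""

    def __init__(self, score_fn):
        self.score_fn = score_fn
        self.items = {}                   # item_id -> document
        self.scores = defaultdict(dict)   # item_id -> {other_id: score}

    def add(self, item_id, doc):
        self.remove(item_id)              # re-posting replaces the old entry
        for other_id, other_doc in self.items.items():
            score = self.score_fn(doc, other_doc)
            if score > 0:
                # Store the score on both documents, as described above.
                self.scores[item_id][other_id] = score
                self.scores[other_id][item_id] = score
        self.items[item_id] = doc

    def remove(self, item_id):
        self.items.pop(item_id, None)
        for other_id in self.scores.pop(item_id, {}):
            self.scores[other_id].pop(item_id, None)

    def similar(self, item_id):
        """Return (other_id, score) pairs, highest score first."""
        pairs = self.scores.get(item_id, {}).items()
        return sorted(pairs, key=lambda p: p[1], reverse=True)


index = SimilarityIndex(overlap_score)
index.add("1", {"brand": "Addi", "category": "Shoes"})
index.add("2", {"brand": "Addi", "category": "Sweater"})
index.add("3", {"brand": "Prima", "category": "Hats"})
print(index.similar("1"))
```

The trade-off is that each write compares the new record against the existing ones, while a read only sorts scores that are already stored.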

API

The index is managed by POST and DELETE requests. The score is fetched via GET.

The route prefix {index} allows maintaining more than one index within an instance.

POST /{index} Posts a document to the index and calculates the similarity score

DELETE /{index} Deletes a document

GET /{index}?itemIds=1,2,3 Returns similar items for the item IDs listed in the itemIds query parameter.
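
As a small client-side illustration, the GET route can be assembled as follows. The helper name `similar_items_url`, the host `http://localhost:8080` and the index name `products` are assumptions for the example; the request and response bodies are not described in this README, so the sketch only builds the URL.

```python
from urllib.parse import quote


def similar_items_url(base_url, index, item_ids):
    """Build the GET /{index}?itemIds=... URL for a list of item IDs."""
    # Escape each ID separately so the commas between them stay literal.
    ids = ",".join(quote(str(i), safe="") for i in item_ids)
    return f"{base_url.rstrip('/')}/{quote(index, safe='')}?itemIds={ids}"


# Hypothetical host and index name.
url = similar_items_url("http://localhost:8080", "products", [1, 2, 3])
print(url)
```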

Installation

$ git clone https://github.com/halk/item-similarity
$ cd item-similarity
$ cp config/config.php.dist config/config.php

Please see recowise-vagrant for provisioning details.

Tests

$ cp phpunit.xml.dist phpunit.xml
$ phpunit

Statistics

  • Total downloads: 0
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 1
  • Clicks: 0
  • Dependents: 0
  • Suggesters: 0

GitHub information

  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Language: PHP

Other information

  • License: MIT
  • Last updated: 2015-08-21