assisted-mindfulness/naive-bayes

Latest stable version: 1.0.0

Composer install command:

composer require assisted-mindfulness/naive-bayes

Package description

Naive Bayes classifier algorithm

README documentation

This PHP package implements a Naive Bayes classifier. It learns from a training set and then guesses the most likely category for new input, using simple statistics and a bit of math to calculate the result.

What can I use this for?

You can use this to classify text content into an arbitrary set of categories. For example:

  • is an email spam or not spam?
  • is a news article about technology, politics, or sports?
  • is a piece of text expressing positive or negative emotions?

Installation

You may install Naive Bayes into your project using the Composer package manager:

composer require assisted-mindfulness/naive-bayes

Learning

Before the algorithm can do anything, it requires a training set with historical information. To teach your classifier which category the text belongs to, call the learn method:

$classifier = new Classifier();

$classifier
    ->learn('I love sunny days', 'positive')
    ->learn('I hate rain', 'negative');

Guessing

After you have trained the classifier, you can ask it which category a given text most likely belongs to, for example:

$classifier->most('is a sunny days'); // positive
$classifier->most('there will be rain'); // negative

To get the score for every category instead of only the best match, you can use:

$classifier->guess('is a sunny days');

/*
items: array:2 [
  "positive" => 0.0064
  "negative" => 0.0039062
]
*/
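To make the scoring idea concrete, here is a minimal standalone sketch of multinomial Naive Bayes with Laplace (add-one) smoothing. It is not the package's internal implementation: the function names sketchTokenize and sketchGuess are hypothetical, the smoothing choice is an assumption, and the numbers it produces will not match the output shown above.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, not part of the package: lower-case the text and
 * keep whitespace-separated words longer than 3 bytes (ASCII input assumed).
 */
function sketchTokenize(string $text): array
{
    $words = preg_split('/\s+/', strtolower(trim($text)), -1, PREG_SPLIT_NO_EMPTY);

    return array_values(array_filter($words, fn (string $w): bool => strlen($w) > 3));
}

/**
 * Score each category as prior * product of smoothed word likelihoods.
 *
 * @param array<int, array{0: string, 1: string}> $samples [text, category] pairs
 * @return array<string, float> category => score, highest first
 */
function sketchGuess(array $samples, string $text): array
{
    $docs = [];   // number of training documents per category
    $counts = []; // word counts per category
    $vocab = [];  // set of every word seen during training

    foreach ($samples as [$sample, $category]) {
        $docs[$category] = ($docs[$category] ?? 0) + 1;
        $counts[$category] ??= [];
        foreach (sketchTokenize($sample) as $word) {
            $counts[$category][$word] = ($counts[$category][$word] ?? 0) + 1;
            $vocab[$word] = true;
        }
    }

    $scores = [];
    foreach ($docs as $category => $docCount) {
        $total = array_sum($counts[$category]);
        $score = $docCount / count($samples); // prior from document share
        foreach (sketchTokenize($text) as $word) {
            // Add-one smoothing keeps unseen words from zeroing the score.
            $score *= (($counts[$category][$word] ?? 0) + 1) / ($total + count($vocab));
        }
        $scores[$category] = $score;
    }

    arsort($scores);

    return $scores;
}

$scores = sketchGuess([
    ['I love sunny days', 'positive'],
    ['I hate rain', 'negative'],
], 'is a sunny days');

echo array_key_first($scores), PHP_EOL; // positive
```

The scores are unnormalized products, so only their relative order is meaningful; for long texts a real implementation would typically sum logarithms instead to avoid floating-point underflow.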

Uneven

When the training set is unbalanced because of insufficient data rather than by design, you can enable the 'uneven' mode, which equalizes the probability calculation across document types.

$classifier
   ->uneven()
   ->guess('is a sunny days');
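One plausible reading of equalizing the calculation is that every category gets the same prior probability instead of a prior proportional to its share of the training documents. The sketch below illustrates only that idea; sketchPriors is a hypothetical helper, and whether the package does exactly this internally is an assumption.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, not part of the package: prior probability per category.
 *
 * @param array<string, int> $docCounts category => number of training documents
 * @return array<string, float|int>
 */
function sketchPriors(array $docCounts, bool $uneven = false): array
{
    $priors = [];
    foreach ($docCounts as $category => $count) {
        $priors[$category] = $uneven
            ? 1 / count($docCounts)           // every category weighted equally
            : $count / array_sum($docCounts); // weighted by share of training documents
    }

    return $priors;
}

// Assumed example data: nine positive documents, one negative.
$docCounts = ['positive' => 9, 'negative' => 1];

$default = sketchPriors($docCounts);       // positive => 0.9, negative => 0.1
$uneven = sketchPriors($docCounts, true);  // positive => 0.5, negative => 0.5

print_r($default);
print_r($uneven);
```

With the default priors, a category that happens to have more training documents starts with a head start; equal priors remove that head start so only the word evidence separates the categories.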

Tokenizer

The algorithm uses a tokenizer to split the text into words. By default, it splits the text on spaces and keeps words longer than 3 characters. You can also define a custom tokenizer, as in the following example:

$classifier = new Classifier();

$classifier->setTokenizer(function (string $string) {
    return Str::of($string)
        ->lower()
        ->matchAll('/[[:alpha:]]+/u')
        ->filter(fn (string $word) => Str::length($word) > 3);
});
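For comparison, the same tokenizing logic can be sketched in plain PHP without the Str helper. This closure is shown standalone: it returns a plain array, and whether setTokenizer accepts an array-returning callable is an assumption that has not been verified here. It relies on the mbstring extension.

```php
<?php

declare(strict_types=1);

// Hypothetical plain-PHP tokenizer: Unicode letter runs, lower-cased,
// keeping only words longer than 3 characters.
$tokenizer = function (string $string): array {
    // Collect every run of Unicode letters from the text.
    preg_match_all('/\p{L}+/u', $string, $matches);

    // Lower-case each word, then drop the short ones and reindex the result.
    $words = array_map('mb_strtolower', $matches[0]);

    return array_values(array_filter($words, fn (string $word): bool => mb_strlen($word) > 3));
};

print_r($tokenizer('I love sunny days')); // love, sunny, days
```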

Wrapping up

There you have it! Even with a very small training set, the algorithm can still return decent results. For example, Naive Bayes has been shown to give decent results in sentiment analysis.

Moreover, Naive Bayes can be applied to more than just text. If you have other ways of calculating the probabilities of your metrics, you can plug those in and it will work just as well.

License

The MIT License (MIT). Please see the License File for more information.

Statistics

  • Total downloads: 526.57k
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 44
  • Clicks: 1
  • Dependent projects: 0
  • Suggesters: 0

GitHub information

  • Stars: 44
  • Watchers: 1
  • Forks: 3
  • Language: PHP

Other information

  • License: MIT
  • Updated: 2022-03-11