molbi/php-text-analysis
Latest stable version: v1.1.0
Composer install command:
composer require molbi/php-text-analysis
Package description
PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language
README
PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language. All the documentation for this project can be found in the wiki.
Installation Instructions
Add PHP Text Analysis to your project
composer require yooper/php-text-analysis
Documentation for the library resides in the wiki. https://github.com/yooper/php-text-analysis/wiki
Dictionary Installation
Not required unless you use the dictionary stemmers.
For Ubuntu < 16
sudo apt-get install libpspell-dev
sudo apt-get install php5-pspell
sudo apt-get install aspell-en
sudo apt-get install php5-enchant
For Ubuntu >= 16
sudo apt-get install libpspell-dev php7.0-pspell aspell-en php7.0-enchant
For CentOS
sudo yum install php5-pspell
sudo yum install aspell-en
sudo yum install php5-enchant
The PECL stem extension is not currently available for PHP 7.0.
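After installing the packages, it can help to confirm that PHP actually sees the extensions. The following is a minimal sketch, not part of the library: the helper name `loadedDictionaryExtensions` is hypothetical, and it relies only on PHP's built-in `extension_loaded()` with the extension names `pspell` and `enchant`.

```php
<?php
// Hypothetical helper (not part of php-text-analysis): report which of the
// optional dictionary extensions are loaded in the running PHP process.
function loadedDictionaryExtensions(array $names = ['pspell', 'enchant']): array
{
    $status = [];
    foreach ($names as $name) {
        // extension_loaded() returns true when the extension is available.
        $status[$name] = extension_loaded($name);
    }
    return $status;
}

foreach (loadedDictionaryExtensions() as $name => $loaded) {
    echo $name . ': ' . ($loaded ? 'loaded' : 'missing') . PHP_EOL;
}
```

If an extension shows as missing, the dictionary stemmers that depend on it will presumably not work, while the rest of the library should be unaffected.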
Tokenize
There are several tokenizers available:
- FixedLengthTokenizer
- GeneralTokenizer
- LambdaTokenizer
- PennTreeBankTokenizer
- RegexTokenizer
- SentenceTokenizer
- WhitespaceTokenizer
Tokenizer Usage
use TextAnalysis\Tokenizers\GeneralTokenizer;

$tokenizer = new GeneralTokenizer();
$tokens = $tokenizer->tokenize("Enter your text here");
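To illustrate why the choice of tokenizer matters, here is a rough plain-PHP sketch of two common strategies, splitting on whitespace and matching a regular expression. The functions `whitespaceTokens` and `regexTokens` are hypothetical stand-ins written for this example; they are not the library's WhitespaceTokenizer or RegexTokenizer, whose behavior and constructor arguments may differ.

```php
<?php
// Hypothetical helpers for illustration only; not the library's classes.

// Split on runs of whitespace, discarding empty pieces.
function whitespaceTokens(string $text): array
{
    return preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// Return every substring that matches the given pattern.
function regexTokens(string $text, string $pattern): array
{
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

$text = "Time flies, like an arrow.";

// Punctuation stays attached to the neighbouring word.
$byWhitespace = whitespaceTokens($text);

// Matching word characters only drops the punctuation.
$byRegex = regexTokens($text, '/\w+/');

print_r($byWhitespace);
print_r($byRegex);
```

The whitespace split keeps "flies," and "arrow." with their punctuation, while the regex version yields bare words, which changes any counts computed from the tokens afterwards.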
Frequency Distribution
$tokenizer = new \TextAnalysis\Tokenizers\GeneralTokenizer();
$tokens = $tokenizer->tokenize("time flies like an arrow and an arrow flies like time");
$freqDist = new \TextAnalysis\Analysis\FreqDist($tokens);
$freqDist->getHapaxes(); // hapaxes: tokens that occur exactly once
$freqDist->getTotalTokens(); // total number of tokens
$freqDist->getTotalUniqueTokens(); // number of distinct tokens
See the FreqDist source for the full API: https://github.com/yooper/php-text-analysis/blob/master/src/Analysis/FreqDist.php
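For readers unfamiliar with the terms, the quantities in the frequency distribution example can be sketched in plain PHP without the library. This is an illustration of the concepts only, not FreqDist's implementation, and it assumes a simple whitespace split in place of GeneralTokenizer.

```php
<?php
// Plain-PHP sketch, independent of the library.
// Assumption: a whitespace split stands in for GeneralTokenizer.
$text = "time flies like an arrow and an arrow flies like time";
$tokens = preg_split('/\s+/', trim($text));

// Map each distinct token to the number of times it occurs.
$counts = array_count_values($tokens);

// Hapaxes are the tokens whose count is exactly one.
$hapaxes = array_keys(array_filter($counts, function ($n) {
    return $n === 1;
}));

$totalTokens = count($tokens);        // 11 tokens in the sentence
$totalUniqueTokens = count($counts);  // 6 distinct tokens

print_r($hapaxes); // only "and" occurs once
```

In this sentence every word except "and" appears twice, so the list of hapaxes has a single entry.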
Statistics
- Total downloads: 1.54k
- Monthly downloads: 0
- Daily downloads: 0
- Favorites: 0
- Views: 0
- Dependents: 0
- Recommendations: 0
Other information
- License: Apache
- Last updated: 2017-02-06