linguistic/ngramextractor
Composer install command:
composer require linguistic/ngramextractor
Package description
Extracts n-grams from a given text and performs linguistic pre-processing such as stopword removal.
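The description names two steps: pre-processing (such as stopword removal) and n-gram extraction. The following is a minimal, self-contained PHP sketch of those two steps. It is not the package's implementation. It lowercases ASCII text, splits it into tokens, drops stopwords, and then emits every run of n consecutive tokens. The functions `tokenize()` and `ngrams()` are hypothetical. Removing stopwords before forming n-grams is also an assumption.

```php
<?php
// Hypothetical sketch only: this is NOT the linguistic/ngramextractor API.
// Assumptions: ASCII text, lowercase normalisation, and stopwords removed
// before n-grams are formed.

/**
 * Lowercases the text, replaces every run of non-alphanumeric characters
 * with a space, splits on whitespace and drops stopwords.
 *
 * @param string[] $stopwords
 * @return string[]
 */
function tokenize(string $text, array $stopwords = []): array
{
    $text = strtolower($text);
    $text = preg_replace('/[^a-z0-9]+/', ' ', $text);
    $tokens = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $stop = array_flip($stopwords); // constant-time stopword lookups
    $kept = array_filter($tokens, function (string $t) use ($stop) {
        return !isset($stop[$t]);
    });
    return array_values($kept); // re-index after filtering
}

/**
 * Returns every contiguous sequence of $n tokens, joined by a space.
 *
 * @param string[] $tokens
 * @return string[]
 */
function ngrams(array $tokens, int $n): array
{
    if ($n < 1) {
        throw new InvalidArgumentException('n must be at least 1');
    }
    $result = [];
    for ($i = 0; $i + $n <= count($tokens); $i++) {
        $result[] = implode(' ', array_slice($tokens, $i, $n));
    }
    return $result;
}

$tokens = tokenize('The quick, brown fox!', ['the']);
print_r(ngrams($tokens, 2)); // prints the bigrams "quick brown" and "brown fox"
```

Removing stopwords before building n-grams means a bigram can join two words that were separated by a stopword in the original text. A common alternative is to build the n-grams first and then discard any n-gram that contains a stopword.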
README documentation
README
Installation
Install via Composer:
composer require linguistic/ngramextractor
Usage
Coming soon.
Example
    require 'vendor/autoload.php'; # Composer autoloader; `use` statements for the package's classes may also be needed
    $tokenizer = new Tokenizer();
    $tokenizer->addRemovalRule('/<\/?\w+[\s\w\=\"\/\#\-\:\.\_]*>/') # Removes HTML tags
        ->addRemovalRule('/[^a-z0-9]+/', ' ')                       # Replaces every run of characters outside a-z and 0-9 with a space
        ->setSeperator('/\s+/');                                    # Splits the text into tokens on whitespace

    $content = "";         # The text to be tokenized
    $stopwords = array();  # (optional) array of stopwords
    $extractor = new NGramExtractor($content, $tokenizer, $stopwords);
    $unigrams = $extractor->getNGrams(1); # Gets all n-grams in the text with n = 1 (unigrams)
    # Gets the unigrams with their occurrence counts, keeping only those that occur at least 3 times
    $unigramsFiltered = NGramExtractor::limitByOccurance($extractor->getNGramCount(1, true), 3);
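The last line of the example keeps only the unigrams whose occurrence count is greater than or equal to 3. The sketch below shows one plausible way to perform that counting and thresholding step in plain PHP. It is written independently of the package. The names `countNGrams()` and `filterByMinOccurrence()` are hypothetical. The README does not confirm that `limitByOccurance()` operates on an n-gram => count map of this shape; that is an assumption.

```php
<?php
// Hypothetical helpers, NOT the linguistic/ngramextractor API: they only
// illustrate counting n-grams and keeping those at or above a threshold.

/**
 * Maps each distinct n-gram to the number of times it occurs.
 *
 * @param string[] $ngrams
 * @return array<string, int>
 */
function countNGrams(array $ngrams): array
{
    return array_count_values($ngrams);
}

/**
 * Keeps only the entries whose count is greater than or equal to $min.
 *
 * @param array<string, int> $counts
 * @return array<string, int>
 */
function filterByMinOccurrence(array $counts, int $min): array
{
    return array_filter($counts, function (int $count) use ($min) {
        return $count >= $min;
    });
}

$unigrams = ['data', 'text', 'data', 'ngram', 'data', 'text'];
$counts = countNGrams($unigrams);           // data => 3, text => 2, ngram => 1
print_r(filterByMinOccurrence($counts, 3)); // keeps only data => 3
```

`array_filter()` preserves the original keys, so the filtered result is still an n-gram => count map.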
Resources
Statistics
- Total downloads: 2.3k
- Monthly downloads: 0
- Daily downloads: 0
- Favorites: 3
- Clicks: 0
- Dependents: 0
- Suggesters: 0
Other information
- License: GPL3
- Last updated: 2017-12-05