aymanrb/php-unstructured-text-parser

Latest stable version: v2.5.0

Composer install command:

composer require aymanrb/php-unstructured-text-parser

Package description

A PHP library to help extract text out of text documents

README

About Unstructured Text Parser

Unstructured Text Parser is a small PHP library for extracting values from text that has no machine-friendly structure. For example, to parse emails generated by a web form, you create a template that matches the expected mail format and marks the variable parts; the class then extracts those variables from the body of each incoming mail.

Useful when you want to parse data out of:

  • Emails generated from web forms
  • Documents with definable templates / expressions

Installation

PHP Unstructured Text Parser is available on Packagist (using semantic versioning), and installation via Composer is recommended. Add the following line to your composer.json file:

"aymanrb/php-unstructured-text-parser": "~2.0"

or run

composer require aymanrb/php-unstructured-text-parser

Usage example

<?php
include_once __DIR__ . '/../vendor/autoload.php';

$parser = new aymanrb\UnstructuredTextParser\TextParser('/path/to/templatesDirectory');

$textToParse = 'Text to be parsed, fetched from a file, mail, or web service, or even assigned directly to a string variable like this';

// Tries every available template in turn and returns the result of the first one that parses successfully
$parseResults = $parser->parseText($textToParse);
print_r($parseResults->getParsedRawData());

// Slower: runs a similarity check first to pick the closest template, then parses with it
print_r(
    $parser
        ->parseText($textToParse, true)
        ->getParsedRawData()
);
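The similarity check in the second call can be pictured with a short sketch. The README does not say how similarity is measured, so this assumes PHP's built-in similar_text(); the function name pickClosestTemplate and the template names are hypothetical and not part of the library.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not the library's API): returns the name of the
 * template whose content is most similar to $text, or null if none given.
 * $templates maps a template name to its content.
 */
function pickClosestTemplate(array $templates, string $text): ?string
{
    $bestName = null;
    $bestScore = -1.0;

    foreach ($templates as $name => $content) {
        // similar_text() writes the similarity percentage into $percent.
        similar_text($content, $text, $percent);
        if ($percent > $bestScore) {
            $bestScore = $percent;
            $bestName = (string) $name;
        }
    }

    return $bestName;
}

// Made-up templates for illustration only.
$templates = [
    'contact_form' => "Name: {%senderName%}\nE-Mail: {%senderEmail%}",
    'order_notice' => "Order number: {%orderId%}\nTotal: {%total%}",
];

echo pickClosestTemplate($templates, "Name: Pet Cat\nE-Mail: email@example.com"), PHP_EOL;
```

Comparing the text against every template costs more than stopping at the first successful parse, which is consistent with the "slower" note in the usage example.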

Parsing Procedure

1- Take a single sample of the text you want to parse.

2- Replace each varying piece of text with a named variable: use {%VariableName%} to match anything in that position, or {%VariableName:Pattern%} to restrict the match to a specific set of characters or a more precise pattern.

3- Save the template in the templates directory (the one passed to the parser in your code) as a file with a .txt extension, e.g. fileName.txt.

4- Pass the text you want to parse to the parseText() method and read the extracted variables from the result.
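To make the placeholder syntax in the steps above concrete, here is a minimal sketch of how a template could be turned into a regular expression with one named group per variable. It is an illustration under assumptions, not the library's actual implementation: the functions templateToRegex and extractVariables are hypothetical, and the lazy ".*?" default for plain {%VariableName%} placeholders is a guess.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not the library's API): builds a regular expression
 * with one named group per {%name%} or {%name:pattern%} placeholder.
 * Assumes custom patterns contain neither "/" nor "%".
 */
function templateToRegex(string $template): string
{
    // Split into literal text and placeholders, keeping the placeholders.
    $parts = preg_split('/(\{%.+?%\})/', $template, -1, PREG_SPLIT_DELIM_CAPTURE);

    $regex = '';
    foreach ($parts as $part) {
        if (preg_match('/^\{%(\w+)(?::(.+))?%\}$/', $part, $m)) {
            // Use the custom pattern if given, otherwise match anything (lazily).
            $pattern = $m[2] ?? '.*?';
            $regex .= '(?<' . $m[1] . '>' . $pattern . ')';
        } else {
            // Literal template text must match exactly.
            $regex .= preg_quote($part, '/');
        }
    }

    // "s" lets "." span newlines; the anchors force the whole text to match.
    return '/^' . $regex . '$/s';
}

/** Hypothetical helper: returns the named variables, or null on no match. */
function extractVariables(string $template, string $text): ?array
{
    if (preg_match(templateToRegex($template), $text, $matches) !== 1) {
        return null; // the text does not fit this template
    }

    // Keep only the named groups, dropping their numeric duplicates.
    return array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
}

$template = "ID & Source: {%id:[0-9]+%} {%source%}\nName: {%senderName%}";
$text     = "ID & Source: 12234432 Website Form\nName: Pet Cat";

print_r(extractVariables($template, $text));
```

The sketch matches literal text exactly, so differences in whitespace between template and text would cause a mismatch; whether the library is more forgiving is not stated in this README.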

Template Example

If the text documents you want to parse look like this:

Hello,
If you wish to parse message coming from a website that states info like:
ID & Source: 12234432 Website Form  
Name: Pet Cat
E-Mail: email@example.com
Comment: Some text goes here

Thank You,
Best Regards
Admin

your template file (example_template.txt) could look like this:

Hello,
If you wish to parse message coming from a website that states info like:
ID & Source: {%id:[0-9]+%} {%source%}
Name: {%senderName%}
E-Mail: {%senderEmail%}
Comment: {%comment%}

Thank You,
Best Regards
Admin

The output of a successful parsing job would be:

Array(
    'id' => '12234432',
    'source' => 'Website Form',
    'senderName' => 'Pet Cat',
    'senderEmail' => 'email@example.com',
    'comment' => 'Some text goes here'
)

Upgrading from v1.x to v2.x

Version 2.0 is essentially a refactoring of version 1.x and provides the same functionality. The one difference is the return value: parseText() now returns a parsed data object instead of an array. To get the results as an array, as in v1.x, call getParsedRawData() on the returned object.

<?php
// In 1.x, parseText() returned an array
$extractedArray = $parser->parseText($textToParse);

// In 2.x, call getParsedRawData() on the result to get an array
$extractedArray = $parser->parseText($textToParse)->getParsedRawData();
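If a lot of 1.x code expects an array, one option is a small wrapper that restores the old return shape. This is only a sketch: parseTextAsArray is a hypothetical name, and the stub parser below is a stand-in so the example runs without the library installed. It mimics only the two methods shown in this README.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical compatibility shim: returns the parsed data as an array,
 * the way parseText() did in 1.x. Accepts any object that exposes
 * parseText(), such as the library's TextParser.
 */
function parseTextAsArray(object $parser, string $text): array
{
    return $parser->parseText($text)->getParsedRawData();
}

// Stand-in for the real parser so the sketch is self-contained.
$stubParser = new class {
    public function parseText(string $text): object
    {
        return new class {
            public function getParsedRawData(): array
            {
                return ['id' => '12234432'];
            }
        };
    }
};

print_r(parseTextAsArray($stubParser, 'ID: 12234432'));
```

With the real library, the TextParser instance from the usage example would be passed in place of the stub.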

Statistics

  • Total downloads: 20.45k
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 47
  • Views: 1
  • Dependents: 0
  • Suggesters: 0

GitHub information

  • Stars: 46
  • Watchers: 3
  • Forks: 20
  • Language: PHP

Other information

  • License: MIT
  • Last updated: 2014-11-02