jared/php-tokenizer

Composer install command:

composer require jared/php-tokenizer

Package description

This package provides a simple, standalone tokenizer powered by regular expressions.

README

A simple RE driven tokenizer

This package provides a simple, standalone tokenizer powered by regular expressions.

Installation

composer require jared/php-tokenizer

Usage

Example usage:

$rules = [ 
    'NON_SPACE_STRING' => '/\\G[^\\s]+/u',
    'ANY_CHARACTER' => '/\\G./u'
];

$string = 'abc 1qz';
$stream = ( new Falloff\Tokenizer\Factory( $rules ) )->getStream( $string );
while( $token = $stream->nextToken() ){
    print "Token has type `{$token->type}` and its value is `{$token->value}` at offset `{$token->offset}`\n";
}

// The output is:
# Token has type `NON_SPACE_STRING` and its value is `abc` at offset `0`
# Token has type `ANY_CHARACTER` and its value is ` ` at offset `3`
# Token has type `NON_SPACE_STRING` and its value is `1qz` at offset `4`

Note: the regexps used MUST start with the \G assertion.
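The effect of the \G assertion can be seen with plain preg_match(), independent of this package. The snippet below is a minimal sketch, not the package's implementation: it assumes only that a tokenizer asks "does this rule match exactly at the current offset?", which is what \G combined with preg_match()'s offset argument expresses.

```php
<?php
// Sketch: why a tokenizer rule needs the \G assertion.
// preg_match() takes a byte offset as its fifth argument; \G pins the
// match to exactly that offset instead of letting the engine scan ahead.

$input  = 'abc 1qz';
$offset = 3; // the space between "abc" and "1qz"

// With \G the pattern must match at offset 3, where a space sits: no match.
$anchored = preg_match('/\\G[^\\s]+/u', $input, $m, 0, $offset);
var_dump($anchored); // int(0)

// Without \G the engine skips the space and finds "1qz" further on.
$unanchored = preg_match('/[^\\s]+/u', $input, $m, 0, $offset);
var_dump($unanchored, $m[0]); // int(1), string(3) "1qz"
```

Without the anchor, a rule could silently jump over input that no rule recognizes, so unmatched characters would never be reported.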

Note: data is interpreted as UTF-8, so it is recommended to provide regexps with the u modifier.
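What the u modifier changes can be checked with preg_match() alone. This is a small sketch that does not use the package; the input string is an arbitrary example chosen because its first character occupies two bytes in UTF-8.

```php
<?php
// Sketch: the same pattern with and without the u modifier.
// "\xC3\xA9" is the UTF-8 encoding of "é" (two bytes), followed by "!".
$input = "\xC3\xA9!";

// With u, "." consumes one whole UTF-8 character.
preg_match('/\\G./u', $input, $withU);

// Without u, "." consumes a single byte and splits the character.
preg_match('/\\G./', $input, $withoutU);

var_dump(strlen($withU[0]));    // int(2)
var_dump(strlen($withoutU[0])); // int(1)
```

A rule such as ANY_CHARACTER written without u would therefore produce tokens holding partial characters on non-ASCII input.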

Rules can be added on the fly to either the factory or the stream itself. Adding rules to the factory does not affect streams that have already been instantiated.

$rules = [ 
    'NON_SPACE_STRING' => '/\\G[^\\s]+/u',
    'ANY_CHARACTER' => '/\\G./u'
];

$string = 'a b 1 qz';
$stream = ( new Falloff\Tokenizer\Factory( $rules ) )->getStream( $string );

// This rule will never trigger, because 'NON_SPACE_STRING' is matched earlier
$stream->appendRule('DIGIT', '/\\G\\d/u');

// Prepending rules, so these will be matched before the 'NON_SPACE_STRING'
$stream->prependRules([
    'Q_CHAR' => '/\\Gq/u',
    'Z_CHAR' => '/\\Gz/u',
]);

// The stream can be invoked as if it were a function
while( $token = $stream() ){
    print "Token has type `{$token->type}` and its value is `{$token->value}` at offset `{$token->offset}`\n";
}

// The output is:
# Token has type `NON_SPACE_STRING` and its value is `a` at offset `0`
# Token has type `ANY_CHARACTER` and its value is ` ` at offset `1`
# Token has type `NON_SPACE_STRING` and its value is `b` at offset `2`
# Token has type `ANY_CHARACTER` and its value is ` ` at offset `3`
# Token has type `NON_SPACE_STRING` and its value is `1` at offset `4`
# Token has type `ANY_CHARACTER` and its value is ` ` at offset `5`
# Token has type `Q_CHAR` and its value is `q` at offset `6`
# Token has type `Z_CHAR` and its value is `z` at offset `7`
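The ordering behaviour in this example can be reproduced in a few lines of plain PHP. The sketch below assumes, as the comments above suggest, that rules are tried in order and the first match wins; `sketchFirstMatch()` is a hypothetical helper written for illustration and is not part of the package.

```php
<?php
/**
 * Hypothetical helper (not part of the package): returns the name of the
 * first rule whose pattern matches $input at byte offset $offset, or null
 * when no rule matches there.
 */
function sketchFirstMatch(array $rules, string $input, int $offset): ?string
{
    foreach ($rules as $type => $pattern) {
        if (preg_match($pattern, $input, $m, 0, $offset) === 1) {
            return $type;
        }
    }
    return null;
}

$rules = [
    'NON_SPACE_STRING' => '/\\G[^\\s]+/u',
    'ANY_CHARACTER'    => '/\\G./u',
];

$input = 'a b 1 qz'; // the digit "1" sits at offset 4

// Appended: DIGIT is tried last, so NON_SPACE_STRING claims the digit first.
$appended = $rules + ['DIGIT' => '/\\G\\d/u'];
print sketchFirstMatch($appended, $input, 4) . "\n"; // NON_SPACE_STRING

// Prepended: DIGIT is tried first and wins at the same offset.
$prepended = ['DIGIT' => '/\\G\\d/u'] + $rules;
print sketchFirstMatch($prepended, $input, 4) . "\n"; // DIGIT
```

Under that assumption, a narrow rule only has an effect when it is placed ahead of any broader rule that accepts the same input.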

When no rule matches the next chunk of the input stream, an UnknownTokenException is thrown. This exception is itself a token: its type is set to NULL, but its value and offset properties remain accessible.

When the stream ends, a request for the next token returns false. The eof property can be checked to learn the stream state without requesting the next token:

if( $stream->eof ){
    print "Got all the tokens we had there";
} else{
    $token = $stream();
}

The remaining substring may be retrieved at any moment with the tail method:

print "The untokenized substring currently is: " . $stream->tail();

A stream can have an attached callback that is triggered every time a token is requested from the tokenizer:

use \Falloff\Tokenizer\{UnknownTokenException,Token};

$stream->onTokenRequest(function( UnknownTokenException|Token $token ){
    print $token->type . ' token retrieved from the stream';
});

If this callback returns a Token instance, that instance is returned to the original caller.

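Statistics

  • Total downloads: 12
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 0
  • Clicks: 0
  • Dependent projects: 1
  • Recommendations: 0

GitHub information

  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Language: PHP

Other information

  • License: Unknown
  • Updated: 2023-09-13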