yethee/tiktoken
Latest stable version: 1.1.0
Composer install command:
composer require yethee/tiktoken
Package description
PHP version of tiktoken
README
This is a PHP port of tiktoken.
Installation
$ composer require yethee/tiktoken
Usage
use Yethee\Tiktoken\EncoderProvider;

$provider = new EncoderProvider();

$encoder = $provider->getForModel('gpt-3.5-turbo-0301');
$tokens = $encoder->encode('Hello world!');
print_r($tokens);
// OUT: [9906, 1917, 0]

$encoder = $provider->get('p50k_base');
$tokens = $encoder->encode('Hello world!');
print_r($tokens);
// OUT: [15496, 995, 0]
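A common reason to tokenize text is to check it against a token limit. The following is a minimal sketch of that check. `fitsTokenBudget()` and the limits passed to it are hypothetical and not part of yethee/tiktoken; the token IDs are the ones listed in the usage example for `gpt-3.5-turbo-0301`.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of yethee/tiktoken): reports whether a token
 * list, such as the array returned by $encoder->encode(), fits a token budget.
 *
 * @param list<int> $tokens
 */
function fitsTokenBudget(array $tokens, int $maxTokens): bool
{
    return count($tokens) <= $maxTokens;
}

// With the library installed, the tokens would come from the encoder, e.g.
//   $tokens = $provider->getForModel('gpt-3.5-turbo-0301')->encode('Hello world!');
// Here the token IDs from the usage example are used directly.
$tokens = [9906, 1917, 0];

var_dump(fitsTokenBudget($tokens, 4096)); // bool(true)
var_dump(fitsTokenBudget($tokens, 2));    // bool(false)
```

Keeping the check separate from the encoder call makes it usable with either encoding shown in the usage example.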
Cache
The encoder uses external vocabularies, so caching is enabled by default to avoid performance issues.
By default, the cache is stored in the system directory for temporary files.
You can override the cache directory via the TIKTOKEN_CACHE_DIR environment variable
or with EncoderProvider::setVocabCache():
use Yethee\Tiktoken\EncoderProvider;

$encProvider = new EncoderProvider();
$encProvider->setVocabCache('/path/to/cache');

// Using the provider
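The fallback described in this section (use TIKTOKEN_CACHE_DIR when set, otherwise the temporary directory) can be sketched as follows. This only illustrates the documented behavior and is not the library's actual implementation; `resolveVocabCacheDir()` is a hypothetical name, and the library may well use a subdirectory or a different lookup order.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of yethee/tiktoken): prefers the
 * TIKTOKEN_CACHE_DIR environment variable and falls back to the
 * system directory for temporary files.
 */
function resolveVocabCacheDir(): string
{
    $fromEnv = getenv('TIKTOKEN_CACHE_DIR');

    // getenv() returns false when the variable is not set.
    if (is_string($fromEnv) && $fromEnv !== '') {
        return $fromEnv;
    }

    return sys_get_temp_dir();
}

putenv('TIKTOKEN_CACHE_DIR=/path/to/cache');
echo resolveVocabCacheDir(), PHP_EOL; // /path/to/cache

putenv('TIKTOKEN_CACHE_DIR'); // passing only the name unsets the variable
echo resolveVocabCacheDir(), PHP_EOL; // the system temp directory
```

A directory resolved this way could then be passed to `EncoderProvider::setVocabCache()` when an explicit path is preferred over the environment variable.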
Lib mode
Experimental
You can use the tiktoken-rs library via an FFI binding. This can improve performance when you need to encode medium or large texts. However, the overhead of data marshalling can lead to poor performance for small texts.
use Yethee\Tiktoken\Encoder\LibEncoder;
use Yethee\Tiktoken\EncoderProvider;

// LibEncoder::init('/path/to/lib');

$encProvider = new EncoderProvider(true); // Force using the lib encoder
You need to provide the path to the lib before using the provider. There are several ways to do this:
- Use the Yethee\Tiktoken\Encoder\LibEncoder::init() method.
- Use the Yethee\Tiktoken\Encoder\LibEncoder::preload() method inside an opcache preload script.
- Use the environment variable TIKTOKEN_LIB_PATH or LD_LIBRARY_PATH.
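Because lib mode pays off for medium or large texts but adds marshalling overhead for small ones, an application might choose the encoder by input size. The sketch below shows one way to do that. `shouldUseLibEncoder()` is a hypothetical helper, and the 4096-byte threshold is an arbitrary placeholder, not a measured value.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of yethee/tiktoken): decides whether a text
 * is large enough to justify the FFI-backed encoder.
 */
function shouldUseLibEncoder(string $text, int $minBytes): bool
{
    // strlen() counts bytes, which is a cheap proxy for text size.
    return strlen($text) >= $minBytes;
}

// Arbitrary example threshold; benchmark before relying on any value.
$minBytes = 4096;

var_dump(shouldUseLibEncoder('Hello world!', $minBytes));          // bool(false)
var_dump(shouldUseLibEncoder(str_repeat('a', 10000), $minBytes)); // bool(true)
```

When the helper returns true, the provider could be created with `new EncoderProvider(true)` as shown in the lib mode example. A suitable threshold depends on the platform, so it is best derived from benchmark runs.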
Build lib
Requirements
- Rust >= 1.85
git clone git@github.com:yethee/tiktoken-php.git
cd tiktoken-php
cargo build --release
Copy the binary from target/release:
- libtiktoken_php.so for Linux
- libtiktoken_php.dylib for macOS
- tiktoken_php.dll for Windows
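A deployment script may need to pick the right binary name for the current platform. The sketch below maps PHP's `PHP_OS_FAMILY` values to the file names listed above. `libFileName()` is a hypothetical helper, and the code assumes PHP 8.0 or later for `match`.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of yethee/tiktoken): maps a PHP_OS_FAMILY
 * value to the binary name produced by `cargo build --release`.
 */
function libFileName(string $osFamily): string
{
    return match ($osFamily) {
        'Linux' => 'libtiktoken_php.so',
        'Darwin' => 'libtiktoken_php.dylib',
        'Windows' => 'tiktoken_php.dll',
        default => throw new RuntimeException("No known binary name for OS family: {$osFamily}"),
    };
}

// In a real script, PHP_OS_FAMILY would be passed instead of a literal.
foreach (['Linux', 'Darwin', 'Windows'] as $family) {
    echo $family, ': target/release/', libFileName($family), PHP_EOL;
}
```

Other OS families (for example BSD) are not covered by the list above, so the helper throws rather than guessing a name.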
NOTE: See .docker/Dockerfile for an example.
Benchmark
You can see the benchmark results in #27 or run the benchmark locally:
composer bench
TODO
- Add an implementation for the Yethee\Tiktoken\Encoder\LibEncoder::encodeInChunks() method
Limitations
- Encoding for GPT-2 is not supported.
- Special tokens (like <|endofprompt|>) are not supported.
License
Statistics
- Total downloads: 2.47M
- Monthly downloads: 0
- Daily downloads: 0
- Favorites: 152
- Views: 2
- Dependent projects: 13
- Recommendations: 1
Other information
- License: MIT
- Last updated: 2026-01-04