tecnickcom/tc-lib-unicode

Latest stable version: 2.2.0

Composer install command:

composer require tecnickcom/tc-lib-unicode

Package description

PHP library containing Unicode methods

README

UTF-8 and Unicode processing utilities, including bidirectional text handling.

If this project is useful to you, please consider supporting development via GitHub Sponsors.

Overview

tc-lib-unicode provides Unicode conversion helpers and bidirectional algorithm support for robust multilingual text processing.

It targets multilingual text pipelines in which normalization, code-point handling, and bidirectional ordering directly affect rendering quality. Keeping these Unicode-heavy operations in one library lets dependent libraries keep their text processing accurate and easier to audit.

Namespace: \Com\Tecnick\Unicode
Author: Nicola Asuni <info@tecnick.com>
License: GNU LGPL v3 (see LICENSE)
API docs: https://tcpdf.org/docs/srcdoc/tc-lib-unicode
Packagist: https://packagist.org/packages/tecnickcom/tc-lib-unicode

Features

Unicode Utilities

  • UTF-8 character and ordinal conversion helpers
  • String/character array transformations
  • Integration-ready conversion methods for document engines
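
To illustrate what character and ordinal conversion means in practice, here is a minimal sketch that uses only PHP's mbstring extension. The function names utf8ToCodepoints() and codepointsToUtf8() are hypothetical and are not part of the tc-lib-unicode API; the library's own helpers may differ in naming and behavior.

```php
<?php

// Sketch based on PHP's mbstring extension, NOT on the tc-lib-unicode API.

/**
 * Split a UTF-8 string into an array of Unicode code points.
 *
 * @return int[]
 */
function utf8ToCodepoints(string $text): array
{
    // mb_str_split() splits by character rather than by byte.
    return array_map(
        static fn (string $chr): int => mb_ord($chr, 'UTF-8'),
        mb_str_split($text, 1, 'UTF-8')
    );
}

/**
 * Join an array of Unicode code points back into a UTF-8 string.
 *
 * @param int[] $codepoints
 */
function codepointsToUtf8(array $codepoints): string
{
    return implode('', array_map(
        static fn (int $ord): string => mb_chr($ord, 'UTF-8'),
        $codepoints
    ));
}

// Round trip: Latin "a", Hebrew ALEF (U+05D0), Hangul syllable U+AC01.
$ords = utf8ToCodepoints("a\u{05D0}\u{AC01}");
echo implode(' ', array_map(static fn (int $o): string => sprintf('U+%04X', $o), $ords)), "\n";
echo codepointsToUtf8($ords), "\n";
```

An array of code points in this form is the kind of input that Substitution::replaceChars() is described as accepting.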

Bidirectional Support

  • Unicode Bidirectional Algorithm implementation
  • Right-to-left and mixed-direction text processing
  • Supporting shaping/step logic for complex scripts
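
As a rough illustration of one early step of the Unicode Bidirectional Algorithm, the sketch below guesses the base direction of a paragraph from its first strong character (rules P2 and P3 of UAX #9). It is a simplification, not the library's implementation: the hypothetical guessBaseDirection() function approximates right-to-left characters as letters in the Hebrew and Arabic scripts only, and it ignores isolates and embeddings.

```php
<?php

// Simplified first-strong heuristic; NOT the tc-lib-unicode implementation.

/**
 * Guess the base direction of a paragraph: 'R' (right-to-left) or 'L'.
 */
function guessBaseDirection(string $text): string
{
    // Walk the characters in logical order and stop at the first letter.
    foreach (mb_str_split($text, 1, 'UTF-8') as $chr) {
        if (preg_match('/^\p{L}$/u', $chr) !== 1) {
            continue; // digits, spaces, punctuation and marks are not strong here
        }
        // Assumption: only Hebrew and Arabic letters count as right-to-left.
        return preg_match('/^[\p{Hebrew}\p{Arabic}]$/u', $chr) === 1 ? 'R' : 'L';
    }

    return 'L'; // no strong character found: default to left-to-right
}

echo guessBaseDirection("123 \u{05E9}\u{05DC}\u{05D5}\u{05DD} hello"), "\n";
echo guessBaseDirection("hello \u{05E9}\u{05DC}\u{05D5}\u{05DD}"), "\n";
```

A full implementation also has to resolve embedding levels, weak and neutral types, and mirroring before reordering; the heuristic above covers only the choice of paragraph direction.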

Character Substitution

  • Context-sensitive codepoint-level substitution via Substitution::replaceChars()
  • Thai — repositions leading vowels (Sara E/AE/O/AI, U+0E40–U+0E44, U+0E4D) to follow their base consonant, matching PDF visual-order glyph streams
  • Devanagari — moves left-positional matras (U+093F) to precede their base consonant cluster, including conjuncts joined by Virama (U+094D)
  • Hangul — composes Hangul Jamo sequences (U+1100–U+11FF, U+A960–U+A97F, U+D7B0–U+D7FF) into precomposed syllables (U+AC00–U+D7A3) per Unicode Standard §3.12
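
The Hangul composition mentioned above follows the arithmetic given in Unicode Standard §3.12. The sketch below shows that arithmetic for modern conjoining Jamo only (leading consonants U+1100 to U+1112, vowels U+1161 to U+1175, trailing consonants U+11A8 to U+11C2). The function name composeHangulSyllable() is hypothetical, and how the library treats the extended Jamo ranges is not covered here.

```php
<?php

// Arithmetic from Unicode Standard section 3.12; NOT the library's own code.

/**
 * Compose modern conjoining Jamo into a precomposed Hangul syllable.
 * Returns null when the inputs are outside the modern Jamo ranges.
 */
function composeHangulSyllable(int $lead, int $vowel, ?int $trail = null): ?int
{
    $sBase = 0xAC00; // first precomposed syllable
    $lBase = 0x1100; // first leading consonant
    $vBase = 0x1161; // first vowel
    $tBase = 0x11A7; // one below the first trailing consonant
    $lCount = 19;
    $vCount = 21;
    $tCount = 28;    // 27 trailing consonants plus "no trailing consonant"

    $l = $lead - $lBase;
    $v = $vowel - $vBase;
    $t = $trail === null ? 0 : $trail - $tBase;

    if ($l < 0 || $l >= $lCount || $v < 0 || $v >= $vCount) {
        return null;
    }
    if ($trail !== null && ($t < 1 || $t >= $tCount)) {
        return null;
    }

    return $sBase + ($l * $vCount + $v) * $tCount + $t;
}

// U+1100 + U+1161 + U+11A8: (0 * 21 + 0) * 28 + 1 = 1, so the result is U+AC01.
printf("U+%04X\n", composeHangulSyllable(0x1100, 0x1161, 0x11A8));
```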

Requirements

  • PHP 8.2 or later
  • Extensions: mbstring, pcre
  • Composer

Installation

composer require tecnickcom/tc-lib-unicode

Quick Start

<?php

require_once __DIR__ . '/vendor/autoload.php';

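// Arguments: input string, character array, code-point array,
// forced base direction ('R' = right-to-left), and shaping flag.
$bidi = new \Com\Tecnick\Unicode\Bidi('hello ', null, null, 'R', false);
// Print the string after bidirectional processing.
echo $bidi->getString();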

Character substitution

Substitution::replaceChars() takes an array of Unicode codepoints and returns a transformed array with script-specific substitutions applied. It is a pure codepoint-level transform with no font or PDF dependency.

<?php

require_once __DIR__ . '/vendor/autoload.php';

$sub = new \Com\Tecnick\Unicode\Substitution();

// Thai: leading vowel repositioned after its base consonant
// Logical order:  [U+0E40 SARA E, U+0E01 KO KAI]
// Visual order:   [U+0E01 KO KAI, U+0E40 SARA E]
$result = $sub->replaceChars([0x0E40, 0x0E01]);
// $result === [0x0E01, 0x0E40]

// Devanagari: left matra repositioned before its base consonant cluster
// Logical order:  [U+0915 KA, U+093F VOWEL SIGN I]
// Visual order:   [U+093F VOWEL SIGN I, U+0915 KA]
$result = $sub->replaceChars([0x0915, 0x093F]);
// $result === [0x093F, 0x0915]

// Hangul: Jamo composed into a precomposed syllable
// [U+1100 KIYEOK, U+1161 JUNGSEONG A, U+11A8 JONGSEONG KIYEOK] → [U+AC01 각]
$result = $sub->replaceChars([0x1100, 0x1161, 0x11A8]);
// $result === [0xAC01]

Supported scripts and Unicode ranges

Script | Unicode range(s) | Transformation
Thai | U+0E00–U+0E7F | Leading vowels repositioned after base consonant
Devanagari | U+0900–U+097F | Left matras repositioned before consonant cluster
Hangul Jamo | U+1100–U+11FF, U+A960–U+A97F, U+D7B0–U+D7FF | Jamo composed to precomposed syllables (U+AC00–U+D7A3)

Codepoints belonging to unsupported scripts are passed through unchanged.
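
To make the Thai row and the pass-through behavior concrete, here is a minimal sketch of a code-point-level reordering pass. It is an illustration under stated assumptions, not the library's implementation: the hypothetical moveThaiLeadingVowels() function only swaps a leading vowel (U+0E40 to U+0E44) with a directly following consonant (U+0E01 to U+0E2E) and leaves every other code point untouched.

```php
<?php

// Illustrative reordering pass; NOT the code behind Substitution::replaceChars().

/**
 * Move each Thai leading vowel after the consonant that follows it.
 *
 * @param int[] $ords Unicode code points
 * @return int[]
 */
function moveThaiLeadingVowels(array $ords): array
{
    $ords = array_values($ords);
    $count = count($ords);
    $out = [];

    for ($i = 0; $i < $count; $i++) {
        $cur = $ords[$i];
        $next = $ords[$i + 1] ?? null;

        $isLeadingVowel = $cur >= 0x0E40 && $cur <= 0x0E44;
        $nextIsConsonant = $next !== null && $next >= 0x0E01 && $next <= 0x0E2E;

        if ($isLeadingVowel && $nextIsConsonant) {
            // Emit the consonant first, then the vowel, and consume both.
            $out[] = $next;
            $out[] = $cur;
            $i++;
            continue;
        }

        // Anything else, including other scripts, passes through unchanged.
        $out[] = $cur;
    }

    return $out;
}

$result = moveThaiLeadingVowels([0x0041, 0x0E40, 0x0E01]);
echo implode(' ', array_map(static fn (int $o): string => sprintf('U+%04X', $o), $result)), "\n";
```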

Development

make deps
make help
make qa

Packaging

make rpm
make deb

For system packages, bootstrap with:

require_once '/usr/share/php/Com/Tecnick/Unicode/autoload.php';
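
If the same script has to run against either a Composer install or a system package, one possible approach is to pick whichever autoloader is present. The sketch below assumes the two paths shown in this README; the helper name firstReadableFile() is hypothetical.

```php
<?php

/**
 * Return the first readable file from a list of candidate paths, or null.
 *
 * @param string[] $candidates
 */
function firstReadableFile(array $candidates): ?string
{
    foreach ($candidates as $path) {
        if (is_file($path) && is_readable($path)) {
            return $path;
        }
    }

    return null;
}

$autoload = firstReadableFile([
    __DIR__ . '/vendor/autoload.php',                  // Composer install
    '/usr/share/php/Com/Tecnick/Unicode/autoload.php', // RPM/DEB system package
]);

if ($autoload !== null) {
    require_once $autoload;
}
```

Listing the Composer path first means a project-local install takes precedence over the system-wide package.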

Contributing

Contributions are welcome. Please review CONTRIBUTING.md, CODE_OF_CONDUCT.md, and SECURITY.md.

Contact

Nicola Asuni - info@tecnick.com

Statistics

  • Total downloads: 696.17k
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 10
  • Clicks: 6
  • Dependents: 2
  • Suggesters: 1

GitHub information

  • Stars: 10
  • Watchers: 2
  • Forks: 7
  • Language: PHP

Other information

  • License: LGPL-3.0-or-later
  • Last updated: 2015-09-12