zendevio/bmpm
Latest stable version: v1.0.1
A modern PHP 8.4+ implementation of the Beider-Morse Phonetic Matching (BMPM) algorithm for multilingual name matching. This library enables phonetic comparison of names across 20+ languages, making it ideal for genealogical research, record linkage, and fuzzy name searching.
Features
- Multilingual Support: 20 languages including Arabic, Cyrillic, Greek, Hebrew, and Latin-based scripts
- Three Name Type Modes: Generic, Ashkenazic, and Sephardic variants
- Dual Matching Accuracy: Exact and Approximate modes for precision vs. recall trade-offs
- Daitch-Mokotoff Soundex: Included D-M Soundex implementation for Slavic/Yiddish names
- Modern PHP: Built for PHP 8.4+ with enums, readonly classes, and strict typing
- Immutable API: Fluent, immutable builder pattern for safe configuration
- Well Tested: 400+ tests with 97.51% coverage, 81% MSI (mutation score), PHPStan level max
Installation
composer require zendevio/bmpm
Requirements
- PHP 8.4 or higher
- ext-mbstring - Multibyte string support
- ext-intl - Internationalization support
- ext-json - JSON support
Quick Start

    use Zendevio\BMPM\BeiderMorse;

    // Simple usage
    $encoder = new BeiderMorse();
    $phonetic = $encoder->encode('Schwarzenegger');
    // Returns: "(Svarcenegr|## more alternatives...)"

    // Check if two names might match
    $matches = $encoder->matches('Smith', 'Schmidt');
    // Returns: true (they share phonetic codes)

    // Get similarity score
    $similarity = $encoder->similarity('Mueller', 'Miller');
    // Returns: float between 0.0 and 1.0
Configuration

    use Zendevio\BMPM\BeiderMorse;
    use Zendevio\BMPM\Enums\NameType;
    use Zendevio\BMPM\Enums\MatchAccuracy;
    use Zendevio\BMPM\Enums\Language;

    // Fluent configuration
    $encoder = BeiderMorse::create()
        ->withNameType(NameType::Ashkenazic)        // Generic, Ashkenazic, or Sephardic
        ->withAccuracy(MatchAccuracy::Approximate)  // Exact or Approximate
        ->withLanguages(Language::German, Language::Polish);

    // Encode to array of alternatives
    $alternatives = $encoder->encodeToArray('Kowalski');
    // Returns: ['kovalski', 'kovalske', ...]

    // Batch encoding
    $results = $encoder->encodeBatch(['Smith', 'Jones', 'Williams']);
    // Returns: ['Smith' => '(smit|...)', 'Jones' => '...', ...]
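The README describes the configuration API as fluent and immutable. As a rough illustration of that pattern (not the library's actual code), the standalone sketch below uses a hypothetical `EncoderConfig` class and a hypothetical `Accuracy` enum. Each `with...` method returns a new object and leaves the original unchanged.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins for illustration only; not classes from zendevio/bmpm.
enum Accuracy: string
{
    case Exact = 'exact';
    case Approximate = 'approx';
}

final readonly class EncoderConfig
{
    /** @param list<string> $languages */
    public function __construct(
        public Accuracy $accuracy = Accuracy::Approximate,
        public array $languages = [],
    ) {}

    // Each "with" method builds a new instance and leaves $this untouched.
    public function withAccuracy(Accuracy $accuracy): self
    {
        return new self($accuracy, $this->languages);
    }

    public function withLanguages(string ...$languages): self
    {
        return new self($this->accuracy, array_values($languages));
    }
}

$base = new EncoderConfig();
$strict = $base->withAccuracy(Accuracy::Exact)->withLanguages('german', 'polish');
```

Because `$base` is never mutated, it can be shared and reused as a default while `$strict` carries the narrower settings.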
Name Types
| Type | Description | Languages |
|---|---|---|
| Generic | General-purpose matching | 20 languages |
| Ashkenazic | Eastern European Jewish names | 11 languages |
| Sephardic | Mediterranean Jewish names | 6 languages |

    // Ashkenazic mode for Eastern European Jewish names
    $encoder = BeiderMorse::create()
        ->withNameType(NameType::Ashkenazic);

    // Sephardic mode for Spanish/Portuguese Jewish names
    $encoder = BeiderMorse::create()
        ->withNameType(NameType::Sephardic);
Language Detection
The library automatically detects the likely language(s) of a name:

    $encoder = new BeiderMorse();

    // Detect all possible languages
    $languages = $encoder->detectLanguages('Müller');
    // Returns: [Language::German]

    // Get primary language
    $primary = $encoder->detectPrimaryLanguage('Kowalski');
    // Returns: Language::Polish
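Rule-based detection of this kind is commonly built on a list of spelling patterns. Each pattern either narrows the candidate set to certain languages or rules some languages out. The toy sketch below shows the idea with three invented flags and three invented rules. Whether this library works the same way internally is an assumption, and the real rule files are far larger.

```php
<?php
declare(strict_types=1);

// Toy flags and rules, invented for illustration only.
const GERMAN = 1;
const POLISH = 2;
const ENGLISH = 4;
const ALL = GERMAN | POLISH | ENGLISH;

/** Each rule is [regex, language mask, accept?]. */
const RULES = [
    ['/ü/iu',    GERMAN, true],   // "ü" keeps only German
    ['/ski$/iu', POLISH, true],   // a "-ski" ending keeps only Polish
    ['/sch/iu',  POLISH, false],  // "sch" rules Polish out
];

/** Return the bitmask of languages still possible for $name. */
function detectMask(string $name): int
{
    $remaining = ALL;
    foreach (RULES as [$pattern, $mask, $accept]) {
        if (preg_match($pattern, $name) === 1) {
            // Accept rules intersect with the mask; reject rules subtract it.
            $remaining = $accept ? ($remaining & $mask) : ($remaining & ~$mask);
        }
    }
    return $remaining;
}

$muller  = detectMask('Müller');   // GERMAN
$schmidt = detectMask('Schmidt');  // GERMAN | ENGLISH
```

A name that triggers no rule keeps every language as a candidate, which is why a detector of this shape can return more than one language.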
Daitch-Mokotoff Soundex
For Slavic and Yiddish surname matching:

    $encoder = new BeiderMorse();

    $soundex = $encoder->soundex('Schwarzenegger');
    // Returns: "479465 474659" (multiple codes for ambiguous spellings)

    $soundex = $encoder->soundex('Cohen');
    // Returns: "560000 460000"
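Since one name can yield several space-separated codes, a lookup table for record linkage has to register the name under each of them. The sketch below is one way to build such an inverted index from precomputed code strings. The helper `buildCodeIndex` is hypothetical and not part of the library, and only the `Cohen` value comes from the example above. The other values are illustrative.

```php
<?php
declare(strict_types=1);

/**
 * Build an inverted index from each Soundex code to the names producing it.
 * Note: PHP stores all-digit array keys as integers; lookups with the
 * string form of the code still work.
 *
 * @param array<string, string> $codesByName name => space-separated codes
 * @return array<int|string, list<string>> code => names
 */
function buildCodeIndex(array $codesByName): array
{
    $index = [];
    foreach ($codesByName as $name => $codes) {
        foreach (preg_split('/\s+/', trim($codes), -1, PREG_SPLIT_NO_EMPTY) as $code) {
            $index[$code][] = $name;
        }
    }
    return $index;
}

$index = buildCodeIndex([
    'Cohen' => '560000 460000', // value quoted from the example above
    'Kohn'  => '560000',        // illustrative value, not verified library output
    'Levy'  => '870000',        // illustrative value, not verified library output
]);

$candidates = $index['560000'] ?? [];
```

Looking up any one of a query name's codes then returns every indexed name that shares it.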
Advanced Usage
Restrict to Specific Languages

    $encoder = BeiderMorse::create()
        ->withLanguages(Language::German, Language::English, Language::French);

    // Or using a bitmask directly
    $encoder = BeiderMorse::create()
        ->withLanguageMask(Language::German->value | Language::English->value);
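The bitmask form implies that each language maps to a distinct bit. The sketch below shows how an int-backed enum can be combined into a mask and expanded again. It uses a hypothetical `Lang` enum with made-up values; the library's real `Language` enum may assign different integers.

```php
<?php
declare(strict_types=1);

// Hypothetical flag values for illustration only.
enum Lang: int
{
    case English = 1;
    case French  = 2;
    case German  = 4;
    case Polish  = 8;

    /** Combine several cases into one integer mask. */
    public static function mask(self ...$langs): int
    {
        return array_reduce($langs, fn (int $m, self $l): int => $m | $l->value, 0);
    }

    /** @return list<self> the cases whose bit is set in $mask */
    public static function fromMask(int $mask): array
    {
        return array_values(array_filter(
            self::cases(),
            fn (self $l): bool => ($mask & $l->value) !== 0,
        ));
    }
}

$mask = Lang::mask(Lang::German, Lang::English); // 4 | 1
$selected = Lang::fromMask($mask);               // cases in declaration order
```

Storing the selection as a single integer makes it cheap to intersect with the set of languages a rule applies to.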
Custom Data Path

    // Use custom rule files location
    $encoder = BeiderMorse::create()
        ->withDataPath('/path/to/custom/rules');
Direct Engine Access
For advanced use cases, access the engine directly:

    use Zendevio\BMPM\Engine\PhoneticEngine;
    use Zendevio\BMPM\Engine\LanguageDetector;
    use Zendevio\BMPM\Enums\MatchAccuracy;
    use Zendevio\BMPM\Enums\NameType;
    use Zendevio\BMPM\Rules\RuleLoader;

    // Wire the engine together by hand instead of going through the facade
    $ruleLoader = RuleLoader::create();
    $detector = new LanguageDetector($ruleLoader);
    $engine = new PhoneticEngine($ruleLoader, $detector);

    $result = $engine->encode('name', NameType::Generic, MatchAccuracy::Approximate);
API Reference
BeiderMorse (Main Facade)
| Method | Description |
|---|---|
| `encode(string $name): string` | Encode name to phonetic representation |
| `encodeToArray(string $name): array` | Get all phonetic alternatives as array |
| `encodeBatch(array $names): array` | Encode multiple names at once |
| `matches(string $a, string $b): bool` | Check if two names match phonetically |
| `similarity(string $a, string $b): float` | Get similarity score (0.0 - 1.0) |
| `detectLanguages(string $name): array` | Detect possible languages |
| `detectPrimaryLanguage(string $name): Language` | Get most likely language |
| `soundex(string $name): string` | Get D-M Soundex encoding |
Configuration Methods
| Method | Description |
|---|---|
| `withNameType(NameType $type): self` | Set name type variant |
| `withAccuracy(MatchAccuracy $accuracy): self` | Set matching accuracy |
| `withLanguages(Language ...$langs): self` | Restrict to specific languages |
| `withLanguageMask(int $mask): self` | Set language bitmask directly |
| `withAutoLanguageDetection(): self` | Enable automatic detection |
| `withDataPath(string $path): self` | Set custom rules path |
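The reference above gives `similarity()` a range of 0.0 to 1.0 but does not state how the score is computed. One plausible way to score two names from their phonetic alternatives is the Jaccard index of the two code sets. The sketch below illustrates that option only; it is not a description of the library's formula, and the `jaccard` helper and sample codes are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Jaccard index of two sets of phonetic codes: |A ∩ B| / |A ∪ B|.
 * Returns 0.0 when both sets are empty.
 *
 * @param list<string> $a
 * @param list<string> $b
 */
function jaccard(array $a, array $b): float
{
    $a = array_unique($a);
    $b = array_unique($b);
    $union = count(array_unique(array_merge($a, $b)));
    if ($union === 0) {
        return 0.0;
    }
    return count(array_intersect($a, $b)) / $union;
}

// Made-up alternatives, shaped like the arrays encodeToArray() is shown returning.
$score = jaccard(['miler', 'muler', 'myler'], ['miler', 'miller']);
```

Here one code is shared out of four distinct codes overall, so the score is 0.25.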
Supported Languages
Generic Mode (20 languages)
Arabic, Cyrillic, Czech, Dutch, English, French, German, Greek, Greek (Latin), Hebrew, Hungarian, Italian, Latvian, Polish, Portuguese, Romanian, Russian, Spanish, Turkish
Ashkenazic Mode (11 languages)
Cyrillic, English, French, German, Hebrew, Hungarian, Polish, Romanian, Russian, Spanish
Sephardic Mode (6 languages)
French, Hebrew, Italian, Portuguese, Spanish
How It Works
The Beider-Morse algorithm:
- Language Detection: Analyzes spelling patterns to identify likely source language(s)
- Phonetic Rules: Applies language-specific transformation rules
- Approximation: Generates phonetic codes that capture pronunciation variants
- Multi-output: Produces multiple codes for ambiguous spellings
This enables matching names like:
- "Smith" ↔ "Schmidt" ↔ "Schmitt"
- "Cohen" ↔ "Kohn" ↔ "Cohn" ↔ "Cahan"
- "Schwarzenegger" ↔ "Shvarceneger"
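The Quick Start notes that two names match when they share phonetic codes. The sketch below shows that overlap test on its own, assuming the flat `(a|b|c)` output format seen in the examples. The helpers `splitAlternatives` and `sharesCode` are hypothetical, the codes are made up, and any more complex output (for instance for multi-word names) is not handled.

```php
<?php
declare(strict_types=1);

/**
 * Split a flat "(a|b|c)" code string into its alternatives.
 * A bare "abc" without parentheses is treated as a single alternative.
 *
 * @return list<string>
 */
function splitAlternatives(string $encoded): array
{
    $inner = trim($encoded);
    if (str_starts_with($inner, '(') && str_ends_with($inner, ')')) {
        $inner = substr($inner, 1, -1);
    }
    return array_values(array_filter(
        explode('|', $inner),
        fn (string $code): bool => $code !== '',
    ));
}

/** Two names are candidate matches when their code sets overlap. */
function sharesCode(string $encodedA, string $encodedB): bool
{
    return array_intersect(
        splitAlternatives($encodedA),
        splitAlternatives($encodedB),
    ) !== [];
}

// Made-up codes, not real library output.
$smith   = '(smit|zmit)';
$schmidt = '(smit|smid)';
$jones   = '(jons|ions)';

$match   = sharesCode($smith, $schmidt);
$noMatch = sharesCode($smith, $jones);
```

Producing several codes per name raises recall, since a single shared alternative is enough to flag a pair for closer review.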
Testing & Quality

    # Run tests
    composer test

    # Run with coverage
    composer test:coverage

    # Static analysis (PHPStan level max)
    composer analyse

    # Code style check
    composer cs-check

    # Code style fix
    composer cs-fix

    # Mutation testing (min 80% MSI required)
    composer infection

    # Automated refactoring
    composer rector:dry   # Preview changes
    composer rector       # Apply changes

    # Run all checks
    composer check        # cs-check, analyse, test

    # Full CI pipeline
    composer ci           # security, cs-check, analyse, test, infection
Credits
- Alexander Beider - Original BMPM algorithm
- Stephen P. Morse - Original BMPM algorithm and website
- Alin M. Gheorghe - PHP 8.4+ implementation
License
This library is licensed under the GPL-3.0 License, the same license as the original BMPM implementation.
Other information
- License: GPL-3.0-or-later
- Last updated: 2025-12-13