zendevio/bmpm

Latest stable version: v1.0.1

Composer install command:

composer require zendevio/bmpm

Package description

Modern PHP 8.4+ implementation of the Beider-Morse Phonetic Matching (BMPM) algorithm for multilingual name matching

README

A modern PHP 8.4+ implementation of the Beider-Morse Phonetic Matching (BMPM) algorithm for multilingual name matching. This library enables phonetic comparison of names across 20 languages, making it well suited to genealogical research, record linkage, and fuzzy name searching.

Features

  • Multilingual Support: 20 languages including Arabic, Cyrillic, Greek, Hebrew, and Latin-based scripts
  • Three Name Type Modes: Generic, Ashkenazic, and Sephardic variants
  • Dual Matching Accuracy: Exact and Approximate modes for precision vs. recall trade-offs
  • Daitch-Mokotoff Soundex: Included D-M Soundex implementation for Slavic/Yiddish names
  • Modern PHP: Built for PHP 8.4+ with enums, readonly classes, and strict typing
  • Immutable API: Fluent, immutable builder pattern for safe configuration
  • Well Tested: 400+ tests with 97.51% code coverage, 81% MSI (Mutation Score Indicator), and PHPStan at level max

Installation

composer require zendevio/bmpm

Requirements

  • PHP 8.4 or higher
  • ext-mbstring - Multibyte string support
  • ext-intl - Internationalization support
  • ext-json - JSON support

Quick Start

use Zendevio\BMPM\BeiderMorse;

// Simple usage
$encoder = new BeiderMorse();
$phonetic = $encoder->encode('Schwarzenegger');
// Returns an alternation such as "(Svarcenegr|...)", with further alternatives elided

// Check if two names might match
$matches = $encoder->matches('Smith', 'Schmidt');
// Returns: true (they share phonetic codes)

// Get similarity score
$similarity = $encoder->similarity('Mueller', 'Miller');
// Returns: float between 0.0 and 1.0
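
The comments above say that two names match when they share phonetic codes and that similarity is a value between 0.0 and 1.0. As a rough illustration of that idea, the following sketch parses two "(a|b|c)" strings and compares them as sets. The helper names, the Jaccard formula, and the sample codes are assumptions made for this example; the README does not say how `matches()` or `similarity()` are actually computed.

```php
<?php
declare(strict_types=1);

/**
 * Split a phonetic string such as "(smit|zmit)" into a list of unique codes.
 * Assumes the flat "(a|b|c)" shape shown above; nested groups are not handled.
 *
 * @return list<string>
 */
function phoneticCodes(string $encoded): array
{
    $inner = trim($encoded, "() \t\n");
    if ($inner === '') {
        return [];
    }
    return array_values(array_unique(explode('|', $inner)));
}

/** In this sketch, two names match when their code sets overlap. */
function codesMatch(string $a, string $b): bool
{
    return array_intersect(phoneticCodes($a), phoneticCodes($b)) !== [];
}

/** Jaccard index: shared codes divided by all distinct codes. */
function codesSimilarity(string $a, string $b): float
{
    $setA = phoneticCodes($a);
    $setB = phoneticCodes($b);
    $union = array_unique(array_merge($setA, $setB));
    if ($union === []) {
        return 0.0;
    }
    return count(array_intersect($setA, $setB)) / count($union);
}

// Hypothetical encoder outputs, invented for illustration only.
$smith   = '(smit|zmit)';
$schmidt = '(smit|smid|zmit)';

var_dump(codesMatch($smith, $schmidt));      // bool(true)
var_dump(codesSimilarity($smith, $schmidt)); // 2 shared codes out of 3 distinct
```

A set-based score like this is symmetric and easy to threshold, which is one reason it is a common choice for this kind of comparison.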

Configuration

use Zendevio\BMPM\BeiderMorse;
use Zendevio\BMPM\Enums\NameType;
use Zendevio\BMPM\Enums\MatchAccuracy;
use Zendevio\BMPM\Enums\Language;

// Fluent configuration
$encoder = BeiderMorse::create()
    ->withNameType(NameType::Ashkenazic)      // Generic, Ashkenazic, or Sephardic
    ->withAccuracy(MatchAccuracy::Approximate) // Exact or Approximate
    ->withLanguages(Language::German, Language::Polish);

// Encode to array of alternatives
$alternatives = $encoder->encodeToArray('Kowalski');
// Returns: ['kovalski', 'kovalske', ...]

// Batch encoding
$results = $encoder->encodeBatch(['Smith', 'Jones', 'Williams']);
// Returns: ['Smith' => '(smit|...)', 'Jones' => '...', ...]
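
For record linkage, arrays of alternatives like those returned above can be turned into a lookup table. The sketch below builds an inverted index from phonetic code to names and then finds candidates for a query. The function names and the sample codes are hypothetical, and the input is hard-coded so the example runs without the library.

```php
<?php
declare(strict_types=1);

/**
 * Build an inverted index from phonetic code to the names that produce it.
 *
 * @param array<string, list<string>> $alternativesByName name => phonetic alternatives
 * @return array<string, list<string>> code => names
 */
function buildPhoneticIndex(array $alternativesByName): array
{
    $index = [];
    foreach ($alternativesByName as $name => $alternatives) {
        foreach (array_unique($alternatives) as $code) {
            $index[$code][] = (string) $name;
        }
    }
    return $index;
}

/**
 * Return every indexed name that shares at least one code with the query.
 *
 * @param array<string, list<string>> $index
 * @param list<string> $queryAlternatives
 * @return list<string>
 */
function lookupCandidates(array $index, array $queryAlternatives): array
{
    $candidates = [];
    foreach ($queryAlternatives as $code) {
        foreach ($index[$code] ?? [] as $name) {
            $candidates[$name] = true; // array keys de-duplicate names
        }
    }
    return array_map('strval', array_keys($candidates));
}

// Hypothetical alternatives, invented for illustration only.
$index = buildPhoneticIndex([
    'Kowalski' => ['kovalski', 'kovalske'],
    'Kovalsky' => ['kovalski'],
    'Jones'    => ['jons'],
]);

print_r(lookupCandidates($index, ['kovalski'])); // Kowalski and Kovalsky
```

Indexing by code means each query costs one lookup per alternative instead of a comparison against every stored name.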

Name Types

| Type | Description | Languages |
| --- | --- | --- |
| Generic | General-purpose matching | 20 languages |
| Ashkenazic | Eastern European Jewish names | 11 languages |
| Sephardic | Mediterranean Jewish names | 6 languages |

// Ashkenazic mode for Eastern European Jewish names
$encoder = BeiderMorse::create()
    ->withNameType(NameType::Ashkenazic);

// Sephardic mode for Spanish/Portuguese Jewish names
$encoder = BeiderMorse::create()
    ->withNameType(NameType::Sephardic);

Language Detection

The library automatically detects the likely language(s) of a name:

$encoder = new BeiderMorse();

// Detect all possible languages
$languages = $encoder->detectLanguages('Müller');
// Returns: [Language::German]

// Get primary language
$primary = $encoder->detectPrimaryLanguage('Kowalski');
// Returns: Language::Polish

Daitch-Mokotoff Soundex

For Slavic and Yiddish surname matching:

$encoder = new BeiderMorse();

$soundex = $encoder->soundex('Schwarzenegger');
// Returns: "479465 474659" (multiple codes for ambiguous spellings)

$soundex = $encoder->soundex('Cohen');
// Returns: "560000 460000"

Advanced Usage

Restrict to Specific Languages

$encoder = BeiderMorse::create()
    ->withLanguages(Language::German, Language::English, Language::French);

// Or using a bitmask directly
$encoder = BeiderMorse::create()
    ->withLanguageMask(Language::German->value | Language::English->value);
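
The bitmask form works because each language can be given a distinct power-of-two value, so a bitwise OR combines them without collisions. The sketch below shows the idea with a stand-in enum. `DemoLanguage`, its cases, and its bit values are hypothetical; the real `Language` enum may assign different values.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the library's Language enum; the real case
// names and bit values may differ.
enum DemoLanguage: int
{
    case English = 1; // 0b001
    case French  = 2; // 0b010
    case German  = 4; // 0b100

    /** Combine any number of languages into one integer mask. */
    public static function mask(self ...$languages): int
    {
        $mask = 0;
        foreach ($languages as $language) {
            $mask |= $language->value;
        }
        return $mask;
    }

    /**
     * List the languages whose bit is set in the given mask.
     *
     * @return list<self>
     */
    public static function fromMask(int $mask): array
    {
        return array_values(array_filter(
            self::cases(),
            static fn (self $language): bool => ($mask & $language->value) !== 0,
        ));
    }
}

$mask = DemoLanguage::mask(DemoLanguage::German, DemoLanguage::English);
echo $mask, PHP_EOL; // 5
```

Storing a language set as one integer keeps membership checks to a single bitwise AND.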

Custom Data Path

// Use custom rule files location
$encoder = BeiderMorse::create()
    ->withDataPath('/path/to/custom/rules');

Direct Engine Access

For advanced use cases, access the engine directly:

use Zendevio\BMPM\Engine\PhoneticEngine;
use Zendevio\BMPM\Engine\LanguageDetector;
use Zendevio\BMPM\Enums\MatchAccuracy;
use Zendevio\BMPM\Enums\NameType;
use Zendevio\BMPM\Rules\RuleLoader;

$ruleLoader = RuleLoader::create();
$detector = new LanguageDetector($ruleLoader);
$engine = new PhoneticEngine($ruleLoader, $detector);

$result = $engine->encode('name', NameType::Generic, MatchAccuracy::Approximate);

API Reference

BeiderMorse (Main Facade)

| Method | Description |
| --- | --- |
| encode(string $name): string | Encode name to phonetic representation |
| encodeToArray(string $name): array | Get all phonetic alternatives as array |
| encodeBatch(array $names): array | Encode multiple names at once |
| matches(string $a, string $b): bool | Check if two names match phonetically |
| similarity(string $a, string $b): float | Get similarity score (0.0 - 1.0) |
| detectLanguages(string $name): array | Detect possible languages |
| detectPrimaryLanguage(string $name): Language | Get most likely language |
| soundex(string $name): string | Get D-M Soundex encoding |

Configuration Methods

| Method | Description |
| --- | --- |
| withNameType(NameType $type): self | Set name type variant |
| withAccuracy(MatchAccuracy $accuracy): self | Set matching accuracy |
| withLanguages(Language ...$langs): self | Restrict to specific languages |
| withLanguageMask(int $mask): self | Set language bitmask directly |
| withAutoLanguageDetection(): self | Enable automatic detection |
| withDataPath(string $path): self | Set custom rules path |

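Supported Languages

Each count below is one higher than the number of names listed. The extra entry is presumably the catch-all "any" rule set that the original BMPM implementation ships for every name type.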

Generic Mode (20 languages)

Arabic, Cyrillic, Czech, Dutch, English, French, German, Greek, Greek (Latin), Hebrew, Hungarian, Italian, Latvian, Polish, Portuguese, Romanian, Russian, Spanish, Turkish

Ashkenazic Mode (11 languages)

Cyrillic, English, French, German, Hebrew, Hungarian, Polish, Romanian, Russian, Spanish

Sephardic Mode (6 languages)

French, Hebrew, Italian, Portuguese, Spanish

How It Works

The Beider-Morse algorithm:

  1. Language Detection: Analyzes spelling patterns to identify likely source language(s)
  2. Phonetic Rules: Applies language-specific transformation rules
  3. Approximation: Generates phonetic codes that capture pronunciation variants
  4. Multi-output: Produces multiple codes for ambiguous spellings

This enables matching names like:

  • "Smith" ↔ "Schmidt" ↔ "Schmitt"
  • "Cohen" ↔ "Kohn" ↔ "Cohn" ↔ "Cahan"
  • "Schwarzenegger" ↔ "Shvarceneger"
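
The rule-application and multi-output steps above can be pictured with a toy expander: each spelling pattern maps to one or more sounds, and every ambiguous pattern multiplies the set of output codes. The sketch below is only an illustration under stated assumptions. The function name and the four rules are invented, it handles ASCII input only, and it skips language detection and rule context, so it is not the BMPM rule set or this library's engine.

```php
<?php
declare(strict_types=1);

/**
 * Expand a name into every code reachable through a toy rule table.
 * The longest matching pattern wins at each position; unmatched letters pass through.
 *
 * @param array<string, list<string>> $rules pattern => possible replacements
 * @return list<string>
 */
function expandWithRules(string $name, array $rules): array
{
    // Try longer patterns first so "sch" is preferred over a shorter rule.
    uksort($rules, static fn ($a, $b): int => strlen((string) $b) <=> strlen((string) $a));

    $codes = [''];
    $input = strtolower($name);
    $position = 0;
    $length = strlen($input);

    while ($position < $length) {
        $rest = substr($input, $position);
        $replacements = [$input[$position]];
        $consumed = 1;
        foreach ($rules as $pattern => $alternatives) {
            $pattern = (string) $pattern;
            if ($pattern !== '' && str_starts_with($rest, $pattern)) {
                $replacements = $alternatives;
                $consumed = strlen($pattern);
                break;
            }
        }
        // Branch: every code so far continues with every alternative.
        $next = [];
        foreach ($codes as $code) {
            foreach ($replacements as $replacement) {
                $next[] = $code . $replacement;
            }
        }
        $codes = array_values(array_unique($next));
        $position += $consumed;
    }
    return $codes;
}

// Invented toy rules; the real BMPM rule files are far larger and context-sensitive.
$toyRules = [
    'sch' => ['S'],
    'w'   => ['v'],
    'z'   => ['ts', 'z'],
    'c'   => ['k', 'ts'],
];

print_r(expandWithRules('Schwarz', $toyRules)); // ['Svarts', 'Svarz']
```

Because each ambiguous pattern multiplies the number of codes, a real implementation has to prune or merge alternatives to keep the output manageable.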

Testing & Quality

# Run tests
composer test

# Run with coverage
composer test:coverage

# Static analysis (PHPStan level max)
composer analyse

# Code style check
composer cs-check

# Code style fix
composer cs-fix

# Mutation testing (min 80% MSI required)
composer infection

# Automated refactoring
composer rector:dry    # Preview changes
composer rector        # Apply changes

# Run all checks
composer check         # cs-check, analyse, test

# Full CI pipeline
composer ci            # security, cs-check, analyse, test, infection

Credits

  • Alexander Beider - Original BMPM algorithm
  • Stephen P. Morse - Original BMPM algorithm and website
  • Alin M. Gheorghe - PHP 8.4+ implementation

License

This library is licensed under the GPL-3.0 License, the same license as the original BMPM implementation.

Statistics

  • Total downloads: 3
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 2
  • Clicks: 0
  • Dependents: 0
  • Suggesters: 0

GitHub Information

  • Stars: 2
  • Watchers: 0
  • Forks: 0
  • Language: PHP

Other Information

  • License: GPL-3.0-or-later
  • Last updated: 2025-12-13