承接 weichie-com/blur 相关项目开发

从需求分析到上线部署,全程专人跟进,保证项目质量与交付效率

邮箱:yvsm@zunyunkeji.com | QQ:316430983 | 微信:yvsm316

weichie-com/blur

最新稳定版本:v1.0.0

Composer 安装命令:

composer require weichie-com/blur

包简介

PII detection and de-identification SDK for BeNeLux (Belgium, Netherlands, Luxembourg)

README 文档

README

PHP Version License Tests Coverage

A data protection and de-identification SDK focused on BeNeLux countries (Belgium, Netherlands, Luxembourg), inspired by Microsoft Presidio.

Features

  • Pattern-based PII Detection: Fast and accurate entity recognition using regex patterns
  • Full Validation: Checksum validation (Luhn, mod-97, 11-proof) for high accuracy
  • BeNeLux-Specific Recognizers:
    • 🇳🇱 Dutch BSN (Burgerservicenummer) with 11-proof validation
    • 🇧🇪 Belgian National Number with mod-97 validation
    • 🇱🇺 Luxembourg National ID
    • BeNeLux IBAN codes with mod-97 checksum
    • Phone numbers for BE/NL/LU (using libphonenumber)
  • Generic Recognizers: Email, Credit Card (Luhn), IP Address, URL
  • Multiple Anonymization Strategies:
    • Replace with custom values
    • Redact (remove completely)
    • Mask (partial or full)
    • Hash (SHA-256/SHA-512)
    • Encrypt/Decrypt (AES-256-CBC)
  • Context Enhancement: Boost detection confidence with contextual keywords
  • UTF-8 Support: Full multibyte string handling
  • Type-Safe: Built with PHP 8.1+ strict types

Installation

Install via Composer:

composer require weichie-com/blur

Or add to your composer.json:

{
    "require": {
        "weichie-com/blur": "^1.0"
    }
}

Requirements

  • PHP 8.1+ (for strict types and named parameters)
  • ext-mbstring: Multibyte string support (UTF-8)
  • ext-openssl: AES encryption support
  • giggsey/libphonenumber-for-php: Phone number validation (auto-installed)

Quick Start

<?php

require_once 'vendor/autoload.php';

use Weichie\Blur\Analyzer\AnalyzerEngine;
use Weichie\Blur\Analyzer\RecognizerRegistry;
use Weichie\Blur\Analyzer\Recognizers\BeNeLux\BsnRecognizer;
use Weichie\Blur\Anonymizer\AnonymizerEngine;
use Weichie\Blur\Anonymizer\Models\OperatorConfig;
use Weichie\Blur\Anonymizer\Operators\MaskOperator;

// 1. Setup Analyzer
$registry = new RecognizerRegistry();
$registry->addRecognizer(new BsnRecognizer());

$analyzer = new AnalyzerEngine($registry);

// 2. Analyze text
$text = "Het BSN nummer is 111222333 voor deze klant.";
$results = $analyzer->analyze($text, language: 'nl');

// 3. Setup Anonymizer
$anonymizer = new AnonymizerEngine();
$anonymizer->addOperator(new MaskOperator());

// 4. Anonymize
$operators = [
    'NL_BSN' => OperatorConfig::mask('*', 6)
];

$anonymized = $anonymizer->anonymize($text, $results, $operators);
echo $anonymized->getText();
// Output: "Het BSN nummer is ******333 voor deze klant."

Usage Examples

1. Detecting BeNeLux National IDs

use Weichie\Blur\Analyzer\Recognizers\BeNeLux\BsnRecognizer;
use Weichie\Blur\Analyzer\Recognizers\BeNeLux\BelgianNationalNumberRecognizer;
use Weichie\Blur\Analyzer\Recognizers\BeNeLux\LuxembourgNationalIdRecognizer;

$registry = new RecognizerRegistry();
$registry->addRecognizer(new BsnRecognizer());                      // Dutch BSN
$registry->addRecognizer(new BelgianNationalNumberRecognizer());    // Belgian National Number
$registry->addRecognizer(new LuxembourgNationalIdRecognizer());     // Luxembourg National ID

$analyzer = new AnalyzerEngine($registry);

$text = "BSN: 111222333, BE National: 85.07.30-033.61, LU ID: 1990030112345";
$results = $analyzer->analyze($text, language: 'nl');

foreach ($results as $result) {
    echo "{$result->entityType}: score {$result->score}\n";
}

2. Detecting IBAN Codes

use Weichie\Blur\Analyzer\Recognizers\Generic\IbanRecognizer;

$registry = new RecognizerRegistry();
$registry->addRecognizer(new IbanRecognizer());

$analyzer = new AnalyzerEngine($registry);

$text = "IBAN: NL91ABNA0417164300 (Netherlands), BE68539007547034 (Belgium)";
$results = $analyzer->analyze($text);

3. Multiple Anonymization Strategies

// Strategy 1: Replace with labels
$operators = [
    'NL_BSN' => OperatorConfig::replace('[BSN-REDACTED]'),
    'EMAIL_ADDRESS' => OperatorConfig::replace('[EMAIL]'),
];

// Strategy 2: Partial masking
$operators = [
    'NL_BSN' => OperatorConfig::mask('*', 6, false),          // Mask first 6 chars
    'CREDIT_CARD' => OperatorConfig::mask('*', 12, false),    // Mask first 12 chars
];

// Strategy 3: Complete redaction
$operators = [
    'DEFAULT' => OperatorConfig::redact(),  // Remove all detected entities
];

// Strategy 4: Hashing for consistency
$operators = [
    'NL_BSN' => OperatorConfig::hash('sha256'),
    'IBAN_CODE' => OperatorConfig::hash('sha256'),
];

// Strategy 5: Encryption (reversible)
$key = 'your-secret-key';
$operators = [
    'NL_BSN' => OperatorConfig::encrypt($key),
    'BE_NATIONAL_NUMBER' => OperatorConfig::encrypt($key),
];

4. Context Enhancement

Boost detection confidence when context keywords appear near entities:

$text = "Het BSN nummer is 111222333 voor deze klant.";

$results = $analyzer->analyze(
    text: $text,
    language: 'nl',
    context: ['bsn', 'nummer', 'klant'],  // Boost score when these words are nearby
    scoreThreshold: 0.3
);

// The BSN will have a higher confidence score due to context words

5. Entity Filtering

Detect only specific entity types:

$results = $analyzer->analyze(
    text: $text,
    language: 'nl',
    entities: ['NL_BSN', 'EMAIL_ADDRESS']  // Only detect these types
);

6. Allow List

Whitelist specific values to ignore:

$results = $analyzer->analyze(
    text: $text,
    language: 'nl',
    allowList: ['test@example.com', '111222333']  // Ignore these values
);

Supported Recognizers

BeNeLux-Specific

Entity Type Description Validation Country
NL_BSN Dutch Burgerservicenummer 11-proof checksum 🇳🇱 NL
BE_NATIONAL_NUMBER Belgian National Number mod-97 checksum 🇧🇪 BE
LU_NATIONAL_ID Luxembourg National ID Date validation 🇱🇺 LU
IBAN_CODE IBAN (BE/NL/LU) mod-97 checksum 🇧🇪🇳🇱🇱🇺
PHONE_NUMBER Phone numbers libphonenumber 🇧🇪🇳🇱🇱🇺

Generic

Entity Type Description Validation
EMAIL_ADDRESS Email addresses RFC validation
CREDIT_CARD Credit card numbers Luhn checksum
IP_ADDRESS IPv4/IPv6 addresses IP validation
URL URLs URL validation

Supported Operators

Operator Description Parameters
replace Replace with custom value new_value
redact Remove completely None
mask Partial/full masking masking_char, chars_to_mask, from_end
hash SHA-256/SHA-512 hashing algorithm (default: sha256)
encrypt AES-256-CBC encryption key
decrypt AES-256-CBC decryption key

Validation Algorithms

Luhn Checksum (Credit Cards)

Used to validate credit card numbers. Prevents false positives from random digit sequences.

Mod-97 Checksum (IBAN, Belgian National Number)

ISO 7064 mod-97 algorithm for IBAN codes and Belgian National Numbers.

11-Proof Checksum (Dutch BSN)

Dutch "elfproef" (11-check) algorithm for validating BSN numbers.

Architecture

Weichie\Blur\
├── Analyzer/
│   ├── AnalyzerEngine.php           # Main detection orchestrator
│   ├── EntityRecognizer.php         # Base recognizer interface
│   ├── PatternRecognizer.php        # Pattern-based recognition
│   ├── RecognizerRegistry.php       # Recognizer management
│   ├── Recognizers/
│   │   ├── Generic/                 # Universal recognizers
│   │   │   ├── EmailRecognizer.php
│   │   │   ├── CreditCardRecognizer.php
│   │   │   ├── IpRecognizer.php
│   │   │   ├── UrlRecognizer.php
│   │   │   ├── IbanRecognizer.php
│   │   │   └── PhoneRecognizer.php
│   │   └── BeNeLux/                 # BeNeLux-specific
│   │       ├── BsnRecognizer.php
│   │       ├── BelgianNationalNumberRecognizer.php
│   │       └── LuxembourgNationalIdRecognizer.php
│   └── Models/
│       ├── RecognizerResult.php
│       └── Pattern.php
└── Anonymizer/
    ├── AnonymizerEngine.php         # Main anonymization orchestrator
    ├── Operator.php                 # Base operator interface
    ├── TextReplaceBuilder.php       # Text manipulation
    ├── Operators/
    │   ├── ReplaceOperator.php
    │   ├── RedactOperator.php
    │   ├── MaskOperator.php
    │   ├── HashOperator.php
    │   ├── EncryptOperator.php
    │   └── DecryptOperator.php
    └── Models/
        ├── OperatorConfig.php
        ├── OperatorResult.php
        └── EngineResult.php

Design Principles

  1. Simple but Complete: Focus on core functionality without ML/NLP complexity
  2. Pattern-Based: Fast regex matching with validation for accuracy
  3. Type-Safe: PHP 8.1+ with strict types throughout
  4. UTF-8 First: Proper multibyte string handling everywhere
  5. Extensible: Easy to add custom recognizers and operators
  6. Immutable Results: Thread-safe result objects

Performance

  • Fast Pattern Matching: No ML model overhead
  • Efficient Validation: Checksum algorithms run in O(n) time
  • UTF-8 Optimized: Uses mb_* functions for correct character offsets
  • Minimal Dependencies: Only essential libraries (libphonenumber)

Examples

See examples/benelux_example.php for a comprehensive demonstration including:

  • All BeNeLux recognizers in action
  • Multiple anonymization strategies
  • Context enhancement
  • Different operator configurations

Run it:

php examples/benelux_example.php

Contributing

Contributions are welcome! To add support for additional countries:

  1. Create a recognizer in src/Analyzer/Recognizers/CountryName/
  2. Extend PatternRecognizer or implement EntityRecognizer
  3. Add validation logic (checksum, format, etc.)
  4. Include context words in local language(s)
  5. Add tests and examples

License

MIT License - See LICENSE file for details

Credits

This project is inspired by Microsoft Presidio. Special thanks to the Presidio team for their excellent work on PII detection and de-identification.

Roadmap

  • Additional country-specific recognizers (Germany, France, Spain, etc.)
  • Custom recognizer builder API
  • Batch processing support
  • Performance benchmarks
  • Integration with popular PHP frameworks (Laravel, Symfony)

Support

For issues, questions, or contributions, please visit the GitHub repository.

统计信息

  • 总下载量: 3
  • 月度下载量: 0
  • 日度下载量: 0
  • 收藏数: 0
  • 点击次数: 0
  • 依赖项目数: 0
  • 推荐数: 0

GitHub 信息

  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • 开发语言: PHP

其他信息

  • 授权协议: MIT
  • 更新时间: 2025-11-21