定制 loupe/matcher 二次开发

按需修改功能、优化性能、对接业务系统,提供一站式技术支持

邮箱:yvsm@zunyunkeji.com | QQ:316430983 | 微信:yvsm316

loupe/matcher

最新稳定版本:0.2.1

Composer 安装命令:

composer require loupe/matcher

包简介

Highlight and crop around search terms

README 文档

README

A PHP library for search term highlighting and text snippet generation. Transform search results into user-friendly formatted text with highlighted matches and contextual cropping.

Lorem ipsum dolor sit amet, consetetur [...] no sea takimata sanctus est lorem est ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur [...] dolore te feugait nulla facilisi lorem ipsum dolor sit amet, consectetuer [...]

Caution

Work in progress. Expect frequent changes to API and functionality.

Installation

composer require loupe/matcher

Quick Start

Here's a simple example of how to use Loupe Matcher to highlight search terms in a text document and crop around the highlights:

use Loupe\Matcher\Tokenizer\Tokenizer;
use Loupe\Matcher\Matcher;
use Loupe\Matcher\Formatter;
use Loupe\Matcher\FormatterOptions;

$tokenizer = new Tokenizer();
$matcher = new Matcher($tokenizer);
$formatter = new Formatter($matcher);

$options = (new FormatterOptions())
    ->withEnableHighlight()
    ->withEnableCrop()
    ->withCropLength(10);

$result = $formatter->format(
    'This is a long document with many words to search through and compare.',
    'search words',
    $options
);

// "...with many <em>words</em> to <em>search</em> through..."
echo $result->getFormattedText();

Core Components

Tokenizer

Purpose: Breaks text into searchable tokens (words, phrases, terms) for accurate matching.

The Tokenizer converts strings into TokenCollection objects, handling:

  • Word boundaries using ext-intl rules
  • Phrase groups (quoted terms like "exact phrase")
  • Negated terms (prefixed with -)
  • Locale-specific tokenization
$tokenizer = new Tokenizer('en_US'); // Optional locale
$tokens = $tokenizer->tokenize('search for "exact phrase" -exclude');

$tokens->all();          // All tokens
$tokens->phraseGroups(); // Quoted phrases only
$tokens->allNegated();   // Terms to exclude

Matcher

Purpose: Finds which tokens in your text match the search query.

The Matcher compares tokenized text against search terms, with support for:

  • Stop word filtering (ignore common words like "the", "and")
  • Match span calculation (start/end positions)
  • Flexible matching between token collections
$matcher = new Matcher($tokenizer, ['the', 'and', 'or']); // Stop words
$matches = $matcher->calculateMatches('Text to search', 'search query');

// Get position information for highlighting
$spans = $matcher->calculateMatchSpans('Text to search', 'query', $matches);
foreach ($spans as $span) {
    echo "Match at position {$span->getStartPosition()}-{$span->getEndPosition()}";
}

Formatter

Purpose: Combines matching and highlighting to create formatted output with context.

The Formatter orchestrates the entire process:

  • Highlights matched terms with HTML tags
  • Crops text to show relevant context around matches
  • Configurable through FormatterOptions
$formatter = new Formatter($matcher);

$options = (new FormatterOptions())
    ->withEnableHighlight()
    ->withHighlightStartTag('<mark>')
    ->withHighlightEndTag('</mark>')
    ->withEnableCrop()
    ->withCropLength(150)
    ->withCropMarker('...');

$result = $formatter->format($text, $query, $options);
echo $result->getFormattedText();

Advanced Usage

Custom Tokenizer

Implement TokenizerInterface for specialized tokenization:

class CustomTokenizer implements TokenizerInterface {
    public function tokenize(string $text): TokenCollection {
        // Your custom tokenization logic
    }

    public function matches(Token $token, TokenCollection $tokens): bool {
        // Your custom logic for checking if a token is a match
    }
}

Pre-highlighted Text Cropping

When you already have highlighted text that needs cropping:

$cropper = new \Loupe\Matcher\Formatting\Cropper(
    cropLength: 50,
    cropMarker: '',
    highlightStartTag: '<em>',
    highlightEndTag: '</em>'
);

// "...text with <em>highlighted</em> terms."
echo $cropper->cropHighlightedText('Long text with <em>highlighted</em> terms.');

Using Pre-calculated Matches

When you already have a TokenCollection of matches (e.g., from a previous search operation or external source), you can format text directly without re-calculating matches. This approach is useful when your search engine already provides match information or you want to cache match results for performance.

// Assume you already have matches from somewhere else
$existingMatches = new TokenCollection(/* ... */);

// Set up the tokenizer, matcher, and formatter as usual
$tokenizer = new Tokenizer();
$matcher = new Matcher($tokenizer);
$formatter = new Formatter($matcher);
$options = (new FormatterOptions())
    ->withEnableHighlight()
    ->withEnableCrop()
    ->withCropLength(100);

// Format using the existing matches - no duplicate processing
$result = $formatter->format($text, $query, $options, matches: $existingMatches);
echo $result->getFormattedText();

统计信息

  • 总下载量: 71.73k
  • 月度下载量: 0
  • 日度下载量: 0
  • 收藏数: 10
  • 点击次数: 1
  • 依赖项目数: 2
  • 推荐数: 0

GitHub 信息

  • Stars: 10
  • Watchers: 1
  • Forks: 1
  • 开发语言: PHP

其他信息

  • 授权协议: MIT
  • 更新时间: 2025-04-23