content-extract/content-processor
Robust PHP library for batch document processing. Extracts content from PDFs/text and generates structured JSON according to user-defined schemas. Now with semantic structuring, OCR support for scanned PDFs, text normalization, and alias-driven field matching. Production-ready, secure, zero unnecess
Time: 2026-04-19 15:27
onstage2426/fuzor
Dependency-free full-text search for PHP. BM25 ranking, fuzzy and boolean modes, search-as-you-type prefix matching, stopword filtering and Snowball stemming for 62 languages, snippet extraction and result highlighting — one SQLite file, zero infrastructure.
Time: 2026-03-21 13:20
jcfrane/pdf-text-extractor
A Laravel PDF text extraction package with multiple strategies (PdfParser, XObject, AWS Textract, Tesseract OCR). Handles Canva-generated PDFs, scanned documents, and other edge cases with automatic fallback.
Time: 2026-02-11 09:00
daniel-jorg-schuppelius/php-pdf-toolkit
PHP 8.2+ library for PDF text extraction with automatic reader selection. Supports embedded text and scanned documents via OCR.
Time: 2026-01-26 11:55
apache-solr-for-typo3/tika
Apache Tika for TYPO3
Time: 2026-01-04 19:00
nojimage/twitter-text-php
A library of PHP classes that provide auto-linking and extraction of usernames, lists, hashtags and URLs from tweets.
Time: 2026-01-04 18:09
silverstripe/textextraction
Text Extraction API for SilverStripe CMS (mostly used with 'fulltextsearch' module)
Time: 2026-01-04 03:05
keyvan/german-ocr
High-performance German document OCR - Local & Cloud API
Time: 2026-01-02 21:53
kreuzberg/kreuzberg
High-performance document intelligence for PHP. Extract text, metadata, and structured information from PDFs, Office documents, images, and 75 formats. Powered by Rust core for 10-50x speed improvements.
Time: 2025-12-27 13:40
shibashish/pdf-reader
A comprehensive Laravel package for extracting text, HTML, images, and metadata from PDF files using Poppler utilities.
Time: 2025-12-09 09:46
sharpapi/laravel-content-detect-emails
AI Email Detection for Laravel powered by SharpAPI.com
Time: 2025-06-16 10:51
puma/libreria
Library that recognizes words beginning with capital letters and returns them as correct words; it also extracts numbers from a text.
Time: 2025-05-12 00:44
fathkoc/php-textmagic
A lightweight PHP library for basic text analysis operations like summarization, sentiment analysis, keyword extraction, and classification.
Time: 2024-10-04 13:49
joest8/pdfinterpreter
This class converts multiple PDF files, whether image-based or text-based, into an array of data. It uses user-defined templates containing regular expressions to control the data extraction process, allowing for customized and flexible output.
Time: 2023-11-05 19:09
kalimeromk/rssfeed
Full-Text RSS extraction package for Laravel - converts partial RSS feeds to full content
Time: 2023-06-11 14:08