content-extract/content-processor
Robust PHP library for batch document processing. Extracts content from PDFs/text and generates structured JSON according to user-defined schemas. Now with semantic structuring, OCR support for scanned PDFs, text normalization, and alias-driven field matching. Production-ready, secure, zero unnecess
Time: 2026-04-19 15:27
rembish/text-at-any-cost
Extract plain text from common document formats: DOC, PDF, PPT, RTF, DOCX, ODT, RAR
Time: 2026-02-17 14:30
nojimage/twitter-text-php
A library of PHP classes that provide auto-linking and extraction of usernames, lists, hashtags and URLs from tweets.
Time: 2026-01-04 18:09
smalot/pdfparser
PDF parser library. Can read and extract information from PDF files.
Time: 2026-01-04 10:04
ipwsystems/rtftools
Library used to extract raw text from an RTF file
Time: 2026-01-04 08:25
hello-solucoes/pdf-to-text
Extract text from a PDF
Time: 2026-01-04 05:19
kreuzberg/kreuzberg
High-performance document intelligence for PHP. Extract text, metadata, and structured information from PDFs, Office documents, images, and 75 formats. Powered by Rust core for 10-50x speed improvements.
Time: 2025-12-27 13:40
ediazaro/receipt-scanner
Use OpenAI to extract structured receipt and invoice data from text, HTML, images, and PDFs.
Time: 2025-11-13 22:53
denisdeejay/pdfparser
(Fork of smalot/pdfparser) PDF parser library. Can read and extract information from PDF files.
Time: 2025-09-16 07:44
ottosmops/office2text
Extract text from Microsoft Office (docx, pptx, xlsx) and LibreOffice (odt, odp, ods) documents using PHP and ZipArchive.
Time: 2025-09-01 11:22
mostlyserious/craft-text-extractor
A tool to extract text from documents.
Time: 2025-04-30 20:48
aleksanm/excel2txt
Extract text from MS Excel xlsx file using builtin tool: /usr/bin/ssconvert
Time: 2025-04-10 12:58
aleksanm/docx2txt
Extract text from docx using docx2txt
Time: 2025-04-09 12:49
ledsquare/pdfparser
PDF parser library. Can read and extract information from PDF files.
Time: 2025-01-17 15:22