ahmedghanem00/tesseract-ocr
Latest stable version: 1.0.11
Composer install command:
composer require ahmedghanem00/tesseract-ocr
Package description
A PHP wrapper for Tesseract-OCR binary
README
A PHP wrapper for Tesseract-OCR binary.
Originally inspired by ddeboer/tesseract, with added features and some improvements.
Installation
$ composer require ahmedghanem00/tesseract-ocr
Usage
If the tesseract binary is on your PATH, you can simply do:
$tesseract = new \ahmedghanem00\TesseractOCR\Tesseract();
Otherwise, pass the path to the binary:
$tesseract = new \ahmedghanem00\TesseractOCR\Tesseract("path/to/binary/location");
// OR, if you already have an instance:
$tesseract->setBinaryPath("path/to/binary/location");
To specify the tesseract process timeout:
$tesseract = new \ahmedghanem00\TesseractOCR\Tesseract(processTimeout: 3);
// OR
$tesseract->setProcessTimeout(2.5);
To specify a custom tessdata-dir:
$tesseract->setTessDataDirPath("path/to/data/dir");
To reset tessdata-dir to default:
$tesseract->resetTessDataDirPath();
To get the version of the binary:
$version = $tesseract->getVersion();
To get all the supported languages:
$languages = $tesseract->getSupportedLanguages();
To OCR an image:
$result = $tesseract->recognize("test.png");
// OR
$result = $tesseract->recognize("https://example.com/test.png");
// etc.
Thanks to the Intervention/image package, the recognize method accepts several kinds of image source:
- Path of the image in filesystem.
- URL of an image (allow_url_fopen must be enabled).
- Binary image data.
- Data-URL encoded image data.
- Base64 encoded image data.
- PHP resource of type gd.
- Imagick instance.
- Intervention\Image\Image instance.
- SplFileInfo instance (to handle Laravel file uploads via Symfony\Component\HttpFoundation\File\UploadedFile).
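As an illustration of the data-URL form in the list above, here is a minimal sketch. The helper `toDataUrl` is hypothetical and not part of the package; it only wraps raw bytes in a `data:` URL using PHP's built-in `base64_encode`. The PNG signature bytes stand in for a real image.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the package): wraps raw image bytes
 * in a data URL, one of the source forms listed above.
 */
function toDataUrl(string $binary, string $mimeType = 'image/png'): string
{
    return 'data:' . $mimeType . ';base64,' . base64_encode($binary);
}

// The 8-byte PNG signature stands in for real image bytes here.
$bytes = "\x89PNG\r\n\x1a\n";

$dataUrl = toDataUrl($bytes);

echo $dataUrl . PHP_EOL;
```

With a real file you would typically read the bytes with `file_get_contents("test.png")` and hand either the bytes or the resulting data URL to `$tesseract->recognize(...)`.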
To specify the language(s):
$result = $tesseract->recognize("test.png", langs: ["eng", "ara"]);
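Before passing language codes to recognize, it can help to compare them with what the binary reports. The following is only a sketch: `splitByAvailability` is a hypothetical helper, not part of the package, and it assumes `getSupportedLanguages()` returns a flat array of code strings such as "eng", which this README does not state explicitly. A hard-coded list stands in for the real call.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the package): splits the requested
 * language codes into those the binary supports and those it does not.
 *
 * @param string[] $requested Codes you would like to use, e.g. ["eng", "ara"].
 * @param string[] $supported Codes reported by the binary (assumed flat list of strings).
 * @return array{available: string[], missing: string[]}
 */
function splitByAvailability(array $requested, array $supported): array
{
    return [
        // array_values() reindexes, since array_intersect/array_diff preserve keys.
        'available' => array_values(array_intersect($requested, $supported)),
        'missing'   => array_values(array_diff($requested, $supported)),
    ];
}

// Stand-in for $tesseract->getSupportedLanguages(); replace with the real call.
$supported = ['eng', 'osd'];

$check = splitByAvailability(['eng', 'ara'], $supported);

if ($check['missing'] !== []) {
    echo 'Missing language data: ' . implode(', ', $check['missing']) . PHP_EOL;
}
```

If `$check['available']` is non-empty, it can then be passed as the `langs:` argument.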
To specify the page segmentation mode (PSM):
use ahmedghanem00\TesseractOCR\Enum\PSM;
// Using the PSM enum
$result = $tesseract->recognize("test.png", psm: PSM::SINGLE_BLOCK);
// OR by passing the mode id directly
$result = $tesseract->recognize("test.png", psm: 3);
To specify the OCR engine mode (OEM):
use ahmedghanem00\TesseractOCR\Enum\OEM;
// Using the OEM enum
$result = $tesseract->recognize("test.png", oem: OEM::LEGACY_WITH_LSTM);
// OR by passing the mode id directly
$result = $tesseract->recognize("test.png", oem: 3);
To specify the DPI of the input image:
$result = $tesseract->recognize("test.png", dpi: 200);
To make the recognize method output the result as a searchable PDF instead of raw text:
$pdfBinaryData = $tesseract->recognize("test.png", outputAsPDF: true);
file_put_contents("result.pdf", $pdfBinaryData);
To specify a words file or a patterns file:
$result = $tesseract->recognize("test.png", wordsFilePath: "/path/to/file");
// OR
$result = $tesseract->recognize("test.png", patternsFilePath: "/path/to/file");
To set config parameters:
use ahmedghanem00\TesseractOCR\ConfigBag;
$config = ConfigBag::new()
    ->setParameter("tessedit_char_whitelist", "abcrety")
    ->setParameter("textord_pitch_range", 3);
$result = $tesseract->recognize("test.png", config: $config);
You can also run tesseract --print-parameters to see the list of available config parameters.
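Whitelist values such as the one used for tessedit_char_whitelist can get long when typed by hand. As a rough sketch, the hypothetical helper `buildCharWhitelist` below (not part of the package) assembles the string from inclusive character ranges using PHP's built-in `range()`.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the package): builds a whitelist string
 * from inclusive single-character ranges plus any extra characters.
 *
 * @param array<int, array{0: string, 1: string}> $ranges e.g. [['0', '9'], ['A', 'F']]
 */
function buildCharWhitelist(array $ranges, string $extra = ''): string
{
    $chars = '';
    foreach ($ranges as [$from, $to]) {
        // range() expands an inclusive range; digit ranges come back as
        // integers, which implode() turns back into text.
        $chars .= implode('', range($from, $to));
    }

    return $chars . $extra;
}

// Hexadecimal digits plus a dash.
$whitelist = buildCharWhitelist([['0', '9'], ['A', 'F']], '-');

echo $whitelist . PHP_EOL;
```

The resulting string can be used as the value in `ConfigBag::new()->setParameter("tessedit_char_whitelist", $whitelist)`.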
Licence
This package is licensed under the MIT License. For more information, see the License file.
Statistics
- Total downloads: 58
- Monthly downloads: 0
- Daily downloads: 0
- Favorites: 0
- Clicks: 1
- Dependents: 0
- Suggesters: 0
Other information
- License: MIT
- Last updated: 2023-06-01