tranvantri/pdf-to-html
最新稳定版本:1.0.2
Composer 安装命令:
composer require tranvantri/pdf-to-html
包简介
This PHP class can convert your pdf files to html using poppler-utils.
README 文档
README
This PHP class can convert your pdf files to html using poppler-utils.
Thanks
Big thanks Mochamad Gufron (mgufrone)! I did a packet based on its package (https://github.com/mgufrone/pdf-to-html).
Important Notes
Please see how to use below.
Installation
composer require tranvantri/pdf-to-html
Requirements
1. Install Poppler-Utils
Debian/Ubuntu
sudo apt-get install poppler-utils
Mac OS X
brew install poppler
Windows
For those who need this package in windows, there is a way. First download poppler-utils for windows here http://blog.alivate.com.au/poppler-windows/. And download the latest binary.
After download it, extract it.
2. We need to know where is utilities
Debian/Ubuntu
$ whereis pdftohtml pdftohtml: /usr/bin/pdftohtml $ whereis pdfinfo pdfinfo: /usr/bin/pdfinfo
Mac OS X
$ which pdfinfo /usr/local/bin/pdfinfo $ which pdftohtml /usr/local/bin/pdfinfo
Windows
Go in extracted directory. There will be a directory called bin. We will need this one.
3. PHP Configuration with shell access enabled
Usage
Example:
<?php // if you are using composer, just use this include 'vendor/autoload.php'; // initiate $pdf = new \TvT\PdfToHtml\Pdf('test.pdf', [ 'pdftohtml_path' => '/usr/bin/pdftohtml', 'pdfinfo_path' => '/usr/bin/pdfinfo' ]); // example for windows // $pdf = new \TvT\PdfToHtml\Pdf('test.pdf', [ // 'pdftohtml_path' => '/path/to/poppler/bin/pdftohtml.exe', // 'pdfinfo_path' => '/path/to/poppler/bin/pdfinfo.exe' // ]); // get pdf info $pdfInfo = $pdf->getInfo(); // get count pages $countPages = $pdf->countPages(); // get content from one page $contentFirstPage = $pdf->getHtml()->getPage(1); // get content from all pages and loop for they foreach ($pdf->getHtml()->getAllPages() as $page) { echo $page . '<br/>'; }
Full list settings:
<?php $fullSettings = [ 'pdftohtml_path' => '/usr/bin/pdftohtml', // path to pdftohtml 'pdfinfo_path' => '/usr/bin/pdfinfo', // path to pdfinfo 'generate' => [ // settings for generating html 'singlePage' => false, // we want separate pages 'imageJpeg' => false, // we want png image 'ignoreImages' => false, // we need images 'zoom' => 1.5, // scale pdf 'noFrames' => false, // we want separate pages ], 'clearAfter' => true, // auto clear output dir (if removeOutputDir==false then output dir will remain) 'removeOutputDir' => true, // remove output dir 'outputDir' => '/tmp/'.uniqid(), // output dir 'html' => [ // settings for processing html 'inlineCss' => true, // replaces css classes to inline css rules 'inlineImages' => true, // looks for images in html and replaces the src attribute to base64 hash 'onlyContent' => true, // takes from html body content only ] ]
统计信息
- 总下载量: 342
- 月度下载量: 0
- 日度下载量: 0
- 收藏数: 0
- 点击次数: 1
- 依赖项目数: 0
- 推荐数: 0
其他信息
- 授权协议: MIT
- 更新时间: 2023-12-13