skuola/pdf-text-parser

Latest stable version: v0.4.2

Composer install command:

composer require skuola/pdf-text-parser

Package description

Library to parse XML resulting from pdftotext

README


This library parses the XML files produced by pdftotext.

You can install it with composer require skuola/pdf-text-parser.

Suppose you have just converted a PDF file and obtained output like the following:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<doc>
  <page width="595.200000" height="841.800000">
    <word xMin="56.640000" yMin="59.770680" xMax="118.022880" yMax="72.406680">Lorem</word>
    <word xMin="121.209960" yMin="59.770680" xMax="176.485440" yMax="72.406680">ipsum</word>
  </page>
</doc>
</body>
</html>

The output above is the result of a command such as pdftotext -htmlmeta -bbox-layout yourfile.pdf - (passing - as the output file makes pdftotext write to standard output).
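To make the input format concrete: each <word> element carries the word's text plus a bounding box (xMin, yMin, xMax, yMax, in PDF points). The minimal sketch below, which is not part of this library, reads those boxes with PHP's built-in DOM extension. The function name readWords is hypothetical, and it assumes the pdftotext output is already held in a string.

```php
<?php

/**
 * Hypothetical helper (not part of skuola/pdf-text-parser): returns one
 * array per <word> element with its text and bounding-box coordinates.
 */
function readWords(string $xml): array
{
    $doc = new DOMDocument();
    // loadXML() does not fetch the external XHTML DTD unless LIBXML_DTDLOAD is set.
    if (!$doc->loadXML($xml)) {
        throw new InvalidArgumentException('Input is not well-formed XML');
    }

    $words = [];
    // Matches <word> elements by local name, even inside the default XHTML namespace.
    foreach ($doc->getElementsByTagName('word') as $node) {
        $words[] = [
            'text' => $node->textContent,
            'xMin' => (float) $node->getAttribute('xMin'),
            'yMin' => (float) $node->getAttribute('yMin'),
            'xMax' => (float) $node->getAttribute('xMax'),
            'yMax' => (float) $node->getAttribute('yMax'),
        ];
    }
    return $words;
}
```

For the sample above this yields two entries, Lorem and ipsum, which share the same yMin; that shared top edge is what makes it possible to group words into lines.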

You can use this library as follows:

<?php

require_once 'vendor/autoload.php';

$data = '...';  // the text above

$converter = new \Skuola\PdfTextParser\Converter($data);
// get as plain text...
$txt = $converter->getAsText();
// ...or get as HTML
$html = $converter->getAsHtml();

Alternatively, you can save the output to a file and pass its path to the library:

<?php

require_once 'vendor/autoload.php';

$path = '...';  // path to a file containing the same XML as in the previous example

$converter = new \Skuola\PdfTextParser\Converter(null, $path);
$html = $converter->getAsHtml();

The generated HTML contains one <h2> or <p> element per document line, depending on whether the line is a title.
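The README does not say how the library decides which lines are titles. Purely as an illustration of the idea, the sketch below groups word boxes into lines by their top coordinate (yMin) and wraps each line in <h2> or <p>. The title rule (tallest word taller than 14 points), the 0.5-point line tolerance and the function name wordsToHtml are all assumptions, not the library's actual behavior. Input words are arrays with text, xMin, yMin and yMax keys, such as values read from the <word> elements.

```php
<?php

/**
 * Hypothetical illustration, not the library's algorithm: groups words whose
 * yMin values are within $tolerance of each other into lines, then wraps each
 * line in <h2> if its tallest word exceeds $titleHeight points, else in <p>.
 *
 * @param array<int, array{text: string, xMin: float, yMin: float, yMax: float}> $words
 */
function wordsToHtml(array $words, float $tolerance = 0.5, float $titleHeight = 14.0): string
{
    // Sort words top-to-bottom by their top edge.
    usort($words, fn (array $a, array $b) => $a['yMin'] <=> $b['yMin']);

    $lines = [];
    foreach ($words as $word) {
        $last = count($lines) - 1;
        // Start a new line unless this word's top edge is close to the current line's first word.
        if ($last < 0 || abs($word['yMin'] - $lines[$last][0]['yMin']) > $tolerance) {
            $lines[] = [$word];
        } else {
            $lines[$last][] = $word;
        }
    }

    $html = '';
    foreach ($lines as $line) {
        // Order the words in each line left-to-right.
        usort($line, fn (array $a, array $b) => $a['xMin'] <=> $b['xMin']);
        $text = implode(' ', array_map(fn (array $w) => $w['text'], $line));
        // Assumed heuristic: a line counts as a title if any word is taller than the threshold.
        $height = max(array_map(fn (array $w) => $w['yMax'] - $w['yMin'], $line));
        $tag = $height > $titleHeight ? 'h2' : 'p';
        $html .= sprintf("<%s>%s</%s>\n", $tag, htmlspecialchars($text), $tag);
    }
    return $html;
}
```

Fed the two sample words (each 12.636 points tall), it returns <p>Lorem ipsum</p>. Sorting by xMin within each line keeps left-to-right order even if the words arrive out of order; a real title detector would probably need more layout cues than word height alone.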

More information to come...

Statistics

  • Total downloads: 11.74k
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 3
  • Clicks: 2
  • Dependents: 0
  • Recommendations: 0

GitHub information

  • Stars: 2
  • Watchers: 2
  • Forks: 2
  • Language: HTML

Other information

  • License: MIT
  • Last updated: 2018-05-24