dimabdc/php-fast-simple-html-dom-parser 问题修复 & 功能扩展

解决BUG、新增功能、兼容多环境部署,快速响应你的开发需求

邮箱:yvsm@zunyunkeji.com | QQ:316430983 | 微信:yvsm316

dimabdc/php-fast-simple-html-dom-parser

最新稳定版本:1.4

Composer 安装命令:

composer require dimabdc/php-fast-simple-html-dom-parser

包简介

PHP Fast Simple HTML DOM parser.

README 文档

README

Build Status Total Downloads Latest Stable Version License

PHP Fast Simple HTML DOM Parser - fast and low mamory usage HTML DOM Parser with syntax like PHP Simple HTML DOM Parser

Установка

Для установки выполните команду:

composer require dimabdc/php-fast-simple-html-dom-parser

Быстрый старт

require_once "vendor/autoload.php";
use FastSimpleHTMLDom\Document;

// Create DOM from URL
$html = Document::file_get_html('https://habrahabr.ru/interesting/');

// Find all post blocks
$post = [];
foreach($html->find('div.post') as $post) {
    $item['title']   = $post->find('h1.title', 0)->plaintext;
    $item['hubs']    = $post->find('div.hubs', 0)->plaintext;
    $item['content'] = $post->find('div.content', 0)->plaintext;
    $post[] = $item;
}

print_r($post);

Как создать HTML DOM объект

// Create a DOM object from a string
$html = new Document('<html><body>Hello!</body></html>');

// Create a DOM object from a string
$html = new Document();
$html->loadHtml('<html><body>Hello!</body></html>');

// Create a DOM object from a HTML file
$html = new Document();
$html->loadHtmlFile('test.htm');

// Create a DOM object from a URL
$html = new Document(file_get_contents('https://habrahabr.ru/interesting/'));

Как искать HTML DOM элементы?

Основа

// Find all anchors, returns a array of element objects
$ret = $html->find('a');

// Find (N)th anchor, returns element object or null if not found (zero based)
$ret = $html->find('a', 0);

// Find lastest anchor, returns element object or null if not found (zero based)
$ret = $html->find('a', -1); 

// Find all <div> with the id attribute
$ret = $html->find('div[id]');

// Find all <div> which attribute id=foo
$ret = $html->find('div[id=foo]'); 

Часто используемое

// Find all element which id=foo
$ret = $html->find('#foo');

// Find all element which class=foo
$ret = $html->find('.foo');

// Find all element has attribute id
$ret = $html->find('*[id]'); 

// Find all anchors and images 
$ret = $html->find('a, img'); 

// Find all anchors and images with the "title" attribute
$ret = $html->find('a[title], img[title]');

Слекторы потомков

// Find all <li> in <ul> 
$es = $html->find('ul li');

// Find Nested <div> tags
$es = $html->find('div div div'); 

// Find all <td> in <table> which class=hello 
$es = $html->find('table.hello td');

// Find all td tags with attribite align=center in table tags 
$es = $html->find('table td[align=center]');

Вложенные селекторы

// Find all <li> in <ul> 
foreach($html->find('ul') as $ul) 
{
       foreach($ul->find('li') as $li) 
       {
             // do something...
       }
}

// Find first <li> in first <ul> 
$e = $html->find('ul', 0)->find('li', 0);

Фильтр атрибутов

Filter Description
[attribute] Matches elements that have the specified attribute.
[!attribute] Matches elements that don't have the specified attribute.
[attribute=value] Matches elements that have the specified attribute with a certain value.
[attribute!=value] Matches elements that don't have the specified attribute with a certain value.
[attribute^=value] Matches elements that have the specified attribute and it starts with a certain value.
[attribute$=value] Matches elements that have the specified attribute and it ends with a certain value.
[attribute*=value] Matches elements that have the specified attribute and it contains a certain value.

Текст, комментарии

// Find all text blocks 
$es = $html->find('text');

// Find all comment (<!--...-->) blocks 
$es = $html->find('comment');

Доступ к атрибутам

Получение, установка и удаление атрибутов

// Get a attribute ( If the attribute is non-value attribute (eg. checked, selected...), it will returns true or false)
$value = $e->href;

// Set a attribute(If the attribute is non-value attribute (eg. checked, selected...), set it's value as true or false)
$e->href = 'my link';

// Remove a attribute, set it's value as null! 
$e->href = null;

// Determine whether a attribute exist? 
if(isset($e->href)) 
        echo 'href exist!';

"Магические" атрибуты

// Example
$html = str_get_html('<div>foo <b>bar</b></div>'); 
$e = $html->find('div', 0);

echo $e->tag; // Returns: "div"
echo $e->outertext; // Returns: "<div>foo <b>bar</b></div>"
echo $e->innertext; // Returns: "foo <b>bar</b>"
echo $e->plaintext; // Returns: "foo bar"
Attribute Name Usage
$e->tag Read or write the tag name of element.
$e->outertext Read or write the outer HTML text of element.
$e->innertext Read or write the inner HTML text of element.
$e->plaintext Read or write the plain text of element.

Трюки

// Extract contents from HTML 
echo $html->plaintext;

// Wrap a element
$e->outertext = '<div class="wrap">' . $e->outertext . '<div>';

// Remove a element, set it's outertext as an empty string 
$e->outertext = '';

// Append a element
$e->outertext = $e->outertext . '<div>foo<div>';

// Insert a element
$e->outertext = '<div>foo<div>' . $e->outertext;

Прогон по DOM-дереву

// If you are not so familiar with HTML DOM, check this link to learn more... 

// Example
echo $html->find('#div1', 0)->children(1)->children(1)->children(2)->id;
// or 
echo $html->getElementById('div1')->childNodes(1)->childNodes(1)->childNodes(2)->getAttribute('id');
Method Description
mixed $e->children([int $index]) Returns the Nth child object if index is set, otherwise return an array of children.
Element $e->parent() Returns the parent of element.
Element $e->first_child() Returns the first child of element, or null if not found.
Element $e->last_child() Returns the last child of element, or null if not found.
Element $e->next_sibling() Returns the next sibling of element, or null if not found.
Element $e->prev_sibling() Returns the previous sibling of element, or null if not found.

API-справочник

Методы и свойства DOM

Name Description
void __construct([string Element $html])
string plaintext Returns the contents extracted from HTML.
mixed find (string $selector [, int $index]) Find elements by the CSS selector. Returns the Nth element object if index is set, otherwise return an array of object.

Методы и свойства элементов

Name Description
string [attribute] Read or write element's attribure value.
string tag Read or write the tag name of element.
string outertext Read or write the outer HTML text of element.
string innertext Read or write the inner HTML text of element.
string plaintext Read or write the plain text of element.
mixed find (string $selector [, int $index]) Find children by the CSS selector. Returns the Nth element object if index is set, otherwise, return an array of object.

Прогон по дереву DOM

Name Description
mixed $e->children([int $index]) Returns the Nth child object if index is set, otherwise return an array of children.
element $e->parent() Returns the parent of element.
element $e->first_child() Returns the first child of element, or null if not found.
element $e->last_child() Returns the last child of element, or null if not found.
element $e->next_sibling() Returns the next sibling of element, or null if not found.
element $e->prev_sibling() Returns the previous sibling of element, or null if not found.

camelCase эквиваленты

string $e->getAttribute($name)
string $e->attribute

void $e->setAttribute($name, $value)
void $value = $e->attribute

bool $e->hasAttribute($name)
bool isset($e->attribute)

void $e->removeAttribute($name)
void $e->attribute = null

element $e->getElementById($id)
mixed $e->find("#$id", 0)

mixed $e->getElementsById($id [,$index])
mixed $e->find("#$id" [, int $index])

element $e->getElementByTagName($name)
mixed $e->find($name, 0)

mixed $e->getElementsByTagName($name [, $index])
mixed $e->find($name [, int $index])

element $e->parentNode()
element $e->parent()

mixed $e->childNodes([$index])
mixed $e->children([int $index])

element $e->firstChild()
element $e->first_child()

element $e->lastChild()
element $e->last_child()

element $e->nextSibling()
element $e->next_sibling()

element $e->previousSibling()
element $e->prev_sibling()

统计信息

  • 总下载量: 50.37k
  • 月度下载量: 0
  • 日度下载量: 0
  • 收藏数: 93
  • 点击次数: 1
  • 依赖项目数: 1
  • 推荐数: 0

GitHub 信息

  • Stars: 90
  • Watchers: 8
  • Forks: 29
  • 开发语言: PHP

其他信息

  • 授权协议: MIT
  • 更新时间: 2016-02-02