2upmedia/scout

Latest stable version: 0.2.1

Composer install command:

composer require 2upmedia/scout

Package description

Flexible, structured scraping

README

Scout is an easy-to-use, fast scraper that lets you use your knowledge of PHP to transform data the way you want, without having to learn another transformation language such as XSLT.

This is currently in stable beta, and I encourage you to submit tickets for bugs, feedback, and ideas.

Currently Supported

  • Document types: HTML and XML
  • Querying: XPath
  • PHP 5.4+, including PHP 7!

Planned for the future

  • Save to JSON, CSV, and XML files
  • Support for querying with CSS selectors
  • Support for querying JSON
  • Ability to persist information and track atomic changes

Possible Uses

  • Track search rankings
  • Spy on competitors' websites
  • Scrape coupon websites
  • Scrape websites for your own aggregation website
  • Migrate data from large static websites to import into a CMS
  • Get a list of jobs you're interested in from a wide range of job boards online
  • Transform XML responses from your webservice into JSON
  • Anything else that involves transforming XML/HTML into the data structure you want
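
As a rough illustration of the XML-to-JSON use case in the list above, here is a minimal sketch that uses only PHP's bundled DOM extension and json_encode(), not Scout's own API. The <jobs> document and its field names are made up for the example.

```php
<?php
// Hypothetical web-service response; the element and attribute names are assumptions.
$xml = <<<'XML'
<jobs>
  <job id="1"><title>PHP developer</title><city>Austin</city></job>
  <job id="2"><title>Data engineer</title><city>Denver</city></job>
</jobs>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$jobs = [];
foreach ($xpath->query('/jobs/job') as $job) {
    $jobs[] = [
        'id'    => (int) $job->getAttribute('id'),
        // evaluate() with string() returns the text of the first match,
        // resolved relative to the current <job> node.
        'title' => $xpath->evaluate('string(./title)', $job),
        'city'  => $xpath->evaluate('string(./city)', $job),
    ];
}

$json = json_encode($jobs);
echo $json, "\n";
```

The shape of the resulting JSON is decided entirely by the PHP array built in the loop, which is the same idea Scout relies on: the transformation is written in PHP rather than in a separate language.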

Consulting

For consulting, contact jorge@2upmedia.com

Examples

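<?php

// Xpath, Html, and DataPoint are Scout classes; add the matching `use` imports for your installed version.

// Parse the HTML fixture and wrap it in an XPath query handler.
$document = Html::parseDocument(file_get_contents('./tests/fixtures/header-and-table.html'));
$queryHandler = new Xpath($document);

$titlesAndPrices = (new DataPoint())->setQueryHandler($queryHandler);

// Each tr matched by the collection selector is used as a context,
// so the key selectors should start with "." to be relative to it.
$data = $titlesAndPrices
    ->setCollection('//table/tr')
    ->forKey('title')->set('./td[1]')
    ->forKey('price')->set('./td[2]')
    ->getData();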
/*
    array (
      0 => 
      array (
        'title' => 'Title #1',
        'price' => '$10.00',
      ),
      1 => 
      array (
        'title' => 'Title #2',
        'price' => '$23.20',
      ),
      2 => 
      array (
        'title' => 'Title #3',
        'price' => '$1.00',
      ),
      3 => 
      array (
        'title' => 'Title #4',
        'price' => '$5.00',
      ),
    )
*/

For more information on how to use the API, please have a look at the integration test.
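
To see why the key selectors in the example start with ".", the following sketch reproduces the same per-row extraction with PHP's built-in DOMXPath. It illustrates the idea of a context node and is not Scout's internal implementation. The inline markup is a hypothetical stand-in for the fixture file.

```php
<?php
// Hypothetical stand-in for tests/fixtures/header-and-table.html.
$html = <<<'HTML'
<table>
  <tr><td>Title #1</td><td>$10.00</td></tr>
  <tr><td>Title #2</td><td>$23.20</td></tr>
</table>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Key name => selector, evaluated once per row.
$keys = ['title' => './td[1]', 'price' => './td[2]'];

$data = [];
foreach ($xpath->query('//table//tr') as $row) {
    $record = [];
    foreach ($keys as $key => $selector) {
        // Passing $row as the context node makes "." refer to this row.
        $cell = $xpath->query($selector, $row)->item(0);
        $record[$key] = $cell !== null ? trim($cell->textContent) : null;
    }
    $data[] = $record;
}

// Without the leading ".", "//td[1]" starts from the document root even
// when a context node is given, so it matches the first cell of every row.
$firstRow = $xpath->query('//table//tr')->item(0);
$leaky = $xpath->query('//td[1]', $firstRow);

print_r($data);
echo $leaky->length, "\n"; // 2
```

A selector that drops the "." would therefore return the same cell for every row instead of one value per row.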

XPath Primer

Currently XPath is used as the query language. XPath is simple to use after a little bit of practice.

The core of XPath is the "path". If you understand file paths and URLs, you understand half of XPath already.

Read up on the syntax: http://www.w3schools.com/xpath/xpath_syntax.asp. Then have a look at the XPath Primer example.
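
The file-path analogy can be tried out quickly with PHP's built-in DOMXPath. This is a small sketch under assumed markup (the ids and links are invented for illustration) showing four common building blocks: absolute paths, "//", predicates with attributes, and "..".

```php
<?php
// Invented markup, used only to demonstrate the path syntax.
$html = <<<'HTML'
<html><body>
  <div id="nav"><a href="/home">Home</a><a href="/jobs">Jobs</a></div>
  <div id="main"><h1>Listings</h1><a href="/jobs/42">PHP developer</a></div>
</body></html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Absolute path: read it step by step from the root, like /var/www/index.html.
$heading = $xpath->query('/html/body/div/h1')->item(0)->textContent;

// "//" matches at any depth, similar to a recursive search.
$linkCount = $xpath->query('//a')->length;

// A predicate in [] filters a step; "@" addresses an attribute.
$mainHref = $xpath->query('//div[@id="main"]/a/@href')->item(0)->nodeValue;

// ".." steps up to the parent, just as in a file path.
$parentId = $xpath->query('//h1/..')->item(0)->getAttribute('id');

echo $heading, "\n";   // Listings
echo $linkCount, "\n"; // 3
echo $mainHref, "\n";  // /jobs/42
echo $parentId, "\n";  // main
```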

Statistics

  • Total downloads: 36
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 3
  • Clicks: 0
  • Dependents: 0
  • Recommendations: 0

GitHub information

  • Stars: 3
  • Watchers: 3
  • Forks: 0
  • Language: PHP

Other information

  • License: BSD-2-Clause
  • Last updated: 2015-07-07