johnroyer/crawler-php
Latest stable version: 0.3.6
Composer install command:
composer require johnroyer/crawler-php
Package description
A crawler implemented in PHP
README
A simple web crawler.
Note: this is a side project. Do NOT use it in production.
Usage
Create a handler by extending AbstractHandler, and set the domain that the handler should handle:
class MyHandler extends \Zeroplex\Crawler\Handler\AbstractHandler
{
    // Domain this handler is responsible for.
    public function getDomain(): string
    {
        return 'test.com';
    }

    public function shouldFetch(\Psr\Http\Message\RequestInterface $request): bool
    {
        // Ignore CSS, JS and common image files. The leading "\." keeps
        // URLs such as ".../nodejs" from being skipped by mistake.
        if (1 === preg_match('/\.(css|js|jpg|png|gif)$/', (string) $request->getUri())) {
            return false;
        }

        return true;
    }

    public function handle(\Psr\Http\Message\ResponseInterface $response): void
    {
        // Get the content using $response->getBody()->getContents().
    }
}
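The extension check in shouldFetch() can also be run against the parsed URL path instead of the whole URI string, so that a query string does not hide the extension. The following is a minimal sketch under that assumption; isStaticAsset() is a hypothetical helper and not part of crawler-php.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of crawler-php): report whether a URL
 * points at a static asset, judging only by the extension of its path.
 */
function isStaticAsset(string $url): bool
{
    // parse_url() yields null when the URL has no path and false when
    // the URL cannot be parsed; treat both as "not an asset".
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return false;
    }

    // pathinfo() returns the text after the last dot of the last path
    // segment, or an empty string when there is no extension.
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    return in_array($extension, ['css', 'js', 'jpg', 'png', 'gif'], true);
}

var_dump(isStaticAsset('https://test.com/app.js?v=3'));  // bool(true)
var_dump(isStaticAsset('https://test.com/docs/nodejs')); // bool(false)
var_dump(isStaticAsset('https://test.com/'));            // bool(false)
```

Inside a handler, shouldFetch() could return the negation of such a check on (string) $request->getUri().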
Then set up the crawler and run it:
$crawler = new \Zeroplex\Crawler\Crawler();
$crawler->setDelay(0)
    ->setTimeout(3)
    ->setFollowRedirect(true)
    ->setUserAgent('Mozilla/5.0 (platform; rv:geckoversion) Gecko/geckotrail Firefox/100.1');

// Register the handler defined above.
$crawler->addHandler(new MyHandler());

// URL to start from
$crawler->run('https://test.com');
Extending
For example, you can implement the URL queue with Predis.
Install Predis with Composer:
composer require predis/predis
Implement UrlQueueInterface:
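class RedisQueue implements \Zeroplex\Crawler\UrlQueue\UrlQueueInterface
{
    // Redis list key; the name is an illustrative choice.
    private const KEY = 'crawler:urls';

    private $redis;

    public function __construct(string $host, int $port)
    {
        $this->redis = new \Predis\Client(['host' => $host, 'port' => $port]);
    }

    public function push(string $url): void
    {
        $this->redis->lpush(self::KEY, [$url]);
    }

    public function pop(): string
    {
        // LPUSH and LPOP work on the same end of the list, so the newest
        // URL comes back first. lpop() returns null on an empty list;
        // the cast turns that into an empty string.
        return (string) $this->redis->lpop(self::KEY);
    }

    // and so on
}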
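To try out the push/pop behaviour without a Redis server, the same idea can be sketched in memory. The class below is hypothetical and for illustration only: it mirrors the push() and pop() pair shown above but does not implement UrlQueueInterface, whose remaining methods are not listed here. It also assumes that skipping already-queued URLs is desirable, and it returns null from pop() when empty, which may differ from the real interface.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical in-memory FIFO URL queue (not part of crawler-php).
 */
final class InMemoryUrlQueue
{
    private SplQueue $queue;

    /** @var array<string, true> URLs that have already been queued */
    private array $seen = [];

    public function __construct()
    {
        $this->queue = new SplQueue();
    }

    public function push(string $url): void
    {
        // Skip URLs queued before so the same page is not fetched twice.
        if (isset($this->seen[$url])) {
            return;
        }
        $this->seen[$url] = true;
        $this->queue->enqueue($url);
    }

    public function pop(): ?string
    {
        // SplQueue::dequeue() throws on an empty queue, so check first.
        return $this->queue->isEmpty() ? null : $this->queue->dequeue();
    }
}

$queue = new InMemoryUrlQueue();
$queue->push('https://test.com/a');
$queue->push('https://test.com/b');
$queue->push('https://test.com/a'); // duplicate, ignored

echo $queue->pop(), PHP_EOL; // https://test.com/a
echo $queue->pop(), PHP_EOL; // https://test.com/b
```

A FIFO queue like this visits pages roughly breadth-first, while popping from the same end that was pushed to gives a depth-first order.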
Statistics
- Total downloads: 65
- Monthly downloads: 0
- Daily downloads: 0
- Favorites: 3
- Clicks: 0
- Dependent projects: 0
- Suggesters: 0
Other information
- License: MIT
- Last updated: 2023-01-12