
vipnytt/robotstxtparser

Latest stable version: v2.1.0

Composer install command:

composer require vipnytt/robotstxtparser

Package description

Robots.txt parsing library, with full support for every directive and specification.

README

Robots.txt parser

An easy-to-use, extensible robots.txt parser library with full support for every directive and specification in use on the Internet.

Use cases:

  • Permission checks
  • Fetch crawler rules
  • Sitemap discovery
  • Host preference
  • Dynamic URL parameter discovery
  • robots.txt rendering

Advantages

(compared to most other robots.txt libraries)

  • Automatic robots.txt download (optional).
  • Integrated caching system (optional).
  • Crawl-delay handler.
  • Documentation available.
  • Support for every single directive, from every specification.
  • HTTP status code handler, according to Google's spec (see the sketch after this list).
  • Dedicated user-agent parser and group determiner library, for maximum accuracy.
  • Provides additional data such as the preferred host, dynamic URL parameters and sitemap locations.
  • Protocols supported: HTTP, HTTPS, FTP, SFTP and FTP/S.
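
To illustrate the status code handling, here is a minimal sketch. Under Google's spec, a 5xx response means robots.txt is temporarily unavailable and crawling should be treated as fully disallowed. The 503 status and the empty rule set below are assumptions made for this example; the TxtClient constructor is the one shown in the method excerpt further down.

<?php
require __DIR__ . '/vendor/autoload.php';

// Hypothetical scenario: the server answered 503 when robots.txt was requested.
$client = new vipnytt\RobotsTxtParser\TxtClient('http://example.com', 503, '');

// Per Google's spec, 5xx should be treated as a full disallow,
// so this is expected to print bool(false).
var_dump($client->userAgent('MyBot')->isAllowed('http://example.com/somepage.html'));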

Requirements:

Installation

The recommended way to install the robots.txt parser is through Composer. Add this to your composer.json file:

{
  "require": {
    "vipnytt/robotstxtparser": "^2.1"
  }
}

Then run: composer update

Getting started

Basic usage example

<?php
require __DIR__ . '/vendor/autoload.php'; // Composer autoloader

$client = new vipnytt\RobotsTxtParser\UriClient('http://example.com');

if ($client->userAgent('MyBot')->isAllowed('http://example.com/somepage.html')) {
    // Access is granted
}
if ($client->userAgent('MyBot')->isDisallowed('http://example.com/admin')) {
    // Access is denied
}
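
If the robots.txt content has already been fetched, or you want to test rules offline, TxtClient can parse a raw string directly. A minimal sketch; the rule set below is made up purely for illustration:

<?php
require __DIR__ . '/vendor/autoload.php';

// Hand-written rules, for illustration only.
$robotsTxtContent = <<<TXT
User-agent: MyBot
Disallow: /admin
Crawl-delay: 5
TXT;

$client = new vipnytt\RobotsTxtParser\TxtClient('http://example.com', 200, $robotsTxtContent);

var_dump($client->userAgent('MyBot')->isAllowed('http://example.com/somepage.html')); // bool(true)
var_dump($client->userAgent('MyBot')->isDisallowed('http://example.com/admin')); // bool(true)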

A small excerpt of basic methods

<?php
// Syntax: $baseUri, [$statusCode:int|null], [$robotsTxtContent:string], [$encoding:string], [$byteLimit:int|null]
$client = new vipnytt\RobotsTxtParser\TxtClient('http://example.com', 200, $robotsTxtContent);

// Permission checks
$allowed = $client->userAgent('MyBot')->isAllowed('http://example.com/somepage.html'); // bool
$denied = $client->userAgent('MyBot')->isDisallowed('http://example.com/admin'); // bool

// Crawl delay rules
$crawlDelay = $client->userAgent('MyBot')->crawlDelay()->getValue(); // float | int

// Dynamic URL parameters
$cleanParam = $client->cleanParam()->export(); // array

// Preferred host
$host = $client->host()->export(); // string | null
$host = $client->host()->getWithUriFallback(); // string
$isPreferred = $client->host()->isPreferred(); // bool

// XML Sitemap locations
$sitemaps = $client->sitemap()->export(); // array

The above is just a taste of the basics; a whole range of more advanced and specialized methods is available for almost any purpose. Visit the cheat sheet for the technical details.

Visit the Documentation for more information.
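
As a closing example, the permission checks and the crawl-delay handler combine naturally into a polite fetch loop. This is a hedged sketch, not library code: the URL list is a placeholder, fetch() is a hypothetical helper, and a crawl delay of 0 is assumed when the directive is absent.

<?php
require __DIR__ . '/vendor/autoload.php';

$client = new vipnytt\RobotsTxtParser\UriClient('http://example.com');
$userAgent = $client->userAgent('MyBot');
$delay = $userAgent->crawlDelay()->getValue(); // seconds between requests

// Placeholder URL list, purely for illustration.
$urls = [
    'http://example.com/somepage.html',
    'http://example.com/admin',
];

foreach ($urls as $url) {
    if (!$userAgent->isAllowed($url)) {
        continue; // skip anything the rules disallow
    }
    // fetch($url) would go here.
    usleep((int) ($delay * 1000000)); // honor the crawl delay
}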

Directives

Specifications

Statistics

  • Total downloads: 726.83k
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 27
  • Views: 2
  • Dependents: 7
  • Suggesters: 1

GitHub info

  • Stars: 26
  • Watchers: 2
  • Forks: 6
  • Language: PHP

Other information

  • License: MIT
  • Last updated: 2016-04-08