
lobotomised/laravel-autocrawler

Latest stable version: 1.3.0

Composer install command:

composer require lobotomised/laravel-autocrawler

Package description

A tool to crawl your own Laravel installation, checking your HTTP status codes

README

Using this package, you can check whether your application has broken links.

php artisan crawl
200 OK - http://myapp.test/ 
200 OK - http://myapp.test/login found on http://myapp.test/
200 OK - http://myapp.test/register found on http://myapp.test/
301 Moved Permanently - http://myapp.test/homepage found on http://myapp.test/register
404 Not Found - http://myapp.test/brokenlink found on http://myapp.test/register
200 OK - http://myapp.test/features found on http://myapp.test/


Crawl finished

Results:
Status 200: 4 found
Status 301: 1 found
Status 404: 1 found

Installation

This package can be installed via Composer:

composer require --dev lobotomised/laravel-autocrawler

When crawling your site, the crawler automatically detects the URL your application is using. If it scans http://localhost instead, check that the APP_URL variable is properly configured in your .env file:

APP_URL="http://myapp.test"
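
Note that if you have cached your configuration, the cached value takes precedence over the .env file; clearing the config cache (standard Laravel behaviour, not specific to this package) makes sure the crawler picks up the new APP_URL:

php artisan config:clear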

Usage

Crawl a specific URL

By default, the crawler crawls the URL of your current Laravel installation. You can force a different URL with the --url option:

php artisan crawl --url=http://myapp.test/my-page

Concurrent connections

The crawler runs with 10 concurrent connections to speed up the crawling process. You can change this by passing the --concurrency option:

php artisan crawl --concurrency=5

Timeout

The request timeout is 30 seconds by default. Use the --timeout option to change this value:

php artisan crawl --timeout=10

Ignore robots.txt

By default, the crawler respects robots.txt. These rules can be ignored with the --ignore-robots option:

php artisan crawl --ignore-robots

External links

When the crawler finds an external link, it will check that link as well. This can be disabled with the --ignore-external-links option:

php artisan crawl --ignore-external-links

Log non-2xx and non-3xx status codes

By default, the crawler only reports in your console. You can log all non-2xx and non-3xx status codes to a file with the --output option; results will be stored in storage/autocrawler/output.txt:

php artisan crawl --output

The output.txt file will look like this:

403 Forbidden - http://myapp.test/dashboard found on http://myapp.test/home
404 Not Found - http://myapp.test/brokenlink found on http://myapp.test/register
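
For example, you can run the crawl and then inspect the logged failures directly, using the default path mentioned above:

php artisan crawl --output
cat storage/autocrawler/output.txt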

Fail when non-2xx or non-3xx responses are found

By default, the command's exit code is 0. With the --fail-on-error option, the exit code becomes 1 when any non-2xx or non-3xx response is found, indicating that the command has failed:

php artisan crawl --fail-on-error
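
This makes the command usable as a gate in shell scripts or CI; a minimal sketch, assuming a POSIX-compatible shell:

php artisan crawl --fail-on-error
echo $?   # prints 1 when a non-2xx/3xx response was found, 0 otherwise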

Launch the crawler interactively

Alternatively, you may configure the crawler interactively by using the --interactive option:

php artisan crawl --interactive
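
The options above can also be combined in a single invocation, for example:

php artisan crawl --url=http://myapp.test/ --concurrency=5 --timeout=10 --ignore-external-links --output --fail-on-error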

Working with GitHub Actions

To execute the crawler, you first need to start a web server. You could install Apache or nginx; the example below uses the PHP built-in web server.

If the crawl finds any non-2xx or non-3xx responses, the action will fail, and the results will be stored as an artifact of the Action.

steps:
  - uses: actions/checkout@v3
  - name: Prepare The Environment
    run: cp .env.example .env
  - name: Install Composer Dependencies
    run: composer install
  - name: Generate Application Key
    run: php artisan key:generate
  - name: Install npm Dependencies
    run: npm ci
  - name: Compile assets
    run: npm run build

  - name: Start php build-in webserver
    run: (php artisan serve &) || /bin/true

  - name: Crawl website
    run: php artisan crawl --url=http://localhost:8000/ --fail-on-error --output
  
  - name: Upload artifacts
    if: failure()
    uses: actions/upload-artifact@master
    with:
      name: Autocrawler
      path: ./storage/autocrawler

Documentation

All commands and information are available with the command:

php artisan crawl --help

Alternatives

This package is heavily inspired by spatie/http-status-check, but instead of a global installation, it is installed as a dependency of your project.

Testing

First, start the included Node HTTP server in a separate terminal:

make start

Then to run the tests:

make test

Changelog

Please see CHANGELOG for more information on what has changed recently.

License

The MIT License (MIT). Please see License File for more information.

Statistics

  • Total downloads: 39.89k
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 2
  • Views: 1
  • Dependent projects: 0
  • Suggesters: 0

GitHub information

  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Language: PHP

Other information

  • License: MIT
  • Last updated: 2022-08-16