monteduro/laravel-datafast-aicrawl

Composer install command:

composer require monteduro/laravel-datafast-aicrawl

Package description

Server-side AI crawler & bot traffic tracking for Laravel, reporting to DataFast. Unofficial PHP port of @datafast/ai-crawl.

README

Laravel DataFast AI Crawl — bot traffic tracking

Server-side AI crawler & bot traffic tracking for Laravel, reporting to DataFast.

See when AI assistants (ChatGPT, Claude, Perplexity…), search crawlers (Googlebot, Bingbot…) and model-training bots (GPTBot, ClaudeBot, CCBot…) request pages on your site. These bots fetch raw HTML and skip frontend JavaScript, so DataFast's browser script never sees them — this package detects them in your backend.

Unofficial PHP port of @datafast/ai-crawl. Not affiliated with DataFast. The ingestion endpoint it posts to is undocumented and may change.

Install

composer require monteduro/laravel-datafast-aicrawl

The service provider is auto-discovered and the middleware is auto-registered on the web group. Two things to set in .env:

# Required — your DataFast website id
DATAFAST_WEBSITE_ID=dfid_xxxxxxxx

# Strongly recommended — run the tracking HTTP call on a real queue
# so it never adds latency to your pages (defaults to the sync driver).
DATAFAST_AICRAWL_QUEUE=default
DATAFAST_AICRAWL_CONNECTION=redis

That's the whole setup. Requires PHP 8.2+ and Laravel 11/12.

About the queue

Every detected crawler request triggers one outbound HTTP POST to DataFast. That call is dispatched as a best-effort queued job (no retries, errors swallowed) and the middleware runs after the response is sent (terminate()), so even on the sync driver the visitor is never blocked. Point DATAFAST_AICRAWL_CONNECTION/QUEUE at a real worker for best throughput.
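
The "best-effort" behaviour described above (one attempt, no retries, errors swallowed) can be illustrated in plain PHP. This is a minimal sketch, not the package's actual job class: `sendBestEffort` is a hypothetical helper, and the callable stands in for the outbound HTTP POST.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: run $send exactly once and never let a failure escape.
 * Returns true when the call completed and false when it threw.
 */
function sendBestEffort(callable $send): bool
{
    try {
        $send();

        return true;
    } catch (\Throwable $e) {
        // Swallow the error: tracking must never break or slow down a page.
        return false;
    }
}

// A failing "request" is reported as false instead of raising an exception.
$delivered = sendBestEffort(fn () => throw new \RuntimeException('DataFast unreachable'));
```

The trade-off is that a dropped event is lost for good, which suits analytics data where an occasional gap matters less than page latency.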

How it works

For each GET/HEAD request the middleware classifies the User-Agent against a built-in catalog of ~80 crawlers, skips non-page traffic (static extensions, /api, /_next, Sec-Fetch-Dest sub-resources, …), and if it's a real crawler page hit, queues the event to DataFast.
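
The two checks above can be sketched in plain PHP. This is an illustration only: the three-entry catalog, the ignore lists, and the function names `classifyUserAgent` and `isPageRequest` are assumptions, and the real structure of resources/crawlers.php and the exact matching rules may differ.

```php
<?php

declare(strict_types=1);

// Hypothetical excerpt; the package ships roughly 80 entries in its own format.
const CRAWLER_CATALOG = [
    'GPTBot'    => ['provider' => 'OpenAI',    'category' => 'training'],
    'ClaudeBot' => ['provider' => 'Anthropic', 'category' => 'training'],
    'Googlebot' => ['provider' => 'Google',    'category' => 'search_index'],
];

// Assumed ignore rules, for illustration only.
const IGNORED_PREFIXES = ['/api', '/_next'];
const IGNORED_EXTENSIONS = ['css', 'js', 'png', 'jpg', 'svg', 'ico', 'woff2'];

/** Return catalog metadata for the first token found in the User-Agent, or null. */
function classifyUserAgent(string $userAgent): ?array
{
    foreach (CRAWLER_CATALOG as $token => $meta) {
        if (stripos($userAgent, $token) !== false) {
            return ['agent' => $token] + $meta;
        }
    }

    return null;
}

/** Decide whether a request looks like a page hit rather than an asset or API call. */
function isPageRequest(string $method, string $path, ?string $secFetchDest = null): bool
{
    if (!in_array(strtoupper($method), ['GET', 'HEAD'], true)) {
        return false;
    }

    foreach (IGNORED_PREFIXES as $prefix) {
        if ($path === $prefix || str_starts_with($path, $prefix . '/')) {
            return false;
        }
    }

    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    if ($extension !== '' && in_array($extension, IGNORED_EXTENSIONS, true)) {
        return false;
    }

    // Assumption: any Sec-Fetch-Dest value other than "document" marks a sub-resource.
    if ($secFetchDest !== null && $secFetchDest !== 'document') {
        return false;
    }

    return true;
}

$crawler = classifyUserAgent('Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)');
$shouldTrack = $crawler !== null && isPageRequest('GET', '/blog/hello-world');
```

Only requests that pass both checks would be queued; an ordinary browser User-Agent yields null and is ignored.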

Payload sent: websiteId, domain, href, referrer, and an ai object with provider, agent, category (answer_fetch / search_index / training / ai_crawler), userAgent, ip, statusCode.
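
To make the payload shape concrete, here is a sketch that assembles the fields listed above into an array ready for JSON encoding. The function name `buildCrawlPayload`, the nesting, and the handling of a missing referrer are assumptions; since the ingestion endpoint is undocumented, the real wire format may differ.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical builder for the event body. $crawler is expected to carry
 * 'provider', 'agent' and 'category' keys.
 */
function buildCrawlPayload(
    string $websiteId,
    string $href,
    ?string $referrer,
    array $crawler,
    string $userAgent,
    string $ip,
    int $statusCode,
    ?string $domainOverride = null
): array {
    $host = parse_url($href, PHP_URL_HOST);

    return [
        'websiteId' => $websiteId,
        // An explicit override mirrors DATAFAST_AICRAWL_DOMAIN; otherwise use the request host.
        'domain'    => $domainOverride ?? (is_string($host) ? $host : ''),
        'href'      => $href,
        'referrer'  => $referrer,
        'ai'        => [
            'provider'   => $crawler['provider'],
            'agent'      => $crawler['agent'],
            'category'   => $crawler['category'],
            'userAgent'  => $userAgent,
            'ip'         => $ip,
            'statusCode' => $statusCode,
        ],
    ];
}

$payload = buildCrawlPayload(
    'dfid_xxxxxxxx',
    'https://example.com/blog/hello-world',
    null,
    ['provider' => 'OpenAI', 'agent' => 'GPTBot', 'category' => 'training'],
    'Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)',
    '203.0.113.7',
    200
);

$json = json_encode($payload, JSON_UNESCAPED_SLASHES);
```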

Configuration

Optional — publish only to customize defaults:

php artisan vendor:publish --tag=datafast-aicrawl-config

Env var                            Default                          Purpose
DATAFAST_WEBSITE_ID                (none)                           Required. Your DataFast website id.
DATAFAST_AICRAWL_QUEUE             default                          Queue name for the tracking job.
DATAFAST_AICRAWL_CONNECTION        default                          Queue connection.
DATAFAST_AICRAWL_ENABLED           true                             Master on/off switch.
DATAFAST_AICRAWL_ENDPOINT          https://datafa.st/api/ai-crawls  Ingestion endpoint.
DATAFAST_AICRAWL_DOMAIN            request host                     Override reported domain.
DATAFAST_AICRAWL_TIMEOUT           5                                HTTP timeout (seconds).
DATAFAST_AICRAWL_AUTO_MIDDLEWARE   true                             Auto-append middleware to a group.
DATAFAST_AICRAWL_MIDDLEWARE_GROUP  web                              Group to append the middleware to.

Disable crawler categories or tweak ignored paths/extensions in config/datafast-aicrawl.php.

To register the middleware manually instead, set DATAFAST_AICRAWL_AUTO_MIDDLEWARE=false and attach \Monteduro\DataFastAiCrawl\Http\Middleware\TrackAiCrawler::class yourself.
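
One way to attach it manually, assuming a stock Laravel 11/12 application skeleton, is to append the class to the web group in bootstrap/app.php. The routing and exception sections below are the framework defaults and may differ in your project; only the `withMiddleware` callback is relevant here.

```php
<?php

use Illuminate\Foundation\Application;
use Illuminate\Foundation\Configuration\Exceptions;
use Illuminate\Foundation\Configuration\Middleware;

return Application::configure(basePath: dirname(__DIR__))
    ->withRouting(
        web: __DIR__.'/../routes/web.php',
        commands: __DIR__.'/../routes/console.php',
        health: '/up',
    )
    ->withMiddleware(function (Middleware $middleware) {
        // Append the tracker to the "web" group by hand.
        // Requires DATAFAST_AICRAWL_AUTO_MIDDLEWARE=false so it is not registered twice.
        $middleware->web(append: [
            \Monteduro\DataFastAiCrawl\Http\Middleware\TrackAiCrawler::class,
        ]);
    })
    ->withExceptions(function (Exceptions $exceptions) {
        //
    })->create();
```

Attaching the class to individual routes or to a different group works the same way through Laravel's standard middleware APIs.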

Keeping the crawler list fresh

The catalog data lives in resources/crawlers.php, generated from the upstream npm bundle (not fetched at runtime). You normally never touch it — a GitHub Action runs weekly, regenerates it from @datafast/ai-crawl, runs the tests, and opens a PR when something changed. Merging that PR auto-tags the next patch version, which Packagist publishes.

To run the sync manually:

composer sync-catalog:check   # show drift vs upstream (exits non-zero if outdated)
composer sync-catalog         # regenerate resources/crawlers.php

The weekly workflow needs Settings → Actions → General → "Allow GitHub Actions to create and approve pull requests" enabled on the repo.

Testing

composer install && composer test

Credits & License

Crawler catalog and detection logic ported from @datafast/ai-crawl (MIT). MIT licensed — see LICENSE.

Statistics

  • Total downloads: 0
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 0
  • Clicks: 3
  • Dependents: 0
  • Suggesters: 0

GitHub information

  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Language: PHP

Other information

  • License: MIT
  • Last updated: 2026-06-18