README

ElephScraper is a lightweight, PHP-native web scraping toolkit built on Guzzle and Symfony DomCrawler. It provides a clean interface for extracting HTML content, metadata, and structured data from web pages.

Fast. Clean. Eleph-style scraping. 🐘⚡

🚀 Features

✅ Extract metadata: title, description, keywords, author, charset, canonical, and more
✅ Supports Open Graph, Twitter Card, CSRF token, and http-equiv meta tags
✅ Extract headings, paragraphs, images, lists, and links
✅ Powerful filter() method with support for class/ID/tag-based selectors
✅ Return raw HTML or clean plain text
✅ Clean return types: string, array, or associative array
✅ Built with Guzzle + Symfony DomCrawler + CssSelector

📦 Installation

Install via Composer:

composer require riodevnet/elephscraper

Requires PHP 7.4 or newer. The filter() examples below use named arguments, which require PHP 8.0 or newer.

🛠️ Basic Usage

<?php

require_once __DIR__ . '/vendor/autoload.php';

use Riodevnet\Elephscraper\ElephScraper;

$scraper = new ElephScraper("https://example.com");

echo $scraper->title();       // text of the page's <title>
echo $scraper->description(); // content of <meta name="description">
print_r($scraper->h1());      // every <h1> text, e.g. ["Main Title", "News"]
print_r($scraper->openGraph());

🧪 Available Methods

🔹 Page Metadata

$scraper->title();
$scraper->description();
$scraper->keywords();
$scraper->keywordString();
$scraper->charset();
$scraper->canonical();
$scraper->contentType();
$scraper->author();
$scraper->csrfToken();
$scraper->image();
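To show which parts of a page these metadata fields correspond to, here is a standalone sketch that uses PHP's built-in DOM extension (DOMDocument and DOMXPath). It is not ElephScraper's implementation, which uses Symfony DomCrawler. The function name extractMetadata and the returned keys are hypothetical, and the sketch parses an HTML string rather than fetching a URL.

```php
<?php
// Hypothetical helper, NOT part of ElephScraper: reads common metadata
// from an HTML string using PHP's built-in DOM extension.
function extractMetadata(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate HTML5 tags / sloppy markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    // Value of the first node matching $query, or null when nothing matches.
    $first = function (string $query) use ($xpath): ?string {
        $nodes = $xpath->query($query);
        return ($nodes !== false && $nodes->length > 0)
            ? trim($nodes->item(0)->nodeValue)
            : null;
    };

    return [
        'title'       => $first('//title'),
        'description' => $first('//meta[@name="description"]/@content'),
        'keywords'    => $first('//meta[@name="keywords"]/@content'),
        'charset'     => $first('//meta[@charset]/@charset'),
        'canonical'   => $first('//link[@rel="canonical"]/@href'),
    ];
}

$html = '<html><head><title>Demo Page</title>'
      . '<meta charset="utf-8">'
      . '<meta name="description" content="A small test page">'
      . '<link rel="canonical" href="https://example.com/demo">'
      . '</head><body></body></html>';

$meta = extractMetadata($html);
print_r($meta); // keywords is null because the page has no keywords meta tag
```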

🔹 Open Graph & Twitter Card

$scraper->openGraph();                  // All OG meta tags
$scraper->openGraph("og:title");        // A specific OG tag

$scraper->twitterCard();                // All Twitter Card tags
$scraper->twitterCard("twitter:title"); // A specific Twitter tag
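Open Graph tags are usually written as <meta property="og:..."> and Twitter Card tags as <meta name="twitter:...">. The minimal sketch below uses PHP's DOM extension, not ElephScraper itself, to collect such tags into an associative array keyed by the full tag name. The helper metaByPrefix is hypothetical.

```php
<?php
// Hypothetical helper, NOT ElephScraper's code: collects <meta> tags whose
// $attr attribute starts with $prefix, keyed by that attribute's value.
function metaByPrefix(string $html, string $attr, string $prefix): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $tags = [];
    $query = sprintf('//meta[starts-with(@%s, "%s")]', $attr, $prefix);
    foreach ($xpath->query($query) as $meta) {
        // e.g. "og:title" => "Demo"
        $tags[$meta->getAttribute($attr)] = $meta->getAttribute('content');
    }
    return $tags;
}

$html = '<html><head>'
      . '<meta property="og:title" content="Demo">'
      . '<meta property="og:type" content="website">'
      . '<meta name="twitter:card" content="summary">'
      . '</head><body></body></html>';

$og      = metaByPrefix($html, 'property', 'og:');  // Open Graph uses "property"
$twitter = metaByPrefix($html, 'name', 'twitter:'); // Twitter Cards use "name"
print_r($og);
echo $og['og:title'] ?? '(missing)', "\n"; // look up one specific tag
```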

🔹 Headings & Text

$scraper->h1();
$scraper->h2();
$scraper->h3();
$scraper->h4();
$scraper->h5();
$scraper->h6();
$scraper->p();
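The heading and paragraph methods return arrays of text. As a rough illustration only, this sketch gathers the trimmed text of every element with a given tag name using PHP's DOM extension. The helper textsByTag is hypothetical, and skipping empty elements is a choice made here that the library may or may not share.

```php
<?php
// Hypothetical helper, NOT ElephScraper's code: trimmed text of every
// element with the given tag name, in document order.
function textsByTag(string $html, string $tag): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $texts = [];
    foreach ($doc->getElementsByTagName($tag) as $node) {
        $text = trim($node->textContent);
        if ($text !== '') { // this sketch skips empty or whitespace-only elements
            $texts[] = $text;
        }
    }
    return $texts;
}

$html = '<html><body><h1>Main Title</h1><h1>News</h1><h2>Sub</h2>'
      . '<p>First paragraph.</p><p> </p></body></html>';

print_r(textsByTag($html, 'h1'));
print_r(textsByTag($html, 'p'));
```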

🔹 Lists

$scraper->ul(); // all <ul><li> text
$scraper->ol(); // all <ol><li> text
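A sketch of the same idea for lists follows. It assumes, based only on the comments above, that list items are returned as one flat array of <li> texts. It uses PHP's DOM extension rather than ElephScraper, and the helper listItems is hypothetical.

```php
<?php
// Hypothetical helper, NOT ElephScraper's code: text of each <li> that is a
// direct child of any <ul> or <ol> (pass the tag name), as one flat array.
function listItems(string $html, string $listTag): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $items = [];
    foreach ($xpath->query("//{$listTag}/li") as $li) {
        $items[] = trim($li->textContent);
    }
    return $items;
}

$html = '<html><body>'
      . '<ul><li>Apple</li><li>Banana</li></ul>'
      . '<ol><li>Step one</li></ol>'
      . '</body></html>';

print_r(listItems($html, 'ul'));
print_r(listItems($html, 'ol'));
```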

🔹 Images

$scraper->images();         // just src URLs
$scraper->imageDetails();   // src, alt, title
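The difference between the two image methods, bare src URLs versus src/alt/title details, can be sketched with PHP's DOM extension as shown below. This is not ElephScraper's code, and the helper name imageDetails is reused here only for illustration. Missing attributes come back as empty strings because that is how DOMElement::getAttribute behaves. The library's own handling may differ.

```php
<?php
// Hypothetical helper, NOT ElephScraper's code: src/alt/title of every <img>.
function imageDetails(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $images = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $images[] = [
            'src'   => $img->getAttribute('src'),
            'alt'   => $img->getAttribute('alt'),   // '' when the attribute is absent
            'title' => $img->getAttribute('title'), // '' when the attribute is absent
        ];
    }
    return $images;
}

$html = '<html><body>'
      . '<img src="/a.png" alt="Logo" title="Home">'
      . '<img src="/b.jpg">'
      . '</body></html>';

$details = imageDetails($html);
$srcs = array_column($details, 'src'); // just the src URLs
print_r($srcs);
```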

🔹 Links

$scraper->links();        // just hrefs
$scraper->linkDetails();  // full detail
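For links, a comparable sketch with PHP's DOM extension collects every <a> that has an href. The README does not say which fields linkDetails() returns, so the href/text pair used here is an assumption, and the helper collectLinks is hypothetical. Relative URLs are left exactly as written.

```php
<?php
// Hypothetical helper, NOT ElephScraper's code: href and link text of
// every <a> element that actually has an href attribute.
function collectLinks(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $links = [];
    // "//a[@href]" skips anchors like <a name="top"> that have no target
    foreach ($xpath->query('//a[@href]') as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'), // not resolved against a base URL
            'text' => trim($a->textContent),
        ];
    }
    return $links;
}

$html = '<html><body><a href="/about">About us</a>'
      . '<a name="top">Anchor only</a>'
      . '<a href="https://example.com/">Home</a></body></html>';

$details = collectLinks($html);
$hrefs = array_column($details, 'href'); // just the hrefs
print_r($hrefs);
```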

🔍 Custom DOM Filtering

▸ Example: Filter Single Element

$scraper->filter(
    element: 'div',
    attributes: ['id' => 'main'],
    multiple: false,
    extract: ['.title', '#desc', 'p'],
    returnHtml: false
);

▸ Example: Filter Multiple Elements

$scraper->filter(
    element: 'div',
    attributes: ['class' => 'card'],
    multiple: true,
    extract: ['h2', '.subtitle', '#info'],
    returnHtml: false
);

▸ Example: Return HTML Content

$scraper->filter(
    element: 'section',
    attributes: ['class' => 'hero'],
    returnHtml: true
);

The extract selectors support:

- Tag names: h1, p, span, etc.
- Class: .className
- ID: #idName

Output keys are automatically normalized to the original selector names.
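All three selector forms map onto XPath. ElephScraper relies on Symfony CssSelector for this, and that component handles far more syntax. The sketch below is a deliberately minimal, hypothetical translator (simpleSelectorToXPath) built on PHP's DOM extension. It extracts each selector's text inside a container and keys the results by the original selector string. Returning an array for every selector is an assumption of this sketch, not documented library behavior.

```php
<?php
// Hypothetical helper, NOT ElephScraper's code. Converts "tag", ".class"
// or "#id" into a relative XPath; assumes names contain no quote characters.
function simpleSelectorToXPath(string $selector): string
{
    $selector = trim($selector);
    $prefix = substr($selector, 0, 1);
    $name = substr($selector, 1);

    if ($prefix === '#') {
        return sprintf('.//*[@id="%s"]', $name); // element whose id equals $name
    }
    if ($prefix === '.') {
        // whole-word match inside a space-separated class list
        return sprintf('.//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]', $name);
    }
    return './/' . $selector; // plain tag name such as h1, p or span
}

$html = '<html><body><div id="main">'
      . '<h2 class="title big">Hello</h2><p id="desc">Short text</p><p>More</p>'
      . '</div></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Locate the container first, then search only inside it.
$container = $xpath->query(simpleSelectorToXPath('#main'))->item(0);

$result = [];
foreach (['.title', '#desc', 'p'] as $sel) {
    $result[$sel] = []; // keyed by the selector string exactly as given
    foreach ($xpath->query(simpleSelectorToXPath($sel), $container) as $node) {
        $result[$sel][] = trim($node->textContent);
    }
}
print_r($result);
```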

🤝 Contributing

Found a bug? Want to add features? Open an issue or create a pull request!

📄 License

🔗 Related Libraries

💡 Why ElephScraper?

ElephScraper is your dependable PHP elephant — strong, smart, and always ready to extract the right data.
