hm-marketing/onix-parser
最新稳定版本:1.9.0
Composer 安装命令:
composer require hm-marketing/onix-parser
包简介
A modern PHP library for parsing ONIX (ONline Information eXchange) XML files
README 文档
README
A modern, object-oriented PHP library for parsing ONIX (ONline Information eXchange) XML files. This library supports both ONIX 3.0 namespaced and non-namespaced XML formats.
Features
- Full ONIX 3.0 XML Support - Complete parsing of ONIX 3.0 namespaced and non-namespaced XML
- ChunkOnixParser - True byte-level resume processing for unlimited file sizes (NEW!)
- Memory-Efficient Streaming - Stream large XML files without memory issues
- Rich Product Information - Extract all product metadata including titles, descriptions, subjects
- Physical Product Details - Page count, dimensions (height, width, thickness), weight with unit conversion
- Language Support - Primary language detection with ISO 639 code mapping
- Publishing Metadata - Publisher, imprint, country/city of publication, edition details
- Product Classification - Product form, product form details with human-readable names
- Availability & Pricing - Detailed availability status and comprehensive price information
- Subject Classification - Full support for CLIL, THEMA, and ScoLOMFR classification schemes
- Images & Media - Extract cover images and other media resources with URL validation
- Collections & Series - Hierarchical collection and series relationships
- Supplier Information - GLN (Global Location Number) and supplier details
- Code Mapping - Extensive ONIX code lists with human-readable translations
- Comprehensive Logging - Detailed logging with configurable levels
- Type Safety - Well-documented code with PHP type hints
- Extensible Architecture - Modular design for easy customization
Requirements
- PHP 7.4 or higher
- DOM and SimpleXML extensions (usually included with PHP)
- Composer for dependency management and autoloading
Installation
composer require hm-marketing/onix-parser
Or clone the repository and install dependencies:
git clone https://github.com/your-username/onix-parser.git
cd onix-parser
composer install
Project Structure
onix-parser/
├── composer.json
├── README.md
├── Example.php
├── src/
│ ├── CodeMaps.php
│ ├── FieldMappings.php
│ ├── Logger.php
│ ├── OnixParser.php
│ └── Model/
│ ├── Collection.php
│ ├── Description.php
│ ├── Header.php
│ ├── Image.php
│ ├── Onix.php
│ ├── Price.php
│ ├── Product.php
│ ├── Subject.php
│ └── Title.php
├── data/
│ ├── clil_codes.json
│ └── thema_codes.json
├── tests/
│ ├── OnixParserTest.php
│ ├── streaming_test.php
│ └── fixtures/
│ └── onix_samples/
│ └── demo.xml
└── vendor/
└── ...
Basic Usage
<?php require_once 'vendor/autoload.php'; use ONIXParser\OnixParser; use ONIXParser\Logger; // Create a parser $parser = new OnixParser(); try { // Parse an ONIX file $onix = $parser->parseFile('path/to/your/onix_file.xml'); // Access header information echo "Sender: " . $onix->getHeader()->getSender() . "\n"; echo "Date: " . $onix->getHeader()->getSentDateTime() . "\n"; // Access product information foreach ($onix->getProducts() as $product) { echo "ISBN: " . $product->getIsbn() . "\n"; echo "Title: " . $product->getTitle()->getText() . "\n"; // Access subject information foreach ($product->getClilSubjects() as $subject) { echo "CLIL Subject: " . $subject->getHeadingText() . " (" . $subject->getCode() . ")\n"; } // Check availability if ($product->isAvailable()) { echo "Status: Available\n"; // Get price information foreach ($product->getPrices() as $price) { echo "Price: " . $price->getFormattedPrice() . "\n"; } } else { echo "Status: Not available\n"; } // Get cover images if ($coverImage = $product->getPrimaryCoverImage()) { echo "Cover Image: " . $coverImage->getUrl() . "\n"; echo "Image HTML: " . $coverImage->getImageTag(['alt' => 'Cover image', 'class' => 'book-cover']) . "\n"; } echo "\n"; } // Find a specific product by ISBN $foundProduct = $onix->findProductByIsbn('9780123456789'); } catch (Exception $e) { echo "Error: " . $e->getMessage() . "\n"; }
See the examples directory for more detailed usage examples.
New Features (v1.6.0+)
Enhanced Product Information
The ONIX Parser now provides access to comprehensive product metadata with human-readable translations:
$product = $onix->getProducts()[0]; // Physical product details $pageCount = $product->getPageCount(); // 224 $height = $product->getHeight(); // ['value' => 155, 'unit' => 'mm', 'unit_name' => 'Millimeters'] $width = $product->getWidth(); // ['value' => 105, 'unit' => 'mm', 'unit_name' => 'Millimeters'] $thickness = $product->getThickness(); // ['value' => 12, 'unit' => 'mm', 'unit_name' => 'Millimeters'] $weight = $product->getWeight(); // ['value' => 224, 'unit' => 'gr', 'unit_name' => 'Grams'] // Language information $languageCode = $product->getLanguageCode(); // 'fra' $languageName = $product->getLanguageName(); // 'French' // Enhanced availability with human-readable names $availabilityCode = $product->getAvailabilityCode(); // '20' $availabilityName = $product->getAvailabilityName(); // 'Available' // Product form details $productForm = $product->getProductForm(); // 'BC' $productFormName = $product->getProductFormName(); // 'Paperback' $productFormDetail = $product->getProductFormDetail(); // 'A103' $productFormDetailName = $product->getProductFormDetailName(); // 'Trade paperback (UK mass-market paperback)' // Publishing information $publisher = $product->getPublisherName(); // 'Hachette Livre' $imprint = $product->getImprintName(); // 'HACHETTE TOURI' $countryOfPublication = $product->getCountryOfPublication(); // 'FR' $countryName = $product->getCountryOfPublicationName(); // 'France' $cityOfPublication = $product->getCityOfPublication(); // 'Paris' // Edition and copyright information $editionNumber = $product->getEditionNumber(); // '2' $editionStatement = $product->getEditionStatement(); // 'Second edition' $copyrightYear = $product->getCopyrightYear(); // '2023' $firstPublicationYear = $product->getFirstPublicationYear(); // '2023' // Enhanced pricing with type names foreach ($product->getPrices() as $price) { echo $price->getType(); // '04' echo $price->getPriceTypeName(); // 'Fixed retail price including tax' echo $price->getFormattedPrice(); // '29.99 EUR' }
New Code Mappings
Four new comprehensive code mapping systems have been added:
use ONIXParser\CodeMaps; // Product form detail codes (List 175) $productFormDetailMap = CodeMaps::getProductFormDetailMap(); // Examples: 'A101' => 'Hardcover book with dust jacket', 'A301' => 'Basic ebook' // Language codes (ISO 639) $languageCodeMap = CodeMaps::getLanguageCodeMap(); // Examples: 'fra' => 'French', 'eng' => 'English', 'spa' => 'Spanish' // Country codes (ISO 3166-1) $countryCodeMap = CodeMaps::getCountryCodeMap(); // Examples: 'FR' => 'France', 'US' => 'United States', 'GB' => 'United Kingdom' // Measure unit codes (List 50) $measureUnitMap = CodeMaps::getMeasureUnitMap(); // Examples: 'mm' => 'Millimeters', 'cm' => 'Centimeters', 'gr' => 'Grams' // Get all code mappings at once $allMaps = CodeMaps::getAllMaps();
Field Mappings Enhancements
New XPath mappings have been added for language and publishing metadata:
use ONIXParser\FieldMappings; $mappings = FieldMappings::getMappings(); // Language mappings $languageMappings = $mappings['language']; // Includes: 'primary', 'code', 'role', 'nodes' // Publishing metadata mappings $publishingMappings = $mappings['publishing_metadata']; // Includes: 'country_of_publication', 'city_of_publication', 'copyright_year', // 'edition_number', 'edition_statement'
Batch Processing for Large Files
The ONIX Parser supports batch processing of large XML files using a streaming approach, which is more memory-efficient and helps avoid timeout issues when dealing with very large ONIX files.
<?php require_once 'vendor/autoload.php'; use ONIXParser\OnixParser; use ONIXParser\Logger; // Create a parser with logger $logger = new Logger(Logger::INFO, 'onix_streaming_parser.log'); $parser = new OnixParser($logger); // Define a callback function to process each product as it's parsed $productCallback = function($product, $index, $total) { echo "Processing product " . ($index + 1) . " of $total: " . $product->getRecordReference() . "\n"; // You can do additional processing here, such as: // - Save to database // - Generate reports // - Export to other formats // For this example, we'll just print the title if ($product->getTitle()) { echo " Title: " . $product->getTitle()->getText() . "\n"; } // Return true to continue processing, or false to stop return true; // Returning false would stop processing after this product }; // Options for batch processing $options = [ 'limit' => 100, // Process only 100 products (0 means no limit) 'offset' => 0, // Start from the first product 'callback' => $productCallback, // Process each product as it's parsed 'continue_on_error' => true, // Continue processing if an error occurs ]; // Parse the file using streaming approach $onix = $parser->parseFileStreaming('path/to/large/onix/file.xml', $options);
Benefits of Streaming Parser
The streaming parser offers several advantages for large ONIX files:
- Memory Efficiency: Processes one product at a time, keeping memory usage low
- Batch Processing: Can process a subset of products (using limit and offset)
- Progress Tracking: Provides real-time feedback through the callback function
- Conditional Processing: Can stop processing when callback returns false
- Error Handling: Can continue processing even if some products fail to parse
- Performance: Better performance for large files, avoiding timeouts
- Consistency: Extracts the same data as the regular parser, including supplier GLN
See examples/streaming.php for a complete example of using the streaming parser.
Subject Classification
The ONIX Parser supports the three classification schemes recommended by CLIL:
- CLIL Classification (mandatory) - scheme identifier '29'
- THEMA Classification - scheme identifiers '93' to '99'
- ScoLOMFR Classification - scheme identifiers 'A6' and 'B3'
The parser uses JSON mapping files to convert subject codes to human-readable names:
data/clil_codes.json- Maps CLIL codes to descriptive namesdata/thema_codes.json- Maps THEMA codes to descriptive names
Example of using subject information:
// Get all CLIL subjects $clilSubjects = $product->getClilSubjects(); // Get the main subject $mainSubject = $product->getMainSubject(); // Check if product has a specific subject code if ($product->hasSubjectCode('3455', '29')) { echo "This is a psychological thriller!"; }
Working with Images
The ONIX Parser supports extracting images and other media resources from ONIX files:
// Get all images for a product $images = $product->getImages(); // Get just the cover images $coverImages = $product->getCoverImages(); // Get the primary cover image $coverImage = $product->getPrimaryCoverImage(); // Get sample content $sampleContent = $product->getSampleContent(); // Get images of a specific type $contributorImages = $product->getImagesByType('04'); // Contributor pictures // Generate an HTML image tag if ($coverImage) { $imgTag = $coverImage->getImageTag([ 'alt' => 'Book cover for ' . $product->getTitle()->getText(), 'class' => 'book-cover', 'width' => '300' ]); echo $imgTag; // Outputs: <img src="http://example.com/cover.jpg" alt="Book cover for Book Title" class="book-cover" width="300"> } // Check image type if ($image->isCoverImage()) { echo "This is a cover image"; } if ($image->isImage()) { echo "This is an image (not video or audio)"; } // Get file extension $extension = $image->getFileExtension(); // Returns 'jpg', 'png', etc. // Validate URL if ($image->hasValidUrl()) { echo "The image URL is valid"; }
Working with Collections and Series
The ONIX Parser supports extracting collection and series information from ONIX files, including complex collection structures with multiple titles and levels:
// Check if a product is part of a series or collection if ($product->isPartOfSeries()) { echo "This product is part of a series"; } if ($product->isPartOfCollection()) { echo "This product is part of a collection"; } // Get all collections and series $collections = $product->getCollections(); // Get only series $series = $product->getSeries(); // Get only regular collections $regularCollections = $product->getRegularCollections(); // Get primary series $primarySeries = $product->getPrimarySeries(); if ($primarySeries) { echo "Series: " . $primarySeries->getDisplayName(); echo "Series Title: " . $primarySeries->getTitleText(); echo "Part Number: " . $primarySeries->getPartNumber(); } // Working with a collection object $collection = $product->getCollections()[0]; if ($collection) { // Get collection type if ($collection->isSeries()) { echo "This is a series"; } elseif ($collection->isCollection()) { echo "This is a regular collection"; } // Get collection title echo "Title: " . $collection->getTitleText(); // Get part number within the collection echo "Part: " . $collection->getPartNumber(); // Get formatted display name (includes part number if available) echo "Display Name: " . $collection->getDisplayName(); // Working with additional titles $additionalTitles = $collection->getAdditionalTitles(); foreach ($additionalTitles as $type => $levelTitles) { foreach ($levelTitles as $level => $text) { echo "Title (Type $type, Level $level): $text"; } } // Get titles of a specific type (e.g., abbreviated titles - type 05) $abbreviatedTitles = $collection->getTitlesByType('05'); if ($abbreviatedTitles) { foreach ($abbreviatedTitles as $level => $text) { echo "Abbreviated title (Level $level): $text"; } } // Convert to string (same as getDisplayName()) echo $collection; // Implicitly calls __toString() }
Working with Descriptions
The ONIX Parser provides comprehensive support for product descriptions:
// Get all descriptions for a product $descriptions = $product->getDescriptions(); // Get specific types of descriptions $mainDescription = $product->getMainDescription(); $shortDescription = $product->getShortDescription(); $longDescription = $product->getLongDescription(); $tableOfContents = $product->getTableOfContents(); $reviewQuotes = $product->getReviewQuotes(); // Get descriptions by type code $featuresDescriptions = $product->getDescriptionsByType('11'); // Features // Check if a product has a specific description type if ($product->hasDescriptionType('03')) { echo "This product has a long description"; } // Working with a description object $description = $product->getMainDescription(); if ($description) { // Get the raw content (which may include HTML) $content = $description->getContent(); // Get plain text (with HTML stripped if necessary) $plainText = $description->getPlainText(); // Get a short excerpt (useful for search results or listings) $excerpt = $description->getExcerpt(150, '...'); // Check if description is in HTML format if ($description->isHtml()) { echo "Description contains HTML formatting"; } // Convert to string (same as getPlainText()) echo $description; // Implicitly calls __toString() }
Supplier Information
The ONIX Parser extracts supplier information, including the Global Location Number (GLN):
// Get supplier information $supplierName = $product->getSupplierName(); $supplierRole = $product->getSupplierRole(); $supplierGLN = $product->getSupplierGLN(); // Check if the supplier is a publisher if ($product->getSupplierRole() === '01') { echo "Supplier is the publisher"; } // Get formatted supplier information echo "Supplied by: " . $product->getSupplierName() . " (GLN: " . $product->getSupplierGLN() . ")";
Configuration
Logging
You can configure the logging level when creating the parser:
// Create a logger with debug level $logger = new Logger(Logger::DEBUG, 'path/to/logfile.log'); // Pass the logger to the parser $parser = new OnixParser($logger);
Available log levels:
Logger::ERROR- Only log errorsLogger::WARNING- Log errors and warningsLogger::INFO- Log errors, warnings, and informational messagesLogger::DEBUG- Log all messages, including debug information
Advanced Usage
Working with Product Objects
The Product class provides access to common product attributes:
$product = $onix->getProducts()[0]; // Basic identifiers $isbn = $product->getIsbn(); $ean = $product->getEan(); // Title information $title = $product->getTitle()->getText(); $subtitle = $product->getTitle()->getSubtitle(); $fullTitle = $product->getTitle()->getFullTitle(); // Subject information $allSubjects = $product->getSubjects(); $clilSubjects = $product->getClilSubjects(); $themaSubjects = $product->getThemaSubjects(); $scoLOMFRSubjects = $product->getScoLOMFRSubjects(); $mainSubject = $product->getMainSubject(); // Image information $allImages = $product->getImages(); $coverImages = $product->getCoverImages(); $primaryCover = $product->getPrimaryCoverImage(); // Availability $isAvailable = $product->isAvailable(); $availabilityCode = $product->getAvailabilityCode(); // Prices $defaultPrice = $product->getDefaultPrice(); $allPrices = $product->getPrices();
Accessing Original XML
You can access the original XML representation of ONIX objects:
// Get original XML for the header $headerXml = $onix->getHeader()->getXml(); // Get original XML for a product $productXml = $product->getXml();
Extending the Parser
The parser is designed to be extensible. To add support for additional ONIX elements:
- Add the necessary XPath expressions to
FieldMappings.php - Add any new code lists to
CodeMaps.php - Create new model classes as needed
- Update the
OnixParserclass to parse the new elements
License
Credits
Developed by [Kiran Mohamed/HM Marketing]
Missing Methods & Future Enhancements
While this version (1.6.0) adds comprehensive support for most commonly needed ONIX data, some methods that could be added in future releases include:
Potential Future Methods
Contributor Details
getContributorDetails()- Enhanced contributor information with roles and biographical notesgetMainAuthor()- Primary author extractiongetTranslator()- Translator information
Advanced Pricing
getPriceByType($type)- Get specific price by type codegetLowestPrice()- Get the lowest available pricegetPriceHistory()- Price change tracking (if available in ONIX)
Sales Information
getSalesRestrictions()- Territory and sales restrictionsgetSalesRights()- Sales rights informationgetAudienceInformation()- Target audience details
Technical Details
getFileFormat()- For digital products (EPUB, PDF, etc.)getFileSize()- Digital file size informationgetTechnicalProtection()- DRM and protection information
Extended Classification
getAgeRange()- Target age range informationgetReadingLevel()- Reading difficulty levelgetAudience()- Audience code (general, academic, professional, etc.)
Related Products
getRelatedProducts()- Related product informationgetAlternativeFormats()- Other formats of the same workgetReplacedBy()- Replacement product information
Requesting New Features
If you need any of these methods or have other requirements, please:
- Check the current capabilities - The library already supports extensive ONIX data extraction
- Open an issue on GitHub describing your use case
- Contribute by implementing the feature and submitting a pull request
Current Method Coverage
✅ Fully Implemented:
- Basic product information (ISBN, title, publisher, dates)
- Physical measurements (dimensions, weight, page count)
- Language information with human-readable names
- Availability status with translations
- Product classification (form, form details)
- Publishing metadata (country, city, edition, copyright)
- Pricing with type descriptions
- Subject classification (CLIL, THEMA, ScoLOMFR)
- Images and media resources
- Collections and series
- Descriptions and content
- Supplier information (including GLN)
🔄 Partially Implemented:
- Contributor information (basic roles available, could be enhanced)
- Pricing (comprehensive but could add convenience methods)
⏳ Not Yet Implemented:
- Sales restrictions and rights
- Advanced audience targeting
- Technical details for digital products
- Related product relationships
- Advanced contributor biographical information
API Documentation
For complete API documentation including all methods, parameters, and examples, see:
- API Reference - Comprehensive method documentation
- Changelog - Detailed version history and changes
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (
git checkout -b feature/amazing-feature) - Commit your changes (
git commit -m 'Add some amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
Development Guidelines
When adding new methods:
- Use FieldMappings for all XPath queries
- Use CodeMaps for human-readable translations
- Add comprehensive documentation with PHPDoc comments
- Include test coverage for new functionality
- Follow existing patterns for consistency
统计信息
- 总下载量: 53
- 月度下载量: 0
- 日度下载量: 0
- 收藏数: 1
- 点击次数: 0
- 依赖项目数: 0
- 推荐数: 0
其他信息
- 授权协议: MIT
- 更新时间: 2025-03-21