dgtlss/parqbridge
Latest stable version: 1.2.0
Composer install command:
composer require dgtlss/parqbridge
Package description
Export Laravel database tables to Parquet files using Storage disks (no external deps).
README
Export your Laravel database tables to real Apache Parquet files on any Storage disk (local, S3, etc.) with a simple artisan command.
ParqBridge focuses on zero PHP dependency bloat while still producing spec-compliant Parquet files by delegating the final write step to a tiny, embedded Python script using PyArrow (or any custom CLI you prefer). You keep full Laravel DX for configuration and Storage; we bridge your data to Parquet.
Installation
- Require the package in your app (path repo or VCS):
composer require dgtlss/parqbridge
- Laravel will auto-discover the service provider. Alternatively, register ParqBridge\\ParqBridgeServiceProvider manually.
- Publish the config if you want to customize defaults:
php artisan vendor:publish --tag="parqbridge-config"
Configuration
Set your export disk and options in .env or config/parqbridge.php.
- PARQUET_DISK: which filesystem disk to use (e.g., s3, local).
- PARQUET_OUTPUT_DIR: directory prefix within the disk (default parquet-exports).
- PARQUET_CHUNK_SIZE: rows per DB chunk when exporting (default 1000).
- PARQUET_INFERENCE: database|sample|hybrid (default hybrid).
- PARQUET_COMPRESSION: compression codec for Parquet (UNCOMPRESSED/NONE, SNAPPY, GZIP, ZSTD, BROTLI, LZ4_RAW) when using the PyArrow backend.
- PARQBRIDGE_WRITER: pyarrow (default) or custom. If custom, set PARQBRIDGE_CUSTOM_CMD.
- PARQBRIDGE_PYTHON: python executable for PyArrow (default python3).
Example .env:
PARQUET_DISK=s3
PARQUET_OUTPUT_DIR=parquet-exports
PARQUET_CHUNK_SIZE=2000
Ensure your filesystems disk is configured (e.g., s3) in config/filesystems.php.
FTP disk configuration
You can export directly to an FTP server using Laravel's ftp disk. Add an FTP disk to config/filesystems.php and reference it via PARQUET_DISK=ftp or --disk=ftp.
'disks' => [
    'ftp' => [
        'driver' => 'ftp',
        'host' => env('FTP_HOST'),
        'username' => env('FTP_USERNAME'),
        'password' => env('FTP_PASSWORD'),
        // Optional FTP settings
        'port' => (int) env('FTP_PORT', 21),
        'root' => env('FTP_ROOT', ''),
        'passive' => filter_var(env('FTP_PASSIVE', true), FILTER_VALIDATE_BOOL),
        'ssl' => filter_var(env('FTP_SSL', false), FILTER_VALIDATE_BOOL),
        'timeout' => (int) env('FTP_TIMEOUT', 90),
    ],
],
Note: This package will coerce common FTP env values (e.g., port, timeout, passive, ssl) to the proper types before resolving the disk to avoid Flysystem type errors like "Argument #5 ($port) must be of type int, string given".
Usage
- List tables:
php artisan parqbridge:tables
- Export a table to the configured disk:
php artisan parqbridge:export users --where="active = 1" --limit=1000 --output="parquet-exports" --disk=s3
On success, the command prints the full path written within the disk. Files are named {table}-{YYYYMMDD_HHMMSS}.parquet.
- Export ALL tables into one folder (timestamped subfolder inside parqbridge.output_directory):
php artisan parqbridge:export-all --disk=s3 --output="parquet-exports" --exclude=migrations,password_resets
Options:
- --include=: comma-separated allowlist of table names
- --exclude=: comma-separated denylist of table names
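Because single-table exports follow the {table}-{YYYYMMDD_HHMMSS}.parquet naming pattern described above, downstream jobs can recover the table name and export time from a file name. The following is a minimal Python sketch of that idea, not part of ParqBridge: the helper names parse_export_name and latest_export are hypothetical, and the timestamp is treated as a naive datetime because the README does not state a timezone.

```python
import re
from datetime import datetime

# Matches names of the form {table}-{YYYYMMDD_HHMMSS}.parquet.
# The greedy table group allows table names that themselves contain hyphens.
EXPORT_NAME = re.compile(r"^(?P<table>.+)-(?P<stamp>\d{8}_\d{6})\.parquet$")


def parse_export_name(filename):
    """Return (table, datetime) for an export file name, or None if it does not match."""
    match = EXPORT_NAME.match(filename)
    if match is None:
        return None
    try:
        stamp = datetime.strptime(match.group("stamp"), "%Y%m%d_%H%M%S")
    except ValueError:
        # Digits were present but do not form a valid date/time.
        return None
    return match.group("table"), stamp


def latest_export(filenames, table):
    """Pick the most recent export of `table` from a list of file names."""
    candidates = []
    for name in filenames:
        parsed = parse_export_name(name)
        if parsed is not None and parsed[0] == table:
            candidates.append((parsed[1], name))
    return max(candidates)[1] if candidates else None
```

A consumer could list the files in the output directory of the disk and pass the base names to latest_export to find the newest snapshot of a given table.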
Data types
The schema inferrer maps common DB types to a set of Parquet primitive types and logical annotations. With the PyArrow backend, an Arrow schema is constructed to faithfully write types:
- Primitive: BOOLEAN, INT32, INT64, FLOAT, DOUBLE, BYTE_ARRAY, FIXED_LEN_BYTE_ARRAY
- Logical: UTF8, DATE, TIME_MILLIS, TIME_MICROS, TIMESTAMP_MILLIS, TIMESTAMP_MICROS, DECIMAL
For decimals we write Arrow decimal types (decimal128/decimal256) with declared precision/scale.
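To illustrate the decimal128/decimal256 distinction: Arrow's decimal128 holds up to 38 digits of precision and decimal256 up to 76. The sketch below shows one plausible way to choose between them for a SQL-style DECIMAL(precision, scale) column. It is an assumption about how such a choice could be made, not a description of ParqBridge's internal logic, and the function names are hypothetical.

```python
# Maximum precision supported by each Arrow decimal type.
DECIMAL128_MAX_PRECISION = 38
DECIMAL256_MAX_PRECISION = 76


def arrow_decimal_for(precision, scale):
    """Return (type_name, precision, scale) for a SQL-style DECIMAL column."""
    if not 1 <= precision <= DECIMAL256_MAX_PRECISION:
        raise ValueError(f"precision {precision} is outside 1..{DECIMAL256_MAX_PRECISION}")
    if not 0 <= scale <= precision:
        raise ValueError(f"scale {scale} must be between 0 and precision {precision}")
    # Prefer the narrower type whenever the declared precision fits in it.
    name = "decimal128" if precision <= DECIMAL128_MAX_PRECISION else "decimal256"
    return name, precision, scale


def to_pyarrow_type(spec):
    """Turn the tuple from arrow_decimal_for into a PyArrow data type."""
    import pyarrow as pa  # imported lazily; only needed when PyArrow is installed

    name, precision, scale = spec
    # pa.decimal128 and pa.decimal256 both take (precision, scale).
    return getattr(pa, name)(precision, scale)
```

For example, a DECIMAL(10, 2) price column would map to decimal128, while a DECIMAL(50, 10) column would need decimal256.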
Testing
Run the test suite:
composer install
vendor/bin/phpunit
The tests bootstrap a minimal container, create a SQLite database, and verify:
- listing tables works on SQLite
- exporting a table writes a Parquet file to the configured disk (magic PAR1)
- schema inference on SQLite maps major families
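The PAR1 check mentioned above relies on the Parquet file layout: a valid file starts and ends with the 4-byte magic PAR1. The following Python sketch shows the same kind of sanity check outside the test suite. The function name looks_like_parquet is hypothetical, and the check only inspects the magic bytes; it does not validate the footer metadata.

```python
import os

PARQUET_MAGIC = b"PAR1"
# Smallest possible layout: leading magic + 4-byte footer length + trailing magic.
MIN_PARQUET_SIZE = 12


def looks_like_parquet(path):
    """Return True if the file at `path` begins and ends with the PAR1 magic."""
    if os.path.getsize(path) < MIN_PARQUET_SIZE:
        return False
    with open(path, "rb") as handle:
        head = handle.read(4)
        handle.seek(-4, os.SEEK_END)  # jump to the last four bytes
        tail = handle.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

This is a cheap first check after downloading an export from a remote disk; a full validation would open the file with a Parquet reader.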
Backend requirements
- By default ParqBridge uses Python + PyArrow. Ensure python3 is available and install PyArrow:
python3 -m pip install --upgrade pip
python3 -m pip install pyarrow
- Alternatively set a custom converter command via PARQBRIDGE_WRITER=custom and PARQBRIDGE_CUSTOM_CMD (must read {input} CSV and write {output} Parquet).
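As an illustration of the custom converter contract (read a CSV at {input}, write Parquet at {output}), here is a minimal Python sketch of such a CLI. It is not the script embedded in ParqBridge: the file name csv_to_parquet.py, the --compression flag, and the assumption that {input} and {output} arrive as two positional arguments are all hypothetical. It uses pyarrow.csv.read_csv and pyarrow.parquet.write_table, so column types are inferred from the CSV rather than taken from a declared schema.

```python
import argparse
import sys


def parse_args(argv):
    """Parse the positional input/output paths and an optional codec."""
    parser = argparse.ArgumentParser(description="Convert a CSV file to Parquet.")
    parser.add_argument("input", help="path of the CSV file to read")
    parser.add_argument("output", help="path of the Parquet file to write")
    parser.add_argument("--compression", default="snappy",
                        help="Parquet codec passed to PyArrow (default: snappy)")
    return parser.parse_args(argv)


def convert(input_path, output_path, compression="snappy"):
    """Read the CSV into an Arrow table and write it out as Parquet."""
    # Imported lazily so argument parsing works even without PyArrow installed.
    import pyarrow.csv as pacsv
    import pyarrow.parquet as pq

    table = pacsv.read_csv(input_path)  # column types are inferred from the data
    pq.write_table(table, output_path, compression=compression)


def main(argv):
    args = parse_args(argv)
    convert(args.input, args.output, args.compression)
    return 0


# Runs only when invoked as a script with arguments; a no-op when imported.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Under these assumptions, the .env entries would look something like PARQBRIDGE_WRITER=custom and PARQBRIDGE_CUSTOM_CMD="python3 csv_to_parquet.py {input} {output}"; check the published config for the exact placeholder and quoting rules.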
You can automate setup via the included command:
php artisan parqbridge:setup --write-env
Options:
- --python=: path/name of Python (default from config parqbridge.pyarrow_python)
- --venv=: location for virtualenv (default ./parqbridge-venv)
- --no-venv: install into global Python instead of a venv
- --write-env: append PARQBRIDGE_PYTHON and PARQBRIDGE_WRITER to .env
- --upgrade: upgrade pip first
- --dry-run: print commands without executing
Other information
- License: MIT
- Last updated: 2025-08-13