centamiv/vektor 问题修复 & 功能扩展

解决BUG、新增功能、兼容多环境部署,快速响应你的开发需求

邮箱:yvsm@zunyunkeji.com | QQ:316430983 | 微信:yvsm316

centamiv/vektor

最新稳定版本:v20260104225022

Composer 安装命令:

composer create-project centamiv/vektor

包简介

A native PHP Vector Database implementation with strict binary storage and Zero-RAM overhead.

README 文档

README

Vektor is a high-performance, purely file-based, embedded Vector Database written entirely in native PHP. It is designed for Zero-RAM Overhead, meaning it does not require loading your entire dataset into memory to function.

Each Vektor instance operates as a standalone database, with data stored by default in the /data directory.

Instead of memory-heavy indexes, Vektor utilizes strict binary file layouts and optimized disk-seeking strategies to perform Approximate Nearest Neighbor (ANN) searches using the HNSW (Hierarchical Navigable Small World) algorithm.

Features

  • Pure PHP: No external dependencies or C extentions required. Runs on any standard PHP 8.2+ environment.
  • Zero-RAM Overhead: Data is read directly from disk. Memory usage is constant regardless of dataset size.
  • HNSW Algorithm: Efficient graph-based index for fast approximate nearest neighbor search.
  • Binary Storage: Compact binary file formats for Vectors, Graph connections, and Metadata.
  • Embedded or Server: Use it directly in your PHP code or run it as a standalone HTTP API server.
  • Thread-Safety: Implements file locking (flock) to safely handle concurrent reads and writes.
  • Cosine Similarity: Optimized distance metric for high-dimensional embeddings (configurable, default 1536).

Requirements

  • PHP: 8.2 or higher
  • Composer: For dependency management

Upgrading to v2.0.0

⚠️ Important Breaking Change: Version 2.0.0 introduces a change to the binary record length in meta.bin. If you are upgrading from a previous version, your existing data files will be incompatible. You must delete your existing data/ directory and re-index (rebuild) your dataset.

Installation

1. Installation via Composer

To use Vektor in your existing PHP project:

composer require centamiv/vektor

2. Standalone Installation

To run Vektor as a standalone API server:

git clone https://github.com/centamiv/vektor.git
cd vektor
composer install --no-dev

Ensure the data/ directory is writable by your web server or script user:

mkdir -p data
chmod -R 775 data

Configuration

Vektor uses a .env file for configuration when running as a server.

  1. Copy the example environment file:

    cp .env.example .env
  2. Open .env and configure your API Token:

    # .env
    VEKTOR_API_TOKEN=your_secure_random_string_here
    VEKTOR_DIMENSIONS=1536
  • VEKTOR_API_TOKEN: If set, all API requests (except /up) must include this token in the Authorization header. If left empty, the API is open to the public.
  • VEKTOR_DIMENSIONS: Set the dimension of your vectors (default: 1536). IMPORTANT: Changing this requires a fresh database (delete data/ dir).

Usage

Vektor is designed for flexibility, allowing you to either integrate it directly into your PHP projects as a library or deploy it as a standalone REST API server.

Usage: HTTP API Server

Vektor includes a built-in Controller to run as a REST API. You can serve this using Apache, Nginx, or the PHP built-in server.

Starting the Server

For testing/development:

# Serves the public/ directory on port 8000
php -S 0.0.0.0:8000 -t public

Authentication

If VEKTOR_API_TOKEN is set in your .env, you must include the header in all requests:

Authorization: Bearer <your-token>

API Endpoints

GET /up

Health check endpoint.

  • Auth Required: No
  • Response:
{
  "status": "up"
}
GET /info

Returns database statistics.

  • Response:
{
  "storage": {
    "vector_file_bytes": 1048576,
    "graph_file_bytes": 524288,
    "meta_file_bytes": 2048,
    "payload_file_bytes": 4096
  },
  "records": {
    "vectors_total": 150,
    "graph_nodes": 150
  },
  "config": {
    "dimension": 1536,
    "max_levels": 4
  }
}
POST /insert

Insert a vector.

  • Body:
{
  "id": "my-doc-id",
  "vector": [0.1, 0.2, 0.3, ...],
  "metadata": {
    "source": "docs/intro.md",
    "chunk": 3
  }
}
  • Response:
{
  "status": "success",
  "id": "my-doc-id"
}
POST /search

Search for nearest neighbors.

  • Body:
{
  "vector": [0.1, 0.2, 0.3, ...],
  "k": 5
}
  • Response:
{
  "results": [
    { "id": "my-doc-id", "distance": 0.95 },
    { "id": "another-id", "distance": 0.88 }
  ]
}

Optionally pass "include_vector": true to also get vector data of similar documents. Optionally pass "include_metadata": true to also get metadata stored with the document.

  • Body:
{
  "vector": [0.1, 0.2, 0.3, ...],
  "include_vector": true,
  "include_metadata": true,
  "k": 5
}
  • Response:
{
  "results": [
    { "id": "my-doc-id", "distance": 0.95, "vector": [0.5, 1.0, 0.3, ...], "metadata": { "source": "docs/intro.md", "chunk": 3 } },
    { "id": "another-id", "distance": 0.88, "vector": [0.5, 1.1, 0.3, ...], "metadata": { "source": "docs/faq.md", "chunk": 1 } }
  ]
}
POST /delete

Delete a vector.

  • Body:
{
  "id": "my-doc-id"
}
  • Response:
{
  "status": "success",
  "message": "..."
}
POST /optimize

Trigger database optimization.

  • Response:
{
  "status": "success",
  "message": "..."
}

Usage: Embedded Library

You can use Vektor directly in your PHP scripts without running an HTTP server. This is the fastest way to interact with the database.

Configuration

By default, Vektor stores data in the data/ directory relative to the package root. You can change this path using the Config class:

use Centamiv\Vektor\Core\Config;

Config::setDataDir(__DIR__ . '/my_custom_data_dir');

You can also set the vector dimensions (default 1536):

Config::setDimensions(768);
// Note: This must be called BEFORE initializing Indexer/Searcher

Initialization

use Centamiv\Vektor\Services\Indexer;
use Centamiv\Vektor\Services\Searcher;
use Centamiv\Vektor\Services\Optimizer;

// The Indexer handles writing (Insert, Delete)
$indexer = new Indexer();

// The Searcher handles reading (Search)
$searcher = new Searcher();

1. Inserting Vectors

Vectors must be 1536-dimensional arrays of floats.

$id = "doc-123"; // String ID (max 36 chars)
$vector = [0.0123, -0.5231, ...]; // Array of 1536 floats

// Insert (or update if ID exists - NOTE: Updates are essentially Appends with pointer updates)
$metadata = ['source' => 'docs/intro.md', 'chunk' => 3];
$indexer->insert($id, $vector, $metadata);

2. Searching

Find the k nearest neighbors to a query vector.

$queryVector = [0.0123, ...];
$k = 5; // Number of results

$results = $searcher->search($queryVector, $k, includeMetadata: true);

// Output:
// [
//   ['id' => 'doc-123', 'score' => 0.9823, 'metadata' => ['source' => 'docs/intro.md', 'chunk' => 3]],
//   ['id' => 'doc-456', 'score' => 0.8912, 'metadata' => ['source' => 'docs/faq.md', 'chunk' => 1]],
//   ...
// ]

3. Deleting Results

Deletes a document by its ID. This performs a "soft delete" in the vector file and updates the metadata mapping.

$success = $indexer->delete("doc-123");

if ($success) {
    echo "Document deleted.";
} else {
    echo "Document not found.";
}

4. Getting Statistics

Retrieve current database stats, including file sizes and node counts.

$stats = $indexer->getStats();
print_r($stats);

5. Optimizing (Vacuum)

Since deletions are "soft", the file size can grow over time. Run the optimizer to rebuild the index and reclaim space. Note: This is a blocking operation.

$optimizer = new Optimizer();
$optimizer->run();

Database File Structure

Vektor achieves its performance and low memory footprint through three specialized binary files located in the data/ directory.

  • vector.bin: Stores raw vector data in an append-only structure.
  • meta.bin: Maps external string IDs to internal file offsets using a disk-based Binary Search Tree (BST) for efficient lookups without loading maps into RAM.
  • payload.bin: Stores serialized metadata (JSON) in an append-only structure referenced by meta.bin.
  • graph.bin: Stores the HNSW Graph structure to enable fast navigation and approximate nearest neighbor searches.
  • Concurrency: Implements advisory file locking to manage simultaneous shared reads and exclusive write operations safely.

How to generate embeddings from a document?

Vektor stores vectors, but it does not generate them. You need an embedding model for that. A great local option is Ollama.

  1. Install Ollama and pull an embedding model (e.g., nomic-embed-text).
  2. Use the Ollama API to generate the vector for your text.
  3. Pass that vector to Vektor:
function getEmbedding(string $text): array {
    $ch = curl_init('http://localhost:11434/api/embeddings');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode([
        'model' => 'nomic-embed-text',
        'prompt' => $text
    ]));

    $response = json_decode(curl_exec($ch), true);
    curl_close($ch);

    return $response['embedding'];
}

// IMPORTANT: Vektor stores the ID and the Vector, but NOT the original content.
// You are responsible for storing the actual text (in files, S3, etc.).

// 1. Read your document
$id = "doc-hello";
$text = file_get_contents("{$id}.txt");

// 2. Generate vector
$vector = getEmbedding($text);

// 3. Insert into Vektor using the filename/ID as the reference
$indexer->insert($id, $vector);

Troubleshooting

1. Permission Denied Errors

Ensure your PHP process (e.g., www-data) has read/write access to the data/ folder and the files inside it.

chown -R www-data:www-data data/
chmod -R 775 data/

2. "Invalid Vector Dimensions"

Vektor defaults to 1536 dimensions. If you send a vector with different dimensions, it will be rejected. To change this, you can use VECTOR_DIMENSIONS in your .env or Config::setDimensions(N) in your code. Important: If you change dimensions, you must start with an empty data directory, as the binary file structure depends on the dimension size.

3. Slow Performance?

  • Disk I/O: Since Vektor is disk-based, SSDs are highly recommended. HDDs will result in slow seek times.
  • Opcache: Ensure PHP Opcache is enabled for production.

Contributing

Contributions are welcome! Please run the test suite before submitting a PR.

composer test

The test suite includes Unit tests for storage engines and Feature tests for the HNSW logic.

License

This project is licensed under the MIT License.

统计信息

  • 总下载量: 10
  • 月度下载量: 0
  • 日度下载量: 0
  • 收藏数: 21
  • 点击次数: 0
  • 依赖项目数: 0
  • 推荐数: 0

GitHub 信息

  • Stars: 21
  • Watchers: 1
  • Forks: 2
  • 开发语言: PHP

其他信息

  • 授权协议: MIT
  • 更新时间: 2026-01-02