v1.0.0-beta.1 · 740 tests passing

GalaxDB Documentation

GalaxDB is an AI-native database written in Rust. It combines SQL, HNSW vector search, local embeddings, time-travel queries, training export (Lance format), and near-dedup (MinHash LSH) in a single binary.

What is GalaxDB?

GalaxDB is designed for AI and ML workloads that need more than a traditional database. Instead of stitching together a relational database, a vector store, an embedding API, and a training pipeline, GalaxDB provides all of these capabilities in one binary with a single SQL interface.

It speaks the PostgreSQL wire protocol, so any PostgreSQL client — psql, psycopg2, SQLAlchemy, node-postgres — connects without modification.

SQL
-- Create a table with automatic embeddings
CREATE TABLE docs (
    id   INT PRIMARY KEY,
    body TEXT EMBEDDING MODEL 'sentence-transformers/all-MiniLM-L6-v2' DIM 384
);

-- Insert rows — embeddings computed automatically
INSERT INTO docs (id, body) VALUES (1, 'machine learning and neural networks');
INSERT INTO docs (id, body) VALUES (2, 'cooking recipes italian pasta');

-- Semantic search
SELECT id, body FROM docs WHERE SEMANTIC_MATCH(body, 'AI deep learning', 0.4);

-- Time-travel: tag the current state as 'v1', then query that snapshot
CREATE VERSION TAG 'v1' FOR TRAINING WITH TRAINING PRECISION 'float32';
SELECT * FROM docs AT VERSION 'v1';
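Because the server speaks the PostgreSQL wire protocol, the semantic search above could also be issued from Python through psycopg2. The following is a minimal sketch, not an official client recipe. Port 5433 is taken from the architecture diagram on this page; the host, database name, and user are placeholders, and the helper names `galaxdb_dsn` and `semantic_search` are hypothetical. It also assumes that SEMANTIC_MATCH accepts its query text and threshold as ordinary bound values.

```python
def galaxdb_dsn(host="localhost", port=5433, dbname="galaxdb", user="galaxdb"):
    """Build a libpq-style connection string.

    Only the port comes from this page; host, dbname, and user are placeholders.
    """
    return f"host={host} port={port} dbname={dbname} user={user}"


def semantic_search(query_text, threshold=0.4):
    """Run the SEMANTIC_MATCH query from the example through psycopg2."""
    import psycopg2  # third-party; imported lazily so the DSN helper works without it

    conn = psycopg2.connect(galaxdb_dsn())
    try:
        with conn.cursor() as cur:
            # psycopg2 substitutes the %s placeholders client-side before sending.
            cur.execute(
                "SELECT id, body FROM docs WHERE SEMANTIC_MATCH(body, %s, %s)",
                (query_text, threshold),
            )
            return cur.fetchall()
    finally:
        conn.close()
```

With a server running, `semantic_search("AI deep learning")` would return the matching `(id, body)` rows as a list of tuples.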

Key Features

Full SQL

The AuroraSQL dialect covers CREATE, INSERT, UPDATE, DELETE, and SELECT with WHERE, joins, and aggregates.

Local Embeddings

Text → vector conversion runs locally in the embedding sidecar process. No API key is needed, and no data leaves your machine.

HNSW Vector Search

recall@10 = 0.990 on SIFT-1M at ef=200, with 459 µs mean latency. SEMANTIC_MATCH can be used in any WHERE clause.

Time-Travel

SELECT ... AT VERSION 'tag' queries a historical snapshot. This supports reproducible ML training and audit requirements such as those in the EU AI Act.

Training Export

CREATE VERSION TAG ... FOR TRAINING exports a Lance dataset that is zero-copy and PyTorch-ready, in a single SQL command.

Encryption at Rest

AES-256-GCM on every block and WAL record. Pluggable key management: local, env, AWS KMS, Vault.
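For readers unfamiliar with the recall@10 figure quoted for HNSW above, the metric is conventionally the fraction of the true top-k neighbours that the approximate index returns, averaged over queries. The sketch below shows that standard definition; how GalaxDB's own benchmark harness computes it is not described here, so treat the function names and input shapes as assumptions.

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the true top-k neighbours that the approximate search returned."""
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("exact_ids must contain at least one neighbour")
    found = truth.intersection(approx_ids[:k])
    return len(found) / len(truth)


def mean_recall_at_k(approx_results, exact_results, k=10):
    """Average recall@k over a batch of queries (one id list per query)."""
    scores = [recall_at_k(a, e, k) for a, e in zip(approx_results, exact_results)]
    return sum(scores) / len(scores)
```

A recall@10 of 0.990 therefore means that, on average, 9.9 of the 10 exact nearest neighbours appear in the approximate result.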

Architecture Overview

GalaxDB is a single Rust binary with an optional sidecar process for embedding computation. The core engine handles SQL parsing (AuroraSQL), storage (LSM-tree with PAX blocks), vector indexing (HNSW), and the PostgreSQL wire protocol.

The embedding sidecar is a separate process that loads HuggingFace sentence-transformer models and serves embedding requests over a local socket. Because of this isolation, the main database process never loads Python or ML frameworks, so it stays lean, and a crash in the model runtime does not take the database down with it.

┌─────────────────────────────────────────────────────┐
│                  galaxdb-server                      │
│                                                      │
│  PostgreSQL Wire Protocol (port 5433)                │
│  ↓                                                   │
│  AuroraSQL Parser → Executor                         │
│  ↓                                                   │
│  Storage Engine (LSM + PAX + WAL + ART)              │
│  ↓                                                   │
│  HNSW Vector Index                                   │
│  ↓                                                   │
│  HTTP Observability (port 9090)                      │
└─────────────────────────────────────────────────────┘
         ↕ local socket
┌─────────────────────────────────────────────────────┐
│              galaxdb-sidecar (optional)              │
│  HuggingFace sentence-transformer model              │
│  text → float32[384] embeddings                      │
└─────────────────────────────────────────────────────┘
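The server-to-sidecar split in the diagram can be illustrated with a small request/response exchange over a local socket. The real wire format between galaxdb-server and galaxdb-sidecar is not documented on this page, so everything below is an assumption made for illustration: the length-prefixed JSON framing, the message fields, and the `fake_embed` stand-in (which returns a deterministic vector, not a real embedding). A thread plays the role of the separate sidecar process so the sketch stays self-contained.

```python
import json
import socket
import struct
import threading

DIM = 384  # embedding width shown in the diagram


def send_msg(sock, payload):
    """Send one JSON message with a 4-byte big-endian length prefix (assumed framing)."""
    data = json.dumps(payload).encode("utf-8")
    sock.sendall(struct.pack(">I", len(data)) + data)


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closed the socket."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the socket")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock):
    """Receive one length-prefixed JSON message."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))


def fake_embed(text):
    """Stand-in for the model: a deterministic DIM-length vector, not a real embedding."""
    return [((len(text) + i) % 7) / 7.0 for i in range(DIM)]


def sidecar_loop(sock):
    """Sidecar side: answer embedding requests until the server hangs up."""
    with sock:
        while True:
            try:
                request = recv_msg(sock)
            except ConnectionError:
                return
            send_msg(sock, {"embedding": fake_embed(request["text"])})


def embed_via_sidecar(sock, text):
    """Server side: one request/response round trip."""
    send_msg(sock, {"text": text})
    return recv_msg(sock)["embedding"]


# A connected socket pair stands in for the local socket between the two processes.
server_end, sidecar_end = socket.socketpair()
worker = threading.Thread(target=sidecar_loop, args=(sidecar_end,), daemon=True)
worker.start()
vector = embed_via_sidecar(server_end, "machine learning and neural networks")
server_end.close()
worker.join(timeout=1)
```

The point of the design is visible even in this toy: the server side only moves bytes over a socket and never imports a model runtime, so a failure on the sidecar side surfaces as a socket error rather than a crash of the database process.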

The storage engine is built on 12 components: PAX blocks, WAL, memtable (crossbeam-skiplist), ART primary key index, Bloom filters (Monkey allocation), NUMA-aware buffer pool, lazy leveling compaction (Dostoevsky), KV separation, AES-256-GCM encryption, write stall mitigation, disk-full handling, and statistics collection. Every design decision has a research citation.
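Of the components listed above, the Bloom filter is the easiest to show in a few lines. In an LSM-tree, a per-run filter lets a point lookup skip runs that cannot contain the key. The sketch below is a generic textbook Bloom filter, not GalaxDB's implementation: the class name, the bit and hash counts, and the SHA-256 double-hashing scheme are all assumptions, and the Monkey allocation strategy (which sizes filters differently per level) is not modelled.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive the bit positions by double hashing two slices of one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd, non-zero step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means the key is definitely absent; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


# Usage: consult the filter before paying for a disk read on a sorted run.
run_filter = BloomFilter()
for key in ("doc:1", "doc:2", "doc:3"):
    run_filter.add(key)
```

A lookup would call `run_filter.might_contain(key)` first and read the run from disk only when it returns True.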