GalaxDB Documentation
GalaxDB is an AI-native database written in Rust. It combines SQL, HNSW vector search, local embeddings, time-travel queries, training export (Lance format), and near-dedup (MinHash LSH) in a single binary.
What is GalaxDB?
GalaxDB is designed for AI and ML workloads that need more than a traditional database. Instead of stitching together a relational database, a vector store, an embedding API, and a training pipeline, GalaxDB provides all of these capabilities in one binary with a single SQL interface.
It speaks the PostgreSQL wire protocol, so any PostgreSQL client — psql, psycopg2, SQLAlchemy, node-postgres — connects without modification.
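As a rough illustration of what "connects without modification" could look like from Python, here is a minimal sketch using psycopg2. The host, database name, and user are placeholder assumptions, not values from the GalaxDB documentation. Port 5433 is the wire-protocol port shown in the architecture diagram on this page. The sample query assumes the docs table from the SQL example on this page already exists.

```python
# Sketch: connect to GalaxDB with a stock PostgreSQL driver.
# Assumptions (hypothetical, not from the GalaxDB docs): host, dbname, and user.
# Port 5433 is taken from the architecture diagram on this page.


def build_dsn(host="localhost", port=5433, dbname="galaxdb", user="galaxdb"):
    """Return a libpq-style key=value connection string."""
    return f"host={host} port={port} dbname={dbname} user={user}"


def connect(dsn=None):
    """Open a connection. psycopg2 is imported lazily so build_dsn() stays
    usable even when the driver is not installed."""
    import psycopg2  # third-party: pip install psycopg2-binary

    return psycopg2.connect(dsn or build_dsn())


if __name__ == "__main__":
    # Requires a running server and the docs table from the SQL example.
    with connect() as conn, conn.cursor() as cur:
        cur.execute(
            "SELECT id, body FROM docs WHERE SEMANTIC_MATCH(body, %s, %s)",
            ("AI deep learning", 0.4),
        )
        for row in cur.fetchall():
            print(row)
```

psycopg2 substitutes the %s parameters on the client side, so the server receives an ordinary SQL string with literals.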
-- Create a table with automatic embeddings
CREATE TABLE docs (
    id INT PRIMARY KEY,
    body TEXT EMBEDDING MODEL 'sentence-transformers/all-MiniLM-L6-v2' DIM 384
);
-- Insert rows — embeddings computed automatically
INSERT INTO docs (id, body) VALUES (1, 'machine learning and neural networks');
INSERT INTO docs (id, body) VALUES (2, 'cooking recipes italian pasta');
-- Semantic search
SELECT id, body FROM docs WHERE SEMANTIC_MATCH(body, 'AI deep learning', 0.4);
-- Time-travel
CREATE VERSION TAG 'v1' FOR TRAINING WITH TRAINING PRECISION 'float32';
SELECT * FROM docs AT VERSION 'v1';
Key Features
SQL: Complete AuroraSQL dialect, covering CREATE, INSERT, UPDATE, DELETE, and SELECT with WHERE, joins, and aggregates.
Local embeddings: Text → vector conversion runs locally in a sidecar process. No API key, no data leaving your machine.
HNSW vector search: recall@10 = 0.990 on SIFT-1M at ef=200, with 459 µs mean latency. Use SEMANTIC_MATCH in any WHERE clause.
Time-travel queries: SELECT ... AT VERSION 'tag' queries historical snapshots, supporting reproducible ML training and EU AI Act compliance.
Training export: CREATE VERSION TAG ... FOR TRAINING exports a Lance dataset. Zero-copy and PyTorch-ready in one SQL command.
Encryption: AES-256-GCM on every block and WAL record. Pluggable key management: local, env, AWS KMS, Vault.
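For readers unfamiliar with the recall@10 metric quoted for the vector index, the sketch below shows how recall@k is conventionally computed: the approximate index's top-k results are compared against the exact (brute-force) top-k for the same query. This is the standard definition, not GalaxDB's benchmark harness. The function names are illustrative.

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the exact top-k neighbours that the approximate search also returned."""
    exact_top = set(exact_ids[:k])
    approx_top = set(approx_ids[:k])
    return len(approx_top & exact_top) / k


def mean_recall_at_k(approx_results, exact_results, k=10):
    """Average recall@k over a batch of queries (one id list per query)."""
    scores = [recall_at_k(a, e, k) for a, e in zip(approx_results, exact_results)]
    return sum(scores) / len(scores)
```

A mean recall@10 of 0.990 therefore means that, averaged over the query set, 9.9 of the 10 true nearest neighbours appear in the index's answer.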
Architecture Overview
GalaxDB is a single Rust binary with an optional sidecar process for embedding computation. The core engine handles SQL parsing (AuroraSQL), storage (LSM-tree with PAX blocks), vector indexing (HNSW), and the PostgreSQL wire protocol.
The embedding sidecar is a separate process that loads HuggingFace sentence-transformer models and serves embedding requests over a local socket. This isolation means the main database process never loads Python or ML frameworks — it stays lean and crash-safe.
┌─────────────────────────────────────────────────────┐
│ galaxdb-server │
│ │
│ PostgreSQL Wire Protocol (port 5433) │
│ ↓ │
│ AuroraSQL Parser → Executor │
│ ↓ │
│ Storage Engine (LSM + PAX + WAL + ART) │
│ ↓ │
│ HNSW Vector Index │
│ ↓ │
│ HTTP Observability (port 9090) │
└─────────────────────────────────────────────────────┘
↕ local socket
┌─────────────────────────────────────────────────────┐
│ galaxdb-sidecar (optional) │
│ HuggingFace sentence-transformer model │
│ text → float32[384] embeddings │
└─────────────────────────────────────────────────────┘
The storage engine is built on 12 components: PAX blocks, WAL, memtable (crossbeam-skiplist), ART primary key index, Bloom filters (Monkey allocation), NUMA-aware buffer pool, lazy leveling compaction (Dostoevsky), KV separation, AES-256-GCM encryption, write stall mitigation, disk-full handling, and statistics collection. Every design decision has a research citation.
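To give a feel for one of these components, here is a minimal sketch of a plain Bloom filter, the structure an LSM-tree typically consults to skip files that cannot contain a key. It is written in Python for brevity and is not GalaxDB's Rust implementation. It uses a single false-positive rate and does not attempt Monkey's per-level allocation. The class and method names are illustrative.

```python
import hashlib
import math


class BloomFilter:
    """Plain Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        n, p = expected_items, false_positive_rate
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(1, math.ceil(-n * math.log(p) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Double hashing: derive all k bit positions from one 128-bit digest.
        digest = hashlib.blake2b(key, digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # force odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: bytes):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )
```

On a read, a storage engine would only open a file when might_contain returns True, so most lookups for absent keys avoid disk I/O entirely.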