5-Minute Quickstart
This guide walks you through creating a table with automatic embeddings, inserting data, running semantic search, and querying a historical snapshot — all in under 5 minutes.
1. Start the Server
Start GalaxDB with the embedding sidecar, which computes the embeddings used by embedding columns and semantic search:
galaxdb-server \
--data-dir ./quickstart-data \
--port 5433 \
--observe-port 9090 \
--sidecar /usr/local/bin/galaxdb-sidecar \
--model sentence-transformers/all-MiniLM-L6-v2
On first run, the sidecar downloads the model (~90 MB). Wait for the download to finish, then check that the sidecar is healthy:
# Verify the sidecar is healthy
curl http://localhost:9090/health
# {"status":"ok","version":"1.0.0-beta.1","subsystems":{"disk_full":false,"sidecar_healthy":true,"connections_active":0}}
Tip
sidecar_healthy: true confirms that embeddings are active. If it shows false, the model is still downloading or the sidecar path is wrong.
2. Create a Table
Connect with psql and create a table with an embedding column:
psql "host=localhost port=5433 dbname=galaxdb sslmode=disable"
-- Create table with embedding column
-- The EMBEDDING MODEL clause tells GalaxDB to automatically
-- compute 384-dimensional embeddings for the 'body' column
CREATE TABLE docs (
id INT PRIMARY KEY,
body TEXT EMBEDDING MODEL 'sentence-transformers/all-MiniLM-L6-v2' DIM 384
);
3. Insert Data
Insert rows — embeddings are computed automatically by the sidecar. No extra code needed.
INSERT INTO docs (id, body) VALUES (1, 'machine learning and neural networks');
INSERT INTO docs (id, body) VALUES (2, 'cooking recipes italian pasta');
INSERT INTO docs (id, body) VALUES (3, 'rust programming language systems');
INSERT INTO docs (id, body) VALUES (4, 'deep learning transformers attention');
INSERT INTO docs (id, body) VALUES (5, 'database storage engine LSM tree');
For bulk inserts, use BULK INSERT, which is faster when loading many rows:
BULK INSERT INTO docs (id, body) VALUES
(6, 'natural language processing BERT'),
(7, 'computer vision image classification'),
(8, 'reinforcement learning reward policy');
4. Semantic Search
Use SEMANTIC_MATCH to find rows semantically similar to a query string. The third argument is a cosine-similarity threshold from 0.0 to 1.0. Higher values mean stricter matching, so fewer rows are returned.
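The following minimal Python sketch of a cosine-similarity filter shows how the threshold behaves. It uses tiny made-up 3-dimensional vectors instead of real 384-dimensional MiniLM embeddings. It also assumes a row matches when its similarity is greater than or equal to the threshold; whether GalaxDB treats the boundary as inclusive is an assumption. `cosine_similarity` and `semantic_match` are hypothetical helpers for illustration, not GalaxDB APIs.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def semantic_match(row_vec: list, query_vec: list, threshold: float) -> bool:
    """Hypothetical stand-in for SEMANTIC_MATCH: keep rows at or above the threshold."""
    return cosine_similarity(row_vec, query_vec) >= threshold


# Toy 3-d "embeddings" (made up; real MiniLM vectors have 384 dimensions).
query = [1.0, 0.8, 0.0]
rows = {
    1: [0.9, 0.7, 0.1],  # very close to the query (similarity ~0.996)
    2: [0.0, 0.1, 1.0],  # nearly orthogonal (similarity ~0.06)
    3: [0.5, 0.0, 0.5],  # partially related (similarity ~0.55)
}

for threshold in (0.4, 0.9):
    kept = [rid for rid, vec in rows.items() if semantic_match(vec, query, threshold)]
    print(threshold, kept)  # prints "0.4 [1, 3]", then "0.9 [1]"
```

Raising the threshold from 0.4 to 0.9 drops row 3 but keeps row 1. That is what "stricter matching" means in practice.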
-- Find AI/ML related documents
SELECT id, body
FROM docs
WHERE SEMANTIC_MATCH(body, 'AI deep learning', 0.4);
-- Expected results: AI/ML rows such as 1, 4, 6, and 7
-- (exact matches depend on the model's similarity scores)
-- Rows 2 (cooking) and 3 (Rust) won't match
-- Combine with SQL filters (hybrid search)
SELECT id, body
FROM docs
WHERE SEMANTIC_MATCH(body, 'machine learning', 0.5)
AND id > 2;
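The hybrid query keeps only rows that satisfy both conditions. The Python sketch below models it as a plain conjunction and applies the cheap scalar predicate (like `id > 2`) before computing similarity. That pre-filtering order is one common strategy and is an assumption here; this quickstart does not say how GalaxDB's planner orders the two checks. The vectors are made-up 2-dimensional toys, and `hybrid_search` is a hypothetical helper.

```python
import math
from typing import Callable


def cosine_similarity(a: list, b: list) -> float:
    """dot(a, b) / (|a| * |b|), or 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_search(rows: list, query_vec: list, threshold: float,
                  predicate: Callable[[dict], bool]) -> list:
    """Return ids of rows passing BOTH the scalar predicate and the similarity threshold.

    The scalar predicate runs first, so similarity is computed only for rows that
    survive it (pre-filtering). This ordering is illustrative, not GalaxDB's planner.
    """
    matched = []
    for row in rows:
        if not predicate(row):
            continue  # SQL-style filter, e.g. id > 2
        if cosine_similarity(row["vec"], query_vec) >= threshold:
            matched.append(row["id"])
    return matched


# Toy rows with made-up 2-d vectors (not real embeddings).
rows = [
    {"id": 1, "vec": [1.0, 0.0]},
    {"id": 2, "vec": [0.0, 1.0]},
    {"id": 3, "vec": [0.9, 0.1]},
]
query = [1.0, 0.0]

print(hybrid_search(rows, query, 0.5, lambda r: True))        # [1, 3]
print(hybrid_search(rows, query, 0.5, lambda r: r["id"] > 2))  # [3]
```

Adding the `id > 2` filter removes row 1 even though it is the closest semantic match. A hybrid query returns the intersection of the two conditions, not a re-ranking.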
5. Time-Travel Query
Create a named snapshot, insert more data, then query the snapshot to see only the data that existed at snapshot time.
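Picture the snapshot semantics as a toy append-only table. Every insert gets a commit sequence number, and a tag simply remembers the latest sequence number at the moment it was created. The Python sketch below is a conceptual model built on that assumption, not GalaxDB's storage implementation. `VersionedTable`, `create_tag`, and `select` are hypothetical names, and the model deliberately leaves out updates and deletes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VersionedTable:
    """Toy append-only table: each insert is stamped with a commit sequence number."""
    rows: list = field(default_factory=list)  # (commit_seq, row_id, body) tuples
    tags: dict = field(default_factory=dict)  # tag name -> commit_seq when tagged
    seq: int = 0                              # last committed sequence number

    def insert(self, row_id: int, body: str) -> None:
        self.seq += 1
        self.rows.append((self.seq, row_id, body))

    def create_tag(self, name: str) -> None:
        # In this toy model a tag only records the newest commit number.
        self.tags[name] = self.seq

    def select(self, version: Optional[str] = None) -> list:
        # Current reads see every commit; tagged reads see commits up to the tag.
        limit = self.seq if version is None else self.tags[version]
        return [row_id for seq, row_id, _ in self.rows if seq <= limit]


table = VersionedTable()
for i in range(1, 9):
    table.insert(i, f"doc {i}")
table.create_tag("v1")
table.insert(9, "new document added after snapshot")

print(table.select("v1"))   # [1, 2, 3, 4, 5, 6, 7, 8]
print(len(table.select()))  # 9
```

Reading at "v1" returns rows 1 through 8, while the current read returns all 9. This mirrors the `AT VERSION 'v1'` and `COUNT(*)` queries in this section.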
-- Create a named snapshot (also exports a Lance training dataset)
CREATE VERSION TAG 'v1' FOR TRAINING WITH TRAINING PRECISION 'float32';
-- Insert more data after the snapshot
INSERT INTO docs (id, body) VALUES (9, 'new document added after snapshot');
-- Query the snapshot — only sees rows 1-8, not row 9
SELECT * FROM docs AT VERSION 'v1';
-- Current table has all 9 rows
SELECT COUNT(*) FROM docs; -- returns 9
Python Example
The same workflow using the Python client in embedded mode (no server required):
import galaxdb
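# Open (or create) a database. Use a directory separate from the server
# walkthrough so CREATE TABLE docs and the 'v1' tag don't collide with it.
db = galaxdb.Database("./quickstart-embedded")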
# Create table with embedding column
db.execute("""
CREATE TABLE docs (
id INT PRIMARY KEY,
body TEXT EMBEDDING MODEL 'sentence-transformers/all-MiniLM-L6-v2' DIM 384
)
""")
# Insert rows — embeddings computed automatically
db.execute("INSERT INTO docs (id, body) VALUES (1, 'machine learning and neural networks')")
db.execute("INSERT INTO docs (id, body) VALUES (2, 'cooking recipes italian pasta')")
db.execute("INSERT INTO docs (id, body) VALUES (3, 'deep learning transformers attention')")
# Semantic search — returns list of dicts
results = db.execute("SELECT id, body FROM docs WHERE SEMANTIC_MATCH(body, 'AI deep learning', 0.4)")
for row in results:
print(f"[{row['id']}] {row['body']}")
# Create training snapshot
ts = db.create_training_snapshot('v1', seed=42)
print(f"Snapshot timestamp: {ts}")
# Export as Lance dataset for PyTorch
path = db.training_dataset('v1')
print(f"Lance dataset at: {path}")
# Load into PyTorch (requires lance and torch)
# import lance, torch
# dataset = lance.dataset(path).to_pytorch()
# loader = torch.utils.data.DataLoader(dataset, batch_size=32)
Next Steps
- Storage Engine — understand the LSM-tree, PAX blocks, and WAL
- Vector Search — HNSW configuration and recall tuning
- AuroraSQL Reference — complete SQL dialect documentation
- Python Client — full API reference for embedded and server modes
- Server Configuration — CLI flags, environment variables, and TLS
- Benchmarks — real performance numbers from AWS c6id.4xlarge