5-Minute Quickstart

This guide walks you through creating a table with automatic embeddings, inserting data, running semantic search, and querying a historical snapshot - all in under 5 minutes.

1. Start the Server

Start GalaxDB with the embedding sidecar, which computes the embeddings used by semantic search:

bash

galaxdb-server \
  --data-dir ./quickstart-data \
  --port 5433 \
  --observe-port 9090 \
  --sidecar /usr/local/bin/galaxdb-sidecar \
  --model sentence-transformers/all-MiniLM-L6-v2

On first run, the sidecar downloads the model (~90 MB). Wait for the download to finish, then confirm that the sidecar is healthy:

bash

# Verify the sidecar is healthy
curl http://localhost:9090/health
# {"status":"ok","version":"0.7.0","subsystems":{"disk_full":false,"sidecar_healthy":true,"connections_active":0}}

Tip

sidecar_healthy: true confirms embeddings are active. If it shows false, the model is still downloading or the sidecar path is wrong.
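If you script the startup, you may want to poll the health endpoint until the sidecar reports ready instead of checking by hand. The sketch below does this with the Python standard library only. It assumes the JSON shape shown in the curl example above, and the helper names `is_ready` and `wait_for_sidecar` are hypothetical, not part of GalaxDB.

```python
import json
import time
import urllib.request

HEALTH_URL = "http://localhost:9090/health"  # the --observe-port from step 1


def is_ready(payload: str) -> bool:
    """Return True when the health JSON reports status ok, a healthy sidecar,
    and a disk that is not full (assumes the response shape shown above)."""
    status = json.loads(payload)
    subsystems = status.get("subsystems", {})
    return (
        status.get("status") == "ok"
        and subsystems.get("sidecar_healthy") is True
        and not subsystems.get("disk_full", False)
    )


def wait_for_sidecar(url: str = HEALTH_URL, timeout_s: float = 300.0,
                     poll_s: float = 2.0) -> bool:
    """Hypothetical helper: poll until ready, or give up after timeout_s seconds."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if is_ready(resp.read().decode("utf-8")):
                    return True
        except (OSError, ValueError):
            # Server not listening yet, a timeout, or a non-JSON body: retry.
            pass
        time.sleep(poll_s)
    return False
```

For example, `wait_for_sidecar()` returns `False` if the model download takes longer than the five-minute default timeout, which is a convenient signal to check the `--sidecar` path.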

2. Create a Table

Connect with psql and create a table with an embedding column:

bash

psql "host=localhost port=5433 dbname=galaxdb sslmode=disable"

SQL

-- Create table with embedding column
-- The EMBEDDING MODEL clause tells GalaxDB to automatically
-- compute 384-dimensional embeddings for the 'body' column
CREATE TABLE docs (
    id   INT PRIMARY KEY,
    body TEXT EMBEDDING MODEL 'sentence-transformers/all-MiniLM-L6-v2' DIM 384
);

3. Insert Data

Insert rows as usual. The sidecar computes the embeddings automatically, so no extra code is needed.

SQL

INSERT INTO docs (id, body) VALUES (1, 'machine learning and neural networks');
INSERT INTO docs (id, body) VALUES (2, 'cooking recipes italian pasta');
INSERT INTO docs (id, body) VALUES (3, 'rust programming language systems');
INSERT INTO docs (id, body) VALUES (4, 'deep learning transformers attention');
INSERT INTO docs (id, body) VALUES (5, 'database storage engine LSM tree');
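Conceptually, an EMBEDDING MODEL column pairs each text value with a fixed-length vector that is computed at write time. The in-memory sketch below mimics that flow. Its `toy_embed` function is a hypothetical stand-in that hashes words into 384 buckets; it is not the MiniLM model and its vectors carry no real meaning. It only illustrates what gets stored: one 384-dimensional vector per row, normalized here so that a dot product equals cosine similarity.

```python
from __future__ import annotations

import hashlib
import math

DIM = 384  # matches the DIM clause in CREATE TABLE docs


def toy_embed(text: str, dim: int = DIM) -> list[float]:
    """Hypothetical stand-in for the sidecar's model: a hashed bag of words,
    L2-normalized. It shares words, not meaning, so it only illustrates storage."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class ToyEmbeddedTable:
    """In-memory model of a table whose body column is embedded on every insert."""

    def __init__(self) -> None:
        self.rows: dict[int, tuple[str, list[float]]] = {}

    def insert(self, row_id: int, body: str) -> None:
        if row_id in self.rows:
            raise KeyError(f"duplicate primary key {row_id}")
        # The vector is computed as part of the write; the caller supplies only text.
        self.rows[row_id] = (body, toy_embed(body))


table = ToyEmbeddedTable()
table.insert(1, "machine learning and neural networks")
table.insert(2, "cooking recipes italian pasta")
body, vector = table.rows[1]
print(body, len(vector))  # machine learning and neural networks 384
```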

For bulk inserts, use BULK INSERT, which is faster when loading many rows:

SQL

BULK INSERT INTO docs (id, body) VALUES
  (6, 'natural language processing BERT'),
  (7, 'computer vision image classification'),
  (8, 'reinforcement learning reward policy');
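When rows come from a file or another program, you can generate the BULK INSERT statement instead of typing it. The hypothetical helper below produces the same multi-row VALUES form shown above and escapes single quotes by doubling them, as standard SQL requires. If your client supports parameterized queries, prefer those over building SQL strings.

```python
from __future__ import annotations


def sql_literal(value) -> str:
    """Render an int or str as a SQL literal (quotes escaped by doubling)."""
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def bulk_insert_sql(table: str, columns: list[str], rows: list[tuple]) -> str:
    """Hypothetical helper: build a BULK INSERT in multi-row VALUES form.

    Table and column names are inserted verbatim, so they must come from
    trusted code, never from user input.
    """
    if not rows:
        raise ValueError("need at least one row")
    if any(len(row) != len(columns) for row in rows):
        raise ValueError("every row must have one value per column")
    values = ",\n".join(
        "  (" + ", ".join(sql_literal(v) for v in row) + ")" for row in rows
    )
    return f"BULK INSERT INTO {table} ({', '.join(columns)}) VALUES\n{values};"


print(bulk_insert_sql("docs", ["id", "body"], [
    (6, "natural language processing BERT"),
    (7, "computer vision image classification"),
    (8, "reinforcement learning reward policy"),
]))
```

The printed statement is identical to the BULK INSERT example above, so you can paste it into psql.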

4. Semantic Search

Use SEMANTIC_MATCH to find rows that are semantically similar to a query string. The third argument is a cosine-similarity threshold from 0.0 to 1.0; higher values mean stricter matching.

SQL

-- Find AI/ML related documents
SELECT id, body
FROM docs
WHERE SEMANTIC_MATCH(body, 'AI deep learning', 0.4);

-- Expected results: rows 1, 4, 6, 7 (AI/ML topics); rows close to the threshold may differ
-- Row 2 (cooking) and row 3 (Rust) won't match

-- Combine with SQL filters (hybrid search)
SELECT id, body
FROM docs
WHERE SEMANTIC_MATCH(body, 'machine learning', 0.5)
  AND id > 2;

Note

Threshold guide: 0.8+ = near-duplicates only, 0.5–0.8 = clearly related, 0.3–0.5 = loosely related, 0.0 = all rows ranked by similarity.
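To make the threshold rule concrete, the sketch below applies it in plain Python: compute the cosine similarity between the query vector and each row's vector, keep rows at or above the threshold, and optionally apply an ordinary predicate such as `id > 2`, as in the hybrid query above. The 3-dimensional vectors are made up for illustration (real rows hold 384-dimensional model output). Whether GalaxDB uses `>=` or `>` at the boundary is an assumption, and sorting by score follows the Note's "0.0 = all rows ranked by similarity" wording.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_filter(rows, query_vec, threshold, predicate=lambda row_id: True):
    """Return (id, score) pairs with score >= threshold that also pass the
    predicate, highest score first."""
    hits = []
    for row_id, vec in rows.items():
        score = cosine(vec, query_vec)
        if score >= threshold and predicate(row_id):
            hits.append((row_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up vectors: axis 0 ~ "ML", axis 1 ~ "cooking", axis 2 ~ "systems".
rows = {
    1: [0.9, 0.1, 0.1],
    2: [0.1, 0.9, 0.0],
    3: [0.2, 0.0, 0.9],
    4: [0.8, 0.0, 0.3],
}
query = [1.0, 0.0, 0.0]

print([i for i, _ in semantic_filter(rows, query, 0.5)])  # [1, 4]
print([i for i, _ in semantic_filter(rows, query, 0.5, lambda i: i > 2)])  # [4]
print([i for i, _ in semantic_filter(rows, query, 0.0)])  # [1, 4, 3, 2]
```

Lowering the threshold to 0.0 admits every row here because all the made-up vectors point in non-negative directions; only the ordering then distinguishes related rows from unrelated ones.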

5. Time-Travel Query

Create a named snapshot, insert more data, then query the snapshot to see only the data that existed at snapshot time.

SQL

-- Create a named snapshot (also exports a Lance training dataset)
CREATE VERSION TAG 'v1' FOR TRAINING WITH TRAINING PRECISION 'float32';

-- Insert more data after the snapshot
INSERT INTO docs (id, body) VALUES (9, 'new document added after snapshot');

-- Query the snapshot - only sees rows 1-8, not row 9
SELECT * FROM docs AT VERSION 'v1';

-- Current table has all 9 rows
SELECT COUNT(*) FROM docs;  -- returns 9
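One way to picture a version tag is as a saved commit timestamp: each row records the logical time at which it was written, and reading `AT VERSION 'v1'` keeps only rows written before the tag was created. The in-memory sketch below models that idea under this assumption. It ignores updates and deletes, and the class and method names are hypothetical, not GalaxDB internals.

```python
from __future__ import annotations

import itertools


class ToyVersionedTable:
    """Append-only table where every row carries the logical time of its write."""

    def __init__(self) -> None:
        self._clock = itertools.count(1)  # monotonically increasing logical time
        self._rows: list[tuple[int, int, str]] = []  # (commit_ts, id, body)
        self._tags: dict[str, int] = {}

    def insert(self, row_id: int, body: str) -> None:
        self._rows.append((next(self._clock), row_id, body))

    def create_tag(self, name: str) -> int:
        """Record the current logical time under a name (like CREATE VERSION TAG)."""
        if name in self._tags:
            raise ValueError(f"tag {name!r} already exists")
        ts = next(self._clock)
        self._tags[name] = ts
        return ts

    def select(self, at_version: str | None = None) -> list[tuple[int, str]]:
        """Return (id, body) rows; with at_version, only rows written before the tag."""
        cutoff = self._tags[at_version] if at_version is not None else None
        return [
            (row_id, body)
            for ts, row_id, body in self._rows
            if cutoff is None or ts < cutoff
        ]


t = ToyVersionedTable()
for i in range(1, 9):
    t.insert(i, f"doc {i}")
t.create_tag("v1")
t.insert(9, "new document added after snapshot")
print(len(t.select(at_version="v1")))  # 8
print(len(t.select()))  # 9
```

Because rows are never overwritten in this model, an old tag stays readable for as long as the rows it covers are retained.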

Python Example

The same workflow using the Python client in embedded mode (no server required):

Python

import galaxdb

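# Open (or create) a database. Use a separate directory: ./quickstart-data
# already holds the docs table created through the server, so CREATE TABLE would fail there.
db = galaxdb.Database("./quickstart-embedded")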

# Create table with embedding column
db.execute("""
    CREATE TABLE docs (
        id   INT PRIMARY KEY,
        body TEXT EMBEDDING MODEL 'sentence-transformers/all-MiniLM-L6-v2' DIM 384
    )
""")

# Insert rows - embeddings computed automatically
db.execute("INSERT INTO docs (id, body) VALUES (1, 'machine learning and neural networks')")
db.execute("INSERT INTO docs (id, body) VALUES (2, 'cooking recipes italian pasta')")
db.execute("INSERT INTO docs (id, body) VALUES (3, 'deep learning transformers attention')")

# Semantic search - returns list of dicts
results = db.execute("SELECT id, body FROM docs WHERE SEMANTIC_MATCH(body, 'AI deep learning', 0.4)")
for row in results:
    print(f"[{row['id']}] {row['body']}")

# Create training snapshot
ts = db.create_training_snapshot('v1', seed=42)
print(f"Snapshot timestamp: {ts}")

# Export as Lance dataset for PyTorch
path = db.training_dataset('v1')
print(f"Lance dataset at: {path}")

# Load into PyTorch (requires lance and torch)
# import lance, torch
# dataset = lance.dataset(path).to_pytorch()
# loader = torch.utils.data.DataLoader(dataset, batch_size=32)

Next Steps

Storage Engine - understand the LSM-tree, PAX blocks, and WAL
Vector Search - HNSW configuration and recall tuning
AuroraSQL Reference - complete SQL dialect documentation
Python Client - full API reference for embedded and server modes
Server Configuration - CLI flags, environment variables, and TLS
Benchmarks - real performance numbers from AWS c6id.4xlarge
RAG & Vector Indexing - semantic cache, DiskANN, and reranking
Serializable Isolation - write-skew prevention
