Time-Travel Queries
GalaxDB supports time-travel queries — the ability to query data as it existed at a specific point in time. This is implemented via named version snapshots that you create with CREATE VERSION TAG and query with AT VERSION.
Overview
Every CREATE VERSION TAG creates an immutable snapshot of the database at that moment. Snapshots are lightweight — they reference existing data blocks rather than copying them. You can create as many snapshots as you need without significant storage overhead.
Snapshots can also be queried by raw timestamp (a uint64 Unix timestamp in microseconds), which lets you query any point in time even without a named tag.
CREATE VERSION TAG
-- Simple snapshot
CREATE VERSION TAG 'before-migration';
-- Snapshot for training export (also exports Lance dataset)
CREATE VERSION TAG 'train-v1'
FOR TRAINING
WITH TRAINING PRECISION 'float32'
TRAINING SEED 42;
-- Snapshot with sq8 quantization (4× smaller than float32)
CREATE VERSION TAG 'train-v1-compressed'
FOR TRAINING
WITH TRAINING PRECISION 'sq8';Training precision options:
- float32 — full precision (default)
- sq8 — 8-bit scalar quantization (4× smaller)
- rabitq — RaBitQ binary quantization (32× smaller)
The TRAINING SEED sets the random seed for reproducible dataset shuffling.
AT VERSION
-- Query by tag name
SELECT * FROM docs AT VERSION 'before-migration';
-- Query by timestamp (uint64 microseconds since epoch)
SELECT * FROM docs AT VERSION 1715385600000000;
-- Combine with WHERE and other clauses
SELECT id, body
FROM docs AT VERSION 'train-v1'
WHERE SEMANTIC_MATCH(body, 'machine learning', 0.4);
-- Count rows at a specific version
SELECT COUNT(*) FROM docs AT VERSION 'train-v1';Note
AT VERSION queries are read-only. You cannot INSERT, UPDATE, or DELETE against a historical snapshot.Use Cases
Reproducible ML Training
Create a version tag before each training run. If you need to reproduce a training run months later, query the exact same data:
-- Before training run
CREATE VERSION TAG 'experiment-42'
FOR TRAINING
WITH TRAINING PRECISION 'float32'
TRAINING SEED 42;
-- Six months later, reproduce the exact dataset
SELECT * FROM training_data AT VERSION 'experiment-42';Audit Trails
Create snapshots before and after data migrations to verify correctness:
CREATE VERSION TAG 'pre-migration';
-- Run migration
UPDATE users SET email = LOWER(email);
CREATE VERSION TAG 'post-migration';
-- Verify: count should be the same
SELECT COUNT(*) FROM users AT VERSION 'pre-migration';
SELECT COUNT(*) FROM users AT VERSION 'post-migration';EU AI Act Compliance
The EU AI Act requires documentation of training data used for high-risk AI systems. Version tags provide an immutable, queryable record of exactly what data was used for each training run.
A/B Testing
Compare model performance on different dataset versions:
import galaxdb
db = galaxdb.Database("./data")
# Export two dataset versions for comparison
path_v1 = db.training_dataset('train-v1')
path_v2 = db.training_dataset('train-v2')
# Train models on each and compare metrics