G

Time-Travel Queries

GalaxDB supports time-travel queries — the ability to query data as it existed at a specific point in time. This is implemented via named version snapshots that you create with CREATE VERSION TAG and query with AT VERSION.

Overview

Every CREATE VERSION TAG creates an immutable snapshot of the database at that moment. Snapshots are lightweight — they reference existing data blocks rather than copying them. You can create as many snapshots as you need without significant storage overhead.

Snapshots can also be queried by raw timestamp (a uint64 Unix timestamp in microseconds), which lets you query any point in time even without a named tag.

CREATE VERSION TAG

SQL
-- Simple snapshot
CREATE VERSION TAG 'before-migration';

-- Snapshot for training export (also exports Lance dataset)
CREATE VERSION TAG 'train-v1'
  FOR TRAINING
  WITH TRAINING PRECISION 'float32'
  TRAINING SEED 42;

-- Snapshot with sq8 quantization (4× smaller than float32)
CREATE VERSION TAG 'train-v1-compressed'
  FOR TRAINING
  WITH TRAINING PRECISION 'sq8';

Training precision options:

  • float32 — full precision (default)
  • sq8 — 8-bit scalar quantization (4× smaller)
  • rabitq — RaBitQ binary quantization (32× smaller)

The TRAINING SEED sets the random seed for reproducible dataset shuffling.

AT VERSION

SQL
-- Query by tag name
SELECT * FROM docs AT VERSION 'before-migration';

-- Query by timestamp (uint64 microseconds since epoch)
SELECT * FROM docs AT VERSION 1715385600000000;

-- Combine with WHERE and other clauses
SELECT id, body
FROM docs AT VERSION 'train-v1'
WHERE SEMANTIC_MATCH(body, 'machine learning', 0.4);

-- Count rows at a specific version
SELECT COUNT(*) FROM docs AT VERSION 'train-v1';

Note

AT VERSION queries are read-only. You cannot INSERT, UPDATE, or DELETE against a historical snapshot.

Use Cases

Reproducible ML Training

Create a version tag before each training run. If you need to reproduce a training run months later, query the exact same data:

SQL
-- Before training run
CREATE VERSION TAG 'experiment-42'
  FOR TRAINING
  WITH TRAINING PRECISION 'float32'
  TRAINING SEED 42;

-- Six months later, reproduce the exact dataset
SELECT * FROM training_data AT VERSION 'experiment-42';

Audit Trails

Create snapshots before and after data migrations to verify correctness:

SQL
CREATE VERSION TAG 'pre-migration';

-- Run migration
UPDATE users SET email = LOWER(email);

CREATE VERSION TAG 'post-migration';

-- Verify: count should be the same
SELECT COUNT(*) FROM users AT VERSION 'pre-migration';
SELECT COUNT(*) FROM users AT VERSION 'post-migration';

EU AI Act Compliance

The EU AI Act requires documentation of training data used for high-risk AI systems. Version tags provide an immutable, queryable record of exactly what data was used for each training run.

A/B Testing

Compare model performance on different dataset versions:

Python
import galaxdb

db = galaxdb.Database("./data")

# Export two dataset versions for comparison
path_v1 = db.training_dataset('train-v1')
path_v2 = db.training_dataset('train-v2')

# Train models on each and compare metrics