
Version Tags

Version tags are named snapshots of the database at a specific point in time. They enable time-travel queries (AT VERSION) and training data export (FOR TRAINING).

CREATE VERSION TAG

SQL
CREATE VERSION TAG 'tag_name'
  [FOR TRAINING
   [WITH TRAINING PRECISION { 'float32' | 'sq8' | 'rabitq' }]
   [TRAINING SEED n]];

Creates an immutable snapshot of the current database state. The tag name must be unique. Tags are lightweight: they reference existing data blocks rather than copying them, so creating a tag does not duplicate the stored data.

SQL
-- Simple snapshot
CREATE VERSION TAG 'before-migration';

-- Training snapshot with default precision (float32)
CREATE VERSION TAG 'train-v1' FOR TRAINING;

-- Training snapshot with quantization
CREATE VERSION TAG 'train-v1-sq8'
  FOR TRAINING
  WITH TRAINING PRECISION 'sq8';

-- Training snapshot with seed for reproducibility
CREATE VERSION TAG 'experiment-42'
  FOR TRAINING
  WITH TRAINING PRECISION 'float32'
  TRAINING SEED 42;
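The "lightweight" behavior can be pictured with a toy Python model in which a tag records references to existing, immutable data blocks instead of copying rows. ToyTable, Block, and create_tag are hypothetical names used only for illustration. This is a sketch of the copy-on-write idea, assuming blocks are never modified once written. It is not GalaxDB's storage engine.

```python
# Toy model of lightweight version tags (hypothetical; not GalaxDB's engine).
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Block:
    """An immutable data block holding a few rows."""
    rows: tuple


@dataclass
class ToyTable:
    blocks: list = field(default_factory=list)  # live list of block references
    tags: dict = field(default_factory=dict)    # tag name -> tuple of block references

    def insert(self, *rows):
        # New data always goes into a new block; existing blocks are never changed.
        self.blocks.append(Block(tuple(rows)))

    def create_tag(self, name):
        if name in self.tags:
            raise ValueError(f"version tag {name!r} already exists")
        # Record references to the current blocks; no row data is copied.
        self.tags[name] = tuple(self.blocks)

    def rows(self, at_version=None):
        # Read either the live table or the blocks captured by a tag.
        blocks = self.blocks if at_version is None else self.tags[at_version]
        return [row for block in blocks for row in block.rows]


table = ToyTable()
table.insert((1, "positive example", 1), (2, "negative example", 0), (3, "another positive", 1))
table.create_tag("train-2024-01")
table.insert((4, "new data", 1))

print(len(table.rows(at_version="train-2024-01")))  # 3
print(len(table.rows()))                             # 4
# The tag shares the very same block object as the live table.
print(table.tags["train-2024-01"][0] is table.blocks[0])  # True
```

Because the tag holds only references, its cost does not grow with the amount of data, and rows inserted later land in new blocks that the tag never sees.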

FOR TRAINING Options

The FOR TRAINING clause also exports the snapshot as a Lance dataset. In Python, db.training_dataset('tag_name') returns the path to that dataset.

Option               Values                          Description
TRAINING PRECISION   float32 (default), sq8, rabitq  Precision of embedding vectors in the Lance dataset
TRAINING SEED        uint64                          Random seed for reproducible dataset shuffling
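The two options can be illustrated with a short Python sketch. This page does not specify GalaxDB's sq8 encoding, so sq8_quantize assumes a common form of 8-bit scalar quantization: each component is mapped linearly from the vector's [min, max] range onto the integers 0..255. seeded_order shows why a fixed TRAINING SEED gives the same shuffle on every export. Both function names are hypothetical, and rabitq is not modeled here.

```python
# Hypothetical sketch of the FOR TRAINING options; GalaxDB's actual sq8
# encoding and shuffle algorithm are not documented on this page.
import random


def sq8_quantize(vector):
    """Scalar-quantize floats to 8-bit codes using a per-vector min/max range."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant vectors
    codes = [round((x - lo) / scale) for x in vector]
    return codes, lo, scale


def sq8_dequantize(codes, lo, scale):
    """Approximately reconstruct the original floats from 8-bit codes."""
    return [lo + c * scale for c in codes]


def seeded_order(n_rows, seed):
    """Return a row order that is identical whenever the same seed is used."""
    order = list(range(n_rows))
    random.Random(seed).shuffle(order)
    return order


vec = [0.12, -0.5, 0.98, 0.0]
codes, lo, scale = sq8_quantize(vec)
print(codes)  # each code is an int in 0..255
approx = sq8_dequantize(codes, lo, scale)
# Reconstruction error is at most half a quantization step.
print(max(abs(a - b) for a, b in zip(vec, approx)) <= scale / 2)  # True

print(seeded_order(5, seed=42) == seeded_order(5, seed=42))  # True
```

The trade-off is storage versus fidelity: each component shrinks from 4 bytes to 1, at the cost of a bounded rounding error per component.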

AT VERSION

SQL
SELECT ... FROM table AT VERSION 'tag_name';
SELECT ... FROM table AT VERSION timestamp_uint64;

With a tag name, the query sees the table as it existed when that tag was created. The timestamp form accepts a uint64 Unix timestamp in microseconds; for example, 1715385600000000 is 2024-05-11 00:00:00 UTC.

SQL
-- Query by tag name
SELECT * FROM docs AT VERSION 'train-v1';

-- Query by timestamp
SELECT * FROM docs AT VERSION 1715385600000000;  -- 2024-05-11 00:00:00 UTC

-- Combine with WHERE
SELECT id, body
FROM docs AT VERSION 'train-v1'
WHERE SEMANTIC_MATCH(body, 'machine learning', 0.4);
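To build the timestamp form from a calendar date, convert it to integer microseconds since the Unix epoch. to_unix_micros is a hypothetical helper. Dividing a timedelta by timedelta(microseconds=1) keeps the arithmetic in integers, which avoids the rounding that multiplying a float timestamp() result by 1,000,000 can introduce.

```python
# Build the uint64 microsecond timestamp accepted by AT VERSION.
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_micros(dt):
    """Convert a timezone-aware datetime to integer microseconds since the Unix epoch."""
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time ambiguity")
    return (dt - EPOCH) // timedelta(microseconds=1)


ts = to_unix_micros(datetime(2024, 5, 11, tzinfo=timezone.utc))
print(ts)  # 1715385600000000
print(f"SELECT * FROM docs AT VERSION {ts};")
```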

Note

AT VERSION queries are read-only. INSERT, UPDATE, and DELETE against a historical snapshot are not supported.

Examples

Training pipeline

SQL
-- Insert training data
BULK INSERT INTO training_data (id, text, label) VALUES
  (1, 'positive example', 1),
  (2, 'negative example', 0),
  (3, 'another positive', 1);

-- Create training snapshot
CREATE VERSION TAG 'train-2024-01'
  FOR TRAINING
  WITH TRAINING PRECISION 'float32'
  TRAINING SEED 12345;

-- Add more data later
INSERT INTO training_data (id, text, label) VALUES (4, 'new data', 1);

-- The snapshot still has only 3 rows
SELECT COUNT(*) FROM training_data AT VERSION 'train-2024-01';  -- 3
SELECT COUNT(*) FROM training_data;  -- 4

Python training workflow

Python
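import galaxdb
import lance
import torch
from torch.utils.data import DataLoader, TensorDataset

db = galaxdb.Database("./data")

# Create a training snapshot; the seed makes dataset shuffling reproducible
ts = db.create_training_snapshot('train-v1', seed=42)
print(f"Snapshot at: {ts}")

# Export the snapshot as a Lance dataset and get its path
path = db.training_dataset('train-v1')

# Read the dataset into an Arrow table, then convert the columns to tensors.
# Assumes 'text' holds one embedding vector per row (float32 at the default precision).
table = lance.dataset(path).to_table(columns=['text', 'label'])
embeddings = torch.tensor(table.column('text').to_pylist())
labels = torch.tensor(table.column('label').to_pylist())
loader = DataLoader(TensorDataset(embeddings, labels), batch_size=32)

for batch_embeddings, batch_labels in loader:
    # ... training step
    pass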