Version Tags
Version tags are named snapshots of the database at a specific point in time. They enable time-travel queries (AT VERSION) and training data export (FOR TRAINING).
CREATE VERSION TAG
SQL
CREATE VERSION TAG 'tag_name'
[FOR TRAINING
[WITH TRAINING PRECISION 'float32' | 'sq8' | 'rabitq']
[TRAINING SEED n]];
Creates an immutable snapshot of the current database state. The tag name must be unique. Tags are lightweight: they reference existing data blocks rather than copying them.
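To make the "reference rather than copy" idea concrete, here is a toy in-memory model. It is only an illustration under simplifying assumptions: `ToyStore` and all of its methods are hypothetical and say nothing about how the database actually lays out its blocks.

```python
class ToyStore:
    """Toy model of tags that reference immutable blocks (illustrative only)."""

    def __init__(self):
        self.blocks = {}  # block id -> immutable tuple of rows
        self.live = []    # block ids that make up the current state
        self.tags = {}    # tag name -> tuple of block ids frozen at tag time

    def insert(self, rows):
        # Every insert writes a new block; existing blocks are never modified.
        block_id = len(self.blocks)
        self.blocks[block_id] = tuple(rows)
        self.live.append(block_id)

    def create_tag(self, name):
        # Tag names must be unique, and a tag stores block ids, not row copies.
        if name in self.tags:
            raise ValueError(f"tag {name!r} already exists")
        self.tags[name] = tuple(self.live)

    def count(self, tag=None):
        # Count rows in the current state, or in the state a tag refers to.
        ids = self.live if tag is None else self.tags[tag]
        return sum(len(self.blocks[i]) for i in ids)


store = ToyStore()
store.insert([(1, "a"), (2, "b"), (3, "c")])
store.create_tag("before-migration")
store.insert([(4, "d")])

print(store.count("before-migration"), store.count())  # 3 4
```

Creating the tag copies only a short tuple of block ids, so its cost does not depend on how many rows the blocks hold, and later inserts cannot change what the tag sees.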
SQL
-- Simple snapshot
CREATE VERSION TAG 'before-migration';
-- Training snapshot with default precision (float32)
CREATE VERSION TAG 'train-v1' FOR TRAINING;
-- Training snapshot with quantization
CREATE VERSION TAG 'train-v1-sq8'
FOR TRAINING
WITH TRAINING PRECISION 'sq8';
-- Training snapshot with seed for reproducibility
CREATE VERSION TAG 'experiment-42'
FOR TRAINING
WITH TRAINING PRECISION 'float32'
TRAINING SEED 42;
FOR TRAINING Options
The FOR TRAINING clause exports the snapshot as a Lance dataset, accessible via db.training_dataset('tag') in Python.
| Option | Values | Description |
|---|---|---|
| TRAINING PRECISION | float32 (default), sq8, rabitq | Embedding vector precision in Lance dataset |
| TRAINING SEED | uint64 | Random seed for reproducible dataset shuffling |
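When statements are generated from application code, it can help to check the options against the table above before sending them. The following is a minimal sketch: `build_create_tag` is a hypothetical helper, not part of any shipped client, and it rejects single quotes in the tag name because quoting and escaping rules are not described here.

```python
VALID_PRECISIONS = ("float32", "sq8", "rabitq")
UINT64_MAX = 2**64 - 1


def build_create_tag(name, training=False, precision=None, seed=None):
    """Build a CREATE VERSION TAG statement (hypothetical helper)."""
    if not name or "'" in name:
        raise ValueError("tag name must be non-empty and contain no single quotes")
    if not training and (precision is not None or seed is not None):
        raise ValueError("precision and seed are only valid with FOR TRAINING")

    parts = [f"CREATE VERSION TAG '{name}'"]
    if training:
        parts.append("FOR TRAINING")
        if precision is not None:
            if precision not in VALID_PRECISIONS:
                raise ValueError(f"precision must be one of {VALID_PRECISIONS}")
            parts.append(f"WITH TRAINING PRECISION '{precision}'")
        if seed is not None:
            # TRAINING SEED is documented as a uint64.
            if not isinstance(seed, int) or not 0 <= seed <= UINT64_MAX:
                raise ValueError("seed must be an integer in the uint64 range")
            parts.append(f"TRAINING SEED {seed}")
    return " ".join(parts) + ";"


print(build_create_tag("experiment-42", training=True, precision="float32", seed=42))
```

Omitting `precision` leaves the clause out entirely, so the documented default (float32) applies.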
AT VERSION
SQL
SELECT ... FROM table AT VERSION 'tag_name';
SELECT ... FROM table AT VERSION timestamp_uint64;
Queries the table as it existed when the tag was created or, in the timestamp form, at the given timestamp. The timestamp form accepts a uint64 Unix timestamp in microseconds.
SQL
-- Query by tag name
SELECT * FROM docs AT VERSION 'train-v1';
-- Query by timestamp
SELECT * FROM docs AT VERSION 1715385600000000;
-- Combine with WHERE
SELECT id, body
FROM docs AT VERSION 'train-v1'
WHERE SEMANTIC_MATCH(body, 'machine learning', 0.4);
Note
AT VERSION queries are read-only. INSERT, UPDATE, and DELETE against a historical snapshot are not supported.
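Because the timestamp form expects microseconds rather than seconds, it is easy to be off by a factor of one million. The sketch below converts a timezone-aware `datetime` using only the standard library. `to_version_timestamp` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_version_timestamp(dt):
    """Convert a timezone-aware datetime to Unix microseconds (hypothetical helper)."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Integer division of timedeltas avoids floating-point rounding.
    return (dt - EPOCH) // timedelta(microseconds=1)


ts = to_version_timestamp(datetime(2024, 5, 11, tzinfo=timezone.utc))
query = f"SELECT * FROM docs AT VERSION {ts};"
print(query)  # SELECT * FROM docs AT VERSION 1715385600000000;
```

The resulting value, 1715385600000000, is the one used in the timestamp example above and corresponds to 2024-05-11 00:00:00 UTC.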
Examples
Training pipeline
SQL
-- Insert training data
BULK INSERT INTO training_data (id, text, label) VALUES
(1, 'positive example', 1),
(2, 'negative example', 0),
(3, 'another positive', 1);
-- Create training snapshot
CREATE VERSION TAG 'train-2024-01'
FOR TRAINING
WITH TRAINING PRECISION 'float32'
TRAINING SEED 12345;
-- Add more data later
INSERT INTO training_data (id, text, label) VALUES (4, 'new data', 1);
-- The snapshot still has only 3 rows
SELECT COUNT(*) FROM training_data AT VERSION 'train-2024-01'; -- 3
SELECT COUNT(*) FROM training_data; -- 4
Python training workflow
Python
import galaxdb
import lance
import torch
db = galaxdb.Database("./data")
# Create snapshot
ts = db.create_training_snapshot('train-v1', seed=42)
print(f"Snapshot at: {ts}")
# Export Lance dataset
path = db.training_dataset('train-v1')
# Load into PyTorch
dataset = lance.dataset(path).to_pytorch()
loader = torch.utils.data.DataLoader(dataset, batch_size=32)
for batch in loader:
    embeddings = batch['text']  # float32 tensors
    labels = batch['label']
    # ... training step
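The seed matters because a fixed seed makes the shuffle order repeatable across runs. How the seed is applied internally during export is not described here, so the sketch below only illustrates the general idea with Python's standard `random` module. `shuffled_ids` is a hypothetical helper.

```python
import random


def shuffled_ids(row_ids, seed):
    """Return row ids in an order determined only by the seed (illustrative)."""
    rng = random.Random(seed)  # private generator, independent of global state
    ids = list(row_ids)
    rng.shuffle(ids)
    return ids


first = shuffled_ids(range(10), seed=42)
second = shuffled_ids(range(10), seed=42)

print(first == second)  # True: same seed, same order
```

Two experiments that share a seed see their rows in the same order, which removes shuffle order as a source of variation when comparing training runs.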