SELECT & SEMANTIC_MATCH

GalaxDB supports standard SQL SELECT with WHERE, ORDER BY, LIMIT, and OFFSET, plus AuroraSQL extensions: SEMANTIC_MATCH for vector search, WHERE NOT DUPLICATE for near-dedup, and AT VERSION for time-travel.

SELECT Syntax

SQL

SELECT [DISTINCT] col1, col2, ... | *
FROM table_name
[WHERE condition]
[ORDER BY col [ASC | DESC]]
[LIMIT n]
[OFFSET n]
[AT VERSION tag_or_timestamp];

Warning

AT VERSION must be the final clause of the statement. Putting WHERE after AT VERSION is a typed parse error.

SEMANTIC_MATCH

SEMANTIC_MATCH(column, query_text, threshold) finds rows where the cosine similarity between the column's embedding and the query embedding is greater than or equal to the threshold.

SQL

-- Syntax
SEMANTIC_MATCH(column_name, 'query text', threshold)
-- threshold: cosine similarity 0.0-1.0
-- Returns rows where similarity >= threshold

SQL

-- Basic semantic search
SELECT id, body
FROM docs
WHERE SEMANTIC_MATCH(body, 'artificial intelligence deep learning', 0.4);

-- Stricter threshold (near-duplicates only)
SELECT id, body
FROM docs
WHERE SEMANTIC_MATCH(body, 'machine learning', 0.8);

-- Broader search (loosely related)
SELECT id, body
FROM docs
WHERE SEMANTIC_MATCH(body, 'technology', 0.2);

Note

SEMANTIC_MATCH requires the server to be started with --sidecar and --model. The query text is embedded using the same model as the column.

Hybrid Search

Combine SEMANTIC_MATCH with standard SQL predicates:

SQL

-- Semantic search + metadata filter
SELECT id, title, body, created_at
FROM articles
WHERE SEMANTIC_MATCH(body, 'machine learning', 0.5)
  AND category = 'technology'
  AND created_at > 1700000000
ORDER BY id
LIMIT 20;

-- Multiple semantic conditions
SELECT id, body
FROM docs
WHERE SEMANTIC_MATCH(body, 'database performance', 0.4)
   OR SEMANTIC_MATCH(body, 'storage engine', 0.4);

WHERE NOT DUPLICATE

Filter near-duplicate rows using MinHash LSH. Returns one representative row per near-duplicate cluster.

SQL

-- Select unique rows only
SELECT * FROM docs WHERE NOT DUPLICATE;

-- Combine with semantic search
SELECT id, body
FROM docs
WHERE SEMANTIC_MATCH(body, 'machine learning', 0.4)
  AND NOT DUPLICATE;

-- Count unique vs total
SELECT COUNT(*) FROM docs;                    -- total
SELECT COUNT(*) FROM docs WHERE NOT DUPLICATE; -- unique

AT VERSION

Query data as it existed at a named snapshot or timestamp:

SQL

-- Query by tag name
SELECT * FROM docs AT VERSION 'train-v1';

-- Query by timestamp (uint64 microseconds since epoch)
SELECT * FROM docs AT VERSION 1715385600000000;

-- Combine with WHERE - AT VERSION is always the final clause
SELECT id, body
FROM docs
WHERE SEMANTIC_MATCH(body, 'machine learning', 0.4)
AT VERSION 'train-v1';

-- Count rows at a specific version
SELECT COUNT(*) FROM docs AT VERSION 'train-v1';

CONSISTENCY 'SEMANTIC_SNAPSHOT'

Restrict a semantic search to rows visible at the snapshot version. Rows inserted after the snapshot are excluded; the live index is not mutated.

SQL

-- Exact historical vector search
SELECT id, body
FROM docs
WHERE SEMANTIC_MATCH(body, 'machine learning', 0.4)
AT VERSION 'train-v1' CONSISTENCY 'SEMANTIC_SNAPSHOT';

Works with any AT VERSION query that includes SEMANTIC_MATCH. Use this for reproducible RAG audits - verifying what results a query would have returned at a specific point in time.

Examples

Standard SQL queries

SQL

-- Filter and sort
SELECT id, name, age
FROM users
WHERE age > 25
ORDER BY name ASC
LIMIT 10;

-- Aggregates
SELECT COUNT(*), AVG(age), MIN(age), MAX(age)
FROM users;

-- DISTINCT
SELECT DISTINCT category FROM products;

Python client SELECT

Python

import galaxdb

db = galaxdb.Database("./data")

# SELECT returns list of dicts
rows = db.execute("SELECT id, body FROM docs WHERE SEMANTIC_MATCH(body, 'AI', 0.4)")
for row in rows:
    print(f"[{row['id']}] {row['body']}")

# Empty result
rows = db.execute("SELECT * FROM docs WHERE id = 999")
print(rows)  # []

# Aggregate
result = db.execute("SELECT COUNT(*) FROM docs")
print(result[0]['count'])  # int

INSERT

UPDATE & DELETE