SELECT & SEMANTIC_MATCH
GalaxDB supports standard SQL SELECT with WHERE, ORDER BY, LIMIT, and OFFSET, plus AuroraSQL extensions: SEMANTIC_MATCH for vector search, WHERE NOT DUPLICATE for near-dedup, and AT VERSION for time-travel.
SELECT Syntax
SQL
SELECT [DISTINCT] col1, col2, ... | *
FROM table_name [AT VERSION tag_or_timestamp]
[WHERE condition]
[ORDER BY col [ASC | DESC]]
[LIMIT n]
[OFFSET n];SEMANTIC_MATCH
SEMANTIC_MATCH(column, query_text, threshold) finds rows where the cosine similarity between the column's embedding and the query embedding is greater than or equal to the threshold.
SQL
-- Syntax
SEMANTIC_MATCH(column_name, 'query text', threshold)
-- threshold: cosine similarity 0.0-1.0
-- Returns rows where similarity >= thresholdSQL
-- Basic semantic search
SELECT id, body
FROM docs
WHERE SEMANTIC_MATCH(body, 'artificial intelligence deep learning', 0.4);
-- Stricter threshold (near-duplicates only)
SELECT id, body
FROM docs
WHERE SEMANTIC_MATCH(body, 'machine learning', 0.8);
-- Broader search (loosely related)
SELECT id, body
FROM docs
WHERE SEMANTIC_MATCH(body, 'technology', 0.2);Note
SEMANTIC_MATCH requires the server to be started with
--sidecar and --model. The query text is embedded using the same model as the column.Hybrid Search
Combine SEMANTIC_MATCH with standard SQL predicates:
SQL
-- Semantic search + metadata filter
SELECT id, title, body, created_at
FROM articles
WHERE SEMANTIC_MATCH(body, 'machine learning', 0.5)
AND category = 'technology'
AND created_at > 1700000000
ORDER BY id
LIMIT 20;
-- Multiple semantic conditions
SELECT id, body
FROM docs
WHERE SEMANTIC_MATCH(body, 'database performance', 0.4)
OR SEMANTIC_MATCH(body, 'storage engine', 0.4);WHERE NOT DUPLICATE
Filter near-duplicate rows using MinHash LSH. Returns one representative row per near-duplicate cluster.
SQL
-- Select unique rows only
SELECT * FROM docs WHERE NOT DUPLICATE;
-- Combine with semantic search
SELECT id, body
FROM docs
WHERE SEMANTIC_MATCH(body, 'machine learning', 0.4)
AND NOT DUPLICATE;
-- Count unique vs total
SELECT COUNT(*) FROM docs; -- total
SELECT COUNT(*) FROM docs WHERE NOT DUPLICATE; -- uniqueAT VERSION
Query data as it existed at a named snapshot or timestamp:
SQL
-- Query by tag name
SELECT * FROM docs AT VERSION 'train-v1';
-- Query by timestamp (uint64 microseconds since epoch)
SELECT * FROM docs AT VERSION 1715385600000000;
-- Combine with WHERE
SELECT id, body
FROM docs AT VERSION 'train-v1'
WHERE SEMANTIC_MATCH(body, 'machine learning', 0.4);
-- Count rows at a specific version
SELECT COUNT(*) FROM docs AT VERSION 'train-v1';Examples
Standard SQL queries
SQL
-- Filter and sort
SELECT id, name, age
FROM users
WHERE age > 25
ORDER BY name ASC
LIMIT 10;
-- Aggregates
SELECT COUNT(*), AVG(age), MIN(age), MAX(age)
FROM users;
-- DISTINCT
SELECT DISTINCT category FROM products;Python client SELECT
Python
import galaxdb
db = galaxdb.Database("./data")
# SELECT returns list of dicts
rows = db.execute("SELECT id, body FROM docs WHERE SEMANTIC_MATCH(body, 'AI', 0.4)")
for row in rows:
print(f"[{row['id']}] {row['body']}")
# Empty result
rows = db.execute("SELECT * FROM docs WHERE id = 999")
print(rows) # []
# Aggregate
result = db.execute("SELECT COUNT(*) FROM docs")
print(result[0]['count']) # int