A browser-based Lance file reader with SQL and vector search support. Query Lance datasets directly in your browser using HTTP Range requests - no server required.
@logic_table: Business Logic That Runs WITH Your Data - Compile Python to native GPU kernels. Eliminates Python interpreter overhead for UDF-style workflows.
Live Demo: https://teamchong.github.io/lanceql
Documentation:
- SQL Reference - Complete SQL dialect reference
- Vector Search Guide - Semantic search with NEAR clause
- @logic_table - Compile Python to native GPU kernels
| Use Case | LanceQL |
|---|---|
| Query columnar files in browser | Direct SQL on Lance/Parquet, no backend needed |
| Semantic search on embeddings | Built-in vector search with IVF-PQ indices |
| Python UDFs that are too slow | @logic_table compiles Python to native code |
| Time-travel queries | Query any historical version |
| Query remote data efficiently | HTTP Range requests - only fetch what you need |
- Vector Search - Semantic search with `NEAR` clause using MiniLM/CLIP embeddings
- Time Travel - Query historical versions with `read_lance(url, version)`
- SQL - `SELECT`, `WHERE`, `ORDER BY`, `LIMIT`, aggregations (`COUNT`, `SUM`, `AVG`, `MIN`, `MAX`)
- HTTP Range Requests - Only fetch the bytes you need, not the entire file
- Local + Remote - Drag & drop local files or load from URL
- DataFrame API - `dataset.df().filter(...).select(...).limit(50)`
- Full SQL - `ORDER BY`, `GROUP BY`, `DISTINCT`, all aggregations
- Data Types - int32/64, float32/64, bool, string, timestamp (s/ms/us/ns), date32/64
- Parameterized Queries - Bind values with `?` placeholders
- Drop-in APIs - better-sqlite3 (Node.js), pyarrow.parquet (Python)
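The HTTP Range Requests feature can be illustrated with a short sketch. This is not LanceQL's actual reader code - `FOOTER_SIZE` is an assumed value for illustration; the real reader derives byte offsets from the Lance footer itself.

```javascript
// Fetch only the trailing bytes of a remote file (where Lance keeps its
// footer) instead of downloading the whole dataset.
const FOOTER_SIZE = 40; // assumed size, for illustration only

// Build a suffix-range header value: "the last n bytes of the resource".
function suffixRange(n) {
  return `bytes=-${n}`;
}

async function fetchFooter(url) {
  const resp = await fetch(url, {
    headers: { Range: suffixRange(FOOTER_SIZE) },
  });
  // 206 Partial Content means the server honored the Range header.
  if (resp.status !== 206) throw new Error('Range request not supported');
  return new Uint8Array(await resp.arrayBuffer());
}
```

Subsequent requests use absolute ranges (`bytes=start-end`) to pull just the column pages a query touches.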
```sh
npm install lanceql        # WIP - not yet published
npm install @lanceql/node  # WIP - not yet published
```

Drop-in replacement for better-sqlite3 with Lance columnar files:

```js
// Instead of: const Database = require('better-sqlite3');
const Database = require('@lanceql/node');

const db = new Database('dataset.lance');
const rows = db.prepare('SELECT * FROM data WHERE id > ?').all(100);
```

```sh
pip install metal0-lanceql  # WIP - not yet published
```

Drop-in replacement for pyarrow.parquet with Lance columnar files:
```python
# Instead of: import pyarrow.parquet as pq
import metal0.lanceql as pq

table = pq.read_table('dataset.lance')
df = table.to_pandas()
```

```sh
cd examples/wasm
python -m http.server 3000
# Open http://localhost:3000
```

Default dataset: 1M LAION images with text embeddings at https://data.metal0.dev/laion-1m/images.lance
See SQL Reference for complete documentation.

```sql
-- Local uploaded file
SELECT * FROM read_lance(FILE) LIMIT 50
SELECT * FROM read_lance(FILE, 24) LIMIT 50  -- with version

-- Remote URL
SELECT * FROM read_lance('https://data.metal0.dev/laion-1m/images.lance') LIMIT 50
SELECT * FROM read_lance('https://...', 24) LIMIT 50  -- with version

-- Filter
SELECT url, text, aesthetic FROM read_lance(FILE)
WHERE aesthetic > 0.5
LIMIT 100

-- Aggregations
SELECT COUNT(*), AVG(aesthetic), MAX(aesthetic) FROM read_lance(FILE)

-- Vector search (see Vector Search Guide for more)
SELECT * FROM read_lance(FILE) NEAR 'sunset beach' TOPK 20
SELECT * FROM read_lance(FILE) NEAR embedding 'cat' TOPK 50
SELECT * FROM read_lance(FILE) WHERE aesthetic > 0.5 NEAR 'beach' TOPK 30
```

See Vector Search Guide for IVF-PQ indices, encoders, and performance tuning.
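Conceptually, a `NEAR ... TOPK k` query encodes the query text into a vector, scores every row's embedding by cosine similarity, and keeps the top k (an IVF-PQ index prunes which rows get scored). A minimal brute-force sketch, with the text encoder left out:

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na && nb ? dot / (Math.sqrt(na) * Math.sqrt(nb)) : 0;
}

// Brute-force NEAR: score all rows, sort descending, take the top k.
function near(rows, queryVec, topK) {
  return rows
    .map(row => ({ row, score: cosine(queryVec, row.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// e.g. near(rows, encodeText('sunset beach'), 20), where encodeText is a
// MiniLM/CLIP text encoder (hypothetical name).
```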
```python
import lanceql

dataset = lanceql.open("https://data.metal0.dev/laion-1m/images.lance")

# Vector search by text
result = (
    dataset.df()
    .search("cat playing", encoder="minilm", top_k=20)
    .select(["url", "text"])
    .collect()
)

# Vector search by row
result = (
    dataset.df()
    .search_by_row(0, column="embedding", top_k=10)
    .collect()
)

# Filter numeric columns
result = (
    dataset.df()
    .filter("aesthetic", ">", 0.6)
    .limit(50)
    .collect()
)

# Time travel
dataset = lanceql.open("https://...", version=24)
```

```
src/
├── lanceql.zig          # Zig WASM module for Lance parsing
├── format/              # Lance file format (footer, columns)
├── proto/               # Protobuf decoder for manifests
├── io/                  # VFS abstraction (file, memory, HTTP)
└── encoding/
    ├── plain.zig        # Lance column decoders
    └── parquet/         # Parquet file reader
        ├── page.zig     # Page decoder (PLAIN, RLE_DICTIONARY)
        ├── snappy.zig   # Snappy decompression (pure Zig, SIMD)
        └── thrift.zig   # TCompactProtocol decoder

examples/wasm/
├── index.html           # Demo UI
└── lanceql.js           # JS wrapper, SQL parser, vector search
```
LanceQL uses an Immer-style Proxy pattern for WASM interop:
```js
// Traditional WASM interop - verbose, error-prone
const ptr = wasm.alloc(str.length);
const mem = new Uint8Array(wasm.memory.buffer);
mem.set(encoder.encode(str), ptr);
const result = wasm.someFunc(ptr, str.length);
wasm.free(ptr);
```

```js
// Immer-style - auto marshalling via Proxy (like metal0)
const lanceql = await LanceQL.load('./lanceql.wasm');
lanceql.someFunc("hello");       // strings auto-copied to WASM memory
lanceql.parseData(bytes);        // Uint8Array auto-copied too
lanceql.raw.someFunc(ptr, len);  // raw access when needed
lanceql.memory;                  // WASM memory
```

How it works:
```js
// ~30 lines of runtime code handles all marshalling
const proxy = new Proxy({}, {
  get(_, name) {
    if (typeof wasm[name] === 'function') {
      return (...args) => wasm[name](...args.flatMap(marshal));
    }
    return wasm[name];
  }
});

// Marshal function - auto-converts strings and Uint8Array
const marshal = arg => {
  if (arg instanceof Uint8Array) {
    // Copy bytes to WASM memory, return [ptr, len]
    buffer.set(arg); return [ptr, arg.length];
  }
  if (typeof arg === 'string') {
    // Encode string to WASM memory, return [ptr, len]
    const bytes = encoder.encode(arg);
    buffer.set(bytes); return [ptr, bytes.length];
  }
  return [arg]; // Numbers pass through
};
```

Benefits:
- Zero boilerplate - No manual `alloc`/`free`/copy for each call
- Auto marshalling - Strings and Uint8Array automatically copied to WASM memory
- Tiny runtime - ~30 lines, no dependencies
- JS debugging - All logic stays in JS where DevTools works
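The snippets above reference `wasm`, `buffer`, and `encoder` from surrounding scope. As a self-contained illustration of the same pattern, here is a toy version where a plain object with a linear-memory buffer stands in for the real WASM exports (the `echo` export and buffer layout are made up for the demo):

```javascript
const encoder = new TextEncoder();
const memory = new Uint8Array(1024); // stand-in for WASM linear memory
let offset = 0;

// Fake "WASM export": reads len bytes at ptr and returns them as a string.
const wasm = {
  echo(ptr, len) {
    return new TextDecoder().decode(memory.subarray(ptr, ptr + len));
  },
};

// Copy strings/bytes into "WASM" memory and expand them to [ptr, len].
const marshal = arg => {
  if (typeof arg === 'string' || arg instanceof Uint8Array) {
    const bytes = typeof arg === 'string' ? encoder.encode(arg) : arg;
    const ptr = offset;
    memory.set(bytes, ptr);
    offset += bytes.length;
    return [ptr, bytes.length];
  }
  return [arg]; // numbers pass through unchanged
};

const proxy = new Proxy({}, {
  get: (_, name) =>
    typeof wasm[name] === 'function'
      ? (...args) => wasm[name](...args.flatMap(marshal))
      : wasm[name],
});

proxy.echo('hello'); // → 'hello', with no manual alloc/copy/free
```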
Requires Zig 0.13.0+
```sh
# Build WASM module
zig build wasm
# Output: zig-out/bin/lanceql.wasm

# Copy to demo
cp zig-out/bin/lanceql.wasm examples/wasm/

# Run tests
zig build test
```

```html
<script type="module">
  import { LanceQL } from './lanceql.js';

  const lanceql = await LanceQL.load('./lanceql.wasm');

  // Open remote dataset
  const dataset = await lanceql.openDataset('https://data.metal0.dev/laion-1m/images.lance');

  // Query
  const strings = await dataset.readStrings(0, 50); // First 50 rows of column 0
</script>
```

For TypeScript, the `lanceql.d.ts` file provides type definitions:
```ts
import { LanceQL } from './lanceql.js';
// Types are automatically picked up from lanceql.d.ts
```

- Zero-copy Arrow - Direct memory sharing via Arrow C Data Interface
- Metal GPU (Apple Silicon) - Zero-copy unified memory, auto-switch at 100K+ vectors
  - Batch cosine similarity: 14.3M vectors/sec (1M × 384-dim)
  - Shaders compile at runtime - no Xcode required
- Accelerate vDSP (macOS) - SIMD-optimized for small batches and single-vector ops
  - Dot product: 42 ns/op (384-dim)
  - Cosine similarity: 106 ns/op (384-dim)
- Comptime SIMD - 32-byte vectors, bit-width specialization (1-20 bits)
- IndexedDB Cache - Schema and column types cached for repeat visits
- Sidecar Manifest - Optional `.meta.json` for faster startup
- Fragment Prefetching - Parallel metadata loading on dataset open
- Speculative Prefetch - Next page loaded in background
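The speculative-prefetch idea can be sketched as a promise cache: when page n is requested, page n+1 starts loading in the background, so paging forward resolves instantly. `fetchPage` is a hypothetical loader standing in for the reader's Range-request fetches:

```javascript
const pageCache = new Map();

// Hypothetical loader; the real reader issues HTTP Range requests here.
async function fetchPage(n) {
  return `rows for page ${n}`;
}

function getPage(n) {
  if (!pageCache.has(n)) pageCache.set(n, fetchPage(n));
  // Speculate: kick off the next page without awaiting it.
  if (!pageCache.has(n + 1)) pageCache.set(n + 1, fetchPage(n + 1));
  return pageCache.get(n);
}
```

Caching the promise (not the resolved value) means concurrent callers share one in-flight request.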
| Scale | Path | Throughput |
|---|---|---|
| 10K vectors | CPU (Accelerate) | 13.9M vec/s |
| 100K vectors | GPU (Metal) | 13.5M vec/s |
| 1M vectors | GPU (Metal) | 14.3M vec/s |
Apple Silicon only - Intel Macs fall back to the CPU path (Accelerate is still fast)
```zig
const builtin = @import("builtin");
const is_macos = builtin.os.tag == .macos;
const is_apple_silicon = is_macos and builtin.cpu.arch == .aarch64;
// Auto-switch: GPU at 100K+ vectors on Apple Silicon
```

Run vector benchmark: `zig build bench-vector`
Generate sidecar manifest:

```sh
cd fixtures

# Local dataset
python generate_sidecar.py /path/to/dataset.lance

# Remote S3/R2 dataset (requires aws profile 'r2')
python generate_sidecar.py s3://bucket/dataset.lance ./meta.json
aws s3 cp ./meta.json s3://bucket/dataset.lance/.meta.json --profile r2 --endpoint-url https://...
```

| Format | Function | Features |
|---|---|---|
| Lance | `read_lance()` | v2.0/v2.1, IVF-PQ indices, time travel, deletion vectors |
| Parquet | `read_parquet()` | Pure Zig, Snappy, RLE/PLAIN/DICTIONARY encoding |
| Delta Lake | `read_delta()` | Parquet + transaction log |
| Iceberg | `read_iceberg()` | Parquet + metadata layer |
| Arrow IPC | `read_arrow()` | .arrow, .arrows, .feather files |
| Avro | `read_avro()` | Deflate/Snappy compression |
| ORC | `read_orc()` | Snappy compression |
| Excel | `read_xlsx()` | Multi-sheet support |
Supported Data Types: int32/64, float32/64, bool, string, timestamp[s/ms/us/ns], date32/64
See SQL Reference for complete data source documentation.
Apache-2.0 (same as Lance)