LanceQL

A browser-based Lance file reader with SQL and vector search support. Query Lance datasets directly in your browser using HTTP Range requests - no backend server required, only static file hosting that honors Range.

@logic_table: Business Logic That Runs WITH Your Data - Compile Python to native GPU kernels. Eliminates Python interpreter overhead for UDF-style workflows.

Live Demo: https://teamchong.github.io/lanceql

Documentation:

SQL Reference - Complete SQL dialect reference
Vector Search Guide - Semantic search with NEAR clause
@logic_table - Compile Python to native GPU kernels

When to Use LanceQL

Use Case	LanceQL
Query columnar files in browser	Direct SQL on Lance/Parquet, no backend needed
Semantic search on embeddings	Built-in vector search with IVF-PQ indices
Python UDFs that are too slow	@logic_table compiles Python to native code
Time-travel queries	Query any historical version
Query remote data efficiently	HTTP Range requests - only fetch what you need

Features

All Platforms

Vector Search - Semantic search with NEAR clause using MiniLM/CLIP embeddings
Time Travel - Query historical versions with read_lance(url, version)

Browser (WASM + JavaScript)

SQL - SELECT, WHERE, ORDER BY, LIMIT, aggregations (COUNT, SUM, AVG, MIN, MAX)
HTTP Range Requests - Only fetch the bytes you need, not the entire file
Local + Remote - Drag & drop local files or load from URL
DataFrame API - dataset.df().filter(...).select(...).limit(50)
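
The HTTP Range Requests item above can be illustrated with a minimal sketch. The helper names (byteRange, suffixRange, fetchRange) are hypothetical and are not part of the LanceQL API; the sketch only shows the standard HTTP mechanism of asking a static file host for a byte span instead of the whole file.

```javascript
// Build a Range header value for the inclusive byte span [start, end].
function byteRange(start, end) {
    if (!Number.isInteger(start) || !Number.isInteger(end) || start < 0 || end < start) {
        throw new RangeError(`invalid byte range: ${start}-${end}`);
    }
    return `bytes=${start}-${end}`;
}

// Build a suffix range that asks for the final `length` bytes of a resource,
// which is useful when a file keeps its metadata at the end.
function suffixRange(length) {
    if (!Number.isInteger(length) || length <= 0) {
        throw new RangeError(`invalid suffix length: ${length}`);
    }
    return `bytes=-${length}`;
}

// Fetch one span and resolve to its bytes (hypothetical helper).
// A server that honors the Range header answers 206 Partial Content.
async function fetchRange(url, rangeValue) {
    const res = await fetch(url, { headers: { Range: rangeValue } });
    if (res.status !== 206) {
        throw new Error(`expected 206 Partial Content, got ${res.status}`);
    }
    return new Uint8Array(await res.arrayBuffer());
}
```

A host that ignores the Range header typically replies 200 with the full body; this sketch treats that as an error rather than silently downloading the whole file.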

Node.js/Python (Native)

Full SQL - ORDER BY, GROUP BY, DISTINCT, all aggregations
Data Types - int32/64, float32/64, bool, string, timestamp (s/ms/us/ns), date32/64
Parameterized Queries - Bind values with ? placeholders
Drop-in APIs - better-sqlite3 (Node.js), pyarrow.parquet (Python)

Installation

Browser (WASM)

npm install lanceql  # WIP - not yet published

Node.js (Native) - WIP

npm install @lanceql/node  # WIP - not yet published

Drop-in replacement for better-sqlite3 with Lance columnar files:

// Instead of: const Database = require('better-sqlite3');
const Database = require('@lanceql/node');

const db = new Database('dataset.lance');
const rows = db.prepare('SELECT * FROM data WHERE id > ?').all(100);

Python - WIP

pip install metal0-lanceql  # WIP - not yet published

Drop-in replacement for pyarrow.parquet with Lance columnar files:

# Instead of: import pyarrow.parquet as pq
import metal0.lanceql as pq

table = pq.read_table('dataset.lance')
df = table.to_pandas()

Quick Start

Browser Demo

cd examples/wasm
python -m http.server 3000
# Open http://localhost:3000

Default dataset: 1M LAION images with text embeddings at https://data.metal0.dev/laion-1m/images.lance

SQL Examples

See SQL Reference for complete documentation.

-- Local uploaded file
SELECT * FROM read_lance(FILE) LIMIT 50
SELECT * FROM read_lance(FILE, 24) LIMIT 50  -- with version

-- Remote URL
SELECT * FROM read_lance('https://data.metal0.dev/laion-1m/images.lance') LIMIT 50
SELECT * FROM read_lance('https://...', 24) LIMIT 50  -- with version

-- Filter
SELECT url, text, aesthetic FROM read_lance(FILE)
WHERE aesthetic > 0.5
LIMIT 100

-- Aggregations
SELECT COUNT(*), AVG(aesthetic), MAX(aesthetic) FROM read_lance(FILE)

-- Vector search (see Vector Search Guide for more)
SELECT * FROM read_lance(FILE) NEAR 'sunset beach' TOPK 20
SELECT * FROM read_lance(FILE) NEAR embedding 'cat' TOPK 50
SELECT * FROM read_lance(FILE) WHERE aesthetic > 0.5 NEAR 'beach' TOPK 30

See Vector Search Guide for IVF-PQ indices, encoders, and performance tuning.
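
Conceptually, NEAR ... TOPK k ranks rows by how similar each row's embedding is to the query embedding. The sketch below is a brute-force illustration only. It assumes cosine similarity as the metric (the benchmarks later in this README measure cosine similarity) and does not model the IVF-PQ index path; the function names are hypothetical.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    const denom = Math.sqrt(normA) * Math.sqrt(normB);
    return denom === 0 ? 0 : dot / denom;  // zero vectors score 0
}

// Score every row against the query and keep the k best, highest score first.
function topK(query, rows, k) {
    return rows
        .map((vec, index) => ({ index, score: cosine(query, vec) }))
        .sort((x, y) => y.score - x.score)
        .slice(0, k);
}

// Three toy 2-dim embeddings; real embeddings here are 384-dim.
const rows = [
    new Float32Array([1, 0]),
    new Float32Array([0, 1]),
    new Float32Array([1, 1]),
];
const hits = topK(new Float32Array([1, 0]), rows, 2);
// hits[0].index is 0 (identical direction), hits[1].index is 2
```

Scoring every row costs time linear in the dataset size, which is the cost an approximate index such as IVF-PQ is meant to avoid.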

DataFrame Examples

import lanceql

dataset = lanceql.open("https://data.metal0.dev/laion-1m/images.lance")

# Vector search by text
result = (
    dataset.df()
    .search("cat playing", encoder="minilm", top_k=20)
    .select(["url", "text"])
    .collect()
)

# Vector search by row
result = (
    dataset.df()
    .search_by_row(0, column="embedding", top_k=10)
    .collect()
)

# Filter numeric columns
result = (
    dataset.df()
    .filter("aesthetic", ">", 0.6)
    .limit(50)
    .collect()
)

# Time travel
dataset = lanceql.open("https://...", version=24)

Architecture

src/
├── lanceql.zig          # Zig WASM module for Lance parsing
├── format/              # Lance file format (footer, columns)
├── proto/               # Protobuf decoder for manifests
├── io/                  # VFS abstraction (file, memory, HTTP)
└── encoding/
    ├── plain.zig        # Lance column decoders
    └── parquet/         # Parquet file reader
        ├── page.zig     # Page decoder (PLAIN, RLE_DICTIONARY)
        ├── snappy.zig   # Snappy decompression (pure Zig, SIMD)
        └── thrift.zig   # TCompactProtocol decoder

examples/wasm/
├── index.html           # Demo UI
└── lanceql.js           # JS wrapper, SQL parser, vector search

WASM Runtime

LanceQL uses an Immer-style Proxy pattern for WASM interop:

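// Traditional WASM interop - verbose, error-prone
const encoder = new TextEncoder();
const bytes = encoder.encode(str);               // UTF-8 length can exceed str.length
const ptr = wasm.alloc(bytes.length);
const mem = new Uint8Array(wasm.memory.buffer);  // take the view after alloc, which may grow memory
mem.set(bytes, ptr);
const result = wasm.someFunc(ptr, bytes.length);
wasm.free(ptr);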

// Immer-style - auto marshalling via Proxy (like metal0)
const lanceql = await LanceQL.load('./lanceql.wasm');
lanceql.someFunc("hello");       // strings auto-copied to WASM memory
lanceql.parseData(bytes);        // Uint8Array auto-copied too
lanceql.raw.someFunc(ptr, len);  // raw access when needed
lanceql.memory;                  // WASM memory

How it works:

// ~30 lines of runtime code handles all marshalling
const encoder = new TextEncoder();

// Copy bytes into WASM memory and return [ptr, len].
// Sketch: assumes the module exports alloc, as in the traditional example;
// the actual runtime may reuse a scratch buffer instead. Freeing is omitted here.
const copyIn = bytes => {
    const ptr = wasm.alloc(bytes.length);
    new Uint8Array(wasm.memory.buffer).set(bytes, ptr);
    return [ptr, bytes.length];
};

// Marshal function - auto-converts strings and Uint8Array
const marshal = arg => {
    if (arg instanceof Uint8Array) return copyIn(arg);
    if (typeof arg === 'string') return copyIn(encoder.encode(arg));
    return [arg];  // Numbers pass through
};

// Each function export is wrapped so its arguments are marshalled first
const proxy = new Proxy({}, {
    get(_, name) {
        if (typeof wasm[name] === 'function') {
            return (...args) => wasm[name](...args.flatMap(marshal));
        }
        return wasm[name];
    }
});

Benefits:

Zero boilerplate - No manual alloc/free/copy for each call
Auto marshalling - Strings and Uint8Array automatically copied to WASM memory
Tiny runtime - ~30 lines, no dependencies
JS debugging - All logic stays in JS where DevTools works
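
To try the pattern without a compiled module, the sketch below runs the same idea against a mock export object. Everything in it is an assumption for illustration: the mock's alloc, free, and sumBytes stand in for real exports, and makeProxy is a hypothetical name rather than LanceQL's loader. Unlike the short listing above, it also frees each temporary buffer once the call returns.

```javascript
// Mock "WASM" exports backed by a real WebAssembly.Memory (names are hypothetical).
const memory = new WebAssembly.Memory({ initial: 1 });  // one 64 KiB page
let nextFree = 0;
const wasm = {
    memory,
    alloc(len) { const ptr = nextFree; nextFree += len; return ptr; },  // bump allocator
    free(_ptr) {},  // no-op in this mock
    // Sums the bytes in [ptr, ptr + len); stands in for a real export.
    sumBytes(ptr, len) {
        return new Uint8Array(memory.buffer, ptr, len).reduce((a, b) => a + b, 0);
    },
};

const encoder = new TextEncoder();

function makeProxy(exportsObj) {
    return new Proxy({}, {
        get(_, name) {
            if (name === 'raw') return exportsObj;     // escape hatch: unwrapped exports
            const fn = exportsObj[name];
            if (typeof fn !== 'function') return fn;   // e.g. memory
            return (...args) => {
                const allocated = [];
                const flat = args.flatMap(arg => {
                    const bytes = typeof arg === 'string' ? encoder.encode(arg)
                        : arg instanceof Uint8Array ? arg
                        : null;
                    if (bytes === null) return [arg];  // numbers pass through
                    const ptr = exportsObj.alloc(bytes.length);
                    new Uint8Array(exportsObj.memory.buffer).set(bytes, ptr);
                    allocated.push(ptr);
                    return [ptr, bytes.length];        // one JS arg becomes (ptr, len)
                });
                try {
                    return fn(...flat);
                } finally {
                    allocated.forEach(ptr => exportsObj.free(ptr));
                }
            };
        },
    });
}

const lanceql = makeProxy(wasm);
const fromBytes = lanceql.sumBytes(new Uint8Array([1, 2, 3]));  // 6
const fromString = lanceql.sumBytes("AB");                      // 65 + 66 = 131
```

Freeing in a finally block keeps the mock honest about buffer lifetime: the callee must not keep pointers to marshalled arguments after it returns.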

Build

Requires Zig 0.13.0+

# Build WASM module
zig build wasm
# Output: zig-out/bin/lanceql.wasm

# Copy to demo
cp zig-out/bin/lanceql.wasm examples/wasm/

# Run tests
zig build test

Usage

<script type="module">
import { LanceQL } from './lanceql.js';

const lanceql = await LanceQL.load('./lanceql.wasm');

// Open remote dataset
const dataset = await lanceql.openDataset('https://data.metal0.dev/laion-1m/images.lance');

// Query
const strings = await dataset.readStrings(0, 50);  // First 50 rows of column 0
</script>

For TypeScript, the lanceql.d.ts file provides type definitions:

import { LanceQL } from './lanceql.js';
// Types are automatically picked up from lanceql.d.ts

Performance Optimizations

Zero-copy Arrow - Direct memory sharing via Arrow C Data Interface
Metal GPU (Apple Silicon) - Zero-copy unified memory, auto-switch at 100K+ vectors
- Batch cosine similarity: 14.3M vectors/sec (1M × 384-dim)
- Shaders compile at runtime - no Xcode required
Accelerate vDSP (macOS) - SIMD-optimized for small batches and single-vector ops
- Dot product: 42 ns/op (384-dim)
- Cosine similarity: 106 ns/op (384-dim)
Comptime SIMD - 32-byte vectors, bit-width specialization (1-20 bits)
IndexedDB Cache - Schema and column types cached for repeat visits
Sidecar Manifest - Optional .meta.json for faster startup
Fragment Prefetching - Parallel metadata loading on dataset open
Speculative Prefetch - Next page loaded in background
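
The Speculative Prefetch item can be pictured with a small pager that starts loading page n + 1 as soon as page n is requested. This is a sketch of the general technique under assumed names (makePager, loadPage), not LanceQL's implementation.

```javascript
// Wrap a page loader so that requesting page n also warms page n + 1.
// loadPage(n) must return a Promise of that page's data.
function makePager(loadPage) {
    const cache = new Map();  // page number -> Promise of page data
    const fetchOnce = (n) => {
        if (!cache.has(n)) cache.set(n, loadPage(n));
        return cache.get(n);
    };
    return function getPage(n) {
        const current = fetchOnce(n);
        // Speculative: start the next page now. A failure here is ignored
        // and surfaces only if that page is actually requested later.
        fetchOnce(n + 1).catch(() => {});
        return current;
    };
}

// Demo with a fake loader that records which pages were requested.
const requested = [];
const getPage = makePager((n) => {
    requested.push(n);
    return Promise.resolve(`page ${n}`);
});
getPage(0);  // loads pages 0 and 1
getPage(1);  // page 1 is already cached; loads page 2
```

The cache in this sketch grows without bound; a real pager would evict old pages.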

Vector Search Performance (384 dims)

Scale	Path	Throughput
10K vectors	CPU (Accelerate)	13.9M vec/s
100K vectors	GPU (Metal)	13.5M vec/s
1M vectors	GPU (Metal)	14.3M vec/s

The GPU (Metal) path is Apple Silicon only - Intel Macs use the CPU path (Accelerate is still fast)

Platform Detection (comptime)

const builtin = @import("builtin");
const is_macos = builtin.os.tag == .macos;
const is_apple_silicon = is_macos and builtin.cpu.arch == .aarch64;
// Auto-switch: GPU at 100K+ vectors on Apple Silicon
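
The auto-switch rule in that comment combines the compile-time platform check with a runtime vector count. The sketch below restates the decision in JavaScript purely for illustration; choosePath and its return values are hypothetical, and the real check lives in the Zig code.

```javascript
// Threshold taken from the README: GPU at 100K+ vectors on Apple Silicon.
const GPU_THRESHOLD = 100000;

// Pick a compute path from the vector count and the target platform.
function choosePath(vectorCount, { os, arch }) {
    const isAppleSilicon = os === 'macos' && arch === 'aarch64';
    return isAppleSilicon && vectorCount >= GPU_THRESHOLD ? 'gpu-metal' : 'cpu';
}

const m1 = { os: 'macos', arch: 'aarch64' };
const intelMac = { os: 'macos', arch: 'x86_64' };
const large = choosePath(1000000, m1);            // 'gpu-metal'
const small = choosePath(10000, m1);              // 'cpu'
const largeIntel = choosePath(1000000, intelMac); // 'cpu'
```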

Run vector benchmark: zig build bench-vector

Generate sidecar manifest:

cd fixtures
# Local dataset
python generate_sidecar.py /path/to/dataset.lance

# Remote S3/R2 dataset (requires aws profile 'r2')
python generate_sidecar.py s3://bucket/dataset.lance ./meta.json
aws s3 cp ./meta.json s3://bucket/dataset.lance/.meta.json --profile r2 --endpoint-url https://...

Format Support

Format	Function	Features
Lance	`read_lance()`	v2.0/v2.1, IVF-PQ indices, time travel, deletion vectors
Parquet	`read_parquet()`	Pure Zig, Snappy, RLE/PLAIN/DICTIONARY encoding
Delta Lake	`read_delta()`	Parquet + transaction log
Iceberg	`read_iceberg()`	Parquet + metadata layer
Arrow IPC	`read_arrow()`	.arrow, .arrows, .feather files
Avro	`read_avro()`	Deflate/Snappy compression
ORC	`read_orc()`	Snappy compression
Excel	`read_xlsx()`	Multi-sheet support

Supported Data Types: int32/64, float32/64, bool, string, timestamp[s/ms/us/ns], date32/64

See SQL Reference for complete data source documentation.

License

Apache-2.0 (same as Lance)
