boringdata.io — Kickstart your Iceberg journey with our data stack templates.
A lightweight, file-based Iceberg catalog implementation using a single JSON file (e.g., on S3, local disk, or any fsspec-compatible storage).
- No need to host or maintain a dedicated catalog service
- Easy to use, easy to understand, perfect to get started with Iceberg
- DuckDB CLI interface to easily explore your iceberg tables and metadata
Boring Catalog stores all Iceberg catalog state in a single JSON file:
- Namespaces and tables are tracked in this file
- S3 conditional writes prevent concurrent modifications when storing catalog on S3
- The
.ice/index
file in your project directory stores the configuration for your catalog, including:catalog_uri
: the path to your catalog JSON filecatalog_name
: the logical name of your catalogproperties
: additional properties (e.g., warehouse location)
pip install boringcatalog
ice init
That's it ! Your catalog is now ready to use.
2 files are created:
warehouse/catalog/catalog_boring.json
= catalog file.ice/index
= points to the catalog location (similar to a git index file, but for Iceberg)
Note: You can also specify a remote location for your Iceberg data and catalog file:
ice init -p warehouse=s3://mybucket/mywarehouse
More details on the Custom Init and Catalog Location section.
Note: If you are using an S3 path (e.g., s3://...
) for your catalog file or warehouse, make sure your CLI environment is authenticated with AWS. For example, you can set your AWS profile with:
export AWS_PROFILE=your-provider
You must have valid AWS credentials configured for the CLI to access S3 resources.
You can then start using the catalog:
# Get some data
curl https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-01.parquet -o /tmp/yellow_tripdata_2023-01.parquet
# Commit the table
ice commit my_table --source /tmp/yellow_tripdata_2023-01.parquet
ice log
ice duck
This opens an interactive DuckDB session with pointers to all your tables and namespaces.
Example DuckDB queries:
show; -- show all tables
select * from catalog.namespaces; -- list namespaces
select * from catalog.tables; -- list tables
select * from <namespace>.<table>; -- query iceberg table
from boringcatalog import BoringCatalog
# Auto-detects .ice/index in the current working directory
catalog = BoringCatalog()
# Or specify a catalog
catalog = BoringCatalog(name="mycat", uri="path/to/catalog.json")
# Interact with your iceberg catalog
catalog.create_namespace("my_namespace")
catalog.create_table("my_namespace", "my_table")
catalog.load_table("my_namespace.my_table")
import pyarrow.parquet as pq
df = pq.read_table("/tmp/yellow_tripdata_2023-01.parquet")
table = catalog.load_table(("ice_default", "my_table"))
table.append(df)
You can configure your Iceberg catalog in several ways, depending on where you want to store your catalog metadata (the JSON file) and your Iceberg data (the warehouse):
- The
warehouse
property determines where your Iceberg tables' data will be stored. - The
--catalog
option lets you specify the exact path for your catalog JSON file. - If you use both, the catalog file will be created at the path you specify, and the warehouse will be used for table data.
Command Example | Catalog File Location | Warehouse/Data Location | Use Case |
---|---|---|---|
ice init |
warehouse/catalog/catalog_boring.json |
warehouse/ |
Local, simple |
ice init -p warehouse=... |
<warehouse>/catalog/catalog_boring.json |
<warehouse>/ |
Custom warehouse |
ice init --catalog ... |
<custom>.json |
(to define when creating a table) | Custom catalog file |
ice init --catalog ... -p warehouse=... |
<custom>.json |
<warehouse>/ |
Full control |
ice init --catalog ... --catalog-name ... |
<custom>.json |
(to define when creating a table) | Custom name & file |
- Custom Catalog Name: By default, the catalog is named
"boring"
, but you can set a custom name with--catalog-name
. This name is used in the catalog JSON and for file naming if you don't specify a custom path. - Re-initialization: If you run
ice init
multiple times in the same directory, the.ice/index
file will be overwritten with the new configuration. This is useful if you want to re-point your project to a different catalog, but be aware that it will not migrate or merge any existing data. - Manual Editing: Advanced users can manually edit
.ice/index
to point to a different catalog file or change the catalog name. If you do this, make sure thecatalog_uri
andcatalog_name
fields are consistent with your actual catalog JSON file. If you set awarehouse
property but do not updatecatalog_uri
, Boring Catalog will always use thecatalog_uri
from the index file.
- Improve CLI to allow MERGE operation, partition spec, etc.
- Improve CLI to get info about table schema / partition spec / etc.
- Expose REST API for integration with AWS, Snowflake, etc.