<img src="./images/logo.svg" alt="lakeFS logo" width="300"/>

All of these notebooks can be run using the provided `docker-compose.yml` (unless otherwise specified). 

The "_Standalone demos_" section below lists the examples that run on their own, including those for Airflow and Dagster.

## Sample Notebooks

* [**Integration of lakeFS with Spark and Python**](./spark-demo.ipynb) 
* [**Creating Dev-Test environments with lakeFS branches**](./dev-test.ipynb) 
* [**Data Lineage with lakeFS**](./data-lineage.ipynb) 
* [**Integration of lakeFS with Delta Lake and Apache Spark**](./delta-lake.ipynb) 
* [**Integration of lakeFS with Delta Lake and Python**](./delta-lake-python.ipynb)
* [**Displaying diff between Delta Tables**](./delta-diff.ipynb)<br/>_See also the [accompanying blog](https://lakefs.io/blog/lakefs-supports-delta-lake-diff/)_
* [**Only allow specific file formats in data lake**](hooks-webhooks-demo.ipynb) (with lakeFS webhooks)
* [**Prevent unintended schema change**](hooks-schema-validation.ipynb) (with lakeFS Lua hooks)
* [**Avoid leaking PII data**](hooks-schema-and-pii-validation.ipynb) (shows how to use multiple Lua hooks)
* [**Import into a lakeFS repository from multiple paths**](./import-multiple-buckets.ipynb) 
* [**ML Experimentation/Reproducibility 01 (Dogs)**](./ml-reproducibility.ipynb)
* [**ML Experimentation 02 (Wine Quality)**](./ml-experimentation-wine-quality-prediction.ipynb)<br/>_See also the [accompanying blog](https://lakefs.io/blog/building-an-ml-experimentation-platform-for-easy-reproducibility-using-lakefs/)_
* [**RBAC demo**](./rbac-demo.ipynb)<br/>_lakeFS Cloud only_
* [**Version Control of multi-buckets pipelines**](./version-control-of-multi-buckets-pipelines.ipynb) 
* [**Reprocess and Backfill Data with new ETL logic**](./reprocess-backfill-data.ipynb) 
* **lakeFS and Apache Iceberg**
    * [Basic example](./iceberg-lakefs-basic.ipynb)
    * [NYC Film Permits example](./iceberg-lakefs-nyc.ipynb)
    * [What happens if you use Iceberg without the lakeFS support](./iceberg-lakefs-default.ipynb)
* **Using R with lakeFS**
    * [Basic usage](./R.ipynb)
    * [Weather data](./R-weather.ipynb)
    * [NYC Film Permits example](./R-nyc.ipynb)
    * [Rough notes around R client](./R-client.ipynb)

### Write-Audit-Publish pattern

See https://lakefs.io/blog/data-engineering-patterns-write-audit-publish/

* [**Write-Audit-Publish / lakeFS**](write-audit-publish/wap-lakefs.ipynb) <br/> _With support for heterogeneous data formats and cross-collection consistency_
* [**Write-Audit-Publish / Apache Iceberg**](write-audit-publish/wap-iceberg.ipynb)
* [**Write-Audit-Publish / Project Nessie**](write-audit-publish/wap-nessie.ipynb)
* [**Write-Audit-Publish / Delta Lake**](write-audit-publish/wap-delta.ipynb)
* [**Write-Audit-Publish / Apache Hudi**](write-audit-publish/wap-hudi.ipynb)


## Standalone demos

These need to be run separately because they have dependencies that are not included in the base `docker-compose.yml`.

* [**Airflow** (1)](https://github.com/treeverse/lakeFS-samples/blob/main/01_standalone_examples/airflow-01/) - Four examples of using lakeFS with Airflow: 
    * Versioning DAGs and running pipelines from hooks using a configurable version of the DAGs
    * Isolating Airflow job runs and atomically promoting them to production
    * Integration of lakeFS with Airflow via Hooks
    * Troubleshooting production issues
* [**Airflow** (2)](https://github.com/treeverse/lakeFS-samples/blob/main/01_standalone_examples/airflow-02/) - lakeFS + Airflow
* [Azure **Databricks**](https://github.com/treeverse/lakeFS-samples/blob/main/01_standalone_examples/azure-databricks/)
* [AWS **Databricks**](https://github.com/treeverse/lakeFS-samples/blob/main/01_standalone_examples/aws-databricks/)
* [AWS **Glue and Athena**](https://github.com/treeverse/lakeFS-samples/blob/main/01_standalone_examples/aws-glue-athena/)
* [lakeFS + **Dagster**](https://github.com/treeverse/lakeFS-samples/blob/main/01_standalone_examples/dagster-integration/)
* [lakeFS + **Prefect**](https://github.com/treeverse/lakeFS-samples/blob/main/01_standalone_examples/prefect-integration/)
* [**Labelbox** integration](https://github.com/treeverse/lakeFS-samples/blob/main/01_standalone_examples/labelbox-integration/)
* [How to **migrate or clone** a repo](https://github.com/treeverse/lakeFS-samples/blob/main/01_standalone_examples/migrate-or-clone-repo/)

### Notebooks for Demos and Conference Talks

_These may have existing environment or data requirements. They're included here so that you can see their contents and be inspired by them, but they may not run properly._

* [**Chaos Engineering: Books Demo (Data + AI Summit 2022)**](./demos-and-talks/chaos-engineering_dais22.ipynb)
* [**Delta Lake / lakeFS Multi-table transaction support**](./demos-and-talks/lakeFS-DeltaLake-multi-table-transaction-consistency.ipynb)
* [**lakeFS ❤️ Azure Synapse**](./demos-and-talks/lakeFSOnSynapse.ipynb)

## lakeFS Quickstart

* [⭐**lakeFS Quickstart**⭐](https://docs.lakefs.io/quickstart/)

## Got Questions or Want to Chat?

**👉🏻 Join the lakeFS Slack group - https://lakefs.io/slack**