# Overview

[LakeFS](https://docs.lakefs.io/) is a storage solution that provides version control for data lakes. It sits on top of a data lake as an "overlay filesystem", meaning it provides an API that maps version information to the physical data stored in the underlying data store.



# Features

- A Git-like interface
- No data duplication
- An S3-compatible API

LakeFS can use any of the following storage services as its underlying data store:
- AWS S3
- S3-compatible stores such as MinIO or Ceph
- Azure Blob Storage (ABS)
- Google Cloud Storage (GCS)
- Local Storage


LakeFS integrates directly with popular data frameworks:
- Spark
- Hive
- dbt
- Trino
- and many others

# Architecture

In the simplest terms, LakeFS stores data and some metadata on the underlying data store, while other metadata (mostly version information) is stored in a PostgreSQL database. It also has a number of other components that make it easier to use.

## Components

- **S3 Gateway** - LakeFS implements a compatible subset of the S3 API to ensure most data systems can use lakeFS as a drop-in replacement for S3.

- **OpenAPI Server** - The Swagger (OpenAPI) server exposes the full set of lakeFS operations including basic CRUD operations against repositories and objects, as well as versioning related operations such as branching, merging, committing and reverting changes to data.

- **Storage Adapter** - an abstraction layer for communicating with any underlying object store.

- **Graveler** - handles lakeFS versioning by translating lakeFS addresses to the actual stored objects.

- **Authentication & Authorization Service**

- **Hooks Engine** - enables CI/CD for data by triggering user-defined actions (hooks) that run during a commit or merge.

- **UI** - a simple browser-based client that uses the OpenAPI server to provide access to repositories, branches, commits and objects in the system.
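
To make the S3 Gateway's "drop-in replacement" idea concrete, here is a minimal sketch of how a lakeFS object might be addressed by an S3 client. It assumes the gateway treats the repository name as the bucket and puts the branch (or commit ID) in front of the object key. The function name `to_s3_gateway_address` and the sample names are hypothetical and not part of lakeFS.

```python
from typing import Tuple


def to_s3_gateway_address(repo: str, ref: str, object_path: str) -> Tuple[str, str]:
    """Return the (bucket, key) pair an S3 client would use for a lakeFS object.

    Assumption: the repository plays the role of the bucket, and the branch
    or commit ID becomes the first component of the key.
    """
    return repo, f"{ref}/{object_path.lstrip('/')}"


# Hypothetical repository, branch, and object path.
bucket, key = to_s3_gateway_address("example-repo", "main", "tables/events.parquet")
```

An S3 client pointed at the lakeFS endpoint would then read `key` from `bucket` as it would from any other S3 bucket.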


The official documentation can be found [here](https://docs.lakefs.io/understand/architecture.html).

# Deployment

The available deployment options are described in the [official documentation](https://docs.lakefs.io/understand/architecture.html#ways-to-deploy-lakefs).



# How Does LakeFS Versioning Work

With LakeFS, as with Git, the largest container is the repository. A repository is made up of branches and commits, and the branches and commits point to files.

As such, to access a particular file we need to know not only its relative path, but also the name of the repository that holds the file and the branch or commit corresponding to the desired version of that file.

We will see that the LakeFS API builds this information into the path of a given file. As such, when we refer to data we would use paths resembling the following:

- Repositories: `lakefs://<repo-name>`
- Commits: `lakefs://<repo-name>@<commit-id>`
- Branches: `lakefs://<repo-name>@<branch-id>`
- Files (objects): `lakefs://<repo-name>@<branch-id>/<object path>`

Reading a file might then resemble this:

```python
df = spark.read.parquet('lakefs://<repo-name>@<branch-id>/<object path>')
```
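
The path layout above can be sketched as a small parser and builder. This is only an illustration of the `lakefs://<repo-name>@<ref>/<object path>` form used in this document. The names `LakeFSPath`, `parse_lakefs_path` and `build_lakefs_path` are hypothetical helpers, not part of any lakeFS client, and the URI syntax may differ between lakeFS releases, so check the documentation for the version you run.

```python
import re
from typing import NamedTuple, Optional


class LakeFSPath(NamedTuple):
    repo: str
    ref: Optional[str]   # branch name or commit ID
    path: Optional[str]  # object path inside the ref


# Matches lakefs://<repo>[@<ref>[/<object path>]] as written in this document.
_PATTERN = re.compile(
    r"^lakefs://(?P<repo>[^@/]+)(?:@(?P<ref>[^/]+)(?:/(?P<path>.+))?)?$"
)


def parse_lakefs_path(uri: str) -> LakeFSPath:
    """Split a lakeFS path into repository, ref, and object path."""
    match = _PATTERN.match(uri)
    if match is None:
        raise ValueError(f"not a lakeFS path: {uri!r}")
    return LakeFSPath(match["repo"], match["ref"], match["path"])


def build_lakefs_path(repo: str, ref: Optional[str] = None,
                      path: Optional[str] = None) -> str:
    """Assemble a lakeFS path; an object path is only valid together with a ref."""
    uri = f"lakefs://{repo}"
    if ref is not None:
        uri += f"@{ref}"
        if path is not None:
            uri += "/" + path.lstrip("/")
    return uri
```

For example, `parse_lakefs_path("lakefs://example-repo@main/tables/events.parquet")` yields the repository `example-repo`, the ref `main`, and the object path `tables/events.parquet`.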

## What Are LakeFS Commits
A LakeFS commit maps metadata and a LakeFS file path to an actual file path. In the documentation this is sometimes described as mapping names to objects, or keys to values. As we will see, if a file's path changes between two commits, the two different keys point to the same underlying file. If the data changes, the same key points to a different file on the underlying filesystem in each commit.

<center><img src='images/commit-example.png'></center>
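
The same idea can be sketched with plain dictionaries. This is a simplified model, not the lakeFS storage format: each commit is a mapping from a lakeFS key to a physical address, and every key and address below is made up for illustration.

```python
from typing import Dict

Commit = Dict[str, str]  # lakeFS key -> physical address in the object store

# Hypothetical first commit with two files.
commit_1: Commit = {
    "data/users.parquet": "s3://bucket/objects/aaa111",
    "data/orders.parquet": "s3://bucket/objects/bbb222",
}

# Hypothetical second commit: users.parquet was moved (new key, same object),
# while orders.parquet was rewritten (same key, new object).
commit_2: Commit = {
    "archive/users.parquet": "s3://bucket/objects/aaa111",
    "data/orders.parquet": "s3://bucket/objects/ccc333",
}

# A moved file costs no extra storage: both keys resolve to the same object.
moved_shares_object = commit_1["data/users.parquet"] == commit_2["archive/users.parquet"]

# A changed file keeps its key but resolves to a different object per commit.
changed_has_new_object = commit_1["data/orders.parquet"] != commit_2["data/orders.parquet"]

# Only three physical objects exist across both commits.
physical_objects = set(commit_1.values()) | set(commit_2.values())
```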

The example above is a simplified representation of commits. In this section we go deeper down the rabbit hole. As we will see, LakeFS commits are stored as a B+ tree of SSTables.

### What are SSTables
TL;DR: an SSTable (Sorted String Table) is a sorted, immutable key-value store persisted as a file.

SSTable refers to a data structure and the corresponding persistent file format. It is used by a number of NoSQL databases, specifically distributed database systems and key-value stores built on log-structured merge-trees (LSM trees), such as ScyllaDB, Apache Cassandra, and Bigtable.

An SSTable provides a persistent, ordered, immutable map from keys to values, where both keys and values are arbitrary byte strings. Like any data structure, SSTable implementations provide operations for accessing and managing the data, for example methods to look up the value associated with a specified key or to iterate over all key/value pairs in a specified key range.

An SSTable is partitioned into blocks and provides a block index. The index is loaded into memory when the SSTable file is opened, and it is used to locate a given block without excessive disk seeks (i.e. searching the disk). Additionally, if resources allow, the entire SSTable can be loaded into memory, avoiding the disk altogether during a search.
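
The block and index idea can be sketched in a few lines of Python. This is an in-memory toy rather than a real SSTable implementation: the class name `ToySSTable`, the fixed `block_size`, and the choice to index each block by its last key are all assumptions made for illustration.

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple


class ToySSTable:
    """In-memory stand-in for an SSTable: sorted, immutable, split into blocks."""

    def __init__(self, items: Dict[bytes, bytes], block_size: int = 2) -> None:
        ordered = sorted(items.items())
        # Each block holds up to block_size consecutive key/value pairs.
        self._blocks: List[List[Tuple[bytes, bytes]]] = [
            ordered[i:i + block_size] for i in range(0, len(ordered), block_size)
        ]
        # Block index: the last key of every block, kept in memory.
        self._index: List[bytes] = [block[-1][0] for block in self._blocks]

    def get(self, key: bytes) -> Optional[bytes]:
        """Return the value stored under key, or None if it is absent."""
        # The first block whose last key is >= key is the only candidate.
        i = bisect.bisect_left(self._index, key)
        if i == len(self._blocks):
            return None
        for k, v in self._blocks[i]:
            if k == key:
                return v
        return None

    def scan(self, start: bytes, end: bytes) -> Iterator[Tuple[bytes, bytes]]:
        """Yield pairs with start <= key < end, in key order."""
        i = bisect.bisect_left(self._index, start)
        for block in self._blocks[i:]:
            for k, v in block:
                if k >= end:
                    return
                if k >= start:
                    yield k, v
```

A lookup touches the in-memory index first and then reads a single block, which is what keeps the number of disk seeks low in a real SSTable.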

Helpful discussions of the topic can be found in [this Stack Overflow question](https://stackoverflow.com/questions/2576012/what-is-an-sstable), [this article](http://distributeddatastore.blogspot.com/2013/08/cassandra-sstable-storage-format.html), and [the Wikipedia entry on log-structured merge-trees](https://en.wikipedia.org/wiki/Log-structured_merge-tree).

### Commits Are Stored as a B+ Tree of SSTables (i.e. Graveler Files)

Each lakeFS commit is represented as a tree structure, specifically a B+ tree of height 2. This is done for speed. The namespace of keys (the list of file paths) is sorted and split into blocks, or ranges. Each range is stored in its own SSTable at level 2. Because a range is sorted, it has a start key and an end key that mark its boundaries, and ranges do not overlap. The root of the tree (level 1) contains a sorted list of the last keys of all the ranges and maps each one to the corresponding range. With this structure we can perform a faster lookup of a file in the repository: we seek on the root to find the range, and then seek within that range. This is faster than potentially seeking through the entire table.

We can see an example commit below:

<center><img src='images/lakefs-commit-btree.png'></center>
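
The two-seek lookup can be sketched as follows. This is a toy model that keeps everything in Python lists instead of SSTable files. The names `build_commit` and `lookup`, the fixed `range_size`, and the sample keys are hypothetical; lakeFS decides where to split ranges differently.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Range = List[Tuple[str, str]]  # sorted (key, physical address) pairs


def build_commit(entries: Dict[str, str], range_size: int) -> Tuple[List[str], List[Range]]:
    """Split the sorted entries into ranges and index each range by its last key."""
    ordered = sorted(entries.items())
    ranges = [ordered[i:i + range_size] for i in range(0, len(ordered), range_size)]
    root = [r[-1][0] for r in ranges]  # level 1: last key of every range
    return root, ranges


def lookup(root: List[str], ranges: List[Range], key: str) -> Optional[str]:
    """Find the physical address for key using one seek per tree level."""
    # Seek 1: on the root, find the first range whose last key is >= key.
    i = bisect.bisect_left(root, key)
    if i == len(ranges):
        return None
    # Seek 2: binary search inside that single range.
    keys = [k for k, _ in ranges[i]]
    j = bisect.bisect_left(keys, key)
    if j < len(keys) and keys[j] == key:
        return ranges[i][j][1]
    return None


# Hypothetical keys and addresses.
root, ranges = build_commit(
    {"a.csv": "obj1", "b.csv": "obj2", "c.csv": "obj3", "d.csv": "obj4", "e.csv": "obj5"},
    range_size=2,
)
```

Here `root` holds one key per range, so a lookup such as `lookup(root, ranges, "c.csv")` inspects the root and exactly one range, never the whole key space.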

The commit is stored in a standardized format called “Graveler”. Thus the SSTable files are referred to as graveler files. To be even more specific, LakeFS uses the RocksDB SSTable file format, implemented by the Pebble SSTable library from CockroachDB.

More information on this file format can be found [here](https://docs.lakefs.io/understand/versioning-internals.html) or [here](https://lakefs.io/concrete-graveler-committing-data-to-pebbledb-sstables) or [here](https://lakefs.io/concrete-graveler-splitting-for-reuse/).