# lakeFS ❤️ Azure Synapse

## 👩‍🔬 So what are we going to do today?

1. Learn how to read/write data from a lakeFS branch
1. Create our own isolated branch and play around with said data
1. Make a terrible mistake but then promptly undo it
1. Clean some data, then commit and tag it!
1. Prove that our tag is fully reproducible even if our branch changed

In [None]:
# Read data from a lakeFS branch!
df = spark.read.format("delta").load("lakefs://ml-data-repo/main/datasets/covid_delta/")
df.show(n=5)
print(f'we have a total of {df.count()} rows on main!')
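
Every path in this notebook has the same shape: `lakefs://<repository>/<ref>/<path>`, where the ref is a branch name such as `main` (or, later on, a tag). As a minimal sketch, a small helper can build these paths so that switching refs means changing one argument. The `lakefs_uri` function is a hypothetical convenience for this notebook, not part of lakeFS or Spark:

```python
def lakefs_uri(repo: str, ref: str, path: str = "") -> str:
    """Build a lakefs:// URI from a repository, a ref (branch or tag), and an object path.

    Hypothetical helper for illustration; it only formats a string.
    """
    if not repo or not ref:
        raise ValueError("repo and ref must be non-empty")
    # Drop a leading slash so we never produce a double slash after the ref
    return f"lakefs://{repo}/{ref}/{path.lstrip('/')}"


print(lakefs_uri("ml-data-repo", "main", "datasets/covid_delta/"))
# prints lakefs://ml-data-repo/main/datasets/covid_delta/
```

The result can be passed straight to `spark.read.format("delta").load(...)` in place of the literal string.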

### 🌴 Let's create an isolated branch to experiment on!

```sh
$ lakectl branch create lakefs://ml-data-repo/ozk-dev --source lakefs://ml-data-repo/main 
```

Or, you know, [do it through the UI](http://azure-demo.lakefs.io/repositories/ml-data-repo/branches)
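
If you would rather stay inside the notebook, the same command can be assembled in Python and handed to `subprocess`. This is only a sketch: it assumes `lakectl` is installed and configured on the machine running the notebook, and `branch_create_cmd` and `run_lakectl` are hypothetical helper names. The arguments mirror the `lakectl branch create` command shown above.

```python
import shlex
import shutil
import subprocess
from typing import List


def branch_create_cmd(repo: str, branch: str, source: str = "main") -> List[str]:
    """Return the argument list for the `lakectl branch create` command shown above."""
    return [
        "lakectl", "branch", "create", f"lakefs://{repo}/{branch}",
        "--source", f"lakefs://{repo}/{source}",
    ]


def run_lakectl(cmd: List[str]) -> str:
    """Run a lakectl command and return its stdout; fail early if the CLI is missing."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


cmd = branch_create_cmd("ml-data-repo", "ozk-dev")
print(shlex.join(cmd))
# prints lakectl branch create lakefs://ml-data-repo/ozk-dev --source lakefs://ml-data-repo/main

# To actually create the branch (requires lakectl and valid credentials):
# print(run_lakectl(cmd))
```

Passing the command as a list rather than a single string avoids shell quoting problems if a branch name ever contains unusual characters.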

In [None]:
# Let's read the same data from our dev branch!
df = spark.read.format("delta").load("lakefs://ml-data-repo/ozk-dev/datasets/covid_delta/")
df.show(n=5)
print(f'we have a total of {df.count()} rows on ozk-dev!')

In [None]:
# Time to make a terrible mistake!
df = df.filter("deaths > 1000")
df.write.format("delta").mode("overwrite").save("lakefs://ml-data-repo/ozk-dev/datasets/covid_delta/")  # Notice the "ozk-dev" branch!

### 🤔 OK, what changed?

We can run a diff and see uncommitted changes on our branch:

```sh
$ lakectl diff "lakefs://ml-data-repo/ozk-dev"
```

Or again, [through the UI](http://azure-demo.lakefs.io/repositories/ml-data-repo/changes?ref=ozk-dev)


In [None]:
# Let's explore the dataset again
df = spark.read.format("delta").load("lakefs://ml-data-repo/ozk-dev/datasets/covid_delta/")
df.show(n=5)
print(f'we have a total of {df.count()} rows on ozk-dev!')

### 😨 Reverting changes

OK, so that's not what we wanted. Let's undo all uncommitted changes on our `ozk-dev` branch:

```sh
$ lakectl branch reset "lakefs://ml-data-repo/ozk-dev"
```

In [None]:
# Making sure we're good:
df = spark.read.format("delta").load("lakefs://ml-data-repo/ozk-dev/datasets/covid_delta/")
df.show(n=5)
print(f'we have a total of {df.count()} rows on ozk-dev!')

In [None]:
# Let's make the transformation we wanted:
df = df.filter("deaths > 0")  # Is it obvious I'm no data scientist?
df.write.format("delta").mode("overwrite").save("lakefs://ml-data-repo/ozk-dev/datasets/covid_delta/")
print(f'we have a total of {df.count()} rows on ozk-dev!')

## 🔖 Committing and tagging

Great! We like this input data in its current state. Let's commit it and tag it, so we can refer to it later:

```bash
$ lakectl commit "lakefs://ml-data-repo/ozk-dev" -m "data cleaning: show only records with deaths > 0"
```

Now, let's tag it:

```bash
$ lakectl tag create "lakefs://ml-data-repo/ozk-experiment-covid-202206" "lakefs://ml-data-repo/ozk-dev"
```



In [None]:
# Reading from our tag
df = spark.read.format("delta").load("lakefs://ml-data-repo/ozk-experiment-covid-202206/datasets/covid_delta/")
df.show(n=5)
print(f'we have a total of {df.count()} rows on *ozk-experiment-covid-202206*!')

In [None]:
# Let's make a mess YET AGAIN:
df = df.filter("deaths > 1000") 
df.write.format("delta").mode("overwrite").save("lakefs://ml-data-repo/ozk-dev/datasets/covid_delta/")  # On our branch
print(f'we have a total of {df.count()} rows on ozk-dev!')

In [None]:
# In the meantime, on our tag...
df = spark.read.format("delta").load("lakefs://ml-data-repo/ozk-experiment-covid-202206/datasets/covid_delta/")
print(f'we have a total of {df.count()} rows on *ozk-experiment-covid-202206*!')
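
To see the tag and the branch side by side instead of in separate cells, the two reads can be wrapped in one function. The sketch below is an assumption-laden convenience, not a lakeFS API: `count_rows_by_ref` is a hypothetical name, and `load` is any callable that takes a `lakefs://` URI and returns an object with a `.count()` method, such as a Spark DataFrame.

```python
from typing import Callable, Dict, Iterable


def count_rows_by_ref(
    load: Callable[[str], object],
    repo: str,
    refs: Iterable[str],
    path: str,
) -> Dict[str, int]:
    """Return {ref: row count} for the same dataset path read from several refs.

    Hypothetical helper: `load` must return something exposing `.count()`.
    """
    return {ref: load(f"lakefs://{repo}/{ref}/{path}").count() for ref in refs}


# In this notebook (assumes the `spark` session used in the cells above):
# counts = count_rows_by_ref(
#     lambda uri: spark.read.format("delta").load(uri),
#     "ml-data-repo",
#     ["ozk-experiment-covid-202206", "ozk-dev"],
#     "datasets/covid_delta/",
# )
# print(counts)
```

If the tag is doing its job, its count stays the same no matter how many times `ozk-dev` is overwritten.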

## Yay! What's next?

* [Try lakeFS out](https://docs.lakefs.io/)
* [Read more about how it works](https://docs.lakefs.io/understand/architecture.html)
* [Star it on GitHub ⭐️❤️](https://github.com/treeverse/lakeFS)
* [Join the lakeFS community on Slack](https://lakefs.io/slack)
