# Overview
In this notebook we work with Pachyderm to set up our first repository and walk through a basic branching strategy.

Normally I would show the workflow with both the CLI and the web UI, but unfortunately there seem to be some bugs in the web UI.

The basic workflow we will go through is:
- create a repo
- upload a document
- make a commit
- create a branch
- modify a branch
- delete a file
- merge branches

# 1. Prerequisites
This assumes we have a working installation and followed the steps in the [installation notebook](BInstalling%20Pachyderm.ipynb).

# 2. Create A Project

As mentioned in the [Introduction To Pachyderm notebook](Intro%20To%20Pachyderm.ipynb), the Project is the highest level "container". All the things we put into Pachyderm (pipelines, repos, branches, data) exist inside a Project.

When Pachyderm is initially deployed, it starts with a default project. Similarly, the CLI is configured to point to this default Project.

<center><img src="images/pachyderm-create-project.png"></center>


## 2.1. Using the GUI

We can create a project as follows:

<center><img src="images/pachyderm-create-project.png"></center>
<center><img src="images/pachyderm-create-project-2.png"></center>
<center><img src="images/pachyderm-create-project-3.png"></center>


## 2.2. Using the CLI

Note: When we create objects through the CLI they are also visible in the UI.

```
[root@os004k8-master001 ~]# pachctl create project my-new-project
[root@os004k8-master001 ~]# pachctl list projects
ACTIVE PROJECT        DESCRIPTION
       my-new-project -
       default        -
```

# 3. Create A Repository

Once we have created a Project, we can attach one or more repositories to the Project.


## 3.1. Using the GUI

It's hard to see in the relatively empty project, but Pachyderm represents the project as a DAG where the nodes are the data repositories and the edges are the pipelines associated with the project.

<center><img src="images/pachyderm-create-repo.png"></center>
<center><img src="images/pachyderm-create-repo-2.png"></center>
<center><img src="images/pachyderm-create-repo-3.png"></center>
<center><img src="images/pachyderm-create-repo-4.png"></center>
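
To make the DAG idea mentioned above more concrete, here is a minimal sketch in Python. The repo and pipeline names are hypothetical (none of them exist in this notebook's project). Following the description above, repos are nodes and each pipeline is an edge from an input repo to an output repo.

```python
from graphlib import TopologicalSorter

# Hypothetical pipelines: name -> (input repo, output repo).
pipelines = {
    "clean": ("raw-data", "clean-data"),
    "train": ("clean-data", "model"),
}

# Map each repo (node) to the set of repos it is derived from.
graph: dict[str, set[str]] = {}
for source, target in pipelines.values():
    graph.setdefault(source, set())
    graph.setdefault(target, set()).add(source)

# A topological order lists upstream repos before the repos built from them.
order = list(TopologicalSorter(graph).static_order())
print(order)
```

With only one repo and no pipelines, as in our project so far, the graph is a single node, which is why the DAG is hard to see in the UI.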

## 3.2. Using the CLI

The CLI uses a concept called a context to determine what it operates on. Specifically, the context tells the CLI which Pachyderm cluster to communicate with and which project within that cluster to operate on.

We can list the available contexts and see which one is active using the following commands:

```
[root@os004k8-master001 ~]# pachctl config list context
ACTIVE  NAME
*       default
[root@os004k8-master001 ~]# pachctl config get active-context
default
```

We see that our CLI is pointing to our default context. We can inspect the context as follows:

```
[root@os004k8-master001 ~]# cat ~/.pachyderm/config.json
{
  "user_id": "a8bc65a182ed4ab8abc6d3105a34e59d",
  "v2": {
    "active_context": "default",
    "contexts": {
      "default": {
        "cluster_deployment_id": "eBwzPUB61DxgnrFoBQ2g7JWeD3tbYTTu",
        "project": "default"
      }
    },
    "metrics": true
  }
}
```

We see that pachctl currently has only one context (named default), and that it points to a given cluster and targets a given project (also named default).
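
The same lookup can be done programmatically. The sketch below parses the config text copied from the transcript above and reads the active context and its project. The function name `active_project` is made up for this example, and the code assumes the `v2` layout shown above; other pachctl versions may structure the file differently.

```python
import json

# Config text copied from the transcript above.
CONFIG_TEXT = """
{
  "user_id": "a8bc65a182ed4ab8abc6d3105a34e59d",
  "v2": {
    "active_context": "default",
    "contexts": {
      "default": {
        "cluster_deployment_id": "eBwzPUB61DxgnrFoBQ2g7JWeD3tbYTTu",
        "project": "default"
      }
    },
    "metrics": true
  }
}
"""


def active_project(config: dict) -> tuple[str, str]:
    """Return (context name, project) for the active context."""
    v2 = config["v2"]
    name = v2["active_context"]
    return name, v2["contexts"][name]["project"]


context, project = active_project(json.loads(CONFIG_TEXT))
print(f"context={context} project={project}")
```

To run this against a real installation, the text could be read from `~/.pachyderm/config.json` instead of the inline string.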

We now want to create a Repository in the Project we created in the previous step. To do this, we need to tell the CLI which project to target. There are two ways to do this: we can update our context so that it targets the desired project, or we can append `--project <my-project>` to the CLI commands we enter. I will opt for the former.

```
[root@os004k8-master001 ~]# pachctl list project
ACTIVE PROJECT        DESCRIPTION
       my-new-project -
       default        -
[root@os004k8-master001 ~]# pachctl config update context --project my-new-project
editing the currently active context "default"
[root@os004k8-master001 ~]# pachctl list project
ACTIVE PROJECT        DESCRIPTION
*      my-new-project -
       default        -
[root@os004k8-master001 ~]# cat ~/.pachyderm/config.json
{
  "user_id": "a8bc65a182ed4ab8abc6d3105a34e59d",
  "v2": {
    "active_context": "default",
    "contexts": {
      "default": {
        "cluster_deployment_id": "eBwzPUB61DxgnrFoBQ2g7JWeD3tbYTTu",
        "project": "my-new-project"
      }
    },
    "metrics": true
  }
}
```

We can see the CLI now indicates which project is being targeted and that the context config file has been updated. We can now create the repo:

```
[root@os004k8-master001 ~]# pachctl create repo my-new-repo
[root@os004k8-master001 ~]# pachctl list repos
PROJECT        NAME        CREATED       SIZE (MASTER) DESCRIPTION
my-new-project my-new-repo 5 seconds ago ≤ 0B
```
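
For completeness, the second option (passing `--project` on each command instead of changing the context) could be wrapped in a small helper when scripting. This is only a sketch: `pachctl_args` is a hypothetical name, and the code only builds the argument list without executing anything.

```python
import shlex
from typing import Optional


def pachctl_args(*args: str, project: Optional[str] = None) -> list[str]:
    """Build a pachctl argument list, optionally scoped to one project."""
    cmd = ["pachctl", *args]
    if project is not None:
        # Per-command alternative to updating the active context.
        cmd += ["--project", project]
    return cmd


cmd = pachctl_args("create", "repo", "my-new-repo", project="my-new-project")
print(shlex.join(cmd))
```

The resulting list could be handed to `subprocess.run` on a machine where `pachctl` is installed.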

# 4. Create A Branch

The simplest way to create a branch is to commit a file to it. As with git, empty directories are not supported.

**Note**: Branch names are limited: only alphanumeric characters, underscores, and dashes are allowed.
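
A quick way to check a name against the rule in the note above is a regular expression. This sketch encodes only the character rule stated in this notebook; Pachyderm may enforce additional constraints (for example on length) that are not covered here, and the function name is made up for the example.

```python
import re

# Letters, digits, underscores, and dashes only, per the note above.
BRANCH_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_branch_name(name: str) -> bool:
    """Return True if the name uses only the allowed characters."""
    return BRANCH_NAME.fullmatch(name) is not None


for candidate in ("new-branch", "branch_a", "feature/x"):
    print(candidate, is_valid_branch_name(candidate))
```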

## 4.1. Using the GUI (Don't, It's Broken)

The GUI is meant to let us upload files and automatically commit them to a branch. Note that uploading through the UI is fairly slow (I waited ~5 minutes to upload an empty file). The uploaded file always shows as empty, regardless of whether it has content, and the browser needs to be refreshed to show any new content.

<center><img src="images/pachyderm-create-branch.png"></center>
<center><img src="images/pachyderm-create-branch-2.png"></center>
<center><img src="images/pachyderm-create-branch-3.png"></center>

Note: This will take several minutes.

<center><img src="images/pachyderm-create-branch-4.png"></center>
<center><img src="images/pachyderm-create-branch-5.png"></center>

We see the UI is not updating with information about the new commit.

<center><img src="images/pachyderm-create-branch-6.png"></center>
<center><img src="images/pachyderm-create-branch-7.png"></center>
<center><img src="images/pachyderm-create-branch-8.png"></center>



## 4.2. Using The CLI

```
[root@os004k8-master001 ~]# echo "Testing..." > /tmp/test.txt
[root@os004k8-master001 ~]# pachctl put file my-new-repo@master:/test.txt -f /tmp/test.txt
/tmp/test.txt 11.00 b / 11.00 b [========================================================================] 0s 0.00 b/s
```

<center><img src="images/pachyderm-create-branch-cli.png"></center>
<center><img src="images/pachyderm-create-branch-cli-2.png"></center>
<center><img src="images/pachyderm-create-branch-cli-3.png"></center>


# 5. Update Branch
## 5.1. Using The GUI (Don't, It's Broken)
For some reason, when I upload any file, the GUI shows it as 0 bytes, and the upload takes a very long time (~5 min) to complete.

## 5.2. Using The CLI
```
[root@os004k8-master001 ~]# echo "Some new text..." > /tmp/test.txt
[root@os004k8-master001 ~]# pachctl put file my-new-repo@master:/test.txt -f /tmp/test.txt
/tmp/test.txt 17.00 b / 17.00 b [========================================================================] 0s 0.00 b/s
```
<center><img src="images/pachyderm-update-branch-cli.png"></center>
<center><img src="images/pachyderm-update-branch-cli-2.png"></center>
<center><img src="images/pachyderm-update-branch-cli-3.png"></center>
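
The sizes reported in the upload progress lines (11 and 17 bytes) are one byte larger than the visible text because `echo` appends a newline. A small check, with a helper name invented for this example:

```python
def echo_size(text: str) -> int:
    """Size in bytes of the file that `echo "<text>" > file` produces."""
    # echo writes the text followed by a single newline character.
    return len((text + "\n").encode("utf-8"))


for text in ("Testing...", "Some new text..."):
    print(f"{text!r}: {echo_size(text)} bytes")
```

These values also match the SIZE column that `pachctl list commit` reports for the two commits.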



# 6. Branching And Merging (Simple)
As mentioned in the [Introduction Notebook](Intro%20To%20Pachyderm.ipynb), there is no real concept of merging. If we want to actually diff and merge data, we must do that on our own. I will cover that scenario in the next section.

## 6.2. Using The CLI
We can see branch information in our repo as follows:
```
[root@os004k8-master001 ~]# pachctl list branch my-new-repo
BRANCH HEAD                             TRIGGER
master 32f79266112947e3b8cf829ad01b62b1 -
[root@os004k8-master001 ~]# pachctl list commit my-new-repo@master
PROJECT        REPO        BRANCH COMMIT                           FINISHED       SIZE ORIGIN DESCRIPTION
my-new-project my-new-repo master 32f79266112947e3b8cf829ad01b62b1 19 minutes ago 17B  USER
my-new-project my-new-repo master 19f4e83464564510991331ff69b5688f 22 minutes ago 11B  USER
```

We can then create a new branch off master:

```
[root@os004k8-master001 ~]# pachctl create branch my-new-repo@new-branch --head master
[root@os004k8-master001 ~]# pachctl list branch my-new-repo
BRANCH     HEAD                             TRIGGER
new-branch 32f79266112947e3b8cf829ad01b62b1 -
master     32f79266112947e3b8cf829ad01b62b1 -
[root@os004k8-master001 ~]# pachctl list commit my-new-repo@new-branch
PROJECT        REPO        BRANCH COMMIT                           FINISHED       SIZE ORIGIN DESCRIPTION
my-new-project my-new-repo master 32f79266112947e3b8cf829ad01b62b1 32 minutes ago 17B  USER
my-new-project my-new-repo master 19f4e83464564510991331ff69b5688f 34 minutes ago 11B  USER
```

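We can see this branch retains the history of master. We can then make a new commit and see the history updated to reflect it. The transcript does not show the step that changed `/tmp/test.txt`, but the 16-byte upload and the file content retrieved from this branch later in the notebook indicate that it contained `My Third Commit` at this point.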

```
[root@os004k8-master001 ~]# pachctl put file my-new-repo@new-branch:/test.txt -f /tmp/test.txt
/tmp/test.txt 16.00 b / 16.00 b [========================================================================] 0s 0.00 b/s
[root@os004k8-master001 ~]# pachctl list commit my-new-repo@new-branch
PROJECT        REPO        BRANCH     COMMIT                           FINISHED       SIZE ORIGIN DESCRIPTION
my-new-project my-new-repo new-branch 8de36303cce541db8bd8be09dbcd120a 2 seconds ago  16B  USER
my-new-project my-new-repo master     32f79266112947e3b8cf829ad01b62b1 39 minutes ago 17B  USER
my-new-project my-new-repo master     19f4e83464564510991331ff69b5688f 41 minutes ago 11B  USER
```

Now we can "merge" back into master by updating the branch pointer. This is done by creating the branch master again, this time with the new commit as its head:

```
[root@os004k8-master001 ~]# pachctl create branch my-new-repo@master --head 8de36303cce541db8bd8be09dbcd120a
[root@os004k8-master001 ~]# pachctl list commit my-new-repo@master
PROJECT        REPO        BRANCH     COMMIT                           FINISHED       SIZE ORIGIN DESCRIPTION
my-new-project my-new-repo new-branch 8de36303cce541db8bd8be09dbcd120a 3 minutes ago  16B  USER
my-new-project my-new-repo master     32f79266112947e3b8cf829ad01b62b1 42 minutes ago 17B  USER
my-new-project my-new-repo master     19f4e83464564510991331ff69b5688f 44 minutes ago 11B  USER
```
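
The pointer behaviour above can be pictured with a very small model. This is a simplified mental model written for this notebook, not a description of how Pachyderm stores commits internally; the commit IDs are abbreviated from the transcript and the `Commit` class and `history` function are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    id: str
    parent: Optional["Commit"] = None


def history(head: Commit) -> list[str]:
    """Walk parent links from a head commit back to the first commit."""
    ids = []
    node: Optional[Commit] = head
    while node is not None:
        ids.append(node.id)
        node = node.parent
    return ids


first = Commit("19f4e834")
second = Commit("32f79266", parent=first)
branches = {"master": second}

# create branch new-branch --head master: the new branch points at the same commit.
branches["new-branch"] = branches["master"]

# put file on new-branch: a new commit whose parent is the old head.
branches["new-branch"] = Commit("8de36303", parent=branches["new-branch"])

# "Merge": re-point master at the head of new-branch. No content is combined.
branches["master"] = branches["new-branch"]

print(history(branches["master"]))
```

Because only a pointer moves, anything committed to master in the meantime would no longer be reachable from it, which is the problem the complex scenario later in this notebook deals with.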


### 6.2.1. Warning: Forgetting To Specify --head When Creating New Branch

If we do not specify a branch or commit ID with the `--head` argument when we create a branch, the branch will be empty and have no commit history. It can basically be thought of as a new integration branch.

```
[root@os004k8-master001 ~]# pachctl create branch my-new-repo@new-branch
[root@os004k8-master001 ~]# pachctl list branch my-new-repo
BRANCH     HEAD                             TRIGGER
new-branch 27e8a273ceab4f5cbf4292d6eb0fa915 -
master     32f79266112947e3b8cf829ad01b62b1 -
[root@os004k8-master001 ~]# pachctl list commit my-new-repo@new-branch
PROJECT        REPO        BRANCH     COMMIT                           FINISHED       SIZE ORIGIN DESCRIPTION
my-new-project my-new-repo new-branch 27e8a273ceab4f5cbf4292d6eb0fa915 17 seconds ago 0B   AUTO
[root@os004k8-master001 ~]# pachctl put file my-new-repo@new-branch:/test.txt -f /tmp/test.txt
/tmp/test.txt 16.00 b / 16.00 b [========================================================================] 0s 0.00 b/s
[root@os004k8-master001 ~]# pachctl list commit my-new-repo@new-branch
PROJECT        REPO        BRANCH     COMMIT                           FINISHED       SIZE ORIGIN DESCRIPTION
my-new-project my-new-repo new-branch dbf76266e8494b8080bcd0506a42cf42 2 seconds ago  16B  USER
my-new-project my-new-repo new-branch 27e8a273ceab4f5cbf4292d6eb0fa915 40 seconds ago 0B   AUTO
```

Notice the ORIGIN field is set to AUTO for the first commit.
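
If we wanted to detect such automatically created commits in a script, one option is to parse the `pachctl list commit` table. The following is a rough sketch. The regular expression is an assumption derived only from the rows shown in this notebook (32-character hex commit IDs, sizes such as `16B`, upper-case origins), so it may not cover every output the CLI can produce.

```python
import re

# Assumed row layout: four space-free fields, a FINISHED field that may
# contain spaces, then SIZE, ORIGIN, and an optional DESCRIPTION.
ROW = re.compile(
    r"^(?P<project>\S+)\s+(?P<repo>\S+)\s+(?P<branch>\S+)\s+"
    r"(?P<commit>[0-9a-f]{32})\s+(?P<finished>.+?)\s+"
    r"(?P<size>[\d.]+[A-Za-z]*B)\s+(?P<origin>[A-Z]+)\s*(?P<description>.*)$"
)

# Table copied from the transcript above.
SAMPLE = """\
PROJECT        REPO        BRANCH     COMMIT                           FINISHED       SIZE ORIGIN DESCRIPTION
my-new-project my-new-repo new-branch dbf76266e8494b8080bcd0506a42cf42 2 seconds ago  16B  USER
my-new-project my-new-repo new-branch 27e8a273ceab4f5cbf4292d6eb0fa915 40 seconds ago 0B   AUTO
"""


def parse_commits(table: str) -> list[dict[str, str]]:
    """Parse commit rows; the header and any unmatched lines are skipped."""
    rows = []
    for line in table.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


rows = parse_commits(SAMPLE)
auto_commits = [row["commit"] for row in rows if row["origin"] == "AUTO"]
print(auto_commits)
```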

# 7. Branching And Merging (Complex)
In this example we assume that there is data in both branches that we want to keep. The scenario: someone merged into master while our branch was open. We want to keep their changes and our own, so we will merge down and then merge up.

## 7.1. Using The CLI

We set up our two branches in parallel. First we create branch a:

```
[root@os004k8-master001 ~]# pachctl list commit my-new-repo@master
PROJECT        REPO        BRANCH     COMMIT                           FINISHED       SIZE ORIGIN DESCRIPTION
my-new-project my-new-repo new-branch 8de36303cce541db8bd8be09dbcd120a 11 minutes ago 16B  USER
my-new-project my-new-repo master     32f79266112947e3b8cf829ad01b62b1 50 minutes ago 17B  USER
my-new-project my-new-repo master     19f4e83464564510991331ff69b5688f 52 minutes ago 11B  USER
[root@os004k8-master001 ~]# pachctl create branch my-new-repo@branch-a --head master
[root@os004k8-master001 ~]# echo "Change from a" > /tmp/test.txt
[root@os004k8-master001 ~]# pachctl put file my-new-repo@branch-a:/test.txt -f /tmp/test.txt
/tmp/test.txt 14.00 b / 14.00 b [========================================================================] 0s 0.00 b/s
[root@os004k8-master001 ~]# pachctl list commit my-new-repo@branch-a
PROJECT        REPO        BRANCH     COMMIT                           FINISHED       SIZE ORIGIN DESCRIPTION
my-new-project my-new-repo branch-a   a83548f37a664d03a69768b9df891e7f 7 seconds ago  14B  USER
my-new-project my-new-repo new-branch 8de36303cce541db8bd8be09dbcd120a 13 minutes ago 16B  USER
my-new-project my-new-repo master     32f79266112947e3b8cf829ad01b62b1 52 minutes ago 17B  USER
my-new-project my-new-repo master     19f4e83464564510991331ff69b5688f 54 minutes ago 11B  USER
```

Then we create branch b:

```
[root@os004k8-master001 ~]# pachctl create branch my-new-repo@branch-b --head master
[root@os004k8-master001 ~]# echo "Change from b" > /tmp/test.txt
[root@os004k8-master001 ~]# pachctl put file my-new-repo@branch-b:/test.txt -f /tmp/test.txt
/tmp/test.txt 14.00 b / 14.00 b [========================================================================] 0s 0.00 b/s
[root@os004k8-master001 ~]# pachctl list commit my-new-repo@branch-b
PROJECT        REPO        BRANCH     COMMIT                           FINISHED       SIZE ORIGIN DESCRIPTION
my-new-project my-new-repo branch-b   5041b2efc3864a6d87aa509757395643 5 seconds ago  14B  USER
my-new-project my-new-repo new-branch 8de36303cce541db8bd8be09dbcd120a 13 minutes ago 16B  USER
my-new-project my-new-repo master     32f79266112947e3b8cf829ad01b62b1 52 minutes ago 17B  USER
my-new-project my-new-repo master     19f4e83464564510991331ff69b5688f 54 minutes ago 11B  USER
```

Now we diff the branches:

```
[root@os004k8-master001 ~]# pachctl diff file my-new-repo@branch-a:/test.txt my-new-repo@branch-b:/test.txt
diff --git a/tmp/test.txt_2719363711 b/tmp/test.txt_57355310
index 7185c3e..c990fdd 100644
--- a/tmp/test.txt_2719363711
+++ b/tmp/test.txt_57355310
@@ -1 +1 @@
-Change from b
+Change from a
```
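
The same kind of unified diff can be reproduced locally with Python's `difflib`, which may help when reading the output above. This is only an illustration with the two file contents typed in by hand. The file labels are made up, and treating branch-b as the old side is an assumption based on the direction of the `-` and `+` lines in the transcript.

```python
import difflib

old = ["Change from b\n"]  # content on branch-b (second argument above)
new = ["Change from a\n"]  # content on branch-a (first argument above)

diff = list(
    difflib.unified_diff(
        old, new, fromfile="branch-b/test.txt", tofile="branch-a/test.txt"
    )
)
print("".join(diff), end="")
```

Lines starting with `-` exist only in the old side and lines starting with `+` exist only in the new side.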

We look at the file contents:

```
[root@os004k8-master001 ~]# pachctl get file my-new-repo@branch-a:test.txt
Change from a
[root@os004k8-master001 ~]# pachctl get file my-new-repo@branch-b:test.txt
Change from b
```

We merge branch a into master:

```
[root@os004k8-master001 ~]# pachctl create branch my-new-repo@master --head a83548f37a664d03a69768b9df891e7f
[root@os004k8-master001 ~]# pachctl get file my-new-repo@master:test.txt
Change from a
[root@os004k8-master001 ~]# pachctl list commit my-new-repo@master
PROJECT        REPO        BRANCH     COMMIT                           FINISHED       SIZE ORIGIN DESCRIPTION
my-new-project my-new-repo branch-a   a83548f37a664d03a69768b9df891e7f 4 minutes ago  14B  USER
my-new-project my-new-repo new-branch 8de36303cce541db8bd8be09dbcd120a 17 minutes ago 16B  USER
my-new-project my-new-repo master     32f79266112947e3b8cf829ad01b62b1 56 minutes ago 17B  USER
my-new-project my-new-repo master     19f4e83464564510991331ff69b5688f 59 minutes ago 11B  USER
```

pachctl has a built-in diff utility that shows us the differences between files in different branches.

```
[root@os004k8-master001 ~]# pachctl diff file my-new-repo@branch-b:/test.txt my-new-repo@master:/test.txt
diff --git a/tmp/test.txt_4214818009 b/tmp/test.txt_111447614
index c990fdd..7185c3e 100644
--- a/tmp/test.txt_4214818009
+++ b/tmp/test.txt_111447614
@@ -1 +1 @@
-Change from a
+Change from b
```

The output shows that if we merged branch b into master, we would overwrite the change already in master. The utility does not understand that a new commit landed on master after we branched and that we might want to keep that change. Luckily, because our files are plain text, we can use `git merge-file` to compare them and show whether any merge conflicts exist.

```
[root@os004k8-master001 ~]# pachctl list commit my-new-repo@master
PROJECT        REPO        BRANCH     COMMIT                           FINISHED          SIZE ORIGIN DESCRIPTION
my-new-project my-new-repo branch-a   a83548f37a664d03a69768b9df891e7f 13 minutes ago    14B  USER
my-new-project my-new-repo new-branch 8de36303cce541db8bd8be09dbcd120a 26 minutes ago    16B  USER
my-new-project my-new-repo master     32f79266112947e3b8cf829ad01b62b1 About an hour ago 17B  USER
my-new-project my-new-repo master     19f4e83464564510991331ff69b5688f About an hour ago 11B  USER
[root@os004k8-master001 ~]# pachctl get file my-new-repo@branch-a:test.txt > /tmp/__test_branch-a.txt
[root@os004k8-master001 ~]# pachctl get file my-new-repo@branch-b:test.txt > /tmp/__test_branch-b.txt
[root@os004k8-master001 ~]# pachctl get file my-new-repo@new-branch:test.txt > /tmp/__test_original_master.txt
[root@os004k8-master001 ~]# cat /tmp/__test_*.txt
Change from a
Change from b
My Third Commit
[root@os004k8-master001 ~]# git merge-file /tmp/__test_branch-b.txt /tmp/__test_original_master.txt /tmp/__test_branch-a.txt --stdout
<<<<<<< /tmp/__test_branch-b.txt
Change from b
=======
Change from a
>>>>>>> /tmp/__test_branch-a.txt
```

Here we can see that the changes from branches a and b conflict. We can now resolve the conflict in an IDE that understands git's conflict-marker syntax, or by hand. In my case I will keep both lines, so I upload the merge result to branch b as a new version of the file and then merge:

```
[root@os004k8-master001 ~]# vi /tmp/merge-result.txt
[root@os004k8-master001 ~]# cat /tmp/merge-result.txt
Change from b
Change from a
[root@os004k8-master001 ~]# pachctl put file my-new-repo@branch-b:/test.txt -f /tmp/merge-result.txt
/tmp/merge-result.txt 28.00 b / 28.00 b [================================================================] 0s 0.00 b/s
[root@os004k8-master001 ~]# pachctl get file my-new-repo@branch-b:test.txt
Change from b
Change from a
[root@os004k8-master001 ~]# pachctl list commit my-new-repo@branch-b
PROJECT        REPO        BRANCH     COMMIT                           FINISHED           SIZE ORIGIN DESCRIPTION
my-new-project my-new-repo branch-b   0551fb2d07db4eb09638efabc7885d39 About a minute ago 28B  USER
my-new-project my-new-repo branch-b   5041b2efc3864a6d87aa509757395643 35 minutes ago     14B  USER
my-new-project my-new-repo new-branch 8de36303cce541db8bd8be09dbcd120a 48 minutes ago     16B  USER
my-new-project my-new-repo master     32f79266112947e3b8cf829ad01b62b1 About an hour ago  17B  USER
my-new-project my-new-repo master     19f4e83464564510991331ff69b5688f 2 hours ago        11B  USER
[root@os004k8-master001 ~]# pachctl create branch my-new-repo@master --head 0551fb2d07db4eb09638efabc7885d39
[root@os004k8-master001 ~]# pachctl list commit my-new-repo@master
PROJECT        REPO        BRANCH     COMMIT                           FINISHED           SIZE ORIGIN DESCRIPTION
my-new-project my-new-repo branch-b   0551fb2d07db4eb09638efabc7885d39 About a minute ago 28B  USER
my-new-project my-new-repo branch-b   5041b2efc3864a6d87aa509757395643 35 minutes ago     14B  USER
my-new-project my-new-repo new-branch 8de36303cce541db8bd8be09dbcd120a 49 minutes ago     16B  USER
my-new-project my-new-repo master     32f79266112947e3b8cf829ad01b62b1 About an hour ago  17B  USER
my-new-project my-new-repo master     19f4e83464564510991331ff69b5688f 2 hours ago        11B  USER
```
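
The decision that a three-way merge makes can be sketched in a few lines. The version below works on whole files only, which is much coarser than `git merge-file` (git merges line by line and can combine non-overlapping changes), so treat it as an illustration of the idea rather than a replacement. The function name and the `ours`/`theirs` labels are chosen for this example.

```python
def three_way(base: str, ours: str, theirs: str) -> tuple[str, bool]:
    """Whole-file three-way merge. Returns (content, has_conflict)."""
    if ours == theirs:
        return ours, False    # both sides made the same change
    if ours == base:
        return theirs, False  # only the other side changed
    if theirs == base:
        return ours, False    # only our side changed
    # Both sides changed the file differently: emit conflict markers.
    conflict = f"<<<<<<< ours\n{ours}=======\n{theirs}>>>>>>> theirs\n"
    return conflict, True


base = "My Third Commit\n"   # common ancestor content
ours = "Change from b\n"     # branch-b
theirs = "Change from a\n"   # branch-a, already merged into master

merged, has_conflict = three_way(base, ours, theirs)
print(has_conflict)
print(merged, end="")
```

As in the transcript, both branches changed the same file differently, so the result is a conflict that has to be resolved by hand before the pointer for master is moved.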