# Overview

In this notebook we will explore the options for installing Pachyderm locally (i.e. in a self-hosted format). This means no commercial requirements such as a cloud provider, a managed service, or an appliance.

Currently the latest stable version is [2.5.5](https://github.com/pachyderm/pachyderm/releases/tag/v2.5.5) which was released 4/27/2023.

# Installation Options

Looking through the documentation we see two options for installing Pachyderm: locally or in the cloud. These two options break down further. The list below enumerates the options presented or implied in the documentation.

- [Local Installation](https://docs.pachyderm.com/2.3.x/getting-started/local-installation/)
    - [Docker Desktop](https://docs.pachyderm.com/latest/getting-started/local-deploy/docker/) - A commercial application that allows running containers as well as a single-node Kubernetes cluster
    - [Minikube](https://docs.pachyderm.com/latest/getting-started/local-deploy/minikube/) - An open source tool that allows running a single-node Kubernetes cluster locally
    - (implied) Self-hosted Kubernetes cluster
- Cloud Installation
    - AWS
    - Azure
    - GCP

Because the Pachyderm instructions assume that a local installation is performed in a desktop environment, they are tailored to Windows, Mac, and Linux.

As noted in the overview, we will be installing to a self-hosted Kubernetes cluster.

The official instructions for the local installation can be found [here](https://docs.pachyderm.com/latest/getting-started/local-deploy/).

# Installation On Kubernetes Cluster

Regardless of which local installation method we choose, the steps should be relatively similar, as we are essentially deploying an application to Kubernetes.

## Install Homebrew (Docker Desktop / Minikube only)

The first step in the installation is to install Homebrew if using Linux or Mac. For Windows, the documentation lists manual steps that must be undertaken.

Homebrew is a package manager. It is used to install other components including:
- The Kubernetes environment (Docker Desktop / Minikube)
- pachctl
- helm

Homebrew was originally developed for macOS to provide users with a convenient way to install open source, Unix-style applications. After the tool gained popularity for its large selection of applications and ease of use, a Linux port was created under the name Linuxbrew, and it was later merged into Homebrew itself.

Homebrew is an “add-on” package manager. Homebrew installs packages alongside whatever system it runs on.

As we are running on a full-blown Kubernetes cluster, we do not need to use brew.

## Install pachctl
pachctl is a command-line tool that you can use to interact with a Pachyderm cluster from your terminal. It is provided as a precompiled binary available from the [GitHub releases page](https://github.com/pachyderm/pachyderm/releases/tag/v2.5.5).

```
[root@os004k8-master001 ~]# curl -L -O https://github.com/pachyderm/pachyderm/releases/download/v2.5.5/pachctl_2.5.5_linux_amd64.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 37.1M  100 37.1M    0     0  33.6M      0  0:00:01  0:00:01 --:--:-- 82.4M

```
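
The download above only fetches the archive. A minimal sketch of unpacking it and placing the binary on the PATH follows. It assumes the archive contains a single top-level directory with the `pachctl` binary inside it, and that `/usr/local/bin` is an acceptable install location; the helper name `install_from_tarball` is hypothetical.

```shell
# Sketch: unpack a release tarball and install the files it contains.
# Assumption: the archive has exactly one top-level directory with the
# binary inside it (for example pachctl_2.5.5_linux_amd64/pachctl).
install_from_tarball() {
    tarball="$1"   # path to the downloaded .tar.gz
    dest="$2"      # target directory, ideally one that is on the PATH
    mkdir -p "$dest"
    # --strip-components=1 drops the top-level directory while extracting
    tar -xzf "$tarball" --strip-components=1 -C "$dest"
}

# Usage (as root, after the curl download above):
#   install_from_tarball pachctl_2.5.5_linux_amd64.tar.gz /usr/local/bin
#   pachctl version --client-only
```

Checking the client version afterwards confirms that the binary is found on the PATH without needing a running cluster.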

**Note**: The official installation instructions can be found [here](https://docs.pachyderm.com/2.3.x/getting-started/local-installation/#install-pachctl).

## Install Helm (full Kubernetes only)

We can think of Helm as a package and deployment manager for Kubernetes. Helm automates the creation, packaging, configuration, and deployment of Kubernetes applications. It does this through a packaging structure that combines your configuration files into a single reusable format that can be understood and managed by the utility.

### Helm Compatibility

In order to install Helm, we need to figure out which version is compatible with the version of our Kubernetes cluster. The Helm documentation lists the [compatibility matrix](https://helm.sh/docs/topics/version_skew/) as seen below:


|Helm Version|Supported Kubernetes Versions|
|------------|-----------------------------|
|3.11.x|1.26.x - 1.23.x|
|3.10.x|1.25.x - 1.22.x|
|3.9.x|1.24.x - 1.21.x|
|3.8.x|1.23.x - 1.20.x|
|3.7.x|1.22.x - 1.19.x|
|3.6.x|1.21.x - 1.18.x|
|3.5.x|1.20.x - 1.17.x|
|3.4.x|1.19.x - 1.16.x|


In my case, my Kubernetes cluster was running version 1.21.14:
```
[root@os004k8-master001 ~]# kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.9", GitCommit:"b631974d68ac5045e076c86a5c66fba6f128dc72", GitTreeState:"clean", BuildDate:"2022-01-19T17:51:12Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.14", GitCommit:"0f77da5bd4809927e15d1658fb4aa8f13ad890a5", GitTreeState:"clean", BuildDate:"2022-06-15T14:11:36Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}

```

This means I can run Helm 3.6 through 3.9. I will go with 3.9, as it is the newest Helm release that still supports my Kubernetes version (1.21).
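
The lookup in the table above can be sketched as a small script. It encodes only the rows shown in the table and makes no claim about Helm releases outside it; the function names `helm_k8s_range` and `helm_supports` are hypothetical.

```shell
# Sketch: Kubernetes minor versions supported by each Helm 3.x release,
# copied from the compatibility table above.
# Prints "LOW HIGH" (Kubernetes 1.LOW through 1.HIGH); fails for unknown rows.
helm_k8s_range() {
    case "$1" in
        3.11) echo "23 26" ;;
        3.10) echo "22 25" ;;
        3.9)  echo "21 24" ;;
        3.8)  echo "20 23" ;;
        3.7)  echo "19 22" ;;
        3.6)  echo "18 21" ;;
        3.5)  echo "17 20" ;;
        3.4)  echo "16 19" ;;
        *)    return 1 ;;
    esac
}

# Succeeds if Helm release $1 (e.g. 3.9) supports Kubernetes minor $2 (e.g. 21).
helm_supports() {
    range=$(helm_k8s_range "$1") || return 1
    low=${range% *}
    high=${range#* }
    [ "$2" -ge "$low" ] && [ "$2" -le "$high" ]
}

# List every Helm release in the table that supports Kubernetes 1.21.
for h in 3.11 3.10 3.9 3.8 3.7 3.6 3.5 3.4; do
    if helm_supports "$h" 21; then
        echo "$h"
    fi
done
```

For Kubernetes 1.21 the loop prints 3.9, 3.8, 3.7, and 3.6, matching the range read off the table by hand.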

### Download Binaries
The official installation instructions can be found [here](https://helm.sh/docs/intro/install/). Each Helm release is distributed as precompiled binaries, including a build for Linux on x64 (amd64). The binaries can be downloaded from the [GitHub releases page](https://github.com/helm/helm/releases).

In my case, [3.9.4](https://github.com/helm/helm/releases/tag/v3.9.4) is the latest 3.9.x release available.


```
[root@os004k8-master001 ~]# curl -O https://get.helm.sh/helm-v3.9.4-linux-amd64.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13.3M  100 13.3M    0     0  20.1M      0 --:--:-- --:--:-- --:--:-- 20.1M

[root@os004k8-master001 ~]# tar -zxvf helm-v3.9.4-linux-amd64.tar.gz
linux-amd64/
linux-amd64/helm
linux-amd64/LICENSE
linux-amd64/README.md

[root@os004k8-master001 ~]# linux-amd64/helm version
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /root/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /root/.kube/config
version.BuildInfo{Version:"v3.9.4", GitCommit:"dbc6d8e20fe1d58d50e6ed30f09a04a77e4c68db", GitTreeState:"clean", GoVersion:"go1.17.13"}

[root@os004k8-master001 ~]# cp linux-amd64/helm /usr/bin/
[root@os004k8-master001 ~]# helm version
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /root/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /root/.kube/config
version.BuildInfo{Version:"v3.9.4", GitCommit:"dbc6d8e20fe1d58d50e6ed30f09a04a77e4c68db", GitTreeState:"clean", GoVersion:"go1.17.13"}

```
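
The warnings in the output above say that `/root/.kube/config` is readable by the group and by everyone else. A common remedy, sketched here, is to restrict the file to its owner. The helper name `restrict_kubeconfig` is hypothetical, and the default path is an assumption based on the location shown in the warning.

```shell
# Sketch: make a kubeconfig file readable and writable by its owner only.
# With no argument it falls back to the default kubeconfig location.
restrict_kubeconfig() {
    cfg="${1:-$HOME/.kube/config}"
    if [ ! -f "$cfg" ]; then
        echo "no kubeconfig found at $cfg" >&2
        return 1
    fi
    # 600 = read/write for the owner, nothing for group or others
    chmod 600 "$cfg"
}

# Usage:
#   restrict_kubeconfig /root/.kube/config
#   helm version    # the two warnings should no longer appear
```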


### Connect Helm to Kubernetes Cluster
In order for Helm to install packages on Kubernetes, it needs access to information about the cluster. This is typically provided via the kubeconfig file, a plain text file that contains the configuration and secrets a CLI needs to connect to and authenticate against a Kubernetes cluster. For example, the kubectl and kubeadm programs use this file.

Helm defaults to your current Kubernetes context, as specified in the $HOME/.kube/config file.
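
As a rough sketch, the following shows how to tell which kubeconfig file would be picked up. It relies on the `KUBECONFIG` environment variable, which both kubectl and Helm honor when it is set; the helper name `kubeconfig_path` is hypothetical.

```shell
# Sketch: report which kubeconfig kubectl and helm would read.
# KUBECONFIG, when set, takes precedence over the default location.
# (It may also hold several paths separated by colons.)
kubeconfig_path() {
    echo "${KUBECONFIG:-$HOME/.kube/config}"
}

kubeconfig_path

# With access to the cluster, the active context can be inspected with:
#   kubectl config current-context
# and a different context can be chosen for a single helm command with:
#   helm list --kube-context <context-name>
```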

### Add Helm Chart Repository

The Helm package format is referred to as a chart. Similar to regular OS packages, Helm charts are provided by repositories. The package manager (helm) is configured to point to repositories so that users can download and install packages from them. Artifact Hub is a public hub that lists open source Helm charts.




We want to add the Pachyderm chart repository so we can install the app on our cluster.

```
[root@os004k8-master001 ~]# helm repo add pachyderm https://helm.pachyderm.com
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /root/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /root/.kube/config
"pachyderm" has been added to your repositories

[root@os004k8-master001 ~]# helm repo update
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /root/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /root/.kube/config
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "pachyderm" chart repository
Update Complete. ⎈Happy Helming!⎈

```

## Install Pachyderm Using Helm

Pachyderm uses the client-server model. The pachd daemon is packaged as a Kubernetes pod, and the pachctl CLI connects to the server (the daemon running in the pod) to execute commands.

We can ask Helm to list the repositories it knows about. In my case I only have one repo (the pachyderm repo), and we can then list all the chart versions available from it:

```
[root@os004k8-master001 ~]# helm repo list
NAME            URL
pachyderm       https://helm.pachyderm.com

[root@os004k8-master001 ~]# helm search repo -l -r pachyderm 2>/dev/null
NAME                    CHART VERSION   APP VERSION     DESCRIPTION
pachyderm/pachyderm     2.5.5           2.5.5           Explainable, repeatable, scalable data science
pachyderm/pachyderm     2.5.4           2.5.4           Explainable, repeatable, scalable data science
pachyderm/pachyderm     2.5.3           2.5.3           Explainable, repeatable, scalable data science
pachyderm/pachyderm     2.5.2           2.5.2           Explainable, repeatable, scalable data science
pachyderm/pachyderm     2.5.1           2.5.1           Explainable, repeatable, scalable data science
pachyderm/pachyderm     2.5.0           2.5.0           Explainable, repeatable, scalable data science
pachyderm/pachyderm     2.4.6           2.4.6           Explainable, repeatable, scalable data science


```

We can then ask helm to install our preferred version:

```
[root@os004k8-master001 ~]# helm install pachd pachyderm/pachyderm --set deployTarget=LOCAL --set proxy.enabled=true --set proxy.service.type=LoadBalancer --version 2.5.5
W0501 12:39:42.848489   21435 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0501 12:39:43.115008   21435 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
NAME: pachd
LAST DEPLOYED: Mon May  1 12:39:40 2023
NAMESPACE: default
STATUS: deployed
REVISION: 1
NOTES:
```
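
The `--set` flags used above can also be kept in a values file, which is easier to review and reuse. The sketch below writes the same three settings to a file and prints the matching install command instead of running it. The file location is an arbitrary scratch path chosen for illustration.

```shell
# Sketch: the same settings as the --set flags above, kept in a values file.
values_file="${TMPDIR:-/tmp}/pachyderm-values.yaml"   # hypothetical location

# Each dotted --set key becomes a nested YAML key.
cat > "$values_file" <<'EOF'
deployTarget: LOCAL
proxy:
  enabled: true
  service:
    type: LoadBalancer
EOF

# Print (rather than run) the equivalent install command.
echo "helm install pachd pachyderm/pachyderm --version 2.5.5 -f $values_file"
```

Running the printed command should be equivalent to the install shown above, provided the release name `pachd` is not already in use.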

We can then ask kubernetes about the status of the pods it has spun up to host the pachyderm solution.

```
[root@os004k8-master001 ~]# kubectl get pods -A
NAMESPACE     NAME                                                   READY   STATUS              RESTARTS   AGE
default       console-587c67787f-cm8sg                               0/1     ContainerCreating   0          38s
default       etcd-0                                                 0/1     Pending             0          38s
default       pachd-bd45db8cd-4bh5l                                  0/1     ContainerCreating   0          35s
default       pachd-loki-0                                           0/1     Pending             0          38s
default       pachd-promtail-64qpn                                   0/1     ContainerCreating   0          39s
default       pachd-promtail-jtrkt                                   0/1     ContainerCreating   0          39s
default       pachd-promtail-kjlcq                                   0/1     ContainerCreating   0          38s
default       pachd-promtail-nljnp                                   0/1     ContainerCreating   0          38s
default       pachd-promtail-xp82l                                   0/1     ContainerCreating   0          39s
default       pachd-promtail-z6n65                                   0/1     ContainerCreating   0          38s
default       pachyderm-kube-event-tail-6c6598cd5-pcthr              0/1     ContainerCreating   0          38s
default       pachyderm-proxy-7f4545985c-zf2bv                       0/1     ContainerCreating   0          38s
default       pg-bouncer-88dbc966b-l4xzs                             0/1     ContainerCreating   0          38s
default       postgres-0                                             0/1     Pending             0          38s
kube-system   coredns-558bd4d5db-478nt                               1/1     Running             1          17h
kube-system   coredns-558bd4d5db-x42cd                               1/1     Running             1          17h
kube-system   etcd-os004k8-master001.foobar.com                      1/1     Running             125        17h
kube-system   kube-apiserver-os004k8-master001.foobar.com            1/1     Running             1          17h
kube-system   kube-controller-manager-os004k8-master001.foobar.com   1/1     Running             1          17h
kube-system   kube-proxy-2l94p                                       1/1     Running             1          16h
kube-system   kube-proxy-2m6hr                                       1/1     Running             1          17h
kube-system   kube-proxy-f4vbn                                       1/1     Running             1          16h
kube-system   kube-proxy-kzz98                                       1/1     Running             1          17h
kube-system   kube-proxy-plhkk                                       1/1     Running             1          17h
kube-system   kube-proxy-sm9wf                                       1/1     Running             1          17h
kube-system   kube-proxy-wcl4n                                       1/1     Running             1          16h
kube-system   kube-scheduler-os004k8-master001.foobar.com            1/1     Running             1          17h
kube-system   weave-net-472vx                                        2/2     Running             4          17h
kube-system   weave-net-4lmgh                                        2/2     Running             4          17h
kube-system   weave-net-6w7qx                                        2/2     Running             4          16h
kube-system   weave-net-fkzm5                                        2/2     Running             3          16h
kube-system   weave-net-g4sjk                                        2/2     Running             2          17h
kube-system   weave-net-k7l4s                                        2/2     Running             3          17h
kube-system   weave-net-qs929                                        2/2     Running             4          16h
```

**Note**: It will take some time before all the pods are running. The Kubernetes cluster first has to download a number of container images from Docker Hub. It then needs to start the pods (containers) and wait for their internal services to come online.
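
Rather than re-reading the pod table by eye, the number of pods that are not yet ready can be counted from the same output. This is a minimal sketch that assumes the column layout of `kubectl get pods -A` shown above (READY in the third column); the helper name `count_not_ready` is hypothetical.

```shell
# Sketch: count pods that are not fully ready in `kubectl get pods -A` output.
# Reads the table on stdin; column 3 is READY in the form "ready/total".
count_not_ready() {
    awk 'NR > 1 { split($3, r, "/"); if (r[1] != r[2]) n++ } END { print n + 0 }'
}

# Usage against a live cluster:
#   kubectl get pods -A | count_not_ready
# Or block until every pod in the default namespace reports Ready:
#   kubectl wait --for=condition=Ready pods --all -n default --timeout=600s
```

A result of 0 means every listed pod has all of its containers ready.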