I wrote [a few months back](http://tdhopper.com/blog/2016/Nov/15/data-scientists-need-more-automation/) about how data scientists need more automation. In particular, I suggested that data scientists would be wise to learn more about automated system configuration and automated deployments. 

In an attempt to take my own advice, I've finally been making myself learn [Ansible](https://www.ansible.com/). It turns out that a great way to learn it is to sit down and read through the docs, front to back; I commend that tactic to you. I also put together this tutorial to walk through a practical example of how a working data scientist might use this powerful tool. 

What follows is an Ansible guide that will take you from installing Ansible to automatically deploying a long-running Python process to a remote machine and running it in a [Conda environment](https://conda.io/docs/using/envs.html) using [supervisord](http://supervisord.org/).

Ansible provides "human readable automation" for "app deployment" and "configuration management". Unlike tools like Chef, it doesn't require an agent to be running on remote machines. In short, it translates declarative YAML files into shell commands and runs them on your machines over SSH.

### Installing Ansible with Homebrew 

First, you'll need to [install Ansible](http://docs.ansible.com/ansible/intro_installation.html). On a Mac, I recommend doing this with [Homebrew](https://brew.sh/).

In [3]:
brew install ansible

We do not provide support for this pre-release version.
You may encounter build failures or other breakages.

### Quickstart

Soon, I'll show you how to write an Ansible YAML file. However, Ansible also allows you to specify tasks from the command line. 

Here's how we could use Ansible to ping our local host:

In [44]:
ansible -i 'localhost,' -c local -m ping all

localhost | SUCCESS => {
    "changed": false, 
    "ping": "pong"
}

This command calls ansible and tells it:
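* To use `localhost` as its inventory (`-i`). Inventory is Ansible-speak for the machine or machines you want to be able to run commands on. The trailing comma tells Ansible that this is a list of hosts rather than the path to an inventory file. 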
* To connect (`-c`) locally (`local`) instead of over SSH. 
* To run the [`ping` module](http://docs.ansible.com/ansible/ping_module.html) (`-m`) to test the connection.
* To run the command on `all` hosts in the inventory (in this case, our inventory is just the `localhost`).

[Michael Booth](http://www.mechanicalfish.net/start-learning-ansible-with-one-line-and-no-files/) has a [post](http://www.mechanicalfish.net/start-learning-ansible-with-one-line-and-no-files/) that goes into more detail about this command.

Behind the scenes, Ansible is turning this `-m ping` command into shell commands. (Try running with the `-vvv` flag to see exactly what it runs.) It can also execute arbitrary commands with the `-a` flag; by default, it'll use the Bourne shell `sh`. 

In [51]:
ansible all -i 'localhost, ' -c local -a "/bin/echo hello" 

localhost | SUCCESS | rc=0 >>
hello

### Setting up an Ansible Inventory

Instead of specifying our inventory with the `-i` flag each time, we should specify an Ansible inventory file. This file is a text file specifying machines you have SSH access to; you can also group machines under bracketed headings. For example:

```
mail.example.com

[webservers]
foo.example.com
bar.example.com

[dbservers]
one.example.com
two.example.com
three.example.com
```

Ansible has to be able to connect to these machines over SSH, so you will likely need to have relevant entries in your [`.ssh/config` file](http://nerderati.com/2011/03/17/simplify-your-life-with-an-ssh-config-file/).

By default, the Ansible CLI will look for a system-wide Ansible inventory file in `/etc/ansible/hosts`. You can also specify an alternative path for an inventory file with the `-i` flag.
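Inventories can also be written in YAML. The sketch below expresses the same example hosts and groups in that format. It assumes an Ansible release that ships the YAML inventory plugin, which may be newer than the version used for the output in this post, and the file name `hosts.yml` is only illustrative.

```yaml
# hosts.yml (hypothetical file name): the same machines as the INI example
all:
  hosts:
    mail.example.com:        # a host that belongs to no named group
  children:
    webservers:
      hosts:
        foo.example.com:
        bar.example.com:
    dbservers:
      hosts:
        one.example.com:
        two.example.com:
        three.example.com:
```

Either format can be passed to `-i`; the INI form is shorter for small inventories, while the YAML form is easier to extend with per-host variables.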

For this tutorial, I'd like to have an inventory file specific to the project directory without having to specify it each time we call Ansible. We can do this by creating a file called `./ansible.cfg` and setting the name of our local inventory file in it:

In [19]:
cat ./ansible.cfg

[defaults]
inventory = ./hosts

You can check that Ansible is picking up your config file by running `ansible --version`.

In [24]:
ansible --version

ansible 2.1.0.0
  config file = /Users/tdhopper/repos/automating_python/ansible.cfg
  configured module search path = Default w/o overrides

For this example, I just have one host, a [Digital Ocean VPS](https://www.digitalocean.com/). To run the examples below, you should create a VPS instance on Digital Ocean, [Amazon](https://amazonlightsail.com), or elsewhere; you'll want to configure it for [passwordless authentication](https://www.digitalocean.com/community/tutorials/how-to-set-up-ssh-keys--2). I have an entry like this in my `~/.ssh/config` file: 

```
Host digitalocean
  HostName 45.55.395.23
  User root
  Port 22
  IdentityFile /Users/tdhopper/.ssh/id_rsa
  ForwardAgent yes
```
  
and my inventory file (`./hosts`) is just

```
digitalocean
```

Now I can verify that Ansible can connect to my machine by running the ping command. 

In [39]:
ansible all -m ping

digitalocean | SUCCESS => {
    "changed": false, 
    "ping": "pong"
}

We told Ansible to run this command on `all` specified hosts in the inventory. It found our inventory by loading the `ansible.cfg` which specified `./hosts` as the inventory file.

It's possible that this will fail for you even if you can SSH into the machine. If the error is something like ` /bin/sh: 1: /usr/bin/python: not found`, this is because your VPS doesn't have Python installed on it. You can [install it with Ansible](http://stackoverflow.com/questions/32429259/ansible-fails-with-bin-sh-1-usr-bin-python-not-found), but you may just want to manually run `sudo apt-get -y install python` on the VPS to get started.
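If you would rather let Ansible do the install, one common approach is a small bootstrap playbook built on the `raw` module, which sends a command over SSH without needing Python on the remote side. This is only a sketch: it assumes a Debian-like host where the package is called `python`, and the file name `bootstrap.yml` is hypothetical. (Playbooks themselves are introduced in the next section.)

```yaml
---
# bootstrap.yml (hypothetical file name)
- hosts: all
  gather_facts: false   # fact gathering itself needs Python on the host
  become: true          # apt-get needs root privileges
  tasks:
    - name: install Python if it is missing
      # Assumes a Debian-like system whose package is named "python"
      raw: test -e /usr/bin/python || (apt-get -y update && apt-get -y install python)
```

You would run it once with `ansible-playbook bootstrap.yml`; after that, regular modules such as `ping` should be able to run on the host.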

### Writing our first Playbook

While ad hoc commands will often be useful, the real power of Ansible comes from creating repeatable sets of instructions called [Playbooks](http://docs.ansible.com/ansible/playbooks.html).

A playbook contains a list of "plays". Each play specifies a set of tasks to be run and which hosts to run them on. A "task" is a call to an Ansible module, like the "ping" module we've already seen. Ansible [comes packaged with about 1000 modules](http://docs.ansible.com/ansible/list_of_all_modules.html) for all sorts of use cases. You can also extend it with your own [modules](http://docs.ansible.com/ansible/dev_guide/developing_modules.html) and [roles](http://docs.ansible.com/ansible/playbooks_roles.html#roles).
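To make that structure concrete, here is a sketch of a playbook with two plays, each aimed at a different group. It borrows the `webservers` and `dbservers` groups from the example inventory earlier in the post; the task names are illustrative only.

```yaml
---
# First play: runs only on hosts in the webservers group
- hosts: webservers
  tasks:
    - name: check that the web servers are reachable
      ping:

# Second play: runs only on hosts in the dbservers group
- hosts: dbservers
  tasks:
    - name: echo a greeting on each database server
      command: /bin/echo hello
```

Plays run in the order they appear in the file, and within a play each task is run on the matching hosts before Ansible moves on to the next task.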

Our first playbook will just execute the ping module on all our hosts. It's a playbook with a single play comprised of a single task.

In [3]:
cat ping.yml

---
- hosts: all
  tasks:
  - name: ping all hosts
    ping:

We can run our playbook with the `ansible-playbook` command.

In [4]:
ansible-playbook ping.yml

 ____________ 
< PLAY [all] >
 ------------ 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

 ______________ 
< TASK [setup] >
 -------------- 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

ok: [digitalocean]
 _______________________ 
< TASK [ping all hosts] >
 ----------------------- 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

ok: [digitalocean]
 ____________ 
< PLAY RECAP >
 ------------ 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

digitalocean               : ok=2    changed=0    unreachable=0    failed=0   



You might wonder why there are cows on your screen. You can find out [here](https://michaelheap.com/cowsay-and-ansible/). However, the important thing is that our task was executed and returned successfully.

We can override the hosts list for the play with the `-i` flag to see what the output looks like when Ansible fails to run the play because it can't find the host.

Let's work now on installing the dependencies for our Python project. 

### Installing supervisord

"Supervisor is a client/server system that allows its users to monitor and control a number of processes on UNIX-like operating systems." We'll use it to run and monitor our Python process. 

On a Debian-like system, we can install it with APT. In the Ansible DSL that's just:

```
- name: Install supervisord
  become: true
  apt:
    name: supervisor
    state: present
    update_cache: yes
```

You can read more about the [apt module here](http://docs.ansible.com/ansible/apt_module.html). 

Once we have it installed, we can start it with this task:

```
- name: Start supervisord
  become: true
  service:
    name: "supervisor"
    state: started
    enabled: yes
```

This uses the [service](http://docs.ansible.com/ansible/service_module.html) module.
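Put together in a standalone playbook, the two tasks might look like the sketch below. It assumes a Debian-like host, sets `become` once at the play level rather than on each task, and uses `started`, the documented state value for the `service` module.

```yaml
---
- hosts: all
  become: true            # escalate privileges for every task in this play
  tasks:
    - name: Install supervisord
      apt:
        name: supervisor
        state: present
        update_cache: yes

    - name: Start supervisord
      service:
        name: supervisor
        state: started    # make sure the service is running now
        enabled: yes      # and that it starts again on boot
```

Both modules are idempotent: running the playbook a second time should report `ok` rather than `changed` if nothing needs doing.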

We could add these tasks to a playbook file (like ping.yml), but what if we want to share them among multiple playbooks? For this, Ansible has a construct called [Roles](http://docs.ansible.com/ansible/playbooks_roles.html). A role is a collection of "variable values, certain tasks, and certain handlers – or just one or more of these things". (You can learn more about variables and handlers in the Ansible docs.)

Roles are organized as subfolders of a folder called `roles` in the working directory. The rapid proliferation of folders in Ansible organization can be overwhelming, but a very simple role is just a file called `main.yml` nestled several folders deep. In our case, it's in `./roles/supervisor/tasks/main.yml`.

Check out [the docs](http://docs.ansible.com/ansible/playbooks_roles.html#roles) to learn more about role organization.

Here's what our role looks like:

In [13]:
cat ./roles/supervisor/tasks/main.yml

---

- name: Install supervisord
  become: true
  apt:
    name: supervisor
    state: present
    update_cache: yes
  tags:
    supervisor
- name: Start supervisord
  become: true
  service:
    name: "supervisor"
    state: started
    enabled: yes
  tags:
    supervisor



Note that I added `tags:` to the task definitions. [Tags](http://docs.ansible.com/ansible/playbooks_tags.html) let you run a portion of a playbook instead of the whole thing by passing the `--tags` flag to `ansible-playbook`.
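As a rough illustration of how tags select tasks, the sketch below gives tasks more than one tag using YAML list syntax. The `packages` and `healthcheck` tags and the file name `tagged.yml` are hypothetical; only `supervisor` comes from the role above.

```yaml
---
# tagged.yml (hypothetical file name)
- hosts: digitalocean
  tasks:
    - name: Install supervisord
      become: true
      apt:
        name: supervisor
        state: present
      tags:
        - supervisor
        - packages        # hypothetical second tag, for illustration only

    - name: ping the host
      ping:
      tags:
        - healthcheck     # hypothetical tag

# Run only the tasks tagged "supervisor":
#   ansible-playbook tagged.yml --tags supervisor
# Run everything except the health check:
#   ansible-playbook tagged.yml --skip-tags healthcheck
```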

Now that we have the supervisor install encapsulated in a role, we can write a simple playbook to run the role.

In [8]:
cat supervisor.yml

---
- hosts: digitalocean
  roles:
    - role: supervisor


In [12]:
ansible-playbook supervisor.yml

 _____________________ 
< PLAY [digitalocean] >
 --------------------- 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

 ______________ 
< TASK [setup] >
 -------------- 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

ok: [digitalocean]
 _________________________________________ 
< TASK [supervisor : Install supervisord] >
 ----------------------------------------- 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

ok: [digitalocean]
 _______________________________________ 
< TASK [supervisor : Start supervisord] >
 --------------------------------------- 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

ok: [digitalocean]
 ____________ 
< PLAY RECAP >
 --

### Installing Conda with Ansible Galaxy

Next we want to ensure that Conda is installed on our system. We could write our own role to follow the [recommended process](https://www.continuum.io/downloads). However, Ansible has a helpful tool to help us avoid reinventing the wheel by allowing users to share roles; this is called [Ansible Galaxy](https://galaxy.ansible.com/). 

You can search the Galaxy website for [miniconda](https://galaxy.ansible.com/list#/roles?page=1&page_size=10&autocomplete=miniconda) and see that a handful of roles for installing Miniconda exist. I liked [this one](https://galaxy.ansible.com/andrewrothstein/miniconda/). 

We can install the role locally using the `ansible-galaxy` command line tool.

In [21]:
ansible-galaxy install -f andrewrothstein.miniconda

- downloading role 'miniconda', owned by andrewrothstein
- downloading role from https://github.com/andrewrothstein/ansible-miniconda/archive/v3.0.0.tar.gz
- extracting andrewrothstein.miniconda to /usr/local/etc/ansible/roles/andrewrothstein.miniconda
- andrewrothstein.miniconda was installed successfully
- adding dependency: andrewrothstein.unarchive-deps
- adding dependency: andrewrothstein.bash
- adding dependency: andrewrothstein.alpine-glibc-shim
- downloading role 'unarchive-deps', owned by andrewrothstein
- downloading role from https://github.com/andrewrothstein/ansible-unarchive-deps/archive/v1.0.6.tar.gz
- extracting andrewrothstein.unarchive-deps to /usr/local/etc/ansible/roles/andrewrothstein.unarchive-deps
- andrewrothstein.unarchive-deps was installed successfully
- downloading role 'bash', owned by andrewrothstein
- downloading role from https://github.com/andrewrothstein/a

You can have the role installed wherever you want (run `ansible-galaxy install --help` to see how), but by default roles go to `/usr/local/etc/ansible/roles/`. 
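If a project depends on several Galaxy roles, it can be handy to list them in a requirements file instead of installing each by hand. A minimal sketch follows; it pins the release that appears in the download URL above, and the project-local `./roles` target is just one possible choice.

```yaml
# requirements.yml (the conventional name, but any file name works)
- src: andrewrothstein.miniconda
  version: v3.0.0   # the release shown in the install output above

# Install everything listed in the file:
#   ansible-galaxy install -r requirements.yml
# Install into a project-local folder instead of the default location:
#   ansible-galaxy install -r requirements.yml -p ./roles
```

Checking this file into version control lets anyone who clones the project fetch the same roles with a single command.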

In [28]:
ls -lh /usr/local/etc/ansible/roles/andrewrothstein.miniconda

total 32
-rw-rw-r--  1 tdhopper  admin   1.1K Jan 16 16:52 LICENSE
-rw-rw-r--  1 tdhopper  admin   666B Jan 16 16:52 README.md
-rw-rw-r--  1 tdhopper  admin   973B Jan 16 16:52 circle.yml
drwxrwxr-x  3 tdhopper  admin   102B Mar 21 11:33 defaults
drwxrwxr-x  3 tdhopper  admin   102B Mar 21 11:33 handlers
drwxrwxr-x  4 tdhopper  admin   136B Mar 21 11:33 meta
drwxrwxr-x  3 tdhopper  admin   102B Mar 21 11:33 tasks
drwxrwxr-x  3 tdhopper  admin   102B Mar 21 11:33 templates
-rw-rw-r--  1 tdhopper  admin    57B Jan 16 16:52 test.yml
drwxrwxr-x  3 tdhopper  admin   102B Mar 21 11:33 vars


You can look at the `tasks/main.yml` to see the core logic of installing Miniconda. It has tasks to download the installer, run the installer, delete the installer, symlink the install directory, update the installed packages with `conda update -y --all`, and make `conda` the default system Python. 

In [26]:
cat /usr/local/etc/ansible/roles/andrewrothstein.miniconda/tasks/main.yml

---
# tasks file for miniconda
- name: download installer...
  become: yes
  become_user: root
  get_url:
    url: '{{miniconda_installer_url}}'
    dest: /tmp/{{miniconda_installer_sh}}
    timeout: '{{miniconda_timeout_seconds}}'
    checksum: '{{miniconda_checksum}}'
    mode: '0755'

- name: installing....
  become: yes
  become_user: root
  command: /tmp/{{miniconda_installer_sh}} -b -p {{miniconda_parent_dir}}/{{miniconda_name}}
  args:
    creates: '{{miniconda_parent_dir}}/{{miniconda_name}}'

- name: deleting installer...
  become: yes
  become_user: root
  when: miniconda_cleanup
  file:
    path: /tmp/{{miniconda_installer_sh}}
    state: absent
    
- name: link miniconda...
  become: yes
  become_user: root
  file:
    dest: '{{miniconda_parent_dir}}/miniconda'
    src: '{{miniconda_parent_dir}}/{{miniconda_name}}'
    state: link

- name: conda updates
  become: yes
  become_user: root
  command: '{{miniconda_parent_dir}}/miniconda/bin/conda update -y --all'

- name: make