In [1]:
name = "2020-11-05-ways-of-python"
title = "Many ways to run Python"
tags = "basics, anaconda, command line, hpc, jupyter"
author = "Callum Rollo, Nele Reyniers"

In [2]:
from nb_tools import connect_notebook_to_post
from IPython.core.display import HTML

html = connect_notebook_to_post(name, title, tags, author)

Unlike languages that have a single programming interface (MATLAB) or a dominant interface du jour (R with RStudio), Python has a whole ecosystem of programs for writing it. With so much choice, this can be confusing at first: what should you use for your project?

This presentation will cover some of the most popular Python interfaces, their pros and cons, and some situations in which one may be preferable to another. We will also discuss some operational details of the Anaconda package management system.

You can see a recording of this presentation [here](https://eu-lti.bbcollab.com/recording/c55b51dda6e04d1595d15937bf7a76ff).


In [None]:
from pygments import highlight
from pygments.lexers import PythonLexer
from pygments.formatters import HtmlFormatter
import IPython

## 1. Jupyter notebooks

These are our primary teaching tool.

### Pros
- Web based interface, easy to maintain
- Inline figures and markdown cells make great workbooks
- Encourages self documenting code
- "magic" functions to interact with operating system
- Can share interactive notebooks online, e.g. via [Binder](https://mybinder.org/)

### Cons
- Harder to automate the scripts
- Makes a mess in git
- Requires a GUI to run/efficiently examine the notebooks


Also check out [JupyterLab](https://jupyter.org/), the new standard interface for Jupyter. It is much more powerful and integrated, and all projects written in notebooks can be continued in JupyterLab with no changes needed.

In [2]:
# Demo some jupyter stuff here

## 2. Integrated Development Environments (IDEs)
These are full-featured tools for code development. [Spyder](https://www.spyder-ide.org/) is very popular among scientists. If you are coming from a MATLAB or RStudio background, the appearance of this IDE will be very familiar and comforting. The whole thing is itself made in Python, which is pretty cool.

[PyCharm](https://www.jetbrains.com/pycharm/) is like Spyder on steroids.

### Pros
- See variables, file system, command line and code at a glance
- Loads of plugins (especially PyCharm)
- Smart autocompletion
- Code highlighting for e.g. unused imports, missing whitespace
- Can handle outside programs like git

### Cons
- Heavy on OS resources, especially RAM
- Can be slow to start

In [3]:
# Demo an IDE, including code hints and autocompletion

## 3. Python fresh from the command line

Just open up a Python prompt and start coding.

This is a fairly rare use case unless you are doing something very short. However, it's good to remember that this is available. On pretty much any Unix-like system (Linux, macOS etc.) you can get straight to Python from the command line. This can be useful if you're logged in to a remote server and need to execute some Python in a hurry.

If you're writing more than a couple of lines, however, you'll want to write some `.py` files and run them.

In [4]:
# Python command line demo

## 4. Python in files


You can write Python in any text editor. On Unix systems, vim and emacs remain popular after several decades. Atom is a more user-friendly GUI-based option. Windows users can try Notepad++ for Python support.
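
As a minimal sketch of what such a file might contain, here is a small script that can be saved in any editor and run with `python3 hello_script.py`. The file name and the `days_until` helper are made up for illustration and are not part of the original presentation.

```python
# hello_script.py (hypothetical file name, for illustration only)
import datetime


def days_until(target, today):
    """Return the number of whole days from `today` until `target`."""
    return (target - today).days


def main():
    today = datetime.date.today()
    halloween = datetime.date(today.year, 10, 31)
    # The result is negative if Halloween has already passed this year
    print(f"Days until Halloween: {days_until(halloween, today)}")


if __name__ == "__main__":
    # This block runs when the file is executed as a script,
    # but not when it is imported from another Python file
    main()
```

The `if __name__ == "__main__":` guard is a common convention: it lets the same file serve both as a runnable script and as a module whose functions can be imported elsewhere.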

### Pros
- Simple and lightweight
- Always there for you (especially vim)
- Super portable scripts
- Easy to automate with tools like cron

### Cons
- Limited autocompletion and error checking
- No easy way to check workspace (variables, path etc)
- Working with figures can be difficult (need to save to file and display)

### Providing inputs to Python scripts run from the command line
There are different ways to turn your Python program (`.py`) into a command-line tool. We will demonstrate two of these options below.

#### sys.argv
The `sys` module is part of the Python standard library and contains functions and variables for interacting with the Python runtime environment. In this tutorial, we're only demonstrating one of them: `sys.argv`, the list of command-line arguments.

Let's look at the contents of a Python script called `halloween_sysargv.py` below. It is a very simple demonstration of how to provide numerical, string (for example filenames!) or list inputs to a Python program.

In [5]:
# Show content of a python script with syntax highlighting. Shamelessly copied from jgosmann's answer on 
# stackoverflow.com/questions/19197931/how-to-show-as-output-cell-the-contents-of-a-py-file-with-syntax-highlighting
with open('halloween_sysargv.py') as f:
    code = f.read()
formatter = HtmlFormatter()
IPython.display.HTML('<style type="text/css">{}</style>{}'.format(
    formatter.get_style_defs('.highlight'), highlight(code, PythonLexer(), formatter)))

Now we can run this script, `halloween_sysargv.py`, in the shell as follows:

In [6]:
! python3 halloween_sysargv.py 13 pumpkin cat,bat,spider  # the exclamation mark tells Jupyter we're running a shell command.

13 is a <class 'int'>
pumpkin is a <class 'str'>
['cat', 'bat', 'spider'] is a <class 'list'>
This program is called halloween_sysargv.py


So, each item after `python3` ends up as a string in the list `sys.argv`. The first element of `sys.argv` is the name of the script being run, followed by the arguments in the order they were given. Any conversion to numbers or lists has to be done by the script itself.
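
The script's source is not reproduced in this text, so the following is only a sketch of how a script could produce the output above. It is our reconstruction rather than the actual `halloween_sysargv.py`, and the function name `describe_arguments` is hypothetical.

```python
import sys


def describe_arguments(argv):
    """Convert raw command-line strings and report the resulting types."""
    program, number, lantern, animals = argv[:4]
    number = int(number)          # "13" -> 13
    animals = animals.split(",")  # "cat,bat,spider" -> ["cat", "bat", "spider"]
    return [
        f"{number} is a {type(number)}",
        f"{lantern} is a {type(lantern)}",
        f"{animals} is a {type(animals)}",
        f"This program is called {program}",
    ]


if __name__ == "__main__":
    if len(sys.argv) < 4:
        # Not enough arguments: explain what is expected instead of crashing
        print(f"usage: python3 {sys.argv[0]} NUMBER LANTERN ANIMALS")
    else:
        print("\n".join(describe_arguments(sys.argv)))
```

Because `sys.argv` is purely positional, the order of the arguments matters and there is no built-in help text, which is where `argparse` comes in.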

#### argparse
With `argparse`, you can supply your Python program with input from the command line in a more user-friendly way. Inputs are supplied to your Python program in the following format:
```
python myprogram.py -a avalue -b bvalue --option-c cvalue -f
```

The predecessor of `argparse` is `optparse`.

Content of `halloween_argparse.py`:

In [7]:
# Show content of a python script with syntax highlighting. Shamelessly copied from jgosmann's answer on 
# stackoverflow.com/questions/19197931/how-to-show-as-output-cell-the-contents-of-a-py-file-with-syntax-highlighting
with open('halloween_argparse.py') as f:
        code = f.read()
IPython.display.HTML('<style type="text/css">{}</style>{}'.format(
    formatter.get_style_defs('.highlight'), highlight(code, PythonLexer(), HtmlFormatter())))

In the terminal, we can provide inputs using the flags we specified: 

In [8]:
! python3 halloween_argparse.py -n 13 --animals=cat,bat,spider,wolf -c  # ! running in shell

13.0 is a <class 'float'>
pumpkin is a <class 'str'>
cat,bat,spider,wolf is a <class 'str'>, but ['cat', 'bat', 'spider', 'wolf'] is a <class 'list'>
True is a <class 'bool'>


One of the advantages of `argparse` is that a help message is automatically generated from the `help` argument you supply when adding options:

In [9]:
! python halloween_argparse.py --help  # ! running in shell

usage: halloween_argparse.py [-h] [-n NUMBER] [-l LANTERN] [-a ANIMALS] [-c]

optional arguments:
  -h, --help            show this help message and exit
  -n NUMBER, --number NUMBER
                        An unlucky number (float).
  -l LANTERN, --lantern LANTERN
                        Material to carve a lantern out of (string, default is
                        pumpkin).
  -a ANIMALS, --animals ANIMALS
                        A comma separated list of animals typically associated
                        with halloween. Example: bat,cat,rat (string)
  -c, --christmas       Indicate whether it is Christmas yet (bool, default is
                        False).


See the [documentation](https://docs.python.org/3/library/argparse.html) and [tutorial](https://docs.python.org/3/howto/argparse.html) to find out what else you can do with `argparse`.
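
The help output above is enough to sketch how such a parser could be set up. The option names, types and help strings below are taken from that output. The rest, including the `build_parser` function name, is an assumption and may differ from the real `halloween_argparse.py`.

```python
import argparse


def build_parser():
    """Build a parser with the four options listed in the help output."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-n", "--number", type=float,
                        help="An unlucky number (float).")
    parser.add_argument("-l", "--lantern", default="pumpkin",
                        help="Material to carve a lantern out of "
                             "(string, default is pumpkin).")
    parser.add_argument("-a", "--animals",
                        help="A comma separated list of animals typically "
                             "associated with halloween. Example: bat,cat,rat (string)")
    parser.add_argument("-c", "--christmas", action="store_true",
                        help="Indicate whether it is Christmas yet "
                             "(bool, default is False).")
    return parser


# Passing a list to parse_args() mimics the shell command used earlier;
# calling parse_args() with no argument would read sys.argv instead.
args = build_parser().parse_args(["-n", "13", "--animals=cat,bat,spider,wolf", "-c"])
animals = args.animals.split(",")

print(f"{args.number} is a {type(args.number)}")
print(f"{args.lantern} is a {type(args.lantern)}")
print(f"{args.animals} is a {type(args.animals)}, but {animals} is a {type(animals)}")
print(f"{args.christmas} is a {type(args.christmas)}")
```

Note that `type=float` does the conversion for you, `default=` fills in options that were not given, and `action="store_true"` turns `-c` into a flag that needs no value.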

## 5. Python on the HPC
Depending on your research, your data and your computer, you may want to consider running some or most of your analyses and experiments on a High Performance Computer (HPC). While the HPC is running your Python programs, your own machine is not burdened, so you can freely use it for other tasks or shut it off. 

UEA has its own HPC for research: the new ADA Cluster. 
This provides me with an excellent excuse to insert an image of 19th century visionary Ada Lovelace.

![Picture of Ada Lovelace](../figures/alovelace.png)

For more introduction on high performance computing and ADA, please see [the UEA Research and Specialist Computing Support help pages](https://rscs.uea.ac.uk/high-performance-computing1). The HPC Team offers to meet with all new users to help you get started. 
You can use Conda to manage Python environments on ADA. Information on how to build and activate conda python environments on ADA can be found [here](https://rscs.uea.ac.uk/ada/using-ada/software/conda-python).

On an HPC, you can either work **interactively** or **submit batch jobs**.

When submitting batch jobs (after developing and testing your code locally or in an interactive session), only option 4 above (Python in files) is available to you. Providing inputs from the command line will come in handy when submitting (array) jobs. Note that in batch jobs, you need to activate conda environments with `source activate myenv` instead of the otherwise recommended `conda activate myenv`.
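
As a rough sketch of how a script can pick its own input inside an array job: this assumes a Slurm-style scheduler that exposes the task index in the `SLURM_ARRAY_TASK_ID` environment variable. Which scheduler ADA uses is not stated here, so check the ADA help pages. The file names and the `pick_input` function are hypothetical.

```python
import os


def pick_input(filenames, environ=os.environ):
    """Select one input file using the array task index (defaults to 0)."""
    # Assumption: the scheduler sets SLURM_ARRAY_TASK_ID for each array task
    task_id = int(environ.get("SLURM_ARRAY_TASK_ID", "0"))
    return filenames[task_id]


inputs = ["run_a.nc", "run_b.nc", "run_c.nc"]  # hypothetical input files

# Simulate what the task with index 2 would see
print(pick_input(inputs, {"SLURM_ARRAY_TASK_ID": "2"}))
```

Each task in the array then runs the same script but processes a different file. Alternatively, the job script can pass the index to Python as a command-line argument and the script can read it with `sys.argv` or `argparse`.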

In an interactive session, the recommended ways to work with Python on ADA are options 3 and 4 from above (from the UEA HPC team: "Jupyter Notebooks and IDEs rely on graphical interfaces that have high overheads and therefore generally don't work well on a cluster environment"). The [file editors available on ADA](https://rscs.uea.ac.uk/ada/using-ada/editing-files) are nano, nedit, emacs, Vi and gvim.

-------------------------------

# Anaconda

If you are not already familiar with Anaconda, it is a distribution of Python geared toward data scientists that aims to make it quick and easy to manage multiple projects with differing dependencies.

With Anaconda you can maintain separate **environments** for all your projects.

Why would you want to do this? Different projects require different packages, and not all of these packages are able to interoperate. Particularly in science, we often need to use legacy software dependent on older modules. If you want to work on one project built in Python 2.7 and your new stuff in 3.8, you'll need to keep them separate on your system so they don't interfere with each other. Anaconda is a very user-friendly way to achieve this.

![Schematic of Anaconda operation](../figures/conda.jpg)

The key to Anaconda is **environments**. These are collections of Python modules, other programs (like Jupyter, GDAL or Spyder) and a specific version of Python itself. There is no limit to the number of environments you can have. The only requirement is that each one has a unique name on your system.

Here's an example environment from our PPD Python course:

```yml
name: ppd_python
channels:
   - defaults
   - conda-forge
dependencies:
   - python=3.8
   - ipython
   - jupyter
   - numpy
   - matplotlib
   - pandas
   - cartopy
   - xarray
   - netcdf4
   - seaborn
   - spyder
   - tqdm
   - scipy
   - iris
   - plotly
   - cftime
```

The environment is created from a text file. You need to specify a name, the sources (channels) and the modules (dependencies) you need. In this case we specified `python=3.8`, `jupyter` to run notebooks and a bunch of modules including `numpy`, `matplotlib` and `scipy`. This should be all anyone needs to replicate the same environment on their machine and run the scripts successfully. If you are sharing code with others, always include an environment file so it runs correctly.

We will do a more detailed demo of package management with Anaconda in the future.
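
Once an environment is activated, a quick way to confirm you are running the Python you expect is to ask Python itself. The sketch below uses only the standard library. The `missing_modules` helper is hypothetical, and the module list is just a sample from the file above.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names can differ from conda package names (e.g. netcdf4 -> netCDF4)
required = ["numpy", "matplotlib", "scipy"]

print(f"Python {sys.version_info.major}.{sys.version_info.minor} at {sys.executable}")
print("Missing modules:", missing_modules(required) or "none")
```

If the path printed for `sys.executable` does not sit inside the environment you meant to use, the environment is probably not activated.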

# How I start a Python project

![Masterful flow chart of Python decision making](../figures/flow-chart.png)

\**Other Hosting Services Are Available*

-------------------
### Reading

- If you want a good science environment file to start from, try the one from [ppd_python](https://github.com/ueapy/ppd_python). You'll find some handy conda instructions in the repo description. [Click to download the zip](https://github.com/ueapy/ppd_python/archive/master.zip); you want the `environment.yml` file. The environment is based on Python 3.8, which will be supported until [October 2024](https://python-release-cycle.glitch.me/).
- A solid [intro to git](https://swcarpentry.github.io/git-novice/) by Software Carpentry

- A [cool trick with conda](https://www.leouieda.com/blog/conda-envs.html) for bash users by Leo Uieda. N.B. `conda activate` is preferred to `source activate` these days.

### Sources
- [Python on the ADA HPC](https://rscs.uea.ac.uk/ada/using-ada/software/conda-python)

#### Images
- Conda image: https://www.imperial.ac.uk/admin-services/ict/self-service/research-support/rcs/support/applications/conda/ 
- Ada Lovelace: https://blogs.scientificamerican.com/observations/ada-lovelace-day-honors-the-first-computer-programmer/
- Flow chart made with [graphviz](https://graphviz.org/)

In [3]:
HTML(html)