# INF200 Lecture No J01

### 7 January 2019

## Today's topics

1. Your repositories
1. Project directory structure
1. Modules and packages (and packages)

## Project overview

### Exam information
- Oral exam **Monday 27th & Tuesday 28th of January**
- Presentation and discussion, approximately 30 minutes
- Each pair of students will **present and discuss together**
- Notify Yngve by email **today** if you have date constraints due to other exams or similar

#### Schedule

- Mandatory attendance Monday-Friday 09.15-15.00.

| Date | Event / Milestone |
|:---- |:----------------- |
| 06 Jan 09:15 | Start, T434 |
| 07 Jan 15:00 | GitHub repository and PyCharm project ready |
| 07 Jan 15:00 | Problem and requirements analysis |
| 10 Jan 15:00 | Demonstration of a first functional simulation of *herbivores* in one place (no migration) |
| 13 Jan 15:00 | Project plan for remaining work |
| 17 Jan 15:00 | Demo of working simulation of herbivores and carnivores, all types of terrain and simple visualization |
| 22 Jan **12:00** | Full simulation code including documentation |
| 25 Jan **12:00** | PDF and animation for oral presentation |
| 27 & 28 Jan | Oral exam |


#### Preliminary list of lectures

- Lectures will be given as required, usually at 09.15
- The lecture list below is indicative and subject to change on short notice

| Date | Topic |
|:-----|:------|
| 06 Jan | Python packages, GitHub issues |
| 07 Jan | Class and static methods, branches |
| 08 Jan | Testing: mocking, fixtures, statistical tests |
| 09 Jan | Documenting source code with Sphinx |
| 13 Jan | Model dynamics, Efficient visualization |
| 14 Jan | Packaging for distribution, Automated testing |
| 15 Jan | Delivery and exam information |
| 16 Jan | A very brief introduction to Cython and C |

## GitHub Repositories

- In the PA exercises, you collaborated by pulling code from each other's repos
- Does not scale beyond small groups
- Solution: team repositories
    - A team has multiple members
    - Members can have different rights
    - Typically, all developers can push to the team repo
    - On GitHub, team repos are created through organisations
    - Organisations cost money unless your repositories are public
- Alternative: shared repository
    - One of you creates a repository and adds your partner as a collaborator
    - Not ideal, but this is what we will use here
    - Add me and the TAs as well
    - Both of you pull from and push to the SAME repo
- Important
    - Always pull before you push
    - **NEVER force a push**
        
### Work planning on GitHub

- Use `Issues` to keep track of all you need and want to do
- Create an issue for each thing to do
- Close an issue when the task is done or the problem solved

## Project directory structure

- In your `BioSim_Gxx_<Name1>_<Name2>` repository, create a PyCharm project `biosim`
- Set up the following directory structure

```
BioSim_Gxx_<Name1>_<Name2>
    README.md
    setup.py
    setup.cfg
    tox.ini
    src
       biosim
          __init__.py
    examples
       check_sim.py
    tests
       test_biosim_interface.py
```

### Notes
- `__init__.py` marks `biosim` as a package and is executed whenever someone writes `import biosim`
- Your modules go into the directory `src/biosim`
- Scripts using the `biosim` package go into `examples`
- All tests go into `tests`

## Modules and packages (and packages)


- We could put all our code in a single file, but this is far from optimal
    - Large files are difficult to work with
    - We can only keep 5-7 things in our minds at once; a hierarchical structure makes it easier to keep an overview
    - We may want to reuse the same code in different places

Python's solution: *Modules* and *Packages*

* **Module:** A single Python file
* **Package:** A collection of Python modules (think of it as a directory)

[Python Tutorial, ch 6](https://docs.python.org/3.6/tutorial/modules.html)

## Modules

Each Python file is a module

What is the difference between a Python *module* and a Python *script*?
* Technically: No difference
* In practical usage:
    * Script: Python code to be run, not imported
    * Module: Python code to be imported, not run

#### Example Python module

We will create a Python module live in a Jupyter notebook, using [cell magic](https://ipython.readthedocs.io/en/stable/interactive/tutorial.html#magic-functions). Before that, we remove any files this notebook will write, so that we start with a clean slate.

```python
%rm -rf *.py *.pyc
```

```python
%%writefile mystats.py

"""mystats provides some statistical functions."""

import math

def _square(data):
    return [x**2 for x in data]

def mean(data):
    """Returns arithmetic mean of sample data."""
    return sum(data) / float(len(data))

def var(data):
    """Returns variance of data."""
    return mean(_square(data)) - mean(data)**2

example_data = [1, 3, 2, 4, 5, 8, 1]
```

```
Writing mystats.py
```


Now that there is a file called `mystats.py`, we can import the module

```python
import mystats
print(mystats.mean([1, 2, 3]))
print(mystats.var([1.5, 3, 4.5]))
print(mystats.example_data)
```

```
2.0
1.5
[1, 3, 2, 4, 5, 8, 1]
```


* Importing a module doesn't bind any names other than the module name itself
* Names can, however, also be imported directly from a module

```python
from mystats import mean
mean([1, 2, 3])
```

```
2.0
```

* We can also import all public names from a module at once
* OK for interactive work, not recommended in scripts and modules
* Names beginning with `_` are not imported

```python
from mystats import *
print(example_data)
print(_square([1, 2, 3]))
```

```
[1, 3, 2, 4, 5, 8, 1]

NameError: name '_square' is not defined
```

- The `dir` function lets us list all names defined in the module

```python
dir(mystats)
```

```
['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_square',
 'example_data',
 'math',
 'mean',
 'var']
```

Note that the module contains variables we have not set, such as `__doc__` and `__name__`

```python
mystats.__doc__
```

```
'mystats provides some statistical functions.'
```

```python
mystats.__name__
```

```
'mystats'
```

These variables contain *meta-information*, such as the module's docstring

The `__name__`-variable is a bit special:
- If a module is imported, it is set to the module's name (the file name without the `.py` extension)
- If a module is executed directly, it is set to `'__main__'`

This is why we put an `if __name__ == '__main__':` block at the bottom of our scripts: the condition is only true when the file is executed, not when it is imported
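To see the two values side by side, the sketch below writes a tiny hypothetical module, `name_demo.py`, to a temporary directory, imports it once, and then executes the same file as a main program with the standard-library `runpy.run_path` (roughly what `python name_demo.py` does). The file and variable names are illustrative only.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

# Hypothetical module whose only job is to record the __name__ it sees.
MODULE_SOURCE = "seen_name = __name__\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "name_demo.py"
    path.write_text(MODULE_SOURCE)
    importlib.invalidate_caches()  # make sure the freshly written file is noticed

    # Case 1: import the module (its directory must be on sys.path).
    sys.path.insert(0, tmp)
    try:
        imported = importlib.import_module("name_demo")
    finally:
        sys.path.remove(tmp)

    # Case 2: execute the same file as the main program.
    run_globals = runpy.run_path(str(path), run_name="__main__")

print(imported.seen_name)        # name_demo
print(run_globals["seen_name"])  # __main__
```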

#### Example Python script

- A script typically does some work and is not intended for import
- An alternative to writing a script is to create a Jupyter notebook instead

We now create a Python script using cell magic

```python
%%writefile run_stats.py

from mystats import mean, var
import random
import math

for sample_size in [10, 100, 1000, 10000]:
    sample = [random.random() for _ in range(sample_size)]
    print('{:6}{:12.8f} +- {:12.8f}'.format(sample_size, mean(sample),
                                            math.sqrt(var(sample)/(len(sample)-1))))
```

```
Writing run_stats.py
```


```python
%run run_stats.py
```

```
    10  0.32636787 +-   0.08623897
   100  0.46119940 +-   0.02693256
  1000  0.49633620 +-   0.00905034
 10000  0.50110513 +-   0.00287773
```


- Note that we must import `math` explicitly; it is not "inherited" from `mystats`, even though `mystats` imports it
- The `%run` magic executes the script
- We can import from our `mystats` module without any extra setup because the script and the module are stored in the same directory

## Running vs importing

A program can be executed in many ways:
- Execute `python run_stats.py` on a command line/terminal
- Run it through PyCharm
- Run it from the notebook using `%run`

When a program is executed in any of these ways, `__name__` is set to `'__main__'`

All code in the script is executed sequentially

We import a module by using the `import` keyword
- In a Python module or script
- In a Python shell
- In a notebook

When we import a module, all of its code is executed sequentially, but only the first time we import it

We can 'hide' code we do not want executed on import inside an `if __name__ == '__main__':` block

```python
%%writefile my_verbose_module.py

print("This is a verbose module")

def foo():
    print("It prints lots of things")

if __name__ == '__main__':
    foo()
```

```
Writing my_verbose_module.py
```


```python
import my_verbose_module
my_verbose_module.foo()

import my_verbose_module
import my_verbose_module
import my_verbose_module
```

```
This is a verbose module
It prints lots of things
```


The code in the module is only executed the first time the module is imported. The code inside the `if __name__ == '__main__':` block is not executed when we import the module, so we call `foo` manually instead

If we run the module instead, all of its code is executed, including the `if __name__ == '__main__':` block

```python
%run my_verbose_module
```

```
This is a verbose module
It prints lots of things
```


### Reloading modules

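For efficiency, repeated imports do not re-execute a module's code. In larger code bases the same module is often imported from many places, so Python stores every imported module in the dictionary `sys.modules`, and later imports of the same name simply return the stored module.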
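A small sketch of this caching, using a hypothetical module `cache_demo.py` written to a temporary directory: the second import returns the very same module object from `sys.modules` without re-running its code, whereas `importlib.reload` executes the code again. `importlib.import_module` stands in for the `import` statement here because the module only exists at run time.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical module with a single top-level assignment.
    Path(tmp, "cache_demo.py").write_text("value = 0\n")
    importlib.invalidate_caches()
    sys.path.insert(0, tmp)
    try:
        first = importlib.import_module("cache_demo")   # runs the module code
        first.value = 99                                # change the object in memory
        second = importlib.import_module("cache_demo")  # cached: code is NOT re-run
        value_after_import = second.value               # still 99
        importlib.reload(first)                         # code re-run: value = 0 again
        value_after_reload = first.value
    finally:
        sys.path.remove(tmp)

print(first is second is sys.modules["cache_demo"])  # True
print(value_after_import, value_after_reload)        # 99 0
```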

If we truly want to re-import a module, for example because it has changed since the first import, we need `reload` from `importlib`

```python
from importlib import reload
```

```python
reload(my_verbose_module);
reload(my_verbose_module);
reload(my_verbose_module);
```

```
This is a verbose module
This is a verbose module
This is a verbose module
```


* Using `%run` in a notebook will run the newest version of the script, but will *not* reload imported modules
* The same is true for interactive sessions

This can lead to some confusion when working interactively, but should rarely be a problem when writing scripts. You can use the [`autoreload` extension for IPython](http://ipython.readthedocs.io/en/stable/config/extensions/autoreload.html?highlight=autoreload) to ensure modified modules are reloaded

### Where can we import from?

If we try to import a module named `spam`, Python will have to search to find a module with the correct name

Python first looks for a built-in module with that name, and then searches the directories listed in `sys.path` in order, using the first match. The search order is therefore:
* built-in modules (compiled into the interpreter)
* the directory containing the input script (or the current directory)
* PYTHONPATH (a list of directory names, with the same syntax as the shell variable PATH)
* the installation-dependent default (which includes `site-packages`)


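Because the script's directory is searched before the standard library and `site-packages`, you should *not* give your own files the same names as modules you intend to import. For example, a file called `random.py` next to your script would shadow the standard-library `random` module, which leads to confusing errors.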
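The "first match wins" rule can be checked with two hypothetical modules that share the name `shadow_demo` but live in different temporary directories. Whichever directory appears earlier in `sys.path` provides the module, which is how a file in your script's directory can hide a standard-library module of the same name.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as first_dir, tempfile.TemporaryDirectory() as second_dir:
    # Two hypothetical modules with the SAME name in two different directories.
    Path(first_dir, "shadow_demo.py").write_text("WHERE = 'first directory'\n")
    Path(second_dir, "shadow_demo.py").write_text("WHERE = 'second directory'\n")
    importlib.invalidate_caches()

    # Put both directories at the front of sys.path, first_dir before second_dir.
    sys.path[:0] = [first_dir, second_dir]
    try:
        module = importlib.import_module("shadow_demo")
    finally:
        sys.path.remove(first_dir)
        sys.path.remove(second_dir)

print(module.WHERE)  # first directory
```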

You can see where Python looks for modules by looking at `sys.path`

```python
import sys
sys.path
```

```
['/home/yngvem/anaconda3/lib/python36.zip',
 '/home/yngvem/anaconda3/lib/python3.6',
 '/home/yngvem/anaconda3/lib/python3.6/lib-dynload',
 '',
 '/home/yngvem/anaconda3/lib/python3.6/site-packages',
 '/home/yngvem/Programming/tensorly',
 '/home/yngvem/Programming/jobb/mask_stats/src',
 '/home/yngvem/Programming/morro/blog/mean-girls/face_morpher',
 '/home/yngvem/Programming/jobb/code_printer/src',
 '/home/yngvem/anaconda3/lib/python3.6/site-packages/IPython/extensions',
 '/home/yngvem/.ipython']
```

Importing your own modules works automatically as long as they are in the same directory as your script, but what happens if you want to import from a different directory?

 * **Bad solution:** Manipulate sys.path
 * **Good solution:** Install it in editable mode

## Manipulating `sys.path`

Sometimes, you just want to import a module from another directory once. In that case, it is OK to manipulate `sys.path`, using the following code

```python
import sys

sys.path.append(r'C:\path\to\module')
```
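A self-contained variant of the snippet above, with a temporary directory standing in for `C:\path\to\module` and a hypothetical module `far_away.py`. Removing the entry again in a `finally` block is an optional precaution so that the change does not affect later imports.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as other_dir:
    # A hypothetical helper module living in some other directory.
    Path(other_dir, "far_away.py").write_text(
        "def greet():\n"
        "    return 'hello from far away'\n"
    )
    importlib.invalidate_caches()

    sys.path.append(other_dir)      # make the directory searchable (searched last)
    try:
        import far_away             # now found via the appended entry
        message = far_away.greet()
    finally:
        sys.path.remove(other_dir)  # undo the change so it does not leak elsewhere

print(message)  # hello from far away
```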

## Editable installs

If you will import your code from more than one place, an editable install is the way to go. To learn more about this, we first need to learn about packages.

## Packages

A package is, simply put, a collection of Python modules

Packages make it easier to structure and share larger projects




#### Example - Sound Effects

This example is taken from the [Python documentation](https://docs.python.org/3.6/tutorial/modules.html)

Let's say you want to create code that takes sound files or data and applies various sound effects to them. To make the project more structured, you choose to add a new module for each type of sound effect you want to add

Your project structure can then look like this
- `/effects/`
    - `__init__.py`
    - `echo.py`
    - `surround.py`
    - `reverse.py`
    - `autotune.py`
    - ....



To create the project we simply gather all the different modules (`echo.py`, `surround.py`, and so on) in a single directory, and then we create a `__init__.py` file

The `__init__.py` file specifies to Python that the `effects` directory should be interpreted as a *package*. The file itself can be empty

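We can then use the package as follows (`add_echo` stands for a function defined in `echo.py`, and `sound` for some sound data):
```python
import effects
effects.echo.add_echo(sound)  # requires that effects.echo is imported, e.g. in effects/__init__.py

import effects.echo
effects.echo.add_echo(sound)

from effects.echo import add_echo
add_echo(sound)
```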
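Importing a package runs its `__init__.py`, but it does not automatically import the modules inside it. The sketch below builds a throwaway package `effects_demo` (a hypothetical stand-in for `effects`) in a temporary directory to show both facts. `importlib.import_module` is used in place of the `import` statement because the package only becomes importable at run time, and the toy `add_echo` simply repeats its input.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "effects_demo")  # hypothetical stand-in for `effects`
    pkg.mkdir()
    (pkg / "__init__.py").write_text("INIT_RAN = True\n")  # no submodule imports
    (pkg / "echo.py").write_text("def add_echo(sound):\n    return sound + sound\n")
    importlib.invalidate_caches()

    sys.path.insert(0, tmp)
    try:
        effects_demo = importlib.import_module("effects_demo")  # runs __init__.py
        has_echo_before = hasattr(effects_demo, "echo")         # echo not imported yet
        importlib.import_module("effects_demo.echo")            # like `import effects_demo.echo`
        has_echo_after = hasattr(effects_demo, "echo")          # submodule now bound on package
    finally:
        sys.path.remove(tmp)

print(effects_demo.INIT_RAN, has_echo_before, has_echo_after)  # True False True
print(effects_demo.echo.add_echo([1, 2]))                       # [1, 2, 1, 2]
```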

These packages are sometimes referred to as 'multi-file modules'. You have probably used them without thinking about it:
- matplotlib.pyplot
- numpy.random

They are also referred to as 'import packages', as they are primarily used to define how importing the various modules should be done

### Sub-packages

Packages can be defined in a nested hierarchy. Let us say we extend our 'sound effects' project to also include other handling of sound files, such as converting between formats or adding filters

After some work our project might look like this

- `sound/`
    - `__init__.py`
    - `formats/`
        - `__init__.py`
        - `wavread.py`
        - `wavwrite.py`
        - `aiffread.py`
        - `aiffwrite.py`
        - `auread.py`
        - `auwrite.py`
        - `...`
    - `effects/`
        - `__init__.py`
        - `echo.py`
        - `surround.py`
        - `reverse.py`
        - `...`
    - `filters/`
        - `__init__.py`
        - `equalizer.py`
        - `vocoder.py`
        - `karaoke.py`

Here, `sound` is a package (because it is a directory with an `__init__.py` file) that contains three sub-packages:
- `formats`
- `effects`
- `filters`

Each of the sub-packages contains its own `__init__.py` file to signify that it is to be treated as a package as well
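The dotted import path mirrors the directory hierarchy. The sketch below recreates a trimmed-down, hypothetical `sound_demo` tree (with only `effects/echo.py` filled in) in a temporary directory and imports the module from the sub-package; the module's `__name__` and `__package__` record where it sits in the hierarchy.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp, "sound_demo")  # hypothetical stand-in for `sound`
    for sub in ["", "formats", "effects", "filters"]:
        (root / sub).mkdir(parents=True, exist_ok=True)
        (root / sub / "__init__.py").write_text("")  # each directory is a package
    (root / "effects" / "echo.py").write_text("NAME = 'echo'\n")
    importlib.invalidate_caches()

    sys.path.insert(0, tmp)
    try:
        from sound_demo.effects import echo  # dotted path: package.sub-package
    finally:
        sys.path.remove(tmp)

print(echo.__name__)     # sound_demo.effects.echo
print(echo.__package__)  # sound_demo.effects
```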

### Relative imports

- Modules within a package often depend on each other
    - must import each other
    - must avoid importing modules of same name elsewhere in `sys.path`

- Solution: relative imports
    - always in the form

      ```python
      from <module or package> import <something>
      ```
    - where `<module or package>` always starts with a dot `.` (one dot: the current package; two dots: its parent package, and so on)
    - see [Tutorial, ch 6.4.2](https://docs.python.org/3.6/tutorial/modules.html#intra-package-references) and [PEP 328](http://legacy.python.org/dev/peps/pep-0328/)

#### Relative import properties
- The `.` marks import as *relative* 
- Python looks for modules only within the package
- does not look in directories in `sys.path`
- avoids confusion with modules/packages of same name elsewhere
- **work only inside packages**
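As an illustration, the hypothetical package below mirrors the `sound` example: `effects/echo.py` uses `from ..filters import equalizer` to reach a module in a sibling sub-package. The package name `sound_rel` and both functions are made up for this sketch.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp, "sound_rel")  # hypothetical package
    for sub in ["", "effects", "filters"]:
        (root / sub).mkdir(parents=True, exist_ok=True)
        (root / sub / "__init__.py").write_text("")
    (root / "filters" / "equalizer.py").write_text(
        "def flatten(x):\n"
        "    return [0 for _ in x]\n"
    )
    # `..` goes up from sound_rel.effects to sound_rel, then down into filters.
    (root / "effects" / "echo.py").write_text(
        "from ..filters import equalizer\n"
        "\n"
        "def add_echo(sound):\n"
        "    return equalizer.flatten(sound) + sound\n"
    )
    importlib.invalidate_caches()

    sys.path.insert(0, tmp)
    try:
        echo = importlib.import_module("sound_rel.effects.echo")
    finally:
        sys.path.remove(tmp)

print(echo.add_echo([1, 2]))    # [0, 0, 1, 2]
print(echo.equalizer.__name__)  # sound_rel.filters.equalizer
```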

#### Modules using relative imports cannot be run
- Modules that are part of packages and therefore use relative imports cannot be executed directly (as scripts)
- You cannot "Run" them in PyCharm
- This is intentional: packages are to be imported
- To use the code from such modules, create a script that imports the module
    - create the script outside the package
- See `J01/chutes_project/examples/chutes_demo`
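The failure is easy to reproduce. The sketch below (Python 3.7+, for `capture_output`) builds a hypothetical package `chutes_like` whose `game.py` uses a relative import, runs that file directly with the current interpreter, and then runs a small script placed outside the package that imports the same code. Only the second run is expected to succeed.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "chutes_like")  # hypothetical package
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "board.py").write_text("SIZE = 100\n")
    (pkg / "game.py").write_text(
        "from .board import SIZE\n"
        "\n"
        "def describe():\n"
        "    return f'board of size {SIZE}'\n"
    )
    # A script OUTSIDE the package that imports from it.
    Path(tmp, "demo.py").write_text(
        "from chutes_like.game import describe\n"
        "print(describe())\n"
    )

    # 1) Running the package module directly: no parent package is known.
    direct = subprocess.run([sys.executable, str(pkg / "game.py")],
                            capture_output=True, text=True)
    # 2) Running the outside script: its directory (tmp) is on sys.path,
    #    so chutes_like is imported as a package and the relative import works.
    via_script = subprocess.run([sys.executable, str(Path(tmp, "demo.py"))],
                                capture_output=True, text=True)

print(direct.returncode != 0)                   # True
print(direct.stderr.strip().splitlines()[-1])   # the ImportError message
print(via_script.stdout.strip())                # board of size 100
```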


## Distribution Packages

Let us say you have spent the last year creating some really great Python code, and now you want to share it with others. What do we need to do?
- Need to put "everything together" into a nice "parcel"
- Need to handle *dependencies* (e.g., that our code needs NumPy)
- Need to "spread the word (code)"

**Python solution**: *Packaging*

## Creating distribution packages

### Setuptools
The most common tool for creating distribution packages is `setuptools`. With `setuptools`, you specify the project metadata and dependencies in a `setup.py` and a `setup.cfg` file. The `setup.py` file may look like this:

```python
from setuptools import setup

setup()
```

When the package is installed (for example with `pip install .` in the project directory), `setup()` reads its configuration from `setup.cfg`.
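A minimal `setup.cfg` for the `src` layout used in this course might look like the text stored in `SETUP_CFG` below. The version number and the dependencies (`numpy`, `matplotlib`) are assumptions for illustration. `setuptools` reads this file on its own; the standard-library `configparser` is used here only to check that the text is well-formed and to show how it is structured.

```python
import configparser

# Assumed contents of setup.cfg for a project with code in src/biosim.
SETUP_CFG = """\
[metadata]
name = biosim
version = 0.1.0

[options]
package_dir =
    = src
packages = find:
install_requires =
    numpy
    matplotlib

[options.packages.find]
where = src
"""

cfg = configparser.ConfigParser()
cfg.read_string(SETUP_CFG)

print(cfg["metadata"]["name"])                     # biosim
print(cfg["options"]["install_requires"].split())  # ['numpy', 'matplotlib']
print(cfg["options.packages.find"]["where"])       # src
```

With such a file in place, `setup.py` can stay as the two-line version shown earlier.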

### Poetry
A relatively new but increasingly popular way to create a distribution package is Poetry, which some of you may want to look into.

**In the BioSim project, we will use setuptools.**

## Editable installs

When you install a Python package, its files are copied to the `site-packages` directory. Sometimes, however, we don't want to install a fixed version of the package, but an *editable* version of it: one where any change to the source directory immediately affects the code that you run. To install a package in editable mode, open the Anaconda prompt (or a terminal window on Linux or Mac), navigate to the package directory (the directory with the `setup.py` file) and run `pip install -e .`. The `-e` means editable mode and the `.` means the current directory. This will create a link from the `site-packages` directory to the current directory, so you can import the package from any other Python file on your system.
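To check where Python will import a package from, you can ask the import system for the module's location without running it. The helper `where_is` below is hypothetical; it only wraps the standard-library `importlib.util.find_spec`, which returns `None` when no module of that name can be found.

```python
import importlib.util

def where_is(module_name):
    """Return the file a module would be imported from, or None if not found.

    Hypothetical helper: find_spec locates a top-level module without
    executing it (for a dotted name, the parent package is imported first).
    """
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin

print(where_is("json"))                # a path ending in json/__init__.py
print(where_is("no_such_module_xyz"))  # None
```

After `pip install -e .`, `where_is("biosim")` should point into your `src/biosim` directory rather than into `site-packages`.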

#### Packages *vs* Packages

You might have noticed that we now have two different things called *packages*. They are either
- Collections of modules (import packages)
- A collection of code neatly packaged for sharing with others (distribution packages)

Yes, having the same name for two different things is confusing. Programmers are horrible at naming things; we just have to deal with that

The [Python Packaging User Guide Glossary](https://packaging.python.org/glossary/) defines a Distribution Package as

"A versioned archive file that contains Python packages, modules, and other resource files that are used to distribute a Release. The archive file is what an end-user will download from the internet and install."

**Details on distribution packages will follow in a later lecture.**

## Useful links
 - [Short guide of best practices for Python packaging](https://github.com/yngvem/python-project-structure)
 - [An opinionated blogpost that discusses best practices for packaging](https://blog.ionelmc.ro/2014/05/25/python-packaging/)
 
## Useful tools
 - [Trello](https://trello.com)
     - Alternative built into GitHub