In [None]:
%matplotlib inline

import traceback
import sys
import os

import numpy as np
import matplotlib.pyplot as plt
from IPython.core.magic import register_cell_magic
from IPython.display import display, Audio, Image

# Python and numpy

* Organize Python modules
* Use numpy for array-oriented code
* Python tools for numerical analysis

# Organize Python modules

* A file containing Python code that is run as a program is a <font color="red">**script**</font>
* A Python file is a “<font color="red">**module**</font>”
* A directory containing Python files (satisfying some rules) is a “<font color="red">**package**</font>”
* “Module” is often used loosely to refer to anything that can be imported by the Python `import` statement, so a “module” can mean either a module in the strict sense or a package.
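A minimal sketch of one way to tell the two apart at runtime: a package is a module object that also has a `__path__` attribute (where its submodules are searched for), and a plain module does not. The helper `is_package` is hypothetical; `json` and `os` are simply standard-library examples of a package and a single-file module in CPython:

```python
import importlib


def is_package(name):
    """Return True if the importable `name` is a package rather than a plain module."""
    module = importlib.import_module(name)
    # Packages carry __path__; a module loaded from a single .py file doesn't.
    return hasattr(module, '__path__')


print(is_package('json'))  # True: json is a directory with __init__.py
print(is_package('os'))    # False: os is the single file os.py
```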

See examples in the [current directory](./)

# What is a script and how it works

* A script is a text file whose content the program loader passes to an engine (usually an interpreter) for execution.
* Executable permission needs to be set for the shell to run it:

```
$ ls -al step0.py      # executable bit not set
-rw-r--r--  1 yungyuc  staff  574 Apr  7 22:18 step0.py
$ ./step0.py pstake.py # can't run without permission
-bash: ./step0.py: Permission denied
$ chmod a+x step0.py   # set the executable bit
$ ls -al step0.py      # executable bit is set
-rwxr-xr-x  1 yungyuc  staff  574 Apr  7 22:18 step0.py*
$ ./step0.py pstake.py # now it runs
811 lines in pstake.py
```
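The same check and `chmod a+x` can be done from Python with `os.stat` and `os.chmod`. A minimal POSIX-only sketch on a throw-away file (the helper names `is_user_executable` and `make_executable` are made up for illustration); it only edits permission bits and runs nothing:

```python
import os
import stat
import tempfile


def is_user_executable(path):
    """True if the owner's execute bit (the first x in rwx) is set."""
    return bool(os.stat(path).st_mode & stat.S_IXUSR)


def make_executable(path):
    """Add execute permission for user, group, and others, like `chmod a+x`."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


# NamedTemporaryFile creates the file as rw------- on POSIX systems.
with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as fobj:
    fobj.write('#!/usr/bin/env python3\nprint("hello")\n')
    script = fobj.name

before = is_user_executable(script)
make_executable(script)
after = is_user_executable(script)
print(before, after, stat.filemode(os.stat(script).st_mode))  # e.g. False True -rwx--x--x
os.remove(script)
```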

# What’s a script

* Scripts usually are for automating repetitive work.
* Scripts are usually short so they can be written quickly.

The leading line in a script that starts with `#!` is called the *shebang*.  It tells the program loader which executable to run for the script.

```python
#!/usr/bin/env python3
# ...
```

The shebang only takes effect when the script is run directly (e.g., `./step0.py`), and that requires the executable permission to be set on the script.
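Roughly, when the shell runs `./step0.py pstake.py`, the operating system reads the shebang and runs the named interpreter with the script path and the original arguments appended. The sketch below models that, assuming Linux's behavior of passing everything after the interpreter path as *one* optional argument; `shebang_command` is a hypothetical helper, not a real API:

```python
def shebang_command(script_text, script_path, args=()):
    """Model the command a Linux program loader builds from a script's shebang."""
    first_line = script_text.split('\n', 1)[0]
    if not first_line.startswith('#!'):
        raise ValueError('no shebang line')
    # The interpreter path comes first; on Linux the rest of the line (if any)
    # is passed as ONE argument, without further splitting.
    parts = first_line[2:].strip().split(None, 1)
    if not parts:
        raise ValueError('empty shebang line')
    return parts + [script_path] + list(args)


print(shebang_command('#!/usr/bin/env python3\n# ...\n', './step0.py', ['pstake.py']))
# ['/usr/bin/env', 'python3', './step0.py', 'pstake.py']
```

`/usr/bin/env` then searches `PATH` for `python3`. The single-argument rule is why `#!/usr/bin/env python3 -u` does not work on Linux: `env` is asked to find a program literally named `python3 -u`.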

# Example: line counting (`step0`)

This is the first example: `step0.py` file in the [current directory](./).  It counts the number of lines in the file specified with the first argument.

In [None]:
!chmod u+x step0.py
!./step0.py pstake.py

Another way to run the script, regardless of the executable bit, is to call the Python executable explicitly.

In [None]:
!chmod u-x step0.py
!python3 step0.py pstake.py

# One-liner

The Python executable supports the <font color="red">**`-c`**</font> argument for one-liners.  The content of the script is passed on the command line.  It's called a one-liner because it usually takes only one line.

One-liners are convenient for code that is only run once.  They are quick to write but hard to read.

In [None]:
!python3 -c 'print(len(open("pstake.py").readlines()), "lines")'
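The one-liner builds a list of all lines only to count them, and leaves closing the file to the interpreter. A reusable version could iterate lazily inside a `with` block; a minimal sketch (the name `count_lines` and the sample file are illustrative):

```python
import os
import tempfile


def count_lines(fname):
    """Count lines by iterating over the file, without building a list of lines."""
    with open(fname) as fobj:
        return sum(1 for _ in fobj)


with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as fobj:
    fobj.write('first\nsecond\nthird without a trailing newline')
    path = fobj.name

n = count_lines(path)
print(n, 'lines')  # 3 lines, the same count readlines() would give
os.remove(path)
```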

# Make a module

See the example file `step1.py` in the [current directory](./).  It factors out the line-counting code to a distinct function:

```python
import os
import sys


def count_line(fname):
    if os.path.exists(fname):
        with open(fname) as fobj:
            lines = fobj.readlines()
        sys.stdout.write('{} lines in {}\n'.format(len(lines), fname))
    else:
        sys.stdout.write('{} not found\n'.format(fname))
```

The other code is for processing command-line arguments.  It's only useful for a script, so we move it into an `if` test:

```python
# This tests whether the code is evaluated as a script.
if __name__ == '__main__':
    if len(sys.argv) < 2:
        sys.stdout.write('missing file name\n')
    elif len(sys.argv) > 2:
        sys.stdout.write('only one argument is allowed\n')
    else:
        count_line(sys.argv[1])
```
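To see the value the `if` test checks, here is a sketch with a throw-away module (`demo_mod` is a made-up name): importing it sets `__name__` to the module name, while running it as a script, emulated here with `runpy.run_path(..., run_name='__main__')`, sets it to `'__main__'`:

```python
import importlib.util
import os
import runpy
import tempfile

# A throw-away module that records the value of __name__ it sees when executed.
SOURCE = "SEEN_NAME = __name__\n"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'demo_mod.py')
    with open(path, 'w') as fobj:
        fobj.write(SOURCE)

    # Import it: __name__ is the module name.
    spec = importlib.util.spec_from_file_location('demo_mod', path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    imported_name = module.SEEN_NAME

    # Run it the way `python3 demo_mod.py` would: __name__ is '__main__'.
    script_globals = runpy.run_path(path, run_name='__main__')
    script_name = script_globals['SEEN_NAME']

print(imported_name)  # demo_mod
print(script_name)    # __main__
```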

# Different behaviors on import

Because **`step1`** only runs its command-line code when `__name__ == '__main__'`, nothing happens when it is imported as a module:

In [None]:
!python3 -c 'import step1'

But importing **`step0`** runs the code:

In [None]:
!python3 -c 'import step0' pstake.py

To run the code, we now need to call the function defined in the **`step1`** module:

In [None]:
!python3 -c 'import step1 ; step1.count_line("pstake.py")'

But when running as a script, both behave the same:

In [None]:
!python3 step0.py pstake.py

In [None]:
!python3 step1.py pstake.py

# Run module as script

The Python executable supports the **`-m`** argument.  It locates the module through the import system (searching `sys.path`) and runs it with `__name__` set to `'__main__'`, i.e., as a script.

In [None]:
!python3 -m step1 pstake.py

With `python -m`, `step0.py` and `step1.py` again behave the same:

In [None]:
!python3 -m step0 pstake.py
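CPython implements `-m` with the standard-library `runpy` module. A sketch of the same find-then-run behavior from Python, using a throw-away module named `demo_m` (made up for illustration) placed on `sys.path`:

```python
import os
import runpy
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    with open(os.path.join(tmpdir, 'demo_m.py'), 'w') as fobj:
        fobj.write("SEEN_NAME = __name__\nSEEN_FILE = __file__\n")
    # -m searches sys.path; put the temporary directory first.
    sys.path.insert(0, tmpdir)
    try:
        # Find demo_m through the import system, then execute it as __main__.
        result = runpy.run_module('demo_m', run_name='__main__')
    finally:
        sys.path.remove(tmpdir)

print(result['SEEN_NAME'])                    # __main__
print(os.path.basename(result['SEEN_FILE']))  # demo_m.py
```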

# Make the module more like a library

It's common to further factor the script-only code into a **`main`** function.  See the example file `step2.py` in the [current directory](./).

```python
def main():
    if len(sys.argv) < 2:
        sys.stdout.write('missing file name\n')
    elif len(sys.argv) > 2:
        sys.stdout.write('only one argument is allowed\n')
    else:
        count_line(sys.argv[1])


# This tests whether the file is evaluated as a script.
if __name__ == '__main__':
    main()
```
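A common variation, not what `step2.py` does: let `main` take the argument list as a parameter (defaulting to `sys.argv[1:]`), so other code and tests can call it without touching `sys.argv`. Returning an exit status is also an addition of this sketch:

```python
import contextlib
import io
import os
import sys


def count_line(fname):
    if os.path.exists(fname):
        with open(fname) as fobj:
            lines = fobj.readlines()
        sys.stdout.write('{} lines in {}\n'.format(len(lines), fname))
    else:
        sys.stdout.write('{} not found\n'.format(fname))


def main(argv=None):
    """Hypothetical variant of step2.main(); argv excludes the program name."""
    if argv is None:
        argv = sys.argv[1:]
    if len(argv) < 1:
        sys.stdout.write('missing file name\n')
        return 1
    if len(argv) > 1:
        sys.stdout.write('only one argument is allowed\n')
        return 1
    count_line(argv[0])
    return 0


# Usage: capture what main() writes for a bad argument list.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    status = main(['a.txt', 'b.txt'])
print(status, repr(buffer.getvalue()))  # 1 'only one argument is allowed\n'
```

A script would then end with `if __name__ == '__main__': sys.exit(main())`.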

The behavior is the same as **`step1`**:

In [None]:
# run as a script
!python3 step2.py pstake.py

In [None]:
# run the module as a script
!python3 -m step2 pstake.py

In [None]:
# only import the module
!python3 -c 'import step2'

In [None]:
# import and then run the new main function
!python3 -c 'import step2 ; step2.main()' pstake.py

# Make a package

When the code grows large enough, you may need a directory to house it.  Let's use our simple example to show how to make a package.  See the [step3 example directory](./step3/).

No file in the package version `step3` can be run as a script.

In [None]:
# Running the package's __init__.py as a script doesn't work.
!python3 step3/__init__.py numpy.ipynb

Everything else still works, including the `-m` option of the Python executable.

In [None]:
!python3 -m step3 numpy.ipynb

In [None]:
!python3 -c 'import step3 ; step3.main()' numpy.ipynb
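For `python3 -m step3` to work, `step3` has to be a package with a `__main__.py` file: `-m` imports the package and then runs `step3/__main__.py` as `__main__`. Running `__init__.py` directly fails if, as is typical, it uses relative imports; the actual contents of `step3/` aren't shown here, so that cause is an assumption. A sketch with a throw-away package named `demo_pkg` (all file contents made up), emulating both commands with `runpy`:

```python
import os
import runpy
import sys
import tempfile

FILES = {
    '__init__.py': 'from .core import greet\n',
    'core.py': 'def greet():\n    return "hello from demo_pkg.core"\n',
    '__main__.py': 'from .core import greet\nRESULT = greet()\n',
}

with tempfile.TemporaryDirectory() as tmpdir:
    pkgdir = os.path.join(tmpdir, 'demo_pkg')
    os.mkdir(pkgdir)
    for name, text in FILES.items():
        with open(os.path.join(pkgdir, name), 'w') as fobj:
            fobj.write(text)

    sys.path.insert(0, tmpdir)
    try:
        # Like `python3 -m demo_pkg`: import the package, then run demo_pkg/__main__.py.
        main_globals = runpy.run_module('demo_pkg', run_name='__main__')
    finally:
        sys.path.remove(tmpdir)
        # Forget the throw-away package so later imports don't see stale entries.
        for mod in [m for m in sys.modules if m == 'demo_pkg' or m.startswith('demo_pkg.')]:
            del sys.modules[mod]

    # Like `python3 demo_pkg/__init__.py`: no parent package, so the relative import fails.
    try:
        runpy.run_path(os.path.join(pkgdir, '__init__.py'), run_name='__main__')
        init_error = None
    except ImportError as exc:
        init_error = exc

print(main_globals['RESULT'])
print(type(init_error).__name__, init_error)
```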

# A really useful script

Here is a real-world example (`pstake.py` in the [current directory](./)) of a useful script: it converts [pstricks](http://tug.org/PSTricks/main.cgi/) drawings to an image file.

In [None]:
!rm -f cce.png
!ls cce.png
!./pstake.py cce.tex cce.png 2>&1 > /dev/null
!ls -al cce.png
Image(url="cce.png", width=800)

# Numpy for array-centric code

* Arrays are the right tool for managing homogeneous data.
* The [numpy](http://www.numpy.org/) library provides everything we need for arrays in Python.
* Arrays store their elements in contiguous memory; a Python sequence such as a list stores references to separately allocated objects.
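An array stores the raw values back to back, and its `strides` say how many bytes to move to reach the next element along each axis. A quick look (the explicit `int64` dtype only makes the byte counts predictable):

```python
import numpy as np

arr = np.array([1, 1, 2, 3, 5], dtype=np.int64)  # int64: 8 bytes per element
print(arr.flags['C_CONTIGUOUS'])   # True: the values sit in one block of memory
print(arr.strides)                 # (8,): the next element is 8 bytes further
print(arr.nbytes)                  # 40 = 5 elements * 8 bytes

every_other = arr[::2]             # a strided view into the same block, not a copy
print(every_other.flags['C_CONTIGUOUS'], every_other.strides)  # False (16,)
```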

In [None]:
# Make a list (one type of Python sequence) of integers.
lst = [1, 1, 2, 3, 5]
print('A list:', lst)

In [None]:
# Import the numpy library. It's a universal convention to alias it to "np".
import numpy as np
# Make an array from the sequence.
array = np.array(lst)
print('An array:', array)

# Key meta-data

In [None]:
array = np.array([[0, 1, 2], [3, 4, 5]])
print("shape:", array.shape)
print("size:", array.size)
print("nbytes:", array.nbytes)
print("itemsize:", array.itemsize)
print("dtype:", array.dtype)
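These fields are related: `size` is the product of the entries in `shape`, and `nbytes` is `size * itemsize`. A quick check (`math.prod` needs Python 3.8 or later):

```python
import math

import numpy as np

array = np.array([[0, 1, 2], [3, 4, 5]])
print("ndim:", array.ndim)  # 2, the same as len(array.shape)
print("size == prod(shape):", array.size == math.prod(array.shape))              # True
print("nbytes == size * itemsize:", array.nbytes == array.size * array.itemsize)  # True
# The default integer dtype is platform-dependent, so request one explicitly
# when the byte count matters.
array32 = array.astype(np.int32)
print(array32.itemsize, array32.nbytes)  # 4 24
```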

# Data type

The numpy array is of type `numpy.ndarray`.  An `ndarray` has a property `dtype` for the data type the array uses:

In [None]:
print(type(array))
print(array.dtype)

`numpy.array` is the most basic `ndarray` constructor.  It detects the types in the input sequence and chooses an appropriate dtype for the constructed array.

In [None]:
array1 = np.array([1, 1, 2, 3, 5]) # only integer
print("only int:", array1, type(array1), array1.dtype)
array2 = np.array([1.0, 1.0, 2.0, 3.0, 5.0]) # only real
print("only real:", array2, type(array2), array2.dtype)
array3 = np.array([1, 1, 2, 3, 5.0]) # integer and real
print("int and real:", array3, type(array3), array3.dtype)

* A Python list doesn't know the types of its elements, but an array does.
* The type information allows numpy to process the array data using pre-compiled C code.
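Detection picks a dtype that can hold every element, and the `dtype` argument or `astype` overrides it. A few cases, as a sketch:

```python
import numpy as np

ints = np.array([1, 2, 3])
reals = np.array([1, 2.5])            # the int is upcast to float64
complexes = np.array([1, 2.5, 1j])    # everything becomes complex128
print(ints.dtype.kind, reals.dtype, complexes.dtype)  # i float64 complex128
# The dtype argument overrides detection ...
singles = np.array([1, 2, 3], dtype=np.float32)
print(singles.dtype)                  # float32
# ... and astype() converts to a new array; float -> int truncates toward zero.
truncated = np.array([1.7, -1.7]).astype(np.int64)
print(truncated)                      # [ 1 -1]
```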

# Construction

Numpy provides a lot of helpers to construct arrays ([see here](https://www.numpy.org/devdocs/reference/routines.array-creation.html)).  The 3 most common constructors are `empty`, `zeros`, and `ones`:

In [None]:
empty_array = np.empty(4)
print("It will contain garbage, but it doesn't waste time to initialize:", empty_array)
zeroed_array = np.zeros(4)
print("The contents are cleared with zeros:", zeroed_array)
unity_array = np.ones(4)
print("Instead of zeros, fill it with ones:", unity_array)
print("All of their data types are float64 (double-precision floating-point):",
      empty_array.dtype, zeroed_array.dtype, unity_array.dtype)
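All three take a shape (an integer or a tuple) and an optional `dtype`; the `*_like` variants copy both from an existing array. For example:

```python
import numpy as np

zeros_2d = np.zeros((2, 3))              # a tuple gives a multi-dimensional shape
int_ones = np.ones(4, dtype=np.int32)    # dtype overrides the float64 default
print(zeros_2d.shape, zeros_2d.dtype)    # (2, 3) float64
print(int_ones, int_ones.dtype)          # [1 1 1 1] int32
like = np.zeros_like(int_ones)           # same shape and dtype as int_ones
print(like.shape, like.dtype)            # (4,) int32
```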

* `full` is a shorthand for `empty` and `fill`:

In [None]:
empty_array = np.empty(4)
empty_array.fill(7)
print("It's the same as creating an empty array and filling it with the value:", empty_array)
filled_array = np.full(4, 7)
print("Build an array populated with an arbitrary value:", filled_array)
filled_real_array = np.full(4, 7.0)
print("Build an array populated with an arbitrary real value:", filled_real_array)
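The two routes are not quite identical: `empty(4)` is float64, so `fill(7)` stores 7.0, whereas `full(4, 7)` infers its dtype from the fill value (an integer here). Passing `dtype` makes them agree:

```python
import numpy as np

via_fill = np.empty(4)
via_fill.fill(7)                     # 7 is converted to the array's float64
via_full = np.full(4, 7)             # dtype inferred from the integer 7
print(via_fill.dtype, via_full.dtype.kind)   # float64 i
same = np.full(4, 7, dtype=np.float64)
print(np.array_equal(via_fill, same), via_fill.dtype == same.dtype)  # True True
```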

* `arange` builds an array of evenly spaced values in a half-open interval, like Python's `range`:

In [None]:
ranged_array = np.arange(4)
print("Build an array with range:", ranged_array)
ranged_real_array = np.arange(4.0)
print("Build with real range:", ranged_real_array)
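`arange` follows the convention of Python's `range`: the stop value is excluded and the step may be negative. A floating-point step needs care, because numpy documents the length as `ceil((stop - start)/step)`, which is subject to rounding. A sketch:

```python
import numpy as np

# Like range(): start is included, stop is excluded.
stepped = np.arange(2, 10, 3)
print(stepped)                     # [2 5 8]
# A negative step counts down.
countdown = np.arange(4, -1, -1)
print(countdown)                   # [4 3 2 1 0]
# (1.3 - 1) / 0.1 evaluates to slightly more than 3 in floating point,
# so the ceiling gives 4 elements, the last one about 1.3.
surprise = np.arange(1, 1.3, 0.1)
print(len(surprise))               # 4, not 3
# When the number of points matters, linspace is the safer choice.
print(len(np.linspace(1, 1.2, 3)))  # 3
```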

`linspace` returns an array whose elements are evenly placed in a closed interval:

In [None]:
linear_array = np.linspace(11, 13, num=6)
print("Create an equally-spaced array with 6 elements:", linear_array)
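For `num` points over a closed interval there are `num - 1` gaps, so the step here is $(13 - 11)/5 = 0.4$. `retstep=True` returns that step, and `endpoint=False` switches to the half-open interval:

```python
import numpy as np

values, step = np.linspace(11, 13, num=6, retstep=True)
print(step)          # 0.4
print(values[-1])    # 13.0: the stop value is included
# endpoint=False divides [11, 13) into 4 gaps of 0.5 and drops the stop value.
half_open = np.linspace(11, 13, num=4, endpoint=False)
print(half_open)     # 11, 11.5, 12, 12.5
```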

# Multi-dimensional arrays

* Multi-dimensional arrays are the building blocks of matrices and linear algebra, and are much more useful than one-dimensional arrays.
* Create multi-dimensional arrays by stacking 1D:

In [None]:
ranged_array = np.arange(10)
print("A 1D array:", ranged_array)
hstack_array = np.hstack([ranged_array, ranged_array])
print("Horizontally stacked array:", hstack_array)
vstack_array = np.vstack([ranged_array, ranged_array])
print("Vertically stacked array:", vstack_array)
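`hstack` and `vstack` join along existing axes (1D inputs are treated as rows by `vstack`), while `stack` creates a new axis. A comparison of the resulting shapes:

```python
import numpy as np

row = np.arange(10)
joined = np.hstack([row, row])                # (20,): end to end
rows = np.vstack([row, row])                  # (2, 10): each input becomes a row
new_last_axis = np.stack([row, row], axis=1)  # (10, 2): a new axis at position 1
columns = np.column_stack([row, row])         # (10, 2): each input becomes a column
print(joined.shape, rows.shape, new_last_axis.shape, columns.shape)
```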

* `ndarray` is row-major by default:

\begin{align*}
A = \left(\begin{array}{ccc}
a_{00} & a_{01} & a_{02} \\
a_{10} & a_{11} & a_{12}
\end{array}\right)
= \left(\begin{array}{ccc}
0 & 1 & 2 \\
3 & 4 & 5
\end{array}\right)
\end{align*}

In [None]:
original_array = np.arange(6)
print("original 1D array:", original_array)

In [None]:
print("reshaped 2D array:", original_array.reshape((2,3)))

* Column-major:

In [None]:
print("reshaped 2D array:", original_array.reshape((2,3), order='F'))
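The order decides where element $(i, j)$ of a $2 \times 3$ array sits in the flat buffer: at offset $3i + j$ for row-major and $i + 2j$ for column-major. A sketch checking this, plus `np.ravel_multi_index` and the byte strides (using `int64` so each element is 8 bytes):

```python
import numpy as np

shape = (2, 3)
flat = np.arange(6, dtype=np.int64)
row_major = flat.reshape(shape)              # order='C' is the default
col_major = flat.reshape(shape, order='F')
for i in range(shape[0]):
    for j in range(shape[1]):
        # Element (i, j) holds its own offset into the flat buffer.
        assert row_major[i, j] == i * shape[1] + j
        assert col_major[i, j] == i + j * shape[0]
        assert np.ravel_multi_index((i, j), shape, order='F') == i + j * shape[0]
print("offsets match")
# Strides in bytes: row-major steps 24 bytes per row and 8 per column.
print(row_major.strides, col_major.strides)  # (24, 8) (8, 16)
```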

Example for 3D arrays:

In [None]:
original_array = np.arange(24)
print("original 1D array:", original_array)

In [None]:
reshaped_array = original_array.reshape((2,3,4))
print("reshaped 3D array:", reshaped_array)

For multi-dimensional arrays, operations can be done along any of the axes.  Take 2D for example.  First sum along the 0th-axis:

\begin{align*}
A_{\mathrm{along } 0} = \left(\begin{array}{ccc}
a_{00} + a_{10} & a_{01} + a_{11} & a_{02} + a_{12}
\end{array}\right)
= \left(\begin{array}{ccc}
3 & 5 & 7
\end{array}\right)
\end{align*}

Do it along the 1st-axis:

\begin{align*}
A_{\mathrm{along } 1} = \left(\begin{array}{cc}
a_{00} + a_{01} + a_{02} & a_{10} + a_{11} + a_{12}
\end{array}\right)
= \left(\begin{array}{cc}
3 & 12
\end{array}\right)
\end{align*}

In [None]:
A = np.arange(6).reshape((2,3))  # the 2x3 matrix A above
print("Summation along 0th axis:", A.sum(axis=0))

In [None]:
print("Summation along 1st axis:", A.sum(axis=1))
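A self-contained sketch tying the equations above to code, using the same $2 \times 3$ matrix $A$: summing along an axis removes that axis from the shape, unless `keepdims=True` keeps it with length 1.

```python
import numpy as np

A = np.arange(6).reshape((2, 3))          # the 2x3 matrix A above
print(A.sum(axis=0))                      # [3 5 7]
print(A.sum(axis=1))                      # [ 3 12]
print(A.sum(axis=0, keepdims=True).shape) # (1, 3)
# For the 3D array of shape (2, 3, 4), the summed axis disappears the same way.
B = np.arange(24).reshape((2, 3, 4))
print(B.sum(axis=0).shape, B.sum(axis=1).shape)  # (3, 4) (2, 4)
```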

# Selection: extract sub-array

There are 3 ways to create sub-arrays:
1. Slicing
2. Integer indexing
3. Boolean indexing

# Slicing

The array created from slicing shares the buffer of the original one:

In [None]:
array = np.arange(10)
print("This is the original array:", array)

sub_array = array[:5]
print("This is the sub-array:", sub_array)

sub_array[:] = np.arange(4, -1, -1)
print("The sub-array is changed:", sub_array)

print("And the original array is changed too (!):", array)

A new buffer can be created by copying the slice:

In [None]:
array = np.arange(10.0)
print("Recreate the original array to show how to avoid this:", array)

# Make a copy from the slice.
sub_array = array[:5].copy()
sub_array[:] = np.arange(4, -1, -1)
print("The sub-array is changed, again:", sub_array)
print("But original array remains the same:", array)
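To check whether two arrays share a buffer without mutating anything, numpy offers `np.shares_memory`, and a view records the array it came from in `.base`:

```python
import numpy as np

array = np.arange(10)
view = array[:5]               # slicing returns a view into array's buffer
copied = array[:5].copy()      # copy() allocates a new buffer
print(view.base is array)               # True
print(np.shares_memory(array, view))    # True
print(np.shares_memory(array, copied))  # False
```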

Slice along one axis of a multi-dimensional array (fixing the indices of the other axes):

In [None]:
array = np.arange(24).reshape((2,3,4))
print("original:\n%s" % array)
array[:,1,3] = np.arange(300,302)
print("find 300, 301:\n%s" % array)

Slice along two axes of a multi-dimensional array (fixing the index of the remaining axis):

In [None]:
array = np.arange(24).reshape((2,3,4))
print("original:\n%s" % array)
array[:,0,:] = np.arange(200,208).reshape((2,4))
print("find the numbers in [200,208):\n%s" % array)

# Integer indexing

In [None]:
array = np.arange(100, 106)
slct = np.array([1, 3])
print("select by indices 1, 3:", array[slct])
slct = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5])
print("new array is bigger than the old one:", array[slct])
array2 = array.reshape((2,3))
slct = np.array([1])
print("select by index 1:", array2[slct])
slct = np.array([[0,0], [0,1], [1,2]])
print("select by indices (0,0), (0,1), (1,2):", array2[slct[:,0],slct[:,1]],
      "using", slct)

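Unlike slicing, integer indexing returns a copy, yet assigning through an integer index writes into the original. `np.ix_` selects a sub-matrix from lists of row and column indices. A sketch:

```python
import numpy as np

array = np.arange(100, 106)
picked = array[[1, 3]]         # integer indexing returns a new array (a copy) ...
picked[:] = 0
print(array)                   # [100 101 102 103 104 105]: unchanged
array[[1, 3]] = 0              # ... but assigning through it writes into the original
print(array)                   # [100   0 102   0 104 105]
# np.ix_ builds an open mesh that selects the rows x columns cross product.
array2 = np.arange(100, 106).reshape((2, 3))
sub = array2[np.ix_([0, 1], [0, 2])]
print(sub)                     # rows 0 and 1, columns 0 and 2
```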
# Boolean selection

A Boolean array (a mask) selects the elements of another array at the positions where the mask is `True`, filtering out the rest.

In [None]:
less_than_5 = ranged_array < 5
print("The mask for less than 5:", less_than_5)
print("The values that are less than 5:", ranged_array[less_than_5])

all_on_mask = np.ones(10, dtype='bool')
print("All on mask:", all_on_mask)

all_off_mask = np.zeros(10, dtype='bool')
print("All off mask:", all_off_mask)
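Masks combine element-wise with `&`, `|`, and `~` (not `and`/`or`/`not`), and `np.nonzero` and `np.where` turn a mask into positions or into a choice between two values. A short sketch:

```python
import numpy as np

values = np.arange(10)
# Parentheses are required: & binds tighter than < and >.
mask = (values > 2) & (values < 7)
print(values[mask])                   # [3 4 5 6]
print(np.nonzero(mask)[0])            # [3 4 5 6]: positions where mask is True
chosen = np.where(mask, values, -1)   # element-wise: values where True, else -1
print(chosen)
# Assigning through a mask modifies the array in place.
values[~mask] = 0
print(values)                         # [0 0 0 3 4 5 6 0 0 0]
```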

# Broadcasting

[Broadcasting](https://docs.scipy.org/doc/numpy-1.16.1/reference/ufuncs.html#broadcasting) handles arrays of different shapes participating in an operation, following these rules:

1. All input arrays with fewer dimensions than the input array with the most dimensions have 1’s prepended to their shapes.
2. The size in each dimension of the output shape is the maximum of all the input sizes in that dimension.
3. An input can be used in the calculation if its size in a particular dimension either matches the output size in that dimension, or has value exactly 1.
4. If an input has a dimension size of 1 in its shape, the first data entry in that dimension will be used for all calculations along that dimension.
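Rules 1 to 3 determine the output shape and can be written as a small function; rule 4 concerns which values are used, so the sketch stops at rule 3. `broadcast_shape` is a hypothetical helper (numpy's own machinery is used only as a cross-check through `np.broadcast`), and it ignores zero-length axes:

```python
import numpy as np


def broadcast_shape(*shapes):
    """Apply broadcasting rules 1-3 to shape tuples (zero-length axes not handled)."""
    ndim = max(len(shape) for shape in shapes)
    # Rule 1: prepend 1's so every shape has the same number of dimensions.
    padded = [(1,) * (ndim - len(shape)) + tuple(shape) for shape in shapes]
    result = []
    for sizes in zip(*padded):
        # Rule 2: the output size is the maximum input size in that dimension.
        out = max(sizes)
        # Rule 3: every input size must equal the output size or be exactly 1.
        if any(size not in (1, out) for size in sizes):
            raise ValueError('shapes {} cannot be broadcast'.format(shapes))
        result.append(out)
    return tuple(result)


print(broadcast_shape((2, 1), (1, 3)))      # (2, 3)
print(broadcast_shape((2, 1), (3,), ()))    # (2, 3)
# Cross-check against numpy's own computation.
arrays = [np.empty((2, 1)), np.empty((3,)), np.empty(())]
print(np.broadcast(*arrays).shape)          # (2, 3)
```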

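The cells below show both outcomes: arrays of the same shape combine element-wise, shapes (2,) and (3,) cannot be broadcast, and shapes (2, 1) and (1, 3) broadcast to (2, 3).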

In [None]:
a = np.arange(2); print("a =", a)
b = np.arange(10,12); print("b =", b)
print("a+b =", a+b) # good: same shape
c = np.arange(3); print("c =", c)
try:
    print(a+c)
except ValueError:
    print("cannot do a+c: a.shape %s != c.shape %s" % (a.shape, c.shape))

In [None]:
a = np.arange(5,7).reshape((2,1))
b = np.arange(10,13).reshape((1,3))
c = np.arange(100,103)
d = np.array(1000)
print(a, b, c, d)
print(a.shape, b.shape, c.shape, d.shape)
v = np.arange(2*3).reshape((2,3))
print(v, v.shape)
print(a*b)
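Broadcasting `a` of shape (2, 1) against `b` of shape (1, 3) pairs every row of `a` with every column of `b`, so `a*b` is the outer product of the two underlying vectors. A check using the same constructions as the cell above:

```python
import numpy as np

a = np.arange(5, 7).reshape((2, 1))
b = np.arange(10, 13).reshape((1, 3))
c = np.arange(100, 103)
d = np.array(1000)
product = a * b                                   # (2, 1) * (1, 3) -> (2, 3)
print(np.array_equal(product, np.outer(a.ravel(), b.ravel())))  # True
# c has shape (3,); rule 1 treats it as (1, 3), so a + c is (2, 3) as well.
print((a + c).shape)                              # (2, 3)
# The 0-d array d broadcasts against any shape.
print((product + d).shape)                        # (2, 3)
```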

# Python tools for numerical analysis

There are two equally important activities for software development.  One is to write code.  We will need to learn some basic concepts to write meaningful code.

The other is to use code written by other people.  Especially in the early stage of development, we want to see results quickly.  We may simply use the output of other software, or directly incorporate the foreign (usually called "third-party") software if the situation allows.  Otherwise, we can replace the quick prototype in a later phase.

In this lecture, I will introduce 4 useful tools for numerical analysis that you may use throughout the course and your future work.

# Jupyter (notebook)

"Project Jupyter exists to develop open-source software, open-standards, and services for interactive computing across dozens of programming languages." -- https://jupyter.org

* This presentation is done by using Jupyter
* Show the code and run it at the same time
* Terminal access
* File management
* JupyterLab integrated environment

Jupyter is a client-server system.  What we interact with is its "frontend", the interactive user interface.  It talks to the "backend", which is called a Jupyter kernel.  See the following image ([source](https://jupyter.readthedocs.io/en/latest/architecture/how_jupyter_ipython_work.html)):

<center><img src="https://jupyter.readthedocs.io/en/latest/_images/notebook_components.png" alt="Jupyter distributed architecture" /></center>

The system is distributed.  The browser and the Jupyter server can run on different computers, connected over HTTP.  The kernel can also be configured to run on a different computer than the server.

Jupyter has 3 types of cells:

1. Code.  The content will be executed.
2. [Markdown](https://jupyter-notebook.readthedocs.io/en/stable/notebook.html#markdown-cells).  Use a mark-up language called "[markdown](https://daringfireball.net/projects/markdown/)" to format text.
3. Raw nbconvert.  Jupyter does not process the content and passes it through unchanged to conversion tools (nbconvert).

Most of the time we only care about the interactive computing capability provided by the code cell.
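On disk, a notebook is a JSON document that records each cell's type. A minimal sketch of the three cell types in the nbformat version 4 layout, built with the standard `json` module (cell contents are made up; real notebooks carry more metadata, and minor version 5 also requires a cell `id`):

```python
import json

notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# A heading"},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": "print('hello')"},
        {"cell_type": "raw", "metadata": {}, "source": "passed through untouched"},
    ],
}
text = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in json.loads(text)["cells"]]
print(cell_types)  # ['markdown', 'code', 'raw']
```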

# Python code

In [None]:
import numpy as np
v1 = np.array([1,1,1], dtype='float64')
v2 = np.array([1,-1,0], dtype='float64')
print("dot product between v1 and v2:", v1.dot(v2))
print("v1 length:", np.sqrt((v1**2).sum()))

In [None]:
# simple math
d = 30.*np.pi/180.
print("trigonometric function at 30 degree:", np.sin(d), np.cos(d), np.tan(d))

# IPython magic

[IPython](https://ipython.readthedocs.io/) provides the Jupyter kernel for enhanced interactive execution.  Magic commands are part of these enhancements.  There are two types of magic commands: line and cell.  A line magic is a line starting with "`%`".

In [None]:
import sys
print(sys.path) # show python import paths
%pwd # print current directory

# Cell magic

A line starting with "`%%`" indicates a magic that takes the whole content of the cell.

In [None]:
%%script bash
whoami
pwd
ls -l

# Other features

* Escape to shell in a line starting with "`!`":

In [None]:
!uptime

* Editor
* Terminal

# Drawing using matplotlib

[Matplotlib](https://matplotlib.org) is a powerful library for 2D plotting.  It can be used standalone or integrated with Jupyter notebook.  The following magic enables the integration.

In [None]:
%matplotlib inline
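With the integration enabled, a figure drawn in a code cell is shown below it. A minimal sketch with the pyplot interface (the plotted functions are arbitrary):

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 101)
fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(x, np.sin(x), label='sin')
ax.plot(x, np.cos(x), label='cos')
ax.set_xlabel('x (radian)')
ax.set_ylabel('value')
ax.legend()
ax.grid(True)
```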

# Linear algebra with numpy and scipy

Use a singular value problem to demonstrate how to use the linear algebra tools provided in numpy and scipy.
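The original demonstration isn't included here; what follows is a minimal sketch of a singular value decomposition with `numpy.linalg.svd` on a random matrix. `scipy.linalg.svd` has a similar interface, but the sketch sticks to numpy so it needs only one package:

```python
import numpy as np

rng = np.random.default_rng(0)
matrix = rng.standard_normal((4, 3))

# Reduced SVD: matrix = U @ diag(s) @ Vh, with s sorted in descending order.
u, s, vh = np.linalg.svd(matrix, full_matrices=False)
print(u.shape, s.shape, vh.shape)                  # (4, 3) (3,) (3, 3)
print(np.allclose(matrix, u @ np.diag(s) @ vh))    # True
# The columns of U are orthonormal.
print(np.allclose(u.T @ u, np.eye(3)))             # True
# The largest singular value equals the matrix 2-norm.
print(np.isclose(s[0], np.linalg.norm(matrix, 2))) # True
```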

# Package managers

To follow this course, you need a runtime environment with the software tools installed.  Although manually building all the dependencies from source is sometimes unavoidable, it's too time-consuming to do every time.

Usually we use a package manager to help.  A package manager provides recipes for building packages from source, as well as pre-built binary packages.  It also records the dependencies between packages.  For example, scipy requires numpy, so a package manager should automatically install numpy when you request scipy.

In the numerical analysis world, [conda](https://conda.io/) is one of the most versatile package managers, and it is the one we will use.
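After installing packages with a package manager, `importlib.metadata` (Python 3.8+) can report which version of a distribution is installed. A sketch; the helper name `installed_version` is made up:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version('numpy'))                     # e.g. a string like '1.26.4'
print(installed_version('surely-not-installed-xyz'))  # None
```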

# conda

* Anaconda
* Conda-forge

# pip

"[pip](https://pip.pypa.io/) is the package installer for Python.  You can use pip to install packages from the [Python Package Index](https://pypi.org/) and other indexes."

# Exercises

1. List all primitive types supported by `numpy.ndarray` on x86-64.
2. Port "`step0.py`" to use bash.
3. Modify the script "`step0.py`" so that it reads the environment variable named "`PYTHON_BIN`" that specifies the location of the Python executable for the script.  Hint: play a trick using bash.

# References

* [Broadcasting arrays in Numpy](https://eli.thegreenplace.net/2015/broadcasting-arrays-in-numpy/) by Eli Bendersky