# A brief introduction to modules in Python

https://docs.python.org/3/tutorial/modules.html

We've already used lots of modules

- math
- pandas
- matplotlib

They enable us to

- reuse code between Python programs and notebooks
- share our code with others
- modularize our software --- important

Modularity is _very_ important --- if we don't divide things up into bite-sized chunks, complexity can overwhelm us.

Also, it is easier to share and reuse small things rather than big complicated things.

Also, in Python, testing is much easier and much more effective if done on small units (hence the name unit testing).

To make a module we just need to

- save the code we want to use in a file named `myhello_1.py` (or whatever you want to call it),
- put the file somewhere Python can find it (for now we will use the current directory)
- import it

Let's start with something simple ... write a function that prints "Hello"


In [None]:
%%writefile myhello_1.py
def hello():
    print("Hello")


hello()
print("XXXXXXXXXXXX")

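Once you are happy your code is working, we need to save it into a file. The `%%writefile myhello_1.py` line at the top of the cell above does exactly that: instead of running the cell, it writes the cell's contents to `myhello_1.py`. To create a file by hand you need a file editor:

- In the Jupyter notebook file browser window, select `Text file` from the `New` drop-down menu.
- Or you can use any other editor you like ... but you **CANNOT** use a word processor. It must be a plain text file.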

Paste in your working function

- change the name of the file to be `myhello.py`
  - it should automatically select Python as the language for the file
  - there is already a module called `hello` so you must pick another name
- change the message slightly so you can be sure you are using your saved version
- save the file

Now try to import it and run the provided hello function


In [None]:
import myhello_1

In [None]:
myhello_1.hello()

Mmmm ... we did not really want it to print all that stuff ... we just wanted the `hello` function to be defined.

On the other hand it is convenient to keep some test code at the bottom of the module so you can quickly test things are working.

When a module is imported, what actually happens is that the code in the file is executed and the resulting names (functions, variables, ...) are collected into the module's namespace, which the import statement makes available under the module's name ... for our module this is `myhello_1`.

The name of the module is stored in the variable `__name__`.

In the notebook what is this value?
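
In a notebook (and in any program run directly with `python file.py`), `__name__` is `"__main__"`; inside an imported module it is the module's own name. A minimal sketch that checks both, assuming a throwaway module with the hypothetical name `show_name` written into a temporary directory:

```python
import importlib
import subprocess
import sys
import tempfile
from pathlib import Path

# Source of a hypothetical throwaway module that reports the __name__ it sees
MODULE_SOURCE = 'print("__name__ =", __name__)\n'


def names_seen(module_name="show_name"):
    """Return (__name__ when imported, __name__ when run as a script)."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = Path(tmpdir) / f"{module_name}.py"
        path.write_text(MODULE_SOURCE)
        importlib.invalidate_caches()  # make sure the new file can be found

        # 1. Import it: the file runs with __name__ set to the module's name
        sys.path.insert(0, tmpdir)
        try:
            module = importlib.import_module(module_name)
        finally:
            sys.path.remove(tmpdir)
            sys.modules.pop(module_name, None)  # forget it so a later call re-imports

        # 2. Run the same file as a program: __name__ is "__main__"
        result = subprocess.run(
            [sys.executable, str(path)], capture_output=True, text=True, check=True
        )
        script_name = result.stdout.split("=", 1)[1].strip()
    return module.__name__, script_name


print("Here, __name__ =", __name__)
print(names_seen())  # expected: ('show_name', '__main__')
```

This difference is exactly what lets a module tell whether it is being imported or run as the main program.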


In [None]:
%%writefile myhello_2.py
def hello():
    print("Hello")


if __name__ == "__main__":
    hello()
    print("XXXXXXXXXXXX")

Change the module to print out the name (edit, save, import the module again)


In [None]:
import myhello_2

In [None]:
myhello_2.hello()

Err??????? Nothing printed ... why?

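Python imports each module only once per session: the first import runs the file and caches the resulting module in `sys.modules`. This is efficient, and it also protects modules that cannot safely be imported twice.

A second import statement just returns the cached module without re-running the file.

The simplest way to reimport a module is to restart the kernel. Do that and try again.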
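
If you would rather not restart the kernel, the standard library function `importlib.reload` re-executes a module's file in place. A sketch, assuming a throwaway module with the hypothetical name `reload_demo_mod` in a temporary directory, showing that a second import is served from the cache while `reload` picks up an edit:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def reload_demo(name="reload_demo_mod"):
    """Import a throwaway module twice, edit it, then reload it.

    Returns (was the second import the cached object?, VERSION after reload).
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        path = Path(tmpdir) / f"{name}.py"
        path.write_text(f"print('running {name}')\nVERSION = 1\n")
        importlib.invalidate_caches()
        sys.path.insert(0, tmpdir)
        try:
            first = importlib.import_module(name)   # runs the file: prints once
            second = importlib.import_module(name)  # cached in sys.modules: prints nothing
            # Edit the file. The new text is longer, so Python treats any bytecode
            # cached in __pycache__ as stale and recompiles it.
            path.write_text(f"print('running {name}')\nVERSION = 2  # edited\n")
            importlib.reload(first)                 # re-runs the edited file in place
            return second is first, first.VERSION
        finally:
            sys.path.remove(tmpdir)
            sys.modules.pop(name, None)


print(reload_demo())  # expected: (True, 2)
```

`reload` updates the existing module object, so names you imported with `from module import something` still point at the old objects; restarting the kernel avoids that subtlety.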


In [None]:
import myhello_1

In [None]:
myhello_1.hello()

So can you figure out how to make the fix?

If the module is being run as the main program, we want to run the print code (and any other tests for the module). But if it is being imported, we just want to define the function(s) and anything else the module wants to share.

```python
if __name__ == "__main__":
    print("testing ...")
    dotests()  # placeholder: call your own test function(s) here
    # ... any other test code
```

Edit the file, save, restart kernel, reimport, try it out


We've already seen that you can look at what's inside a module using `dir` --- look inside `myhello_1`


In [None]:
import sys

dir(sys)

In [None]:
dir(myhello_1)

Mmmm ... there's a `__doc__` entry for the docstring (currently `None`) ... let's define one by putting a string at the very top of the file

Edit the file, save, restart kernel, reimport, try it out ... got the idea yet?

Use `help` or `?` to look at your docstring in action
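
The docstring is simply the first string literal in the file; Python stores it in the module's `__doc__` attribute, and `help` displays it. A sketch that writes a hypothetical module `docdemo` containing a module docstring and a function docstring, loads it directly from its file path with `importlib.util`, and reads both back:

```python
import importlib.util
import tempfile
from pathlib import Path

# Hypothetical module: the string on the first line becomes the module docstring
SOURCE = '''"""A tiny module that says hello."""


def hello():
    """Print a friendly greeting."""
    print("Hello")
'''


def load_docstrings():
    """Write SOURCE to a file, load it, and return (module doc, function doc)."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = Path(tmpdir) / "docdemo.py"
        path.write_text(SOURCE)
        # Load the file by its path instead of searching sys.path
        spec = importlib.util.spec_from_file_location("docdemo", str(path))
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
    return module.__doc__, module.hello.__doc__


module_doc, hello_doc = load_docstrings()
print(module_doc)  # A tiny module that says hello.
print(hello_doc)   # Print a friendly greeting.
```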


In [None]:
import myhello_1

In [None]:
help(myhello_1)

Now let's try running the module as a standalone Python program ... for that we need to open a terminal (also called a console or command window).

- In the Jupyter notebook file browser window, select `Terminal` from the `New` drop-down menu

Alternatively

- On Windows --- from the Anaconda installation folder open `Anaconda Prompt`
  - You are now "talking" to Windows via the command line
  - Use `chdir` (or `cd`) to change directory (aka folder) to where your notebook (and `myhello.py`) is stored
- On Mac or Linux --- start a terminal and change directory (using `cd`) as necessary

Now run your standalone Python script with `python myhello.py`

You should get output similar to

```
Hello
XXXXXXXXXXXX
__name__ = __main__
```


Congratulations! You just escaped the Jupyter notebook.

While the Jupyter notebook is very powerful for interactive data exploration, graphics, development, and learning, it is not suitable for most "production" computing tasks.

- It is not readily automated (need to open in a browser and use your mouse)
- It is hard to make it reproducible (it is too easy to hop around the notebook and forget which cells were executed in which order)
- It is hard to collaborate with others
- It is not easy to run on remote machines (for which you usually only have command-line access)

Standalone Python programs (scripts) stored as files and organized into modules (and packages) are much more powerful.

- Easily automated --- `python script.py` will run it
- Fully reproducible --- same result every time unless you edit the program or change its input files
- Easily shared and collaborated on
- The downside is that graphics and interactive programs become harder

Pick the right tool for the job.


### Introduction to command line

- Linux
  - https://ubuntu.com/tutorials/command-line-for-beginners#1-overview
  - https://www.linuxjournal.com/content/linux-command-line-interface-introduction-guide
  - https://www.hostinger.com/tutorials/linux-commands
- Macintosh
  - Very similar to Linux
  - https://macpaw.com/how-to/use-terminal-on-mac
  - https://www.makeuseof.com/tag/mac-terminal-commands-cheat-sheet/
- Windows
  - Less common to use the command line on Windows but still useful.
  - You can also install the [Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/about), which I recommend if you are serious about scientific computing
  - https://www.makeuseof.com/tag/a-beginners-guide-to-the-windows-command-line/
  - Reference --- https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/windows-commands
  - Terse --- https://www.cs.princeton.edu/courses/archive/spr05/cos126/cmd-prompt.html
- Some of everything
  - https://tutorial.djangogirls.org/en/intro_to_command_line/
  - https://developer.mozilla.org/en-US/docs/Learn/Tools_and_testing/Understanding_client-side_tools/Command_line


### Command line arguments from Python

So far we can only run a Python script that does not take any arguments from the user --- which is not very useful.

We need to access the arguments provided on the command line.

These are available through the `sys` module as the list `sys.argv` (short for argument vector).

Paste the following code into a file called `testargs.py` and try running it in a terminal window using different numbers of arguments (e.g., `python testargs.py a b c d`).


In [None]:
import sys

print("sys.argv =", sys.argv)
print()
for index, arg in enumerate(sys.argv):
    print("%2d  %s" % (index, arg))
print()
print("sys.argv[1:] =", sys.argv[1:])

Got it figured out?

The first element, `sys.argv[0]`, is always the name of the Python script. The remaining elements are the arguments provided by the user --- so we want to use `sys.argv[1:]`.

Make a file containing a Python program that "bakes" a cake (using the function below) from a list of ingredients provided on the command line. E.g.,

`python bakecake.py flour cheese cherries`


In [None]:
def bakecake(ingredients):
    print("Here's a delicious cake that contains:", *ingredients)

In [None]:
bakecake(["flour", "cheese", "cherries"])
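
One possible `bakecake.py`, sketched under the assumption that the `bakecake` function stays unchanged. The `main(argv)` wrapper is a hypothetical addition: it takes the argument list as a parameter, so it can also be tried from the notebook, and the `if __name__ == "__main__":` guard keeps the file importable without baking anything.

```python
#!/usr/bin/env python3
"""bakecake.py: bake a cake from ingredients listed on the command line."""
import sys


def bakecake(ingredients):
    print("Here's a delicious cake that contains:", *ingredients)


def main(argv):
    """Bake a cake from an argv-style list and return the ingredients used."""
    ingredients = argv[1:]  # argv[0] is the script name, so skip it
    bakecake(ingredients)
    return ingredients


if __name__ == "__main__":
    main(sys.argv)
```

From a terminal, `python bakecake.py flour cheese cherries` should print `Here's a delicious cake that contains: flour cheese cherries`; in the notebook, `main(["bakecake.py", "flour", "cheese", "cherries"])` does the same without a terminal.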

### Making a new command

Having to type `python script.py [argument list]` is a bit clunky --- it would be nice to be able to just use the file name like any other command.

**Linux and Macintosh** make this straightforward

1. Name your file as you want your command to be named (e.g., `bakecake` instead of `bakecake.py`)
2. Insert the [shebang](https://scriptingosx.com/2017/10/on-the-shebang/) at the _very_ top of your file

   `#!/usr/bin/env python`

3. Mark your program as executable

   `chmod 755 bakecake`

4. Run your program

   `./bakecake flour cheese cherries`

You have to use the `./` (meaning in the current directory) since for security reasons the system does not look for commands in the current directory.

It is common to put frequently used scripts in a directory and put that directory in the list of directories used by the system to find commands --- this is the `PATH` environment variable. You can look at it as follows:

`echo $PATH`

You will need the full path of the directory containing your command --- change into that directory and use the `pwd` (print working directory) command to get the full directory name.

`export PATH="DIRECTORYNAME:$PATH"`

replacing `DIRECTORYNAME` with the full path of your directory. If you mess this up you may have to start a new terminal and try again.

The command `which bakecake` will tell you whether the system can find your new command.

If you want this to work in every new terminal (or shell) you need to put the `export` command into your `.bashrc` file in your home directory.
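
The lookup that `which` performs can be sketched in Python: split `PATH` on the platform separator (`:` on Linux and macOS) and check each directory for an executable file with the command's name. The helper `find_on_path` below is hypothetical and only meant to illustrate the Linux/macOS rules; the standard library's `shutil.which` does the real job (including the extra Windows `PATHEXT` handling, which this sketch ignores).

```python
import os
import shutil


def find_on_path(command, path=None):
    """Return each directory on `path` (default: $PATH) holding an executable `command`.

    Hypothetical helper that mimics `which -a` on Linux/macOS; it ignores the
    Windows PATHEXT rules that shutil.which handles.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):  # ':' on Linux/macOS, ';' on Windows
        candidate = os.path.join(directory, command)
        # Must be a file AND have an execute bit set (chmod 755 takes care of that)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(directory)
    return hits


print(find_on_path("bakecake"))  # directories holding bakecake; empty list if none
print(shutil.which("bakecake"))  # full path of the first match, or None
```

Because the directories are searched in order, putting your directory at the front of `PATH` (as in the `export` command above) means your command wins over any other command with the same name.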

**Windows** is harder. You need to install an extra Python package, PyInstaller, which bundles your script together with a Python interpreter into a standalone executable. Just once in each Anaconda environment (i.e., *not* in every terminal) you need to install `pyinstaller` using the following command (in a command-line terminal started within your Anaconda environment).

`conda install pyinstaller`

Then to make your script (`bakecake.py`) into an executable, do the following

`pyinstaller --onefile bakecake.py`

This will create two new directories: `build` (which you can delete) and `dist`, which contains your command in a file named `bakecake.exe`. If you change into `dist`, you can run the command using just

`bakecake flour cheese cherries`

Setting environment variables is more complicated in Windows and more fraught with things that can go wrong, so I won't go over it while class is remote.


## Importing Python modules from other directories

Above we imported the custom `myhello` module into the notebook --- but this only worked because the module file was in the same directory (or folder). If it is in a different directory, you need to tell Python where to look.

Python maintains a list of directories where it looks for modules to import


In [None]:
print(sys.path)

It's just a list, so you can append the full path of the directory containing your module with an ordinary list operation:


In [None]:
sys.path.append("/home/username/mymodules")  # Use your directory name

In [None]:
import myhello_2

myhello_2.hello()

After doing that you should be able to import Python modules from that directory no matter which directory your Python process is running in.
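
Here is a sketch of the whole round trip, assuming a fresh temporary directory in place of `/home/username/mymodules` and a hypothetical module named `faraway`: write a module into that directory, add the directory to `sys.path`, then import it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# A fresh temporary directory stands in for /home/username/mymodules
# (it is left on disk for the rest of the session)
moddir = tempfile.mkdtemp()
(Path(moddir) / "faraway.py").write_text(
    'def hello():\n    print("Hello from another directory")\n'
)

if moddir not in sys.path:  # avoid adding the same directory twice
    sys.path.append(moddir)
importlib.invalidate_caches()  # make sure the import system notices the new file

import faraway  # found by searching the sys.path directories in order

faraway.hello()  # prints: Hello from another directory
print(faraway.__file__)  # the module's file inside moddir
```

Since `append` puts the directory at the end of the search list, a module with the same name found earlier in `sys.path` would take precedence.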

**Linux and Macintosh** have an environment variable called `PYTHONPATH` that contains a list of directories used to initialize the `sys.path` list. If you add your directory to that, you don't need to modify your script.

`export PYTHONPATH=/home/username/mymodules:$PYTHONPATH`

[Change `/home/username/mymodules` to be the path (directory) that you want to use.]

And to remember this for future terminals you can again put this into your `.bashrc` or `.bash_profile` file.
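
One way to check that `PYTHONPATH` is picked up is to start a fresh Python process with the variable set and ask it for its `sys.path`. A sketch using `subprocess`, where `/home/username/mymodules` is the same placeholder as above and the helper name `child_sys_path` is hypothetical:

```python
import json
import os
import subprocess
import sys


def child_sys_path(extra_dir):
    """Start a new Python process with extra_dir added to PYTHONPATH; return its sys.path."""
    env = dict(os.environ)
    old = env.get("PYTHONPATH")
    # Same idea as: export PYTHONPATH=extra_dir:$PYTHONPATH
    env["PYTHONPATH"] = extra_dir + os.pathsep + old if old else extra_dir
    result = subprocess.run(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


print(child_sys_path("/home/username/mymodules"))
```

The variable only affects processes started after it is set, which is why a new terminal (or a restarted kernel) is needed to see the change.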

**Windows** also has a `PYTHONPATH` environment variable, but how you set it depends on which version of Windows you are running --- look online for instructions if you want to do this.

- Be careful --- don't set environment variables you don't understand
- Windows 10 --- https://docs.oracle.com/en/database/oracle/machine-learning/oml4r/1.5.1/oread/creating-and-modifying-environment-variables-on-windows.html
- Windows 11 --- https://www.computerhope.com/issues/ch000549.htm


## A command to make PDF plots from multiple files

Write a command that, given one or more files containing tables of data, generates for each file a PDF file containing a plot of the data. I.e., a command like

`python plotfiles.py file1.txt file2.txt file3.txt`

makes 3 files (`file1.txt.pdf`, `file2.txt.pdf` and `file3.txt.pdf`) containing the plots. In each file we assume that the first column contains the `x` (independent variable) values and that the subsequent columns (1 or more) contain the `y` (dependent variable) values.

We will use the Jupyter notebook to write and test all the bits and pieces --- the final step will be to assemble the tested components into the command script.

First, let's make some dummy data in files `file1.txt`, `file2.txt` and `file3.txt`. Each file could have a different number of columns (here they all happen to have three).


In [None]:
import numpy as np

xlo = 0  # so that each plot is different
for filename in ["file1.txt", "file2.txt", "file3.txt"]:
    x = np.linspace(xlo, xlo + 10, 20)
    y1 = x
    y2 = x * x
    with open(filename, "w") as file:
        for data in zip(x, y1, y2):
            file.write("%.4f %.4f %.4f\n" % data)
    xlo += 2
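
NumPy can also write the table for you: `np.savetxt` takes a 2-D array and a format string. A sketch that builds the same three files using `np.column_stack` in place of the `zip` loop (a swapped-in technique, not the one used above):

```python
import numpy as np

xlo = 0  # shift the x range so that each plot is different
for filename in ["file1.txt", "file2.txt", "file3.txt"]:
    x = np.linspace(xlo, xlo + 10, 20)
    table = np.column_stack([x, x, x * x])  # columns: x, y1 = x, y2 = x**2
    np.savetxt(filename, table, fmt="%.4f")  # one space-separated row per line
    xlo += 2
```

Adding another `y` column is then just one more entry in the `column_stack` list.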

Now let's make and test a function that plots the data in a single file


In [None]:
%matplotlib inline
import matplotlib.pyplot as plt


def plotfile(filename):
    colors = ["k", "r", "b", "g", "y"]
    data = np.loadtxt(filename)
    nrow, ncol = data.shape
    for ycol in range(1, ncol):
        plt.plot(data[:, 0], data[:, ycol], colors[ycol - 1])


plotfile("file3.txt")

Now that it is working, modify it to save a PDF


In [None]:
def plotfile(filename):
    colors = ["k", "r", "b", "g", "y"]
    data = np.loadtxt(filename)
    nrow, ncol = data.shape
    figure = plt.figure()  # must be BEFORE the plotting commands
    for ycol in range(1, ncol):
        plt.plot(data[:, 0], data[:, ycol], colors[ycol - 1])
    figure.savefig(filename + ".pdf", bbox_inches="tight")


plotfile("file3.txt")

OK ... that's the hard bit done. Now assemble the script by pasting the above tested fragments of code and inserting a loop over command line arguments.

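The code below will NOT execute correctly in the notebook (there `sys.argv` holds the kernel's own start-up arguments, not your file names). The `%%writefile` line saves it as a standalone Python script, which you then run with the command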

`python plotfiles.py file1.txt file2.txt file3.txt`


In [None]:
%%writefile plotfiles.py
#!/usr/bin/env python3

import sys
import numpy as np
import matplotlib.pyplot as plt


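def plotfile(filename):
    """Plot columns 1.. of filename against column 0 and save filename.pdf."""
    colors = ["k", "r", "b", "g", "y"]
    data = np.loadtxt(filename, ndmin=2)  # ndmin=2 keeps a one-row file 2-D
    nrow, ncol = data.shape
    figure = plt.figure()
    for ycol in range(1, ncol):
        # cycle through the colors if there are more than 5 y columns
        plt.plot(data[:, 0], data[:, ycol], colors[(ycol - 1) % len(colors)])
    figure.savefig(filename + ".pdf", bbox_inches="tight")
    plt.close(figure)  # free memory; the script may process many files


# Only plot when run as a script, not when imported as a module
if __name__ == "__main__":
    for filename in sys.argv[1:]:
        plotfile(filename)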
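
Once the script runs unattended (for example on a remote machine with no screen), two things help: closing each figure after saving it, and never needing a window. A sketch of an alternative plotting function, given the hypothetical name `plot_to_pdf`, that uses Matplotlib's object-oriented `fig, ax = plt.subplots()` interface together with the non-interactive `Agg` backend. The `matplotlib.use("Agg")` line belongs at the top of the script; don't run it in this notebook, where it would switch off inline plots.

```python
import matplotlib

matplotlib.use("Agg")  # file-only backend: never opens a window
import matplotlib.pyplot as plt
import numpy as np


def plot_to_pdf(filename, colors=("k", "r", "b", "g", "y")):
    """Plot columns 1.. of `filename` against column 0; save and return `filename`.pdf."""
    data = np.loadtxt(filename, ndmin=2)
    fig, ax = plt.subplots()  # explicit Figure and Axes instead of pyplot's "current" ones
    for ycol in range(1, data.shape[1]):
        ax.plot(data[:, 0], data[:, ycol], colors[(ycol - 1) % len(colors)])
    pdfname = filename + ".pdf"
    fig.savefig(pdfname, bbox_inches="tight")
    plt.close(fig)  # free the figure before moving on to the next file
    return pdfname
```

In the script, the final loop would then call `plot_to_pdf(filename)` for each name in `sys.argv[1:]`.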