# Python standard libraries and scripts

More details: [The Python standard library](https://docs.python.org/3/library/index.html).

## Key points of the previous lectures

Let's study a class that counts the words in a given text.  
Note that the class provides more functionality than just counting words.

In [None]:
from collections import defaultdict          # a dictionary with default values

class WordCounter:
    def __init__(self):
        """Initialize the word counter (all words get the count of 0)."""
        self._word2count = defaultdict(int)  # this attribute will keep the words and their counts
    
    def countText(self, text):               # a method of the class
        """Split the text into words and count them."""
        for word in text.split():
            self._word2count[word] += 1
    
    def getWordCount(self, word):            # another method of the class
        """Return the count of the given word."""
        return self._word2count.get(word, 0) # .get avoids inserting unknown words (with count 0) into the dictionary
    
    def getWordCountDict(self):              # self denotes an instance of the class
        """Return the dictionary with the words and their counts."""
        return self._word2count
    
    def getTotalWordCount(self):
        """Return the total count of all words."""
        return sum(self._word2count.values())

# ----- using the word counter, example -----
names = """
    Jennifer James William John Linda David Elizabeth Mary Mary
    William David James Michael James Robert Robert Patricia Jennifer Linda
    John Jennifer Patricia Robert Robert John Jennifer Michael Elizabeth
    James Linda Jennifer Mary Patricia James Mary David Elizabeth Mary
    Linda David James John Elizabeth Mary Linda Linda Robert Linda John
    Linda Michael William Elizabeth David Elizabeth Jennifer Michael
    Linda Mary Jennifer William Michael Patricia Mary Patricia William
    Jennifer Mary Elizabeth William Linda Mary David Linda David John
    Michael Robert Linda John Patricia Patricia Mary Robert John Linda
"""

wc = WordCounter()                    # create an instance of the class
wc.countText(names)                   # call the class method in the instance
print( f'James has been counted {wc.getWordCount("James")} times.' )
print( f'The total number of counted words is {wc.getTotalWordCount()}.' )
for word, count in wc.getWordCountDict().items():
    print( f'{word}: {count}' )

# ----- let's count some more names in the previous counter -----
print( '---' )
wc.countText("James James James James")
print( f'James has been counted {wc.getWordCount("James")} times.' )
print( f'The total number of counted words is {wc.getTotalWordCount()}.' )
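The standard library also provides `collections.Counter`, a dictionary subclass specialized in counting. Here is a minimal sketch of the same kind of counting with it. The short `text` is a made-up example, not the `names` above:

```python
from collections import Counter

text = "James Mary James Linda James Mary"
counts = Counter(text.split())               # one call counts all words
print( counts["James"] )                      # 3
print( counts["Zoe"] )                        # 0, and "Zoe" is NOT added to counts
print( sum(counts.values()) )                 # 6, the total number of words
print( counts.most_common(2) )                # [('James', 3), ('Mary', 2)]
```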


## Exceptions

### Observe several exceptions

There are [many exceptions](https://docs.python.org/3/library/exceptions.html#exception-hierarchy) defined in Python.  
Here are a few examples of code raising exceptions:

In [None]:
# 1/0                   # ZeroDivisionError

In [None]:
# dct = { "a":1 }
# dct["b"]              # KeyError

In [None]:
# print( wrongVarName ) # NameError

In [None]:
# int( "a" )            # ValueError

In [None]:
# personAge = 199
# if personAge >= 120:
#    raise ValueError( f"Value of personAge must be below 120, not {personAge}." )
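This sketch triggers the four commented-out errors above inside `try`/`except` and prints the name of each raised exception type. The dictionary `examples` of small `lambda` functions is only an illustration device:

```python
examples = {                                   # code text -> a function running that code
    "1/0":          lambda: 1/0,
    'dct["b"]':     lambda: {"a": 1}["b"],
    "wrongVarName": lambda: wrongVarName,      # this name is never defined
    'int("a")':     lambda: int("a"),
}

raised = {}
for code, action in examples.items():
    try:
        action()
    except Exception as err:                   # catch any error, to inspect its type
        raised[code] = type(err).__name__
        print( f"{code:14} -> {type(err).__name__}: {err}" )
```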

### Flow control of exceptions

Using `try`/`except`/`finally` blocks, it is possible to write code that reacts to exceptions.  
The `try` statement declares a block of code in which exceptions are monitored.  
The `except` clauses (there may be several) specify how to react to the given types of exceptions.  
The `finally` clause provides a block of code that is always executed (even if an exception was raised in the `try` block).

Let's define a function `fun` which may raise an exception depending on the value of the `div` argument:

In [None]:
def fun(div):
    print( f"fun({div}) start" )

    try:
        print( f"fun({div}) try start" )
        1/div
        print( f"fun({div}) try done" )
    except ZeroDivisionError:            # runs only if a ZeroDivisionError was raised in the try block
        print( f"fun({div}) except ZeroDivisionError" )
    finally:
        print( f"fun({div}) finally" )

    print(f"fun({div}) end")

To understand `try`/`except`/`finally`, study the differences between the outputs of the following code cells:

In [None]:
print("main start")
fun(1)
print("main end")

In [None]:
print("main start")
fun(0)
print("main end")

In [None]:
print("main start")
fun("A")
print("main end")
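With `fun("A")`, the division raises a `TypeError`, which `fun` does not handle. The `finally` block still runs, but the exception then stops the cell before `"main end"` is printed. Here is a sketch with a second `except` clause that handles this case too. `safeInverse` is a hypothetical name, not part of the lecture code:

```python
def safeInverse(div):
    """Return 1/div, or None when div is zero or not a number."""
    try:
        result = 1/div
    except ZeroDivisionError:                  # div == 0
        print( f"safeInverse({div!r}) except ZeroDivisionError" )
        result = None
    except TypeError:                          # div is not a number, e.g. "A"
        print( f"safeInverse({div!r}) except TypeError" )
        result = None
    finally:
        print( f"safeInverse({div!r}) finally" )
    return result

print( [ safeInverse(2), safeInverse(0), safeInverse("A") ] )   # [0.5, None, None]
```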

Another example keeps asking the user until a positive number is entered.  
This code does not crash when the user enters some text instead of a number.

In [None]:
while True:
    try:
        val = float( input( "Please enter a positive number" ) )
        if val > 0:
            break
        print( "Not a positive number. Try again..." )
    except ValueError:
        print( "Not a number. Try again..." )

print( f"You entered {val}. Thanks." )
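Because `input()` waits for a person to type, the loop above is hard to test automatically. The sketch below puts the same logic in a function that takes the answers as a list instead. `askPositive` and its `answers` parameter are hypothetical:

```python
def askPositive(answers):
    """Return the first answer that is a positive number (answers replace input())."""
    for text in answers:
        try:
            val = float(text)
        except ValueError:
            print( f"{text!r}: not a number. Try again..." )
            continue
        if val > 0:
            return val
        print( f"{text!r}: not a positive number. Try again..." )
    raise ValueError("No positive number among the answers.")

print( askPositive(["abc", "-3", "2.5"]) )    # 2.5, after two complaints
```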


## Files

A *file* can be thought of as a (large) vector of bytes.  
Due to technical limitations of the *(physical) devices* that store files, access to their content can be slow
(an example of an old file-storage device: the [tape drive](https://en.wikipedia.org/wiki/Tape_drive)).  
Usually, the most efficient way to access the full content of a file is to read or write it sequentially, without seeking (jumping) to other positions.  

A *file system* is a standard that defines how files stored on a device are organized and accessed.  
Typically, files are identified by (short) file names, which are further organized into a hierarchical *directory* structure.  
Given a *full path* (i.e., a file name together with the directories containing it), the file system can locate and open the file.

How the content of a file is interpreted depends on the program that uses it:
- A *text file* is intended to be processed line-by-line and usually can be opened in simple text editors  
    (some characters, e.g. line endings, may be converted to make the file transferable between operating systems and character encodings):
    - A *simple text file* (a popular ending of the file name, called the *extension*: `.txt`).
    - A *markdown file* (extension: `.md`).
    - A *Python script file* (extension: `.py`).
    - A *[JSON](https://www.json.org/json-en.html)* file (extension: `.json`): good for storing nested collections.
    - A *Python notebook file* (extension: `.ipynb`): JSON data describing the cells of a Jupyter notebook.
- A *binary file* is usually processed in blocks of unmodified file bytes:
    - A *gzip* file (extension: `.gz`): a single compressed file.
    - A *zip* file (extension: `.zip`): an archive of one or more compressed files.

Example:

```text
.                                                # The top-level (root) directory
├── 01_python
│   ├── git_github_intro.md                      # A text file, filename: "git_github_intro.md", extension: "md"
│   ├── git_simple_cmds.jpg                      # Binary file, an image in "jpg" format
│   ├── memory_organization.md
│   ├── memory_pointers.jpg
│   ├── memory_units.jpg
│   ├── python_basic.ipynb
│   └── python_lists_tuples.ipynb
├── 02_python
│   ├── git_practice.md
│   ├── python_sets_dicts.ipynb
│   └── set_operations.png                       # Binary file, an image in "png" format
├── 03_python                                    # A directory
│   └── python_flow_control.ipynb                # A file named "python_flow_control.ipynb" in directory "03_python"
├── 04_python
│   ├── food_classes.png                         # File extension: .png
│   ├── git_assignment.md
│   ├── python_oop.ipynb                         # Absolute path from the root: /04_python/python_oop.ipynb
│   └── two_tables.png                           # Relative path from the current directory: ../04_python/two_tables.png
├── 05_python                                    # THIS IS THE CURRENT DIRECTORY (see below)
│   └── python_rest.ipynb                        # THIS IS THIS FILE 
├── LICENSE
└── README.md                                    # A markdown file, describes what is in this directory
```
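The path notions in the tree can be explored with the standard `pathlib` and `posixpath` modules. This is a sketch that assumes `/` stands for the top-level directory of the tree above, not the real root of your disk. `PurePosixPath` only manipulates path strings and never touches the disk:

```python
import posixpath
from pathlib import PurePosixPath

p = PurePosixPath("/04_python/python_oop.ipynb")   # absolute path from the top of the tree
print( p.name )      # python_oop.ipynb
print( p.suffix )    # .ipynb  (the extension)
print( p.stem )      # python_oop
print( p.parent )    # /04_python

# Resolve the relative path "../04_python/two_tables.png" from the current directory /05_python
full = posixpath.normpath(posixpath.join("/05_python", "../04_python/two_tables.png"))
print( full )        # /04_python/two_tables.png
```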

### Writing text files

Let's start with writing some texts from the list `ls` to the console:

In [None]:
ls = [ "oat flakes", "sugar", "half milk", "water", "banana", "blueberries" ]

for l in ls:
    print( l )

Several small adjustments of the code redirect the `print` output to a file.  
The name of the file is provided in the `fileName` variable.  
(*Note:* this code will be improved in the next cells).

In [None]:
ls = [ "oat flakes", "sugar", "half milk", "water", "banana", "blueberries" ]

fileName = "testfile.txt"                    # This is the path where the file will be written.
f = open(file=fileName, mode="w")            # 1. Creates (or re-creates) an empty file with the provided fileName (path);
                                             #    mode="w" requests "write" access.
                                             # 2. Provides an object (f) allowing access to the file.
                                             # 3. Reserves memory buffers for quick access.
for l in ls:
    print( l, file=f )                       # This (usually) writes to the buffers, not directly to the file.

f.close()                                    # Forces the buffers to be written; closes access to the file.

The `with` statement wraps its code block in a *context manager*.  
The context manager guarantees that the file `f` is always closed, even if the code block raises an exception:

In [None]:
ls = [ "oat flakes", "sugar", "half milk", "water", "banana", "blueberries" ]

fileName = "testfile.txt"                    # This is the path where the file will be written.
with open(file=fileName, mode="w") as f:     # f is the result of open(...) but now with auto-close after the block
    for l in ls:
        print( l, file=f )

In some situations you may prefer to use the `write` method of the file object.  
Unlike `print`, `write` does not add line endings, so you have to add them yourself:

In [None]:
ls = [ "oat flakes", "sugar", "half milk", "water", "banana", "blueberries" ]

fileName = "testfile.txt"
with open(file=fileName, mode="w") as f:
    for l in ls:
        f.write( l )                         # Let's use the write method of f, instead of the print command
        f.write( "\n" )                      # The write method does not add new lines
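To check that both approaches produce the same file, this sketch writes the list once with `print` and once with a single `write` call, then compares the contents. It uses a temporary directory from the standard `tempfile` module, so no `testfile.txt` is overwritten:

```python
import os
import tempfile

ls = [ "oat flakes", "sugar", "half milk", "water", "banana", "blueberries" ]

with tempfile.TemporaryDirectory() as tmpDir:            # a throw-away directory, removed afterwards
    printPath = os.path.join(tmpDir, "print.txt")
    writePath = os.path.join(tmpDir, "write.txt")

    with open(file=printPath, mode="w") as f:
        for l in ls:
            print( l, file=f )                           # print adds "\n" itself

    with open(file=writePath, mode="w") as f:
        f.write( "\n".join(ls) + "\n" )                  # one write call; line ends added explicitly

    with open(printPath) as f1, open(writePath) as f2:
        sameContent = f1.read() == f2.read()

print( sameContent )                                     # True
```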

### Reading text files

Use `mode="r"` in `open(...)` to get read access to a file.  
The `readlines()` method reads all lines of a file into a list of strings (`str`).

In [None]:
fileName = "testfile.txt"                    # Note: write this file first with one of the writing examples
with open(file=fileName, mode="r") as f:     # Note: now mode="r" for reading access!!!
    lines = f.readlines()                    # Texts from all lines of the file will be read and returned as a list

lines                                        # Note: each text has end-of-line characters at the end
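If the end-of-line characters are not wanted, they can be stripped while reading. This sketch compares `readlines()` with `rstrip("\n")` applied to each line. The two-line test file is made up and written to a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpDir:             # removed automatically afterwards
    path = os.path.join(tmpDir, "testfile.txt")
    with open(file=path, mode="w") as f:
        f.write( "oat flakes\nsugar\n" )

    with open(file=path, mode="r") as f:
        raw = f.readlines()                               # keeps the "\n" at each line end

    with open(file=path, mode="r") as f:
        clean = [l.rstrip("\n") for l in f]               # removes only the trailing "\n"

print( raw )     # ['oat flakes\n', 'sugar\n']
print( clean )   # ['oat flakes', 'sugar']
```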

It is possible to process a file line-by-line without reading all of it into memory.  
Here, the lines are iterated over in a list comprehension:

In [None]:
fileName = "testfile.txt"                    # Note: write this file first with one of the writing examples
with open(file=fileName, mode="r") as f:     
    lineLens = [len(l) for l in f]

lineLens

And here, single lines are iterated over in a `for` loop:

In [None]:
lineLens = []

fileName = "testfile.txt"                    # Note: write this file first with one of the writing examples
with open(file=fileName, mode="r") as f:
    for l in f:
        lineLens.append( len(l) )

lineLens

### Writing/reading JSON files

Lists and dictionaries (simple or nested) of strings and numbers can easily be written in the [JSON format](https://en.wikipedia.org/wiki/JSON):

In [None]:
import json

assignmentsInfo = {
    "firstName": "John",
    "lastName": "Smith",
    "studentId": "s12345678",
    "sshGitHub": "git@github.com:LUMC/EfDS.git"
}

fileName = "assignments.json"
with open( fileName, "w" ) as f:
    json.dump( obj = assignmentsInfo, fp = f )

Reading `.json` files is just as straightforward:

In [None]:
import json

fileName = "assignments.json"
with open( fileName, "r" ) as f:
    ai = json.load( fp = f )
ai
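JSON has fewer types than Python, so not everything comes back unchanged. This sketch uses `json.dumps`/`json.loads`, which work with strings instead of files. Dictionary keys become strings, tuples become lists, and `True`/`None` are written as `true`/`null`:

```python
import json

data = { 1: (2, 3), "ok": True, "missing": None }
text = json.dumps(data)              # serialize to a str instead of a file
print( text )                        # {"1": [2, 3], "ok": true, "missing": null}
back = json.loads(text)              # parse the str back into Python objects
print( back )                        # {'1': [2, 3], 'ok': True, 'missing': None}
print( back == data )                # False: key 1 became "1", the tuple became a list
```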

## Scripts (not Python notebooks) and the command line

### Python script vs. Python notebook

Python scripts are simple text files (usually with extension `.py`) containing only Python statements.  
Simple pure-text editors may be used to write Python scripts, although dedicated code editors are recommended.

Python Jupyter notebooks are JSON documents describing cells that contain Python code, free user text written in Markdown, calculation output, etc.

### Start a Python script in the console/terminal/cmd

Operating systems usually provide a shell program that lets you type commands for the system.  
Here are some options:
- `cmd` on Windows, `Cygwin`
- `Terminal.app` on Mac
- `xterm` on Linux
- (if you use Visual Studio Code): `Terminal` in the `View` menu

The following command typed in the system console should start Python interactive mode:
```text
> python                                                                  # system console
Python 3.8.5 (default, Sep  4 2020, 02:22:02) 
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> sum([1,2,3])                                                          # python console
6
>>> exit()                                                                # close python console
>                                                                         # back to system console
```

Assume that in the current directory you have a Python script `my_script.py` with the following content:
```python
print( "Hello, world!" )
```

Then, in the system console you may start your script as follows:
```text
> python my_script.py
Hello, world!
>
```

### A Python script with command line arguments

It is possible to pass text arguments from the system console to the Python script.  
Assume that `my_script.py` has the following code:

```python
import sys

if __name__ == "__main__":                  # a suggested way to define the main part of a script
    print( "Hello, world!" )
    for a in sys.argv:                      # sys.argv contains the script name
        print(a)                            # and additional arguments
```

Here is how you can pass some arguments:
```text
> python my_script.py A BB CCC 1234          
Hello, world!                               
my_script.py                                # the script name (my_script.py)
A                                           #      and 4 arguments are passed to Python
BB                                          #      in the sys.argv variable
CCC                                         #      as strings
1234
>
```
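All elements of `sys.argv` are strings, so numeric arguments must be converted, and the conversion may fail. Here is a sketch of a hypothetical script (say `sum_args.py`) that sums its numeric arguments and reports invalid ones:

```python
import sys

def sumArgs(argv):
    """Convert the arguments after the script name to numbers and return their sum."""
    total = 0.0
    for a in argv[1:]:                      # argv[0] is the script name, skip it
        try:
            total += float(a)               # every argument arrives as a str
        except ValueError:
            raise ValueError(f"Argument {a!r} is not a number.") from None
    return total

if __name__ == "__main__":                  # runs only when started as a script
    try:
        print( sumArgs(sys.argv) )
    except ValueError as err:
        print( err )
```

For example, `python sum_args.py 1 2.5` should print `3.5`, and `python sum_args.py 1 x` should print the error message instead of a traceback.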

## Math

Quickly scan [the reference page of the `math` (mathematical functions) module](https://docs.python.org/3/library/math.html).  
(It is important to get an overview of the available functionality.)  
Some examples are shown below (import the module first):

In [None]:
from math import *                                                 # import all functions from the module

### Sums, products

In [None]:
vs = [1,2,3,4]
sum(vs)                                                            # built-in function, not from the math module

In [None]:
vs = [1,2,3,4]
prod(vs)                                                           # math.prod (available since Python 3.8)
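Both functions start from a neutral value, which becomes visible for empty sequences. In addition, `prod` of `1..n` reproduces `math.factorial`. A quick sketch:

```python
from math import factorial, prod

print( sum([]), prod([]) )        # 0 1: the neutral elements of + and *
print( prod(range(1, 6)) )        # 120 = 1*2*3*4*5
print( factorial(5) )             # 120
```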

### Powers, exponents, logs

Some examples to study:

In [None]:
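print( [ log10(1), log10(10), log10(0.1) ] )                         # only the last bare expression of a cell is displayed,
print( [ log2(2), log2(1024), log2(0.5) ] )                          # so each result is printed explicitly
print( exp(1) )
print( e )                                                           # math.e
print( [ log(exp(1)), exp(log(12345)) ] )
print( [ sqrt(4), pow(4,0.5), 4**0.5 ] )                             # pow here is math.pow (always returns a float)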
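`log(x, base)` with two arguments also exists. It is computed as a ratio of natural logarithms and may be slightly inexact, whereas `log10` and `log2` are usually more accurate for their bases. Here is a sketch that compares floats with `isclose` rather than `==`:

```python
from math import isclose, log, log10, log2

print( log(1000, 10) )                 # log(1000)/log(10); may be slightly off, e.g. 2.9999999999999996
print( log10(1000) )                   # 3.0
print( log2(1024) )                    # 10.0
print( isclose(log(1000, 10), 3.0) )   # True
```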

### Trigonometry

Some examples to study:

In [None]:
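print( pi )                                                         # math.pi
print( degrees(pi) )
print( [ sin(0), sin(pi) ] )                                        # sin(pi) is tiny but not exactly 0 (floating point)
print( [ cos(0), cos(pi) ] )
print( [ atan2(0,1), atan2(1,0), atan2(0,-1), atan2(-1,0) ] )       # atan2(y,x): angles of (1,0), (0,1), (-1,0), (0,-1) in (-pi, pi]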
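Because `sin(pi)` is not exactly `0`, comparing floats with `==` is fragile. The sketch below uses `math.isclose` instead. Near zero a relative tolerance alone does not help, so `abs_tol` is needed:

```python
from math import isclose, pi, radians, sin

print( sin(pi) == 0 )                        # False: sin(pi) is about 1.2e-16
print( isclose(sin(pi), 0) )                 # False: the default tolerance is relative, useless near 0
print( isclose(sin(pi), 0, abs_tol=1e-12) )  # True
print( isclose(radians(180), pi) )           # True
```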


### Rounding

In [None]:
[ floor( 2.2 ), floor( 2.99999 ), floor( -2.99999 ) ]              # rounding down (towards -infinity)

In [None]:
[ ceil( 2.2 ), ceil( 2.99999 ), ceil( -2.99999 ) ]                 # rounding up (towards +infinity)

In [None]:
[ trunc( 2.2 ), trunc( 2.99999 ), trunc( -2.99999 ) ]              # rounding towards zero

In [None]:
print( [ round( 2.2 ), round( 2.99999 ), round( -2.99999 ) ] )     # built-in function, not from the math module
print( { x:round(x) for x in [-2.5,-1.5,-0.5,0.5,1.5,2.5] } )      # halves are rounded to the nearest even number

### Combinatorics

In [None]:
comb(4, 2)                                                         # number of 2-element subsets out of a 4-element set

In [None]:
perm(4, 2)                                                         # number of ordered 2-element selections (without repetition) out of a 4-element set
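The two counts can be cross-checked by listing the selections explicitly with the standard `itertools` module. A small sketch:

```python
from itertools import combinations, permutations
from math import comb, perm

items = ["a", "b", "c", "d"]
subsets = list(combinations(items, 2))     # order does not matter: only ('a', 'b')
ordered = list(permutations(items, 2))     # order matters: ('a', 'b') and ('b', 'a')
print( len(subsets), comb(4, 2) )          # 6 6
print( len(ordered), perm(4, 2) )          # 12 12
```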

### Missing or infinite values

In [None]:
vs = ( 0, 1, nan, inf, -inf )                                      # math.nan, math.inf
{v:isnan(v) for v in vs}

In [None]:
vs = ( 0, 1, nan, inf, -inf )
{v:isinf(v) for v in vs}

In [None]:
vs = ( 0, 1, nan, inf, -inf )
{v:isfinite(v) for v in vs}
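`isnan` is needed because `nan` is not equal to anything, not even to itself. Here is a sketch of filtering missing values out of a list:

```python
from math import isnan, nan

vs = [1.0, nan, 3.0]
print( nan == nan )                        # False
print( [v for v in vs if v == nan] )       # []: == never matches nan
print( [v for v in vs if isnan(v)] )       # [nan]
print( [v for v in vs if not isnan(v)] )   # [1.0, 3.0]
```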

## (Pseudo-) random numbers

Quickly scan [the reference page of the `random` (generate pseudo-random numbers) module](https://docs.python.org/3/library/random.html).  
(It is important to get an overview of the available functionality.)  
*Note:* do not use these functions for security purposes (the standard library provides the `secrets` module for that).  
Some examples are shown below (import the module first):

In [None]:
from random import *                                               # import all functions from the module

### Integers, seed

In [None]:
[randint(10, 20) for i in range(10)]                               # random integers, both ends inclusive

The generated numbers are only pseudo-random: they come from a deterministic sequence computed from the generator's internal state.  
The `seed(num)` function resets this internal state to a starting point determined by the integer `num`.  
Therefore, the following code always generates the same pseudo-random sequence.  
*Note:* `seed` affects all random numbers/choices (not only integers).

In [None]:
seed( 123 )                                                        # change 123 to another integer to get different random nums
[randint(10, 20) for i in range(10)]                               # random integers, both ends inclusive
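Instead of relying on the shared global generator, you can create a separate generator object with `random.Random(seed)`. Generators created with the same seed produce the same sequence. A sketch:

```python
from random import Random

gen1 = Random(123)                          # independent generator with its own state
gen2 = Random(123)                          # same seed as gen1
a = [gen1.randint(10, 20) for i in range(5)]
b = [gen2.randint(10, 20) for i in range(5)]
print( a == b )                             # True: same seed, same sequence
```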

### Real numbers

In [None]:
[uniform(0, 1) for i in range(10)]                                 # from a uniform distribution on [0,1]

In [None]:
[gauss(mu=0, sigma=1) for i in range(10)]                          # from the normal distribution mu=0, sd=1

### Collections

Let's create a standard French `deck` representing 52 playing cards.  
It is possible to represent the cards by their [Unicode symbols](https://en.wikipedia.org/wiki/Playing_cards_in_Unicode).

In [None]:
suits = [ "\u2660", "\u2665", "\u2666", "\u2663" ]
nums = [ "A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K" ]
deck = [n+s for n in nums for s in suits]
print( " ".join( deck ) )

This is how a list can be shuffled (order of the elements gets randomly changed):

In [None]:
copiedDeck = deck.copy()                                           # let's make a copy, so deck stays unchanged
for i in range(3):
    shuffle( copiedDeck )                                          # permute the elements of copiedDeck in place
    print( " ".join( copiedDeck ) )

Use `sample` to randomly select elements of a list without replacement (a picked element is not put back):

In [None]:
for i in range(3):
    d = sample(deck, k=5)                                         # 5 randomly picked cards (without putting them back into the deck)
    print( str.join( " ", d ) )

# sample(deck, 100)                                               # ValueError: without replacement, 100 cards cannot be picked from 52

Use `choices` to select with replacement (the same element may be picked repeatedly):

In [None]:
d = choices(deck, k=100)
d.sort()
print( str.join( " ", d ) )                                      # note repetitions

## Statistics

Quickly scan [the reference page of the `statistics` (mathematical statistics functions) module](https://docs.python.org/3/library/statistics.html).  
(It is important to get an overview of the available functionality.)  
Some examples are shown below (import the module first):

In [None]:
from statistics import *                                         # import all functions from the module

Let's generate some normally distributed numbers:

In [None]:
vs = [gauss(mu=0, sigma=1) for i in range(100)]

In [None]:
{ "min":min(vs), "max":max(vs), "mean":mean(vs), "median":median(vs) }

In [None]:
{ "sd":stdev(vs), "var":variance(vs) }
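`stdev` and `variance` treat the data as a sample and divide by n-1, while `pstdev` and `pvariance` treat it as a whole population and divide by n. The sketch below uses a small fixed data set whose numbers can be checked by hand (mean 5, squared deviations summing to 32):

```python
from math import isclose, sqrt
from statistics import mean, pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]                      # mean 5; squared deviations sum to 32
print( mean(data) )
print( pvariance(data), variance(data) )             # 32/8 and 32/7
print( pstdev(data), stdev(data) )                   # square roots of the two variances
print( isclose(stdev(data)**2, variance(data)) )     # True
```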

## Self-study tasks

### Python script to generate normally distributed random numbers (random, statistics, file write, script)

Write a Python script `gen_norm_nums.py` which generates normally distributed random numbers and writes them to a text file.  
The script should accept four parameters: 
- `fileName`: name of the file to be written
- `size`: number of numbers to generate
- `mu`: the mean for the `gauss` generator
- `sd`: the `sigma` for the `gauss` generator

The script should also report on the console the actual mean and standard deviation of the generated numbers.

You may add code that raises exceptions when the parameters have nonsensical values.  
The following command line should work in a console:

```text
> python ./gen_norm_nums.py nums.txt 10 0 1         # 10 numbers, mean 0, sd 1, write to nums.txt
Size:   10
Mean:   requested=0.0, generated=0.11592080981585763
Stddev: requested=1.0, generated=0.9322987440171838
```

### Undirected graph (class, writing to file/console, exceptions)

The Wikipedia page [Graph (discrete mathematics)](https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)) provides an example illustration of *a (undirected) graph with six vertices and seven edges*.  
Implement a class `Graph` that stores a graph. Use the following names:
- `vs` and `_vs` should represent vertices
- `es` and `_es` should represent edges
- `addV` and `addE` should be the methods to add a vertex/edge
- it should be an error to add an edge before adding its vertices first

The class `Graph` is expected to be used as follows (the following code should work).  
When you work on the class, implement it in small steps and test it after each step.  

In [None]:
# ----- step 1: create an empty graph -----
g = Graph()

# ----- step 2: make adding vertices possible -----
g.addV("a")
g.addV("b")
# g.addV("b")                        # ValueError: Vertex b is already in the graph.

# ----- step 3: make adding edges possible -----
g.addE("a","b")
# g.addE("a","b")                    # ValueError: Edge a<->b is already in the graph.
# g.addE("b","a")                    # ValueError: Edge a<->b is already in the graph.
# g.addE("a","x")                    # ValueError: Add x first as a vertex.

# ----- step 4: allow chaining (return self) -----
g.addV("c").addV("d")
g.addE("c","c").addE("c","d")

# ----- step 5: provide getters of the graph data -----
print( g.vs() )                      # Sorted list of node names: ['a', 'b', 'c', 'd']
print( g.es() )                      # Set of tuples (return a copy so the internal data cannot be modified)
                                     # {('c', 'd'), ('c', 'c'), ('a', 'b')}

# ----- step 6: provide checker of an edge -----
print( g.hasE( "a", "b" ) )          # True
print( g.hasE( "a", "c" ) )          # False
print( g.hasE( "???", "???" ) )      # False; too costly to always check vertices

# ----- step 7: implement writer of adjacency matrix -----
g.writeAdjacencyMatrix()             # Write adjacency matrix to console (f=sys.stdout)
with open(file="g.mx", mode="w") as f:
    g.writeAdjacencyMatrix(f=f)      # Write the adjacency matrix to the file

# Here is the adjacency matrix (the self-loop c<->c is marked on the diagonal):
#    	a	b	c	d
#   a	0	1	0	0
#   b	1	0	0	0
#   c	0	0	1	1
#   d	0	0	1	0

### Shuffling lines of a file (random, file read/write, script)

Write a Python script `shuffle_lines.py` which reads a text file, shuffles its lines and writes them back to another text file.  
The script should accept two parameters: 
- `inFileName`: text file to be read
- `outFileName`: text file to be written

You may add code that raises exceptions when the parameters have nonsensical values.  
The following command line should work in a console:

```text
> python shuffle_lines.py nums.txt nums_sh.txt       # read lines from nums.txt, shuffle, write to nums_sh.txt
```

### Fibonacci numbers (functions)

Write a function `fib(pos)` that calculates the [Fibonacci number](https://en.wikipedia.org/wiki/Fibonacci_number) at position `pos` in the sequence.  
The expected output for the first 19 elements:
```python
[fib(pos) for pos in range(19)]
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584]
```