In [1]:
import course;course.header()

# Advanced Python Course 
## Mobi Heidelberg WS 2021/22
### by Christian Fufezan 

christian@fufezan.net

https://fufezan.net

<img src="./images/cc.png" alt="drawing" width="200" style="float: left;"/>


# pathlib

A platform-independent way to work with paths. Read more [here](https://docs.python.org/3/library/pathlib.html).
If you like os.path, you will love pathlib.

Windows:
 - C:\Documents\Newsletters\Summer2018.pdf

Others (also known as POSIX):
 - /mounts/c/Documents/Newsletters/Summer2018.pdf
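
Both flavours can be explored on any operating system with the pure path classes, which only parse path strings and never touch the file system. A minimal sketch, reusing the two example paths above:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths only parse strings, so both flavours work on every OS.
win_path = PureWindowsPath(r"C:\Documents\Newsletters\Summer2018.pdf")
posix_path = PurePosixPath("/mounts/c/Documents/Newsletters/Summer2018.pdf")

print(win_path.parts)    # components split on backslashes
print(posix_path.parts)  # components split on forward slashes
print(win_path.name == posix_path.name)  # same file name in both flavours
```

`Path(...)` itself picks the flavour of the machine it runs on, which is why the outputs in this notebook show `PosixPath`.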

## Intro

In [3]:
from pathlib import Path

In [35]:
course_title_md = Path("course_title.md")

In [27]:
course_title_md

PosixPath('course_title.md')

In [28]:
course_title_resolved = course_title_md.resolve()
course_title_resolved

PosixPath('/Users/fu/dev/_teaching/advanced_python_2021-22_HD_pre/notebooks/course_title.md')

In [29]:
course_title_resolved.parent

PosixPath('/Users/fu/dev/_teaching/advanced_python_2021-22_HD_pre/notebooks')

In [30]:
course_title_resolved.stem

'course_title'

In [31]:
course_title_resolved.name

'course_title.md'

In [32]:
course_title_resolved.suffix

'.md'

In [33]:
# changing the suffix?
course_title_resolved.with_suffix(".new_suffix")

PosixPath('/Users/fu/dev/_teaching/advanced_python_2021-22_HD_pre/notebooks/course_title.new_suffix')
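
For file names with more than one extension, `suffix` and `with_suffix` only look at the last one. The sketch below uses a hypothetical path (`/tmp/data/results.tar.gz`, not a file from this course) to show the difference to `suffixes` and `with_name`:

```python
from pathlib import PurePosixPath

# Hypothetical example path, not a file from the course repository.
archive = PurePosixPath("/tmp/data/results.tar.gz")

print(archive.suffix)                   # only the last extension
print(archive.suffixes)                 # all extensions as a list
print(archive.with_suffix(".bz2"))      # replaces only the last suffix
print(archive.with_name("summary.md"))  # replaces the whole file name
```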

## Creating Paths

In [54]:
root = Path(".")

In [55]:
new_file = root / ".." / "new_file.txt"

In [56]:
new_file.resolve()

PosixPath('/Users/fu/dev/_teaching/advanced_python_2021-22_HD_pre/new_file.txt')

In [57]:
new_file.exists()

True

In [58]:
with open(new_file, "w") as oo:
    print("New File!", file=oo)

In [59]:
new_file.exists()

True
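
Path objects can also create folders and read or write small files directly, without a separate `open` call. A minimal sketch, assuming a throwaway temporary directory and a hypothetical `sub_folder` so that nothing in the course repository is touched:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build a nested path with the / operator; nothing exists on disk yet.
    new_file = Path(tmp_dir) / "sub_folder" / "new_file.txt"
    existed_before = new_file.exists()

    # Create the missing parent folder, then write and read the file.
    new_file.parent.mkdir(parents=True, exist_ok=True)
    new_file.write_text("New File!\n")
    content = new_file.read_text()
    existed_after = new_file.exists()

print(existed_before, existed_after, repr(content))
```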

## Deleting files

In [61]:
new_file.unlink(missing_ok=False)
#                  ^--- favorite feature: less "let me check whether the file exists first"
#                       missing_ok requires Python 3.8+; older versions raise the TypeError below

TypeError: unlink() got an unexpected keyword argument 'missing_ok'

`unlink()` does not work on folders. Use `shutil.rmtree` instead; more info can be found [here](https://docs.python.org/3/library/shutil.html#shutil.rmtree).
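
The following sketch shows folder deletion on a throwaway temporary directory (the folder and file names are made up for the demo). It also shows one possible fallback for interpreters older than Python 3.8, where `missing_ok` is not available, as the `TypeError` above indicates:

```python
import shutil
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())  # throwaway folder for the demo
folder = base / "to_delete"
folder.mkdir()
(folder / "a.txt").write_text("content")

# rmdir() only works on empty folders, so a non-empty one raises OSError.
try:
    folder.rmdir()
    rmdir_failed = False
except OSError:
    rmdir_failed = True

shutil.rmtree(folder)  # removes the folder and everything inside it

# Without missing_ok, catch the error for a missing file instead.
try:
    (base / "not_there.txt").unlink()
    missing_raised = False
except FileNotFoundError:
    missing_raised = True

shutil.rmtree(base)  # clean up the temporary base folder
print(rmdir_failed, missing_raised)
```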

## Iterating directories, global and selective

In [66]:
root = Path(".").resolve()

In [69]:
root.iterdir()

<generator object Path.iterdir at 0x7f9478adf4d0>

In [72]:
for entry in root.parent.iterdir():
    print(entry.name)

.DS_Store
LICENSE
.pytest_cache
tests
README.md
.gitignore
.venv
.git
.vscode
data
notebooks


In [73]:

for entry in root.parent.glob(".*"):
    print(f"{entry.name} is file? {entry.is_file()}")


.DS_Store is file? True
.pytest_cache is file? False
.gitignore is file? True
.venv is file? False
.git is file? False
.vscode is file? False
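
`glob` only matches entries on the level it is called on, while `rglob` descends into all subfolders. A minimal sketch on a temporary directory with a hypothetical layout, created only for this demo:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    root = Path(tmp_dir)
    # Hypothetical layout, created only for this demo.
    (root / "notebooks").mkdir()
    (root / "README.md").write_text("readme")
    (root / "notebooks" / "course_title.md").write_text("title")
    (root / "notebooks" / "pathlib.ipynb").write_text("{}")

    # glob() only looks at the given level ...
    top_level_md = sorted(p.name for p in root.glob("*.md"))
    # ... rglob() descends into all subfolders.
    all_md = sorted(p.name for p in root.rglob("*.md"))
    # iterdir() combined with is_dir() selects folders only.
    folders = sorted(p.name for p in root.iterdir() if p.is_dir())

print(top_level_md, all_md, folders)
```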


# Storing Python objects

Sometimes we want to store Python objects and later retrieve the same objects again. This is called serializing and deserializing objects, respectively.

There are many ways of serializing Python objects. Probably the most basic one is provided by the Python pickle module.

In [74]:
import pickle

In [105]:
import csv
lookup = {}
with open("../data/amino_acid_properties.csv") as aap:
    aap_reader = csv.DictReader(aap, delimiter=",") 
    for line_dict in aap_reader:
        lookup[line_dict['1-letter code']] = line_dict

In [90]:
pkl_file_path = Path(".") / "test.pkl"
with open(pkl_file_path, "wb") as pkl_file:
    pickle.dump(lookup, pkl_file)

Note: the file object must be opened in binary write mode, i.e. "wb".

In [92]:
with open(pkl_file_path, "rb") as pkl_file:
    lookup_2 = pickle.load(pkl_file)

In [93]:
lookup_2 == lookup

True
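
Pickle does not need a file: `pickle.dumps` and `pickle.loads` do the same round trip in memory via a bytes object. The sketch below uses a hypothetical record (the keys and values are made up) to show that Python types such as sets and tuples survive the round trip. Note that unpickling can execute arbitrary code, so only load pickles from sources you trust.

```python
import pickle

# Hypothetical record; pickle keeps Python types such as tuples and sets.
record = {"code": "A", "tags": {"small", "hydrophobic"}, "pos": (1, 2)}

payload = pickle.dumps(record)    # serialize to a bytes object in memory
restored = pickle.loads(payload)  # deserialize back into a new object

print(type(payload).__name__, restored == record, restored is record)
```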

Pickling objects helps when developing functions or code segments in a rapid and agile way.

* Block A: Reading files
* Block B: Transforming data
* Block C: Normalizing data
* Block D: _*New way of analysing results*_

You do not want to go through A-C every time while you develop D.
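
One way to skip blocks A-C during development is a small caching helper. This is only a sketch: `load_or_compute` and `blocks_a_to_c` are hypothetical names, and the function body is a stand-in for the real reading, transforming and normalizing steps.

```python
import pickle
import tempfile
from pathlib import Path

calls = []  # records how often the expensive part really runs


def blocks_a_to_c():
    """Stand-in for reading, transforming and normalizing data."""
    calls.append(1)
    return {"A": 1.0, "C": 0.5}


def load_or_compute(cache_path, compute):
    """Return the pickled result if present, else compute and store it."""
    if cache_path.exists():
        with open(cache_path, "rb") as pkl_file:
            return pickle.load(pkl_file)
    result = compute()
    with open(cache_path, "wb") as pkl_file:
        pickle.dump(result, pkl_file)
    return result


with tempfile.TemporaryDirectory() as tmp_dir:
    cache = Path(tmp_dir) / "blocks_a_to_c.pkl"
    first = load_or_compute(cache, blocks_a_to_c)   # computes and stores
    second = load_or_compute(cache, blocks_a_to_c)  # loads from the pickle

print(len(calls), first == second)
```

Remember to delete the cache file whenever the code of blocks A-C changes, otherwise block D keeps working on stale results.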

# Compression

In [96]:
pkl_file_path.stat().st_size

4015

In [99]:
import bz2

In [100]:
pkl_file_path = Path(".") / "test.pkl"
with bz2.open(pkl_file_path, "wb") as pkl_file:
    pickle.dump(lookup, pkl_file)

In [101]:
pkl_file_path.stat().st_size

1881

In [107]:
with bz2.open(pkl_file_path, "rb") as pkl_file:
    lookup_3 = pickle.load(pkl_file)

In [108]:
lookup == lookup_3

True

Simply swapping the file-opening function from `open` to `bz2.open` enables compression.

Python supports several compression algorithms out of the box (read more [here](https://docs.python.org/3/library/archiving.html)), so there is no need to decompress files before reading them.
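
The standard library modules `gzip`, `bz2` and `lzma` all offer an `open` function with the same interface, so they can be swapped freely. A minimal sketch comparing file sizes, assuming made-up, highly repetitive data (real data will compress differently):

```python
import bz2
import gzip
import lzma
import pickle
import tempfile
from pathlib import Path

# Hypothetical, repetitive data so that compression has a visible effect.
data = {f"key_{i}": f"value_{i}" * 20 for i in range(200)}

sizes = {}
restored = {}
with tempfile.TemporaryDirectory() as tmp_dir:
    openers = {"plain": open, "gzip": gzip.open, "bz2": bz2.open, "lzma": lzma.open}
    for name, opener in openers.items():
        path = Path(tmp_dir) / f"test_{name}.pkl"
        with opener(path, "wb") as pkl_file:
            pickle.dump(data, pkl_file)
        sizes[name] = path.stat().st_size
        # Reading uses the same opener; no separate decompression step.
        with opener(path, "rb") as pkl_file:
            restored[name] = pickle.load(pkl_file)

print(sizes)
```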
