# Lecture 8

4.4.2023

by **Martin Hronec** 

Contents:
1. [How to structure your projects](#Repository-structure)
2. [Python packaging](#Packaging)
3. [Documentation](#Documentation)



## Repository structure

* “structure” covers both writing clean code whose logic and dependencies are clear and organizing the files and folders sensibly in the filesystem

* a repository template:

    ```
    README.md
    LICENSE
    setup.py
    requirements.txt
    app/__init__.py
    app/main.py
    app/helpers.py
    docs/conf.py
    docs/index.rst
    tests/test_basic.py
    tests/test_advanced.py
    data/
    .gitignore
    ```
    
* `./app/`
    * the module package (if the module consists of only a single file, it can be placed in the root of your repository, e.g. `./sample.py`)
* `./LICENSE`
    * the full license text and copyright claims
    * you are also free to publish code without a license, but this would prevent many people from potentially using or contributing to your code
    * more on licenses [here](https://choosealicense.com/licenses/)
* `./setup.py`
    * package and distribution management
    * more in [the next section](#Packaging)
* `./requirements.txt`
    * a pip requirements file
    * should be placed at the root of the repository
    * should specify the dependencies required to contribute to the project (testing, building, and generating documentation)
* `./docs/`
    * package reference documentation
    * more in [the documentation section](#Documentation)
* `./tests/`
    * more in [the testing section](#Tests)
* `./Makefile`
    * for generic management tasks
    * other generic management scripts (e.g. `manage.py`) belong at the root of the repository as well
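
The template above can be created by hand, but a short script makes the layout explicit. The following is a minimal sketch, not a standard tool: the helper name `create_skeleton` and the choice to create empty placeholder files are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

# Files from the repository template above; "data/" is a directory.
TEMPLATE_FILES = [
    "README.md",
    "LICENSE",
    "setup.py",
    "requirements.txt",
    "app/__init__.py",
    "app/main.py",
    "app/helpers.py",
    "docs/conf.py",
    "docs/index.rst",
    "tests/test_basic.py",
    "tests/test_advanced.py",
    ".gitignore",
]
TEMPLATE_DIRS = ["data"]


def create_skeleton(root):
    """Create empty placeholder files and folders under ``root`` (hypothetical helper)."""
    root = Path(root)
    for rel in TEMPLATE_DIRS:
        (root / rel).mkdir(parents=True, exist_ok=True)
    for rel in TEMPLATE_FILES:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)  # does not overwrite existing content
    return root


# Try it in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    root = create_skeleton(tmp)
    data_dir_exists = (root / "data").is_dir()
    created = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )

print(created)
```

Calling `create_skeleton("my_project")` instead would create the same layout in a folder of your choice.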

## Packaging 

* why packaging?
    * because we want modular programming
    
* why modularity (modules)?
    * simplicity
    * maintainability
    * reusability
    * scoping - separate namespace

* functions, modules and packages already offer modularization

* Python is a general-purpose programming language => can be used in many ways
    * scientific computing
    * websites
    * scraping, etc.
    
* this flexibility is the reason you need to think about:
    * the project's customers/users
    * the environment where the project will run

* it is not necessarily a bad idea to think about packaging before you start to code
* what is a package? ... a collection of:
    * modules 
    * documentation
    * tests
    * tools to build and install it, etc. 

### Deployment 
* projects (packages) exist to be deployed (installed)
* before you package anything, ask questions like:

    * who are your users?  (software (python) developers, business people)
    * where will your software run? (servers, desktops, mobiles)
    * how is your software deployed? (part of the large software stack, individually, etc.)
* packaging libraries and tools (technical audience) vs. packaging applications (non-technical audience)

### Packaging libraries and tools

* you've probably heard about PyPI, `setup.py` and [wheels](https://pythonwheels.com/) 

* **modules**
    * a module is simply a Python file, which can be distributed on its own
        * this works as long as the recipient has the right version of Python and the file relies only on the standard library
    * great for sharing simple scripts and snippets (email, StackOverflow, [GitHub gists](https://gist.github.com/))
    * ! this does not scale to projects that have multiple files, need additional libraries, or require a specific Python version

* let's look at what's going on with modules
    * look at the objects defined in example_module.py (below)
        * text (string)
        * f (function)
        * AClass (class)

In [1]:
# %load example_module.py
text = "modularity is the key"

def f(arg):
    print(f'This function takes as an argument: {arg}')

class AClass:
    pass

* (if example_module.py is in an appropriate location) these objects can be imported using the `import` statement
    * (delete them from the current session first, so that we can see that they come from the import)

In [2]:
del AClass, f, text

In [3]:
f

NameError: name 'f' is not defined

In [4]:
import example_module

In [5]:
dir(example_module)

['AClass',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'f',
 'text']

In [7]:
example_module.text

'modularity is the key'

* what happens when the interpreter executes the above `import` statement? 
* the interpreter searches for *example_module.py* in **the module search path**, a list of directories assembled from:
    * the directory containing the input script (or the current working directory when running interactively)
    * the list of directories contained in the PYTHONPATH environment variable
    * an installation-dependent list of directories configured at the time Python is installed
* the resulting search path is accessible in the Python variable `sys.path`

In [9]:
import sys
sys.path

['c:\\Users\\Martin Hronec\\Projects\\phd\\teaching\\PythonDataIES\\08_packages_docs_tests',
 'c:\\Users\\Martin Hronec\\AppData\\Local\\Programs\\Python\\Python310\\python310.zip',
 'c:\\Users\\Martin Hronec\\AppData\\Local\\Programs\\Python\\Python310\\DLLs',
 'c:\\Users\\Martin Hronec\\AppData\\Local\\Programs\\Python\\Python310\\lib',
 'c:\\Users\\Martin Hronec\\AppData\\Local\\Programs\\Python\\Python310',
 '',
 'C:\\Users\\Martin Hronec\\AppData\\Roaming\\Python\\Python310\\site-packages',
 'C:\\Users\\Martin Hronec\\AppData\\Roaming\\Python\\Python310\\site-packages\\win32',
 'C:\\Users\\Martin Hronec\\AppData\\Roaming\\Python\\Python310\\site-packages\\win32\\lib',
 'C:\\Users\\Martin Hronec\\AppData\\Roaming\\Python\\Python310\\site-packages\\Pythonwin',
 'c:\\Users\\Martin Hronec\\AppData\\Local\\Programs\\Python\\Python310\\lib\\site-packages',
 'c:\\users\\martin hronec\\projects\\phd\\teaching\\dd',
 'c:\\Users\\Martin Hronec\\AppData\\Local\\Programs\\Python\\Python310\\l

* to ensure that your module is found, you need to do one of the following:
    * put example_module.py in the directory where the input script is located (or in the current working directory)
    * add the directory where `example_module.py` is located to the PYTHONPATH environment variable
    * put example_module.py anywhere you like and modify `sys.path` at runtime so that it contains that directory (see below)

In [10]:
sys.path.append(r'C:\Users\Martin Hronec\Projects')
sys.path

['c:\\Users\\Martin Hronec\\Projects\\phd\\teaching\\PythonDataIES\\08_packages_docs_tests',
 'c:\\Users\\Martin Hronec\\AppData\\Local\\Programs\\Python\\Python310\\python310.zip',
 'c:\\Users\\Martin Hronec\\AppData\\Local\\Programs\\Python\\Python310\\DLLs',
 'c:\\Users\\Martin Hronec\\AppData\\Local\\Programs\\Python\\Python310\\lib',
 'c:\\Users\\Martin Hronec\\AppData\\Local\\Programs\\Python\\Python310',
 '',
 'C:\\Users\\Martin Hronec\\AppData\\Roaming\\Python\\Python310\\site-packages',
 'C:\\Users\\Martin Hronec\\AppData\\Roaming\\Python\\Python310\\site-packages\\win32',
 'C:\\Users\\Martin Hronec\\AppData\\Roaming\\Python\\Python310\\site-packages\\win32\\lib',
 'C:\\Users\\Martin Hronec\\AppData\\Roaming\\Python\\Python310\\site-packages\\Pythonwin',
 'c:\\Users\\Martin Hronec\\AppData\\Local\\Programs\\Python\\Python310\\lib\\site-packages',
 'c:\\users\\martin hronec\\projects\\phd\\teaching\\dd',
 'c:\\Users\\Martin Hronec\\AppData\\Local\\Programs\\Python\\Python310\\l
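
The runtime option can be tried without touching a real project. The sketch below writes a throwaway module into a temporary directory, appends that directory to `sys.path`, and imports the module. The module name `tmp_demo_module` is made up for this example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # A throwaway module (hypothetical name) in a directory Python does not know yet.
    Path(tmp_dir, "tmp_demo_module.py").write_text('text = "found via sys.path"\n')

    # The directory is not on the search path, so the import fails.
    try:
        import tmp_demo_module
        found_before = True
    except ModuleNotFoundError:
        found_before = False

    # Add the directory at runtime and try again.
    sys.path.append(tmp_dir)
    importlib.invalidate_caches()  # make sure the import system notices the new file
    import tmp_demo_module

    message = tmp_demo_module.text
    sys.path.remove(tmp_dir)  # leave the search path as we found it

print(found_before)  # False
print(message)       # found via sys.path
```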

* once a module has been imported, you can determine the location where it was found with the module's `__file__` attribute

In [12]:
import example_module
example_module.__file__

'c:\\Users\\Martin Hronec\\Projects\\phd\\teaching\\PythonDataIES\\08_packages_docs_tests\\example_module.py'
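
The location can also be looked up without importing the module first. A minimal sketch using the standard library's `importlib.util.find_spec`; the module `json` is used only because it ships with every Python installation, and the second module name is deliberately made up.

```python
import importlib.util

# Ask the import system where it would load "json" from.
spec = importlib.util.find_spec("json")
print(spec.name)    # json
print(spec.origin)  # path of the file inside your Python installation

# For a top-level module that cannot be found, find_spec returns None.
missing = importlib.util.find_spec("module_that_does_not_exist_xyz")
print(missing)      # None
```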

* possible to do `from <module_name> import *`
    * this is not recommended (especially in production code)
* also possible to use aliases
    * `import pandas as pd` - `pd` is alias
* ! modules are loaded only once per session
    * if you change a module and need to reload it, either restart the interpreter or call `importlib.reload(<module>)`
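
The "loaded only once" behaviour is easy to observe. The following sketch (the module name `reload_demo` is hypothetical) imports a temporary module, edits the file on disk, and shows that a second `import` changes nothing while `importlib.reload` from the standard library picks up the edit.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    module_file = Path(tmp_dir, "reload_demo.py")  # hypothetical module name
    module_file.write_text("VALUE = 1\n")
    sys.path.insert(0, tmp_dir)
    importlib.invalidate_caches()

    import reload_demo
    first = reload_demo.VALUE

    # Edit the module on disk (the new value also changes the file size,
    # so the cached bytecode is recognised as stale).
    module_file.write_text("VALUE = 2000\n")

    import reload_demo  # no effect: the module is already in sys.modules
    after_reimport = reload_demo.VALUE

    importlib.reload(reload_demo)  # re-executes the module's source
    after_reload = reload_demo.VALUE

    sys.path.remove(tmp_dir)

print(first, after_reimport, after_reload)  # 1 1 2000
```

Note that objects imported earlier with `from reload_demo import VALUE` would keep their old value; only attributes accessed through the module object see the reload.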

Continuing with the distribution options you have ...

* **PACKAGES**

    * a "package" is essentially a module that can contain other modules (and other packages)
        *   ↑ number of modules =>   ↑ mess
    * packages allow hierarchical structuring of the module namespace

* package = a directory with an `__init__.py` and any number of other python files or other package directories
    ```
    a_package
       __init__.py
       module_a.py
       a_sub_package
         __init__.py
         module_b.py
    ```

* `__init__.py` can be empty or not (it will be run when the package is imported)
* example project from the Python Packaging Authority (real thing) [here](https://github.com/pypa/sampleproject)
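
To see the hierarchy in action, the sketch below builds the `a_package` layout shown above in a temporary directory and imports from it. The file contents (the `print` in `__init__.py`, the function `hello`, the constant `ANSWER`) are invented for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # Recreate the layout: a_package/{__init__,module_a}.py and a sub-package.
    pkg = Path(tmp_dir, "a_package")
    sub = pkg / "a_sub_package"
    sub.mkdir(parents=True)
    (pkg / "__init__.py").write_text('print("running a_package/__init__.py")\n')
    (pkg / "module_a.py").write_text("def hello():\n    return 'hello from module_a'\n")
    (sub / "__init__.py").write_text("")  # an empty __init__.py is fine
    (sub / "module_b.py").write_text("ANSWER = 42\n")

    sys.path.insert(0, tmp_dir)
    importlib.invalidate_caches()

    # Importing anything from the package runs a_package/__init__.py first.
    from a_package import module_a
    from a_package.a_sub_package import module_b

    greeting = module_a.hello()
    answer = module_b.ANSWER
    sys.path.remove(tmp_dir)

print(greeting)            # hello from module_a
print(answer)              # 42
print(module_b.__name__)   # a_package.a_sub_package.module_b
```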

### setuptools

* `setup.py` tells setuptools how to package, build and install the package


In [15]:
# %load setup.py
from setuptools import setup

setup(
    name='PackageName',
    version='0.1',
    author='YoursTruly',
    author_email='yourstruly@fsv.cuni.cz',
    # packages=['package_name', 'package_name.test'],  # packages to include
    url='',
    license='LICENSE.txt',
    description='Example package.',
    # long_description=open('README.md').read(),
    # dependencies that are installed together with the package
    install_requires=[
        "Django >= 1.1.1",
        "pytest",
    ],
)

* with a `setup.py` script, setuptools can:
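    * build a source distribution `python setup.py sdist`
    * build wheels `python setup.py bdist_wheel` (requires the `wheel` package)
    * build from source `python setup.py build`
    * install `python setup.py install`
    * note: recent setuptools versions deprecate calling `setup.py` directly; `pip install .` and `python -m build` are the recommended replacements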
    
* we can also install in develop/editable mode: `python setup.py develop` or `pip install -e ./`
    * your package is installed, but any changes will immediately take effect
    * no `sys.path` manipulation!

* you can also upload your package to [PyPI](https://pypi.org/)

* **Quick exercise**: Create a new package
    1. create the basic package structure
    2. write a setup.py
    3. install the package with a `setup.py`
    4. import it from somewhere else
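
One way to check the result of the exercise is to ask Python whether the distribution is installed. This is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_version` is hypothetical, and `PackageName` is simply the name from the example `setup.py`.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string once the package is installed, otherwise None.
print(installed_version("PackageName"))
```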

* **Notes**:
    * for larger projects, it is a good idea to use templates, e.g. from [Cookiecutter](https://cookiecutter.readthedocs.io/en/latest/)
    * quality packaging materials:
        * from the Python Packaging Authority [here](https://packaging.python.org/)
        * [practical tutorial](https://python-packaging-tutorial.readthedocs.io/en/latest/setup_py.html)

* **Discussion**: Is data science different?
    * https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/overview

## Documentation

* why documentation?
    * let's ask [write-the-docs community](https://www.writethedocs.org/guide/writing/beginners-guide-to-docs/)

* at a minimum, write docstrings

* example of Google-style docstrings from [sphinxcontrib-napoleon](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html) below


In [1]:
def function_with_types_in_docstring(param1, param2):
    """Example function with types documented in the docstring.

    `PEP 484`_ type annotations are supported. If attribute, parameter, and
    return types are annotated according to `PEP 484`_, they do not need to be
    included in the docstring:

    Args:
        param1 (int): The first parameter.
        param2 (str): The second parameter.

    Returns:
        bool: The return value. True for success, False otherwise.

    .. _PEP 484:
        https://www.python.org/dev/peps/pep-0484/

    """


def function_with_pep484_type_annotations(param1: int, param2: str) -> bool:
    """Example function with PEP 484 type annotations.

    Args:
        param1: The first parameter.
        param2: The second parameter.

    Returns:
        The return value. True for success, False otherwise.

    """


def module_level_function(param1, param2=None, *args, **kwargs):
    r"""This is an example of a module level function.

    Function parameters should be documented in the ``Args`` section. The name
    of each parameter is required. The type and description of each parameter
    is optional, but should be included if not obvious.

    If \*args or \*\*kwargs are accepted,
    they should be listed as ``*args`` and ``**kwargs``.

    The format for a parameter is::

        name (type): description
            The description may span multiple lines. Following
            lines should be indented. The "(type)" is optional.

            Multiple paragraphs are supported in parameter
            descriptions.

    Args:
        param1 (int): The first parameter.
        param2 (:obj:`str`, optional): The second parameter. Defaults to None.
            Second line of description should be indented.
        *args: Variable length argument list.
        **kwargs: Arbitrary keyword arguments.

    Returns:
        bool: True if successful, False otherwise.

        The return type is optional and may be specified at the beginning of
        the ``Returns`` section followed by a colon.

        The ``Returns`` section may span multiple lines and paragraphs.
        Following lines should be indented to match the first line.

        The ``Returns`` section supports any reStructuredText formatting,
        including literal blocks::

            {
                'param1': param1,
                'param2': param2
            }

    Raises:
        AttributeError: The ``Raises`` section is a list of all exceptions
            that are relevant to the interface.
        ValueError: If `param2` is equal to `param1`.

    """
    if param1 == param2:
        raise ValueError('param1 may not be equal to param2')
    return True
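
Docstrings are not just comments: they are stored on the object and can be read at runtime, which is what documentation generators rely on. A small sketch with a made-up function `add`:

```python
import inspect


def add(a: int, b: int) -> int:
    """Return the sum of two integers.

    Args:
        a: The first summand.
        b: The second summand.

    Returns:
        The sum ``a + b``.
    """
    return a + b


# The raw docstring is available as the __doc__ attribute.
print(add.__doc__.splitlines()[0])  # Return the sum of two integers.

# inspect.getdoc() returns the docstring with the indentation cleaned up.
print(inspect.getdoc(add))

# In an interactive session, help(add) shows the signature and the docstring.
```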

### mkdocs
* there is nothing wrong with Sphinx, but mkdocs is more user-friendly, so we will look at an mkdocs example
* if you want to write your documentation in Markdown, look at [mkdocs](https://www.mkdocs.org/)
* example config (`mkdocs.yml`):

```
site_name: example
nav:
  - "Home" : index.md
  - "About" : about.md
  - "Pipeline" : pipeline.md

docs_dir: docs
plugins:
    - search
    - mkdocstrings:
        default_handler : python
        handlers:
            python:
                setup_commands:
                - import sys
                - sys.path.append("app/")
                rendering:
                    show_source: true
                    show_root_heading: true
extra_css:
    - stylesheets/extra.css
```

* **Exercise**: build a basic docs structure for yourself
    * (later) host it on GitHub pages - https://pages.github.com/
    

### Sphinx

* [Sphinx](https://www.sphinx-doc.org/en/master/index.html) is a documentation generator
* standard in producing documentation because:
    * simple, yet powerful
    * HTML, LaTeX, ePub, plain text output formats
    * extensive cross-references - automatic links for functions, classes, citations, etc. 
    * a lot of extensions
* [example projects](https://www.sphinx-doc.org/en/master/examples.html)
* it uses [reStructuredText](https://docutils.sourceforge.net/rst.html) as the markup language ([cheatsheet](https://thomas-cokelaer.info/tutorials/sphinx/rest_syntax.html#restructured-text-rest-and-sphinx-cheatsheet))
    * see [reStText_example.rst](reStText_example.rst)

* how to create documentation:
    * `sphinx-quickstart`
        * creates a source directory with *conf.py* and a master document, *index.rst* (+ *Makefile*)
        * the master document serves as a welcome page and contains the root of the "table of contents tree" (or toctree)
        * this allows connecting multiple reStructuredText files into a single hierarchy of documents
    * running the build: `sphinx-build -b html <sourcedir> <builddir>`
        * or more simply, since we've used `sphinx-quickstart` to initiate the repo, `make html` (uses *Makefile* and *make.bat*)


* the *conf.py* file controls how Sphinx processes your documents
    * it is executed like any other Python script, so you can run arbitrary code in it whenever Sphinx runs
    
* `toctree` adds structure to your documentation

* **autodoc** (extension)
    * inclusion of docstrings from your modules
    * to use it, activate it in *conf.py* 
        `extensions = ['sphinx.ext.autodoc']`
        
* if you want to use markdown, look at [mkdocs](https://www.mkdocs.org/)
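
For the autodoc setup described in the Sphinx notes above, a *conf.py* might look roughly like the sketch below. The project name, author and relative path are placeholders, and the layout assumed is the repository template from the first section (`docs/conf.py` next to `app/`). Enabling `sphinx.ext.napoleon` is an extra assumption here; it lets Sphinx parse Google-style docstrings such as the ones shown earlier.

```python
# docs/conf.py: a minimal sketch; names and paths are placeholders
import os
import sys

# Make the code importable for autodoc. Assumes conf.py lives in ./docs/
# and the package sits one level up, as in the repository template.
sys.path.insert(0, os.path.abspath(".."))

project = "example"      # placeholder project name
author = "YoursTruly"    # placeholder author

extensions = [
    "sphinx.ext.autodoc",   # include docstrings from your modules
    "sphinx.ext.napoleon",  # parse Google-style docstrings
]

html_theme = "alabaster"  # Sphinx's default HTML theme
```

With this in place, a directive such as `.. automodule:: app.helpers` with the `:members:` option in an *.rst* file pulls the docstrings of that module into the generated pages.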