# Python Modules and Packages
© Explore Data Science Academy

## Learning Objectives
In this train you will learn how to:
- Write a module and import all or specific objects from a module; 
- Import modules using an alias name and explore the use of built-in modules; and,
- Define a package and its hierarchical organization.


## Outline
To do this, we will:
- Define the importance of modules and packages in programming;
- Review the syntax and conventions for importing modules; 
- Consider the use of built-in Python modules and packages; and, 
- Review the standard list of Python libraries/packages and their applications.

## Introduction
Modular programming is a style of programming that promotes code reusability: we write a single piece of code once and use it in multiple parts of a project, saving time and resources. The resulting code can be simpler to understand and maintain since components can be considered in isolation. In Python, **modularization** is implemented using functions, modules and packages.

A **module** is a file consisting of Python code. A module can define functions, classes, variables and runnable code. Modules can be imported and referenced from other Python code. A **Python package** (often loosely referred to as a library) is a hierarchically structured collection of directories of Python code, consisting of sub-packages and modules.

Modules and packages are two mechanisms that facilitate modular programming. 

## Why modularization? 

* **Reusability**: Reduces the need to rewrite code, as functionality defined in a single module can easily be reused elsewhere.


* **Simplicity**: A module typically focuses on one small part of a problem rather than the entire problem at hand. Combining such modules lets you deal with each small problem systematically, making development easier and less error-prone.


* **Maintainability**: Modules in Python are often designed to be largely self-contained, with few dependencies on other modules. Modifying a single module is therefore less likely to affect other parts of the program, which allows a team of many programmers or data scientists to work collaboratively on a large application.

## Working with modules  

### Creating modules
Creating a module is as simple as saving a Python script containing functions, classes, variables and runnable code. The file name is the module name with the suffix `.py` appended.

### Importing modules 
We can access this module and its elements from a different Python file by using the `import` statement.
* We can import a single module/package:

    ```python
    import <module_name>
    ```

* We can import multiple modules using individual import statements:

    ```python
    import <module_1_name>
    import <module_2_name>
    import <module_3_name>
    ...
    ```
    
    
                      
The same rules apply when dealing with packages. We can import specific modules within a package by using dot notation. For this to work, the dotted path must mirror the hierarchy of the package directory.

   ```python
    import <package_name>.<sub_package_name>.<module_name>.<...>
   ```
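
As a small illustration with a package that ships with Python, the sketch below imports the `ElementTree` module from the `etree` sub-package of the standard-library `xml` package. The XML string and the variable names are made up for this example.

```python
# Import a module nested two levels deep, using dot notation
import xml.etree.ElementTree

# The full dotted path is needed every time we reference the module
root = xml.etree.ElementTree.fromstring('<game><level/></game>')

print(root.tag)    # tag of the outermost element
print(len(root))   # number of direct child elements
```

Typing the full path on every use quickly becomes tedious, which is one reason aliases are popular for deeply nested modules.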

Let's go ahead and create our own module, with the following 3 types of elements:

*   A variable `s`
*   A function `say_hi()`
*   A class `Greet`

In [8]:
# Contents of the module we are creating 
content = """
s = 'Hello ' 

def say_hi(name):
    print(s+name)

class Greet:
    pass
"""

# Write the above text to a file called my_module.py
# within our current working directory. 
with open('./my_module.py', 'w') as fp:
    fp.write(content)

In [9]:
# Import the module we've just made!
import my_module 

Even though we've imported our module, its contents (the variables, functions and classes defined within it) are not directly accessible by their bare names. Attempting to access them this way raises a `NameError`. We can safely observe this error using a `try`/`except` block:

In [10]:
try:
    # Try to print the variable s
    print(s)
except NameError:
    # We've caught a NameError exception (error!)
    # We inform the programmer (you) that the variable does not exist
    print("Variable 's' does not exist!")

Variable 's' does not exist!


The lesson here is that objects in a module are only accessible when prefixed with the module name via dot notation, as illustrated below.

In [11]:
# Access the variable s through the module name
my_module.s 

'Hello '

In [12]:
# Call the function say_hi through the module name
my_module.say_hi('Nelson')

Hello Nelson
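
One way to see why the prefix is needed is to look at where names live. The sketch below uses the standard-library `math` module rather than `my_module` so that it runs on its own. It runs the import inside a fresh, empty namespace (a plain dictionary) so we can inspect exactly which names the import adds; the variable names are illustrative.

```python
# Run `import math` inside an empty namespace so we can inspect the result
namespace = {}
exec('import math', namespace)

# The import statement adds only the module's name to the importing namespace
math_is_visible = 'math' in namespace
sqrt_is_visible = 'sqrt' in namespace

# The function itself lives inside the module's own namespace
sqrt_in_module = 'sqrt' in vars(namespace['math'])

print(math_is_visible, sqrt_is_visible, sqrt_in_module)
```

Because `sqrt` is stored in the module's namespace and not in ours, it has to be reached as `math.sqrt`.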


## Importing modules using an alias
The `import` statement in Python also allows the use of an alias when referencing a module. Using the `as` keyword, we can save ourselves from typing an otherwise long module or package name each time we need to access one of its objects. The syntax is:

```python
import <module_name> as <new_module_name>
```
or 

```python 
from <package_name> import <module_name> as <new_module_name>
```

For example:

In [13]:
# Import the module under the alias md
import my_module as md

We can thus treat the alias as the new name for the module. 

In [14]:
md.s

'Hello '

In [15]:
md.say_hi('Jabulani')

Hello Jabulani


In [16]:
md.Greet

my_module.Greet
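
Both alias forms also work with modules that ship with Python. The following is a minimal sketch using the standard-library `statistics` module and the `ElementTree` module from the `xml.etree` package; the aliases `stats` and `ET` and the sample data are choices made for this example.

```python
# Alias a module directly
import statistics as stats

# Alias a module that is imported from inside a package
from xml.etree import ElementTree as ET

scores = [70, 80, 90]
average = stats.mean(scores)

element = ET.fromstring('<player name="Jabulani"/>')
player = element.get('name')

print(average)
print(player)
```

Community conventions such as `import numpy as np` follow the same pattern, so short, widely recognised aliases tend to keep code readable.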

Another way to access specific objects in a module is to use the `from` keyword and import them directly:

```python 
from <module_name> import <x, y, z>
``` 

In [17]:
from my_module import s, say_hi, Greet

In [18]:
s

'Hello '

In [19]:
say_hi('Jabulani')

Hello Jabulani


In [20]:
Greet

my_module.Greet
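
The same form works for standard-library modules. This short sketch imports two names from `math` and uses them without any prefix; the variable names are illustrative.

```python
from math import pi, sqrt

radius = sqrt(16)         # no 'math.' prefix needed
area = pi * radius ** 2   # area of a circle with that radius

print(radius)
print(round(area, 2))
```

Keep in mind that names imported this way are placed directly in the current namespace, so they replace any existing variable with the same name.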

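To import all public objects from a module, use the following command, where the asterisk (`*`) signifies *all*. Names that begin with an underscore are skipped unless the module lists them in `__all__`. Because a wildcard import makes it hard to see where a name came from, it is generally best reserved for interactive exploration: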

```python
from <module_name> import *
```
Let's see this in practice one more time: 

In [21]:
from my_module import * 

Now we have access to all of our module's contents.

In [23]:
say_hi('Joanne')

Hello Joanne


In [22]:
Greet

my_module.Greet
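
A module can control what a wildcard import exposes by defining a list called `__all__`. The sketch below is only an illustration: it writes a hypothetical module named `wildcard_demo` to a temporary directory, then performs the wildcard import inside a fresh namespace so the imported names can be inspected.

```python
import importlib
import os
import sys
import tempfile

# Contents of a hypothetical module that limits its wildcard exports
content = """
__all__ = ['public_func']

def public_func():
    return 'exported'

def helper_func():
    return 'not exported by *'
"""

# Write the module to a temporary directory and make that directory importable
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, 'wildcard_demo.py'), 'w') as fp:
    fp.write(content)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

# Run the wildcard import in an empty namespace and list what arrived
namespace = {}
exec('from wildcard_demo import *', namespace)
imported = sorted(name for name in namespace if not name.startswith('__'))

print(imported)
```

Only `public_func` arrives, because `helper_func` is missing from `__all__`. It can still be reached explicitly with `wildcard_demo.helper_func` after a regular import.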

## Built-in modules 
Python ships with a large number of modules in its standard library, often referred to as 'built-in' modules. These can be used in Python programs by simply importing them by name with the `import` keyword.

Each built-in module provides resources for particular functionality, such as operating system management or disk input/output. Some of these modules are compiled into the interpreter, while others are ordinary Python scripts (with the **.py** extension) that live within the standard library.

To **display a list of all available modules**, use the following command:

`help('modules')`

In [None]:
help('modules')
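
The output of `help('modules')` can be long and slow to produce, since it scans every installed module. For a quicker, programmatic look, the `sys` module exposes some of this information. The sketch below is hedged: `sys.builtin_module_names` covers only modules compiled into the interpreter, and `sys.stdlib_module_names` is available from Python 3.10 onwards, so the code checks for it before using it.

```python
import sys

# Names of the modules compiled directly into this interpreter
compiled_in = sys.builtin_module_names
print(len(compiled_in))
print('sys' in compiled_in)

# Python 3.10+ also exposes the names of all standard-library modules
stdlib_names = getattr(sys, 'stdlib_module_names', None)
if stdlib_names is not None:
    print('math' in stdlib_names)
```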

Alternatively, the built-in `dir()` function can be used to **list all the names (functions, variables, classes and so on) defined in a module**:

`dir(module_name)`

In [26]:
# Import math module 
import math 

# Use the sqrt function in the math module
x = math.sqrt(81)
print('The square root of 81 is equal to {}'.format(x))

# List all names (functions and constants) defined in the math module
list_all = dir(math)
print('Functions in the math module: {}'.format(list_all))

The square root of 81 is equal to 9.0
Functions in the math module: ['__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial', 'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose', 'isfinite', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2', 'modf', 'nan', 'pi', 'pow', 'radians', 'remainder', 'sin', 'sinh', 'sqrt', 'tan', 'tanh', 'tau', 'trunc']
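
Note that the list returned by `dir()` mixes functions, constants and special double-underscore attributes. As a rough sketch, the names can be separated with `getattr()` and `callable()`; the variable names here are illustrative.

```python
import math

# Drop the special attributes such as __doc__ and __name__
public_names = [name for name in dir(math) if not name.startswith('_')]

# Split the remaining names into callables (functions) and plain values
functions = [name for name in public_names if callable(getattr(math, name))]
constants = [name for name in public_names if not callable(getattr(math, name))]

print('Constants:', constants)
print('Number of functions:', len(functions))
```

To read the documentation for any single name, `help(math.sqrt)` prints its docstring.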


## What are Packages in Python?

We would not normally store every file on our computer in a single location. Instead, we use well-organized directory structures for easier access.

Each directory holds files that have something in common; for example, we may keep all our photos in the "Pictures" directory. By analogy, **directories correspond to Python packages** and **files to modules**.

As our program grows and gains more modules, we can group similar modules into one package and other groups of similar modules into different packages. This makes the project (program) easier to manage and conceptually clearer. Just as a directory can contain subdirectories and files, a Python package can contain sub-packages and modules.

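For a directory to be treated as a regular package by Python, it must contain a file named `__init__.py`. This file can be left empty, but any initialization code for the package is generally placed in it. (Since Python 3.3, a directory without `__init__.py` can still be imported as a namespace package, but including the file remains the conventional choice.)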
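
To make this concrete, here is a minimal sketch that builds a tiny package on disk and imports a module from it with dot notation. The names `demo_game`, `sound` and `play` are hypothetical and chosen only for this example, and the package is created in a temporary directory so that nothing is written to your working directory.

```python
import importlib
import os
import sys
import tempfile

# Build the directory hierarchy: demo_game/sound/
root = tempfile.mkdtemp()
package_dir = os.path.join(root, 'demo_game')
sound_dir = os.path.join(package_dir, 'sound')
os.makedirs(sound_dir)

# An (empty) __init__.py marks each directory as a regular package
for directory in (package_dir, sound_dir):
    with open(os.path.join(directory, '__init__.py'), 'w') as fp:
        fp.write('')

# Add one module to the sub-package
with open(os.path.join(sound_dir, 'play.py'), 'w') as fp:
    fp.write("def play(track):\n    return 'Playing ' + track\n")

# Make the parent directory importable, then import with dot notation
sys.path.insert(0, root)
importlib.invalidate_caches()

import demo_game.sound.play

message = demo_game.sound.play.play('intro')
print(message)
```

The dotted path `demo_game.sound.play` mirrors the directory path `demo_game/sound/play.py`.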

The figure below presents a possible organization of packages and modules for a game under development:

<br></br>

<div align="center" style="width: 800px; font-size: 80%; text-align: center; margin: -20 auto">
<img src="https://github.com/Explore-AI/Pictures/blob/master/python_modules_and_packages.png?raw=true"
     alt="Python modules for a hypothetical game"
     style="float: center; padding-bottom=0.5em"
     width=800px/>
   
A hypothetical game, used to illustrate how packages can logically structure our Python code as it grows in complexity.
</div>

## Standard Python Packages

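Python distributions ship with a standard library of packages and modules, some of which are listed below.

**NB! You do not need to know any of these packages or what they do right now, but you can refer back to this list later on! Availability varies by Python version; for example, `xdrlib` and `mailcap` were removed in Python 3.13.**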

**Text Processing Services**

* string — Common string operations
* re — Regular expression operations
* unicodedata — Unicode Database

**Data Types**

* datetime — Basic date and time types
* calendar — General calendar-related functions
* array — Efficient arrays of numeric values
* copy — Shallow and deep copy operations
* pprint — Data pretty printer

**Numeric and Mathematical Modules**

* numbers — Numeric abstract base classes
* math — Mathematical functions
* cmath — Mathematical functions for complex numbers
* decimal — Decimal fixed point and floating point arithmetic
* fractions — Rational numbers
* random — Generate pseudo-random numbers
* statistics — Mathematical statistics functions

**File and Directory Access**

* pathlib — Object-oriented filesystem paths
* fileinput — Iterate over lines from multiple input streams
* stat — Interpreting stat() results
* filecmp — File and Directory Comparisons
* tempfile — Generate temporary files and directories
* shutil — High-level file operations

**Data Persistence**

* pickle — Python object serialization
* copyreg — Register pickle support functions
* shelve — Python object persistence
* marshal — Internal Python object serialization
* dbm — Interfaces to Unix “databases”
* sqlite3 — DB-API 2.0 interface for SQLite databases

**Data Compression and Archiving**

* zlib — Compression compatible with gzip
* gzip — Support for gzip files
* bz2 — Support for bzip2 compression
* lzma — Compression using the LZMA algorithm
* zipfile — Work with ZIP archives
* tarfile — Read and write tar archive files

**File Formats**

* csv — CSV File Reading and Writing
* configparser — Configuration file parser
* netrc — netrc file processing
* xdrlib — Encode and decode XDR data
* plistlib — Generate and parse Mac OS X .plist files

**Cryptographic Services**

* hashlib — Secure hashes and message digests
* hmac — Keyed-Hashing for Message Authentication
* secrets — Generate secure random numbers for managing secrets

**Generic Operating System Services**

* os — Miscellaneous operating system interfaces
* io — Core tools for working with streams
* time — Time access and conversions
* errno — Standard errno system symbols
* ctypes — A foreign function library for Python

**Concurrent Execution**

* threading — Thread-based parallelism
* multiprocessing — Process-based parallelism
* subprocess — Subprocess management

**Networking and Interprocess Communication**

* asyncio — Asynchronous I/O
* socket — Low-level networking interface
* ssl — TLS/SSL wrapper for socket objects
* signal — Set handlers for asynchronous events
* mmap — Memory-mapped file support

**Internet Data Handling**

* email — An email and MIME handling package
* json — JSON encoder and decoder
* mailcap — Mailcap file handling
* mailbox — Manipulate mailboxes in various formats

**Graphical User Interfaces with Tk**

* tkinter — Python interface to Tcl/Tk

Python packages can also be installed from local or online repositories such as the **Python Package Index (PyPI)**, a repository of software for the Python programming language.

PyPI helps you find and install software developed and shared by the Python community, typically with the `pip` installer. For specific applications such as scientific computing, packages can also be installed through distributions such as [Anaconda](https://www.anaconda.com/products/individual), which comes with the `conda` package manager.
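
Packages from PyPI are usually installed from a terminal with a command of the form `pip install <package_name>`. Before installing, it can be handy to check from within Python whether a package is already available. The helper below is a hypothetical convenience function written for this train, built on the standard-library `importlib.util.find_spec`; it is intended for top-level package names only.

```python
import importlib.util

def is_installed(package_name):
    """Return True if a top-level package or module can be found for import."""
    return importlib.util.find_spec(package_name) is not None

# 'json' ships with Python; the second name is deliberately made up
print(is_installed('json'))
print(is_installed('a_package_that_does_not_exist_123'))
```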

## Conclusion

In this train, we learned how to create, import and use modules. We also explored the built-in modules in Python and how to determine which names are defined within a module. We further looked at Python packages, including their creation and hierarchical organization. Modules and packages have countless uses; each can be tailored to your area of expertise and shared with other programmers.

You can apply the skills you have gained from this train to create your own custom-built modules and packages. You can go ahead and create a package that may assist you with data analysis, or you can even be adventurous and create your very own data-driven game!

## Appendix

Below is an additional useful resource to help you further understand Python modules and packages:

-  [Official Python Tutorial for Modules](https://docs.python.org/3/tutorial/modules.html)