# Programming and Database Fundamentals for Data Scientists - EAS503


## Python Modules
In a typical `Python` program, you define functions. By default, a user-defined function is only available in the file in which it is defined. If you want to use the function in another code file (or in another interactive `Python` session), you will have to _copy_ that function there as well.

Clearly, that is not efficient. Luckily, `Python` allows functions to be defined in a way that lets them be used in other files (or other interactive `Python` sessions).

The simplest solution is to put the function(s) in a file (also known as a **module**) and then _import_ that module in the main code (also known as the main module).

### Module
A _module_ is a file containing `Python` definitions and statements. The file name is the module name with the suffix `.py` appended.
#### Global Variable `__name__`
The global variable `__name__` contains the name of a module.

In [1]:
print(__name__)

__main__


#### Example
Let us define two functions that perform a mathematical operation. Create a file called `myfuncs.py` with following lines:
```python
def sum2(x,y):
    return pow(x,2) + pow(y,2)

def sum3(x,y):
    return pow(x,3) + pow(y,3)

```

How do we use these two functions, defined in the module `myfuncs.py`, in this notebook?

Simple: we use the keyword `import`.

In [None]:
import myfuncs
myfuncs.sum2(3,4)

In [None]:
myfuncs.sum3(3,4)

When we use `import myfuncs`, the module name is entered into the current symbol table (use the `dir()` function to list its contents). Using the module name, `myfuncs`, we can access the functions (and variables) defined inside the module.
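As a small illustration of the symbol table, the sketch below uses the standard-library `math` module as a stand-in for `myfuncs` (an assumption, so that the example runs without any extra files). After `import math`, the name `math` shows up in `dir()`, and its functions are reached through that name.

```python
import math  # standard-library stand-in for a user module such as myfuncs

names = dir()                        # names in the current symbol table
has_module_name = "math" in names    # the module name was added by the import
has_bare_function = "sqrt" in names  # the function name itself was not imported here

print(has_module_name)
print(has_bare_function)
print(math.sqrt(16))                 # functions are accessed through the module name
```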

In [7]:
myfuncs.__name__

'myfuncs'

A module can contain executable statements as well as function definitions. The statements are run only the _first_ time the module is imported. 

This also means that if you update a module during an interactive session, you will have to reload the module.

For example, in the `myfuncs.py` file, add a global print statement to the top of the file.
```python
print('Initializing the module')
```

In [3]:
import myfuncs
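One way to pick up edits to an already-imported module is the standard-library function `importlib.reload`. The following is a minimal sketch, assuming a throwaway module named `reload_demo` (a hypothetical name) written to a temporary directory so that the example is self-contained.

```python
import importlib
import pathlib
import sys
import tempfile

# Write a throwaway module (hypothetical name: reload_demo) to a temp directory.
tmp_dir = pathlib.Path(tempfile.mkdtemp())
module_file = tmp_dir / "reload_demo.py"
module_file.write_text("def greet():\n    return 'hello'\n")
sys.path.insert(0, str(tmp_dir))

import reload_demo
before = reload_demo.greet()

# Edit the file on disk; the module object already in memory is unchanged.
# (The new source has a different length, so cached bytecode is not reused.)
module_file.write_text("def greet():\n    return 'hello again, after the edit'\n")
still_old = reload_demo.greet()

# importlib.reload re-executes the module's source in place.
importlib.reload(reload_demo)
after = reload_demo.greet()

print(before, "|", still_old, "|", after)
```

Note that names imported earlier with `from reload_demo import greet` would still point at the old function object after a reload; only access through the module name sees the update.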

Modules can import other modules. It is customary to place all import statements at the top of your code, though it is not a requirement.

Importing specific functions from a module is also possible using the keyword `from`:

In [4]:
from myfuncs import sum2

This form does not add the name of the module itself to the local symbol table; only the imported names (here, `sum2`) are added.
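The effect on the symbol table can be checked with `dir()`. The sketch below uses the standard-library `statistics` module as a stand-in for `myfuncs` (an assumption, chosen so that the example runs without extra files) and is meant to be run in a fresh session.

```python
from statistics import mean  # stand-in for `from myfuncs import sum2`

names = dir()
print("mean" in names)         # the imported function is directly available
print("statistics" in names)   # the module name itself is not bound by this import
print(mean([3, 4, 5]))
```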

You can also import all functions in a module:

In [6]:
from myfuncs import *

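However, importing everything using `*` is not good practice: it saves typing, but it makes the code harder to read, because it is no longer clear where each name comes from, and imported names can silently overwrite existing ones.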

#### Renaming imports
Sometimes you might want to _rename_ the imported module (for readability or avoiding conflicts). This can be done using the keyword `as`.

In [8]:
import myfuncs as myfunctions
myfunctions.sum2(4,5)

41

In [9]:
from myfuncs import sum2 as sumsquared
sumsquared(4,5)

41

### Executing modules as scripts
What if I want to have some statements in the module that I want to run as a script?

For example, add the following line to `myfuncs.py`:
```python
print(sum2(4,5))
```
And then run the file from the command line:
```shell
python myfuncs.py
```
This will get the desired output. However, if I import this file, the same code will run at import time. This is undesirable.

To avoid this, we will use the global variable `__name__`, which is set to `"__main__"` only if the module is being executed as a `Python` file.

To achieve this, modify the statement as:
```python
if __name__ == "__main__":
    print(sum2(4,5))
```
The above statement will only be executed if the module is executed as a `Python` file, but not when it is imported.

This is useful if you create a module and add a few tests as part of the module.
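The two modes can be compared directly. The sketch below assumes a throwaway file named `guard_demo.py` (a hypothetical name) that contains `sum2` and the guarded `print`; it runs the file once as a script in a child interpreter and then imports it as a module.

```python
import pathlib
import subprocess
import sys
import tempfile

# Source of a small module with a guarded statement (hypothetical file: guard_demo.py).
source = (
    "def sum2(x, y):\n"
    "    return pow(x, 2) + pow(y, 2)\n"
    "\n"
    "if __name__ == '__main__':\n"
    "    print(sum2(4, 5))\n"
)
tmp_dir = pathlib.Path(tempfile.mkdtemp())
script = tmp_dir / "guard_demo.py"
script.write_text(source)

# Run as a script: __name__ is "__main__", so the guarded print executes.
result = subprocess.run(
    [sys.executable, str(script)], capture_output=True, text=True
)
script_output = result.stdout.strip()

# Import as a module: __name__ is "guard_demo", so the guarded block is skipped.
sys.path.insert(0, str(tmp_dir))
import guard_demo

print(script_output)
print(guard_demo.__name__)
```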

## `Python` Packages
Packages allow for structuring modules in `Python` by using "dotted module names". For instance, you can have a package `A` that consists of modules `b` and `c`. These will be accessed as `A.b` and `A.c`, respectively.

By structuring modules within a package, one does not have to worry about conflicts between the global variable names of `b` and `c`. Additionally, the collection of modules within a package often lets you naturally separate different types of capabilities.

> For example, you might be developing a collection of functions that allow you to analyze urban data. You might want to implement three different types of functionalities.
>
> - First is to be able to import different formats of data - csv, database, excel, binary format, text, etc.
> - Second is to be able to apply preprocessing operations on the data - normalization, feature selection, sampling, etc.
> - Third is to be able to apply different types of machine learning functions - classification, clustering, etc.
>
> Now, instead of creating one module and creating a long list of functions, one could create three modules, one for each "class" of operations (io, preprocess, and analysis), and then define functions within each module.

> These modules can then be "collected" into one "package". Here is a possible directory/file structure that forms a valid `Python` package:
* urbanscience/
    * \_\_init\_\_.py
    * io/
        - \_\_init\_\_.py
        - textread.py
        - textwrite.py
        - csvread.py
        - csvwrite.py
        - ...
    * preprocess/
        - \_\_init\_\_.py
        - normalize.py
        - sampling.py
        - ...
    * analysis/
        - \_\_init\_\_.py
        - svm.py
        - ...

> The role of `__init__.py` is to tell `Python` to treat the directory as a package. Requiring this file ensures that directories we do not intend to be treated as packages are ignored. `__init__.py` can just be an empty file, but it can also execute initialization code for the package or set the `__all__` variable, described later.

> You can then `import` the package as:
```python
import urbanscience as us
```
Or just a subpackage:
```python
import urbanscience.preprocess as usp
```
Or a specific module:
```python
import urbanscience.preprocess.sampling as sm
```
This can also be done as:
```python
from urbanscience.preprocess import sampling as sm
```

The last two imports have the same effect in the above example. However, there is an important difference between the two forms.

When using `from package import item`, the `item` can either be a subpackage of the package or a module or a name defined within the package (variable, function, or class).

However, when using `import package.item1.item2`, every item except the last must be a package or subpackage; the last item can be a package or a module, but not a variable, function, or class defined within the package.
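The difference can be seen with a tiny package built on the fly. In this sketch, the package name `urbandemo`, its `preprocess.sampling` module, and the function `every_other` are all hypothetical stand-ins for the structure described above.

```python
import pathlib
import sys
import tempfile

# Build a tiny stand-in package on disk (all names here are hypothetical).
root = pathlib.Path(tempfile.mkdtemp())
subpkg = root / "urbandemo" / "preprocess"
subpkg.mkdir(parents=True)
(root / "urbandemo" / "__init__.py").write_text("")
(subpkg / "__init__.py").write_text("")
(subpkg / "sampling.py").write_text(
    "def every_other(items):\n"
    "    return list(items)[::2]\n"
)
sys.path.insert(0, str(root))

# `from ... import ...` can pull a function out of a module.
from urbandemo.preprocess.sampling import every_other
picked = every_other([1, 2, 3, 4, 5])

# `import a.b.c` requires the last item to be a module or a package,
# so naming a function there fails.
try:
    import urbandemo.preprocess.sampling.every_other
    dotted_import_ok = True
except ModuleNotFoundError:
    dotted_import_ok = False

print(picked)
print(dotted_import_ok)
```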

#### Importing * from a package
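If you do
```python
from package_name import *
```
one might expect all subpackages, modules, variables, etc., within the package to be loaded. Scanning and importing everything could take a very long time, so `Python` does not do this: without further guidance, only the names defined in the package's `__init__.py` (and any submodules that have already been imported) are brought in. Relying on `*` here is best avoided.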

While the user can sidestep the issue by not using `*` in import statements, the package creator can help by defining an `__all__` list in the `__init__.py` file that names the submodules that should be imported when `*` is used.
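A minimal sketch of the effect of `__all__`, assuming a hypothetical package `alldemo` with two submodules, of which only `normalize` is listed. The star import is run through `exec` into a separate dictionary purely so that the resulting names are easy to inspect.

```python
import pathlib
import sys
import tempfile

# Build a hypothetical package whose __init__.py lists only one submodule.
root = pathlib.Path(tempfile.mkdtemp())
pkg = root / "alldemo"
pkg.mkdir()
(pkg / "__init__.py").write_text("__all__ = ['normalize']\n")
(pkg / "normalize.py").write_text("def scale(x):\n    return x / 10\n")
(pkg / "sampling.py").write_text("def first(items):\n    return items[0]\n")
sys.path.insert(0, str(root))

# Run the star import in its own namespace so the imported names can be listed.
namespace = {}
exec("from alldemo import *", namespace)
imported = sorted(name for name in namespace if not name.startswith("__"))

print(imported)                           # only the submodule named in __all__
print(namespace["normalize"].scale(5))
```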

### How does `Python` find a package
When we import a package, `Python` searches through the directories defined in `sys.path` to look for a directory corresponding to the specified package.

In [23]:
import sys
sys.path

['',
 '/anaconda3/lib/python36.zip',
 '/anaconda3/lib/python3.6',
 '/anaconda3/lib/python3.6/lib-dynload',
 '/anaconda3/lib/python3.6/site-packages',
 '/anaconda3/lib/python3.6/site-packages/aeosa',
 '/anaconda3/lib/python3.6/site-packages/IPython/extensions',
 '/Users/chandola/.ipython']

Note that this list contains the current directory as well: it is represented by the empty string `''` in the first entry.
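Because `sys.path` is an ordinary list, a directory can be added to it at run time to make the modules inside importable. The sketch below assumes a hypothetical module `pathdemo` placed in a temporary directory that is not on the search path at first.

```python
import importlib
import pathlib
import sys
import tempfile

# Create a module in a directory that is not yet on sys.path (hypothetical name).
tmp_dir = pathlib.Path(tempfile.mkdtemp())
(tmp_dir / "pathdemo.py").write_text("ANSWER = 42\n")

# The import fails while the directory is not on the search path.
try:
    importlib.import_module("pathdemo")
    found_before = True
except ModuleNotFoundError:
    found_before = False

# After appending the directory, the same import succeeds.
sys.path.append(str(tmp_dir))
module = importlib.import_module("pathdemo")
found_after = module.ANSWER == 42

print(found_before, found_after)
```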