# Importing Modules in Python
## Modules, Packages, and Libraries
### What is a Module/Package/Library?

A module is a single `.py` file, especially one that is intended to be imported rather than run directly. For example, you might collect many useful functions in a Python file called “my_funcs.py”. You might then create a second file called “main.py” that uses the functions in my_funcs.py to handle whatever specific business need you have. In this case, you would refer to my_funcs.py as the “my_funcs module”.  

A package is an entire directory of modules that are “packaged” into one. Most third-party code used for data science is large and complex enough to warrant many modules, so it is distributed as “Python packages”. Typically you will write modules, but you probably won’t write entire packages.  

In Python, “library” doesn’t have a distinct technical meaning. However, Python comes with a collection of modules and packages called the “standard library”. That is, when you install Python, you automatically have them, with no extra download steps. An example is the “random” module from the standard library, which generates random numbers. A data science package like scikit-learn is a third-party package that needs to be downloaded separately, so it is not part of the “standard library”.

## Importing Modules and Packages
Importing files in Python is similar to the `%include` statement in SAS. It pulls in variables and code from another Python file. That file can be code you have written yourself or code from an official or third-party Python package. As in SAS, we might do this to separate code into logical chunks or to avoid copying and pasting.

There are two main ways to import code.

First, you can import an entire module (or submodule). Note that you can, and often should, create a shorthand alias using the `as` keyword.

```python
# Import a module
import pandas

# Now to use the "DataFrame" class in pandas call:
df = pandas.DataFrame([[1,2,3],[4,5,6]])
# We'll go over DataFrames later, for now just roll with it

# Import a module using a shorthand alias
import pandas as pd

# Now we can use the "DataFrame" object using a shorter version:
df = pd.DataFrame([[1,2,3],[4,5,6]])
df
```

```
   0  1  2
0  1  2  3
1  4  5  6
```


Second, you can import a specific variable, function, or class directly from a module. This is useful when you don’t need the entire module, or when you don’t want to type a prefix every time:

```python
# Import a single class from a module
from pandas import DataFrame

# Now you can use DataFrame directly without any prefix
df = DataFrame([[1,2,3],[4,5,6]])

# You can also import multiple names at once by separating them with commas
from pandas import DataFrame, Series
df = DataFrame([[1,2,3],[4,5,6]])
s = Series([1,2,3,4])
s
```

```
0    1
1    2
2    3
3    4
dtype: int64
```
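
The two import styles above can also be tried without installing anything, using the standard library's `math` module. This is only a small sketch; the alias `m` is an arbitrary choice for illustration, not an established convention.

```python
# Style 1: import the whole module, optionally with an alias
import math
import math as m

print(math.sqrt(16))   # 4.0
print(m.sqrt(16))      # 4.0 (same module, shorter prefix)

# Style 2: import specific names directly, so no prefix is needed
from math import sqrt, pi

print(sqrt(16))        # 4.0
print(pi)
```

Both styles load the same module; they differ only in which names become available in your file.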

## Difference with SAS
When using the `%include` statement in SAS, all variables are now in the same “area” or scope. You never have to use prefixes like `00_setup.user`. You simply type `user`. On the one hand this seems simpler, but it can lead to problems and/or confusion. For example, suppose you have the following code:

```SAS
%include "&prjfolder./00_setup.sas";
%include "&prjfolder./10_pull_data.sas";

DATA mytable;
  SET sasdata.&source_table.;
  b = a*2;
RUN;
```

Where did the macro variable `&source_table.` come from? Is it in 00_setup.sas? Or in 10_pull_data.sas? Ideally we’d like to put all macro variables in 00_setup, but that’s not always what happens in practice. We could make it a rule never to `%include` more than one SAS program, but that seems needlessly limiting.  

The only way for an analyst to know where `&source_table.` came from is to hunt through the various SAS programs in the hope of finding it, which is frustrating. Python’s prefixes like `pd.DataFrame` or explicit imports like `from pandas import DataFrame` remove this confusion.  

Another common problem is when two or more files unintentionally use the same variable names. Suppose I have the following two files:

### SAS file 00_setup.sas

```SAS
%let prjfolder = /kroger/Lev1/analysis/cm/analysts/kevinb;
%let user = an_cm_ws14;
libname sasdata "&prjfolder./sasdata";
libname input "&prjfolder./input";
```

### SAS file 10_import_table.sas

```SAS
%include "/kroger/lev1/analysis/cm/analysts/kevinb/sasprogs/00_setup.sas";
%let user = an_cm_ws15;

DATA foo;
  SET sasdata.bar;
  a = 2*b;
RUN;

%db_proxy_user(&user.);
```

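Each of these files defines the macro variable `user`, and the second definition silently overwrites the first. An analyst might look in 00_setup.sas to find the schema, see `an_cm_ws14`, and then find no tables under it in Exadata, because the code actually ran with `an_cm_ws15`. This can lead to a long debugging session over what turns out to be a simple mistake.  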

Or suppose a macro defined in 00_setup uses the `&user.` macro variable. If that macro were called later in 10_import_table.sas, it would pick up the overwritten value, which is even harder to track down. Without prefixes, a programmer must be aware of all variable names in all included files and be sure not to overwrite any of them.  

If you’re not used to prefixes, they can seem needless and tedious at first. But as you get used to them, you realize how much time and trouble they save.  
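
For comparison, here is a rough sketch of the same situation in Python. The module names `setup_config` and `import_table` are hypothetical stand-ins for the two SAS files, and they are written to a temporary directory only so that the example is self-contained.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write two tiny throwaway modules (hypothetical names) that both define `user`.
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "setup_config.py").write_text('user = "an_cm_ws14"\n')
(tmp_dir / "import_table.py").write_text('user = "an_cm_ws15"\n')

# Make the temporary directory importable.
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()

import setup_config
import import_table

# Each module keeps its own `user`; neither overwrites the other.
print(setup_config.user)   # an_cm_ws14
print(import_table.user)   # an_cm_ws15
```

Because every name is reached through its module prefix, it is always clear which file a value came from.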


## Where does Python look for modules?
Python basically looks for modules to import in two places:  

1. The directory containing the main script you’re running (in an interactive session, the current working directory).
2. The directories where standard library and third-party packages are installed. On Windows, third-party packages typically live in the `Lib\site-packages` folder inside your Python installation directory.  

If you look there, you can see all the source code for packages like scikit-learn. (Just be sure not to edit any files unless you really know what you’re doing!)  
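
If you are unsure where these locations are on your machine, you can ask Python directly. The sketch below uses the standard `sys.path` list and a module's `__file__` attribute; the output differs from one installation to the next, so none is shown.

```python
import random
import sys

# sys.path is the ordered list of directories Python searches on import.
for directory in sys.path:
    print(directory)

# An imported module records the file it was loaded from.
print(random.__file__)
```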


## Practice 

In a single directory, create two python files called `functions.py` and `main.py`. Their content should look like this:

### functions.py

```python

def sqrt(x):
    '''
    Returns the square root of x.
    '''
    return x**0.5
    
if __name__ == '__main__':
    #Module Tests to make sure code works as planned
    if 2 != sqrt(4):
        print("Square root of 4 should be 2")
    
    if 9 != sqrt(81):
        print("Square root of 81 should be 9")
    
```

Note: the `if __name__ == '__main__'` snippet is a Python idiom. Each module has a built-in variable called `__name__`. If the module is run directly, its value is `"__main__"`; if it is imported, its value is the module's name (the file name without `.py`). You'll often see test code at the bottom of a module after `if __name__ == '__main__'`. This way the test code runs only when the module is run directly, not when it is imported from another file.
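
A quick way to see this for yourself is to print `__name__` from a script and from an imported module. This minimal sketch uses the standard library's `math` module as the imported one.

```python
import math

# In the file you run directly, __name__ is "__main__".
print(__name__)

# In an imported module, __name__ is the module's own name.
print(math.__name__)   # math
```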

### main.py

```python
import functions as f

a = 81
b = f.sqrt(a) #b is now 9.0

```

For this simple program, splitting the problem into multiple modules is overkill. But you can imagine how for a large program it would be necessary to create separate modules.  

Modules have the side-effect benefit of forcing you to break your code down into logical parts. Just like functions force you to break individual commands into logical parts, modules force you to break entire programs into logical parts.  

In SAS, at least at 84.51, we tend to break programs up sequentially rather than logically. That is, we'll have 00_setup, 10_pull_data, 20_clean_data, 30_score_households, 40_create_report and so on. Although you can do this in python, typically programs are broken down into modules by logical function rather than sequence of execution.  


## Popular Standard Library Packages

There are dozens of packages in the standard library. The most commonly used ones include:  

* random -- create random numbers, select random elements from lists etc.
* datetime -- parse dates, do math with dates
* math -- common mathematical functions beyond basic addition/multiplication
* time -- get the current time
* copy -- create "deep copies" of python objects
* os -- common operating system specific commands
* pickle -- permanently store python objects in files
* multiprocessing -- parallel processing support
* json -- parsing JSON files
* urllib2 -- handling URLs and HTTP requests (in Python 3 this functionality lives in `urllib.request`)
* sys -- system specific parameters and functions
* re -- regular expressions

Click to see the [full list of packages](https://docs.python.org/2/library/index.html) in the standard library.
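
As a small taste of the modules listed above, the sketch below uses four of them together. The input values are made up purely for illustration.

```python
import datetime
import json
import math
import re

# math: functions beyond basic arithmetic
root = math.sqrt(81)                                # 9.0

# datetime: date arithmetic
start = datetime.date(2020, 1, 1)
later = start + datetime.timedelta(days=30)         # 2020-01-31

# json: parse a JSON string into Python objects
record = json.loads('{"store": 14, "open": true}')

# re: regular expressions
digits = re.findall(r"\d+", "aisle 7, shelf 12")    # ['7', '12']

print(root, later, record["store"], digits)
```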