# Python Training:
## <span style="color:darkblue">A Short Guide to Using Classes in Python: continued</span>
#### <span style="color:lightgray">Tristen Wentling</span>

This notebook is a continuation intended to provide more examples of using classes in Python. It follows the notebooks *A Brief Introduction to Classes and Object Oriented Programming in Python* and *A Short Guide to Using Classes in Python*. For further and more detailed information on classes and object-oriented programming, refer to [Learn Python the Hard Way](https://learnpythonthehardway.org/book/ex41.html), [Think Python 2e](http://greenteapress.com/thinkpython2/html/thinkpython2016.html) (full pdf available), [the official documentation](https://docs.python.org/3/tutorial/classes.html), or [Tutorialspoint](https://www.tutorialspoint.com/python/python_classes_objects.htm).

Topics included are:
* Using classes as modules
* Basics of multiprocessing and multithreading        

# Using classes as modules

## <span style="color:green">Using modules to clean up your work</span>
In part one we walked through the process of building a model to design a class and built up each of its components until ultimately we had a nice grocery store class with a lot of built-in functionality. We made classes for products, employees, and the grocery store itself, having decided that facilities information could be left as attributes of the grocery store class.

|          | **Grocery Store** |            |
|:--------:|:-----------------:|:----------:|
| Products | Employees         | Facilities |

As part of that process we also built up some subclasses for the Products class to provide different features for some different product types: produce, dry goods and dairy items.

|         |**Products**|          |
|:-------:|:----------:|:--------:|
|Produce  |Dairy       |Dry Goods |   

There are some benefits to using a notebook as you develop a class. For instance, it is nice to have all of the parts laid out where you can see them and modify them as you fine-tune your class methods and tweak how the different components work together. Eventually, however, once you have finished building in all of the functionality you're looking for, it becomes much neater to take advantage of how portable classes are.

One of the best reasons to spend the time and effort to build classes and write functions is modularity. Most of the time we want to build things that we can use more than once, so instead of leaving all of this work in a notebook we can extract it into a module that we can import into new programs.

We do this by saving the definitions as a Python script, from which we can then import everything we need. The details stay out of sight and we are left with the same kind of functionality we are used to from other Python classes and modules, like lists or pandas dataframes.

This is also a good reason to include docstrings with your definitions so that after importing you can make reference to this information.
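
As a minimal sketch of why docstrings pay off after an import, the example below reads them back with the standard library's `inspect.getdoc`. The `Shelf` class is hypothetical and is not part of the grocery module; it only stands in for any class you might import.

```python
import inspect


class Shelf:
    """A hypothetical shelf that holds product names."""

    def __init__(self):
        self.items = []

    def stock(self, name):
        """Add one product name to the shelf and return the new count."""
        self.items.append(name)
        return len(self.items)


# The docstrings travel with the class, so they are still available
# wherever the class is imported.
class_doc = inspect.getdoc(Shelf)
method_doc = inspect.getdoc(Shelf.stock)
print(class_doc)
print(method_doc)
```

In a notebook, `help(Shelf)` or `Shelf.stock?` displays the same text.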

Let's look at how we can use our GroceryStore class as an imported module.

### <span style="color:green">The grocery module</span>
Take all of the class definitions and save them into a python script file, ***grocery.py***. Then while working in the same directory, you can import grocery just like you would any other module.
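
If you do not have the definitions from part one at hand, the sketch below shows roughly what such a file could look like. It is a hypothetical, stripped-down stand-in and not the class built in part one: the attribute names, the storage layout, and the meaning assigned to each argument of `add_inventory` are all assumptions made so that the calls used in this notebook have something to run against.

```python
"""grocery.py: a hypothetical stand-in for the module described in this notebook."""


class GroceryStore:
    """Keeps track of employees and inventory for one store (assumed design)."""

    def __init__(self):
        self.employees = {}   # assumed layout: employee id -> name
        self.inventory = {}   # assumed layout: product id -> record tuple

    def add_employee(self, employee_id, name):
        """Record a new employee under their id."""
        self.employees[employee_id] = name

    def add_inventory(self, category, name, product_id, price, quantity):
        """Record a product; the argument meanings are guesses from the mock data."""
        self.inventory[product_id] = (category, name, price, quantity)

    def get_employee_names(self):
        """Return the employee names ordered by id."""
        return [self.employees[key] for key in sorted(self.employees)]

    def get_inventory(self):
        """Return a copy of the inventory records."""
        return dict(self.inventory)


if __name__ == "__main__":
    # This part runs only when the file is executed directly,
    # not when it is imported as a module.
    store = GroceryStore()
    store.add_employee(3, "Jessyca")
    print(store.get_employee_names())
```

The `if __name__ == "__main__":` guard is a common way to keep quick checks in a module file without having them run on every import.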

In [None]:
from grocery import GroceryStore

In [None]:
Publix = GroceryStore()

# Mock csv data
hired_employees = [[3,  'Jessyca'],
                   [4,  'Melissa'],
                   [5,  'Michelle'],
                   [6,  'Beverly'],
                   [7,  'Constance'],
                   [8,  'Jerry'],
                   [9,  'Garry'],
                   [10, 'Terry']]

shipping_manifest = [['dairy', 'cheese', 3, 1.99, 63],
                     ['dairy', 'cream', 4, 3.19, 29],
                     ['dairy', 'butter', 5, 5.14, 74],
                     ['dry goods', 'cereal', 6, 2.96, 515],
                     ['dry goods', 'flour', 7, 1.59, 95],
                     ['produce', 'celery', 8, 1.79, 54],
                     ['produce', 'carrots', 9, 2.46, 112],
                     ['produce', 'mangoes', 10, 1.82, 42]]

# Add records to the Publix store like we did before
for employee, shipment in zip(hired_employees, shipping_manifest):
    Publix.add_employee(*employee)    # unpacks the two values in each employee row
    Publix.add_inventory(*shipment)   # unpacks the five values in each manifest row

In [None]:
Publix.get_employee_names()

In [None]:
Publix.get_inventory()

So by wrapping it all together as a module we get all of the functionality we built into our class, but we can now build the Publix store, with data, with just a couple lines of code.

Next we're going to talk briefly about multiprocessing and multithreading and show how classes are often used to take advantage of the utility of multithreading.

# Multiprocessing and Multithreading

## <span style="color:green">Multiprocessing</span>

Python does have one significant drawback, called the global interpreter lock (GIL). This is a well-documented issue (if you want to know more about it, just search for "python GIL") that basically means only one thread at a time can execute Python bytecode in the standard CPython interpreter, which limits how much a single process can gain from multiple CPU cores. While some libraries, like numpy, sidestep this issue by doing their heavy computation in compiled code, it is generally not a trivial task to do on your own. When you have many datasets or processes to run, this limitation can slow you down, but there are ways you can make your code more efficient.

The difference between the terms multiprocessing and multithreading is not immediately evident, but the distinction is important because the two have different purposes. Multiprocessing runs work in several separate Python processes, each with its own interpreter and its own GIL, so it is the usual choice when you want to parallelize computation-heavy functionality across CPU cores. Multithreading keeps everything in one process and is best suited to tasks that spend most of their time waiting, such as reading from or writing to several large files; those operations are slowed down less by the interpreter lock than by read and write speeds.

The good news is that the standard library's `concurrent.futures` module ([see the docs](https://docs.python.org/3/library/concurrent.futures.html)) provides process pools that can be leveraged to make this faster. Though this is not our focus, we can look briefly at how it might be used. One particularly helpful application is creating and processing several numpy arrays or pandas dataframes. Results vary with the particular task, but often you can gain significant improvements in performance.

First we'll create some fake files to process and a function we want to run on each fake file.

In [None]:
# Create some fake files
set_a = ['x', 1, 2, 3, 'a', 'b', 'c', '1s2']
set_b = ['a', 'tt','pl', '1', 17, 23, 'k', 'l']
set_c = [9, '9', '1', 1, '3', 3, 5, '7']

# Write a function that you want to execute on each list
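def tester(a_list):
    """Return, for each item, whether it is an alphabetic string."""
    results = list()
    for item in a_list:
        try:
            results.append(item.isalpha())
        except AttributeError:
            # Non-string items (e.g. integers) have no isalpha() method
            results.append(False)
    return results

# Repetition factor: switch to a larger value to grow the fake files
r = 1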
#r = 10
#r = 100
#r = 1000
#r = 10000
#r = 100000
#r = 1000000

#  Modification for second test
set_a = set_a * r
set_b = set_b * r
set_c = set_c * r

# Store the fake files together
filenames = [set_a, set_b, set_c]

In [None]:
# import the multiprocessing tool
import concurrent.futures as fut

Now we'll time the tester function two ways using the `%%timeit` cell magic: first by iterating through the list in a plain loop, then by mapping it over a process pool.

(note: 1 s = 1000 ms = 1,000,000 µs)
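
Outside a notebook the `%%timeit` magic is not available, but the standard library's `timeit` module does a similar job. The sketch below times a plain loop over a few small lists. The names `count_alpha`, `sample_lists`, and `run_sequentially` are hypothetical stand-ins for the function and data in this notebook, and the repeat counts are arbitrary.

```python
import timeit


def count_alpha(items):
    """Count the items that are purely alphabetic strings."""
    return sum(1 for item in items if isinstance(item, str) and item.isalpha())


sample_lists = [['x', 1, 2, 'a'], ['tt', '1', 17], [9, '9', 'k']]


def run_sequentially():
    """Apply count_alpha to every list, one after another."""
    return [count_alpha(items) for items in sample_lists]


# repeat() returns one total time in seconds per round of `number` calls;
# the minimum is usually the least noisy figure.
timings = timeit.repeat(run_sequentially, number=1000, repeat=5)
best_per_call = min(timings) / 1000
print(f"best: {best_per_call * 1e6:.2f} microseconds per call")
```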

In [None]:
%%timeit
beginner = list()
for i in filenames:
    beginner.append(tester(i))

In [None]:
%%timeit
with fut.ProcessPoolExecutor() as executor:
    # map() returns a lazy iterator, so list() collects the results
    intermediary = list(executor.map(tester, filenames))

What happened? Didn't I just tell you multiprocessing would be faster? It has to do with the overhead of each method: starting the worker processes and sending the data to them costs more than a built-in loop. Try uncommenting the larger values of `r` above and rerunning the cells. You should see that the running time of the iteration method grows at a faster rate than that of the pool method, even though iteration is faster for small data. These differences become more pronounced with actual file input and output operations.

<a href="https://drive.google.com/a/nielsen.com/file/d/0B0jAkOUhQ4j6cVRENmZhUk5CS2M/view?usp=sharing"> Graph </a>
![Comparison Plot](IterVsPlot.png)

## <span style="color:green">Multithreading</span>

Now for our real purpose: multithreading. Multithreading is a way to split up the work of a process into **threads**. These threads can be run in order, can wait for a trigger event before executing, or can switch back and forth from one thread to another, along with many other features. At a basic level, we can explore how we use classes to create new threads and use their methods to make them work together.

A great deal of information is available to learn more about multithreading with Python. A few good references are:
* [The official documentation](https://docs.python.org/3/library/threading.html)
* [Tutorialspoint](https://www.tutorialspoint.com/python/python_multithreading.htm)
* [Python Module of the Week](https://pymotw.com/3/threading/)
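
To illustrate a thread that waits for a trigger event, here is a minimal sketch using `threading.Event` from the standard library. The names `ready`, `log`, and `waiter` are made up for this example.

```python
import threading

ready = threading.Event()   # the trigger
log = []                    # records the order in which things happen


def waiter():
    """Block until the event is set, then record that we ran."""
    ready.wait()
    log.append("waiter ran")


thread = threading.Thread(target=waiter)
thread.start()

log.append("main thread sets the trigger")
ready.set()        # wakes up every thread waiting on the event
thread.join()      # wait for the waiter thread to finish

print(log)
```

Because the waiter cannot get past `ready.wait()` until `set()` is called, the main thread's entry always lands in `log` first.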

For our first step we'll import the library and use it to define a new class. We'll call the parent class's initializer and set some attributes, as well as create a quick method that each thread will run.

In [None]:
from threading import Thread
import time

exitFlag = 0

class myThread(Thread):
    """We want to create some basic threads to do some basic stuff"""

    def __init__(self, ID, counter=5, delay=5):
        """How we're initializing our threads"""
        super().__init__()
        self.threadID = ID
        self.name = "Thread " + str(ID)
        self.counter = counter
        self.delay = delay

    def run(self):
        """What we want each thread to do when it runs"""
        self.print_time()

    def print_time(self):
        """Sleeps for the delay, prints a timestamp, and repeats until the counter runs out"""
        while self.counter:
            if exitFlag:
                # Thread objects have no exit() method; returning ends the thread
                return
            time.sleep(self.delay)
            print(self.name + " ran at " + time.ctime() + "\n")
            self.counter -= 1

Now we'll create some threads from our extension of the class definition and run them.

In [None]:
# Create new threads
thread1 = myThread(ID=1, delay=5)
thread2 = myThread(ID=2, delay=2)
thread3 = myThread(ID=3, delay=3)
# Start new Threads
thread1.start()
thread2.start()
thread3.start()

So it looks like we have successfully created some threads using our extension of the original Thread definition.  Using these threads can be especially helpful if you want some processes to wait on some other process, like having one thread update a file before the other pulls the data from the file. 
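
The file scenario just described can be sketched with `Thread.join()`, which blocks until another thread has finished. This is only an illustration under assumed names: `write_prices`, `read_prices`, and the throwaway file `prices.txt` are hypothetical.

```python
import os
import tempfile
from threading import Thread

path = os.path.join(tempfile.mkdtemp(), "prices.txt")   # throwaway file
results = []


def write_prices():
    """Writer thread: save a few prices, one per line."""
    with open(path, "w") as handle:
        handle.write("1.99\n3.19\n5.14\n")


def read_prices(wait_for):
    """Reader thread: wait for the writer, then load the prices as floats."""
    wait_for.join()   # block until the writer thread has finished
    with open(path) as handle:
        results.extend(float(line) for line in handle)


writer = Thread(target=write_prices)
reader = Thread(target=read_prices, args=(writer,))

writer.start()     # a thread must be started before another can join it
reader.start()
reader.join()      # the main thread waits for the reader

print(results)
```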

As far as parallelization goes, threading can still suffer from the GIL problem, but there are also other, newer libraries that can bypass this problem to a greater extent. The `concurrent.futures` library we took `ProcessPoolExecutor` from earlier has a similar pool for threads, `ThreadPoolExecutor`. For more info on pool-based parallelism see [this blog post](http://chriskiehl.com/article/parallelism-in-one-line/).
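
As a rough sketch of a thread pool, the example below uses `concurrent.futures.ThreadPoolExecutor` on a task that mostly waits. The function `fake_download` is hypothetical and simply sleeps to imitate waiting on a file or network; the delay and the worker count are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(name, delay=0.2):
    """Pretend to wait on a slow resource, then return a result."""
    time.sleep(delay)   # a sleeping thread releases the GIL, much like a real I/O wait
    return name.upper()


names = ["dairy", "produce", "dry goods"]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=3) as executor:
    # map() hands back the results in the same order as the inputs
    results = list(executor.map(fake_download, names))
elapsed = time.perf_counter() - start

print(results)
print(f"{elapsed:.2f} s for three 0.2 s waits")
```

Since the three waits overlap, the elapsed time should come out much closer to a single delay than to the sum of all three, though the exact figure depends on the machine.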

In all there are a great many tools available to us, and while classes may seem rather involved 