# Introducing Libraries: NumPy

- onl01-dtsc-pt-041320
- 05/01/20

![libgif](https://media0.giphy.com/media/7E8lI6TkLrvvAcPXso/giphy.gif?cid=790b76115d360a95792e4333770609b8&rid=giphy.gif)

## Introduction/Overview

### Learning Objectives

#### _Our goals today are to be able to_: <br/>

- Identify and import Python modules and packages (libraries)
- Identify differences between NumPy and base Python in usage and operation
- Create a new module of our own

#### _Big questions for this lesson_: <br/>
- What is a package, what do packages do, and why might we want to use them?
- When do we want to use NumPy?

<!---### Activation:

![excel2](https://raw.githubusercontent.com/jirvingphd/dsc-lp-libraries-numpy/master/img/excelpic2.jpg)

Most people have used Microsoft Excel or Google sheets. But what are the limitations of excel?

- [Take a minute to read this article](https://www.bbc.com/news/magazine-22223190)
- make a list of problems excel presents

- **Q: How is using python different?**

- **A: Python...**
    -  
    --->

#### Pair Programming / Group Activity:


- Pair Programming in Breakout Rooms
    - To Fork, go to:
        - https://github.com/jirvingphd/dtsc-pt-041320-sect-04-python-libraries-study-group
    - To clone directly:
        - `git clone https://github.com/jirvingphd/dtsc-pt-041320-sect-04-python-libraries-study-group.git`


## Questions?

1. What is ‘Latin-1’, and what does it do when encoding?
2. What’s the difference between a pandas Series and a pandas DataFrame?
3. How do we identify that the DataFrame really has two table views, one on top of the other? 
    - Also going over the thought process behind cleaning up the data.



___

# Importing Python packages 

In an earlier lesson, we wrote a function to calculate the mean of a list. That was **tedious**. To make our code more efficient, we could store that function in a *Python module* and import it later whenever we need it. 

Thankfully, other people have _also_ written and optimized functions and wrapped them into **modules** and **packages** (also known as _libraries_).

![numpy](https://raw.githubusercontent.com/donnemartin/data-science-ipython-notebooks/master/images/numpy.png)

[NumPy](https://www.numpy.org/) is the fundamental package for scientific computing with Python. 



## Terminology

![mod2](https://raw.githubusercontent.com/jirvingphd/dsc-lp-libraries-numpy/master/img/modules2.png)

![packages3](https://raw.githubusercontent.com/jirvingphd/dsc-lp-libraries-numpy/master/img/packages3.png)

## pip & the Python Package Index

![python-fact](https://raw.githubusercontent.com/jirvingphd/dsc-lp-libraries-numpy/master/img/python_def.png)

![pypi](https://raw.githubusercontent.com/jirvingphd/dsc-lp-libraries-numpy/master/img/pypi_packages.png)

In [2]:
## One of my PyPI packages for Flatiron School data science (fsds)
!pip install -U fsds
from fsds.imports import *

fsds v0.2.8 loaded.  Read the docs: https://fs-ds.readthedocs.io/en/latest/ 


| Handle | Package | Description |
|--------|---------|-------------|
| dp | IPython.display | Display modules with helpful display and clearing commands. |
| fs | fsds | Custom data science bootcamp student package |
| mpl | matplotlib | Matplotlib's base OOP module with formatting artists |
| plt | matplotlib.pyplot | Matplotlib's matlab-like plotting module |
| np | numpy | scientific computing with Python |
| pd | pandas | High performance data structures and tools |
| sns | seaborn | High-level data visualization library based on matplotlib |


### You're not limited to PyPI

Make your own modules
![pipmod](https://raw.githubusercontent.com/jirvingphd/dsc-lp-libraries-numpy/master/img/import_modules.png)

![pippack](https://raw.githubusercontent.com/jirvingphd/dsc-lp-libraries-numpy/master/img/package_redo.png)

# The first library we will import is `NumPy`


![numpy](https://raw.githubusercontent.com/donnemartin/data-science-ipython-notebooks/master/images/numpy.png)

[NumPy](https://www.numpy.org/) is the fundamental package for scientific computing with Python. 


To import a package type `import` followed by the name of the package as shown below.

In [5]:
import numpy
numpy

<module 'numpy' from '//anaconda3/envs/learn-env/lib/python3.6/site-packages/numpy/__init__.py'>

In [6]:
import numpy # Look, ma! we're importing!
l = [1,2,3]
x=numpy.array([1,2,3])
print(x)

[1 2 3]


#### New type of object

In [7]:
type(x)

numpy.ndarray

#### Alias libraries

Many packages have a canonical way to import them with an abbreviated alias.

In [8]:
import numpy as np # np = alias


y=np.array([4,5,6])
print(y)

[4 5 6]


#### Other standard aliases 

In [9]:
import scipy
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

### Import specific modules from a larger package

In [10]:
# sometimes we will want to import a specific module from a package
# (these two lines are equivalent; you only need one of them)
import matplotlib.pyplot as plt
from matplotlib import pyplot as plt 
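
The import forms covered so far can be compared side by side. The sketch below uses the standard library's `math` module, chosen only because it is always installed; it is not part of the original lesson.

```python
import math              # whole module; names are accessed with the `math.` prefix
import math as m         # same module under a shorter alias
from math import sqrt    # a single name, imported directly into our namespace

# All three spellings call the same function
print(math.sqrt(16))
print(m.sqrt(16))
print(sqrt(16))

# An alias is just another name for the same module object
print(m is math)
```

The `from ... import ...` form saves typing, but it hides where a name came from, which is one reason the prefixed forms are common in data science notebooks.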

In [16]:
import test_folder.temperizer  as tp
tp

<module 'test_folder.temperizer' from '/Users/jamesirving/Documents/GitHub/_STUDY GROUP PREP/online-dtsc-pt-041320-cohort-notes/Mod 1/sect_04/test_folder/temperizer.py'>

What happens if we mess with naming conventions? For example, import one of our previous libraries as `print`.


**PLEASE NOTE THAT WE WILL HAVE TO RESET THE KERNEL AFTER RUNNING THIS.**<br> Comment out your code after running it.


In [12]:
# import seaborn as print

In [17]:
#Did we get an error? What about when we run the following command?

print(x)

#Restart your kernel and clear cells

[1 2 3]
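
As a sketch of what goes wrong when an alias shadows a built-in, the snippet below imports the standard library's `math` module as `print` (a stand-in chosen for illustration; the lesson itself uses `seaborn`) and then restores the built-in through the `builtins` module. Restarting the kernel, as instructed above, remains the simplest fix.

```python
import builtins
import math as print   # deliberately shadow the built-in name `print`

try:
    print("hello")      # `print` is now a module, so calling it fails
    shadow_error = None
except TypeError as err:
    shadow_error = str(err)

# Point the name back at the real built-in function
print = builtins.print
print("Calling the shadowed name failed with:", shadow_error)
```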


#### Helpful links: package documentation

Packages have associated documentation to explain how to use the different tools included in a package.

_Sample of libraries_
- [NumPy](https://docs.scipy.org/doc/numpy/)
- [SciPy](https://docs.scipy.org/doc/scipy/reference/)
- [Pandas](http://pandas.pydata.org/pandas-docs/stable/)
- [Matplotlib](https://matplotlib.org/contents.html)

## NumPy versus base Python

Now that we know packages exist, why do we want to use them? Let's compare base Python with NumPy.

Base Python has lists and can do basic math on individual numbers. NumPy adds a helpful object called the **array**, which supports math on whole collections of numbers at once.

NumPy has a few advantages over base Python, which we will look at in turn.

### NumPy makes math easy

With NumPy, we can quickly compute the **mean** and other summary statistics of lists and arrays.

In [18]:
example = [4,3,25,40,62,20]
print(np.mean(example))

25.666666666666668
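
For comparison, here is a minimal sketch of the same mean computed by hand in base Python, next to a few other NumPy summary functions. The variable names are illustrative only.

```python
import numpy as np

example = [4, 3, 25, 40, 62, 20]

# Base Python: add everything up and divide by the count
manual_mean = sum(example) / len(example)

# NumPy: one call each, and each accepts a plain list
numpy_mean = np.mean(example)
numpy_median = np.median(example)
numpy_std = np.std(example)

print("manual mean:", manual_mean)
print("numpy mean:", numpy_mean)
print("median:", numpy_median)
print("standard deviation:", numpy_std)
```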


#### Different types of arrays

In [19]:
names_list=['Bob','John','Sally']
names_array=numpy.char.array(['Bob','John','Sally']) # numpy.array also accepts strings; numpy.char.array returns a chararray, which adds vectorized string methods
print(names_list)
print(names_array)

['Bob', 'John', 'Sally']
['Bob' 'John' 'Sally']


In [20]:
display(names_list)
display(names_array)

['Bob', 'John', 'Sally']

chararray(['Bob', 'John', 'Sally'], dtype='<U5')

#### Array math in action

In [21]:
# Make a list and an array of four numbers

#your code here
numbers_list = [5,22,33,90]
numbers_array = np.array([5,22,33,90])

In [22]:
# divide your array by 2

numbers_array/2

array([ 2.5, 11. , 16.5, 45. ])

In [23]:
# divide your list by 2

numbers_list/2

TypeError: unsupported operand type(s) for /: 'list' and 'int'

NumPy arrays support the `/` (division) operator and apply it to every element, while Python lists do not. Other features also make NumPy more useful than base Python for working with data.
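
To get the same result from a plain list, we have to loop over the elements ourselves. A minimal sketch, reusing the numbers from the cells above:

```python
import numpy as np

numbers_list = [5, 22, 33, 90]
numbers_array = np.array(numbers_list)

# Base Python: divide each element explicitly with a list comprehension
halved_list = [n / 2 for n in numbers_list]

# NumPy: the operator is applied to every element automatically
halved_array = numbers_array / 2

print(halved_list)
print(halved_array)
```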

In [24]:
# shape tells us the length of the array along each dimension

numbers_array.shape

(4,)
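
`shape` is easiest to read alongside two related attributes, `ndim` and `size`. The sketch below compares a 1-D and a 2-D array; the `grid` array is made up for illustration.

```python
import numpy as np

numbers_array = np.array([5, 22, 33, 90])
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# shape: length along each dimension; ndim: number of dimensions;
# size: total number of elements
print(numbers_array.shape, numbers_array.ndim, numbers_array.size)
print(grid.shape, grid.ndim, grid.size)
```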

### Numpy array and matrix creation functions

Take 5 minutes and explore each of the following functions.  What does each one do?  What is the syntax of each?
- `np.zeros()`
- `np.ones()`
- `np.full()`
- `np.eye()`
- `np.random.random()`

In [25]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [26]:
np.ones(5)

array([1., 1., 1., 1., 1.])

In [27]:
np.full((3,3),3.3)


array([[3.3, 3.3, 3.3],
       [3.3, 3.3, 3.3],
       [3.3, 3.3, 3.3]])

In [28]:
np.eye(6)

array([[1., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 1.]])

In [29]:
np.random.random(6)

array([0.30997128, 0.18943579, 0.55670047, 0.35061903, 0.28778125,
       0.66735802])
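
A few variations worth trying, as a sketch: the creation functions accept a tuple for multi-dimensional shapes and a `dtype` argument, and seeding the random generator makes the "random" values repeatable. The seed value 42 is arbitrary.

```python
import numpy as np

# A tuple gives a 2-D shape: 2 rows, 3 columns
zeros_2d = np.zeros((2, 3))

# dtype controls the element type (the default is float)
ones_int = np.ones(4, dtype=int)

# Seeding before each call reproduces the same random numbers
np.random.seed(42)
first_draw = np.random.random(3)
np.random.seed(42)
second_draw = np.random.random(3)

print(zeros_2d)
print(ones_int)
print(np.array_equal(first_draw, second_draw))
```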

### Slicing in NumPy

In [31]:
# We remember slicing from lists
numbers_list = list(range(10))
numbers_list

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [32]:
numbers_list[3:7]

[3, 4, 5, 6]

In [37]:
# Slicing in NumPy Arrays is very similar!
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(a.shape)
a

(3, 4)


array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [38]:
a.shape

(3, 4)

In [39]:
a[:,2]

array([ 3,  7, 11])

In [None]:
# first 2 rows, columns 1 & 2 (remember 0-index!)
b = a[:2, 1:3]
b
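
One difference from lists is worth knowing: a slice of a NumPy array is a *view* of the original data, not a copy, so changing the slice also changes the original. A minimal sketch using the same array:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# b is a view onto rows 0-1, columns 1-2 of a
b = a[:2, 1:3]
b[0, 0] = 99
print(a[0, 1])   # the original array sees the change

# Use .copy() when an independent array is needed
c = a[:2, 1:3].copy()
c[0, 0] = -1
print(a[0, 1])   # unchanged by the edit to c
```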

### Datatypes in NumPy


In [40]:
a.dtype

dtype('int64')

In [41]:
names_list.dtype

AttributeError: 'list' object has no attribute 'dtype'

In [None]:
a.astype(np.float64)
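
Note that `astype` returns a new array and leaves the original untouched. The sketch below also shows that converting floats to integers drops the fractional part rather than rounding; the sample values are made up for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# astype builds a converted copy; a keeps its integer dtype
a_float = a.astype(np.float64)
print(a.dtype, a_float.dtype)

# Float-to-int conversion truncates toward zero
truncated = np.array([1.9, -1.9]).astype(np.int64)
print(truncated)
```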

### More Array Math 
#### Adding matrices 

In [42]:
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

# Elementwise sum; both produce the array
# [[ 6.0  8.0]
#  [10.0 12.0]]
print(x + y)

[[ 6.  8.]
 [10. 12.]]


In [43]:
print(np.add(x, y))

[[ 6.  8.]
 [10. 12.]]


#### Subtracting matrices 

In [44]:
# Elementwise difference; both produce the array
# [[-4.0 -4.0]
#  [-4.0 -4.0]]
print(x - y)

[[-4. -4.]
 [-4. -4.]]


In [45]:
print(np.subtract(x, y))

[[-4. -4.]
 [-4. -4.]]


#### Multiplying matrices 

In [46]:
# Elementwise product; both produce the array
# [[ 5.0 12.0]
#  [21.0 32.0]]
print(x * y)

[[ 5. 12.]
 [21. 32.]]


In [47]:
print(np.multiply(x, y))

[[ 5. 12.]
 [21. 32.]]
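
Keep in mind that `*` and `np.multiply` multiply matching elements; they do not perform matrix multiplication in the linear algebra sense. For that, NumPy provides the `@` operator (equivalent to `np.matmul`). A short sketch with the same `x` and `y`:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

# Elementwise: each entry is x[i, j] * y[i, j]
elementwise = x * y

# Matrix product: rows of x combined with columns of y
matrix_product = x @ y

print(elementwise)
print(matrix_product)
```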


#### Dividing matrices 

In [48]:
# Elementwise division; both produce the array
# [[ 0.2         0.33333333]
#  [ 0.42857143  0.5       ]]
print(x / y)

[[0.2        0.33333333]
 [0.42857143 0.5       ]]


In [49]:
print(np.divide(x, y))

[[0.2        0.33333333]
 [0.42857143 0.5       ]]


#### Raising matrices to powers 

In [50]:
# Elementwise square root; both produce the same array
# [[ 1.          1.41421356]
#  [ 1.73205081  2.        ]]
print(x ** (1/2))

[[1.         1.41421356]
 [1.73205081 2.        ]]


In [51]:
print(np.sqrt(x))

[[1.         1.41421356]
 [1.73205081 2.        ]]


### NumPy is faster

Below is a piece of code we will use to compare the speed of operations on a list with operations on an array. In this speed test, we will use the standard-library module [time](https://docs.python.org/3/library/time.html).

In [52]:
import time
import numpy as np

size_of_vec = 1000

def pure_python_version():
    t1 = time.time()
    X = range(size_of_vec)
    Y = range(size_of_vec)
    Z = [X[i] + Y[i] for i in range(len(X))]
    return time.time() - t1

def numpy_version():
    t1 = time.time()
    X = np.arange(size_of_vec)
    Y = np.arange(size_of_vec)
    Z = X + Y
    return time.time() - t1


t1 = pure_python_version()
t2 = numpy_version()
print("python: " + str(t1), "numpy: "+ str(t2))
print("Numpy is in this example " + str(t1/t2) + " times faster!")

python: 0.0004429817199707031 numpy: 3.0994415283203125e-05
Numpy is in this example 14.292307692307693 times faster!
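
A single `time.time()` measurement of such a short operation is noisy, so the exact ratio will vary from run to run. As an alternative sketch (not the lesson's original method), the standard-library `timeit` module repeats each operation many times; the function names and the repeat count of 1000 are illustrative choices.

```python
import timeit
import numpy as np

size_of_vec = 1000

# Build the inputs once so only the addition itself is timed
x_list = list(range(size_of_vec))
y_list = list(range(size_of_vec))
x_arr = np.arange(size_of_vec)
y_arr = np.arange(size_of_vec)

def add_lists():
    return [a + b for a, b in zip(x_list, y_list)]

def add_arrays():
    return x_arr + y_arr

# Total seconds for 1000 repetitions of each version
python_time = timeit.timeit(add_lists, number=1000)
numpy_time = timeit.timeit(add_arrays, number=1000)

print("python:", python_time, "numpy:", numpy_time)
print("ratio (python / numpy):", python_time / numpy_time)
```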


___

# ACTIVITY: Pair Programming 

- The remainder of this notebook is a Pair Programming Activity (about 15-30 minutes).

    - We will break into groups of 2-3 students. 
    - One student will share their screen while the group works together to solve the tasks. 

    - Clone the activity repository to your computer (or [visit GitHub and fork it](https://github.com/jirvingphd/dtsc-pt-041320-sect-04-python-libraries-study-group) first so you can save your work):
        - `git clone https://github.com/jirvingphd/dtsc-pt-041320-sect-04-python-libraries-study-group.git`

## Making our own module

![modlife](https://media1.giphy.com/media/dW0KhIROCaAdCO0V3S/giphy.gif?cid=790b76115d36096678416c65519d8082&rid=giphy.gif)

In [53]:
# this option will re-import your module each time you save an update to it

%load_ext autoreload
%autoreload 2

In [54]:
import temperizer as tp

### Example: Convert F to C

1. This function is already implemented in `temperizer.py`.
2. Notice that we can call the imported function and see the result.

In [None]:
# 32F should equal 0C
tp.convert_f_to_c(32)

In [None]:
# -40F should equal -40C
tp.convert_f_to_c(-40)

In [None]:
# 212F should equal 100C
tp.convert_f_to_c(212)
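
The contents of `temperizer.py` are not reproduced in this notebook, so the following is only a sketch of what such a function might look like, based on the standard conversion formula C = (F - 32) × 5/9. The parameter name `temperature_f` follows the stub shown later in this notebook.

```python
def convert_f_to_c(temperature_f):
    """Convert Fahrenheit to Celsius."""
    # Hypothetical implementation; the real module may differ
    return (temperature_f - 32) * 5 / 9

# The same three checks as the cells above
print(convert_f_to_c(32))
print(convert_f_to_c(-40))
print(convert_f_to_c(212))
```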

### Your turn: Convert C to F

1. Find the stub function in `temperizer.py`
2. The word `pass` means "this space intentionally left blank."
3. Add your code _in place of_ the `pass` keyword, _below_ the docstring.
4. Run these cells and make sure that your code works.

In [None]:
# 0C should equal 32F
tp.convert_c_to_f(0)

In [None]:
# -40C should equal -40F
tp.convert_c_to_f(-40)

In [None]:
# 100C should equal 212F
tp.convert_c_to_f(100)

## Next: Adding New Functions

You need to add support for Kelvin to the `temperizer` library.

1. Create new _stub functions_ in `temperizer.py`:

    * `convert_c_to_k`
    * `convert_f_to_k`
    * `convert_k_to_c`
    * `convert_k_to_f`

    Start each function with a docstring and the `pass` keyword, e.g.:

    ```python
    def convert_f_to_k(temperature_f):
        """Convert Fahrenheit to Kelvin."""
        pass
    ```

2. Add cells to this notebook to test and validate these functions, similar to the ones above.

3. Then, go back to `temperizer.py` to replace `pass` with your code.

4. Run the notebook cells to make sure that your new functions work.

#### Small note:

Docstrings (those lines with `""" """` on either side) allow us to create self-documented code. 

Later, if you forget what a function does, you can use IPython's `?` syntax or the built-in `help()` function, the same way you would with any other function, and your documentation will show up!

In [None]:
help(tp.convert_f_to_c)
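
To see how a docstring becomes documentation, here is a small self-contained sketch. The function `is_freezing` is hypothetical and is not part of `temperizer.py`.

```python
def is_freezing(temperature_c):
    """Return True if a Celsius temperature is at or below freezing."""
    return temperature_c <= 0

# The docstring is stored on the function object itself...
print(is_freezing.__doc__)

# ...and help() formats it together with the function signature
help(is_freezing)
```

In a Jupyter notebook, typing `is_freezing?` in a cell shows the same text in a pop-up pane.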

### Extra credit:

Make a function in your temperizer module that takes a temperature in F and prints out:

```
The temperature [number] F is:
    - x in C
    - y in K
```

In [None]:
tp.convert_f_to_all(89)

## Congrats!!

You've now made your own module of temperature conversion functions!

#### _Our goals today were to be able to_: <br/>

- Identify and import Python modules and packages (libraries)
- Identify differences between NumPy and base Python in usage and operation
- Create a new module of our own