# Module 1 - Introducing Libraries: NumPy

### Introduction

#### _Our goals today are to be able to_: <br/>

- Identify and import Python libraries
- Identify differences between NumPy and base Python in usage and operation

#### _Big questions for this lesson_: <br/>
- What is a package, what do packages do, and why might we want to use them?
- When do we want to use NumPy?

### Activation:

![excel](excelpic.jpg)

Most people have used Microsoft Excel or Google Sheets. But what are the limitations of Excel?

- [Take a minute to read this article](https://www.bbc.com/news/magazine-22223190)
- Make a list of the problems Excel presents.

How is using Python different?

### 1. Importing Python Libraries




![numpy](https://raw.githubusercontent.com/donnemartin/data-science-ipython-notebooks/master/images/numpy.png)

[NumPy](https://www.numpy.org/) is the fundamental package for scientific computing with Python. 


To import a package, type `import` followed by the name of the library, as shown below.

In [None]:
import numpy as np # Many packages have a canonical way to import them
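The `import` statement has a few common forms. The sketch below uses the standard-library `math` module (an illustrative choice, not part of the original lesson) so that it runs without installing anything extra.

```python
# Form 1: import the whole module and refer to it by its full name
import math
print(math.sqrt(16))  # 4.0

# Form 2: import the module under a shorter alias (the same pattern as "import numpy as np")
import math as m
print(m.pi)

# Form 3: import a single name directly into the current namespace
from math import sqrt
print(sqrt(25))  # 5.0
```

The alias form is what the scientific Python community uses for NumPy (`np`), pandas (`pd`) and Matplotlib's `pyplot` (`plt`), which keeps code short while still showing where each function comes from.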

Now let's import some other packages. We will cover NumPy's features in more detail later.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
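# Build a dictionary of NumPy arrays to plot
data = {'a': np.arange(50),                    # 0, 1, ..., 49
        'c': np.random.randint(0, 50, 50),     # random integers, used for marker color
        'd': np.random.randn(50)}              # standard normal draws, used for marker size
data['b'] = data['a'] + 10 * np.random.randn(50)  # 'a' plus random noise
data['d'] = np.abs(data['d']) * 100            # make the sizes positive and visible

# With data=..., string arguments are looked up as keys in the dictionary
plt.scatter('a', 'b', c='c', s='d', data=data)
plt.xlabel('entry a')
plt.ylabel('entry b')
plt.show()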

Try importing the [seaborn](https://seaborn.pydata.org/) library as `sns`, which is the convention.

In [None]:
#type your code here


What happens if we mess with naming conventions? For example, import one of our previous libraries as `print`.


**Please note:** we will have to restart the kernel after running this. Comment out your code after running it.


In [None]:
# your code here.


In [None]:
# Try using the print function to print the string.
string = 'Hello'

In [None]:
print(string)
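If you would rather not restart the kernel after the naming exercise above, one alternative (not part of the original lesson) is to delete the name that shadows the built-in. This sketch binds the standard-library `math` module to `print` to show the failure, then restores normal behavior with `del`.

```python
import builtins

# Shadow the built-in print by binding a module to the same name
import math as print

try:
    print("Hello")  # print is now the math module, which cannot be called
except TypeError as err:
    builtins.print("Calling print failed:", err)

# Remove the shadowing name; Python falls back to the built-in function again
del print
print("Hello")
```

This works because Python looks up a name in the current namespace first and only then in `builtins`, so removing the module-level `print` exposes the original function again.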

#### Helpful links: library documentation

Libraries have associated documentation to explain how to use the different tools included in a library.

- [NumPy](https://docs.scipy.org/doc/numpy/)
- [StatsModels](https://www.statsmodels.org/stable/index.html)
- [SciPy](https://docs.scipy.org/doc/scipy/reference/)
- [Pandas](http://pandas.pydata.org/pandas-docs/stable/)
- [Matplotlib](https://matplotlib.org/contents.html)

### 2. NumPy versus base Python

Now that we know libraries exist, why would we want to use them? Let's compare base Python with NumPy.

Base Python has lists and can do basic arithmetic. NumPy adds a powerful object called an array.

NumPy has several advantages over base Python, which we will look at below.

### What is an array?
- A table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers.
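A short sketch of the two properties in that definition: every element shares one type, and each element is located by a tuple of integer indices starting at 0. The array values here are illustrative.

```python
import numpy as np

# Mixing ints with one float: NumPy converts every element to a single common type
arr = np.array([[1, 2, 3], [4, 5.5, 6]])
print(arr.dtype)    # float64

# Elements are located by a tuple of non-negative integer indices (row, column)
print(arr[1, 2])    # 6.0
print(arr[(0, 1)])  # 2.0, the same kind of index with the tuple written explicitly
```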

### What does an array look like?

![numpy_array_t.png](attachment:numpy_array_t.png)


### Important array attributes

In [None]:
y = np.array([[5.2,3.0,4.5],[9.1,0.1,0.3]])  # two-dimensional array: 2 rows, 3 columns
y_3 = np.array([[[1,4,7],[2,9,7],[1,3,0],[9,6,9]]])  # three-dimensional array of shape (1, 4, 3)
y_3.shape

In [None]:
# Create z, a NumPy array of shape (4, 3, 2).

***ndarray.ndim***  
The number of axes (dimensions) of the array.

In [None]:
y.ndim

***ndarray.shape***  
The dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the number of axes, ndim.  

In [None]:
y.shape

***ndarray.size***    
The total number of elements of the array. This is equal to the product of the elements of shape.

In [None]:
y.size
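The attributes above are tied together: `ndim` is the length of `shape`, and `size` is the product of the entries of `shape`. This sketch checks both relationships on the three-dimensional `y_3` array from earlier; it uses `math.prod`, which assumes Python 3.8 or newer.

```python
import math
import numpy as np

# The three-dimensional example array from earlier in the lesson
y_3 = np.array([[[1, 4, 7], [2, 9, 7], [1, 3, 0], [9, 6, 9]]])

print(y_3.shape)                         # (1, 4, 3)
print(y_3.ndim == len(y_3.shape))        # True: ndim is the length of shape
print(y_3.size == math.prod(y_3.shape))  # True: size is 1 * 4 * 3 = 12
```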

***ndarray.dtype***  
An object describing the type of the elements in the array.  

In [None]:
y.dtype

In [None]:
y.astype(int)
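`astype` returns a new array of the requested type rather than changing the original. When converting floats to integers, NumPy drops the fractional part (truncation toward zero), which is different from rounding. A minimal sketch using the `y` array from above:

```python
import numpy as np

y = np.array([[5.2, 3.0, 4.5], [9.1, 0.1, 0.3]])
y_int = y.astype(int)   # a new array; the fractional parts are dropped

print(y_int)            # rows [5, 3, 4] and [9, 0, 0]
print(y.dtype)          # float64: the original array is unchanged

# Truncation goes toward zero, so negative values are not rounded down
print(np.array([-1.7, 1.7]).astype(int))  # values -1 and 1
```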

### Computing the mean of a list

In [None]:
list_1 = [1, 2, 3, 4, 5]
# Your code here: compute the mean without NumPy or sum(); only len() and range() are allowed.
  
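One possible answer, as a sketch: loop over the indices with `range(len(...))`, accumulate a running total, and divide by the number of elements. The function name `mean_of_list` is hypothetical.

```python
def mean_of_list(values):
    """Average of a non-empty list, computed with only len() and range()."""
    total = 0
    for i in range(len(values)):
        total += values[i]
    return total / len(values)

list_1 = [1, 2, 3, 4, 5]
print(mean_of_list(list_1))  # 3.0
```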

- We just wrote code to calculate the mean of a list. Very **tedious!**

- Thankfully, other people have written and optimized functions like this and packaged them into **libraries** that we can call in our own analysis.

- With NumPy, we can get the **mean** and other common statistics of lists and arrays in a single function call.

In [None]:
print(y)
print('Mean of array', np.mean(y))
print('Mean along the axis=0',np.mean(y,axis=0))
print('Mean along the axis=1',np.mean(y,axis=1))
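The `axis` argument names the axis that gets collapsed: `axis=0` averages down the rows (one result per column), and `axis=1` averages across the columns (one result per row). This sketch checks both against a manual calculation on the same `y` array.

```python
import numpy as np

y = np.array([[5.2, 3.0, 4.5], [9.1, 0.1, 0.3]])

# axis=0 averages down the rows, leaving one mean per column
col_means = np.mean(y, axis=0)
manual_cols = np.array([(y[0, j] + y[1, j]) / 2 for j in range(y.shape[1])])
print(col_means.shape, np.allclose(col_means, manual_cols))  # (3,) True

# axis=1 averages across the columns, leaving one mean per row
row_means = np.mean(y, axis=1)
manual_rows = np.array([y[i].sum() / y.shape[1] for i in range(y.shape[0])])
print(row_means.shape, np.allclose(row_means, manual_rows))  # (2,) True
```

The same `axis` convention applies to `np.std`, `np.sum`, `np.max` and most other NumPy reductions.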

- Using `np.std`, compute the standard deviation of the entire array and of each axis.

In [None]:
# your code here


In [None]:
# Make a list and a numpy array of three numbers

# your code here
numbers_list = [1, 2, 3]
numbers_array = np.array([4, 5, 6])

In [None]:
# divide your array by 2

numbers_array / 2

In [None]:
# divide your list by 2

numbers_list / 2  # raises TypeError: lists do not support division by a number

NumPy arrays support the `/` operator (implemented by the `__truediv__` method), applying it to every element, while Python lists do not. There are other reasons to prefer NumPy over base Python when working with data.
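To get the same result from a list you have to loop over it yourself, for example with a list comprehension. This sketch puts the two approaches side by side and catches the `TypeError` the list raises.

```python
import numpy as np

numbers_list = [1, 2, 3]
numbers_array = np.array([4, 5, 6])

# Arrays: "/" is applied to every element (it dispatches to ndarray.__truediv__)
halved_array = numbers_array / 2
print(halved_array)

# Lists: "/" is not defined, so a comprehension is needed to get the same effect
halved_list = [n / 2 for n in numbers_list]
print(halved_list)  # [0.5, 1.0, 1.5]

try:
    numbers_list / 2
except TypeError as err:
    print("list / int fails:", err)
```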

In [None]:
# Selection and assignment work as you might expect
numbers_array=np.array([4,5,6])
numbers_array[1]

In [None]:
numbers_array[1] = 10
numbers_array

Take 5 minutes and explore each of the following functions.  What does each one do?  What is the syntax of each?
- `np.zeros()`
- `np.ones()`
- `np.full()`
- `np.eye()`
- `np.random.random()`

In [None]:
np.zeros([3,3])

In [None]:
np.ones([4,4])

In [None]:
np.full((3,4), 5)

In [None]:
np.eye(7)

In [None]:
np.random.random((3,4))
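A few optional parameters are worth knowing for these functions: most accept a `dtype`, `np.eye` accepts `k` to shift the diagonal, and newer NumPy versions (1.17+) also provide a `Generator` object via `np.random.default_rng` as an alternative to `np.random.random`. The specific shapes and seed below are illustrative choices.

```python
import numpy as np

zeros_int = np.zeros((2, 3), dtype=int)  # shape given as a tuple; the default dtype is float64
sevens = np.full((2, 2), 7)              # every element set to the fill value
shifted = np.eye(3, k=1)                 # ones on the diagonal just above the main one

# Generator API (NumPy 1.17+); the seed makes the draws repeatable
rng = np.random.default_rng(seed=0)
uniform = rng.random((3, 4))             # floats drawn from [0.0, 1.0)

print(zeros_int, sevens, shifted, uniform, sep="\n\n")
```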

### Slicing in NumPy

In [None]:
# Recall slicing from lists
numbers_list = list(range(10))
print(numbers_list)
numbers_list[3:7]

![numpy-slice_t.png](attachment:numpy-slice_t.png)

In [None]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
a

In [None]:
a[2::1, ::3]  # rows from index 2 onward (step 1), every third column

In [None]:
# first 2 rows, columns 1 & 2 (remember 0-index!)
b = a[:2, 1:3]
b
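Each axis of a NumPy slice takes its own `start:stop:step`. One important difference from list slicing: basic slicing of an array returns a *view* of the same data, so writing to the slice also changes the original array. Use `.copy()` when you need an independent array.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Omitted parts mean "from the beginning", "to the end" and "step 1"
print(a[2:, ::3])   # row 2 onward, every third column (columns 0 and 3)

# Basic slicing returns a view: writing to b also changes a
b = a[:2, 1:3]
b[0, 0] = 77
print(a[0, 1])      # 77

# .copy() makes an independent array
c = a[:2, 1:3].copy()
c[0, 0] = -1
print(a[0, 1])      # still 77
```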

### Investigate some more.

In [None]:
# each row in the second column of a

In [None]:
# each column in the second and third rows of a
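One way to check your answers, and a detail worth noticing: indexing an axis with a single integer removes that axis from the result, while a slice of length one keeps it.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

second_col = a[:, 1]       # integer index on axis 1: the result is 1-D
print(second_col, second_col.shape)  # shape (3,)

second_col_2d = a[:, 1:2]  # slice on axis 1: the column axis is kept
print(second_col_2d.shape)           # (3, 1)

rows_1_and_2 = a[1:3, :]   # second and third rows, every column
print(rows_1_and_2)
```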

### More Array Math

In [None]:
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

# Elementwise sum; both produce the array
# [[ 6.0  8.0]
#  [10.0 12.0]]
print(x + y,'\n')
print(np.add(x, y))

In [None]:
# Elementwise difference; both produce the array
# [[-4.0 -4.0]
#  [-4.0 -4.0]]
print(x - y,'\n')
print(np.subtract(x, y))

In [None]:
# Elementwise product; both produce the array
# [[ 5.0 12.0]
#  [21.0 32.0]]
print(x * y,'\n')
print(np.multiply(x, y))

In [None]:
# Elementwise division; both produce the array
# [[ 0.2         0.33333333]
#  [ 0.42857143  0.5       ]]
print(x / y,'\n')
print(np.divide(x, y))

In [None]:
# Elementwise square root; both produce the same array
# [[ 1.          1.41421356]
#  [ 1.73205081  2.        ]]
print(x ** .5,'\n')
print(np.sqrt(x))
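Note that `*` is elementwise multiplication, not matrix multiplication. For the matrix product, NumPy provides the `@` operator (equivalently `np.matmul` or `x.dot`). A sketch with the same `x` and `y`:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

elementwise = x * y  # [[5, 12], [21, 32]]: each entry multiplied by its partner
matrix_prod = x @ y  # [[19, 22], [43, 50]]: rows of x dotted with columns of y

print(elementwise, matrix_prod, sep="\n\n")
```

For example, the top-left entry of the matrix product is 1 * 5 + 2 * 7 = 19.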

Below, you will find a piece of code we will use to compare the speed of operations on a list and operations on an array. In this speed test, we will use the library [time](https://docs.python.org/3/library/time.html).

In [None]:
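import time
import numpy as np

# 10 million elements; much larger values make the Python list use several GB of memory
size_of_vec = 10_000_000

def pure_python_version():
    t1 = time.perf_counter()  # perf_counter is the recommended clock for timing code
    X = range(size_of_vec)
    Y = range(size_of_vec)
    Z = [X[i] + Y[i] for i in range(len(X))]  # add element by element in a Python loop
    return time.perf_counter() - t1

def numpy_version():
    t1 = time.perf_counter()
    X = np.arange(size_of_vec)
    Y = np.arange(size_of_vec)
    Z = X + Y  # a single vectorized addition over the whole array
    return time.perf_counter() - t1


t1 = pure_python_version()
t2 = numpy_version()
print("python: " + str(t1), "numpy: " + str(t2))
print("NumPy is " + str(t1 / t2) + " times faster in this example!")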

In pairs, run the speed test with a different value of `size_of_vec`, and share your results with the class.
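For more reliable comparisons, the standard-library [timeit](https://docs.python.org/3/library/timeit.html) module can repeat an operation several times. This sketch builds the data once, outside the timed region, so it measures only the addition; that is a different design from the test above, which also times creating the data. The values of `n` and `repeats` are arbitrary choices.

```python
import timeit
import numpy as np

n = 1_000_000  # arbitrary size; change it to see how the gap grows or shrinks
repeats = 5

X_list, Y_list = list(range(n)), list(range(n))
X_arr, Y_arr = np.arange(n), np.arange(n)

# timeit calls each function `repeats` times and returns the total elapsed seconds
t_python = timeit.timeit(lambda: [a + b for a, b in zip(X_list, Y_list)], number=repeats)
t_numpy = timeit.timeit(lambda: X_arr + Y_arr, number=repeats)

print(f"python: {t_python:.4f}s  numpy: {t_numpy:.4f}s  ratio: {t_python / t_numpy:.1f}x")
```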