# Intro to NumPy

NumPy, short for Numerical Python, is useful primarily because it brings the **<u>N-dimensional array</u>** to the Python coding environment.

This notebook demonstrates the limitations of Python's built-in data types in executing some scientific analyses, and then illustrates the utility of NumPy's ND array object. 

Source: https://campus.datacamp.com/courses/intro-to-python-for-data-science

### Motivation for using NumPy -- It's useful and it's *fast*!
While most of what NumPy can do can be done using native Python objects (e.g. lists, tuples, etc.), NumPy makes certain calculations not only easier, but much faster. Let's explore an example...

---
First, let's create a dummy dataset of the heights and weights of 5 imaginary people.

In [1]:
#Create lists of heights (in m) and weights (in kg)
height = [1.73, 1.68, 1.17, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]
print(height)
print(weight)

[1.73, 1.68, 1.17, 1.89, 1.79]
[65.4, 59.2, 63.6, 88.4, 68.7]


If we define body mass index (BMI) as `weight / height ** 2`, what would it take to compute BMI for our data?

In [2]:
#[Attempt to] compute BMI from lists
bmi = weight/height ** 2

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

The above attempt raises an error because <u>lists don't support element-wise arithmetic</u>.<br>
With plain lists, the only way around this is to iterate through each item in the lists...

In [3]:
#Compute BMI from lists
bmi = []
for idx in range(len(height)):
    bmi.append(weight[idx] / height[idx] ** 2)
print(bmi)

[21.85171572722109, 20.97505668934241, 46.46066184527724, 24.74734749867025, 21.44127836209856]
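
The same loop can be written more compactly with `zip` and a list comprehension. This is only a sketch of an alternative spelling: it still visits each pair of values one at a time in Python, so it does not avoid the iteration.

```python
# Same dummy data as above
height = [1.73, 1.68, 1.17, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]

# Pair each weight with its height and compute BMI one element at a time
bmi = [w / h ** 2 for w, h in zip(weight, height)]
print(bmi)
```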


However, with `NumPy` we have access to another data type, the `array`, which can speed through this process.

In [4]:
#Import numpy, often done using the alias 'np'
import numpy as np

In [5]:
#Convert the height and weight lists to arrays
arrHeight = np.array(height)
arrWeight = np.array(weight)

print(arrHeight)
print(arrWeight)

[1.73 1.68 1.17 1.89 1.79]
[65.4 59.2 63.6 88.4 68.7]


NumPy arrays allow us to do computations on entire collections...

In [6]:
#Convert weight from kg to lbs
print(arrWeight * 2.20462)

[144.182148 130.513504 140.213832 194.888408 151.457394]
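
For contrast, here is a small sketch of what the `*` operator does to a plain list versus an array. With a list, multiplying by an integer repeats the list instead of scaling its values (and multiplying by a float, as in the conversion above, raises a `TypeError`). The variable names below are made up for this illustration.

```python
import numpy as np

weight = [65.4, 59.2, 63.6, 88.4, 68.7]

# For a list, * with an integer repeats the list rather than scaling values
doubled_list = weight * 2
print(len(doubled_list))

# For an array, * multiplies every element
doubled_array = np.array(weight) * 2
print(doubled_array)
```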


In [7]:
#Compute BMI from weights and heights
arrBMI = arrWeight / arrHeight ** 2
print(arrBMI)

[21.85171573 20.97505669 46.46066185 24.7473475  21.44127836]
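
The printed array shows fewer digits than the list result, but the values are the same. One way to check this, sketched here with `np.allclose` (which compares element by element within a small floating-point tolerance), is to compute BMI both ways and compare:

```python
import numpy as np

height = [1.73, 1.68, 1.17, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]

# Loop version, as in the list-based cell
bmi_loop = []
for idx in range(len(height)):
    bmi_loop.append(weight[idx] / height[idx] ** 2)

# Array version, computed on all values at once
bmi_array = np.array(weight) / np.array(height) ** 2

# True when every pair of values agrees within tolerance
same = np.allclose(bmi_loop, bmi_array)
print(same)
```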


**Take home message**: 
* Both Python *lists/tuples* and NumPy *arrays* store values that can be referenced by their index.
* BUT, operations on NumPy arrays, such as multiplication, can be done on all values at once, i.e. without iteration.
* HOWEVER, all values in a NumPy array, unlike those in a list/tuple, must be of the same type (e.g. all integer *or* all floating point); mixed values are converted to a common type.
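
The single-type rule can be seen by inspecting the `dtype` attribute of a few small arrays. This is a minimal sketch with made-up values; the exact integer and string widths reported depend on the platform and NumPy version.

```python
import numpy as np

ints = np.array([1, 2, 3])          # all integers
mixed = np.array([1, 2.5, 3])       # integers are converted to floats
text = np.array([1, "two", 3.0])    # everything is converted to strings

print(ints.dtype)   # an integer type
print(mixed.dtype)  # float64
print(text.dtype)   # a Unicode string type
```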

---
### NumPy is fast...
OK, now look at the speed boost that NumPy gives us. Here we'll use Jupyter's `%timeit` magic command, which runs the specified code many times and reports the average time per run. We'll use it to compare the same calculation, squaring every integer from 0 to 999, done in both native Python and in NumPy...

In [8]:
#Construct a sequence of values from 0 to 999 (in Python 3, range() returns a range object, not a list)
L = range(1000)

In [9]:
#How fast is it to loop through all items in native Python and square each one?
%timeit [i**2 for i in L]

372 µs ± 15.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [10]:
#Construct an ARRAY of values from 0 to 999
a = np.arange(1000)

In [11]:
#How fast is it to square all ARRAY items at once?
%timeit a**2

2.03 µs ± 62.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


_Note: it may seem that the NumPy cell took longer to finish. However, note how many loops were run: the NumPy version was so fast that `%timeit` ran far more iterations (1,000,000 vs. 1,000). The key metric is the first number, the mean time per loop, where NumPy is roughly two orders of magnitude faster!_
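
Outside of Jupyter, a similar comparison can be sketched with the standard library's `timeit` module. The repeat count `n_runs` below is an arbitrary choice for this illustration, and the timings will vary from machine to machine.

```python
import timeit
import numpy as np

values = list(range(1000))   # native Python list of 0..999
arr = np.arange(1000)        # NumPy array of 0..999

n_runs = 1000  # arbitrary repeat count chosen for this sketch

# Total time in seconds for n_runs executions of each version
t_list = timeit.timeit(lambda: [i ** 2 for i in values], number=n_runs)
t_array = timeit.timeit(lambda: arr ** 2, number=n_runs)

print(f"list comprehension: {t_list / n_runs * 1e6:.2f} µs per run")
print(f"NumPy array:        {t_array / n_runs * 1e6:.2f} µs per run")
```

Unlike `%timeit`, `timeit.timeit` returns the total time for all runs, so the sketch divides by `n_runs` to get a per-run figure.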