# Programming for Data Science (Python)

In [None]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

<p style="font-family: Arial; font-size:3.75em;color:purple; font-style:bold"><br>
Introduction to numpy:
</p><br>

<p style="font-family: Arial; font-size:1.25em;color:#2462C0; font-style:bold"><br>
Package for scientific computing with Python
</p><br>

Numerical Python, or "NumPy" for short, is a foundational package on which many of the most common data science packages are built. NumPy provides high-performance multidimensional arrays which we can use as vectors or matrices.

The key features of numpy are:

- ndarrays: n-dimensional arrays whose elements all share one data type, which makes them fast and space-efficient. ndarrays have a number of built-in methods which allow for rapid processing of data without using loops (e.g., computing the mean).
- Vectorization: numeric operations are applied to whole ndarrays at once, element by element, without explicit Python loops.
- Broadcasting: a set of rules which defines how arithmetic behaves between multidimensional arrays of different shapes.
- Input/Output: simplifies reading data from and writing data to files.

<b>Additional Recommended Resources:</b><br>
<a href="https://docs.scipy.org/doc/numpy/reference/">Numpy Documentation</a><br>
<i>Python for Data Analysis</i> by Wes McKinney<br>
<i>Python Data Science Handbook</i> by Jake VanderPlas


<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold"><br>

Getting started with ndarray<br><br></p>

**ndarrays** are time and space-efficient multidimensional arrays at the core of numpy. As with the data structures in Week 2, let's get started by creating ndarrays using the numpy package.

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

How to create Rank 1 numpy arrays:
</p>

In [None]:
import numpy as np



In [None]:
# check the shape of the array we just created; it should have just one dimension (rank 1)

# because this is a rank 1 array, we need only one index to access each element
 
# ndarrays are mutable, here we change an element of the array
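The steps listed in the comments above can be sketched as follows. The name `an_array` and the values are assumptions chosen for illustration, not the course's own solution.

```python
import numpy as np

# Create a rank 1 array from a Python list (values are illustrative)
an_array = np.array([3, 33, 333])
print(type(an_array))      # <class 'numpy.ndarray'>

# A rank 1 array has a one-element shape tuple
print(an_array.shape)      # (3,)

# One index is enough to reach each element
print(an_array[0], an_array[1], an_array[2])   # 3 33 333

# ndarrays are mutable: overwrite the first element in place
an_array[0] = 888
print(an_array)            # [888  33 333]
```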


<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

How to create a Rank 2 numpy array:</p>

A rank 2 **ndarray** is one with two dimensions.  Notice the format below of [ [row] , [row] ].  2 dimensional arrays are great for representing matrices which are often useful in data science.
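As a minimal sketch of the [ [row] , [row] ] format, the example below builds a 2x3 array. The name `another` and the values are hypothetical.

```python
import numpy as np

# Create a rank 2 array: a list of rows (values are illustrative)
another = np.array([[11, 12, 13], [21, 22, 23]])

print(another.shape)   # (2, 3): 2 rows, 3 columns
print(another.ndim)    # 2

# Two indices (row, column) select a single element
print(another[0, 0], another[1, 2])   # 11 23
```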

In [None]:
# Create a rank 2 array


<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

There are many ways to create numpy arrays:
</p>

Here we create several arrays with different shapes and different pre-filled values. numpy has a number of built-in functions which help us quickly and easily create multidimensional arrays.
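One possible way to build such pre-filled arrays is sketched below, using `np.zeros`, `np.full`, and `np.ones`. The names `ex1`, `ex2`, and `ex3` and the chosen shapes are assumptions for illustration.

```python
import numpy as np

ex1 = np.zeros((2, 2))        # 2x2 array of zeros
ex2 = np.full((2, 2), 9.0)    # 2x2 array filled with 9.0
ex3 = np.ones((2, 1))         # 2x1 array of ones (still rank 2)

print(ex1)
print(ex2)
print(ex3.shape)   # (2, 1)

# Because ex3 is rank 2, it takes two indices to reach a single value
print(ex3[0, 0])   # 1.0
```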

In [None]:
# create a 2x2 array of zeros

# create a 2x2 array filled with 9.0

# create an array of ones


In [None]:
# notice that the above ndarray (ex3) is actually rank 2, it is a 2x1 array


# which means we need to use two indexes to access an element


In [None]:
# create an array of random floats between 0 and 1
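A short sketch of this step, assuming the legacy `np.random.random` function and a hypothetical name `ex5`. Seeding is optional and is only there to make the output repeatable.

```python
import numpy as np

np.random.seed(0)                 # fix the seed so results repeat (optional)
ex5 = np.random.random((2, 2))    # uniform floats in the interval [0.0, 1.0)
print(ex5)
```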


<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold"><br>

Datatypes
<br><br></p>

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Datatypes:
</p>

In [None]:
# numpy infers the data type from the values


In [None]:
# numpy infers the data type from the values


In [None]:
# You can also tell numpy the data type explicitly


In [None]:
# you can use this to force floats into integers (the fractional part is truncated toward zero)


In [None]:
# you can use this to force integers into floats if you anticipate
# the values may change to floats later
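The datatype steps above might look like the sketch below. All names and values are hypothetical. Note that converting floats to an integer dtype truncates toward zero, which matches `np.floor` only for non-negative values.

```python
import numpy as np

# NumPy infers the dtype from the values
ex1 = np.array([11, 12])           # integer dtype (int64 on most platforms)
ex2 = np.array([11.0, 12.0])       # float64
print(ex1.dtype, ex2.dtype)

# The dtype can also be requested explicitly
ex3 = np.array([11, 12], dtype=np.int64)
print(ex3.dtype)                   # int64

# Converting floats to integers truncates toward zero,
# which differs from np.floor for negative values
floats = np.array([11.9, -11.9])
print(floats.astype(np.int64))     # [ 11 -11]
print(np.floor(floats))            # [ 11. -12.]

# Integers can be stored as floats from the start
ex5 = np.array([11, 21], dtype=np.float64)
print(ex5)                         # [11. 21.]
```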


<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold"><br>

Array Indexing
<br><br></p>

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>
Slice indexing:
</p>

Similar to the use of slice indexing with lists and strings, we can use slice indexing to pull out sub-regions of ndarrays.
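A minimal sketch of slicing a rank 2 array follows, with illustrative values and hypothetical names (`an_array`, `a_slice`, `a_copy`). It also shows that a slice is a view onto the original data, while `np.array(...)` produces an independent copy.

```python
import numpy as np

# Rank 2 array of shape (3, 4); values are illustrative
an_array = np.array([[11, 12, 13, 14],
                     [21, 22, 23, 24],
                     [31, 32, 33, 34]])

# Rows 0-1 and columns 1-2 give a 2x2 subarray
a_slice = an_array[:2, 1:3]
print(a_slice)

# A slice is a view: writing to it changes the original array
a_slice[0, 0] = 1000
print(an_array[0, 1])      # 1000

# np.array(...) makes an independent copy, so the original is untouched
a_copy = np.array(an_array[:2, 1:3])
a_copy[0, 0] = -1
print(an_array[0, 1])      # still 1000
```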

In [None]:
# Rank 2 array of shape (3, 4)

# Use array slicing to get a subarray consisting of 2 rows x 2 columns.

# When you modify a slice, you actually modify the underlying array.

# To avoid that, make an explicit copy with np.array() (or the .copy() method).


In [None]:
# You may generate an array of lower rank
  

# Or an array of the same rank as an_array


#We can do the same thing for columns of an array:
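The difference between the two forms can be sketched as below: an integer index drops a dimension, while a slice keeps it. The array values and variable names are assumptions for illustration.

```python
import numpy as np

an_array = np.array([[11, 12, 13, 14],
                     [21, 22, 23, 24],
                     [31, 32, 33, 34]])

# An integer index drops that dimension: the result is rank 1
row_rank1 = an_array[1, :]
print(row_rank1, row_rank1.shape)    # [21 22 23 24] (4,)

# A slice keeps the dimension: the result is rank 2
row_rank2 = an_array[1:2, :]
print(row_rank2, row_rank2.shape)    # [[21 22 23 24]] (1, 4)

# The same applies to columns
col_rank1 = an_array[:, 1]
col_rank2 = an_array[:, 1:2]
print(col_rank1.shape, col_rank2.shape)   # (3,) (3, 1)
```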



<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Fancy indexing: array of indices
</p>

Sometimes it's useful to use an array of indexes to access or change elements.
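The sketch below walks through one way to do this: build row and column index arrays, inspect the pairs, read one element per row, then update those elements. The names `row_indices` and `col_indices` come from the comments in the cells that follow; the array contents and the added constant are illustrative assumptions.

```python
import numpy as np

an_array = np.array([[11, 12, 13],
                     [21, 22, 23],
                     [31, 32, 33],
                     [41, 42, 43]])

# One column index per row, paired with the row indices 0..3
col_indices = np.array([0, 1, 2, 0])
row_indices = np.arange(4)

# The (row, col) pairs that will be touched
for row, col in zip(row_indices, col_indices):
    print(row, ",", col)

# Select one element from each row
print(an_array[row_indices, col_indices])    # [11 22 33 41]

# Change those same elements in place
an_array[row_indices, col_indices] += 100000
print(an_array)
```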

In [None]:
# Create a new array



In [None]:
# Create an array of indices


In [None]:
# Examine the pairings of row_indices and col_indices.  These are the elements we'll change next.


In [None]:
# Select one element from each row


In [None]:
# Change one element from each row using the indices selected


<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold"><br>
Boolean Indexing

<br><br></p>

In [None]:
# create a 3x2 array


In [None]:
# create a filter which will be boolean values for whether each element meets this condition
filter = (an_array > 15)
filter

Notice that the filter is an ndarray of the same shape as an_array, holding True wherever the corresponding element of an_array is greater than 15 and False wherever it is 15 or less.
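A self-contained sketch of boolean indexing is shown below with an illustrative 3x2 array. It uses the hypothetical name `mask` rather than `filter`, only to avoid shadowing Python's built-in `filter` function.

```python
import numpy as np

an_array = np.array([[11, 12], [21, 22], [31, 32]])   # 3x2, illustrative values

# Elementwise comparison yields a boolean array of the same shape
mask = an_array > 15
print(mask)

# Indexing with the mask returns the matching elements as a rank 1 array
print(an_array[mask])            # [21 22 31 32]

# The condition can also be written inline, without a separate mask
print(an_array[an_array > 15])   # [21 22 31 32]
```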

In [None]:
# we can now select just those elements which meet that criteria


In [None]:
# As a shortcut, we could index directly with the condition, without a separate filter array.


What is particularly useful is that we can actually change elements in the array applying a similar logical filter.  Let's add 100 to all the even values.
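A minimal sketch of that update, assuming a small 3x2 `an_array` with illustrative values:

```python
import numpy as np

an_array = np.array([[11, 12], [21, 22], [31, 32]])

# Select the even values with a boolean condition and add 100 to them in place
an_array[an_array % 2 == 0] += 100
print(an_array)    # odd values unchanged; 12, 22, 32 become 112, 122, 132
```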

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Arithmetic Array Operations:

</p>

In [None]:
x = np.array([[111,112],[121,122]], dtype=np.int64)  # np.int was removed in NumPy 1.24
y = np.array([[211.1,212.1],[221.1,222.1]], dtype=np.float64)


In [None]:
#plus

In [None]:
# subtract


In [None]:
# multiply


In [None]:
# divide


In [None]:
# square root


In [None]:
# exponent (e ** x)
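The operations named in the comments above could be written as below. Each arithmetic operator works element by element and has an equivalent function such as `np.add`. The arrays repeat the `x` and `y` defined earlier so the block runs on its own.

```python
import numpy as np

x = np.array([[111, 112], [121, 122]], dtype=np.int64)
y = np.array([[211.1, 212.1], [221.1, 222.1]], dtype=np.float64)

# Each operator has an equivalent universal function (ufunc)
print(x + y)          # same as np.add(x, y)
print(x - y)          # same as np.subtract(x, y)
print(x * y)          # elementwise product, same as np.multiply(x, y)
print(x / y)          # same as np.divide(x, y)

print(np.sqrt(x))     # elementwise square root
print(np.exp(x))      # elementwise e ** x
```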


<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Let's explore the efficiency of universal functions

</p>

In [None]:
# Use a loop to compute the reciprocal of each element of an array
np.random.seed(0)
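One way to run this comparison is sketched below: a hypothetical helper `compute_reciprocals` loops in Python, and the same result is then computed with a single vectorized expression. The array size and the use of `time.perf_counter` are assumptions; timings vary by machine, so none are quoted here.

```python
import time
import numpy as np

def compute_reciprocals(values):
    """Reciprocal of each element using an explicit Python loop."""
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

np.random.seed(0)
big_array = np.random.randint(1, 100, size=100_000)

start = time.perf_counter()
looped = compute_reciprocals(big_array)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
vectorized = 1.0 / big_array          # elementwise division runs in compiled code
ufunc_seconds = time.perf_counter() - start

print(f"loop:  {loop_seconds:.4f} s")
print(f"ufunc: {ufunc_seconds:.4f} s")
print(np.allclose(looped, vectorized))   # True: same values either way
```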


<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold"><br>

Aggregation functions <br><br>
</p>

In [None]:
# setup a random 2 x 4 matrix


In [None]:
# compute the mean for all elements


In [None]:
# compute the means by row


In [None]:
# compute the means by column


In [None]:
# sum all the elements


In [None]:
# compute the medians
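The aggregation steps above can be sketched as follows. The name `arr` and the use of `np.random.randn` are assumptions. The `axis` argument names the dimension that is collapsed, so `axis=1` gives one value per row and `axis=0` one value per column.

```python
import numpy as np

np.random.seed(0)
arr = 10 * np.random.randn(2, 4)      # random 2x4 matrix

print(arr.mean())               # mean of all 8 elements
print(arr.mean(axis=1))         # one mean per row    -> shape (2,)
print(arr.mean(axis=0))         # one mean per column -> shape (4,)
print(arr.sum())                # sum of all elements
print(np.median(arr, axis=1))   # median of each row
```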


In [None]:
#sorting
# create a 10 element array of randoms


In [None]:
#Find unique elements
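Sorting and finding unique elements might look like the sketch below. The names and values are illustrative. `np.sort` returns a sorted copy, whereas the `.sort()` method sorts the array in place.

```python
import numpy as np

np.random.seed(0)
unsorted = np.random.randn(10)    # 10 random values

sorted_copy = np.sort(unsorted)   # returns a sorted copy; `unsorted` is unchanged
unsorted.sort()                   # sorts in place and returns None
print(unsorted)

array = np.array([1, 2, 1, 4, 2, 1, 4, 2])
print(np.unique(array))           # [1 2 4]
```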


<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold"><br>

Broadcasting:
<br><br>
</p>

Introduction to broadcasting. <br>
For more details, please see: <br>
https://docs.scipy.org/doc/numpy-1.10.1/user/basics.broadcasting.html
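In brief, broadcasting compares the two shapes starting from the trailing dimension; two dimensions are compatible when they are equal or when one of them is 1. The sketch below shows three compatible cases against a 4x3 array. The names `start`, `add_rows`, and `add_cols` and the values are assumptions for illustration.

```python
import numpy as np

start = np.zeros((4, 3))              # 4x3 array of zeros
add_rows = np.array([1, 0, 2])        # rank 1, shape (3,)

# (4, 3) + (3,): the rank 1 array is stretched across all 4 rows
print(start + add_rows)

# (4, 3) + (4, 1): the column is stretched across all 3 columns
add_cols = np.array([[0, 1, 2, 3]]).T   # shape (4, 1)
print(start + add_cols)

# (4, 3) + (1,): a single value is stretched in both dimensions
print(start + np.array([1]))
```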

In [None]:
# create a 4x3 array


In [None]:
# create a rank 1 ndarray with 3 values


In [None]:
#Add together


In [None]:
# create an ndarray which is 4 x 1 to broadcast across columns


In [None]:
# add to each column of 'start' using broadcasting


In [None]:
# a single-value array is broadcast across both dimensions


In [None]:
# create our 3x4 matrix


In [None]:
# create a rank 1 array with 4 values, shape (4,); a true 4x1 column would not broadcast against a 3x4 matrix


In [None]:
# add the two together using broadcasting
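A sketch of this addition, assuming the second operand is a rank 1 array of 4 values (shape `(4,)`), since a true `(4, 1)` column cannot broadcast against a `(3, 4)` matrix. The names `arr_a` and `arr_b` and the values are hypothetical.

```python
import numpy as np

arr_a = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])     # shape (3, 4)
arr_b = np.array([0, 1, 0, 2])          # shape (4,)

# Trailing dimensions match (4 and 4), so arr_b is added to every row
print(arr_a + arr_b)

# A (4, 1) column does not broadcast against (3, 4): 3 and 4 are incompatible
try:
    arr_a + arr_b.reshape(4, 1)
except ValueError as err:
    print("broadcast failed:", err)
```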


In [None]:
#Application of broadcasting - centering an array
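Centering can be sketched as subtracting each column's mean from every row, which relies on broadcasting a `(3,)` array against a `(10, 3)` array. The data, its shape, and the names below are illustrative assumptions.

```python
import numpy as np

np.random.seed(0)
data = np.random.random((10, 3))       # 10 observations, 3 features

col_means = data.mean(axis=0)          # shape (3,)
centered = data - col_means            # (10, 3) - (3,) broadcasts across rows

print(centered.mean(axis=0))           # each column mean is now close to 0
```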
