## LSESU Applicable Maths Python Lesson 6
###### 29/11/16

Today is all about handling and generating data. We'll be looking at the first principles of two packages you should know about for handling data in Python:
* NumPy
* Pandas

**Run the appropriate version of the following commands ASAP to get yourself set up!**

In [100]:
# Run this if you are using a Mac machine or have multiple versions of Python installed
!pip3 install numpy pandas matplotlib pandas_datareader --upgrade

Requirement already up-to-date: numpy in /anaconda/lib/python3.5/site-packages
Requirement already up-to-date: pandas in /anaconda/lib/python3.5/site-packages
Requirement already up-to-date: matplotlib in /anaconda/lib/python3.5/site-packages
Collecting pandas_datareader
  Using cached pandas_datareader-0.2.1-py2.py3-none-any.whl
Requirement already up-to-date: python-dateutil>=2 in /anaconda/lib/python3.5/site-packages (from pandas)
Requirement already up-to-date: pytz>=2011k in /anaconda/lib/python3.5/site-packages (from pandas)
Requirement already up-to-date: cycler in /anaconda/lib/python3.5/site-packages (from matplotlib)
Requirement already up-to-date: pyparsing!=2.0.0,!=2.0.4,!=2.1.2,>=1.5.6 in /anaconda/lib/python3.5/site-packages (from matplotlib)
Collecting requests-file (from pandas_datareader)
  Downloading requests_file-1.4.1-py2.py3-none-any.whl
Collecting requests (from pandas_datareader)
  Downloading requests-2.12.1-py2.py3-none-any.whl (574kB)

In [99]:
# Run this if you are using a Windows machine
!pip install numpy pandas matplotlib pandas_datareader --upgrade

Collecting numpy
  Downloading numpy-1.11.2-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (3.9MB)
Collecting pandas
  Downloading pandas-0.19.1-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (11.6MB)
Collecting matplotlib
  Downloading matplotlib-1.5.3-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (11.2MB)

Operation cancelled by user

In [102]:
# Everyone run this block
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
from pandas_datareader import data as web

* **Recap from last week**

We looked at the basics of Object Oriented Programming, or OOP, last week. If you couldn't make it, don't be concerned because the content from last week won't affect what we will be looking at today.
```
class Human(object):
    def __init__(self,name,age,height):
        self.name = name
        self.age = age
        self.height = height

    def __lt__(self,other):
        return self.age < other.age
    def __le__(self,other):
        return self.age <= other.age
    def __gt__(self,other):
        return self.age > other.age
    def __ge__(self,other):
        return self.age >= other.age
    def __eq__(self,other):
        return self.age==other.age
    
    def age_in_dog_years(self):
        return 7*self.age
```

## NumPy

NumPy is the standard mathematical and scientific computing package for Python, and it is essential to know if you want to write efficient, readable code. It includes an optimised array type as well as linear algebra, Fourier transform and random number capabilities.

Under the hood, much of NumPy is written in compiled languages (mostly C, with linear algebra handled by BLAS/LAPACK libraries), so array operations run as optimised machine code rather than as interpreted Python. This gives you features and speed that are hard to match in pure Python.
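
As a rough illustration of the speed claim, the sketch below sums the squares of one million integers with a plain Python loop and then with a single NumPy expression. The array size and the use of `time.perf_counter` are choices made for this example only. Timings differ between machines, so no figures are quoted here.

```python
import time
import numpy as np

n = 1000000  # illustrative size, chosen for this sketch
values = list(range(n))
values_np = np.arange(n, dtype=np.int64)

# Plain Python: visit every element in an interpreted loop
start = time.perf_counter()
total_py = sum(v * v for v in values)
time_py = time.perf_counter() - start

# NumPy: one vectorised expression, executed in compiled code
start = time.perf_counter()
total_np = int(np.sum(values_np * values_np))
time_np = time.perf_counter() - start

print(total_py == total_np)  # both approaches compute the same total
print('Python loop: {:.4f}s, NumPy: {:.4f}s'.format(time_py, time_np))
```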

[Link to NumPy documentation](http://www.numpy.org/)

### The main NumPy feature - the Array type

A NumPy array is a grid of values that all have the same type. This differs from Python lists, which can hold elements of different types.
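
To see the single-type rule in action, the short sketch below builds an array from a list that mixes an integer with floats. The variable names are made up for this example.

```python
import numpy as np

mixed_list = [1, 2.5, 3]            # a list can hold ints next to floats
mixed_array = np.array(mixed_list)  # the array picks one type for everything

print(mixed_array.dtype)  # float64: the ints were converted to floats
print(mixed_array)
```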

Arrays are indexed similarly to lists, with each dimension indexed from zero. When declaring an array, be clear about the dimensions you need.
```
np.ones((3,4),int) 
    -> 3 is the number of ROWS
    -> 4 is the number of COLUMNS
    -> int is the type of the array elements
--
array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]])
```

In [20]:
# Creating an array of all zeroes

np.zeros((3,4),int)

array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])

In [34]:
print('a. np.zeros((2,2))')
a = np.zeros((2,2),int)  
print(a)                 

print('b. np.ones((1,2))')
b = np.ones((1,2))   
print(b)              

print('c. np.full((2,2), 7)')
c = np.full((2,2), 7) # Create a constant array
print(c)              

print('d. np.eye(2)')
d = np.eye(2)         # Create a 2x2 identity matrix
print(d)             
    
print('e. np.random.random((2,2))')
e = np.random.random((2,2)) 
print(e)                    

a. np.zeros((2,2))
[[0 0]
 [0 0]]
b. np.ones((1,2))
[[ 1.  1.]]
c. np.full((2,2), 7)
[[ 7.  7.]
 [ 7.  7.]]
d. np.eye(2)
[[ 1.  0.]
 [ 0.  1.]]
e. np.random.random((2,2))
[[ 0.4128484   0.09479139]
 [ 0.29987876  0.67592249]]




In general, you can follow the pattern below when declaring most NumPy arrays

```
np.<function>(shape, dtype)            # e.g. zeros, ones
np.full(shape, fill_value, dtype)      # full also needs the fill value
```
where `shape` is declared as a tuple like `(3,4)` and `dtype` is the type of the data, which is the same for every element of the array but doesn't have to be numeric.
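
The pattern can be sketched with a few constructors side by side. The variable names are invented for this example, and `bool` is used only to show a dtype that isn't a number.

```python
import numpy as np

shape = (2, 3)  # 2 rows, 3 columns

zeros_f = np.zeros(shape, dtype=float)
ones_i = np.ones(shape, dtype=int)
sevens = np.full(shape, 7, dtype=int)  # np.full takes the fill value as well
flags = np.zeros(shape, dtype=bool)    # boolean array, every entry False

# Every array has the requested shape, but its own element type
for arr in (zeros_f, ones_i, sevens, flags):
    print(arr.shape, arr.dtype)
```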

**Try replacing `int` in the declaration of array `a` with `str`**

In [41]:
# You can also declare NumPy arrays using standard Python
# Lists (and Lists of Lists)

l1 = [1,2,3,4]
l2 = [5,6,7,8]
l3 = [9,10,11,12]
l = [l1,l2,l3]

print(l)

l_array = np.array(l)

#print(l_array)

[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]


### Indexing a NumPy array

In [66]:
# For a list of lists we would use the [][] notation
upper_left_val = l[0][0]
print(upper_left_val)
print()

# We use a single [] with np and separate dimensions by ,
upper_left_val_np = l_array[0,0]
print(upper_left_val_np)
print()

# You can also slice arrays like so
print(l_array[0:2,1:3])
print()

# And use the shape attribute of the object to understand
# size 
print(l_array.shape)
print()

# Or the dtype attribute to inspect the type of the array
print(l_array.dtype)

1

1

[[2 3]
 [6 7]]

(3, 4)

int64
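
The same comma notation also selects whole rows and columns. The sketch below rebuilds the 3 by 4 array from above under the hypothetical name `grid` so that it runs on its own.

```python
import numpy as np

grid = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 10, 11, 12]])

first_row = grid[0]          # same as grid[0, :]
second_col = grid[:, 1]      # every row, column index 1
bottom_right = grid[-1, -1]  # negative indices count from the end

print(first_row)     # [1 2 3 4]
print(second_col)    # [ 2  6 10]
print(bottom_right)  # 12
```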


In [65]:
# Using the arange function you can generate evenly
# spaced integers
lin_space_int = np.arange(1,10)
print(lin_space_int)
print()

# Or specify a step to create different spacings
lin_space_new = np.arange(1,10,0.5)
print(lin_space_new)

[1 2 3 4 5 6 7 8 9]

[ 1.   1.5  2.   2.5  3.   3.5  4.   4.5  5.   5.5  6.   6.5  7.   7.5  8.
  8.5  9.   9.5]
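
Note that `np.arange` stops before the end value, as the output above shows. If you would rather ask for a fixed number of points with both endpoints included, `np.linspace` is a common alternative. It is not used elsewhere in this lesson, so treat the following as a side note.

```python
import numpy as np

# arange: start, stop (excluded), step
stepped = np.arange(0, 1, 0.25)

# linspace: start, stop (included by default), number of points
spaced = np.linspace(0, 1, 5)

print(stepped)
print(spaced)
```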


### Array mathematics

In [88]:
# Declare two example arrays
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

# By default, operations are element-wise in NumPy

# Addition, two options
print(x+y)
#print(np.add(x,y))
print()

# Subtraction
print(x-y)
#print(np.subtract(x,y))
print()

# Product
print(x*y)
#print(np.multiply(x,y))
print()

# Division
print(x/y)
#print(np.divide(x,y))
print()

# Square Root
print(np.sqrt(x))
print()

# For matrix operations, use the dedicated NumPy functions

# Matrix product (dot product)
print(x.dot(y))
#print(np.dot(x,y))
print()

# You can also sum across dimensions easily
print(np.sum(x)) # For every element
print()

print(np.sum(x,axis=0)) # Down each column
print()

# Or transpose a Matrix
print(x)
print()
print(x.T)

[[  6.   8.]
 [ 10.  12.]]

[[-4. -4.]
 [-4. -4.]]

[[  5.  12.]
 [ 21.  32.]]

[[ 0.2         0.33333333]
 [ 0.42857143  0.5       ]]

[[ 1.          1.41421356]
 [ 1.73205081  2.        ]]

[[ 19.  22.]
 [ 43.  50.]]

10.0

[ 4.  6.]

[[ 1.  2.]
 [ 3.  4.]]

[[ 1.  3.]
 [ 2.  4.]]
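
Two points from the cell above are easy to mix up: `*` versus the matrix product, and what `axis` means when summing. The sketch below puts them side by side using the same `x` and `y`. The `@` operator is an addition not used in the lesson; it is available from Python 3.5 and gives the same result as `x.dot(y)` for 2D arrays.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

elementwise = x * y  # multiplies matching entries
matrix_prod = x @ y  # matrix product, same as x.dot(y)

col_sums = np.sum(x, axis=0)  # collapse the rows: one total per column
row_sums = np.sum(x, axis=1)  # collapse the columns: one total per row

print(elementwise)
print(matrix_prod)
print(col_sums, row_sums)
```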


## **Challenge**
#### Declare a 5 by 5 array of any numbers you want using one of the methods discussed above. Then read [this](https://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html) NumPy documentation. When you are ready, print the mean, standard deviation, minimum and maximum of your array.

In [94]:
# TO DO 
# You can declare an array of random numbers or start with a list of lists
# Check your array is the right size by printing the .shape attribute



# Print the mean


# Print the standard deviation 


# Print the minimum


# Print the maximum



# END TODO

## Pandas

Pandas is a data manipulation package that we've glimpsed before. If you know R, the Pandas DataFrame type will be very familiar. If not, you can think of a DataFrame as a spreadsheet-like object that is much easier to manipulate and work with than lists of dictionaries (or dictionaries of lists!).

[Link to Pandas documentation](http://pandas.pydata.org/pandas-docs/stable/index.html)
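
As a minimal sketch of the spreadsheet analogy, the example below builds a DataFrame from a dictionary of lists and then selects a column and filters rows. The names and numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical data, invented for this sketch
people = pd.DataFrame({
    'name': ['Ada', 'Ben', 'Cy'],
    'age': [36, 41, 29],
    'height_cm': [170, 182, 175],
})

print(people)                      # a labelled table, one row per person
print(people['age'].mean())        # select a column by name, then summarise it
print(people[people['age'] > 30])  # keep only the rows matching a condition
```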

### The main Pandas feature - the DataFrame type

In [103]:
# Choose a stock
ticker = 'GOOG'

# Choose a start date in US format MM/DD/YYYY
stock_start = '10/2/2014'
# Choose an end date in US format MM/DD/YYYY
stock_end = '10/2/2016'

# Retrieve the Data from Google's Finance Database
stock = web.DataReader(ticker,data_source='google',
                       start = stock_start,end=stock_end)

# Print a table of the Data to see what we have just fetched
stock.tail()

| Date | Open | High | Low | Close | Volume |
|---|---|---|---|---|---|
| 2016-09-26 | 782.74 | 782.74 | 773.07 | 774.21 | 1533206 |
| 2016-09-27 | 775.5 | 785.99 | 774.31 | 783.01 | 1153247 |
| 2016-09-28 | 777.85 | 781.81 | 774.97 | 781.56 | 1109834 |
| 2016-09-29 | 781.44 | 785.8 | 774.23 | 775.01 | 1314746 |
| 2016-09-30 | 776.33 | 780.94 | 774.09 | 777.29 | 1585333 |


In [104]:
# Generate the logarithm of the ratio between each day's closing price and the previous day's
stock['Log_Ret'] = np.log(stock['Close']/stock['Close'].shift(1))

# Generate the rolling standard deviation across the time series data
# (Series.rolling replaces the deprecated pd.rolling_std)
stock['Volatility'] = stock['Log_Ret'].rolling(window=100).std()*np.sqrt(100)
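
To make the two steps easier to follow, here is the same calculation on a tiny, made-up price series. The prices and the 3-day window are assumptions for this sketch (the lesson uses 100 days), and the `sqrt` scaling is left out.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, invented for this sketch
close = pd.Series([100.0, 110.0, 99.0, 108.9, 108.9])

# shift(1) moves each price down one row, so every day is divided
# by the previous day's close; the first day has no predecessor (NaN)
log_ret = np.log(close / close.shift(1))

# Rolling standard deviation over a 3-day window; a value appears
# only once the window holds 3 valid returns
rolling_vol = log_ret.rolling(window=3).std()

print(log_ret)
print(rolling_vol)
```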


In [106]:
# Create a plot of changing Closing Price and Volatility
stock[['Close','Volatility']].plot(subplots=True,color='b',figsize=(8,6))

AttributeError: module 'matplotlib.cbook' has no attribute 'normalize_kwargs'