# Linear Algebra

#### Importing Packages
Python has an extensive collection of third-party libraries, or ***packages***, that provide additional functions, data structures, etc. Many packages of interest are hosted on the Python Package Index (***PyPI***) and can be installed into your environment using ***pip***. The Anaconda distribution in the pre-work includes a number of packages that are useful in data science, so you should have most of them installed already.

ref:  https://pypi.python.org/pypi

A note on namespaces when importing - there are a few different ways to import:

    1. import numpy

    - everything numpy exposes (functions, classes, submodules) is accessible through the numpy namespace
    - e.g. numpy.array([1,2,3])


    2. import numpy as np

    - same as 1 except an alias 'np' is created for the namespace instead
    - e.g. np.array([1,2,3])


    3. from numpy import *

    - adds all public names from numpy to the global namespace
    - e.g. array([1,2,3])
    - Note: This can be dangerous. If two modules define the same name, then whatever is imported last will overwrite what came before it, i.e. naming collision -> overwriting!
    
    4. from numpy import array
    
    - imports only the indicated names into the global namespace
    - e.g. array([1,2,3])
    - Note: can be ok since you are being explicit
    
    We will generally use 2, and 4 sparingly.
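
To make the naming-collision risk in option 3 concrete, here is a minimal sketch. It uses explicit imports of `sqrt` from the standard-library `math` module and from `numpy` (both chosen here purely for illustration) so the effect is easy to see; a star import causes the same overwrite silently, for every public name at once.

```python
from math import sqrt        # sqrt now refers to math.sqrt
first = sqrt

from numpy import sqrt       # the same name is rebound to numpy.sqrt
second = sqrt

# The later import silently replaced the earlier one.
print(first is second)            # False: two different functions shared one name
print(second([1.0, 4.0, 9.0]))    # numpy.sqrt works element-wise on a list

# math.sqrt only accepts a single number, so the same call fails with it.
try:
    first([1.0, 4.0, 9.0])
    math_accepts_list = True
except TypeError:
    math_accepts_list = False
print(math_accepts_list)          # False
```

Code that was written against one `sqrt` can therefore change behavior just because an import line was added later, which is why the aliased form `import numpy as np` is the safer default.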

In [None]:
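# NumPy is the core package for numerical computing in Python. It provides a fast
# n-dimensional array type (ndarray) plus vectorized math and linear algebra
# routines, so we use it for the vectors and matrices in this section.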

In [None]:
import numpy

import numpy as np

from numpy import absolute

# The next one is dangerous to do, and not recommended 
# except in cases where you know why you're using it
# from numpy import *

Now we can do the same thing three ways

In [None]:
# print() with parentheses runs on both Python 2 and Python 3
print(numpy.absolute(-10))
print(np.absolute(-10))
print(absolute(-10))

How can we see what is inside a module?

dir() is a built-in function that lists the names defined in an object, such as a module.

In [None]:
# In Jupyter, typing np. and then pressing Tab shows a similar list (Jupyter shortcuts!)
dir(np)

In [None]:
# In Jupyter, typing np.absolute() and then pressing Shift + Tab shows the same help text (Jupyter shortcuts!)
help(np.absolute)

In [None]:
vector = np.array([1, 2, 1])

In [None]:
data = np.array([[1, 2, 3],[2, 4, 9]])
data

In [None]:
data[0]  # first row

In [None]:
data[ : , 1]  # all rows, second column

NumPy lets us perform matrix operations, such as multiplying a matrix by a vector:

In [None]:
np.dot(data, vector)

We can transpose an array

In [None]:
print(data)
print(data.T)
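
To spell out what the product and the transpose are doing to the shapes, here is a small self-contained sketch. It rebuilds the same `data` and `vector` arrays used above; the name `gram` is just an illustrative label for the matrix-times-transpose result.

```python
import numpy as np

data = np.array([[1, 2, 3], [2, 4, 9]])   # shape (2, 3)
vector = np.array([1, 2, 1])              # shape (3,)

# Matrix-vector product: each row of data is dotted with vector.
product = np.dot(data, vector)
print(product)            # [ 8 19]

# Transposing swaps the axes, so a (2, 3) array becomes (3, 2).
print(data.T.shape)       # (3, 2)

# A (2, 3) matrix times its (3, 2) transpose gives a (2, 2) matrix.
gram = np.dot(data, data.T)
print(gram)
```

The inner dimensions must agree: `np.dot(data, vector)` works because `data` has 3 columns and `vector` has 3 entries.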

### Other numpy functions to demonstrate:
- np.linalg.inv
- np.random.normal
- np.random.rand
- np.random.shuffle
- mean
- var
- shape
- reshape
- arange
- astype
- hstack
- vstack

For worked examples, have a look here:
https://github.com/ga-students/DAT_SF_16/blob/master/labs/lab_03_instructor.ipynb
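
As a rough tour of the functions listed above, the following sketch touches each one once. The array sizes, the seed, and the 2x2 matrix `A` are arbitrary choices made for illustration, not values taken from the linked lab.

```python
import numpy as np

np.random.seed(0)  # fixed seed (an arbitrary choice) so reruns match

a = np.arange(6)                 # the integers 0 through 5
m = a.reshape(2, 3)              # same values arranged as 2 rows x 3 columns
print(m.shape)                   # (2, 3)
print(m.astype(float).dtype)     # float64

# Stacking: hstack joins side by side, vstack joins top to bottom.
wide = np.hstack([m, m])         # shape (2, 6)
tall = np.vstack([m, m])         # shape (4, 3)

# Summary statistics, over everything or along one axis.
print(m.mean())                  # 2.5
overall_var = m.var()            # population variance (divides by n)
col_means = m.mean(axis=0)       # one mean per column

# Random numbers: uniform on [0, 1) and normal with given mean and std.
uniform = np.random.rand(3, 2)
normal = np.random.normal(0, 0.1, 1000)

# shuffle reorders an array in place and returns None.
deck = np.arange(10)
np.random.shuffle(deck)

# Matrix inverse: inv(A) times A should be (numerically) the identity.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
A_inv = np.linalg.inv(A)
print(np.allclose(np.dot(A_inv, A), np.eye(2)))   # True
```

Note that `np.linalg.inv` raises an error for singular matrices, so it is only appropriate when the matrix is known to be invertible.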

### Pandas (python package)

Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive.

Pandas is great for tabular/indexed data.

In [None]:
# NOTE: you should normally put all your imports at the top of the file
import pandas as pd

In [None]:
data = pd.read_csv('../data/nytimes.csv')

In [None]:
# Note here we're calling the head method on the dataframe to return the 'head' of the 
# dataframe, in this case the first 4 lines.
# head() returns a new DataFrame object and leaves data itself unchanged; this is important later in the course!
data.head(4)

In [None]:
data.info()

In [None]:
data.describe()

In [None]:
# Function that groups users by age.
def map_age_category(x):
    if x < 18:
        return '1'
    elif x < 25:
        return '2'
    elif x < 32:
        return '3'
    elif x < 45:
        return '4'
    else:
        return '5'

data['age_categories'] = data['Age'].apply(map_age_category)

In [None]:
data.head()
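
The hand-written binning function above can also be expressed with `pd.cut`. The sketch below is one possible alternative, not a replacement for the approach in this notebook; the `ages` Series is hypothetical data chosen to hit every boundary, and the function is repeated only so the example runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical ages, chosen to land on each boundary of the function.
ages = pd.Series([10, 18, 24, 25, 31, 32, 44, 45, 90])

def map_age_category(x):
    if x < 18:
        return '1'
    elif x < 25:
        return '2'
    elif x < 32:
        return '3'
    elif x < 45:
        return '4'
    else:
        return '5'

bins = [-np.inf, 18, 25, 32, 45, np.inf]
labels = ['1', '2', '3', '4', '5']

# right=False makes each bin include its left edge and exclude its right
# edge, which matches the strict "<" comparisons in the function.
binned = pd.cut(ages, bins=bins, labels=labels, right=False).astype(str)
applied = ages.apply(map_age_category)

print((binned == applied).all())   # True
```

`pd.cut` keeps the bin edges in one list, which makes them easier to change than a chain of `elif` branches.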

### More pandas material to add
- show them that they can do the same things they were doing in numpy (mean, var) in pandas
- show them how to use the .values attribute
- show them indexing
- Basically go through most of the examples covered here:
http://pandas.pydata.org/pandas-docs/stable/10min.html

- give them an example, then ask them to do something similar
- At least cover these:
- merge, join, concat, append
- .ix, .loc, .iloc
- stack

Add exercises for them to do
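
As a starting point for the topics listed above, here is a hedged sketch on two small made-up tables (`users` and `clicks` are hypothetical, not part of the course data). It leaves out `.ix` and `DataFrame.append`, since both were removed in later pandas releases; `.loc`/`.iloc` and `pd.concat` cover the same ground.

```python
import pandas as pd

# Small made-up tables, used only for illustration.
users = pd.DataFrame({'user_id': [1, 2, 3], 'age': [22, 35, 58]},
                     columns=['user_id', 'age'], index=['a', 'b', 'c'])
clicks = pd.DataFrame({'user_id': [1, 2, 4], 'clicks': [3, 0, 7]},
                      columns=['user_id', 'clicks'])

# mean and var work as in numpy, but pandas var defaults to the
# sample variance (ddof=1) while numpy defaults to ddof=0.
age_mean = users['age'].mean()
age_var = users['age'].var()

# .values exposes the underlying numpy array.
age_array = users['age'].values

# Label-based vs position-based indexing.
by_label = users.loc['b', 'age']      # row labelled 'b', column 'age'
by_position = users.iloc[1, 1]        # second row, second column

# merge joins on a shared column (an inner join by default).
merged = pd.merge(users, clicks, on='user_id')

# concat stacks tables on top of each other.
stacked_rows = pd.concat([users, users])

# stack moves the columns into an inner index level, giving a long Series.
long_form = users.stack()

print(merged)
print(long_form)
```

A natural follow-up exercise is to repeat the merge with `how='outer'` and see which rows gain missing values.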

## Plotting!

In [None]:
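# NOTE: pandas.io.data was removed from later pandas releases; its remote
# data readers now live in the separate pandas-datareader package.
import pandas.io.data
import datetime
import matplotlib.pyplot as plt

%matplotlib inline

# 1000 draws from a normal distribution with mean mu and standard deviation sigma
mu, sigma = 0, 0.1
normal_dist = np.random.normal(mu, sigma, 1000)

# NOTE: despite the variable name aapl, the ticker requested here is 'FB'
aapl = pd.io.data.get_data_yahoo('FB', 
                                 start=datetime.datetime(2015, 4, 1), 
                                 end=datetime.datetime(2015, 4, 28))
aapl.head()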

## Matplotlib

Matplotlib is a standard, granular library for building visualizations. Although tried and true, it can be cumbersome compared to higher-level packages such as Seaborn or Bokeh. Note that several Python visualization tools, including Seaborn and the pandas plot method, use matplotlib as their base; Bokeh does not.

In [None]:
fig = plt.figure(figsize=(20,16))

ax = fig.add_subplot(2,2,1)
ax.plot(aapl.index, aapl['Close'])
ax.set_title('Line plots', size=24)

ax = fig.add_subplot(2,2,2)
ax.plot(aapl['Close'], 'o')
ax.set_title('Scatter plots', size=24)

ax = fig.add_subplot(2,2,3)
ax.hist(normal_dist, bins=50)
ax.set_title('Histograms', size=24)
ax.set_ylabel('count', size=16)  # counts are on the y-axis of a histogram

ax = fig.add_subplot(2,2,4)
ax.boxplot(normal_dist)
ax.set_title('Boxplots', size=24)
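
Because the cell above depends on downloaded stock data, here is a self-contained sketch of the same four plot types on synthetic data. The random-walk `close` series and the seed are assumptions made for illustration, and `plt.subplots` is used as a shorter alternative to repeated `fig.add_subplot` calls.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-ins (assumptions) for the downloaded price data.
np.random.seed(0)
days = np.arange(20)
close = 80 + np.cumsum(np.random.normal(0, 1, 20))   # a random-walk "price"
normal_dist = np.random.normal(0, 0.1, 1000)

# plt.subplots creates the figure and a 2x2 grid of axes in one call.
fig, axes = plt.subplots(2, 2, figsize=(10, 8))

axes[0, 0].plot(days, close)
axes[0, 0].set_title('Line plot')

axes[0, 1].plot(days, close, 'o')
axes[0, 1].set_title('Scatter plot')

axes[1, 0].hist(normal_dist, bins=50)
axes[1, 0].set_title('Histogram')
axes[1, 0].set_ylabel('count')

axes[1, 1].boxplot(normal_dist)
axes[1, 1].set_title('Boxplot')

fig.tight_layout()
```

Indexing `axes[row, col]` makes it explicit which panel each call draws on, which is easy to lose track of when the variable `ax` is reassigned for every subplot.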

### Bokeh
To install Bokeh, go to a terminal and type:

`conda install bokeh` 

Bokeh is built by the same people that created Anaconda (Continuum Analytics) and is designed out of the box for web display, which makes it well suited to creating presentation-ready, interactive visuals quickly. Labs in this course will be shown in Bokeh. Check out http://bokeh.pydata.org/en/latest/docs/quickstart.html#concepts to see some of the range of capabilities.

In [None]:
from bokeh.plotting import figure, output_notebook, show
output_notebook()

In [None]:
# prepare some data
x = aapl.Low
y = aapl['High']

# create a new plot with a title and axis labels
p = figure(title="Stock High vs. Low", x_axis_label='Low', y_axis_label='High')

# add a circle renderer with a legend entry and outline thickness
p.circle(x, y, legend="High vs. Low", line_width=2)

# show the results
show(p)

In [None]:
x = aapl.index
y = aapl.Close
# Note that x_axis_type is declared so the x-axis is formatted as dates.
# The y-axis shows both Open and Close, so it is labelled 'Price'.
p = figure(title="Stock Open & Close over time", x_axis_label='Date',
           y_axis_label='Price', x_axis_type="datetime")
p.square(x, y, legend="Close")
p.circle(x, aapl.Open, legend='Open', color='red')
# show the results
show(p)

## Pandas Plotting!

The plot method is a great, quick way to visualize your dataframes. Select the columns you care to view and call .plot() on the result; by default you get a line chart of each column against the index.

We will be revisiting this so just take a second to appreciate what can be done with one line of code.

In [None]:
aapl[['Open','Close']].plot()

In [None]:
aapl[['High','Low','Open','Close']].plot(kind='box')
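
The same one-line plotting can be tried without the downloaded data. This is a minimal sketch on a synthetic `prices` dataframe (the column values, dates, and seed are made up for illustration); it also shows that `.plot()` returns a matplotlib Axes object that can be customized further.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
dates = pd.date_range('2015-04-01', periods=20, freq='D')

# Synthetic random-walk prices, used only as a stand-in for real data.
prices = pd.DataFrame({
    'Open': 80 + np.cumsum(np.random.normal(0, 1, 20)),
    'Close': 80 + np.cumsum(np.random.normal(0, 1, 20)),
}, index=dates)

# Line chart of both columns against the date index.
ax = prices[['Open', 'Close']].plot(title='Open and Close (synthetic)')
ax.set_ylabel('price')     # the returned Axes accepts normal matplotlib calls

# The kind argument switches the chart type, here to one box per column.
box_ax = prices.plot(kind='box')
```

Because the result is an ordinary Axes, anything from the Matplotlib section (titles, labels, sizes) can be layered on top of the quick pandas plot.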