# Agenda
* Numpy
* Pandas
* Lab

# Introduction


## Create a new notebook for your code-along:

From your submission directory, type:
    
    jupyter notebook

From the Jupyter dashboard, open a new Python notebook.
Change the title to: "Numpy and Pandas"

# Introduction to Numpy

More info: [http://wiki.scipy.org/Tentative_NumPy_Tutorial](http://wiki.scipy.org/Tentative_NumPy_Tutorial)

## Numpy Overview

* Why Python for Data? Numpy brings *decades* of C math into Python!
* Numpy wraps extensive, highly optimized C/C++/Fortran codebases, so numerical routines run as compiled code rather than as interpreted Python loops
* The `ndarray` type supports vectorized math (operations on whole arrays without explicit loops) and broadcasting (arithmetic between arrays of different shapes)
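As a small illustration of broadcasting (the array values are arbitrary examples), a 1-D array of shape `(3,)` can be added to every row of a `(2, 3)` array without writing a loop:

```python
import numpy as np

# A 2x3 matrix and a length-3 vector (arbitrary example values)
m = np.array([[1, 2, 3],
              [4, 5, 6]])
v = np.array([10, 20, 30])

# Broadcasting stretches v across both rows of m, so no loop is needed
result = m + v
print(result)
# [[11 22 33]
#  [14 25 36]]

# A scalar broadcasts to every element
doubled = m * 2
print(doubled)
```

The trailing dimensions of `m` and `v` match (both 3), which is what lets NumPy line them up.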

In [None]:
import numpy as np

### Creating ndarrays

An array object represents a multidimensional, homogeneous array of fixed-size items. 
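"Homogeneous" means every element shares one `dtype`, and "fixed-size" means each element occupies the same number of bytes. A quick way to see both, using small example arrays:

```python
import numpy as np

# Every element of an ndarray shares a single dtype
ints = np.array([1, 2, 3])
mixed = np.array([1, 2, 3.5])   # mixing ints and floats upcasts everything to float

# The exact integer dtype is platform-dependent (often int64)
print(ints.dtype, mixed.dtype)
print(mixed)

# shape, ndim and size describe the array's layout
grid = np.zeros((2, 3))
print(grid.shape, grid.ndim, grid.size)   # (2, 3) 2 6
print(grid.itemsize)                      # bytes per element: 8 for float64
```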

In [None]:
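# Creating arrays
a = np.zeros(3)                          # 1-D array of three 0.0 values
b = np.ones((2, 3))                      # 2x3 array of 1.0 values
c = np.random.randint(1, 10, (2, 3, 4))  # 2x3x4 array of random integers from 1 to 9
d = np.arange(0, 11, 1)                  # integers 0 through 10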

What are these functions? In Jupyter, append `?` to a name to view its documentation:

    np.arange?
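Both `np.arange` and `np.random.randint` treat their upper bound as exclusive, just like Python's built-in `range`. A quick check:

```python
import numpy as np

# arange(start, stop, step): stop is excluded, like range()
d = np.arange(0, 11, 1)
print(d)          # [ 0  1  2  3  4  5  6  7  8  9 10]
print(d.max())    # 10

# randint(low, high, size): values fall in [low, high), so 10 never appears here
rand_values = np.random.randint(1, 10, (2, 3, 4))
print(rand_values.shape)                              # (2, 3, 4)
print(rand_values.min() >= 1, rand_values.max() <= 9)  # True True
```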

In [None]:
# Note the way each array is printed:
print(a)
print(b)
print(c)
print(d)

### Arithmetic on arrays is elementwise

In [None]:
a = np.array([20, 30, 40, 50])
b = np.arange(4)
b

In [None]:
c = a-b
c

In [None]:
b**2
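Because `*` is also elementwise, it does not compute a dot product. The sketch below, reusing the same `a` and `b`, contrasts the two and shows that comparisons are elementwise as well:

```python
import numpy as np

a = np.array([20, 30, 40, 50])
b = np.arange(4)          # [0 1 2 3]

# * multiplies element by element
print(a * b)              # [  0  30  80 150]

# A dot product (sum of the elementwise products) needs np.dot or the @ operator
print(a @ b)              # 260
print(np.dot(a, b))       # 260

# Comparisons are elementwise too and return a boolean array
print(a < 35)             # [ True  True False False]
```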

### Indexing, Slicing and Iterating

In [None]:
# one-dimensional arrays work like lists:
a = np.arange(10)**2

In [None]:
a

In [None]:
a[2:5]
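Slices also accept a step and negative indices. One important difference from Python lists: a NumPy slice is a *view* of the original data, so writing through it modifies the original array. A short demonstration:

```python
import numpy as np

a = np.arange(10) ** 2    # [ 0  1  4  9 16 25 36 49 64 81]

print(a[2:5])     # [ 4  9 16]
print(a[::2])     # every second element: [ 0  4 16 36 64]
print(a[::-1])    # reversed
print(a[-1])      # last element: 81

# Unlike list slices, NumPy slices are views: writing through them changes a
s = a[2:5]
s[0] = -1
print(a[2])       # -1

# Use .copy() when you need an independent array
t = a[5:8].copy()
t[0] = 0
print(a[5])       # 25 (unchanged)
```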

In [None]:
# Multidimensional arrays are indexed with a comma-separated tuple,
# in (row, column) order; as always in Python, indices start at 0

In [None]:
b = np.random.randint(1,100,(4,4))

In [None]:
b

In [None]:
# Guess the output
print(b[2,3])
print(b[0,0])


In [None]:
print(b[0:3, 1])   # rows 0-2 of column 1
print(b[:, 1:3])   # all rows, columns 1-2

In [None]:
b[1:3,:]
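The section title also mentions iterating. Looping over a 2-D array yields one row at a time; `.flat` walks every element. This sketch uses a deterministic `np.arange(12).reshape(3, 4)` instead of random values so the results are predictable:

```python
import numpy as np

b = np.arange(12).reshape(3, 4)   # deterministic 3x4 example

# Iterating over a 2-D array yields its rows (the first axis)
for row in b:
    print(row)

# b.flat iterates over every element in row-major order
total = 0
for value in b.flat:
    total += value
print(total)          # 66, the same as b.sum()

# To iterate over columns, transpose first
for col in b.T:
    print(col)
```

In practice, a vectorized call such as `b.sum()` is preferred over an explicit Python loop.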

# Introduction to Pandas

## Pandas Overview

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

### Import a dataset

In [None]:
passengers = pd.read_csv("data/air-passenger-arrivals-by-country-of-embarkation.csv")

### View the first five rows

In [None]:
passengers.head()

### View the last seven rows

In [None]:
passengers.tail(7)

### Number of rows in the dataset

In [None]:
len(passengers)

### Non-null count for each column in the dataset

In [None]:
passengers.count()
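`count()` reports the number of non-null values per column, while `len()` counts all rows, so the two differ whenever a column has missing values. A small sketch on a made-up DataFrame (the column names and numbers are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in "value"
df = pd.DataFrame({
    "country": ["A", "B", "C"],
    "value": [100.0, np.nan, 300.0],
})

print(len(df))                     # 3 rows in total
print(df.count())                  # country 3, value 2: NaN is not counted
print(df["value"].isna().sum())    # 1 missing value
```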

### Statistics for the dataset

In [None]:
passengers.describe()

### Datatypes for each column, automatically inferred by Pandas

In [None]:
passengers.dtypes

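The `value` column should be numeric, but missing values are recorded as the strings "na" or "-". Pandas does not treat those strings as nulls by default, so the column is read as `object` (strings).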

### Re-importing the data, specifying null values and column names

In [None]:
passengers = pd.read_csv("data/air-passenger-arrivals-by-country-of-embarkation.csv", na_values=["na", "-"],
                        names=["year-month", "total", "region", "country", "value"], header=0)
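To see what `na_values`, `names` and `header=0` do without the real file, here is a sketch that parses a tiny in-memory CSV. The original header names and all numbers are made up for illustration:

```python
import io
import pandas as pd

# A tiny made-up CSV in the same shape as the dataset (hypothetical headers and values)
raw = io.StringIO(
    "month,level_1,level_2,level_3,value\n"
    "2017-01,Total,Asia,Japan,1000\n"
    "2017-01,Total,Asia,Korea,na\n"
    "2017-02,Total,Europe,France,-\n"
)

# Without na_values, "na" and "-" are plain strings, so the column is read as object
before = pd.read_csv(raw)
print(before["value"].dtype)   # object

raw.seek(0)   # rewind the buffer before parsing it again
# na_values turns those markers into NaN; names + header=0 replaces the file's header row
after = pd.read_csv(raw, na_values=["na", "-"],
                    names=["year-month", "total", "region", "country", "value"], header=0)
print(after["value"].dtype)          # float64
print(after["value"].isna().sum())   # 2
```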

### By default, `describe()` shows statistics only for numeric columns

In [None]:
passengers.describe()

### Explicitly describing all columns

In [None]:
passengers.describe(include='all')

In [None]:
passengers["country"].value_counts()

### Total passenger arrivals by country

In [None]:
passengers.groupby("country")["value"].sum()

### Sort the totals in descending order

In [None]:
passengers.groupby("country")["value"].sum().sort_values(ascending=False)
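The chain above splits rows by country, sums `value` within each group, and orders the result. The same steps on a few hypothetical rows (the countries and numbers are invented) make each stage easy to check:

```python
import pandas as pd

# Hypothetical rows in the same layout as the cleaned passengers DataFrame
df = pd.DataFrame({
    "year-month": ["2017-01", "2017-01", "2017-02", "2017-02"],
    "country": ["Japan", "France", "Japan", "France"],
    "value": [500.0, 200.0, 700.0, 100.0],
})

# groupby splits rows by country, ["value"] selects the column, sum() aggregates each group
totals = df.groupby("country")["value"].sum()

# sort_values(ascending=False) puts the largest total first
ranked = totals.sort_values(ascending=False)
print(ranked)   # Japan 1200.0, then France 300.0

# nlargest(n) gives the same top-n result as sort_values(ascending=False).head(n)
print(totals.nlargest(1))
```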

### Filter rows where year-month is '2017-01'

In [None]:
passengers[passengers["year-month"] == "2017-01"]

### Save the filtered rows and sort them by value

In [None]:
passengers_201701 = passengers[passengers["year-month"] == "2017-01"]

In [None]:
passengers_201701.sort_values(by="value", ascending=False)
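Filtering works by building a boolean mask and indexing with it. The sketch below does the filter and sort in one chain with `.loc`, and shows how to combine conditions; the rows are hypothetical:

```python
import pandas as pd

# Hypothetical rows in the same layout as the cleaned passengers DataFrame
df = pd.DataFrame({
    "year-month": ["2017-01", "2017-01", "2017-02", "2017-02"],
    "country": ["Japan", "France", "Japan", "France"],
    "value": [500.0, 200.0, 700.0, 100.0],
})

# The boolean mask is True for rows whose year-month equals "2017-01"
mask = df["year-month"] == "2017-01"

# .loc[mask] keeps those rows; sort_values then orders them, largest value first
jan = df.loc[mask].sort_values(by="value", ascending=False)
print(jan)

# Masks combine with & (and) or | (or); each condition needs its own parentheses
big_jan = df.loc[(df["year-month"] == "2017-01") & (df["value"] > 300)]
print(big_jan)
```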

### Which five year-months have the highest total arrivals?

In [None]:
passengers.groupby("year-month")["value"].sum().sort_values(ascending=False).head(5)

### Which countries have the most months with more than 100,000 arrivals?

In [None]:
passengers[passengers.value > 100000]["country"].value_counts()

### Adding columns for year and month

In [None]:
passengers["year"] = passengers["year-month"].apply(lambda x: int(x.split("-")[0]))

In [None]:
passengers["month"] = passengers["year-month"].apply(lambda x: int(x.split("-")[1]))
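The `apply` with a `lambda` runs Python code once per row. Pandas also offers vectorized string and date accessors that produce the same columns; a sketch on a few example values:

```python
import pandas as pd

df = pd.DataFrame({"year-month": ["2016-12", "2017-01", "2017-02"]})

# .str.split(..., expand=True) returns one column per piece, no Python-level lambda needed
parts = df["year-month"].str.split("-", expand=True)
df["year"] = parts[0].astype(int)
df["month"] = parts[1].astype(int)

# Alternatively, parse to datetimes and use the .dt accessor
dates = pd.to_datetime(df["year-month"], format="%Y-%m")
df["month_from_dt"] = dates.dt.month
print(df)
```

Parsing to datetimes is handy if you later want date arithmetic or resampling; the string split is the more direct translation of the lambda version.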

In [None]:
passengers.head()

### Average number of passenger arrivals by month

In [None]:
passengers.groupby("month")["value"].mean()

### Sort descending by value

In [None]:
passengers.groupby("month")["value"].mean().sort_values(ascending=False)

# Next Steps

**Recommended Resources**

Name | Description
--- | ---
[Official Pandas Tutorials](http://pandas.pydata.org/pandas-docs/stable/10min.html) | Wes & Company's selection of tutorials and lectures
[Julia Evans Pandas Cookbook](https://github.com/jvns/pandas-cookbook) | Great resource with examples from weather, bikes and 311 calls
[Learn Pandas Tutorials](https://bitbucket.org/hrojas/learn-pandas) | A great series of Pandas tutorials from Hernan Rojas
[Research Computing Python Data PYNBs](https://github.com/ResearchComputing/Meetup-Fall-2013/tree/master/python) | A super awesome set of python notebooks from a meetup-based course exclusively devoted to pandas