# Agenda
* Numpy
* Pandas
* Lab


# Introduction


## Create a new notebook for your code-along:

From our submission directory, type:
    
    jupyter notebook

From the IPython Dashboard, open a new notebook.
Change the title to: "Numpy and Pandas"

# Introduction to Numpy

* Overview
* ndarray
* Indexing and Slicing

More info: [http://wiki.scipy.org/Tentative_NumPy_Tutorial](http://wiki.scipy.org/Tentative_NumPy_Tutorial)


## Numpy Overview

* Why Python for Data? Numpy brings *decades* of C math into Python!
* Numpy provides a wrapper for extensive C/C++/Fortran codebases, used for data analysis functionality
* The ndarray allows easy vectorized math and broadcasting (i.e. applying operations across arrays of different but compatible shapes)
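
To make "broadcasting" concrete, here is a minimal sketch. The array values are made up for illustration; the point is only that NumPy stretches the smaller operand across the larger one when the shapes are compatible, so no explicit loop is needed.

```python
import numpy as np

# Illustrative data: a 2x3 matrix and a length-3 row vector
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
row = np.array([10, 20, 30])

# The (3,) row is broadcast across both rows of the (2, 3) matrix
shifted = matrix + row
print(shifted)

# A scalar is broadcast to every element
doubled = matrix * 2
print(doubled)
```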

In [1]:
%matplotlib inline
import numpy as np

### Creating ndarrays

An array object represents a multidimensional, homogeneous array of fixed-size items. 

In [2]:
# Creating arrays
a = np.zeros((3))
b = np.ones((2,3))
c = np.random.randint(1,10,(2,3,4))
d = np.arange(0,11,1)

What do these functions do? In a notebook, append `?` to a name to see its documentation:

    np.arange?

In [6]:
# Note the way each array is printed:
a,b,c,d

(array([ 0.,  0.,  0.]), array([[ 1.,  1.,  1.],
        [ 1.,  1.,  1.]]), array([[[4, 8, 5, 2],
         [4, 2, 3, 4],
         [2, 8, 7, 5]],
 
        [[8, 9, 3, 9],
         [7, 1, 6, 8],
         [8, 5, 8, 8]]]), array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10]))
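
The printed form hints at each array's dimensions, but it can be easier to ask the array directly. The sketch below rebuilds the same four arrays and reports their `shape`, `ndim` and `dtype` attributes; the loop and variable names are only for illustration.

```python
import numpy as np

a = np.zeros(3)
b = np.ones((2, 3))
c = np.random.randint(1, 10, (2, 3, 4))  # upper bound 10 is exclusive
d = np.arange(0, 11, 1)

for name, arr in [("a", a), ("b", b), ("c", c), ("d", d)]:
    # shape: size along each axis; ndim: number of axes; dtype: element type
    print(name, arr.shape, arr.ndim, arr.dtype)
```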

## Arithmetic in arrays is element-wise

In [11]:
a = np.array([20, 30, 40, 50])
b = np.arange(4)
print(a, b)

[20 30 40 50] [0 1 2 3]


In [12]:
c = a-b
c

array([20, 29, 38, 47])

In [13]:
b**2

array([0, 1, 4, 9])
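
Multiplication and comparison follow the same element-wise rule, which is worth contrasting with the dot product. A small sketch reusing the same `a` and `b`:

```python
import numpy as np

a = np.array([20, 30, 40, 50])
b = np.arange(4)

elementwise = a * b    # multiplies matching positions
dot = a.dot(b)         # sums those products into a single number
below_35 = a < 35      # comparisons are element-wise too, giving booleans

print(elementwise)
print(dot)
print(below_35)
```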

## Indexing, Slicing and Iterating

In [14]:
# one-dimensional arrays work like lists:
a = np.arange(10)**2

In [15]:
a

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

In [16]:
a[2:5]

array([ 4,  9, 16])

In [17]:
# Multidimensional arrays use tuples with commas for indexing
# with (row,column) conventions beginning, as always in Python, from 0

In [20]:
b = np.random.randint(1,100,(4,4))

In [21]:
b

array([[99, 37, 51,  6],
       [62, 41, 13,  1],
       [14, 27, 87, 69],
       [29, 48,  4, 58]])

In [22]:
# Guess the output
print(b[2,3])
print(b[0,0])


69
99


In [24]:
b[0:3,1],b[:,1]
# Note: a slice start:stop excludes stop, so 0:3 selects rows 0, 1 and 2, while a single index n selects the row at position n, counting from 0.

(array([37, 41, 27]), array([37, 41, 27, 48]))

In [25]:
b[1:3,:]

array([[62, 41, 13,  1],
       [14, 27, 87, 69]])
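
Two related points are easy to miss when slicing: a basic slice is a view onto the original array, not a copy, and a boolean array can be used as an index. The sketch below uses a small deterministic array instead of random values so the behavior is reproducible.

```python
import numpy as np

b = np.arange(16).reshape(4, 4)

view = b[1:3, :]         # basic slicing returns a view onto b
view[0, 0] = -1          # so this write also changes b[1, 0]
print(b[1, 0])

independent = b[1:3, :].copy()   # .copy() allocates separate storage
independent[0, 0] = 99           # b is not affected by this write
print(b[1, 0])

evens = b[b % 2 == 0]    # boolean mask: matching elements as a 1-D array
print(evens)
```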

# Introduction to Pandas

* Object Creation
* Viewing data
* Selection
* Missing data
* Grouping
* Reshaping
* Time series
* Plotting
* i/o
 

_pandas.pydata.org_

## Pandas Overview

_Source: [pandas.pydata.org](http://pandas.pydata.org/pandas-docs/stable/10min.html)_

In [26]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [27]:
dates = pd.date_range('20140101',periods=6)
dates

DatetimeIndex(['2014-01-01', '2014-01-02', '2014-01-03', '2014-01-04',
               '2014-01-05', '2014-01-06'],
              dtype='datetime64[ns]', freq='D')

In [31]:
df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))
# z has the same index and columns as df but no data, so every cell is NaN
z = pd.DataFrame(index = df.index, columns = df.columns)
print(z)
print(df)

              A    B    C    D
2014-01-01  NaN  NaN  NaN  NaN
2014-01-02  NaN  NaN  NaN  NaN
2014-01-03  NaN  NaN  NaN  NaN
2014-01-04  NaN  NaN  NaN  NaN
2014-01-05  NaN  NaN  NaN  NaN
2014-01-06  NaN  NaN  NaN  NaN
                   A         B         C         D
2014-01-01  0.579888 -0.221338  1.006163  0.965058
2014-01-02 -0.706283  0.652975  0.264237 -0.488835
2014-01-03 -1.082676 -1.939761  1.036881 -1.291938
2014-01-04  0.841728 -0.780663  0.327462  0.712852
2014-01-05  1.290401 -0.617174  0.494087 -1.540861
2014-01-06  0.183026 -0.858343  1.541798  1.509804


In [33]:
# Transpose: rows become columns and columns become rows
df.T

| | 2014-01-01 00:00:00 | 2014-01-02 00:00:00 | 2014-01-03 00:00:00 | 2014-01-04 00:00:00 | 2014-01-05 00:00:00 | 2014-01-06 00:00:00 |
| --- | --- | --- | --- | --- | --- | --- |
| A | 0.579888 | -0.706283 | -1.082676 | 0.841728 | 1.290401 | 0.183026 |
| B | -0.221338 | 0.652975 | -1.939761 | -0.780663 | -0.617174 | -0.858343 |
| C | 1.006163 | 0.264237 | 1.036881 | 0.327462 | 0.494087 | 1.541798 |
| D | 0.965058 | -0.488835 | -1.291938 | 0.712852 | -1.540861 | 1.509804 |


In [36]:
df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': 'foo'})

# Same as df2 but without column C; the length-4 array in D still sets the number of rows
df3 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20130102'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': 'foo'})

print(df2)
print(df3)

     A          B    C  D    E
0  1.0 2013-01-02  1.0  3  foo
1  1.0 2013-01-02  1.0  3  foo
2  1.0 2013-01-02  1.0  3  foo
3  1.0 2013-01-02  1.0  3  foo
     A          B  D    E
0  1.0 2013-01-02  3  foo
1  1.0 2013-01-02  3  foo
2  1.0 2013-01-02  3  foo
3  1.0 2013-01-02  3  foo


In [37]:
# With specific dtypes
df2.dtypes

A           float64
B    datetime64[ns]
C           float32
D             int32
E            object
dtype: object

#### Viewing Data

In [38]:
df.head()

| | A | B | C | D |
| --- | --- | --- | --- | --- |
| 2014-01-01 | 0.579888 | -0.221338 | 1.006163 | 0.965058 |
| 2014-01-02 | -0.706283 | 0.652975 | 0.264237 | -0.488835 |
| 2014-01-03 | -1.082676 | -1.939761 | 1.036881 | -1.291938 |
| 2014-01-04 | 0.841728 | -0.780663 | 0.327462 | 0.712852 |
| 2014-01-05 | 1.290401 | -0.617174 | 0.494087 | -1.540861 |


In [39]:
df.tail()

| | A | B | C | D |
| --- | --- | --- | --- | --- |
| 2014-01-02 | -0.706283 | 0.652975 | 0.264237 | -0.488835 |
| 2014-01-03 | -1.082676 | -1.939761 | 1.036881 | -1.291938 |
| 2014-01-04 | 0.841728 | -0.780663 | 0.327462 | 0.712852 |
| 2014-01-05 | 1.290401 | -0.617174 | 0.494087 | -1.540861 |
| 2014-01-06 | 0.183026 | -0.858343 | 1.541798 | 1.509804 |


In [40]:
df.index

DatetimeIndex(['2014-01-01', '2014-01-02', '2014-01-03', '2014-01-04',
               '2014-01-05', '2014-01-06'],
              dtype='datetime64[ns]', freq='D')

In [41]:
df.describe()

| | A | B | C | D |
| --- | --- | --- | --- | --- |
| count | 6.0 | 6.0 | 6.0 | 6.0 |
| mean | 0.184347 | -0.627384 | 0.778438 | -0.02232 |
| std | 0.917663 | 0.848936 | 0.500001 | 1.264779 |
| min | -1.082676 | -1.939761 | 0.264237 | -1.540861 |
| 25% | -0.483955 | -0.838923 | 0.369118 | -1.091162 |
| 50% | 0.381457 | -0.698919 | 0.750125 | 0.112009 |
| 75% | 0.776268 | -0.320297 | 1.029202 | 0.902007 |
| max | 1.290401 | 0.652975 | 1.541798 | 1.509804 |


In [46]:
print(df)
print(df.sort_values(by='B'))
df2 = df.sort_values(by='B')
print(df2)
# Note: df.sort_values(by='B') sorts by column B and returns a new DataFrame. The original df is unchanged.

                   A         B         C         D
2014-01-01  0.579888 -0.221338  1.006163  0.965058
2014-01-02 -0.706283  0.652975  0.264237 -0.488835
2014-01-03 -1.082676 -1.939761  1.036881 -1.291938
2014-01-04  0.841728 -0.780663  0.327462  0.712852
2014-01-05  1.290401 -0.617174  0.494087 -1.540861
2014-01-06  0.183026 -0.858343  1.541798  1.509804
                   A         B         C         D
2014-01-03 -1.082676 -1.939761  1.036881 -1.291938
2014-01-06  0.183026 -0.858343  1.541798  1.509804
2014-01-04  0.841728 -0.780663  0.327462  0.712852
2014-01-05  1.290401 -0.617174  0.494087 -1.540861
2014-01-01  0.579888 -0.221338  1.006163  0.965058
2014-01-02 -0.706283  0.652975  0.264237 -0.488835
                   A         B         C         D
2014-01-03 -1.082676 -1.939761  1.036881 -1.291938
2014-01-06  0.183026 -0.858343  1.541798  1.509804
2014-01-04  0.841728 -0.780663  0.327462  0.712852
2014-01-05  1.290401 -0.617174  0.494087 -1.540861
2014-01-01  0.579888 -0.221338  1.006163  0.965058
2014-01-02 -0.706283  0.652975  0.264237 -0.488835

### Selection

In [47]:
df[['A','B']]

| | A | B |
| --- | --- | --- |
| 2014-01-01 | 0.579888 | -0.221338 |
| 2014-01-02 | -0.706283 | 0.652975 |
| 2014-01-03 | -1.082676 | -1.939761 |
| 2014-01-04 | 0.841728 | -0.780663 |
| 2014-01-05 | 1.290401 | -0.617174 |
| 2014-01-06 | 0.183026 | -0.858343 |


In [48]:
df[0:3]

| | A | B | C | D |
| --- | --- | --- | --- | --- |
| 2014-01-01 | 0.579888 | -0.221338 | 1.006163 | 0.965058 |
| 2014-01-02 | -0.706283 | 0.652975 | 0.264237 | -0.488835 |
| 2014-01-03 | -1.082676 | -1.939761 | 1.036881 | -1.291938 |


In [49]:
# By label
df.loc[dates[0]]

A    0.579888
B   -0.221338
C    1.006163
D    0.965058
Name: 2014-01-01 00:00:00, dtype: float64

In [50]:
# multi-axis by label
df.loc[:,['A','B']]

| | A | B |
| --- | --- | --- |
| 2014-01-01 | 0.579888 | -0.221338 |
| 2014-01-02 | -0.706283 | 0.652975 |
| 2014-01-03 | -1.082676 | -1.939761 |
| 2014-01-04 | 0.841728 | -0.780663 |
| 2014-01-05 | 1.290401 | -0.617174 |
| 2014-01-06 | 0.183026 | -0.858343 |


In [52]:
# Date Range
df.loc['20140102':'20140104',['B']]
# Note: unlike positional slices, a label-based .loc slice includes its end label (here 2014-01-04)

| | B |
| --- | --- |
| 2014-01-02 | 0.652975 |
| 2014-01-03 | -1.939761 |
| 2014-01-04 | -0.780663 |


In [53]:
# Fast access to scalar
df.at[dates[1],'B']

0.65297482836608778

In [54]:
# iloc provides integer locations similar to np style
df.iloc[3:]

| | A | B | C | D |
| --- | --- | --- | --- | --- |
| 2014-01-04 | 0.841728 | -0.780663 | 0.327462 | 0.712852 |
| 2014-01-05 | 1.290401 | -0.617174 | 0.494087 | -1.540861 |
| 2014-01-06 | 0.183026 | -0.858343 | 1.541798 | 1.509804 |
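
The difference between `.loc` and `.iloc` slicing is a common source of off-by-one surprises, so here is a small side-by-side sketch. It uses consecutive integers instead of random numbers (an assumption made only to keep the example reproducible).

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20140101', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

by_label = df.loc['20140102':'20140104']   # label slice: both endpoints included
by_position = df.iloc[1:3]                 # positional slice: stop is excluded

print(len(by_label), len(by_position))
```

The label slice returns three rows (January 2 through 4), while the positional slice returns two (positions 1 and 2).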


### Boolean Indexing

In [56]:
df[df.A < 0] # Basically a 'where' operation

| | A | B | C | D |
| --- | --- | --- | --- | --- |
| 2014-01-02 | -0.706283 | 0.652975 | 0.264237 | -0.488835 |
| 2014-01-03 | -1.082676 | -1.939761 | 1.036881 | -1.291938 |


### Setting

In [57]:
df_posA = df.copy() # Without .copy(), df_posA would be another name for df and the assignment below would modify df itself

df_posA[df_posA.A < 0] = -1*df_posA
# This flips the sign of every element in the rows where A < 0. Other rows are not touched.

In [58]:
df_posA

| | A | B | C | D |
| --- | --- | --- | --- | --- |
| 2014-01-01 | 0.579888 | -0.221338 | 1.006163 | 0.965058 |
| 2014-01-02 | 0.706283 | -0.652975 | -0.264237 | 0.488835 |
| 2014-01-03 | 1.082676 | 1.939761 | -1.036881 | 1.291938 |
| 2014-01-04 | 0.841728 | -0.780663 | 0.327462 | 0.712852 |
| 2014-01-05 | 1.290401 | -0.617174 | 0.494087 | -1.540861 |
| 2014-01-06 | 0.183026 | -0.858343 | 1.541798 | 1.509804 |


In [59]:
# Setting a new column aligns data by index: s1 starts one day after df, so 2014-01-01 gets NaN and 2014-01-07 is dropped
s1 = pd.Series([1,2,3,4,5,6],index=pd.date_range('20140102',periods=6))

In [60]:
s1

2014-01-02    1
2014-01-03    2
2014-01-04    3
2014-01-05    4
2014-01-06    5
2014-01-07    6
Freq: D, dtype: int64

In [61]:
df['F'] = s1

In [62]:
df

| | A | B | C | D | F |
| --- | --- | --- | --- | --- | --- |
| 2014-01-01 | 0.579888 | -0.221338 | 1.006163 | 0.965058 | NaN |
| 2014-01-02 | -0.706283 | 0.652975 | 0.264237 | -0.488835 | 1.0 |
| 2014-01-03 | -1.082676 | -1.939761 | 1.036881 | -1.291938 | 2.0 |
| 2014-01-04 | 0.841728 | -0.780663 | 0.327462 | 0.712852 | 3.0 |
| 2014-01-05 | 1.290401 | -0.617174 | 0.494087 | -1.540861 | 4.0 |
| 2014-01-06 | 0.183026 | -0.858343 | 1.541798 | 1.509804 | 5.0 |
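
The alignment rule is easier to see on a tiny frame. In this sketch (labels and values are made up for illustration), the Series is matched to the DataFrame by index label, not by position: labels missing from the Series become NaN, and labels missing from the DataFrame are ignored.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30]}, index=['x', 'y', 'z'])
s = pd.Series([1, 2, 3], index=['y', 'z', 'w'])

df['F'] = s   # matched on index labels, not on position
print(df)
```

Row `x` has no partner in `s`, so its `F` value is NaN, and label `w` never appears in `df`.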


### Missing Data

In [65]:
# Add a column with missing data
df1 = df.reindex(index=dates[0:4],columns=list(df.columns) + ['E'])
df1
# dates[0:4] selects positions 0 to 3; list(df.columns) + ['E'] appends the new column name 'E' to the existing ones

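| | A | B | C | D | F | E |
| --- | --- | --- | --- | --- | --- | --- |
| 2014-01-01 | 0.579888 | -0.221338 | 1.006163 | 0.965058 | NaN | NaN |
| 2014-01-02 | -0.706283 | 0.652975 | 0.264237 | -0.488835 | 1.0 | NaN |
| 2014-01-03 | -1.082676 | -1.939761 | 1.036881 | -1.291938 | 2.0 | NaN |
| 2014-01-04 | 0.841728 | -0.780663 | 0.327462 | 0.712852 | 3.0 | NaN |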


In [66]:
df1.loc[dates[0]:dates[1],'E'] = 1

In [68]:
df1
# Note: dates[1] is included because .loc slices by label, and label-based slices include both endpoints

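| | A | B | C | D | F | E |
| --- | --- | --- | --- | --- | --- | --- |
| 2014-01-01 | 0.579888 | -0.221338 | 1.006163 | 0.965058 | NaN | 1.0 |
| 2014-01-02 | -0.706283 | 0.652975 | 0.264237 | -0.488835 | 1.0 | 1.0 |
| 2014-01-03 | -1.082676 | -1.939761 | 1.036881 | -1.291938 | 2.0 | NaN |
| 2014-01-04 | 0.841728 | -0.780663 | 0.327462 | 0.712852 | 3.0 | NaN |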


In [69]:
# find where values are null
pd.isnull(df1)
# Creates a new dataframe with True/False values

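| | A | B | C | D | F | E |
| --- | --- | --- | --- | --- | --- | --- |
| 2014-01-01 | False | False | False | False | True | False |
| 2014-01-02 | False | False | False | False | False | False |
| 2014-01-03 | False | False | False | False | False | True |
| 2014-01-04 | False | False | False | False | False | True |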
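
Once missing values are located, the usual next step is to drop or fill them. The following is a minimal sketch on a small made-up frame (not the `df1` above) using `dropna`, `fillna` and a per-column count of missing values.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1.0, 2.0, 3.0],
                    'E': [1.0, np.nan, np.nan]})

dropped = df1.dropna(how='any')   # keep only rows with no missing values
filled = df1.fillna(value=0)      # replace missing values with 0
missing_per_column = df1.isnull().sum()

print(dropped)
print(filled)
print(missing_per_column)
```

Both `dropna` and `fillna` return new objects here, so `df1` itself still contains its NaN values.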


### Operations

In [70]:
df.describe()

| | A | B | C | D | F |
| --- | --- | --- | --- | --- | --- |
| count | 6.0 | 6.0 | 6.0 | 6.0 | 5.0 |
| mean | 0.184347 | -0.627384 | 0.778438 | -0.02232 | 3.0 |
| std | 0.917663 | 0.848936 | 0.500001 | 1.264779 | 1.581139 |
| min | -1.082676 | -1.939761 | 0.264237 | -1.540861 | 1.0 |
| 25% | -0.483955 | -0.838923 | 0.369118 | -1.091162 | 2.0 |
| 50% | 0.381457 | -0.698919 | 0.750125 | 0.112009 | 3.0 |
| 75% | 0.776268 | -0.320297 | 1.029202 | 0.902007 | 4.0 |
| max | 1.290401 | 0.652975 | 1.541798 | 1.509804 | 5.0 |


In [71]:
df.mean(),df.mean(1) # Operation on two different axes

(A    0.184347
 B   -0.627384
 C    0.778438
 D   -0.022320
 F    3.000000
 dtype: float64, 2014-01-01    0.582443
 2014-01-02    0.144419
 2014-01-03   -0.255499
 2014-01-04    0.820276
 2014-01-05    0.725291
 2014-01-06    1.475257
 Freq: D, dtype: float64)
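
Two details of the output above deserve a note: the axis argument decides whether the result has one value per column or per row, and missing values are skipped by default. A small sketch with made-up numbers:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 3.0],
                   'F': [np.nan, 4.0, 6.0]})

col_means = df.mean()         # axis=0 (default): one value per column
row_means = df.mean(axis=1)   # axis=1: one value per row

print(col_means)
print(row_means)
```

The mean of `F` is computed over its two non-missing values, and the first row's mean uses only the `A` value.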

### Applying functions

In [72]:
df

| | A | B | C | D | F |
| --- | --- | --- | --- | --- | --- |
| 2014-01-01 | 0.579888 | -0.221338 | 1.006163 | 0.965058 | NaN |
| 2014-01-02 | -0.706283 | 0.652975 | 0.264237 | -0.488835 | 1.0 |
| 2014-01-03 | -1.082676 | -1.939761 | 1.036881 | -1.291938 | 2.0 |
| 2014-01-04 | 0.841728 | -0.780663 | 0.327462 | 0.712852 | 3.0 |
| 2014-01-05 | 1.290401 | -0.617174 | 0.494087 | -1.540861 | 4.0 |
| 2014-01-06 | 0.183026 | -0.858343 | 1.541798 | 1.509804 | 5.0 |


In [73]:
df.apply(np.cumsum)

| | A | B | C | D | F |
| --- | --- | --- | --- | --- | --- |
| 2014-01-01 | 0.579888 | -0.221338 | 1.006163 | 0.965058 | NaN |
| 2014-01-02 | -0.126394 | 0.431637 | 1.2704 | 0.476222 | 1.0 |
| 2014-01-03 | -1.20907 | -1.508124 | 2.307282 | -0.815715 | 3.0 |
| 2014-01-04 | -0.367342 | -2.288787 | 2.634744 | -0.102863 | 6.0 |
| 2014-01-05 | 0.923058 | -2.905961 | 3.128831 | -1.643724 | 10.0 |
| 2014-01-06 | 1.106084 | -3.764304 | 4.670629 | -0.133921 | 15.0 |


In [76]:
df.apply(lambda x: x.max() - x.min())
# This does not alter df; it returns a new Series holding max - min for each COLUMN

A    2.373077
B    2.592735
C    1.277561
D    3.050665
F    4.000000
dtype: float64
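
The same function can be applied row by row by passing `axis=1`. This sketch uses a small integer frame and a named helper, `value_range`, which is a hypothetical name chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 5], 'B': [4, 2], 'C': [9, 3]})

def value_range(x):
    """Return the spread (max - min) of a Series."""
    return x.max() - x.min()

by_column = df.apply(value_range)          # each x is one column
by_row = df.apply(value_range, axis=1)     # each x is one row

print(by_column)
print(by_row)
```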

In [79]:
# Built-in string methods are available through the .str accessor; NaN values pass through unchanged
s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
print(s)
s.str.lower()

0       A
1       B
2       C
3    Aaba
4    Baca
5     NaN
6    CABA
7     dog
8     cat
dtype: object


0       a
1       b
2       c
3    aaba
4    baca
5     NaN
6    caba
7     dog
8     cat
dtype: object

### Merge

In [80]:
np.random.randn(10,4)

array([[ 0.39462207, -0.35701611, -1.29920775,  0.59460181],
       [ 1.05647765,  0.83904796,  1.01490374,  0.39531841],
       [ 0.81759941, -0.63986786, -1.4096965 , -0.67753529],
       [ 0.64417677,  0.42545764,  0.42673915,  1.26885913],
       [ 0.01719219, -0.76216963,  1.00313238,  1.47111861],
       [ 0.34304415,  1.02793888, -0.42312357, -1.01902576],
       [ 0.31148253,  1.66334743, -0.09581272,  0.29241327],
       [ 0.08148912,  1.06697393,  0.43033573,  1.14205797],
       [ 1.54486785,  0.36092142, -1.92131053,  0.53001535],
       [ 1.65267611,  0.84374708, -0.91686157,  0.2799484 ]])

In [81]:
#Concatenating pandas objects together
df = pd.DataFrame(np.random.randn(10,4))
df

| | 0 | 1 | 2 | 3 |
| --- | --- | --- | --- | --- |
| 0 | 1.664186 | 0.451233 | -0.333167 | -0.132597 |
| 1 | 0.606885 | 0.247958 | 1.047617 | 0.488071 |
| 2 | -2.964315 | 1.046227 | 0.352172 | -1.485879 |
| 3 | 0.799703 | -0.518114 | -1.059849 | 0.591196 |
| 4 | 1.09054 | -0.776548 | 1.727447 | 0.097877 |
| 5 | -0.610227 | 0.691842 | -0.102435 | 1.364433 |
| 6 | 1.293393 | -0.737383 | 0.706023 | 1.400901 |
| 7 | -0.096599 | -0.611628 | 0.341076 | 1.634748 |
| 8 | 0.639985 | -2.100968 | 0.870013 | 0.823543 |
| 9 | -0.184826 | 1.812616 | 1.129015 | -1.628226 |


In [83]:
# Break it into pieces
pieces = [df[:3], df[3:7],df[7:]]
pieces
# pieces is a list of dataframes

[          0         1         2         3
 0  1.664186  0.451233 -0.333167 -0.132597
 1  0.606885  0.247958  1.047617  0.488071
 2 -2.964315  1.046227  0.352172 -1.485879,
           0         1         2         3
 3  0.799703 -0.518114 -1.059849  0.591196
 4  1.090540 -0.776548  1.727447  0.097877
 5 -0.610227  0.691842 -0.102435  1.364433
 6  1.293393 -0.737383  0.706023  1.400901,
           0         1         2         3
 7 -0.096599 -0.611628  0.341076  1.634748
 8  0.639985 -2.100968  0.870013  0.823543
 9 -0.184826  1.812616  1.129015 -1.628226]

In [91]:
pd.concat(pieces)
# pd.concat() concatenates by row by default, i.e. stacks the pieces vertically
# The argument here is a list of DataFrames; Series objects can be concatenated the same way

| | 0 | 1 | 2 | 3 |
| --- | --- | --- | --- | --- |
| 0 | 1.664186 | 0.451233 | -0.333167 | -0.132597 |
| 1 | 0.606885 | 0.247958 | 1.047617 | 0.488071 |
| 2 | -2.964315 | 1.046227 | 0.352172 | -1.485879 |
| 3 | 0.799703 | -0.518114 | -1.059849 | 0.591196 |
| 4 | 1.09054 | -0.776548 | 1.727447 | 0.097877 |
| 5 | -0.610227 | 0.691842 | -0.102435 | 1.364433 |
| 6 | 1.293393 | -0.737383 | 0.706023 | 1.400901 |
| 7 | -0.096599 | -0.611628 | 0.341076 | 1.634748 |
| 8 | 0.639985 | -2.100968 | 0.870013 | 0.823543 |
| 9 | -0.184826 | 1.812616 | 1.129015 | -1.628226 |


In [93]:
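# DataFrame.join and pd.merge are also available for combining frames on an index or key column
df
# Older pandas also had DataFrame.append for adding rows; it was removed in pandas 2.0 in favor of pd.concat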

| | 0 | 1 | 2 | 3 |
| --- | --- | --- | --- | --- |
| 0 | 1.664186 | 0.451233 | -0.333167 | -0.132597 |
| 1 | 0.606885 | 0.247958 | 1.047617 | 0.488071 |
| 2 | -2.964315 | 1.046227 | 0.352172 | -1.485879 |
| 3 | 0.799703 | -0.518114 | -1.059849 | 0.591196 |
| 4 | 1.09054 | -0.776548 | 1.727447 | 0.097877 |
| 5 | -0.610227 | 0.691842 | -0.102435 | 1.364433 |
| 6 | 1.293393 | -0.737383 | 0.706023 | 1.400901 |
| 7 | -0.096599 | -0.611628 | 0.341076 | 1.634748 |
| 8 | 0.639985 | -2.100968 | 0.870013 | 0.823543 |
| 9 | -0.184826 | 1.812616 | 1.129015 | -1.628226 |
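
Since this section is titled "Merge" but only shows `pd.concat`, a short sketch of a key-based merge may help. The frames `left` and `right` and their column names are made up for this example.

```python
import pandas as pd

left = pd.DataFrame({'key': ['foo', 'bar'], 'lval': [1, 2]})
right = pd.DataFrame({'key': ['foo', 'bar'], 'rval': [4, 5]})

# SQL-style inner join: rows are paired where the 'key' values match
merged = pd.merge(left, right, on='key')
print(merged)

# Adding rows is done with pd.concat; ignore_index renumbers the result 0..n-1
stacked = pd.concat([left, left], ignore_index=True)
print(stacked)
```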


### Grouping


In [94]:
df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
                       'foo', 'bar', 'foo', 'foo'],
                       'B' : ['one', 'one', 'two', 'three',
                             'two', 'two', 'one', 'three'],
                       'C' : np.random.randn(8),
                       'D' : np.random.randn(8)})

In [95]:
df

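| | A | B | C | D |
| --- | --- | --- | --- | --- |
| 0 | foo | one | -0.156692 | -1.283333 |
| 1 | bar | one | 0.060946 | -0.864953 |
| 2 | foo | two | -1.820843 | -0.508642 |
| 3 | bar | three | -0.689419 | 0.803342 |
| 4 | foo | two | 0.455165 | -1.28795 |
| 5 | bar | two | 1.42897 | -0.926617 |
| 6 | foo | one | -0.521938 | 0.180666 |
| 7 | foo | three | 2.031093 | -0.795033 |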


In [96]:
df.groupby(['A','B']).sum()

| A | B | C | D |
| --- | --- | --- | --- |
| bar | one | 0.060946 | -0.864953 |
| bar | three | -0.689419 | 0.803342 |
| bar | two | 1.42897 | -0.926617 |
| foo | one | -0.67863 | -1.102667 |
| foo | three | 2.031093 | -0.795033 |
| foo | two | -1.365678 | -1.796592 |
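
A groupby is not limited to a single aggregate. As a hedged sketch on a small made-up frame, `agg` can compute several statistics per group in one call:

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'C': [1.0, 2.0, 3.0, 4.0]})

# One row per group in A, one column per requested statistic of C
summary = df.groupby('A')['C'].agg(['sum', 'mean', 'count'])
print(summary)
```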


### Reshaping

In [97]:
# You can also stack or unstack levels

In [98]:
a = df.groupby(['A','B']).sum()
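
The comment above mentions stacking and unstacking without showing it, so here is a minimal sketch on a small made-up frame. `unstack` moves an index level into the columns and `stack` moves it back; depending on the pandas version, `stack` may emit a warning about its changing implementation.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'foo', 'bar', 'bar'],
                   'B': ['one', 'two', 'one', 'two'],
                   'C': [1, 2, 3, 4]})

grouped = df.groupby(['A', 'B']).sum()   # rows indexed by (A, B)
wide = grouped.unstack('B')              # level B becomes a column level
long_again = wide.stack('B')             # and goes back into the row index

print(wide)
print(long_again)
```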

In [99]:
# Pivot Tables
pd.pivot_table(df,values=['C','D'],index=['A'],columns=['B'])

| A | C (one) | C (three) | C (two) | D (one) | D (three) | D (two) |
| --- | --- | --- | --- | --- | --- | --- |
| bar | 0.060946 | -0.689419 | 1.42897 | -0.864953 | 0.803342 | -0.926617 |
| foo | -0.339315 | 2.031093 | -0.682839 | -0.551334 | -0.795033 | -0.898296 |


### Time Series


In [100]:
import pandas as pd
import numpy as np

In [105]:
# 100 Seconds starting on January 1st
rng = pd.date_range('1/1/2014', periods=100, freq='S')
#rng

In [102]:
# Give each second a random value
ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)

In [103]:
ts

2014-01-01 00:00:00    445
2014-01-01 00:00:01    371
2014-01-01 00:00:02    382
2014-01-01 00:00:03    432
2014-01-01 00:00:04    339
2014-01-01 00:00:05    495
2014-01-01 00:00:06    238
2014-01-01 00:00:07    300
2014-01-01 00:00:08    336
2014-01-01 00:00:09    194
2014-01-01 00:00:10    189
2014-01-01 00:00:11    155
2014-01-01 00:00:12    255
2014-01-01 00:00:13    209
2014-01-01 00:00:14    337
2014-01-01 00:00:15     29
2014-01-01 00:00:16    461
2014-01-01 00:00:17    197
2014-01-01 00:00:18    480
2014-01-01 00:00:19    206
2014-01-01 00:00:20    485
2014-01-01 00:00:21    417
2014-01-01 00:00:22    154
2014-01-01 00:00:23    258
2014-01-01 00:00:24    237
2014-01-01 00:00:25    157
2014-01-01 00:00:26     61
2014-01-01 00:00:27    386
2014-01-01 00:00:28    256
2014-01-01 00:00:29    433
                      ... 
2014-01-01 00:01:10    116
2014-01-01 00:01:11    114
2014-01-01 00:01:12    188
2014-01-01 00:01:13    422
2014-01-01 00:01:14    265
2014-01-01 00:01:15    287
2

In [106]:
# Built in resampling
ts.resample('1Min').mean() # Downsample from one-second to one-minute bins, taking the mean of each bin

2014-01-01 00:00:00    248.25
2014-01-01 00:01:00    245.70
Freq: T, dtype: float64
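
To see what each resampled bin contains, it can help to resample a series whose values are known. This sketch uses the integers 0 to 99 instead of random values, and the lower-case frequency aliases `'s'` and `'1min'`, which newer pandas releases prefer over `'S'` and `'1Min'` (treat the exact alias spelling as version-dependent).

```python
import numpy as np
import pandas as pd

rng = pd.date_range('1/1/2014', periods=100, freq='s')
ts = pd.Series(np.arange(100), index=rng)

# Two bins: the first full minute (60 points) and the remaining 40 seconds
per_minute = ts.resample('1min').agg(['mean', 'count'])
print(per_minute)
```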

In [109]:
# Many additional time series features
#ts 

### Plotting


In [2]:
# Requires `ts` from the Time Series section above; rerun those cells first if the kernel was restarted
ts.plot()

In [111]:
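def randwalk(startdate, points):
    """Plot and return a random walk of `points` daily steps starting at `startdate`."""
    # Draw independent standard-normal steps, one per day
    ts = pd.Series(np.random.randn(points), index=pd.date_range(startdate, periods=points))
    # The running sum of the steps is the walk itself
    ts = ts.cumsum()
    ts.plot()
    return ts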

In [112]:
# Using pandas to make a simple random walker by repeatedly running:
a=randwalk('1/1/2012',1000)

In [113]:
# By default, pandas plots use the index for the x-axis and the column names for the legend

In [114]:
df = pd.DataFrame(np.random.randn(100, 4), index=ts.index,columns=['A', 'B', 'C', 'D'])
df = df.cumsum()
plt.figure()
df.plot()
plt.legend(loc='best')

<matplotlib.legend.Legend at 0xb4f7278>

### I/O
I/O is straightforward with, for example, pd.read_csv or df.to_csv
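
As a minimal sketch of a CSV round trip, the example below writes a small made-up frame to an in-memory buffer and reads it back. `io.StringIO` stands in for a file path here so the example leaves nothing on disk.

```python
import io
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': ['x', 'y']})

buffer = io.StringIO()
df.to_csv(buffer, index=False)   # index=False leaves the row labels out of the file
buffer.seek(0)                   # rewind before reading

restored = pd.read_csv(buffer)
print(restored)
```

With a real file, the same calls take a path instead of the buffer, for example `df.to_csv('data.csv', index=False)` followed by `pd.read_csv('data.csv')`, where `data.csv` is a placeholder name.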

#### The benefits of open source:

Let's look under x's in plt modules

# Next Steps

**Recommended Resources**

Name | Description
--- | ---
[Official Pandas Tutorials](http://pandas.pydata.org/pandas-docs/stable/10min.html) | Wes & Company's selection of tutorials and lectures
[Julia Evans Pandas Cookbook](https://github.com/jvns/pandas-cookbook) | Great resource with examples from weather, bikes and 311 calls
[Learn Pandas Tutorials](https://bitbucket.org/hrojas/learn-pandas) | A great series of Pandas tutorials from Dave Rojas
[Research Computing Python Data PYNBs](https://github.com/ResearchComputing/Meetup-Fall-2013/tree/master/python) | A super awesome set of python notebooks from a meetup-based course exclusively devoted to pandas