<h1>Chapter 4: Indexing and Selecting</h1>

First we import the standard <em>numpy</em> and <em>pandas</em> modules.

In [3]:
import numpy as np
import pandas as pd

Create a DataFrame of crude oil spot prices for the four quarters of 2013, taken from IMF data:

In [4]:
SpotCrudePrices_2013_Data={
                'U.K. Brent' : {'2013-Q1':112.9, '2013-Q2':103.0, '2013-Q3':110.1, '2013-Q4':109.4},
                'Dubai':{'2013-Q1':108.1, '2013-Q2':100.8, '2013-Q3':106.1,'2013-Q4':106.7},
                'West Texas Intermediate':{'2013-Q1':94.4, '2013-Q2':94.2, '2013-Q3':105.8,'2013-Q4':97.4}}
                
SpotCrudePrices_2013=pd.DataFrame.from_dict(SpotCrudePrices_2013_Data)
SpotCrudePrices_2013


Unnamed: 0,Dubai,U.K. Brent,West Texas Intermediate
2013-Q1,108.1,112.9,94.4
2013-Q2,100.8,103.0,94.2
2013-Q3,106.1,110.1,105.8
2013-Q4,106.7,109.4,97.4


Select the prices for the available time periods of Dubai crude using the [] operator:

In [5]:
dubaiPrices=SpotCrudePrices_2013['Dubai']; dubaiPrices

2013-Q1    108.1
2013-Q2    100.8
2013-Q3    106.1
2013-Q4    106.7
Name: Dubai, dtype: float64

Select the columns in a particular order:

In [6]:
SpotCrudePrices_2013[['West Texas Intermediate','U.K. Brent']]


Unnamed: 0,West Texas Intermediate,U.K. Brent
2013-Q1,94.4,112.9
2013-Q2,94.2,103.0
2013-Q3,105.8,110.1
2013-Q4,97.4,109.4


In [7]:
SpotCrudePrices_2013['Brent Blend']

KeyError: 'Brent Blend'

In [8]:
SpotCrudePrices_2013.get('Brent Blend')

In [9]:
SpotCrudePrices_2013.get('U.K. Brent')

2013-Q1    112.9
2013-Q2    103.0
2013-Q3    110.1
2013-Q4    109.4
Name: U.K. Brent, dtype: float64

In [10]:
SpotCrudePrices_2013.get('Brent Blend','N/A')

'N/A'
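
The cells above contrast the [] operator with get(). The following minimal sketch repeats that comparison on a small stand-in frame; the name `prices` and the reduced data are assumptions made for illustration only.

```python
import pandas as pd

# Small stand-in frame; the values are copied from the table above.
prices = pd.DataFrame(
    {"Dubai": [108.1, 100.8], "U.K. Brent": [112.9, 103.0]},
    index=["2013-Q1", "2013-Q2"],
)

# [] raises KeyError when the column does not exist ...
try:
    prices["Brent Blend"]
    missing_raises = False
except KeyError:
    missing_raises = True

# ... whereas get() returns None, or the default supplied by the caller.
no_default = prices.get("Brent Blend")
with_default = prices.get("Brent Blend", "N/A")
print(missing_raises, no_default, with_default)
```

get() is convenient when a missing column is an expected case rather than an error.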

In [11]:
SpotCrudePrices_2013['2013-Q1']

KeyError: '2013-Q1'

In [12]:
dubaiPrices['2013-Q1']

108.09999999999999

Retrieve values directly as an attribute:

In [13]:
SpotCrudePrices_2013.Dubai

2013-Q1    108.1
2013-Q2    100.8
2013-Q3    106.1
2013-Q4    106.7
Name: Dubai, dtype: float64

In [14]:
SpotCrudePrices_2013

Unnamed: 0,Dubai,U.K. Brent,West Texas Intermediate
2013-Q1,108.1,112.9,94.4
2013-Q2,100.8,103.0,94.2
2013-Q3,106.1,110.1,105.8
2013-Q4,106.7,109.4,97.4


Rename the columns so that they are all valid Python identifiers:

In [20]:
SpotCrudePrices_2013.columns=['Dubai','UK_Brent', 
                                       'West_Texas_Intermediate']
SpotCrudePrices_2013

Unnamed: 0,Dubai,UK_Brent,West_Texas_Intermediate
2013-Q1,108.1,112.9,94.4
2013-Q2,100.8,103.0,94.2
2013-Q3,106.1,110.1,105.8
2013-Q4,106.7,109.4,97.4


SpotCrudePrices_2013.West_Texas_Intermediate
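
Assigning a list to `.columns` only works as intended when the list matches the current column order, and the order produced from a dict may differ between pandas versions. As a hedged alternative, the sketch below renames through an explicit old-to-new mapping; the frame `prices` is a one-row stand-in, not the chapter's full data.

```python
import pandas as pd

# One-row stand-in for the spot price table (hypothetical reduced data).
prices = pd.DataFrame(
    {"U.K. Brent": [112.9], "Dubai": [108.1], "West Texas Intermediate": [94.4]},
    index=["2013-Q1"],
)

# Map each old name to its new name, so the result does not depend on
# the order in which the columns happen to be stored.
renamed = prices.rename(columns={
    "U.K. Brent": "UK_Brent",
    "West Texas Intermediate": "West_Texas_Intermediate",
})

# Every column name is now a valid identifier, so attribute access works.
print(renamed.UK_Brent["2013-Q1"])
print(renamed.West_Texas_Intermediate["2013-Q1"])
```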

Select by specifying column index number:

In [18]:
SpotCrudePrices_2013[[1]]

Unnamed: 0,U.K. Brent
2013-Q1,112.9
2013-Q2,103.0
2013-Q3,110.1
2013-Q4,109.4
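
In recent pandas releases a list passed to [] is interpreted as column labels, so `[[1]]` is likely to raise a KeyError when the columns are strings. A minimal sketch of the explicit positional form, using `.iloc` on a reduced stand-in frame (the name `prices` is an assumption):

```python
import pandas as pd

prices = pd.DataFrame(
    {"Dubai": [108.1, 100.8], "UK_Brent": [112.9, 103.0],
     "West_Texas_Intermediate": [94.4, 94.2]},
    index=["2013-Q1", "2013-Q2"],
)

# A list of positions returns a DataFrame; a scalar position returns a Series.
second_col_frame = prices.iloc[:, [1]]
second_col_series = prices.iloc[:, 1]

print(type(second_col_frame).__name__, type(second_col_series).__name__)
```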


<h2>Range Slicing</h2>

Obtain the first two rows:

In [23]:
SpotCrudePrices_2013[:2]

Unnamed: 0,Dubai,UK_Brent,West_Texas_Intermediate
2013-Q1,108.1,112.9,94.4
2013-Q2,100.8,103.0,94.2


Obtain all rows starting from index 2:

In [24]:
SpotCrudePrices_2013[2:]

Unnamed: 0,Dubai,UK_Brent,West_Texas_Intermediate
2013-Q3,106.1,110.1,105.8
2013-Q4,106.7,109.4,97.4


Obtain every second row, starting from row 0:

In [25]:
SpotCrudePrices_2013[::2]

Unnamed: 0,Dubai,UK_Brent,West_Texas_Intermediate
2013-Q1,108.1,112.9,94.4
2013-Q3,106.1,110.1,105.8


Reverse the order of the rows in the DataFrame:

In [26]:
SpotCrudePrices_2013[::-1]

Unnamed: 0,Dubai,UK_Brent,West_Texas_Intermediate
2013-Q4,106.7,109.4,97.4
2013-Q3,106.1,110.1,105.8
2013-Q2,100.8,103.0,94.2
2013-Q1,108.1,112.9,94.4


<h3>Series behavior</h3>

In [29]:
dubaiPrices=SpotCrudePrices_2013['Dubai']
dubaiPrices

2013-Q1    108.1
2013-Q2    100.8
2013-Q3    106.1
2013-Q4    106.7
Name: Dubai, dtype: float64

Obtain the last three rows, that is, all rows after the first:

In [28]:
dubaiPrices[1:]

2013-Q2    100.8
2013-Q3    106.1
2013-Q4    106.7
Name: Dubai, dtype: float64

Obtain all rows but the last:

In [31]:
dubaiPrices[:-1]

2013-Q1    108.1
2013-Q2    100.8
2013-Q3    106.1
Name: Dubai, dtype: float64

Reverse the rows:

In [32]:
dubaiPrices[::-1]

2013-Q4    106.7
2013-Q3    106.1
2013-Q2    100.8
2013-Q1    108.1
Name: Dubai, dtype: float64
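
The slices above are positional, following the usual Python start:stop:step rules. The sketch below restates them on a stand-in Series (the name `dubai` is an assumption) and contrasts them with a label slice, which includes both endpoints.

```python
import pandas as pd

dubai = pd.Series(
    [108.1, 100.8, 106.1, 106.7],
    index=["2013-Q1", "2013-Q2", "2013-Q3", "2013-Q4"],
    name="Dubai",
)

first_two = dubai[:2]        # positions 0 and 1; the stop position is excluded
every_other = dubai[::2]     # positions 0 and 2
reversed_rows = dubai[::-1]  # a negative step walks backwards

# A slice of labels, by contrast, includes the end label.
q1_to_q2 = dubai["2013-Q1":"2013-Q2"]
print(len(first_two), len(q1_to_q2))
```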

<h2>Label-oriented Indexing</h2>

Create a DataFrame:

In [33]:
NYC_SnowAvgsData={'Months' :          
                            ['January','February','March', 
                            'April', 'November', 'December'],
                            'Avg SnowDays' : [4.0,2.7,1.7,0.2,0.2,2.3],
                            'Avg Precip. (cm)' : [17.8,22.4,9.1,1.5,0.8,12.2],
                            'Avg Low Temp. (F)' : [27,29,35,45,42,32] }


In [34]:
NYC_SnowAvgsData

{'Avg Low Temp. (F)': [27, 29, 35, 45, 42, 32],
 'Avg Precip. (cm)': [17.8, 22.4, 9.1, 1.5, 0.8, 12.2],
 'Avg SnowDays': [4.0, 2.7, 1.7, 0.2, 0.2, 2.3],
 'Months': ['January', 'February', 'March', 'April', 'November', 'December']}

In [35]:
NYC_SnowAvgs=pd.DataFrame(NYC_SnowAvgsData,      
                      index=NYC_SnowAvgsData['Months'], 
                      columns=['Avg SnowDays','Avg Precip. (cm)',                                                               
                               'Avg Low Temp. (F)'])
NYC_SnowAvgs


Unnamed: 0,Avg SnowDays,Avg Precip. (cm),Avg Low Temp. (F)
January,4.0,17.8,27
February,2.7,22.4,29
March,1.7,9.1,35
April,0.2,1.5,45
November,0.2,0.8,42
December,2.3,12.2,32


Using single label with <em>.loc</em> operator:

In [36]:
NYC_SnowAvgs.loc['January']

Avg SnowDays          4.0
Avg Precip. (cm)     17.8
Avg Low Temp. (F)    27.0
Name: January, dtype: float64

Using a list of labels:

In [37]:
NYC_SnowAvgs.loc[['January','April']]

Unnamed: 0,Avg SnowDays,Avg Precip. (cm),Avg Low Temp. (F)
January,4.0,17.8,27
April,0.2,1.5,45


Using label range:

In [38]:
NYC_SnowAvgs.loc['January':'March']

Unnamed: 0,Avg SnowDays,Avg Precip. (cm),Avg Low Temp. (F)
January,4.0,17.8,27
February,2.7,22.4,29
March,1.7,9.1,35


Row index must be specified first:

In [39]:
NYC_SnowAvgs.loc['Avg SnowDays']

KeyError: 'the label [Avg SnowDays] is not in the [index]'

In [40]:
NYC_SnowAvgs.loc[:,'Avg SnowDays']

January     4.0
February    2.7
March       1.7
April       0.2
November    0.2
December    2.3
Name: Avg SnowDays, dtype: float64

Specific 'coordinate' selection:

In [41]:
NYC_SnowAvgs.loc['March','Avg SnowDays']

1.7

Alternative style:

In [42]:
NYC_SnowAvgs.loc['March']['Avg SnowDays']

1.7

Using square brackets ( [ ] ):

In [43]:
NYC_SnowAvgs['Avg SnowDays']['March']

1.7

The [ ] operator cannot select a row by its label directly, because it looks the key up among the columns:

In [44]:
NYC_SnowAvgs['March']['Avg SnowDays']

KeyError: 'March'

Use the <em>.loc</em> operator instead:

In [45]:
NYC_SnowAvgs.loc['March']

Avg SnowDays          1.7
Avg Precip. (cm)      9.1
Avg Low Temp. (F)    35.0
Name: March, dtype: float64
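
To summarize the label-based forms shown above, here is a minimal sketch on a reduced stand-in for the snow table; the name `snow` and the trimmed columns are assumptions for illustration.

```python
import pandas as pd

snow = pd.DataFrame(
    {"Avg SnowDays": [4.0, 2.7, 1.7, 0.2],
     "Avg Low Temp. (F)": [27, 29, 35, 45]},
    index=["January", "February", "March", "April"],
)

# Label slices include both endpoints, unlike positional slices.
jan_to_mar = snow.loc["January":"March"]

# Row label and column label in a single call.
march_snow = snow.loc["March", "Avg SnowDays"]

# A single .loc call is also a dependable way to assign to one cell.
snow.loc["April", "Avg SnowDays"] = 0.3
print(len(jan_to_mar), march_snow)
```

The single-call form `snow.loc[row, col]` avoids the intermediate object that the chained form `snow.loc[row][col]` creates.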

<h3>Selection using a Boolean array</h3>

In [46]:
NYC_SnowAvgs.loc[NYC_SnowAvgs['Avg SnowDays']<1,:]

Unnamed: 0,Avg SnowDays,Avg Precip. (cm),Avg Low Temp. (F)
April,0.2,1.5,45
November,0.2,0.8,42


In [47]:
SpotCrudePrices_2013.loc[:,SpotCrudePrices_2013.loc['2013-Q1']>110]

Unnamed: 0,UK_Brent
2013-Q1,112.9
2013-Q2,103.0
2013-Q3,110.1
2013-Q4,109.4


In [48]:
SpotCrudePrices_2013.loc['2013-Q1']>110

Dubai                      False
UK_Brent                    True
West_Texas_Intermediate    False
Name: 2013-Q1, dtype: bool
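
A Boolean Series can filter either axis of `.loc`: one aligned with the index selects rows, one aligned with the columns selects columns. A small sketch under the assumption of a two-row stand-in frame named `prices`:

```python
import pandas as pd

prices = pd.DataFrame(
    {"Dubai": [108.1, 100.8], "UK_Brent": [112.9, 103.0],
     "West_Texas_Intermediate": [94.4, 94.2]},
    index=["2013-Q1", "2013-Q2"],
)

col_mask = prices.loc["2013-Q1"] > 110   # one Boolean per column
row_mask = prices["Dubai"] > 105         # one Boolean per row

above_110_in_q1 = prices.loc[:, col_mask]
both_axes = prices.loc[row_mask, col_mask]
print(above_110_in_q1.columns.tolist(), both_axes.shape)
```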

<h2>Integer-oriented Indexing</h2>

Create a DataFrame:

In [51]:
import scipy.constants as phys
import math

In [55]:
sci_values=pd.DataFrame([[math.pi, math.sin(math.pi), 
                                    math.cos(math.pi)],
                                   [math.e,math.log(math.e), 
                                    phys.golden],
                                   [phys.c,phys.g,phys.e],
                                   [phys.m_e,phys.m_p,phys.m_n]],
                          index=list(range(0,20,5)))
sci_values

Unnamed: 0,0,1,2
0,3.141593,1.224647e-16,-1.0
5,2.718282,1.0,1.618034
10,299792500.0,9.80665,1.6021769999999999e-19
15,9.109383e-31,1.672622e-27,1.674927e-27


Select first two rows using integer slicing:

In [53]:
sci_values.iloc[:2]

Unnamed: 0,0,1,2
0,3.141593,1.224647e-16,-1.0
5,2.718282,1.0,1.618034


Select the speed of light and the acceleration of gravity from the third row:

In [54]:
sci_values.iloc[2,0:2]

0    2.997925e+08
1    9.806650e+00
Name: 10, dtype: float64

Arguments to <em>.iloc</em> are strictly positional:

In [56]:
sci_values.iloc[10]

IndexError: single positional indexer is out-of-bounds

Use <em>.loc</em> instead:

In [57]:
sci_values.loc[10]

0    2.997925e+08
1    9.806650e+00
2    1.602177e-19
Name: 10, dtype: float64

Slice out specific row:

In [58]:
sci_values.iloc[2:3,:]

Unnamed: 0,0,1,2
10,299792458,9.80665,1.6021769999999999e-19


Obtain cross-section using integer position:

In [59]:
sci_values.iloc[3]

0    9.109383e-31
1    1.672622e-27
2    1.674927e-27
Name: 15, dtype: float64

Attempt to index a row position past the end of the DataFrame:

In [60]:
sci_values.iloc[6,:]

IndexError: single positional indexer is out-of-bounds

Select scalar values with <em>.iloc</em> and <em>.iat</em>, and compare timings:

In [61]:
sci_values.iloc[3,0]

9.1093829099999999e-31

In [62]:
sci_values.iat[3,0]

9.1093829099999999e-31

In [63]:
%timeit sci_values.iloc[3,0]

1000 loops, best of 3: 156 µs per loop


In [65]:
%timeit sci_values.iat[3,0]

The slowest run took 5.47 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 14.4 µs per loop
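
Outside IPython, the same comparison can be sketched with the standard `timeit` module. The frame `df` below is a hypothetical stand-in, and absolute timings will vary by machine and pandas version, so none are quoted here.

```python
import timeit
import pandas as pd

# Rows are labelled 0 and 5 so that label and position differ.
df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], index=[0, 5])

by_iloc = df.iloc[1, 0]   # general positional indexer
by_iat = df.iat[1, 0]     # positional scalar access
by_at = df.at[5, 0]       # label-based scalar access

t_iloc = timeit.timeit(lambda: df.iloc[1, 0], number=2000)
t_iat = timeit.timeit(lambda: df.iat[1, 0], number=2000)
print(f"iloc: {t_iloc:.4f}s  iat: {t_iat:.4f}s")
```

`.iat` and `.at` accept only scalar row and column keys, which is what lets them skip the more general dispatch done by `.iloc` and `.loc`.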


<h2>Mixed Indexing with the .ix operator</h2>

In [67]:
stockIndexDataDF=pd.read_csv('data/stock_index_data.csv')

In [68]:
stockIndexDataDF

Unnamed: 0,TradingDate,Nasdaq,S&P 500,Russell 2000
0,2014/01/30,4123.13,1794.19,1139.36
1,2014/01/31,4103.88,1782.59,1130.88
2,2014/02/03,3996.96,1741.89,1094.58
3,2014/02/04,4031.52,1755.2,1102.84
4,2014/02/05,4011.55,1751.64,1093.59
5,2014/02/06,4057.12,1773.43,1103.93


In [69]:
stockIndexDF=stockIndexDataDF.set_index('TradingDate')
stockIndexDF

TradingDate,Nasdaq,S&P 500,Russell 2000
2014/01/30,4123.13,1794.19,1139.36
2014/01/31,4103.88,1782.59,1130.88
2014/02/03,3996.96,1741.89,1094.58
2014/02/04,4031.52,1755.2,1102.84
2014/02/05,4011.55,1751.64,1093.59
2014/02/06,4057.12,1773.43,1103.93


<b>Using a single label:</b>

In [70]:
stockIndexDF.ix['2014/01/30']

Nasdaq          4123.13
S&P 500         1794.19
Russell 2000    1139.36
Name: 2014/01/30, dtype: float64

<b>Using list of labels</b>:

In [71]:
stockIndexDF.ix[['2014/01/30']]

Unnamed: 0,Nasdaq,S&P 500,Russell 2000
2014/01/30,4123.13,1794.19,1139.36


<b>Difference between using scalar indexer and list indexer:</b>

In [72]:
type(stockIndexDF.ix['2014/01/30'])

pandas.core.series.Series

In [73]:
type(stockIndexDF.ix[['2014/01/30']])

pandas.core.frame.DataFrame

<b>Using a sequence of labels:</b>

In [74]:
tradingDates=stockIndexDataDF.TradingDate

In [75]:
stockIndexDF.ix[tradingDates[:3]]

Unnamed: 0,Nasdaq,S&P 500,Russell 2000
2014/01/30,4123.13,1794.19,1139.36
2014/01/31,4103.88,1782.59,1130.88
2014/02/03,3996.96,1741.89,1094.58


<b>Using a single integer:</b>

In [76]:
stockIndexDF.ix[0]

Nasdaq          4123.13
S&P 500         1794.19
Russell 2000    1139.36
Name: 2014/01/30, dtype: float64

<b>Using a list of integers:</b>

In [77]:
stockIndexDF.ix[[0,2]]

TradingDate,Nasdaq,S&P 500,Russell 2000
2014/01/30,4123.13,1794.19,1139.36
2014/02/03,3996.96,1741.89,1094.58


<b>Using an integer slice:</b>

In [78]:
stockIndexDF.ix[1:3]

TradingDate,Nasdaq,S&P 500,Russell 2000
2014/01/31,4103.88,1782.59,1130.88
2014/02/03,3996.96,1741.89,1094.58


<b>Using a boolean array:</b>

In [79]:
stockIndexDF.ix[stockIndexDF['Russell 2000']>1100]

TradingDate,Nasdaq,S&P 500,Russell 2000
2014/01/30,4123.13,1794.19,1139.36
2014/01/31,4103.88,1782.59,1130.88
2014/02/04,4031.52,1755.2,1102.84
2014/02/06,4057.12,1773.43,1103.93
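
The `.ix` operator was deprecated in pandas 0.20 and removed in pandas 1.0, so the cells above will not run on current releases. The sketch below shows how each case maps onto `.loc` (labels and Boolean arrays) or `.iloc` (positions); the frame `stock` is a three-row stand-in built from the table above.

```python
import pandas as pd

stock = pd.DataFrame(
    {"Nasdaq": [4123.13, 4103.88, 3996.96],
     "Russell 2000": [1139.36, 1130.88, 1094.58]},
    index=pd.Index(["2014/01/30", "2014/01/31", "2014/02/03"],
                   name="TradingDate"),
)

by_label = stock.loc["2014/01/30"]        # was: stock.ix['2014/01/30']
by_label_list = stock.loc[["2014/01/30"]] # was: stock.ix[['2014/01/30']]
by_position = stock.iloc[0]               # was: stock.ix[0]
by_int_slice = stock.iloc[1:3]            # was: stock.ix[1:3]
by_mask = stock.loc[stock["Russell 2000"] > 1100]

print(type(by_label).__name__, type(by_label_list).__name__)
```

Choosing `.loc` or `.iloc` explicitly removes the guesswork `.ix` had to do when an index itself contained integers.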


<h2>Multi-Indexing</h2>

<b>Read stock index data:</b>

In [81]:
sharesIndexDataDF=pd.read_csv('./data/stock_index_prices.csv')

In [82]:
sharesIndexDataDF

Unnamed: 0,TradingDate,PriceType,Nasdaq,S&P 500,Russell 2000
0,2014/02/21,open,4282.17,1841.07,1166.25
1,2014/02/21,close,4263.41,1836.25,1164.63
2,2014/02/21,high,4284.85,1846.13,1168.43
3,2014/02/24,open,4273.32,1836.78,1166.74
4,2014/02/24,close,4292.97,1847.61,1174.55
5,2014/02/24,high,4311.13,1858.71,1180.29
6,2014/02/25,open,4298.48,1847.66,1176.0
7,2014/02/25,close,4287.59,1845.12,1173.95
8,2014/02/25,high,4307.51,1852.91,1179.43
9,2014/02/26,open,4300.45,1845.79,1176.11


<b>Create a MultiIndex:</b>

In [83]:
sharesIndexDF=sharesIndexDataDF.set_index(['TradingDate','PriceType'])

In [84]:
mIndex=sharesIndexDF.index; mIndex

MultiIndex(levels=[[u'2014/02/21', u'2014/02/24', u'2014/02/25', u'2014/02/26', u'2014/02/27', u'2014/02/28'], [u'close', u'high', u'open']],
           labels=[[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5], [2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1]],
           names=[u'TradingDate', u'PriceType'])

In [85]:
sharesIndexDF

TradingDate,PriceType,Nasdaq,S&P 500,Russell 2000
2014/02/21,open,4282.17,1841.07,1166.25
2014/02/21,close,4263.41,1836.25,1164.63
2014/02/21,high,4284.85,1846.13,1168.43
2014/02/24,open,4273.32,1836.78,1166.74
2014/02/24,close,4292.97,1847.61,1174.55
2014/02/24,high,4311.13,1858.71,1180.29
2014/02/25,open,4298.48,1847.66,1176.0
2014/02/25,close,4287.59,1845.12,1173.95
2014/02/25,high,4307.51,1852.91,1179.43
2014/02/26,open,4300.45,1845.79,1176.11


<b>Apply get_level_values function:</b>

In [86]:
mIndex.get_level_values(0)

Index([u'2014/02/21', u'2014/02/21', u'2014/02/21', u'2014/02/24', u'2014/02/24', u'2014/02/24', u'2014/02/25', u'2014/02/25', u'2014/02/25', u'2014/02/26', u'2014/02/26', u'2014/02/26', u'2014/02/27', u'2014/02/27', u'2014/02/27', u'2014/02/28', u'2014/02/28', u'2014/02/28'], dtype='object')

In [87]:
mIndex.get_level_values(1)

Index([u'open', u'close', u'high', u'open', u'close', u'high', u'open', u'close', u'high', u'open', u'close', u'high', u'open', u'close', u'high', u'open', u'close', u'high'], dtype='object')

In [88]:
mIndex.get_level_values(2)

IndexError: Too many levels: Index has only 2 levels, not 3

<b>Hierarchical indexing with a multi-indexed DataFrame:</b>

In [89]:
sharesIndexDF.ix['2014/02/21']

PriceType,Nasdaq,S&P 500,Russell 2000
open,4282.17,1841.07,1166.25
close,4263.41,1836.25,1164.63
high,4284.85,1846.13,1168.43


In [90]:
sharesIndexDF.ix['2014/02/21','open']

Nasdaq          4282.17
S&P 500         1841.07
Russell 2000    1166.25
Name: (2014/02/21, open), dtype: float64

<b>Slice using a multi-index:</b>

In [91]:
sharesIndexDF.ix['2014/02/21':'2014/02/24']

TradingDate,PriceType,Nasdaq,S&P 500,Russell 2000
2014/02/21,open,4282.17,1841.07,1166.25
2014/02/21,close,4263.41,1836.25,1164.63
2014/02/21,high,4284.85,1846.13,1168.43
2014/02/24,open,4273.32,1836.78,1166.74
2014/02/24,close,4292.97,1847.61,1174.55
2014/02/24,high,4311.13,1858.71,1180.29


<b>Try slicing at a lower level:</b>

In [92]:
sharesIndexDF.ix[('2014/02/21','open'):('2014/02/24','open')]

KeyError: 'Key length (2) was greater than MultiIndex lexsort depth (1)'

Sort first before slicing with a MultiIndex:

In [93]:
sharesIndexDF.sortlevel(0).ix[('2014/02/21','open'):('2014/02/24','open')]

TradingDate,PriceType,Nasdaq,S&P 500,Russell 2000
2014/02/21,open,4282.17,1841.07,1166.25
2014/02/24,close,4292.97,1847.61,1174.55
2014/02/24,high,4311.13,1858.71,1180.29
2014/02/24,open,4273.32,1836.78,1166.74


In [94]:
sharesIndexDF.sortlevel(0).ix[('2014/02/21','close'):('2014/02/24','close')]

TradingDate,PriceType,Nasdaq,S&P 500,Russell 2000
2014/02/21,close,4263.41,1836.25,1164.63
2014/02/21,high,4284.85,1846.13,1168.43
2014/02/21,open,4282.17,1841.07,1166.25
2014/02/24,close,4292.97,1847.61,1174.55


<b>Pass list of tuples:</b>

In [96]:
sharesIndexDF.sortlevel(0).ix[[('2014/02/21','open'),('2014/02/24','open')]]

TradingDate,PriceType,Nasdaq,S&P 500,Russell 2000
2014/02/21,open,4282.17,1841.07,1166.25
2014/02/24,open,4273.32,1836.78,1166.74
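
Slicing between tuples needs a lexicographically sorted MultiIndex, which is why the price types come back in alphabetical order. A sketch of the same idea using `sort_index()` and `.loc`, which to my knowledge are the current replacements for `sortlevel` and `.ix`; the frame `shares` holds only two trading dates and is an assumption for illustration.

```python
import pandas as pd

rows = [
    ("2014/02/21", "open", 4282.17), ("2014/02/21", "close", 4263.41),
    ("2014/02/21", "high", 4284.85), ("2014/02/24", "open", 4273.32),
    ("2014/02/24", "close", 4292.97), ("2014/02/24", "high", 4311.13),
]
shares = (pd.DataFrame(rows, columns=["TradingDate", "PriceType", "Nasdaq"])
            .set_index(["TradingDate", "PriceType"]))

# Sort on both levels: within each date the order becomes close, high, open.
sorted_shares = shares.sort_index()

# Tuple slices are inclusive at both ends and follow the sorted order.
window = sorted_shares.loc[("2014/02/21", "open"):("2014/02/24", "close")]

# A list of tuples picks out exact (date, price type) pairs.
opens = sorted_shares.loc[[("2014/02/21", "open"), ("2014/02/24", "open")]]
print(window.index.tolist())
```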


<b>Use of the swaplevel function:</b>

In [98]:
swappedDF=sharesIndexDF[:7].swaplevel(0, 1, axis=0)
swappedDF

PriceType,TradingDate,Nasdaq,S&P 500,Russell 2000
open,2014/02/21,4282.17,1841.07,1166.25
close,2014/02/21,4263.41,1836.25,1164.63
high,2014/02/21,4284.85,1846.13,1168.43
open,2014/02/24,4273.32,1836.78,1166.74
close,2014/02/24,4292.97,1847.61,1174.55
high,2014/02/24,4311.13,1858.71,1180.29
open,2014/02/25,4298.48,1847.66,1176.0


In [99]:
reorderedDF=sharesIndexDF[:7].reorder_levels(['PriceType', 
                                                      'TradingDate'], 
                                                       axis=0)
reorderedDF


PriceType,TradingDate,Nasdaq,S&P 500,Russell 2000
open,2014/02/21,4282.17,1841.07,1166.25
close,2014/02/21,4263.41,1836.25,1164.63
high,2014/02/21,4284.85,1846.13,1168.43
open,2014/02/24,4273.32,1836.78,1166.74
close,2014/02/24,4292.97,1847.61,1174.55
high,2014/02/24,4311.13,1858.71,1180.29
open,2014/02/25,4298.48,1847.66,1176.0


<h2>Cross-sections</h2>

<b>xs( ) method</b>

In [102]:
sharesIndexDF.xs('open',level='PriceType')

TradingDate,Nasdaq,S&P 500,Russell 2000
2014/02/21,4282.17,1841.07,1166.25
2014/02/24,4273.32,1836.78,1166.74
2014/02/25,4298.48,1847.66,1176.0
2014/02/26,4300.45,1845.79,1176.11
2014/02/27,4291.47,1844.9,1179.28
2014/02/28,4323.52,1855.12,1189.19


<b>swaplevel( ) alternative:</b>

In [103]:
sharesIndexDF.swaplevel(0, 1, axis=0).ix['open']

TradingDate,Nasdaq,S&P 500,Russell 2000
2014/02/21,4282.17,1841.07,1166.25
2014/02/24,4273.32,1836.78,1166.74
2014/02/25,4298.48,1847.66,1176.0
2014/02/26,4300.45,1845.79,1176.11
2014/02/27,4291.47,1844.9,1179.28
2014/02/28,4323.52,1855.12,1189.19
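
The two approaches above give the same cross-section. The sketch below checks this on a two-date stand-in frame (`shares` is an assumed name), using `.loc` in place of `.ix` and sorting after the swap so that the lookup runs on a sorted index.

```python
import pandas as pd

rows = [
    ("2014/02/21", "open", 4282.17), ("2014/02/21", "close", 4263.41),
    ("2014/02/24", "open", 4273.32), ("2014/02/24", "close", 4292.97),
]
shares = (pd.DataFrame(rows, columns=["TradingDate", "PriceType", "Nasdaq"])
            .set_index(["TradingDate", "PriceType"]))

# xs() selects on an inner level directly and drops that level.
opens_xs = shares.xs("open", level="PriceType")

# Equivalent: bring PriceType to the front, then select by the outer label.
opens_swap = shares.swaplevel(0, 1).sort_index().loc["open"]

print(opens_xs["Nasdaq"].tolist(), opens_swap["Nasdaq"].tolist())
```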


<h2>Boolean Indexing</h2>

<b>Trading dates on which the Nasdaq closed above 4300:</b>

In [116]:
sharesIndexDataDF

Unnamed: 0,TradingDate,PriceType,Nasdaq,S&P 500,Russell 2000
0,2014/02/21,open,4282.17,1841.07,1166.25
1,2014/02/21,close,4263.41,1836.25,1164.63
2,2014/02/21,high,4284.85,1846.13,1168.43
3,2014/02/24,open,4273.32,1836.78,1166.74
4,2014/02/24,close,4292.97,1847.61,1174.55
5,2014/02/24,high,4311.13,1858.71,1180.29
6,2014/02/25,open,4298.48,1847.66,1176.0
7,2014/02/25,close,4287.59,1845.12,1173.95
8,2014/02/25,high,4307.51,1852.91,1179.43
9,2014/02/26,open,4300.45,1845.79,1176.11


In [119]:
sharesIndexDataDF.ix[(sharesIndexDataDF['PriceType']=='close') & \
                     (sharesIndexDataDF['Nasdaq']>4300) ]

Unnamed: 0,TradingDate,PriceType,Nasdaq,S&P 500,Russell 2000
13,2014/02/27,close,4318.93,1854.29,1187.94
16,2014/02/28,close,4308.12,1859.45,1183.03


In [120]:
highSelection=sharesIndexDataDF['PriceType']=='high'
NasdaqHigh=sharesIndexDataDF['Nasdaq']<4300
sharesIndexDataDF.ix[highSelection & NasdaqHigh]


Unnamed: 0,TradingDate,PriceType,Nasdaq,S&P 500,Russell 2000
2,2014/02/21,high,4284.85,1846.13,1168.43


<h2>isin, any, and all methods</h2>

In [121]:
stockSeries=pd.Series(['NFLX','AMZN','GOOG','FB','TWTR'])
stockSeries.isin(['AMZN','FB'])


0    False
1     True
2    False
3     True
4    False
dtype: bool

In [122]:
stockSeries[stockSeries.isin(['AMZN','FB'])]

1    AMZN
3      FB
dtype: object

In [131]:
australianMammals = {'kangaroo': {'Subclass':'marsupial', 
                              'Origin':'native'},
               'flying fox' : {'Subclass':'placental', 
                               'Origin':'native'},              
               'black rat': {'Subclass':'placental', 
                             'Origin':'invasive'},
               'platypus' : {'Subclass':'monotreme', 
                             'Origin':'native'},
               'wallaby' :  {'Subclass':'marsupial', 
                             'Origin':'native'},
        'palm squirrel' : {'Subclass':'placental', 
                           'Origin':'invasive'},
        'anteater':     {'Subclass':'monotreme', 
                         'Origin':'native'},
        'koala':        {'Subclass':'marsupial', 
                         'Origin':'native'}
}



In [132]:
ozzieMammalsDF=pd.DataFrame(australianMammals)

In [133]:
aussieMammalsDF=ozzieMammalsDF.T; aussieMammalsDF

Unnamed: 0,Origin,Subclass
anteater,native,monotreme
black rat,invasive,placental
flying fox,native,placental
kangaroo,native,marsupial
koala,native,marsupial
palm squirrel,invasive,placental
platypus,native,monotreme
wallaby,native,marsupial


In [134]:
aussieMammalsDF.isin({'Subclass':['marsupial'],'Origin':['native']})

Unnamed: 0,Origin,Subclass
anteater,True,False
black rat,False,False
flying fox,True,False
kangaroo,True,True
koala,True,True
palm squirrel,False,False
platypus,True,False
wallaby,True,True


In [137]:
nativeMarsupials={'Subclass':['marsupial'],
                            'Origin':['native']}


In [142]:
nativeMarsupialMask=aussieMammalsDF.isin(nativeMarsupials).all(1)
aussieMammalsDF[nativeMarsupialMask]


Unnamed: 0,Origin,Subclass
kangaroo,native,marsupial
koala,native,marsupial
wallaby,native,marsupial
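
When isin() receives a dict, each column is tested against its own list of values; all(axis=1) then keeps rows where every column matched, while any(axis=1) keeps rows where at least one did. A sketch on a three-row stand-in frame (the name `mammals` is an assumption):

```python
import pandas as pd

mammals = pd.DataFrame(
    {"Origin": ["native", "invasive", "native"],
     "Subclass": ["marsupial", "placental", "monotreme"]},
    index=["kangaroo", "black rat", "platypus"],
)

criteria = {"Subclass": ["marsupial"], "Origin": ["native"]}
matches = mammals.isin(criteria)             # Boolean frame, same shape

all_match = mammals[matches.all(axis=1)]     # native AND marsupial
any_match = mammals[matches.any(axis=1)]     # native OR marsupial

print(all_match.index.tolist(), any_match.index.tolist())
```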


<h2>where() method</h2>

In [143]:
np.random.seed(100)
normvals=pd.Series([np.random.normal() for i in np.arange(10)])
normvals


0   -1.749765
1    0.342680
2    1.153036
3   -0.252436
4    0.981321
5    0.514219
6    0.221180
7   -1.070043
8   -0.189496
9    0.255001
dtype: float64

<b>Difference between where() and a standard Boolean filter on a Series object:</b>

In [144]:
normvals[normvals>0]

1    0.342680
2    1.153036
4    0.981321
5    0.514219
6    0.221180
9    0.255001
dtype: float64

In [145]:
normvals.where(normvals>0)

0         NaN
1    0.342680
2    1.153036
3         NaN
4    0.981321
5    0.514219
6    0.221180
7         NaN
8         NaN
9    0.255001
dtype: float64

<b>No difference between where() and a standard Boolean filter on a DataFrame object:</b>

In [146]:
np.random.seed(100) 
normDF=pd.DataFrame([[round(np.random.normal(),3) for i in np.arange(5)] for j in range(3)], 
             columns=['0','30','60','90','120'])
normDF


Unnamed: 0,0,30,60,90,120
0,-1.75,0.343,1.153,-0.252,0.981
1,0.514,0.221,-1.07,-0.189,0.255
2,-0.458,0.435,-0.584,0.817,0.673


In [147]:
normDF[normDF>0]

Unnamed: 0,0,30,60,90,120
0,,0.343,1.153,,0.981
1,0.514,0.221,,,0.255
2,,0.435,,0.817,0.673


In [148]:
normDF.where(normDF>0)

Unnamed: 0,0,30,60,90,120
0,,0.343,1.153,,0.981
1,0.514,0.221,,,0.255
2,,0.435,,0.817,0.673


<b>mask() is the inverse of where():</b>

In [149]:
normDF.mask(normDF>0)

Unnamed: 0,0,30,60,90,120
0,-1.75,,,-0.252,
1,,,-1.07,-0.189,
2,-0.458,,-0.584,,
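
Both where() and mask() preserve the shape of the original object and accept a replacement value as a second argument. The sketch below uses a hand-written Series rather than random values, so the names and numbers are assumptions for illustration.

```python
import pandas as pd

vals = pd.Series([-1.5, 0.3, 1.2, -0.2])

filtered = vals[vals > 0]            # drops the rows that fail the test
kept = vals.where(vals > 0)          # same length; failures become NaN
clipped = vals.where(vals > 0, 0.0)  # failures replaced by 0.0 instead
hidden = vals.mask(vals > 0)         # inverse: rows that pass become NaN

print(len(filtered), len(kept))
```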


<h2>Operations on Indexes</h2>

<b>Read in stock index data</b>

In [150]:
stockIndexDataDF=pd.read_csv('./data/stock_index_data.csv')

In [151]:
stockIndexDataDF

Unnamed: 0,TradingDate,Nasdaq,S&P 500,Russell 2000
0,2014/01/30,4123.13,1794.19,1139.36
1,2014/01/31,4103.88,1782.59,1130.88
2,2014/02/03,3996.96,1741.89,1094.58
3,2014/02/04,4031.52,1755.2,1102.84
4,2014/02/05,4011.55,1751.64,1093.59
5,2014/02/06,4057.12,1773.43,1103.93


<b>Set the index of DataFrame to the TradingDate using set_index(..)</b>

In [152]:
stockIndexDF=stockIndexDataDF.set_index('TradingDate')

In [153]:
stockIndexDF

TradingDate,Nasdaq,S&P 500,Russell 2000
2014/01/30,4123.13,1794.19,1139.36
2014/01/31,4103.88,1782.59,1130.88
2014/02/03,3996.96,1741.89,1094.58
2014/02/04,4031.52,1755.2,1102.84
2014/02/05,4011.55,1751.64,1093.59
2014/02/06,4057.12,1773.43,1103.93


<b>reset_index reverses set_index:</b>

In [154]:
stockIndexDF.reset_index()

Unnamed: 0,TradingDate,Nasdaq,S&P 500,Russell 2000
0,2014/01/30,4123.13,1794.19,1139.36
1,2014/01/31,4103.88,1782.59,1130.88
2,2014/02/03,3996.96,1741.89,1094.58
3,2014/02/04,4031.52,1755.2,1102.84
4,2014/02/05,4011.55,1751.64,1093.59
5,2014/02/06,4057.12,1773.43,1103.93
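
Both set_index() and reset_index() return new objects and leave the original untouched. A minimal round-trip sketch on a two-row stand-in frame (the name `raw` is an assumption), including the `drop=True` option that discards the index rather than restoring it as a column:

```python
import pandas as pd

raw = pd.DataFrame({
    "TradingDate": ["2014/01/30", "2014/01/31"],
    "Nasdaq": [4123.13, 4103.88],
})

indexed = raw.set_index("TradingDate")      # column becomes the index
restored = indexed.reset_index()            # index becomes a column again
discarded = indexed.reset_index(drop=True)  # index is thrown away

print(indexed.index.name, restored.columns.tolist(), discarded.columns.tolist())
```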


This concludes the chapter. 