In [3]:
import pandas as pd
import numpy as np

### BAD WAY 

In [4]:
index = [('California', 2000), ('California', 2010),
         ('New York', 2000), ('New York', 2010),
         ('Texas', 2000), ('Texas', 2010)]
populations = [33871648, 37253956,
               18976457, 19378102,
               20851820, 25145561]
pop = pd.Series(populations, index=index)
pop

(California, 2000)    33871648
(California, 2010)    37253956
(New York, 2000)      18976457
(New York, 2010)      19378102
(Texas, 2000)         20851820
(Texas, 2010)         25145561
dtype: int64

In [5]:
pop.values

array([33871648, 37253956, 18976457, 19378102, 20851820, 25145561])

In [6]:
pop.keys()  # keys() is a method and must be called; pop.index returns the same object

Index([('California', 2000), ('California', 2010), ('New York', 2000),
       ('New York', 2010), ('Texas', 2000), ('Texas', 2010)],
      dtype='object')

In [7]:
pop[('California', 2010):('Texas', 2000)]

(California, 2010)    37253956
(New York, 2000)      18976457
(New York, 2010)      19378102
(Texas, 2000)         20851820
dtype: int64

In [8]:
pop[[i for i in pop.index if i[1] == 2010]]

(California, 2010)    37253956
(New York, 2010)      19378102
(Texas, 2010)         25145561
dtype: int64

This produces the desired result, but is not as clean (or as efficient for large datasets) as the slicing syntax we’ve grown to love in Pandas.

## THE BETTER WAY: PANDAS MULTIINDEX 

In [11]:
# index already holds (state, year) tuples, so build a MultiIndex from them
index = pd.MultiIndex.from_tuples(index)

In [12]:
index

MultiIndex(levels=[[u'California', u'New York', u'Texas'], [2000, 2010]],
           labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]])

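Notice that the MultiIndex contains multiple levels of indexing (in this case, the state names and the years), as well as multiple labels for each data point, which encode these levels. Each label is an integer position into the corresponding level. In pandas 0.24 and later this attribute is named codes rather than labels, and the printed representation may look different from the one above.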
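To make the relationship between levels and their integer positions concrete, here is a minimal sketch that rebuilds the same index and decodes the state names by hand. It assumes pandas 0.24 or later, where the integer positions are exposed as codes; the level names state and year and all variable names are illustrative additions, not part of the original notebook.

```python
import pandas as pd

tuples = [('California', 2000), ('California', 2010),
          ('New York', 2000), ('New York', 2010),
          ('Texas', 2000), ('Texas', 2010)]
# The names are optional; they are added here only to make lookups readable.
mi = pd.MultiIndex.from_tuples(tuples, names=['state', 'year'])

# Each level stores the unique values once.
state_levels = mi.levels[0].tolist()
year_levels = mi.levels[1].tolist()

# Each row stores an integer position into the level (assumes pandas >= 0.24).
state_codes = mi.codes[0].tolist()

# Decoding the codes by hand reproduces the per-row state labels.
decoded_states = [state_levels[c] for c in state_codes]
```

The same per-row values are available directly through mi.get_level_values('state').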

In [13]:
pop = pop.reindex(index)

In [14]:
pop

California  2000    33871648
            2010    37253956
New York    2000    18976457
            2010    19378102
Texas       2000    20851820
            2010    25145561
dtype: int64

Here the first two columns of the Series representation show the multiple index values, while the third column shows the data. Notice that some entries are missing in the first column: in this multi-index representation, any blank entry indicates the same value as the line above it.
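The blank entries are only a display convention; every row still carries its full (state, year) label. A small sketch to check this, assuming the Series is built directly on a MultiIndex (variable names are illustrative):

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([('California', 2000), ('California', 2010),
                                   ('New York', 2000), ('New York', 2010),
                                   ('Texas', 2000), ('Texas', 2010)])
pop = pd.Series([33871648, 37253956,
                 18976457, 19378102,
                 20851820, 25145561], index=index)

# The second row prints with a blank state, but its label is a full tuple.
second_label = pop.index[1]

# reset_index() moves both levels into ordinary columns, with no blanks.
# Unnamed levels get the default column names level_0 and level_1.
flat = pop.reset_index()
first_level_column = flat['level_0'].tolist()
```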

##### Now to access all data for which the second index is 2010, we can simply use the Pandas slicing notation: 

In [15]:
pop[:, 2010]

California    37253956
New York      19378102
Texas         25145561
dtype: int64
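The same kind of selection can also be written with xs() or with .loc. The following is a sketch rather than the notebook's own code; it rebuilds the Series so that it runs on its own, and the names by_year and california are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([('California', 2000), ('California', 2010),
                                   ('New York', 2000), ('New York', 2010),
                                   ('Texas', 2000), ('Texas', 2010)])
pop = pd.Series([33871648, 37253956,
                 18976457, 19378102,
                 20851820, 25145561], index=index)

# Cross-section: every row whose second level (position 1) equals 2010.
by_year = pop.xs(2010, level=1)

# Partial indexing on the first level: all years for one state.
california = pop.loc['California']
```

In both cases the selected level is dropped from the result, so by_year is indexed by state and california is indexed by year.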

##### To convert a multi-indexed Series into a conventionally indexed DataFrame, use the unstack() method

In [16]:
pop_df = pop.unstack()

In [17]:
pop_df

                2000      2010
California  33871648  37253956
New York    18976457  19378102
Texas       20851820  25145561
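The stack() method performs the opposite operation, turning the column labels back into an inner index level. A minimal round-trip sketch, with illustrative variable names and the Series rebuilt so the block runs on its own:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([('California', 2000), ('California', 2010),
                                   ('New York', 2000), ('New York', 2010),
                                   ('Texas', 2000), ('Texas', 2010)])
pop = pd.Series([33871648, 37253956,
                 18976457, 19378102,
                 20851820, 25145561], index=index)

# unstack() pivots the inner level (year) into columns: 3 states x 2 years.
pop_df = pop.unstack()

# stack() pivots the columns back into an inner index level.
restored = pop_df.stack()
```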


##### Build a DataFrame with an additional column that shares the same MultiIndex

In [18]:
pop_df = pd.DataFrame({'total':pop,
                      'under18':[9267089, 9284094,
                                   4687374, 4318033,
                                   5906301, 6879014]})

In [19]:
pop_df

                    total  under18
California 2000  33871648  9267089
           2010  37253956  9284094
New York   2000  18976457  4687374
           2010  19378102  4318033
Texas      2000  20851820  5906301
           2010  25145561  6879014


In [22]:
f_u18 = pop_df['under18'] / pop_df['total']

In [23]:
f_u18

California  2000    0.273594
            2010    0.249211
New York    2000    0.247010
            2010    0.222831
Texas       2000    0.283251
            2010    0.273568
dtype: float64
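Because the result of the division keeps the same MultiIndex, unstack() can lay it out as a state-by-year table. The sketch below is one way to do that; it rebuilds the DataFrame so it runs on its own, and f_u18_table is an illustrative name.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([('California', 2000), ('California', 2010),
                                   ('New York', 2000), ('New York', 2010),
                                   ('Texas', 2000), ('Texas', 2010)])
pop_df = pd.DataFrame({'total': [33871648, 37253956,
                                 18976457, 19378102,
                                 20851820, 25145561],
                       'under18': [9267089, 9284094,
                                   4687374, 4318033,
                                   5906301, 6879014]},
                      index=index)

# Element-wise division aligns on the shared MultiIndex.
f_u18 = pop_df['under18'] / pop_df['total']

# Pivot the year level into columns: one row per state, one column per year.
f_u18_table = f_u18.unstack()
```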