In [5]:
import pandas as pd
import numpy as np

In [2]:
from pandas import Series, DataFrame

## Series

A Series can be thought of as a fixed-length, ordered dict, since it maps index values to data values. It can be used in many contexts where you might use a dict.
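The dict analogy can be made concrete with a small sketch. The `prices` Series and its labels are made up for illustration; the calls used (`in`, `get`, `items`, `to_dict`) mirror the dict operations of the same name.

```python
import pandas as pd

# Hypothetical example data: a Series keyed by string labels.
prices = pd.Series([1.5, 2.0, 0.75], index=["apple", "pear", "plum"])

# Membership tests look at the index labels, like dict keys.
has_pear = "pear" in prices

# .get returns a default instead of raising KeyError for a missing label.
missing = prices.get("kiwi", 0.0)

# .items() yields (label, value) pairs in index order.
pairs = list(prices.items())

# Converting back to a plain dict keeps the label -> value mapping.
as_dict = prices.to_dict()
```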

In [3]:
obj = Series([1,2,3,4])

In [4]:
obj

0    1
1    2
2    3
3    4
dtype: int64

In [5]:
obj.values

array([1, 2, 3, 4])

In [6]:
obj.index

RangeIndex(start=0, stop=4, step=1)

In [7]:
obj2 = Series([1,2,3,4], index=['a','b','c','d'])

In [8]:
obj2

a    1
b    2
c    3
d    4
dtype: int64

In [9]:
obj2.index

Index(['a', 'b', 'c', 'd'], dtype='object')

In [11]:
obj2[['a','b']]

a    1
b    2
dtype: int64

In [13]:
obj2['b']=64

In [14]:
obj2

a     1
b    64
c     3
d     4
dtype: int64

In [15]:
obj3 = Series([1,2,3,4], index=['a','b','c','d'])

### Similar to NumPy's vectorization

In [16]:
obj3*3

a     3
b     6
c     9
d    12
dtype: int64

In [17]:
obj3[obj3*3 > 6]

c    3
d    4
dtype: int64
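As a further sketch of the same idea, NumPy functions and combined boolean filters also work on a Series and keep the index labels attached. The names `roots` and `middle` are only illustrative.

```python
import numpy as np
import pandas as pd

obj3 = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])

# NumPy universal functions apply element-wise and return a Series
# that carries the same index labels.
roots = np.sqrt(obj3)

# Boolean masks can be combined with & (and) and | (or); each
# comparison needs its own parentheses.
middle = obj3[(obj3 > 1) & (obj3 < 4)]
```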

In [18]:
'c' in obj3

True

You can create a Series from a dictionary by passing the dictionary directly to the Series constructor.

In [19]:
sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj4 = pd.Series(sdata)

In [20]:
obj4

Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64

When you pass only a dict, the index of the resulting Series takes the dict's keys in insertion order, as the output above shows (older pandas versions sorted the keys). You can override this by passing the dict keys in the order you want them to appear in the resulting Series:

In [21]:
states = ['California', 'Ohio', 'Oregon', 'Texas']
obj5 = pd.Series(sdata, index=states)

In [22]:
obj5

California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64

Here, three values found in sdata were placed in the appropriate locations, but since no value for 'California' was found, it appears as NaN (not a number), which pandas uses to mark missing or NA values. Since 'Utah' was not included in states, it is excluded from the resulting object. The isnull and notnull functions in pandas should be used to detect missing data.

In [23]:
pd.isnull(obj5)

California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool

In [24]:
pd.notnull(obj5)

California    False
Ohio           True
Oregon         True
Texas          True
dtype: bool

In [25]:
# OR as an instance method
obj5.isnull()

California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool
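Once missing values are detected, a typical next step is to count, drop, or fill them. The sketch below rebuilds `obj5` so it runs on its own; filling with `0` is an arbitrary choice for illustration, not a recommendation.

```python
import pandas as pd

sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
states = ["California", "Ohio", "Oregon", "Texas"]
obj5 = pd.Series(sdata, index=states)

# True counts as 1, so summing the mask counts the missing entries.
n_missing = int(obj5.isnull().sum())

# Drop the missing entries, or replace them with a chosen placeholder.
present = obj5.dropna()
filled = obj5.fillna(0)
```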

In [26]:
obj

0     1
1     2
2     3
3     4
b    64
dtype: int64

In [28]:
obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan', 'Aditya']

In [29]:
obj

Bob        1
Steve      2
Jeff       3
Ryan       4
Aditya    64
dtype: int64

In [83]:
sports = {'Archery': 'Bhutan',
          'Golf': 'Scotland',
          'Sumo': 'Japan',
          'Taekwondo': 'South Korea'}
s = pd.Series(sports)
s

Archery           Bhutan
Golf            Scotland
Sumo               Japan
Taekwondo    South Korea
dtype: object

In [84]:
s.iloc[3]

'South Korea'

In [85]:
s.loc['Golf']

'Scotland'

In [86]:
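# Plain [] with an integer falls back to position here because the index holds strings;
# newer pandas versions deprecate this fallback, so s.iloc[3] is the safer spelling
s[3]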

'South Korea'
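The difference between `loc` and `iloc` is easiest to see when the index itself holds integers. In this sketch the `ranks` Series is hypothetical, with labels chosen so that label and position disagree.

```python
import pandas as pd

# Hypothetical Series whose integer labels run opposite to the positions.
ranks = pd.Series(["gold", "silver", "bronze"], index=[3, 2, 1])

by_label = ranks.loc[1]      # the entry labelled 1
by_position = ranks.iloc[1]  # the second entry
```

Here `by_label` is `'bronze'` and `by_position` is `'silver'`, which is why spelling out `loc` or `iloc` avoids ambiguity.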

## DataFrames

In [6]:
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
            'year': [2000, 2001, 2002, 2001, 2002, 2003],
            'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
frame = pd.DataFrame(data)

In [7]:
frame

| | state | year | pop |
|---|---|---|---|
| 0 | Ohio | 2000 | 1.5 |
| 1 | Ohio | 2001 | 1.7 |
| 2 | Ohio | 2002 | 3.6 |
| 3 | Nevada | 2001 | 2.4 |
| 4 | Nevada | 2002 | 2.9 |
| 5 | Nevada | 2003 | 3.2 |


In [10]:
frame.head()

| | state | year | pop |
|---|---|---|---|
| 0 | Ohio | 2000 | 1.5 |
| 1 | Ohio | 2001 | 1.7 |
| 2 | Ohio | 2002 | 3.6 |
| 3 | Nevada | 2001 | 2.4 |
| 4 | Nevada | 2002 | 2.9 |


In [11]:
frame.loc[4]

state    Nevada
year       2002
pop         2.9
Name: 4, dtype: object

If you specify a sequence of columns, the DataFrame’s columns will be arranged in that order:

In [12]:
pd.DataFrame(data, columns=['year', 'state', 'pop'])

| | year | state | pop |
|---|---|---|---|
| 0 | 2000 | Ohio | 1.5 |
| 1 | 2001 | Ohio | 1.7 |
| 2 | 2002 | Ohio | 3.6 |
| 3 | 2001 | Nevada | 2.4 |
| 4 | 2002 | Nevada | 2.9 |
| 5 | 2003 | Nevada | 3.2 |


If you pass a column that isn’t contained in the dict, it will appear with missing values in the result:

In [13]:
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                             index=['one', 'two', 'three', 'four',
                                    'five', 'six'])

In [14]:
frame2

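| | year | state | pop | debt |
|---|---|---|---|---|
| one | 2000 | Ohio | 1.5 | NaN |
| two | 2001 | Ohio | 1.7 | NaN |
| three | 2002 | Ohio | 3.6 | NaN |
| four | 2001 | Nevada | 2.4 | NaN |
| five | 2002 | Nevada | 2.9 | NaN |
| six | 2003 | Nevada | 3.2 | NaN |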


In [15]:
frame2.columns

Index(['year', 'state', 'pop', 'debt'], dtype='object')

In [16]:
frame2.year

one      2000
two      2001
three    2002
four     2001
five     2002
six      2003
Name: year, dtype: int64

In [17]:
frame2["year"]

one      2000
two      2001
three    2002
four     2001
five     2002
six      2003
Name: year, dtype: int64

Both notations are valid, but you must use [] if the column name contains spaces or conflicts with an existing DataFrame attribute or method.
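A short sketch of where attribute access breaks down. The frame below is hypothetical; its `'pop'` column was chosen because `DataFrame.pop` is an existing method, so the attribute lookup finds the method first.

```python
import pandas as pd

# Hypothetical frame: one column name has a space, another ("pop")
# shares its name with the DataFrame.pop method.
frame = pd.DataFrame({"pop": [1.5, 1.7], "item purchased": ["a", "b"]})

# Attribute access finds the method, not the column.
attr_is_method = callable(frame.pop)

# Bracket access always returns the column.
pop_column = frame["pop"]
spaced_column = frame["item purchased"]
```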

In [21]:
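# Note: str() renders each whole Series as a single string (its printed form),
# so every row receives the same text rather than its own year and state
frame2["concat"] = str(frame2.year) + " "+ str(frame2.state)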

In [25]:
frame2

| | year | state | pop | debt | concat |
|---|---|---|---|---|---|
| one | 2000 | Ohio | 1.5 | NaN | one 2000\ntwo 2001\nthree 2002\nf... |
| two | 2001 | Ohio | 1.7 | NaN | one 2000\ntwo 2001\nthree 2002\nf... |
| three | 2002 | Ohio | 3.6 | NaN | one 2000\ntwo 2001\nthree 2002\nf... |
| four | 2001 | Nevada | 2.4 | NaN | one 2000\ntwo 2001\nthree 2002\nf... |
| five | 2002 | Nevada | 2.9 | NaN | one 2000\ntwo 2001\nthree 2002\nf... |
| six | 2003 | Nevada | 3.2 | NaN | one 2000\ntwo 2001\nthree 2002\nf... |
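The repeated text in the concat column comes from calling `str()` on a whole Series. If the goal is one "year state" string per row, a minimal sketch (on a small stand-in frame, not `frame2` itself) is to convert element-wise with `astype(str)`:

```python
import pandas as pd

frame = pd.DataFrame({"year": [2000, 2001], "state": ["Ohio", "Nevada"]})

# str(frame.year) would turn the whole column into one string (its printed
# form). astype(str) converts each value separately, so + joins row by row.
frame["concat"] = frame["year"].astype(str) + " " + frame["state"]
```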


In [75]:
frame2.loc['three']

year     2002
state    Ohio
pop       3.6
debt      NaN
Name: three, dtype: object

In [78]:
# frame2.loc[3] raises a KeyError because 3 is not an index label; use iloc for positions
frame2.iloc[3]

year       2001
state    Nevada
pop         2.4
debt       -1.5
Name: four, dtype: object

Columns can be modified by assignment. For example, the empty 'debt' column could be assigned a scalar value or an array of values:

In [41]:
frame2['debt'] = 16.5
frame2

| | year | state | pop | debt |
|---|---|---|---|---|
| one | 2000 | Ohio | 1.5 | 16.5 |
| two | 2001 | Ohio | 1.7 | 16.5 |
| three | 2002 | Ohio | 3.6 | 16.5 |
| four | 2001 | Nevada | 2.4 | 16.5 |
| five | 2002 | Nevada | 2.9 | 16.5 |
| six | 2003 | Nevada | 3.2 | 16.5 |


In [44]:
frame2['debt'] = np.arange(6.)

In [45]:
frame2

| | year | state | pop | debt |
|---|---|---|---|---|
| one | 2000 | Ohio | 1.5 | 0.0 |
| two | 2001 | Ohio | 1.7 | 1.0 |
| three | 2002 | Ohio | 3.6 | 2.0 |
| four | 2001 | Nevada | 2.4 | 3.0 |
| five | 2002 | Nevada | 2.9 | 4.0 |
| six | 2003 | Nevada | 3.2 | 5.0 |


When you are assigning lists or arrays to a column, the value’s length must match the length of the DataFrame. If you assign a Series, its labels will be realigned exactly to the DataFrame’s index, inserting missing values in any holes:

In [46]:
val = pd.Series([-1.2, -1.5, -1.7], index=['two', 'four', 'five'])

In [47]:
frame2['debt'] = val
frame2

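| | year | state | pop | debt |
|---|---|---|---|---|
| one | 2000 | Ohio | 1.5 | NaN |
| two | 2001 | Ohio | 1.7 | -1.2 |
| three | 2002 | Ohio | 3.6 | NaN |
| four | 2001 | Nevada | 2.4 | -1.5 |
| five | 2002 | Nevada | 2.9 | -1.7 |
| six | 2003 | Nevada | 3.2 | NaN |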
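The contrast between list assignment and Series assignment can be sketched on a small stand-in frame. The values are arbitrary; the point is that a list of the wrong length is rejected, while a Series is aligned by label.

```python
import pandas as pd

frame = pd.DataFrame({"year": [2000, 2001, 2002]})

# A list has no labels, so its length must equal the number of rows.
try:
    frame["debt"] = [1.0, 2.0]
    error = None
except ValueError as exc:
    error = exc

# A Series is aligned on labels instead; unmatched rows become NaN.
frame["debt"] = pd.Series([1.0], index=[1])
```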


Assigning a column that doesn’t exist will create a new column. The del keyword will delete columns as with a dict. As an example of del, I first add a new column of boolean values where the state column equals 'Ohio':

In [48]:
frame2['eastern'] = frame2.state == 'Ohio'
frame2

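| | year | state | pop | debt | eastern |
|---|---|---|---|---|---|
| one | 2000 | Ohio | 1.5 | NaN | True |
| two | 2001 | Ohio | 1.7 | -1.2 | True |
| three | 2002 | Ohio | 3.6 | NaN | True |
| four | 2001 | Nevada | 2.4 | -1.5 | False |
| five | 2002 | Nevada | 2.9 | -1.7 | False |
| six | 2003 | Nevada | 3.2 | NaN | False |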


In [49]:
del frame2['eastern']

In [79]:
frame2.columns

Index(['year', 'state', 'pop', 'debt'], dtype='object')

In [81]:
frame2["state"]

one        Ohio
two        Ohio
three      Ohio
four     Nevada
five     Nevada
six      Nevada
Name: state, dtype: object

In [82]:
frame2.loc["three"]

year     2002
state    Ohio
pop       3.6
debt      NaN
Name: three, dtype: object

Another common form of data is a nested dict of dicts. If the nested dict is passed to the DataFrame, pandas will interpret the outer dict keys as the columns and the inner keys as the row indices. The keys in the inner dicts are combined to form the index in the result; in the output below they appear in order of first appearance (2001, 2002, 2000), although older pandas versions sorted them. This isn’t true if an explicit index is specified.

In [51]:
pop = {'Nevada': {2001: 2.4, 2002: 2.9},
         'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}

In [53]:
frame3 = pd.DataFrame(pop)
frame3

| | Nevada | Ohio |
|---|---|---|
| 2001 | 2.4 | 1.7 |
| 2002 | 2.9 | 3.6 |
| 2000 | NaN | 1.5 |
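The explicit-index case mentioned above can be sketched as follows. The index list `[2001, 2002, 2003]` is an arbitrary choice for illustration.

```python
import pandas as pd

pop = {"Nevada": {2001: 2.4, 2002: 2.9},
       "Ohio": {2000: 1.5, 2001: 1.7, 2002: 3.6}}

# With an explicit index, the rows follow that list exactly: 2000 is
# left out, and 2003 (absent from both inner dicts) is filled with NaN.
frame = pd.DataFrame(pop, index=[2001, 2002, 2003])
```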


You can transpose the DataFrame (swap rows and columns) with similar syntax to a NumPy array:

In [54]:
frame3.T

| | 2001 | 2002 | 2000 |
|---|---|---|---|
| Nevada | 2.4 | 2.9 | NaN |
| Ohio | 1.7 | 3.6 | 1.5 |


![alt text](https://i.ibb.co/gR783q4/Screenshot-2019-10-08-at-6-59-25-PM.png "Logo Title Text 1")

In [87]:
import pandas as pd
purchase_1 = pd.Series({'Name': 'Chris',
                        'Item Purchased': 'Dog Food',
                        'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn',
                        'Item Purchased': 'Kitty Litter',
                        'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod',
                        'Item Purchased': 'Bird Seed',
                        'Cost': 5.00})
df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])
df.head()

| | Name | Item Purchased | Cost |
|---|---|---|---|
| Store 1 | Chris | Dog Food | 22.5 |
| Store 1 | Kevyn | Kitty Litter | 2.5 |
| Store 2 | Vinod | Bird Seed | 5.0 |


In [88]:
df.loc['Store 2']

Name                  Vinod
Item Purchased    Bird Seed
Cost                      5
Name: Store 2, dtype: object

In [89]:
type(df.loc['Store 2'])

pandas.core.series.Series

In [90]:
df.loc['Store 1']

| | Name | Item Purchased | Cost |
|---|---|---|---|
| Store 1 | Chris | Dog Food | 22.5 |
| Store 1 | Kevyn | Kitty Litter | 2.5 |


In [91]:
df.loc['Store 1', 'Cost']

Store 1    22.5
Store 1     2.5
Name: Cost, dtype: float64

In [92]:
df.T

| | Store 1 | Store 1 | Store 2 |
|---|---|---|---|
| Name | Chris | Kevyn | Vinod |
| Item Purchased | Dog Food | Kitty Litter | Bird Seed |
| Cost | 22.5 | 2.5 | 5 |


In [93]:
df.T.loc['Cost']

Store 1    22.5
Store 1     2.5
Store 2       5
Name: Cost, dtype: object

In [94]:
df['Cost']

Store 1    22.5
Store 1     2.5
Store 2     5.0
Name: Cost, dtype: float64

In [95]:
df.loc['Store 1']['Cost']

Store 1    22.5
Store 1     2.5
Name: Cost, dtype: float64
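The chained form `df.loc['Store 1']['Cost']` and the single call `df.loc['Store 1', 'Cost']` return the same values when reading. A hedged sketch on a reduced copy of the purchases table shows the equivalence, and uses the single-call form for assignment, since that form writes to the original frame.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Chris", "Kevyn", "Vinod"],
                   "Cost": [22.5, 2.5, 5.0]},
                  index=["Store 1", "Store 1", "Store 2"])

# One .loc call with a row label and a column label.
single = df.loc["Store 1", "Cost"]

# Two chained lookups: first the rows, then the column of that result.
chained = df.loc["Store 1"]["Cost"]

same = single.equals(chained)

# Assignment is the case where it matters: a single .loc call writes to df.
df.loc["Store 1", "Cost"] = 0.0
```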

In [96]:
df.loc[:,['Name', 'Cost']]

| | Name | Cost |
|---|---|---|
| Store 1 | Chris | 22.5 |
| Store 1 | Kevyn | 2.5 |
| Store 2 | Vinod | 5.0 |


In [97]:
df.drop('Store 1')

| | Name | Item Purchased | Cost |
|---|---|---|---|
| Store 2 | Vinod | Bird Seed | 5.0 |


In [99]:
df

| | Name | Item Purchased | Cost |
|---|---|---|---|
| Store 1 | Chris | Dog Food | 22.5 |
| Store 1 | Kevyn | Kitty Litter | 2.5 |
| Store 2 | Vinod | Bird Seed | 5.0 |


In [101]:
df.where(df['Cost']>2.5).dropna()

| | Name | Item Purchased | Cost |
|---|---|---|---|
| Store 1 | Chris | Dog Food | 22.5 |
| Store 2 | Vinod | Bird Seed | 5.0 |
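A boolean mask is a common alternative to `where(...).dropna()`. The two are not always interchangeable, as this sketch suggests: the frame is hypothetical and deliberately contains a missing name, which `dropna` removes even though that row passes the cost filter.

```python
import pandas as pd

# Hypothetical purchases; the second row has a genuinely missing name.
df = pd.DataFrame({"Name": ["Chris", None, "Vinod"],
                   "Cost": [22.5, 12.0, 5.0]},
                  index=["Store 1", "Store 1", "Store 2"])

# Boolean mask: keeps every row whose Cost exceeds 2.5.
masked = df[df["Cost"] > 2.5]

# where + dropna: also discards the row with the missing name, because
# dropna removes any row containing a NaN, whatever its origin.
filtered = df.where(df["Cost"] > 2.5).dropna()
```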


## Index Objects

pandas Index objects hold the axis labels and other metadata (like the axis name or names). Any array or other sequence of labels you use when constructing a Series or DataFrame is internally converted to an Index.

In [56]:
myobj = pd.Series([1,2,3], index=['a', 'b', 'c'])

In [57]:
myobj

a    1
b    2
c    3
dtype: int64

In [61]:
index = myobj.index

In [62]:
index

Index(['a', 'b', 'c'], dtype='object')

In [63]:
index[1:]

Index(['b', 'c'], dtype='object')

Index objects are immutable and thus can’t be modified by the user. Immutability makes it safer to share Index objects among data structures.
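A minimal sketch of what immutability means in practice: item assignment raises a `TypeError`, while methods such as `append` return a new Index and leave the original unchanged. The label `"d"` is arbitrary.

```python
import pandas as pd

index = pd.Index(["a", "b", "c"])

# Item assignment is rejected with a TypeError.
try:
    index[1] = "d"
    error = None
except TypeError as exc:
    error = exc

# Operations that look like modifications return a new Index instead.
extended = index.append(pd.Index(["d"]))
```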

In [66]:
labels = pd.Index(np.arange(3))

In [67]:
labels

Int64Index([0, 1, 2], dtype='int64')

In [68]:
mynewobj = pd.Series([1.5, -2.5, 0], index=labels)

In [69]:
mynewobj

0    1.5
1   -2.5
2    0.0
dtype: float64

In [70]:
mynewobj.index is labels

True

In addition to being array-like, an Index also behaves like a fixed-size set.

In [71]:
frame3.columns

Index(['Nevada', 'Ohio'], dtype='object')

In [72]:
frame3.index

Int64Index([2001, 2002, 2000], dtype='int64')

In [73]:
'Nevada' in frame3.columns

True

In [74]:
2001 in frame3.index

True
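Beyond membership tests, an Index also offers set-style methods. The two indexes below are made up for illustration.

```python
import pandas as pd

left = pd.Index([2000, 2001, 2002])
right = pd.Index([2001, 2002, 2003])

common = left.intersection(right)   # labels present in both
combined = left.union(right)        # labels present in either
only_left = left.difference(right)  # labels in left but not in right
```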

Unlike Python sets, a pandas Index can contain duplicate labels:

![alt text](https://i.ibb.co/6gNpTP9/Screenshot-2019-10-08-at-7-16-03-PM.png "Logo Title Text 1")
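For a runnable version of the duplicate-label behavior, here is a small sketch. The labels and values are hypothetical and are not taken from the screenshot.

```python
import pandas as pd

# Hypothetical labels with repeats.
dup_labels = pd.Index(["foo", "foo", "bar", "bar"])
values = pd.Series([1, 2, 3, 4], index=dup_labels)

unique = dup_labels.is_unique   # False, since labels repeat
foo_rows = values.loc["foo"]    # every entry labelled "foo"
```

Selecting a duplicated label returns all matching entries rather than a single value.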

In [26]:
frame2

| | year | state | pop | debt | concat |
|---|---|---|---|---|---|
| one | 2000 | Ohio | 1.5 | NaN | one 2000\ntwo 2001\nthree 2002\nf... |
| two | 2001 | Ohio | 1.7 | NaN | one 2000\ntwo 2001\nthree 2002\nf... |
| three | 2002 | Ohio | 3.6 | NaN | one 2000\ntwo 2001\nthree 2002\nf... |
| four | 2001 | Nevada | 2.4 | NaN | one 2000\ntwo 2001\nthree 2002\nf... |
| five | 2002 | Nevada | 2.9 | NaN | one 2000\ntwo 2001\nthree 2002\nf... |
| six | 2003 | Nevada | 3.2 | NaN | one 2000\ntwo 2001\nthree 2002\nf... |


In [33]:
frame2.rename(columns = {'concat':'Concat'})

| | year | state | pop | debt | Concat |
|---|---|---|---|---|---|
| one | 2000 | Ohio | 1.5 | NaN | one 2000\ntwo 2001\nthree 2002\nf... |
| two | 2001 | Ohio | 1.7 | NaN | one 2000\ntwo 2001\nthree 2002\nf... |
| three | 2002 | Ohio | 3.6 | NaN | one 2000\ntwo 2001\nthree 2002\nf... |
| four | 2001 | Nevada | 2.4 | NaN | one 2000\ntwo 2001\nthree 2002\nf... |
| five | 2002 | Nevada | 2.9 | NaN | one 2000\ntwo 2001\nthree 2002\nf... |
| six | 2003 | Nevada | 3.2 | NaN | one 2000\ntwo 2001\nthree 2002\nf... |


Assigning a full list of new names to the columns attribute also renames the columns:

frame.columns = [...]
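The two renaming approaches differ in whether they modify the frame. A small sketch, using a stand-in frame and made-up new names:

```python
import pandas as pd

frame = pd.DataFrame({"year": [2000, 2001], "state": ["Ohio", "Nevada"]})

# rename returns a new DataFrame and leaves the original untouched.
renamed = frame.rename(columns={"state": "State"})

# Assigning to .columns changes the frame in place; the list must
# name every column, in order.
frame.columns = ["Year", "State"]
```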

In [34]:
frame2.drop('concat', axis=1, inplace=True)

In [35]:
frame2

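| | year | state | pop | debt |
|---|---|---|---|---|
| one | 2000 | Ohio | 1.5 | NaN |
| two | 2001 | Ohio | 1.7 | NaN |
| three | 2002 | Ohio | 3.6 | NaN |
| four | 2001 | Nevada | 2.4 | NaN |
| five | 2002 | Nevada | 2.9 | NaN |
| six | 2003 | Nevada | 3.2 | NaN |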
