# Essential Functionality

Going through the fundamentals of interacting with the data stored in a Series or DataFrame.

In [1]:
import pandas as pd
import numpy as np
from pandas import Series, DataFrame

___
## Reindexing

Reindexing means creating a new object whose data conforms to a new index.

In [2]:
obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
obj

d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64

In [3]:
obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])  # 'e' is not in the original index, so its value is NaN
obj2

a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64
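A minimal sketch of what "a new object" means here, rebuilding `obj` from In [2] so it runs on its own. `reindex` leaves the original untouched, and labels left out of the new index are simply dropped from the result.

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

# Rearrange to a new index; 'e' is new, so it gets NaN
obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])

# A shorter index keeps only the listed labels, in the listed order
subset = obj.reindex(['c', 'a'])

print(obj.index.tolist())    # ['d', 'b', 'a', 'c'], the original is unchanged
print(subset.tolist())       # [3.6, -5.3]
print(obj2.isna().tolist())  # [False, False, False, False, True]
```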

In [4]:
obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])
obj3

0      blue
2    purple
4    yellow
dtype: object

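The *method* option fills in values while reindexing. For ordered data such as time series, `method='ffill'` forward-fills each missing label with the last valid value: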

In [5]:
obj3.reindex(range(6), method='ffill')

0      blue
1      blue
2    purple
3    purple
4    yellow
5    yellow
dtype: object
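For comparison, a sketch of the opposite direction and of the `limit` option (both are `reindex` parameters; the small numeric Series is made up for illustration). `'bfill'` pulls the next valid value backward, and `limit` caps how many consecutive new labels get filled.

```python
import pandas as pd

obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])

# Backward fill: a new label takes the next valid value; label 5 has none, so it stays NaN
back = obj3.reindex(range(6), method='bfill')
print(back.tolist())  # ['blue', 'purple', 'purple', 'yellow', 'yellow', nan]

# Forward fill at most one missing label after each existing value
nums = pd.Series([1.0, 2.0], index=[0, 4])
limited = nums.reindex(range(6), method='ffill', limit=1)
print(limited.tolist())  # [1.0, 1.0, nan, nan, 2.0, 2.0]
```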

*reindex* can alter the row index, the columns, or both. When passed only a sequence, it reindexes the rows; the columns can be reindexed with the `columns` keyword.

In [6]:
frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['Ohio', 'Texas', 'California'])

frame

|   | Ohio | Texas | California |
|---|---|---|---|
| a | 0 | 1 | 2 |
| c | 3 | 4 | 5 |
| d | 6 | 7 | 8 |


In [7]:
frame2 = frame.reindex(['a', 'b', 'c', 'd'])
frame2

|   | Ohio | Texas | California |
|---|---|---|---|
| a | 0.0 | 1.0 | 2.0 |
| b | NaN | NaN | NaN |
| c | 3.0 | 4.0 | 5.0 |
| d | 6.0 | 7.0 | 8.0 |


In [8]:
states = ['Texas', 'Utah', 'California']

frame.reindex(columns=states)

|   | Texas | Utah | California |
|---|---|---|---|
| a | 1 | NaN | 2 |
| c | 4 | NaN | 5 |
| d | 7 | NaN | 8 |
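
Rows and columns can also be reindexed in a single call by passing both `index` and `columns` as keywords. A sketch, rebuilding `frame` from In [6]; the `axis='columns'` form at the end is an equivalent way to target only the columns.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['Ohio', 'Texas', 'California'])

# Reindex both axes at once: 'b' and 'Utah' are new labels, filled with NaN
both = frame.reindex(index=['a', 'b', 'c', 'd'],
                     columns=['Texas', 'Utah', 'California'])
print(both)

# Same column selection as frame.reindex(columns=states)
states = ['Texas', 'Utah', 'California']
by_axis = frame.reindex(states, axis='columns')
print(by_axis.columns.tolist())  # ['Texas', 'Utah', 'California']
```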


### reindex function arguments

* **index:** New sequence to use as the index. Can be an Index instance or any other sequence-like Python data structure. An Index will be used exactly as is without any copying.
* **method:** Interpolation (fill) method; 'ffill' fills forward, while 'bfill' fills backward.
* **fill_value:** Substitute value to use when introducing missing data by reindexing.
* **limit:** When forward- or backfilling, maximum size gap (in number of elements) to fill.
* **tolerance:** When forward- or backfilling, maximum size gap (in absolute numeric distance) to fill for inexact matches.
* **level:** Broadcast across a level, matching a simple Index against the values of that level of a MultiIndex.
* **copy:** If True, always copy the underlying data even if the new index is equivalent to the old index; if False, do not copy the data when the indexes are equivalent.
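
A short sketch of `fill_value` and `tolerance` from the list above. The Series with a float index is invented for illustration; `tolerance` only takes effect together with a fill `method`, here `'nearest'`.

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

# fill_value replaces the NaN that a new label would otherwise get
filled = obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0)
print(filled['e'])  # 0.0

# Nearest-label matching, but only within a distance of 0.2
s = pd.Series([10, 20, 30], index=[0.0, 1.0, 2.0])
near = s.reindex([0.1, 0.9, 1.6], method='nearest', tolerance=0.2)
print(near.tolist())  # [10.0, 20.0, nan]
```

Label 1.6 stays missing because its nearest original label, 2.0, is 0.4 away, which exceeds the tolerance.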


___
## Dropping Entries from an Axis

The *drop* method removes entries from an axis. It returns a new object with the indicated values deleted and does not modify the original object.

In [9]:
obj = pd.Series(np.arange(5.), index=['a', 'b', 'c', 'd', 'e'])
obj

a    0.0
b    1.0
c    2.0
d    3.0
e    4.0
dtype: float64

In [10]:
new_obj = obj.drop('c')
new_obj

a    0.0
b    1.0
d    3.0
e    4.0
dtype: float64

In [11]:
obj.drop(['d','c'])

a    0.0
b    1.0
e    4.0
dtype: float64
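A sketch of two details of `drop` on a Series, using a fresh copy of `obj` from In [9]: the original keeps all of its labels, and dropping a label that does not exist raises `KeyError` unless `errors='ignore'` is passed.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(5.), index=['a', 'b', 'c', 'd', 'e'])
trimmed = obj.drop(['d', 'c'])
print(len(obj), len(trimmed))  # 5 3

# A missing label raises KeyError by default
try:
    obj.drop('z')
except KeyError:
    print("'z' is not in the index")

# errors='ignore' drops the labels that exist and skips the rest
safe = obj.drop(['c', 'z'], errors='ignore')
print(safe.index.tolist())  # ['a', 'b', 'd', 'e']
```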

With a DataFrame, index values can be deleted from either axis. Calling *drop* with a sequence of labels drops values from the row labels (axis 0) by default.

In [12]:
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])
data

|   | one | two | three | four |
|---|---|---|---|---|
| Ohio | 0 | 1 | 2 | 3 |
| Colorado | 4 | 5 | 6 | 7 |
| Utah | 8 | 9 | 10 | 11 |
| New York | 12 | 13 | 14 | 15 |


In [14]:
data.drop(['Colorado','Ohio'])

|   | one | two | three | four |
|---|---|---|---|---|
| Utah | 8 | 9 | 10 | 11 |
| New York | 12 | 13 | 14 | 15 |


To drop values from the columns instead, pass `axis=1` or `axis='columns'`.

In [15]:
data.drop('two', axis=1)

|   | one | three | four |
|---|---|---|---|
| Ohio | 0 | 2 | 3 |
| Colorado | 4 | 6 | 7 |
| Utah | 8 | 10 | 11 |
| New York | 12 | 14 | 15 |


In [16]:
data.drop(['two','four'], axis='columns')

|   | one | three |
|---|---|---|
| Ohio | 0 | 2 |
| Colorado | 4 | 6 |
| Utah | 8 | 10 |
| New York | 12 | 14 |
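
Instead of `axis`, `drop` also accepts `index=` and `columns=` keywords (available since pandas 0.21), which can read more clearly. A sketch on a fresh copy of `data`:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Equivalent to data.drop(['Colorado', 'Ohio'])
rows_dropped = data.drop(index=['Colorado', 'Ohio'])

# Equivalent to data.drop(['two', 'four'], axis='columns')
cols_dropped = data.drop(columns=['two', 'four'])

print(rows_dropped.index.tolist())    # ['Utah', 'New York']
print(cols_dropped.columns.tolist())  # ['one', 'three']
```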


Passing `inplace=True` modifies the original object instead of returning a new one, so the dropped data is discarded from it.

In [None]:
obj.drop('c', inplace=True)

In [19]:
obj

a    0.0
b    1.0
d    3.0
e    4.0
dtype: float64
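One consequence worth seeing: with `inplace=True`, `drop` returns `None`, so accidentally assigning its result loses the Series. A sketch; rebinding the name to the returned copy, shown second, is a common non-mutating alternative and a matter of style.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(5.), index=['a', 'b', 'c', 'd', 'e'])

# In-place drop mutates obj and returns None
result = obj.drop('c', inplace=True)
print(result)              # None
print(obj.index.tolist())  # ['a', 'b', 'd', 'e']

# Non-mutating alternative: rebind the name to the returned copy
obj2 = pd.Series(np.arange(5.), index=['a', 'b', 'c', 'd', 'e'])
obj2 = obj2.drop('c')
print(obj2.index.tolist())  # ['a', 'b', 'd', 'e']
```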

___
## Indexing, Selection, and Filtering

Series indexing (`obj[...]`) works analogously to NumPy array indexing, except that you can use the Series index values instead of only integers.

In [21]:
obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])
obj

a    0.0
b    1.0
c    2.0
d    3.0
dtype: float64

In [22]:
obj['b']

1.0

In [23]:
obj[1]

1.0
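
Because this index holds strings, `obj[1]` falls back to treating `1` as a position. Recent pandas versions (2.1 and later) deprecate that fallback for Series `[]` and emit a `FutureWarning`, so stating the intent with `iloc` (position) or `loc` (label) is safer. A sketch with the same `obj`:

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])

by_label = obj.loc['b']     # label-based
by_position = obj.iloc[1]   # position-based, no ambiguity
pair = obj.iloc[[1, 3]]     # positional replacement for obj[[1, 3]]

print(by_label, by_position)  # 1.0 1.0
print(pair.index.tolist())    # ['b', 'd']
```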

In [24]:
obj[2:4]

c    2.0
d    3.0
dtype: float64

In [25]:
obj[['b', 'a', 'd']]

b    1.0
a    0.0
d    3.0
dtype: float64

In [27]:
obj[[1,3]]

b    1.0
d    3.0
dtype: float64

In [28]:
obj[obj < 2]

a    0.0
b    1.0
dtype: float64

In [29]:
obj['b':'c']

b    1.0
c    2.0
dtype: float64

In [31]:
obj['b':'c'] = 5
obj

a    0.0
b    5.0
c    5.0
d    3.0
dtype: float64

Indexing into a DataFrame with `[]` retrieves one or more columns, using either a single label or a list of labels. As special cases, a slice or a boolean array selects rows instead.

In [32]:
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])
data

|   | one | two | three | four |
|---|---|---|---|---|
| Ohio | 0 | 1 | 2 | 3 |
| Colorado | 4 | 5 | 6 | 7 |
| Utah | 8 | 9 | 10 | 11 |
| New York | 12 | 13 | 14 | 15 |


In [33]:
data['two']

Ohio         1
Colorado     5
Utah         9
New York    13
Name: two, dtype: int32

In [34]:
data[['three','one']]

|   | three | one |
|---|---|---|
| Ohio | 2 | 0 |
| Colorado | 6 | 4 |
| Utah | 10 | 8 |
| New York | 14 | 12 |


In [35]:
data[:2]

|   | one | two | three | four |
|---|---|---|---|---|
| Ohio | 0 | 1 | 2 | 3 |
| Colorado | 4 | 5 | 6 | 7 |


In [36]:
data[data['three'] > 5]

|   | one | two | three | four |
|---|---|---|---|---|
| Colorado | 4 | 5 | 6 | 7 |
| Utah | 8 | 9 | 10 | 11 |
| New York | 12 | 13 | 14 | 15 |
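
The return type depends on what is passed: a single label gives a Series, while a list (even of one label) gives a DataFrame. A sketch, rebuilding `data` from In [32]:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

col = data['two']                # single label -> Series
cols = data[['two']]             # list of labels -> DataFrame with one column
top = data[:2]                   # slice -> rows
big = data[data['three'] > 5]    # boolean Series -> rows

print(type(col).__name__, type(cols).__name__)  # Series DataFrame
print(top.index.tolist())  # ['Ohio', 'Colorado']
print(big.index.tolist())  # ['Colorado', 'Utah', 'New York']
```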


You can also index with a boolean DataFrame, such as one produced by a scalar comparison; assigning through it sets every matching value.

In [37]:
data < 5

|   | one | two | three | four |
|---|---|---|---|---|
| Ohio | True | True | True | True |
| Colorado | True | False | False | False |
| Utah | False | False | False | False |
| New York | False | False | False | False |


In [38]:
data[data < 5] = 0
data

|   | one | two | three | four |
|---|---|---|---|---|
| Ohio | 0 | 0 | 0 | 0 |
| Colorado | 0 | 5 | 6 | 7 |
| Utah | 8 | 9 | 10 | 11 |
| New York | 12 | 13 | 14 | 15 |
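
Assigning through the boolean DataFrame changes `data` itself. If the original should be kept, `DataFrame.mask` returns a modified copy instead; note this is a substitute technique, not what the cell above does. A sketch on a fresh `data`:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Replace values where the condition is True; data itself is untouched
zeroed = data.mask(data < 5, other=0)

print(zeroed.loc['Colorado'].tolist())  # [0, 5, 6, 7]
print(data.loc['Colorado'].tolist())    # [4, 5, 6, 7]
```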


### Selecting with loc and iloc

The special indexing operators *loc* and *iloc* select a subset of rows and columns using NumPy-like notation: *loc* selects by axis labels, while *iloc* selects by integer position.

In [39]:
data.loc['Colorado', ['two','three']]

two      5
three    6
Name: Colorado, dtype: int32

In [41]:
data.iloc[2, [3, 0, 1]] # selecting by position with integers

four    11
one      8
two      9
Name: Utah, dtype: int32

In [42]:
data.iloc[2]

one       8
two       9
three    10
four     11
Name: Utah, dtype: int32

In [43]:
data.iloc[[1,2], [3, 0, 1]]

|   | four | one | two |
|---|---|---|---|
| Colorado | 7 | 0 | 5 |
| Utah | 11 | 8 | 9 |
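
The same cells can be reached either way, by label with *loc* or by position with *iloc*. A sketch comparing the two on a fresh copy of `data`:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

by_label = data.loc[['Colorado', 'Utah'], ['four', 'one', 'two']]
by_position = data.iloc[[1, 2], [3, 0, 1]]
print(by_label.equals(by_position))  # True

# A single row: label 'Utah' sits at position 2
print(data.loc['Utah'].equals(data.iloc[2]))  # True
```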


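Both indexers work with slices as well as single labels and lists. Note that a label slice with *loc* includes its endpoint, unlike a positional slice.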

In [44]:
data.loc[:'Utah','two']

Ohio        0
Colorado    5
Utah        9
Name: two, dtype: int32

In [45]:
data.iloc[:,:3][data.three > 5]

|   | one | two | three |
|---|---|---|---|
| Colorado | 0 | 5 | 6 |
| Utah | 8 | 9 | 10 |
| New York | 12 | 13 | 14 |
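
A sketch of the endpoint difference, plus a single *loc* call that applies the boolean filter and the column slice together instead of chaining two indexing steps as In [45] does. A fresh `data` is used, so the values differ from the zeroed one above.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

label_slice = data.loc[:'Utah', 'two']  # includes 'Utah'
pos_slice = data.iloc[:2, 1]            # stops before position 2
print(label_slice.index.tolist())  # ['Ohio', 'Colorado', 'Utah']
print(pos_slice.index.tolist())    # ['Ohio', 'Colorado']

# Rows by boolean mask, columns by label slice, in one step
subset = data.loc[data['three'] > 5, :'three']
print(subset.equals(data.iloc[:, :3][data['three'] > 5]))  # True
```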


### Indexing options with DataFrame

* **df[val]:** Select a single column or a sequence of columns from the DataFrame. Special-case conveniences: a boolean array (filter rows), a slice (slice rows), or a boolean DataFrame (set values based on some criterion)
* **df.loc[val]:** Select a single row or a subset of rows from the DataFrame by label
* **df.loc[:, val]:** Select a single column or a subset of columns by label
* **df.loc[val1, val2]:** Select both rows and columns by label
* **df.iloc[where]:** Select a single row or a subset of rows from the DataFrame by integer position
* **df.iloc[:, where]:** Select a single column or a subset of columns by integer position
* **df.iloc[where_i, where_j]:** Select both rows and columns by integer position
* **df.at[label_i, label_j]:** Select a single scalar value by row and column label
* **df.iat[i, j]:** Select a single scalar value by row and column position (integers)
* **reindex method:** Select either rows or columns by labels
* **get_value, set_value methods:** Select a single value by row and column label. These were deprecated in pandas 0.21 and removed in 1.0; use `at` and `iat` instead
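
For single scalar values, `at` and `iat` are the dedicated accessors from the list above, and `at` also supports assignment. A sketch on a fresh `data`:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

by_label = data.at['Utah', 'three']  # row label, column label
by_position = data.iat[2, 2]         # row position, column position
print(by_label, by_position)         # 10 10

# Set a single value by label
data.at['Utah', 'three'] = 100
print(data.loc['Utah'].tolist())     # [8, 9, 100, 11]
```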

___
## Integer Indexes