## Manipulating DataFrame contents

During data analysis, we often need to manipulate the contents of a DataFrame. DataFrame objects support several ways of doing this. Some common operations are:

* Adding/deleting row(s)
* Adding/deleting column(s)

First we create a DataFrame.

In [2]:
import pandas as pd
x = dict(a = 1.2, b = 1.7, c = 1.3, d = 1.6)
y = dict(a = 12.5, b = 12.7, c = 11.9, d = 13.1)
df = pd.DataFrame({'X':x, 'Y':y})
df

|   | X | Y |
|---|---|---|
| a | 1.2 | 12.5 |
| b | 1.7 | 12.7 |
| c | 1.3 | 11.9 |
| d | 1.6 | 13.1 |


### Adding a column

A new column can easily be added to a DataFrame by assignment, as shown below.

#### Adding a list as a column

In [3]:
df['Z'] = [5, 6, 7, 8]
df

|   | X | Y | Z |
|---|---|---|---|
| a | 1.2 | 12.5 | 5 |
| b | 1.7 | 12.7 | 6 |
| c | 1.3 | 11.9 | 7 |
| d | 1.6 | 13.1 | 8 |
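
A list is assigned position by position, so its length has to match the number of rows. The following is a minimal sketch of that rule; the frame `demo` and the column name `Bad` are made up for illustration and are not part of the example above.

```python
import pandas as pd

# Small illustrative frame with the same four-row index used in this section
demo = pd.DataFrame({'X': [1.2, 1.7, 1.3, 1.6]}, index=['a', 'b', 'c', 'd'])

# A list with one value per row is accepted
demo['Z'] = [5, 6, 7, 8]

# A list of the wrong length is rejected with a ValueError
try:
    demo['Bad'] = [1, 2, 3]
    length_error = None
except ValueError as exc:
    length_error = exc

print(demo)
print(type(length_error).__name__)
```

The failed assignment leaves `demo` untouched, so only the columns `X` and `Z` are present afterwards.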


#### Adding a numpy array as a column

In [4]:
import numpy as np
df['U'] = np.array([15, 16, 17, 18])
df

|   | X | Y | Z | U |
|---|---|---|---|---|
| a | 1.2 | 12.5 | 5 | 15 |
| b | 1.7 | 12.7 | 6 | 16 |
| c | 1.3 | 11.9 | 7 | 17 |
| d | 1.6 | 13.1 | 8 | 18 |


#### Adding a `Series` as a column

In [5]:
df['V'] = pd.Series({'a': 31.6, 'b': 37.2, 'c':32.1, 'd':35.5})
df

|   | X | Y | Z | U | V |
|---|---|---|---|---|---|
| a | 1.2 | 12.5 | 5 | 15 | 31.6 |
| b | 1.7 | 12.7 | 6 | 16 | 37.2 |
| c | 1.3 | 11.9 | 7 | 17 | 32.1 |
| d | 1.6 | 13.1 | 8 | 18 | 35.5 |


Note that when a `Series` is added as a column, its values are matched to the DataFrame's rows by index label, not by position. This becomes clearer in the following example.

In [6]:
df['W'] = pd.Series({'b': 316, 'c': 372, 'd':321, 'e':355})
df

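|   | X | Y | Z | U | V | W |
|---|---|---|---|---|---|---|
| a | 1.2 | 12.5 | 5 | 15 | 31.6 | NaN |
| b | 1.7 | 12.7 | 6 | 16 | 37.2 | 316.0 |
| c | 1.3 | 11.9 | 7 | 17 | 32.1 | 372.0 |
| d | 1.6 | 13.1 | 8 | 18 | 35.5 | 321.0 |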


Note the missing value denoted by `NaN` in column W at index 'a'. Also note that the value at index 'e' in the input data is ignored.
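
To make the label matching more visible, here is a small sketch in which the Series is deliberately given in a shuffled order. The frame `demo` and the Series `s` are illustrative names, not objects from the cells above.

```python
import pandas as pd

demo = pd.DataFrame({'X': [1.2, 1.7, 1.3, 1.6]}, index=['a', 'b', 'c', 'd'])

# Shuffled order, no entry for 'a', and an extra entry for 'e'
s = pd.Series({'d': 321, 'b': 316, 'e': 355, 'c': 372})

# Values are placed by index label, not by their position in the Series
demo['W'] = s

print(demo)
```

Row `a` receives `NaN`, the entry for `e` is dropped, and the remaining values land on their matching labels regardless of the order in which they were written.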

Assigning a plain dictionary does not work as one might expect. In the output below, column A holds the dictionary's keys rather than its values.

In [7]:
df['A'] = {'a': 3.16, 'b': 3.72, 'c':3.21, 'd':3.55}
df

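|   | X | Y | Z | U | V | W | A |
|---|---|---|---|---|---|---|---|
| a | 1.2 | 12.5 | 5 | 15 | 31.6 | NaN | a |
| b | 1.7 | 12.7 | 6 | 16 | 37.2 | 316.0 | b |
| c | 1.3 | 11.9 | 7 | 17 | 32.1 | 372.0 | c |
| d | 1.6 | 13.1 | 8 | 18 | 35.5 | 321.0 | d |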
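
How a bare dictionary is treated may depend on the pandas version in use, so it is safer not to rely on it. A sketch of two explicit alternatives follows; `demo`, `values` and the column name `A2` are illustrative names.

```python
import pandas as pd

demo = pd.DataFrame({'X': [1.2, 1.7, 1.3, 1.6]}, index=['a', 'b', 'c', 'd'])
values = {'a': 3.16, 'b': 3.72, 'c': 3.21, 'd': 3.55}

# Option 1: wrap the dict in a Series so its keys are matched to the row labels
demo['A'] = pd.Series(values)

# Option 2: look up each row label in the dict and assign the resulting list
demo['A2'] = [values[label] for label in demo.index]

print(demo)
```

Option 1 tolerates missing or extra keys (they become `NaN` or are dropped), while Option 2 raises a `KeyError` if a row label is absent from the dictionary.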


A new column can also be added with `df.loc[:, <column name>]` instead of `df[<column name>]`, as shown below.

In [9]:
df.loc[:,'B']=25
df

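|   | X | Y | Z | U | V | W | A | B |
|---|---|---|---|---|---|---|---|---|
| a | 1.2 | 12.5 | 5 | 15 | 31.6 | NaN | a | 25 |
| b | 1.7 | 12.7 | 6 | 16 | 37.2 | 316.0 | b | 25 |
| c | 1.3 | 11.9 | 7 | 17 | 32.1 | 372.0 | c | 25 |
| d | 1.6 | 13.1 | 8 | 18 | 35.5 | 321.0 | d | 25 |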


Note, however, that the attribute syntax does not add a column. Instead, it sets an ordinary Python attribute on the DataFrame object, and pandas issues a `UserWarning` when this happens.

In [10]:
df.C = [11, 12, 13, 14]
df


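|   | X | Y | Z | U | V | W | A | B |
|---|---|---|---|---|---|---|---|---|
| a | 1.2 | 12.5 | 5 | 15 | 31.6 | NaN | a | 25 |
| b | 1.7 | 12.7 | 6 | 16 | 37.2 | 316.0 | b | 25 |
| c | 1.3 | 11.9 | 7 | 17 | 32.1 | 372.0 | c | 25 |
| d | 1.6 | 13.1 | 8 | 18 | 35.5 | 321.0 | d | 25 |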


The newly added attribute can be seen below.

In [11]:
df.C

[11, 12, 13, 14]
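
The difference between the two syntaxes can be checked directly against `columns`. This is a small sketch on an illustrative frame `demo`; the warning filter is only there to keep the output quiet.

```python
import warnings
import pandas as pd

demo = pd.DataFrame({'X': [1.2, 1.7]}, index=['a', 'b'])

with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # silence the warning pandas emits here
    demo.C = [11, 12]                # sets a plain Python attribute, not a column

demo['D'] = [21, 22]                 # bracket syntax creates a real column

print('C' in demo.columns)
print('D' in demo.columns)
print(demo.D.tolist())               # attribute access can still read an existing column
```

In short, attribute access is fine for reading a column that already exists, but new columns should be created with `df[...]` or `df.loc[:, ...]`.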

### Adding rows

New rows can be added in the same way, by assigning to `df.loc[<row label>]`.

#### Adding a list

In [12]:
df.loc['e'] = [1, 2, 3, 4, 5, 6, 7, 8]
df

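|   | X | Y | Z | U | V | W | A | B |
|---|---|---|---|---|---|---|---|---|
| a | 1.2 | 12.5 | 5 | 15 | 31.6 | NaN | a | 25 |
| b | 1.7 | 12.7 | 6 | 16 | 37.2 | 316.0 | b | 25 |
| c | 1.3 | 11.9 | 7 | 17 | 32.1 | 372.0 | c | 25 |
| d | 1.6 | 13.1 | 8 | 18 | 35.5 | 321.0 | d | 25 |
| e | 1.0 | 2.0 | 3 | 4 | 5.0 | 6.0 | 7 | 8 |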
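
Two details of row assignment with a list are worth trying out: the list needs one value per column, and an existing label is overwritten rather than duplicated. The sketch below uses an illustrative two-column frame `demo`.

```python
import pandas as pd

demo = pd.DataFrame({'X': [1.2, 1.7], 'Y': [12.5, 12.7]}, index=['a', 'b'])

# New label: a row is appended (one value per column)
demo.loc['c'] = [1.3, 11.9]

# Existing label: the row is overwritten, no new row is created
demo.loc['a'] = [9.9, 99.9]

# Wrong number of values for a new row: rejected with a ValueError
try:
    demo.loc['d'] = [1.6, 13.1, 0.0]
    row_error = None
except ValueError as exc:
    row_error = exc

print(demo)
print(type(row_error).__name__)
```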


#### Adding a Series

In [13]:
newRow = pd.Series([5, 15, 20, 25, 30], index = ['X', 'Y', 'Z', 'U', 'W'])
df.loc['f'] = newRow
df

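|   | X | Y | Z | U | V | W | A | B |
|---|---|---|---|---|---|---|---|---|
| a | 1.2 | 12.5 | 5.0 | 15.0 | 31.6 | NaN | a | 25.0 |
| b | 1.7 | 12.7 | 6.0 | 16.0 | 37.2 | 316.0 | b | 25.0 |
| c | 1.3 | 11.9 | 7.0 | 17.0 | 32.1 | 372.0 | c | 25.0 |
| d | 1.6 | 13.1 | 8.0 | 18.0 | 35.5 | 321.0 | d | 25.0 |
| e | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 | 6.0 | 7 | 8.0 |
| f | 5.0 | 15.0 | 20.0 | 25.0 | NaN | 30.0 | NaN | NaN |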
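
In the output above, the integer columns are now displayed as floats (for example `25` became `25.0`). A plausible reading is that the missing entries in the new row force a dtype change, since `NaN` cannot be stored in an integer column. The sketch below, on an illustrative frame `demo`, lets us inspect this.

```python
import pandas as pd

demo = pd.DataFrame({'X': [1.2, 1.7], 'Z': [5, 6]}, index=['a', 'b'])
dtypes_before = demo.dtypes.copy()

# The new row supplies a value for 'X' only; 'Z' is left missing
demo.loc['c'] = pd.Series({'X': 5.0})

print(dtypes_before)
print(demo.dtypes)
print(demo)
```

Comparing the two `dtypes` printouts shows which columns changed type after the row was added.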


***Remark***

It is important to note that the `iloc` indexer cannot be used to add a row or column to a DataFrame: it only addresses positions that already exist.

In [None]:
# newRow = pd.Series([50, 150, 200, 250, 300], index = ['X', 'Y', 'Z', 'A', 'B'])
# df.iloc[6] = newRow  # This doesn't work: df has six rows (positions 0 to 5), so position 6 does not exist
# df
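
The restriction can be observed safely by catching the exception. This is a minimal sketch on an illustrative one-column frame `demo`, contrasting `iloc` with `loc`.

```python
import pandas as pd

demo = pd.DataFrame({'X': [1.2, 1.7]}, index=['a', 'b'])

# Position len(demo) is one past the last row, so iloc refuses to create it
try:
    demo.iloc[len(demo)] = [1.3]
    iloc_error = None
except IndexError as exc:
    iloc_error = exc

demo.iloc[1] = [9.9]   # an existing position can be overwritten
demo.loc['c'] = [1.3]  # loc with a new label appends a row

print(type(iloc_error).__name__)
print(demo)
```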

### *Home Work*

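Explore the functions `append` and `assign` for adding rows and columns to a DataFrame, and understand how they differ from the approaches discussed above. Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0; `pd.concat` is its replacement.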

### Deleting a column

An existing column can be deleted from a DataFrame using the `drop` method, as shown below.

In [14]:
df.drop(columns = ['B'])

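|   | X | Y | Z | U | V | W | A |
|---|---|---|---|---|---|---|---|
| a | 1.2 | 12.5 | 5.0 | 15.0 | 31.6 | NaN | a |
| b | 1.7 | 12.7 | 6.0 | 16.0 | 37.2 | 316.0 | b |
| c | 1.3 | 11.9 | 7.0 | 17.0 | 32.1 | 372.0 | c |
| d | 1.6 | 13.1 | 8.0 | 18.0 | 35.5 | 321.0 | d |
| e | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 | 6.0 | 7 |
| f | 5.0 | 15.0 | 20.0 | 25.0 | NaN | 30.0 | NaN |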


Note, however, that `drop` has only returned a new DataFrame with the specified column removed.

The original DataFrame remains unchanged.

In [15]:
df

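|   | X | Y | Z | U | V | W | A | B |
|---|---|---|---|---|---|---|---|---|
| a | 1.2 | 12.5 | 5.0 | 15.0 | 31.6 | NaN | a | 25.0 |
| b | 1.7 | 12.7 | 6.0 | 16.0 | 37.2 | 316.0 | b | 25.0 |
| c | 1.3 | 11.9 | 7.0 | 17.0 | 32.1 | 372.0 | c | 25.0 |
| d | 1.6 | 13.1 | 8.0 | 18.0 | 35.5 | 321.0 | d | 25.0 |
| e | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 | 6.0 | 7 | 8.0 |
| f | 5.0 | 15.0 | 20.0 | 25.0 | NaN | 30.0 | NaN | NaN |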

To remove the column from the original DataFrame itself, pass `inplace = True`.


In [16]:
df.drop(columns = ['B'], inplace = True)
df

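|   | X | Y | Z | U | V | W | A |
|---|---|---|---|---|---|---|---|
| a | 1.2 | 12.5 | 5.0 | 15.0 | 31.6 | NaN | a |
| b | 1.7 | 12.7 | 6.0 | 16.0 | 37.2 | 316.0 | b |
| c | 1.3 | 11.9 | 7.0 | 17.0 | 32.1 | 372.0 | c |
| d | 1.6 | 13.1 | 8.0 | 18.0 | 35.5 | 321.0 | d |
| e | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 | 6.0 | 7 |
| f | 5.0 | 15.0 | 20.0 | 25.0 | NaN | 30.0 | NaN |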
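
An alternative to `inplace = True` is to assign the result of `drop` back to a name. The sketch below compares the two on illustrative frames (`demo`, `reassigned` and `in_place` are made-up names); note that with `inplace = True` the method returns `None`.

```python
import pandas as pd

demo = pd.DataFrame({'X': [1.2, 1.7], 'B': [25, 25]}, index=['a', 'b'])

# Default behaviour: a new DataFrame is returned, demo keeps column 'B'
trimmed = demo.drop(columns=['B'])
still_has_b = 'B' in demo.columns

# Option 1: rebind the name to the returned DataFrame
reassigned = demo.copy()
reassigned = reassigned.drop(columns=['B'])

# Option 2: modify the object itself; the return value is None
in_place = demo.copy()
returned = in_place.drop(columns=['B'], inplace=True)

print(still_has_b)
print(list(reassigned.columns), list(in_place.columns), returned)
```

Because the in-place call returns `None`, writing `df = df.drop(..., inplace = True)` would replace the DataFrame with `None`, which is a common slip.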


Multiple columns can be deleted by providing the list of columns to be deleted.

In [17]:
df.drop(columns = ['Z', 'V', 'A'])

|   | X | Y | U | W |
|---|---|---|---|---|
| a | 1.2 | 12.5 | 15.0 | NaN |
| b | 1.7 | 12.7 | 16.0 | 316.0 |
| c | 1.3 | 11.9 | 17.0 | 372.0 |
| d | 1.6 | 13.1 | 18.0 | 321.0 |
| e | 1.0 | 2.0 | 4.0 | 6.0 |
| f | 5.0 | 15.0 | 25.0 | 30.0 |
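
When a list of columns is passed, every name in it must exist, otherwise `drop` raises a `KeyError`. The `errors` parameter of `drop` relaxes this. A small sketch, in which the frame `demo` and the column name `missing` are made up:

```python
import pandas as pd

demo = pd.DataFrame(
    {'X': [1.2, 1.7], 'Z': [5, 6], 'V': [31.6, 37.2]},
    index=['a', 'b'],
)

# 'missing' is not a column, so the default behaviour is to raise KeyError
try:
    demo.drop(columns=['Z', 'missing'])
    drop_error = None
except KeyError as exc:
    drop_error = exc

# errors='ignore' drops the names that exist and skips the rest
lenient = demo.drop(columns=['Z', 'missing'], errors='ignore')

print(type(drop_error).__name__)
print(list(lenient.columns))
```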


### Deleting rows

Similarly, rows can be dropped by passing their labels to the `index` argument, as shown below.

In [18]:
df.drop(index = ['a','f'])

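|   | X | Y | Z | U | V | W | A |
|---|---|---|---|---|---|---|---|
| b | 1.7 | 12.7 | 6.0 | 16.0 | 37.2 | 316.0 | b |
| c | 1.3 | 11.9 | 7.0 | 17.0 | 32.1 | 372.0 | c |
| d | 1.6 | 13.1 | 8.0 | 18.0 | 35.5 | 321.0 | d |
| e | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 | 6.0 | 7 |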
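
Row labels can also be passed positionally together with `axis=0`, or computed from a condition before being handed to `drop`. The following sketch uses an illustrative frame `demo` and an arbitrary threshold of 1.5.

```python
import pandas as pd

demo = pd.DataFrame({'X': [1.2, 1.7, 1.3, 1.6]}, index=['a', 'b', 'c', 'd'])

# Two equivalent ways of naming the rows to drop
by_label = demo.drop(index=['a', 'd'])
same_result = demo.drop(['a', 'd'], axis=0)

# Collect the labels of rows that meet a condition, then drop them
to_remove = demo.index[demo['X'] > 1.5]
by_condition = demo.drop(index=to_remove)

print(by_label)
print(by_condition)
```

As with columns, `demo` itself is unchanged here because `inplace` was not set.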
