# 3.4 DataFrame 
`DataFrame` is a data structure provided by pandas to store 2-dimensional labeled data that can be conceptually viewed as a table with rows and columns.

- **From `dict` of lists**

A `DataFrame` can be created from a `dict` of lists. The keys of the `dict` become the column labels of the `DataFrame`. All lists must have the same length.

In [1]:
import pandas as pd

d = {
    "A": [1,2,3,4,5,6],
    "B": ['a','b','c','d','e','f']
}
df = pd.DataFrame(d)
print(df)

   A  B
0  1  a
1  2  b
2  3  c
3  4  d
4  5  e
5  6  f


By default, each row is labelled with its integer position. Custom row labels can be set by passing the `index` argument to `DataFrame`.

In [2]:
import pandas as pd

d = {
    "A": [1,2,3,4,5,6],
    "B": ['a','b','c','d','e','f']
}
df = pd.DataFrame(d, index=['I','II','III','IV','V','VI'])
print(df)

     A  B
I    1  a
II   2  b
III  3  c
IV   4  d
V    5  e
VI   6  f


A `dict` of NumPy `ndarray` objects can be used with the same syntax to create a `DataFrame`.



In [3]:
import pandas as pd
import numpy as np
d = {
    "A": np.array([1,2,3,4,5,6]),
    "B": np.array(['a','b','c','d','e','f'])
}
df = pd.DataFrame(d, index=['I','II','III','IV','V','VI'])
print(df)

     A  B
I    1  a
II   2  b
III  3  c
IV   4  d
V    5  e
VI   6  f


- **From `dict` of Series**
  
A `DataFrame` can also be created from a `dict` of `Series`. The main difference between a `Series` and a `list` is that a `Series` is labelled: each item of a `Series` carries an index label. Therefore, when we pass a `dict` of `Series` to create a `DataFrame`, the items with the same label are placed in the same row. The row labels come from the labels of the `Series`, whereas the column labels come from the keys of the `dict`.

In [4]:
import pandas as pd
d = {
    "A": pd.Series([1,2,3,4,5,6], index=['I','II','III','IV','V','VI']),
    "B": pd.Series(['a','b','c','d','e','f'], index=['I','II','III','IV','V','VII'])
}
df = pd.DataFrame(d)
print(df)

       A    B
I    1.0    a
II   2.0    b
III  3.0    c
IV   4.0    d
V    5.0    e
VI   6.0  NaN
VII  NaN    f


If one `Series` contains a label that does not exist in the other `Series`, the missing value is filled with `NaN` (not-a-number). Note that in this case we did not pass the `index` argument to `DataFrame`, so the union of the labels from all the `Series` is used as the row labels of the `DataFrame`. This can be observed in the previous code snippet: `d["A"]` has the label `VI` but not `VII`, whereas `d["B"]` has `VII` but not `VI`. The resulting `df` has row labels `I`, `II`, `III`, `IV`, `V`, `VI`, `VII`, with `NaN` in column `A` row `VII` and in column `B` row `VI`. Because `NaN` is a floating-point value, column `A` is displayed as floats (`1.0`, `2.0`, ...).

If the `index` argument is passed, the created `DataFrame` contains only the labels specified by it.

In [5]:
import pandas as pd
d = {
    "A": pd.Series([1,2,3,4,5,6], index=['I','II','III','IV','V','VI']),
    "B": pd.Series(['a','b','c','d','e','f'], index=['I','II','III','IV','V','VII'])
}
df = pd.DataFrame(d, index=['I','II','IV'])
print(df)

    A  B
I   1  a
II  2  b
IV  4  d


- **From `list` of `dict`**

A `DataFrame` can be created from a `list` of `dict`. Each `dict` is treated as one row of the `DataFrame`, and the keys of each `dict` are the column labels of the `DataFrame`.

The labels of the rows can be specified with the index argument in `DataFrame`.

In [6]:
import pandas as pd
d = [
    {'a':1, 'b':2, 'c':3},
    {'a':5, 'b':4}
]
df = pd.DataFrame(d)
print(df)

   a  b    c
0  1  2  3.0
1  5  4  NaN
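
The cell above does not use the `index` argument mentioned earlier. As a minimal sketch, the same `list` of `dict` can be given explicit row labels; the labels `'first'` and `'second'` are illustrative and not part of the original example.

```python
import pandas as pd

rows = [
    {'a': 1, 'b': 2, 'c': 3},
    {'a': 5, 'b': 4},          # no 'c' key, so this cell becomes NaN
]

# One row label per dict; 'first' and 'second' are hypothetical labels.
df = pd.DataFrame(rows, index=['first', 'second'])
print(df)

# Rows can now be looked up by label instead of by integer position.
print(df.loc['second', 'a'])
```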


- **From `dict` of `dict`**

`DataFrame` can also be created with a `dict` of `dict`. The keys of the external `dict` are used for the labels of the column (just like in the `dict` of `lists`) and the keys of the internal `dict` are used for the labels of the rows (just like in the `list` of `dict`).

In [7]:
import pandas as pd
d = {
    'a': { 'x': 1, 'y': 2 },
    'b': { 'x': 3, 'y': 4 },
    'c': { 'x': 5, 'y': 6 },
    'd': { 'x': 7, 'y': 8 }
}
df = pd.DataFrame(d)
print(df)

   a  b  c  d
x  1  3  5  7
y  2  4  6  8


- **MultiIndexed DataFrame**

A MultiIndexed `DataFrame` is a `DataFrame` with multiple levels of indexing. For example, the column labels can be `a`, `b`, `c` under the group `A`, and also `a`, `b`, `c` under the group `B`. An example makes this clearer.

In [8]:
import pandas as pd
d = {
    ('A', 'a'): { ('X', 'x'): 1, ('X', 'y'): 2 },
    ('A', 'b'): { ('X', 'x'): 3, ('X', 'y'): 4 },
    ('A', 'c'): { ('X', 'x'): 5, ('X', 'y'): 6 },
    ('B', 'a'): { ('X', 'z'): 7, ('Y', 'x'): 8 }
}
df = pd.DataFrame(d)
print(df)

       A              B
       a    b    c    a
X x  1.0  3.0  5.0  NaN
  y  2.0  4.0  6.0  NaN
  z  NaN  NaN  NaN  7.0
Y x  NaN  NaN  NaN  8.0


As the `DataFrame` is created with a `dict` of `dict`, the keys of the external dict are used as the columns whereas the keys of the internal `dict` are used as the rows.
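
To read values back out of such a `DataFrame`, tuples can be used as labels. The sketch below reuses the `dict` from the cell above; the variable names `col`, `group_a`, and `cell` are only illustrative.

```python
import pandas as pd

d = {
    ('A', 'a'): {('X', 'x'): 1, ('X', 'y'): 2},
    ('A', 'b'): {('X', 'x'): 3, ('X', 'y'): 4},
    ('A', 'c'): {('X', 'x'): 5, ('X', 'y'): 6},
    ('B', 'a'): {('X', 'z'): 7, ('Y', 'x'): 8},
}
df = pd.DataFrame(d)

# A full tuple selects a single column, returned as a Series.
col = df[('A', 'a')]
print(col)

# Only the outer level selects every column in that group.
group_a = df['A']
print(group_a)

# .loc takes a row tuple and a column tuple to reach one cell.
cell = df.loc[('X', 'y'), ('A', 'b')]
print(cell)
```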

## 3.4.1 Details of a DataFrame
When we have a huge `DataFrame`, i.e. one with many rows and many columns, printing the `DataFrame` does not show all the data. pandas therefore provides some methods and attributes on `DataFrame` to inspect its details.

**Inspect a sample of the DataFrame**

We can use `.head` and `.tail` to inspect the first and last few rows of the `DataFrame`. By default, calling either method without an argument gives the first 5 or the last 5 rows. The number of rows to show can be passed as the argument to the method.

In [9]:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    "a": np.random.random(500),
    "b": np.random.random(500)
})
print(df)
print(df.head())
print(df.tail(3))

            a         b
0    0.960176  0.434210
1    0.444111  0.274507
2    0.831334  0.395826
3    0.970086  0.506639
4    0.789747  0.902416
..        ...       ...
495  0.071509  0.860431
496  0.402562  0.623391
497  0.595817  0.083011
498  0.416127  0.733303
499  0.813301  0.617727

[500 rows x 2 columns]
          a         b
0  0.960176  0.434210
1  0.444111  0.274507
2  0.831334  0.395826
3  0.970086  0.506639
4  0.789747  0.902416
            a         b
497  0.595817  0.083011
498  0.416127  0.733303
499  0.813301  0.617727


**Metadata**

`pandas` provides a number of attributes that describe a `DataFrame`. These include

- `.shape`: the shape of the `DataFrame` in (row, column),
- `.size`: the total number of elements in the `DataFrame`,
- `.columns`: the column names of the `DataFrame`,
- `.index`: the row labels of the `DataFrame`,
- `.dtypes`: the data types of each column.

In [10]:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    "a": np.random.random(500),
    "b": np.random.random(500)
})
print(df.shape)
print(df.size)
print(df.columns)
print(df.index)
print(df.dtypes)

(500, 2)
1000
Index(['a', 'b'], dtype='object')
RangeIndex(start=0, stop=500, step=1)
a    float64
b    float64
dtype: object


## 3.4.2 Columns manipulation
### 3.4.2.1 Access a column
In a `DataFrame`, columns can be accessed by treating the DataFrame as a `dict` and the column names as the keys of the `dict`.

Each column of the `DataFrame` is a `Series` object. 

In [None]:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'a': np.random.random(500),
    'b': np.random.random(500)
})
print(df['a'])
print(type(df['a']))

### 3.4.2.2 Add a new column
A new column can be added easily to a `DataFrame`. The syntax of adding a new column is similar to adding a list to a `dict`.

In [11]:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'a': np.random.random(500),
    'b': np.random.random(500)
})
print(df.columns)

df['c'] = np.random.random(500)
print(df.columns)

Index(['a', 'b'], dtype='object')
Index(['a', 'b', 'c'], dtype='object')


A new column can also be created as the result of an operation on existing columns.

In [12]:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'a': np.random.random(500),
    'b': np.random.random(500)
})
print(df.columns)

df['c'] = df['a'] + df['b']
print(df.columns)
print(df.head())

Index(['a', 'b'], dtype='object')
Index(['a', 'b', 'c'], dtype='object')
          a         b         c
0  0.103171  0.267172  0.370343
1  0.991207  0.501179  1.492385
2  0.150430  0.242248  0.392678
3  0.723826  0.715942  1.439768
4  0.555700  0.491119  1.046819



If we want to create a new column populated with the same value, we can directly assign the scalar value to the column.

In [13]:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'a': np.random.random(500),
    'b': np.random.random(500)
})
df['c'] = 5
print(df.head())

          a         b  c
0  0.621371  0.180713  5
1  0.453127  0.453354  5
2  0.507031  0.377923  5
3  0.505304  0.024074  5
4  0.196009  0.665105  5


There are also situations where we do not want to modify the `DataFrame` in place, but instead want a copy of the current `DataFrame` with an extra column. This can be achieved with `.assign(...)`. The name of the new column is passed as the keyword, and the values for the column as the value of that keyword. For example, to add a column `c` with random values, we would use `df.assign(c=np.random.random(500))`. This creates a copy of the current `DataFrame` and leaves the original `DataFrame` untouched. Modifying the new `DataFrame` does not affect the original `DataFrame` either.

In [14]:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'a': np.random.random(500),
    'b': np.random.random(500)
})
df2 = df.assign(c=np.random.random(500))
print(df.head())
print(df2.head())

          a         b
0  0.686095  0.388992
1  0.837058  0.561569
2  0.962085  0.423793
3  0.714289  0.773644
4  0.921786  0.055345
          a         b         c
0  0.686095  0.388992  0.109178
1  0.837058  0.561569  0.163869
2  0.962085  0.423793  0.623563
3  0.714289  0.773644  0.540113
4  0.921786  0.055345  0.073329
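
The claim that the copy is independent can be checked with a small sketch. It uses fixed values instead of random ones so the result is easy to follow; the values themselves are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df2 = df.assign(c=[7, 8, 9])

# Change a value in the new DataFrame only.
df2.loc[0, 'a'] = 100

print(list(df.columns))   # the original still has only 'a' and 'b'
print(df.loc[0, 'a'])     # the original value is unchanged
print(df2.loc[0, 'a'])    # the copy holds the new value
```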


### 3.4.2.3 Modify the values of a column
The values of a column can be modified by using the same syntax as adding a new column to a `DataFrame`. This applies to `.assign` too. If the new column name provided exists in the `DataFrame`, no column will be created but the existing column is replaced.

In [15]:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'a': np.random.random(500),
    'b': np.random.random(500)
})
print(df.head())
df['a'] = 5
print(df.head())
df2 = df.assign(a=2)
print(df2.head())

          a         b
0  0.776931  0.973798
1  0.135711  0.749131
2  0.494551  0.465076
3  0.632927  0.342958
4  0.630202  0.160836
   a         b
0  5  0.973798
1  5  0.749131
2  5  0.465076
3  5  0.342958
4  5  0.160836
   a         b
0  2  0.973798
1  2  0.749131
2  2  0.465076
3  2  0.342958
4  2  0.160836


### 3.4.2.4 Insert a column
By default the new columns are added to the end of the `DataFrame`. However we can insert a column based on the position using `.insert` with the syntax of `.insert(IndexToInsertInto, ColumnName, ColumnValues)`.

In [16]:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'a': np.random.random(500),
    'b': np.random.random(500)
})
print(df.head())
df.insert(1,"c",np.random.random(500))
print(df.head())

          a         b
0  0.181126  0.824040
1  0.857533  0.376251
2  0.090457  0.720562
3  0.081715  0.971133
4  0.456712  0.031960
          a         c         b
0  0.181126  0.343045  0.824040
1  0.857533  0.196935  0.376251
2  0.090457  0.372063  0.720562
3  0.081715  0.116580  0.971133
4  0.456712  0.220407  0.031960


### 3.4.2.5 Delete a column
A column in a `DataFrame` can be deleted using either `del` or `.pop`, just as with a `dict`. `del` discards the column, whereas `.pop` removes it and returns it as a `Series`.

In [17]:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'a': np.random.random(500),
    'b': np.random.random(500),
    'c': np.random.random(500)
})
print("df")
print(df.head())

del df['a']
print("\ndf without a")
print(df.head())

b = df.pop('b')
print("\ndf without a and b")
print(df.head())

print("\nb only")
print(b.head())

df
          a         b         c
0  0.873564  0.746882  0.152166
1  0.740629  0.629259  0.097803
2  0.902456  0.251619  0.204443
3  0.109453  0.932686  0.449789
4  0.468472  0.595999  0.106902

df without a
          b         c
0  0.746882  0.152166
1  0.629259  0.097803
2  0.251619  0.204443
3  0.932686  0.449789
4  0.595999  0.106902

df without a and b
          c
0  0.152166
1  0.097803
2  0.204443
3  0.449789
4  0.106902

b only
0    0.746882
1    0.629259
2    0.251619
3    0.932686
4    0.595999
Name: b, dtype: float64


## 3.4.3 Indexing and slicing DataFrame
The indexing and slicing methods for a `Series` apply similarly to a `DataFrame`. To summarise, we can use square brackets `[]`, `.loc`, and `.iloc` to access the data in a `DataFrame`.

### 3.4.3.1 Using square brackets []
- A label/column name

Using a square bracket directly with a label or column name allows us to access a column.

In [18]:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'a': np.random.random(10),
    'b': np.random.random(10)
})
print(df['a'])

0    0.987829
1    0.818976
2    0.185146
3    0.272635
4    0.521882
5    0.224682
6    0.567891
7    0.442619
8    0.123388
9    0.730673
Name: a, dtype: float64


- A list of labels

If we pass a list of labels, we are able to access multiple columns.

In [19]:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'a': np.random.random(10),
    'b': np.random.random(10),
    'c': np.random.random(10)
})
print(df[['a','c']])

          a         c
0  0.745274  0.984375
1  0.717472  0.320426
2  0.251486  0.513033
3  0.615348  0.554777
4  0.793531  0.855106
5  0.247286  0.410728
6  0.053735  0.141510
7  0.556726  0.277877
8  0.046031  0.320483
9  0.157992  0.151165


- Access specific row and column

The first pair of square brackets `[]` accesses a column and returns a `Series`. To access a specific value, we can then use the indexing methods of a `Series`.

Note that in this case the indices of the rows are equivalent to the integer position of the row, therefore `.loc` and `.iloc` of the column Series return the same value.

In [20]:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'a': np.random.random(10),
    'b': np.random.random(10),
    'c': np.random.random(10)
})
print(df['a'][0])
print(df['a'].loc[0])
print(df['a'].iloc[0])

0.6900977276985747
0.6900977276985747
0.6900977276985747
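
The lookups above agree only because the row labels happen to equal the row positions. As a minimal sketch with hypothetical labels chosen so that the two differ, `.loc` and `.iloc` return different values:

```python
import pandas as pd

# Hypothetical row labels in reverse order, so label != position.
df = pd.DataFrame({'a': [10, 20, 30]}, index=[2, 1, 0])

by_label = df['a'].loc[0]      # the row whose label is 0
by_position = df['a'].iloc[0]  # the first row by position

print(by_label)     # 30
print(by_position)  # 10
```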


### 3.4.3.2 Using .loc
Just like in a `Series`, `.loc` is label-based and `.iloc` is position-based. As a `DataFrame` is 2-dimensional, we can pass two arguments to `.loc`: the first for the row and the second for the column. If only one argument is given, it is taken as the row selector and all columns are included. Therefore `df.loc[0]` is equivalent to `df.loc[0,:]`.

- `.loc` with a label


In [None]:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'a': np.random.random(5),
    'b': np.random.random(5),
    'c': np.random.random(5)
})
print("Row 0 of df:")
print(df.loc[0])

print("\nColumn 'a' of df")
print(df.loc[:,'a'])

print("\nRow 0 column 'a' of df")
print(df.loc[0,'a'])

- `.loc` with a list of labels

If we pass a list of labels to either rows or columns, multiple rows or columns will be selected. 

In [21]:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'a': np.random.random(5),
    'b': np.random.random(5),
    'c': np.random.random(5)
})
print(df.loc[[0,2], ['a','c']])

          a         c
0  0.273602  0.597380
2  0.352606  0.876664


- `.loc` with a slice object

A slice object can also be used as the input to `.loc`. It behaves the same way as it does for a `Series`. To recap, a slice of labels includes the stop (in the syntax `start:stop:step`).

In [22]:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'a': np.random.random(5),
    'b': np.random.random(5),
    'c': np.random.random(5)
})
print(df.loc[0:2, 'a':'b'])

          a         b
0  0.976181  0.231602
1  0.728874  0.461467
2  0.153624  0.243355


### 3.4.3.3 Using .iloc
`.iloc` is used in the same way as `.loc`, except that `.iloc` is based on integer position.

- `.iloc` with an integer

The first input of `.iloc` is the position of the row, and the second input is the position of the column.

In [23]:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'a': np.random.random(5),
    'b': np.random.random(5),
    'c': np.random.random(5)
})
print(df)
print("Row 0 column 'b' of df")
print(df.iloc[0,1])

          a         b         c
0  0.132338  0.661030  0.607319
1  0.163062  0.305606  0.496013
2  0.159727  0.884978  0.250282
3  0.087804  0.806166  0.659464
4  0.435611  0.092973  0.823869
Row 0 column 'b' of df
0.6610297025061675


- `.iloc` with a list of integers

Using a list of integers we can access multiple values at once.

In [24]:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'a': np.random.random(5),
    'b': np.random.random(5),
    'c': np.random.random(5)
})
print(df.iloc[[0,2], [0,2]])

          a         c
0  0.249774  0.281501
2  0.214454  0.314240


- `.iloc` with a slice object

We can also access multiple values when providing the input of a slice object to `.iloc`. Note that when using slice objects with `.iloc`, the stop in `start:stop:step` is not included in the returned value.

In [25]:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'a': np.random.random(5),
    'b': np.random.random(5),
    'c': np.random.random(5)
})
print(df.loc[0:2, 'a':'b'])
print(df.iloc[0:2, 0:2])

          a         b
0  0.343884  0.764536
1  0.894613  0.516874
2  0.648928  0.512561
          a         b
0  0.343884  0.764536
1  0.894613  0.516874


## 3.4.4 Combining multiple DataFrame
There are times when we need to combine multiple `DataFrame` objects. `pandas` provides `pandas.concat` to perform this operation.

### 3.4.4.1 Join two DataFrame vertically
If we have two `DataFrame` objects that we want to combine vertically, we can pass them in a list to `pandas.concat`.



In [26]:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({
    'a': np.random.random(3),
    'b': np.random.random(3),
}, index=[0,1,2])

df2 = pd.DataFrame({
    'a': np.random.random(3),
    'b': np.random.random(3),
}, index=[3,4,5])

df3 = pd.concat([df1, df2])
print(df3)

          a         b
0  0.014359  0.576625
1  0.392585  0.623005
2  0.564056  0.727451
3  0.269994  0.437383
4  0.962171  0.847473
5  0.397421  0.051164


By default, `pandas.concat` combines the `DataFrame` objects vertically, i.e. it aligns them by column label. If the same row label appears in more than one input `DataFrame`, it appears more than once in the result. To tell such rows apart, we can pass the `keys` argument to `pandas.concat`, with one key per `DataFrame` being combined.

This creates a MultiIndexed `DataFrame`. A row can be accessed by passing a tuple as the row label, e.g. `df3.loc[('x',0)]`.

In [27]:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({
    'a': np.random.random(3),
    'b': np.random.random(3),
}, index=[0,1,2])

df2 = pd.DataFrame({
    'a': np.random.random(3),
    'b': np.random.random(3),
}, index=[2,4,5])

df3 = pd.concat([df1, df2], keys=['x','y'])
print(df3)
print()
print(df3.loc[('x',0)])

            a         b
x 0  0.343425  0.201986
  1  0.062913  0.862907
  2  0.211079  0.975835
y 2  0.997838  0.316083
  4  0.948251  0.799546
  5  0.033855  0.757045

a    0.343425
b    0.201986
Name: (x, 0), dtype: float64
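
For comparison, the sketch below shows what happens without `keys` when two inputs share a row label, and how `ignore_index=True` (a `pandas.concat` argument) renumbers the rows instead. The small fixed values are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3]}, index=[0, 1, 2])
df2 = pd.DataFrame({'a': [4, 5, 6]}, index=[2, 4, 5])

# Without keys, the label 2 appears twice in the result.
plain = pd.concat([df1, df2])
print(plain.index.is_unique)
print(plain.loc[2])            # both rows labelled 2 are returned

# ignore_index=True discards the original labels and renumbers from 0.
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered.index)
```

Using `keys` preserves the original labels and records where each row came from, whereas `ignore_index=True` throws that information away.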


If the columns of one `DataFrame` differ from those of the other, `pandas.concat` uses an outer join by default, i.e. all columns are kept and the empty cells are populated with `NaN`.

In [28]:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({
    'a': np.random.random(3),
    'b': np.random.random(3),
}, index=[0,1,2])

df2 = pd.DataFrame({
    'b': np.random.random(3),
    'c': np.random.random(3),
}, index=[3,4,5])

df3 = pd.concat([df1, df2])
print(df3)

          a         b         c
0  0.982702  0.272621       NaN
1  0.735202  0.957563       NaN
2  0.260090  0.281759       NaN
3       NaN  0.419164  0.001033
4       NaN  0.585137  0.288881
5       NaN  0.754931  0.437966


We can specify `join="inner"` to `pandas.concat` so the resultant `DataFrame` only contains the columns that both `DataFrame` have.

In [29]:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({
    'a': np.random.random(3),
    'b': np.random.random(3),
}, index=[0,1,2])

df2 = pd.DataFrame({
    'b': np.random.random(3),
    'c': np.random.random(3),
}, index=[3,4,5])

df3 = pd.concat([df1, df2], join="inner")
print(df3)

          b
0  0.711002
1  0.890062
2  0.547762
3  0.978967
4  0.586402
5  0.530345


### 3.4.4.2 Join two DataFrame horizontally
To join multiple `DataFrame` horizontally, i.e. aligning according to row labels, we need to specify `axis=1` for `pandas.concat`. `axis=0` is the default, which allows us to join the `DataFrame` vertically.

In [30]:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({
    'a': np.random.random(3),
    'b': np.random.random(3),
}, index=[0,1,2])

df2 = pd.DataFrame({
    'b': np.random.random(3),
    'c': np.random.random(3),
}, index=[0,1,2])

df3 = pd.concat([df1, df2], axis=1)
print(df3)

          a         b         b         c
0  0.808630  0.303642  0.193114  0.558099
1  0.689054  0.285900  0.437384  0.878497
2  0.266834  0.131028  0.736600  0.923503


If `keys` is specified, the keys are assigned to the columns as an outer level.



In [31]:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({
    'a': np.random.random(3),
    'b': np.random.random(3),
}, index=[0,1,2])

df2 = pd.DataFrame({
    'b': np.random.random(3),
    'c': np.random.random(3),
}, index=[0,1,2])

df3 = pd.concat([df1, df2], axis=1, keys=['x','y'])
print(df3)

          x                   y          
          a         b         b         c
0  0.537523  0.258672  0.108395  0.243456
1  0.634627  0.625525  0.045429  0.414441
2  0.270642  0.614936  0.402201  0.033993


If some row labels do not exist in both `DataFrame` objects, we can use the `join` argument to control how they are handled.

In [32]:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({
    'a': np.random.random(3),
    'b': np.random.random(3),
}, index=[0,1,2])

df2 = pd.DataFrame({
    'c': np.random.random(3),
    'd': np.random.random(3),
}, index=[1,2,3])

df3 = pd.concat([df1, df2], axis=1, join='outer')
print(df3)
df4 = pd.concat([df1, df2], axis=1, join='inner')
print(df4)

          a         b         c         d
0  0.880935  0.298531       NaN       NaN
1  0.490882  0.290139  0.320837  0.294359
2  0.118602  0.751244  0.970755  0.532803
3       NaN       NaN  0.208567  0.047179
          a         b         c         d
1  0.490882  0.290139  0.320837  0.294359
2  0.118602  0.751244  0.970755  0.532803


`join='outer'` uses all row labels, whereas `join='inner'` uses only the row labels that exist in both `DataFrame` objects.



### 3.4.4.3 DataFrame.append
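`DataFrame.append` combines multiple `DataFrame` objects vertically. It is called on one `DataFrame` and takes another `DataFrame` (or a list of them) to append. Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0; `pd.concat([df1, df2, df3])` gives the same result in current versions.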

In [None]:
import pandas as pd
import numpy as np

df1 = pd.DataFrame({
    'a': np.random.random(3),
    'b': np.random.random(3),
}, index=[0,1,2])

df2 = pd.DataFrame({
    'c': np.random.random(3),
    'd': np.random.random(3),
}, index=[1,2,3])

df3 = pd.DataFrame({
    'a': np.random.random(3),
    'd': np.random.random(3),
}, index=[0,1,2])

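# DataFrame.append was removed in pandas 2.0;
# pd.concat([df1, df2, df3]) is the equivalent of df1.append([df2, df3]).
df4 = pd.concat([df1, df2, df3])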
print(df4)

We can use `ignore_index=True` to remove the original row indices of the `DataFrame` and reassign the row indices in increasing order.

In [None]:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({
    'a': np.random.random(3),
    'b': np.random.random(3),
}, index=[0,1,2])
df2 = pd.DataFrame({
    'c': np.random.random(3),
    'd': np.random.random(3),
}, index=[1,2,3])
df3 = pd.DataFrame({
    'a': np.random.random(3),
    'd': np.random.random(3),
}, index=[0,1,2])

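# DataFrame.append was removed in pandas 2.0;
# pd.concat accepts the same ignore_index argument.
df4 = pd.concat([df1, df2, df3], ignore_index=True)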
print(df4)

## 3.4.5 Operations on Series and DataFrame
### 3.4.5.1 Arithmetic and comparison

## 3.4.6 Looping with DataFrame
pandas provides several `DataFrame` methods for looping over a `DataFrame`: `.items`, `.iteritems`, `.iterrows`, and `.itertuples`. Note that `.iteritems` was deprecated in pandas 1.5 and removed in pandas 2.0 in favour of `.items`.
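
As a brief sketch of what each method yields, the example below loops over a small `DataFrame` with illustrative values. `.iteritems` is left out because it is not available in pandas 2.0 and later.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3.0, 4.0]}, index=['x', 'y'])

# .items yields (column name, column Series) pairs.
for name, column in df.items():
    print(name, column.tolist())

# .iterrows yields (row label, row Series) pairs.
for label, row in df.iterrows():
    print(label, row['a'], row['b'])

# .itertuples yields one namedtuple per row; the row label is in .Index.
for record in df.itertuples():
    print(record.Index, record.a, record.b)
```

`.items` loops column by column, while `.iterrows` and `.itertuples` loop row by row.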


### 3.4.6.1 .items and .iteritems