## How to add columns to a Pandas DataFrame, the basics
This notebook gives some simple examples of adding columns to a DataFrame and explains what happens in the less obvious cases, such as when the index of the new data does not match the index of the DataFrame.

In [1]:
import pandas as pd
import numpy as np

Let's start with a very simple DataFrame: 6 rows and 4 columns of random floating point values. Since we don't pass an index, the DataFrame gets the default one, a ```RangeIndex``` with one entry per row.

In [2]:
df = pd.DataFrame(np.random.rand(6,4), columns=['a', 'b', 'c', 'd'])

In [3]:
display(df)

,a,b,c,d
0,0.01607,0.927096,0.927195,0.331235
1,0.179215,0.800215,0.702414,0.43101
2,0.986067,0.25511,0.592221,0.785731
3,0.224867,0.228913,0.884387,0.196844
4,0.924711,0.735926,0.809921,0.807838
5,0.50291,0.098659,0.343513,0.132948


In [4]:
df.index

RangeIndex(start=0, stop=6, step=1)
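Because `np.random.rand` draws fresh values on every run, the numbers you get will differ from the ones shown here. A minimal sketch, assuming you want repeatable output, is to build the same kind of frame from a seeded generator. The name `df_seeded` and the seed value are arbitrary choices for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Seeded generator so the "random" values are identical on every run.
rng = np.random.default_rng(seed=0)
df_seeded = pd.DataFrame(rng.random((6, 4)), columns=['a', 'b', 'c', 'd'])

# 6 rows, 4 columns, and the default RangeIndex covering labels 0..5.
print(df_seeded.shape)
print(df_seeded.index)
```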

### Simple cases
The simplest way to add a column is to assign a single value. That value is applied to every row in the DataFrame.

In [5]:
df['e'] = .5

In [6]:
df

,a,b,c,d,e
0,0.01607,0.927096,0.927195,0.331235,0.5
1,0.179215,0.800215,0.702414,0.43101,0.5
2,0.986067,0.25511,0.592221,0.785731,0.5
3,0.224867,0.228913,0.884387,0.196844,0.5
4,0.924711,0.735926,0.809921,0.807838,0.5
5,0.50291,0.098659,0.343513,0.132948,0.5


Under the hood, pandas is making life easier for you: it takes your scalar value (the 0.5) and repeats it once for every label in the index (in this case a ```RangeIndex```) of your DataFrame.

Assigning a Series built on the same index is roughly equivalent:

In [7]:
df['e_prime'] = pd.Series(.5, index=pd.RangeIndex(6))

In [8]:
df

,a,b,c,d,e,e_prime
0,0.01607,0.927096,0.927195,0.331235,0.5,0.5
1,0.179215,0.800215,0.702414,0.43101,0.5,0.5
2,0.986067,0.25511,0.592221,0.785731,0.5,0.5
3,0.224867,0.228913,0.884387,0.196844,0.5,0.5
4,0.924711,0.735926,0.809921,0.807838,0.5,0.5
5,0.50291,0.098659,0.343513,0.132948,0.5,0.5
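To confirm that the scalar and the explicit Series produce the same column, a small check can help. This sketch rebuilds a fresh frame (called `demo` here only for illustration) and uses `demo.index` instead of a hard-coded `RangeIndex(6)`, so it keeps working if the number of rows changes.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(np.random.rand(6, 4), columns=['a', 'b', 'c', 'd'])

# Scalar assignment: pandas repeats the value for every row.
demo['e'] = .5

# Explicit version: a Series built on the frame's own index.
demo['e_prime'] = pd.Series(.5, index=demo.index)

# equals() compares the two columns element by element.
print(demo['e'].equals(demo['e_prime']))
```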


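You can also pass in an array yourself without an index. It is then assigned by position, so it must supply exactly one value per row of your DataFrame. A flat array of length 6 would do, and so does the single-column array of shape (6, 1) used here.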

In [9]:
df['f'] = np.random.rand(6,1)

In [10]:
df

,a,b,c,d,e,e_prime,f
0,0.01607,0.927096,0.927195,0.331235,0.5,0.5,0.004665
1,0.179215,0.800215,0.702414,0.43101,0.5,0.5,0.745241
2,0.986067,0.25511,0.592221,0.785731,0.5,0.5,0.129182
3,0.224867,0.228913,0.884387,0.196844,0.5,0.5,0.799768
4,0.924711,0.735926,0.809921,0.807838,0.5,0.5,0.264247
5,0.50291,0.098659,0.343513,0.132948,0.5,0.5,0.519458
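As a minimal sketch of the same idea with flat inputs, the example below assigns a 1-D array and a plain Python list. The frame `demo` and the column names `f` and `f_list` are illustrative only. Using `len(demo)` avoids hard-coding the row count.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(np.random.rand(6, 4), columns=['a', 'b', 'c', 'd'])

# A 1-D array with one value per row is assigned by position.
demo['f'] = np.random.rand(len(demo))

# A plain Python list of the right length works the same way.
demo['f_list'] = [10, 20, 30, 40, 50, 60]

print(demo[['f', 'f_list']])
```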


If you try this with an array whose length doesn't match the number of rows, pandas raises an error. Without an index to align on, it has no way to decide which row each value belongs to.

In [11]:
try:
    # 5 values for a 6-row DataFrame: the lengths don't match
    df['g'] = np.random.rand(5,1)
except ValueError as ex:
    print(ex)

Length of values (5) does not match length of index (6)
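It can be reassuring to know that a failed assignment leaves the DataFrame as it was. The sketch below, on an illustrative frame named `demo`, catches the `ValueError` and then checks that no column was created.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(np.random.rand(6, 4), columns=['a', 'b', 'c', 'd'])

try:
    demo['g'] = np.random.rand(5)  # 5 values for 6 rows
except ValueError as ex:
    print(ex)

# The failed assignment did not create the column.
print('g' in demo.columns)
```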


### Non-matching indices
Now what happens when the data you want to add does carry an index, but that index doesn't match the one on your DataFrame? Here the right-hand side is a Series of 50 values labelled 2 through 51.

In [12]:
df['g'] = pd.Series(np.random.rand(50), index=pd.RangeIndex(2,52))

In [13]:
df

,a,b,c,d,e,e_prime,f,g
0,0.01607,0.927096,0.927195,0.331235,0.5,0.5,0.004665,
1,0.179215,0.800215,0.702414,0.43101,0.5,0.5,0.745241,
2,0.986067,0.25511,0.592221,0.785731,0.5,0.5,0.129182,0.900162
3,0.224867,0.228913,0.884387,0.196844,0.5,0.5,0.799768,0.144825
4,0.924711,0.735926,0.809921,0.807838,0.5,0.5,0.264247,0.884787
5,0.50291,0.098659,0.343513,0.132948,0.5,0.5,0.519458,0.019577


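So what happened here? Our column ```g``` only has values at rows 2 through 5, even though we assigned a Series with 50 values. The assignment aligned the Series to the DataFrame's index by label. Labels 2 through 5 exist on both sides, so those values were kept, and the Series values labelled 6 through 51 were silently dropped. Rows 0 and 1 had no matching label in the Series, so a ```NaN``` was inserted.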
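One way to make this alignment visible is to spell it out with `reindex`. This is a sketch on an illustrative frame `demo`, and the column name `g_explicit` is made up for the comparison.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(np.random.rand(6, 4), columns=['a', 'b', 'c', 'd'])
s = pd.Series(np.random.rand(50), index=pd.RangeIndex(2, 52))

# Direct assignment aligns s to demo.index by label.
demo['g'] = s

# Spelling the alignment out: keep labels 0..5, fill missing ones with NaN.
demo['g_explicit'] = s.reindex(demo.index)

print(demo['g'].equals(demo['g_explicit']))
print(demo['g'].isna().sum())  # counts the rows with no matching label in s
```

If positional assignment is what you actually want, converting the Series with `.to_numpy()` drops its index, at which point the length has to match the DataFrame again.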

Another way to see this is with the ```loc``` indexer. We can select the rows we want to update ourselves, but if the right-hand side has no index, its length must match the selection exactly. Note that ```2:5``` is a label slice and includes both ends, so it selects 4 rows.

In [14]:
df.loc[2:5, 'g_prime'] = np.random.rand(4)

In [15]:
df

,a,b,c,d,e,e_prime,f,g,g_prime
0,0.01607,0.927096,0.927195,0.331235,0.5,0.5,0.004665,,
1,0.179215,0.800215,0.702414,0.43101,0.5,0.5,0.745241,,
2,0.986067,0.25511,0.592221,0.785731,0.5,0.5,0.129182,0.900162,0.865911
3,0.224867,0.228913,0.884387,0.196844,0.5,0.5,0.799768,0.144825,0.217103
4,0.924711,0.735926,0.809921,0.807838,0.5,0.5,0.264247,0.884787,0.755596
5,0.50291,0.098659,0.343513,0.132948,0.5,0.5,0.519458,0.019577,0.207294
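The following sketch illustrates two details of `loc` assignment on an illustrative frame `demo`: label slices include their end point, and a right-hand side that does carry an index is matched by label rather than by position. The column `h` and its values are invented for the example.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(np.random.rand(6, 4), columns=['a', 'b', 'c', 'd'])

# Label-based slices include the end label; positional slices do not.
print(len(demo.loc[2:5]))   # rows labelled 2, 3, 4, 5
print(len(demo.iloc[2:5]))  # positions 2, 3, 4

# With an index on the right-hand side, values are matched by label,
# so their order in the Series does not matter.
demo['h'] = np.nan
demo.loc[2:5, 'h'] = pd.Series([50.0, 40.0, 30.0, 20.0], index=[5, 4, 3, 2])
print(demo['h'])
```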


The main lesson is that assigning a column to a DataFrame can give surprising results unless you know whether the right-hand side carries an index, and whether that index matches the DataFrame's.
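If silent `NaN` filling is a concern, one option is to check the index before assigning. The helper below is a hypothetical sketch, not a pandas function: `add_column_checked` is a name invented here, and it simply refuses a Series whose index differs from the DataFrame's.

```python
import numpy as np
import pandas as pd


def add_column_checked(df, name, values):
    """Hypothetical helper: add a column, rejecting a Series with a different index."""
    if isinstance(values, pd.Series) and not values.index.equals(df.index):
        raise ValueError(f"index of {name!r} does not match the DataFrame index")
    df[name] = values


demo = pd.DataFrame(np.random.rand(6, 4), columns=['a', 'b', 'c', 'd'])

# Same index: accepted.
add_column_checked(demo, 'ok', pd.Series(1.0, index=demo.index))

# Different index: rejected instead of being filled with NaN.
try:
    mismatched = pd.Series(np.random.rand(50), index=pd.RangeIndex(2, 52))
    add_column_checked(demo, 'bad', mismatched)
except ValueError as ex:
    print(ex)
```

Scalars and plain arrays pass straight through to the normal assignment, so pandas still applies its own length check to them.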