In [14]:
import pandas as pd
import numpy as np

## <u>Problem Statement</u>

Given a dataframe with two columns (**col1** and **col2**), create logic that adds a new column (**col3**) containing the values that appear in **col1** but not in **col2**.

### First, let's create lists x and y and add them to a dataframe.

In [106]:
x = [2,4,6,8,10]
y = [8,10,12,14,16]

df = pd.DataFrame()

df['col1'] = x
df['col2'] = y

## Now, let's look at the logic to achieve the task.

1. Create an empty list called new_col.
2. Concatenate x and y into one list called xy_concat.
3. Start a for loop that iterates over xy_concat and appends the current value to new_col if the value is present only in x.
4. Find the absolute difference between the length of new_col and the number of rows in df.
5. If new_col is shorter than the number of rows in df, iterate over the range of the absolute difference and append None at each iteration.
6. If new_col is longer than the number of rows in df, trim it: new_col = new_col[:-absdifference].
7. Add new_col to df now that its length equals the number of rows of df.

### Steps 1 & 2

In [107]:
new_col = [] # create empty list to store the new column's values
xy_concat = x + y # concatenate lists x and y (will be used in for loop later)

### Step 3

In [108]:
for i in xy_concat:  # iterate over the concatenated list
    if i in x and i not in y:  # keep values that appear in x but not in y
        new_col.append(i)
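
The same filter can also be written as a list comprehension. This is a minimal sketch rather than a replacement for the loop above. It assumes the values are hashable so that `y` can be turned into a set, and `y_set` is a name introduced here for illustration. Every value that passes the test comes from `x`, so iterating over `x` alone gives the same result as iterating over the concatenated list.

```python
x = [2, 4, 6, 8, 10]
y = [8, 10, 12, 14, 16]

# y_set is an illustrative name; a set makes each membership test fast
y_set = set(y)

# keep the values of x that do not appear in y, preserving x's order
new_col = [value for value in x if value not in y_set]

print(new_col)  # [2, 4, 6]
```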

### Three of col1's values do not intersect with the values of col2. However, to add the new column to the dataframe, its length must match the number of rows of the dataframe. 

In [109]:
print(f'Length of new column is {len(new_col)}. Number of rows in dataframe is {df.shape[0]}.')

Length of new column is 3. Number of rows in dataframe is 5.


In [110]:
#note the error when executing the following
df['col3'] = new_col

ValueError: Length of values (3) does not match length of index (5)

### Next, we need logic that handles cases where the new column's length does not match the number of rows in the dataframe.

### First, let's calculate the absolute difference between the number of rows in the dataframe and the current length of the new column.

### Step 4

In [111]:
rowlen = df.shape[0] #get number of rows in df
        
diff = abs(len(new_col) - rowlen) #calculate the absolute difference

### Let's now create the logic, which does one of two things:
1. (If the new column list is shorter than the number of rows in the dataframe) Appends None to the new column list until its length equals the number of rows in the dataframe, OR
2. (If the new column list is longer than the number of rows) Removes excess values from the new column list so that its length matches the number of rows in the dataframe.
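
The two scenarios can be sketched as one small helper. This is only an illustration under stated assumptions: `pad_or_truncate`, `target_len` and `fill` are hypothetical names that do not appear elsewhere in this notebook, and the helper returns a new list instead of modifying its input.

```python
def pad_or_truncate(values, target_len, fill=None):
    """Return a copy of `values` with exactly `target_len` items (illustrative helper)."""
    trimmed = list(values[:target_len])              # scenario 2: drop any excess items
    padding = [fill] * (target_len - len(trimmed))   # scenario 1: empty when nothing is missing
    return trimmed + padding


print(pad_or_truncate([2, 4, 6], 5))           # [2, 4, 6, None, None]
print(pad_or_truncate([1, 2, 3, 4, 5, 6], 5))  # [1, 2, 3, 4, 5]
```

Slicing first and padding second means neither branch needs the absolute difference, so the two cases cannot be confused with each other.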

### Step 5

In [112]:
if len(new_col) < rowlen:  # scenario 1: list is too short, pad with None
    for i in range(diff):
        new_col.append(None)

elif len(new_col) > rowlen:  # scenario 2: list is too long, drop the excess
    new_col = new_col[:-diff]

else:
    print("New column length and number of rows in dataframe are equivalent")


df['col3'] = new_col

In [113]:
df

   col1  col2  col3
0     2     8   2.0
1     4    10   4.0
2     6    12   6.0
3     8    14   NaN
4    10    16   NaN
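
As an alternative to padding by hand, pandas can do the alignment itself. Assigning a `pd.Series` (rather than a plain list) to a column matches values by index label, fills rows that have no match with NaN, and ignores labels that are not in the dataframe. The sketch below assumes the dataframe has the default 0..n-1 index; `col2_values` and `kept` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"col1": [2, 4, 6, 8, 10], "col2": [8, 10, 12, 14, 16]})

# values of col1 that are absent from col2 (names here are illustrative)
col2_values = set(df["col2"].tolist())
kept = [v for v in df["col1"].tolist() if v not in col2_values]

# The Series gets its own index (0, 1, 2); assignment aligns it with df's index,
# so rows 3 and 4 receive NaN without any explicit padding
df["col3"] = pd.Series(kept)
print(df)
```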


### Now, let's assume various people want to contribute their own column to the dataframe. As owner of the dataframe, I have decided to only accept column values that appear in col1 of the dataframe. 

### How can we create a function that checks new columns and does this for us?

In [144]:
def add_column(arg1, arg2):
    '''
    arg1: list of values that will be added as rows for the new column
    arg2: my dataframe that the new column will be added to
    '''

    try:
        if arg2.shape[1] == 0:
            print("Empty DataFrame (argument 2)")

        elif not isinstance(arg1, list):
            print("Please input a list for argument 1")

        else:
            rowlen = arg2.shape[0]  # number of rows in the dataframe
            allowed = list(arg2.iloc[:, 0])  # values in the first column (col1)

            # keep only the values that appear in the first column
            new_col = []
            for i in arg1:
                if i in allowed:
                    new_col.append(i)

            diff = abs(len(new_col) - rowlen)

            if len(new_col) < rowlen:  # too short: pad with None
                for i in range(diff):
                    new_col.append(None)

            elif len(new_col) > rowlen:  # too long: drop the excess
                new_col = new_col[:-diff]

            arg2['New Column'] = new_col

            return arg2

    except Exception as e:
        print(e)
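
For comparison, here is a hedged sketch of the same idea built on pandas operations instead of explicit loops. `add_column_aligned` and its `name` parameter are hypothetical and not part of this notebook. The sketch assumes the dataframe uses the default 0..n-1 index, and unlike `add_column` it returns a copy rather than modifying the dataframe passed in.

```python
import pandas as pd


def add_column_aligned(values, frame, name="New Column"):
    """Hypothetical variant of add_column that relies on pandas index alignment."""
    candidates = pd.Series(values)
    # keep only the values found in the first column, renumbered from 0
    kept = candidates[candidates.isin(frame.iloc[:, 0])].reset_index(drop=True)
    out = frame.copy()   # leave the caller's dataframe untouched
    out[name] = kept     # alignment pads short input with NaN and drops any excess
    return out


df = pd.DataFrame({"col1": [1, 2, 3, 4, 6, 15], "col2": [3, 4, 5, 4, 9, 12]})
result = add_column_aligned([5, 4, 2, 4], df)
print(result)
```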

### Let's create some example data for the people interested in adding their columns to my dataframe, and place it in a dataframe of its own.

In [145]:
person = ['a','b','c','d','e']
values = [[1,2,3,4,5],[12,6,8,10,4],[20,12,15,3,2],[6,8,10,12,4]]
input_df = pd.DataFrame(np.array(values),columns = person)

In [146]:
#view client dataframe
input_df

    a   b   c   d  e
0   1   2   3   4  5
1  12   6   8  10  4
2  20  12  15   3  2
3   6   8  10  12  4


### Let's also create our dataframe.

In [147]:
df = pd.DataFrame()
a = [1,2,3,4,6,15]
b = [3,4,5,4,9,12]

df['col1'] = a
df['col2'] = b

In [148]:
#view our dataframe
df

   col1  col2
0     1     3
1     2     4
2     3     5
3     4     4
4     6     9
5    15    12


In [152]:
# Use a for loop to feed each person's data to our function and display what the new column would look like
for col in input_df.columns:
    result = add_column(list(input_df[col]), df)
    print(result)

   col1  col2  New Column
0     1     3         1.0
1     2     4         6.0
2     3     5         NaN
3     4     4         NaN
4     6     9         NaN
5    15    12         NaN
   col1  col2  New Column
0     1     3         2.0
1     2     4         6.0
2     3     5         NaN
3     4     4         NaN
4     6     9         NaN
5    15    12         NaN
   col1  col2  New Column
0     1     3         3.0
1     2     4        15.0
2     3     5         NaN
3     4     4         NaN
4     6     9         NaN
5    15    12         NaN
   col1  col2  New Column
0     1     3         4.0
1     2     4         3.0
2     3     5         NaN
3     4     4         NaN
4     6     9         NaN
5    15    12         NaN
   col1  col2  New Column
0     1     3         4.0
1     2     4         2.0
2     3     5         4.0
3     4     4         NaN
4     6     9         NaN
5    15    12         NaN
