# 450+ Practice Questions From Pandas, NumPy, and SQL.

Author: **Avi Chawla**

LinkedIn: https://www.linkedin.com/in/avi-chawla/

Read my newsletter here: https://www.blog.dailydoseofds.com/

## Introduction

This notebook has been created for you to practice three of the most common tools used to build machine learning and data science applications: Pandas, NumPy, and SQL!

The practice questions provided will serve as a great resource for those who are looking to familiarize themselves with some of the most common functions used in these tools. 

Every question in this exercise comes with a description to help you navigate it easily. If a dataset needs to be loaded into the Python environment, it has been provided for you. You can find it in the Files section of the right panel. Do NOT delete any of the files/folders listed there.

The whole exercise has been divided into nine separate notebooks. Below are the links to all the other notebooks for you to jump from one notebook to another:

- **Pandas**

1. Pandas Notebook 1: [Link](https://deepnote.com/workspace/avi-chawla-695b-aee6f4ef-2d50-4fb6-9ef2-20ee1022995a/project/Pandas-Notebook-1-d693ac55-6455-40cf-ae34-867c6a02014e/notebook/6449493c84734151b11f4b6871f045d2#99f75bf946d04b9bb1daa9e14c2cfea9) 
2. Pandas Notebook 2: [Link](https://deepnote.com/workspace/avi-chawla-695b-aee6f4ef-2d50-4fb6-9ef2-20ee1022995a/project/Pandas-Notebook-employee-dataset-7e3b6755-5d4b-464b-9b75-9c84667ae3bd/notebook/notebook-0de50f3b70834570b13b651dde44c491) 

3. Pandas Notebook 3: [Link](https://deepnote.com/workspace/avi-chawla-695b-aee6f4ef-2d50-4fb6-9ef2-20ee1022995a/project/Pandas-Notebook-employee-part-2-adc5a3ee-5f61-4725-8e46-ccb07899acfc/notebook/notebook-78e3faf901da4f14881ef24e41c80bf6) 

4. Pandas Notebook 4: [Link](https://deepnote.com/workspace/avi-chawla-695b-aee6f4ef-2d50-4fb6-9ef2-20ee1022995a/project/Pandas-after-employee-f84e02a1-fb6a-428e-af90-8dd99855749a/notebook/notebook-134ac20c38ef45e5a4432abd638e6c2e) **(This Notebook)** 

- **NumPy**

1. NumPy Notebook 1: [Link](https://deepnote.com/workspace/avi-chawla-695b-aee6f4ef-2d50-4fb6-9ef2-20ee1022995a/project/Numpy-part-1-9b9979f2-b708-4292-b466-3d0157564c91/notebook/notebook-07232b5ebafe49b198a9c55c553414f1)

2. NumPy Notebook 2: [Link](https://deepnote.com/workspace/avi-chawla-695b-aee6f4ef-2d50-4fb6-9ef2-20ee1022995a/project/NumPy-Notebook-2-4456411e-2ddd-426d-8027-4881080027db/notebook/notebook-988aba30f33a45a3861adc4f6a6f338c)

3. NumPy Notebook 3: [Link](https://deepnote.com/workspace/avi-chawla-695b-aee6f4ef-2d50-4fb6-9ef2-20ee1022995a/project/NumPy-Notebook-3-e6587114-b580-4249-b599-540de859e603/notebook/notebook-bb52759ea3f542eaaed9958b5df9c34b)

- **SQL**

1. SQL Notebook 1: [Link](https://deepnote.com/workspace/avi-chawla-695b-aee6f4ef-2d50-4fb6-9ef2-20ee1022995a/project/SQL-Notebook-1-eac9d782-a9b1-4e84-a1f9-af14080a6121/notebook/notebook-697f04297c664d02901db0f85431512e)

2. SQL Notebook 2: [Link](https://deepnote.com/workspace/avi-chawla-695b-aee6f4ef-2d50-4fb6-9ef2-20ee1022995a/project/SQL-Notebook-2-1914b214-be03-44a1-be63-ad99e98be639/notebook/notebook-e549236b988c42a5b53126a7ebb98127)

## How to use this notebook?

At the top right corner, you will find a Duplicate button. It lets you create your own copy of this notebook to practice in and write solutions to the questions listed here.

If you face any issues or have any feedback, feel free to reach out to me (Avi Chawla) on LinkedIn: https://www.linkedin.com/in/avi-chawla/ or by email at avi@dsscholar[dot]com.

Let's begin 🚀!

# Pandas Notebook 4

### 151. Sort DataFrame based on another list

In [4]:
import pandas as pd
import numpy as np

df = pd.DataFrame([["A", 1], ["B", 2],
                   ["C", 3], ["D", 4]], columns=["col1", "col2"])

sort_list = ["C", "A", "D", "B"]

# print(df.head())
## start your code below
df["col1"] = pd.Categorical(df.col1,categories=sort_list)
new_df = df.sort_values("col1")
## end your code here

print(new_df)

  col1  col2
2    C     3
0    A     1
3    D     4
1    B     2


### 152. Insert a column at a specific location in a DataFrame

In [2]:
df = pd.DataFrame([["A", 1], ["B", 2],
                   ["C", 3], ["D", 4]], 
                  columns=["col_A", "col_B"])

new_column = ["P", "Q", "R", "S"]
insert_position = 1 ## between col_A and col_B

## start your code below
df.insert(insert_position,"new_column",value=new_column)
new_df = df
## end your code here

print(new_df)

  col_A new_column  col_B
0     A          P      1
1     B          Q      2
2     C          R      3
3     D          S      4


### 153. Select columns based on the column's Data Type

In [3]:
df = pd.DataFrame([["A", 1, True], ["B", 2, False],
                   ["C", 3, False], ["D", 4, True]], 
                  columns=["col_A", "col_B", "col_C"])

dt_type = "bool"

## start your code below
new_df = df.select_dtypes(include=[dt_type])
## end your code here

print(new_df)

   col_C
0   True
1  False
2  False
3   True


### 154. Count the number of Non-NaN cells for each column

In [4]:
import numpy as np
df = pd.DataFrame([["A", np.nan], [np.nan, 2],
                   ["C", np.nan], ["D", 4]],
                  columns=["col_A", "col_B"])

## start your code below
new_df = df.notnull().sum()  # count the cells that are NOT NaN in each column
## end your code here

print(new_df)

col_A    3
col_B    2
dtype: int64
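Counting missing cells and counting present cells are easy to mix up, so here is a minimal sketch that puts both side by side on the same data. The variable names `nan_counts` and `non_nan_counts` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([["A", np.nan], [np.nan, 2],
                   ["C", np.nan], ["D", 4]],
                  columns=["col_A", "col_B"])

# isnull().sum() counts the missing cells in each column
nan_counts = df.isnull().sum()

# count() skips missing values, so it gives the non-NaN cells in each column
non_nan_counts = df.count()

# for every column, the two counts add up to the number of rows
print(nan_counts + non_nan_counts)
```

For any column, the two results should sum to `len(df)`, which is a quick way to check which one you computed.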


### 155. Split DataFrame into equal parts

In [5]:
df = pd.DataFrame([["A", 1], ["B", 2],
                   ["C", 3], ["D", 4]], 
                  columns=["col_A", "col_B"])

parts = 2

## start your code below
new_df1, new_df2 = np.array_split(df,parts)
## end your code here

print(new_df1)

  col_A  col_B
0     A      1
1     B      2


In [6]:
print(new_df2)

  col_A  col_B
2     C      3
3     D      4
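Passing a DataFrame directly to `np.array_split` may emit a deprecation warning on some recent pandas versions. A hedged alternative is to split only the row positions with NumPy and then select the rows with `iloc`. The names `position_chunks` and `split_dfs` are made up for this sketch.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([["A", 1], ["B", 2],
                   ["C", 3], ["D", 4]],
                  columns=["col_A", "col_B"])
parts = 2

# split the row positions 0..len(df)-1 into `parts` nearly equal chunks
position_chunks = np.array_split(np.arange(len(df)), parts)

# pick each chunk of positions with iloc, so every part stays a DataFrame
split_dfs = [df.iloc[chunk] for chunk in position_chunks]

for part in split_dfs:
    print(part)
```

Because only the integer positions are split, this also works when the number of rows is not divisible by `parts`.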


### 156. Reverse DataFrame row-wise

In [7]:
df = pd.DataFrame([["A", 1], ["B", 2],
                   ["C", 3], ["D", 4]], 
                  columns=["col_A", "col_B"])

## start your code below
row_reverse = df.iloc[::-1]
## end your code here

row_reverse

  col_A  col_B
3     D      4
2     C      3
1     B      2
0     A      1


### 157. Reverse DataFrame column-wise

In [8]:
df = pd.DataFrame([["A", 1], ["B", 2],
                   ["C", 3], ["D", 4]], 
                  columns=["col_A", "col_B"])

## start your code below
col_reverse = df.loc[::,::-1]
## end your code here
col_reverse

   col_B col_A
0      1     A
1      2     B
2      3     C
3      4     D


### 158. Insert a row at an arbitrary position

In [9]:
df = pd.DataFrame([["A", 1], ["B", 2],
                   ["C", 3], ["D", 4]], 
                  columns=["col_A", "col_B"])

insert_pos = 1
insert_row = ["P", 5]

## start your code below
a = pd.DataFrame([insert_row],columns=df.columns)
new_df = pd.concat([df.iloc[:insert_pos],a,df.iloc[insert_pos:]]).reset_index(drop=True)
## end your code here

new_df

  col_A  col_B
0     A      1
1     P      5
2     B      2
3     C      3
4     D      4


### 159. Apply function to every cell of DataFrame

In [10]:
df = pd.DataFrame([[1, 5], [2, 6],
                   [3, 7], [4, 8]], 
                  columns=["col_A", "col_B"])

def func(num):
    return num + 1

## start your code below
# call func once per cell; DataFrame.map needs pandas >= 2.1 (use df.applymap(func) on older versions)
new_df = df.map(func)
## end your code here

print(new_df)

   col_A  col_B
0      2      6
1      3      7
2      4      8
3      5      9
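`df.apply(func)` hands a whole column to `func`, which happens to work for `num + 1` because Series arithmetic is vectorized. The sketch below uses a hypothetical function, `parity`, that only makes sense for a single number, and applies it to each cell through `Series.map`.

```python
import pandas as pd

df = pd.DataFrame([[1, 5], [2, 6],
                   [3, 7], [4, 8]],
                  columns=["col_A", "col_B"])

# hypothetical function that expects one number, not a whole column
def parity(num):
    return "even" if num % 2 == 0 else "odd"

# apply passes each column to the lambda; Series.map then calls parity per cell
labels = df.apply(lambda col: col.map(parity))

print(labels)
```

Calling `df.apply(parity)` directly would raise an error here, because the `if` would be evaluated on an entire column at once.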


### 160. The cumulative sum of a column in DataFrame

In [11]:
df = pd.DataFrame([["A", 1], ["B", 2],
                   ["C", 3], ["D", 4]], columns=["col_A", "col_B"])

## start your code below
new_df = df.col_B.cumsum()
## end your code here

new_df

0     1
1     3
2     6
3    10
Name: col_B, dtype: int64

### 161. Uniquely number individual group in Pandas GroupBy

In [19]:
df = pd.DataFrame([["A", 1], ["B", 2],
                   ["C", 3], ["D", 4]], columns=["col_A", "col_B"])

## start your code below
new_df = df.copy()
new_df["Unique_ID"] = df.groupby("col_A").ngroup()  # number each group by its (sorted) key
## end your code here
new_df

  col_A  col_B  Unique_ID
0     A      1          0
1     B      2          1
2     C      3          2
3     D      4          3
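In the data above every value of `col_A` is its own group, so the numbering looks like a plain row counter. A small sketch with a repeated key (hypothetical data) shows that `GroupBy.ngroup()` gives all rows of one group the same number.

```python
import pandas as pd

# hypothetical data: "A" appears twice, so it forms a group of two rows
df = pd.DataFrame([["A", 1], ["B", 2],
                   ["A", 3], ["D", 4]],
                  columns=["col_A", "col_B"])

# ngroup() returns, for every row, the number of the group it belongs to
group_ids = df.groupby("col_A").ngroup()

print(group_ids)
```

With the default `sort=True`, groups are numbered in the order of their sorted keys.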


### 162. Check if column has NaN values

In [15]:
df = pd.DataFrame([["A", np.nan], ["A", 2], ["C", np.nan], ["D", 4]],
                  columns=["col_A", "col_B"])

## start your code below
col_A_check = df["col_A"].isnull().any() 
col_B_check = df["col_B"].isnull().any()
## end your code here

In [16]:
col_A_check, col_B_check

(False, True)

### 163. Append a list as a row to a DataFrame

In [20]:
df = pd.DataFrame([["A", 1], ["B", 2],
                   ["C", 3], ["D", 4]], 
                  columns=["col_A", "col_B"])

new_row = ["E", 5]

## start your code below
df.loc[len(df)] = new_row
## end your code here

print(df)

  col_A  col_B
0     A      1
1     B      2
2     C      3
3     D      4
4     E      5


### 164. Identify the source of each row in Pandas Merge

In [23]:
df1 = pd.DataFrame([["A", 1], ["B", 2]], 
                  columns=["col_A", "col_B"])

df2 = pd.DataFrame([["A", 3], ["C", 4]], 
                  columns=["col_A", "col_C"])

## start your code below
new_df = df1.merge(df2, on="col_A", how="outer", indicator=True)
## end your code here

print(new_df)

  col_A  col_B  col_C      _merge
0     A    1.0    3.0        both
1     B    2.0    NaN   left_only
2     C    NaN    4.0  right_only


### 165. Filter n-largest values from a DataFrame

In [22]:
df = pd.DataFrame([["A", 200], ["B", 400],
                   ["C", 100], ["D", 300]], 
                  columns=["col_A", "col_B"])

k = 2

## start your code below
largest_k = df.nlargest(k,columns="col_B")
## end your code here

largest_k

  col_A  col_B
1     B    400
3     D    300


### 166. Filter n-smallest values from a DataFrame

In [25]:
df = pd.DataFrame([["A", 200], ["B", 400],
                   ["C", 100], ["D", 300]], 
                  columns=["col_A", "col_B"])

k = 2

## start your code below
smallest_k = df.nsmallest(k,columns="col_B")
## end your code here

smallest_k

  col_A  col_B
2     C    100
0     A    200


### 167. Map categorical data to unique integral values

In [31]:
df = pd.DataFrame([["A", 1], ["B", 2],
                   ["A", 3], ["D", 4]], 
                  columns=["col_A", "col_B"])

## start your code below
df["Category"] = df["col_A"].astype("category").cat.codes
## end your code here

print(df)

  col_A  col_B  Category
0     A      1         0
1     B      2         1
2     A      3         0
3     D      4         2


### 168. Add prefix to every column name

In [32]:
df = pd.DataFrame([["A", 1], ["B", 2],
                   ["C", 3], ["D", 4]], 
                  columns=["col_A", "col_B"])

## start your code below
new_df = df.add_prefix("sh",axis=1)
## end your code here

print(new_df)

  shcol_A  shcol_B
0       A        1
1       B        2
2       C        3
3       D        4


### 169. Delete the rows that have NaN values

In [33]:
df = pd.DataFrame([["A", np.nan], ["B", 2],
                   ["C", np.nan], ["D", 4]],
                  columns=["col_A", "col_B"])

## start your code below
new_df = df.dropna(axis=0)
## end your code here

print(new_df)

  col_A  col_B
1     B    2.0
3     D    4.0


### 170. Fill NaN values with 1

In [34]:
df = pd.DataFrame([["A", np.nan], ["B", 2],
                   ["C", np.nan], ["D", 4]],
                  columns=["col_A", "col_B"])

## start your code below
new_df = df.fillna(1)
## end your code here

print(new_df)

  col_A  col_B
0     A    1.0
1     B    2.0
2     C    1.0
3     D    4.0


### 171. Fill NaN values with column mean

In [12]:
df = pd.DataFrame([["A", np.nan], ["B", 2],
                   ["C", np.nan], ["D", 4]],
                  columns=["col_A", "col_B"])

## start your code below
new_df = df.copy()
new_df["col_B"] = df["col_B"].fillna(df["col_B"].mean())
## end your code here

print(new_df)

  col_A  col_B
0     A    3.0
1     B    2.0
2     C    3.0
3     D    4.0


### 172. Fill NaN values with column mode

In [18]:
df = pd.DataFrame([["A", 1], [np.nan, 2],
                   ["A", 3], [np.nan, 4],
                   ["B", 5], ["C", 6]], 
                  columns=["col_A", "col_B"])

## start your code below
new_df = df.copy()
new_df["col_A"] = df["col_A"].fillna(df["col_A"].mode()[0]) 
## end your code here

print(new_df)

  col_A  col_B
0     A      1
1     A      2
2     A      3
3     A      4
4     B      5
5     C      6


### 173. Swap two rows of a dataframe

In [25]:
df = pd.DataFrame([["A", 1], ["B", 2],
                   ["C", 3], ["D", 4]], 
                  columns=["col_A", "col_B"])

# swap second and last row
row_1, row_2 = 1, 3 

## start your code below
new_df = df.copy()
new_df.iloc[row_1], new_df.iloc[row_2] = df.iloc[row_2].copy(), df.iloc[row_1].copy()
## end your code here

print(new_df)

  col_A  col_B
0     A      1
1     D      4
2     C      3
3     B      2


### 174. Create a column "col4" that contains the 2nd largest value in each row

In [43]:
df = pd.DataFrame([[4, 1, 5], 
                   [5, 2, 9],
                   [2, 9, 3], 
                   [8, 5, 4]], 
                  columns=["col_A", "col_B", "col_C"])


## start your code below
new_df = df.copy()
new_df["col4"] = df.apply(lambda x: sorted(x)[-2],axis=1)
## end your code here

print(new_df)

   col_A  col_B  col_C  col4
0      4      1      5     4
1      5      2      9     5
2      2      9      3     3
3      8      5      4     5
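For wider frames, a row-wise `apply` with `sorted` can be slow. As a sketch of an alternative, the rows can be sorted in one NumPy call and the second-to-last column picked. The names `row_sorted` and `second_largest` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[4, 1, 5],
                   [5, 2, 9],
                   [2, 9, 3],
                   [8, 5, 4]],
                  columns=["col_A", "col_B", "col_C"])

# sort the values inside each row in ascending order (axis=1)
row_sorted = np.sort(df.to_numpy(), axis=1)

# after sorting, the second-to-last column holds each row's 2nd largest value
second_largest = row_sorted[:, -2]

new_df = df.copy()
new_df["col4"] = second_largest
print(new_df)
```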


### 175. Replace the values on the main diagonal of the dataframe with 0

In [45]:
df = pd.DataFrame([[4, 1, 5, 10], [5, 2, 9, 4],
                   [2, 9, 3, 1],  [1, 7, 3,10]], 
                  columns=["col_A", "col_B", "col_C", "col_D"])


## start your code below
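# fill the diagonal on a writable array copy; writing through .values is not guaranteed to reach the DataFrame
values = df.to_numpy(copy=True)
np.fill_diagonal(values, 0)
new_df = pd.DataFrame(values, index=df.index, columns=df.columns)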
## end your code here

print(new_df)

   col_A  col_B  col_C  col_D
0      0      1      5     10
1      5      0      9      4
2      2      9      0      1
3      1      7      3      0


### 176. Get the Group "A" of the dataframe by first grouping the dataframe and then using the group key. 

In [50]:
df = pd.DataFrame([["A", 1], ["B", 2],
                   ["A", 3], ["D", 4]], 
                  columns=["col_A", "col_B"])


## start your code below
new_df = df.groupby("col_A").get_group("A")
## end your code here

print(new_df)

  col_A  col_B
0     A      1
2     A      3


### 177. Fetch the rows where the value in "col_C" does not belong to "col_B". 

In [53]:
df = pd.DataFrame([[4, 1, 5], 
                   [5, 2, 9],
                   [2, 9, 3], 
                   [8, 5, 4]], 
                  columns=["col_A", "col_B", "col_C"])


## start your code below
new_df = df[~df["col_C"].isin(df["col_B"])]  # keep rows whose col_C value never appears in col_B
## end your code here

print(new_df)

   col_A  col_B  col_C
2      2      9      3
3      8      5      4


### 178. Get the rows where the value of "col_A" is equal to "col_B".

In [54]:
df = pd.DataFrame([[4, 1, 5], 
                   [5, 5, 9],
                   [2, 9, 3], 
                   [8, 5, 8]], 
                  columns=["col_A", "col_B", "col_C"])


## start your code below
new_df = df[df["col_A"]==df["col_B"]]
## end your code here

print(new_df)

   col_A  col_B  col_C
1      5      5      9


### 179. Get the rows where (the value of "col_A" is equal to "col_B") OR (the value of "col_A" is equal to "col_C").

In [55]:
df = pd.DataFrame([[4, 1, 5], 
                   [5, 5, 9],
                   [2, 9, 3], 
                   [8, 5, 8]], 
                  columns=["col_A", "col_B", "col_C"])


## start your code below
new_df = df[(df["col_A"]==df["col_B"]) | (df["col_A"]==df["col_C"])]
## end your code here

print(new_df)

   col_A  col_B  col_C
1      5      5      9
3      8      5      8


### 180. Sort the Data on col_A and col_B

col_A -> Ascending
col_B -> Ascending

In [57]:
df = pd.DataFrame([[4, 1, 5], 
                   [5, 5, 9],
                   [2, 9, 3], 
                   [8, 5, 8]], 
                  columns=["col_A", "col_B", "col_C"])


## start your code below
new_df = df.sort_values(by=["col_A","col_B"])
## end your code here

print(new_df)

   col_A  col_B  col_C
2      2      9      3
0      4      1      5
1      5      5      9
3      8      5      8


### 181. Sort the Data on col_A and col_B

col_A -> Ascending
col_B -> Descending

In [58]:
df = pd.DataFrame([[4, 1, 5], 
                   [5, 5, 9],
                   [2, 9, 3], 
                   [8, 5, 8]], 
                  columns=["col_A", "col_B", "col_C"])


## start your code below
# one call with a per-column direction: col_A ascending, col_B descending
new_df = df.sort_values(by=["col_A", "col_B"], ascending=[True, False])
## end your code here

print(new_df)

   col_A  col_B  col_C
2      2      9      3
0      4      1      5
1      5      5      9
3      8      5      8
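Every `col_A` value in the data above is unique, so the secondary sort key never comes into play. The following sketch uses hypothetical data with a tie in `col_A` to show the effect of a per-column `ascending` list, and contrasts it with chaining two `sort_values` calls.

```python
import pandas as pd

# hypothetical data: rows 0 and 1 tie on col_A
df = pd.DataFrame([[1, 3],
                   [1, 7],
                   [0, 5]],
                  columns=["col_A", "col_B"])

# col_A ascending first; ties are broken by col_B descending
multi_key = df.sort_values(by=["col_A", "col_B"], ascending=[True, False])

# chaining re-sorts everything by the last key, so the col_A order is lost
chained = df.sort_values(by="col_A").sort_values(by="col_B", ascending=False)

print(multi_key)
print(chained)
```

In `multi_key`, the row with `col_A == 0` comes first and the tied rows follow with the larger `col_B` on top. In `chained`, the rows are ordered by `col_B` alone.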


### 182. Get the mean of every column

In [59]:
df = pd.DataFrame([[4, 1, 5], 
                   [5, 5, 9],
                   [2, 9, 3], 
                   [8, 5, 8]], 
                  columns=["col_A", "col_B", "col_C"])


## start your code below
mean = df.mean()
## end your code here

print(mean)

col_A    4.75
col_B    5.00
col_C    6.25
dtype: float64


### 182. Get the mean of every row

In [60]:
df = pd.DataFrame([[4, 1, 5], 
                   [5, 5, 9],
                   [2, 9, 3], 
                   [8, 5, 8]], 
                  columns=["col_A", "col_B", "col_C"])


## start your code below
mean = df.mean(axis=1)
## end your code here

print(mean)

0    3.333333
1    6.333333
2    4.666667
3    7.000000
dtype: float64


### 183. Concatenate the two DataFrames row-wise

In [63]:
df1 = pd.DataFrame([["A", 1], ["B", 2]], 
                  columns=["col_A", "col_B"])

df2 = pd.DataFrame([["A", 3], ["C", 4]], 
                  columns=["col_A", "col_B"])

## start your code below
new_df = pd.concat((df1,df2),axis=0)
## end your code here

print(new_df)

  col_A  col_B
0     A      1
1     B      2
0     A      3
1     C      4


### 184. Concatenate the two DataFrames column-wise

In [64]:
df1 = pd.DataFrame([["A", 1], ["B", 2]], 
                  columns=["col_A", "col_B"])

df2 = pd.DataFrame([["A", 3], ["C", 4]], 
                  columns=["col_C", "col_D"])

## start your code below
new_df = pd.concat((df1,df2),axis=1)
## end your code here

print(new_df)

  col_A  col_B col_C  col_D
0     A      1     A      3
1     B      2     C      4


### 185. Change the last two values in the last column to [2,4]

Change 3 -> 2
Change 8 -> 4 

**Note: You should do this in a single line of code**

In [65]:
df = pd.DataFrame([[4, 1, 5], 
                   [5, 5, 9],
                   [2, 9, 3], 
                   [8, 5, 8]], 
                  columns=["col_A", "col_B", "col_C"])


## start your code below
df.iloc[-2:,-1] = [2,4]
## end your code here

print(df)

   col_A  col_B  col_C
0      4      1      5
1      5      5      9
2      2      9      2
3      8      5      4


### 186. Replace all '1' with '2'

In [66]:
df = pd.DataFrame([[4, 1, 5], 
                   [5, 5, 1],
                   [1, 9, 3], 
                   [8, 5, 8]], 
                  columns=["col_A", "col_B", "col_C"])


## start your code below
new_df = df.replace(1,2)
## end your code here

print(new_df)

   col_A  col_B  col_C
0      4      2      5
1      5      5      2
2      2      9      3
3      8      5      8


### 187. Replace all '1' with '2' and '5' with '6' in a single line of code

In [67]:
df = pd.DataFrame([[4, 1, 5], 
                   [5, 5, 1],
                   [1, 9, 3], 
                   [8, 5, 8]], 
                  columns=["col_A", "col_B", "col_C"])


## start your code below
new_df = df.replace({1: 2, 5: 6})  # one call; the dict maps each old value to its replacement
## end your code here

print(new_df)

   col_A  col_B  col_C
0      4      2      6
1      6      6      2
2      2      9      3
3      8      6      8


### 188. Sample 2 random rows from the DataFrame

In [68]:
df = pd.DataFrame([[4, 1, 5], 
                   [5, 5, 1],
                   [1, 9, 3], 
                   [8, 5, 8]], 
                  columns=["col_A", "col_B", "col_C"])


## start your code below
sample_df = df.sample(2)
## end your code here

print(sample_df)

   col_A  col_B  col_C
1      5      5      1
3      8      5      8


### 189. Convert the DataFrame to a list of lists. Don't include the header row.

In [73]:
df = pd.DataFrame([[4, 1, 5], 
                   [5, 5, 1],
                   [1, 9, 3], 
                   [8, 5, 8]], 
                  columns=["col_A", "col_B", "col_C"])


## start your code below
data_list = df.values.tolist()
## end your code here

print(data_list)

[[4, 1, 5], [5, 5, 1], [1, 9, 3], [8, 5, 8]]


### 190. Add three new columns that show the cumulative sum of every column

In [79]:
df = pd.DataFrame([[4, 1, 5], 
                   [5, 5, 1],
                   [1, 9, 3], 
                   [8, 5, 8]], 
                  columns=["col_A", "col_B", "col_C"])


## start your code below
new_df = df.copy()
new_df["cumsum_a"] = new_df["col_A"].cumsum()
new_df["cumsum_b"] = new_df["col_B"].cumsum()
new_df["cumsum_c"] = new_df["col_C"].cumsum()
## end your code here

print(new_df)

   col_A  col_B  col_C  cumsum_a  cumsum_b  cumsum_c
0      4      1      5         4         1         5
1      5      5      1         9         6         6
2      1      9      3        10        15         9
3      8      5      8        18        20        17


### 191. Print the cumulative sum of every row in a new column. 

### In other words, make a column that stores the cumulative sum of the (sum of every row).

In [89]:
df = pd.DataFrame([[4, 1, 5], 
                   [5, 5, 1],
                   [1, 9, 3], 
                   [8, 5, 8]], 
                  columns=["col_A", "col_B", "col_C"])


## start your code below
new_df = df.copy()
new_df["cumsum"] = df.sum(axis=1).cumsum()  # row sums first, then their running total
## end your code here

print(new_df)

   col_A  col_B  col_C  cumsum
0      4      1      5      10
1      5      5      1      21
2      1      9      3      34
3      8      5      8      55


### 192. Find the frequency of every element in col_A

In [90]:
df = pd.DataFrame([["A", 1, 5], 
                   ["B", 5, 1],
                   ["C", 9, 3], 
                   ["A", 5, 8]], 
                  columns=["col_A", "col_B", "col_C"])


## start your code below
new_df = df["col_A"].value_counts()
## end your code here

print(new_df)

col_A
A    2
B    1
C    1
Name: count, dtype: int64


### 193. Normalize the frequency of every element in col_A

In [91]:
df = pd.DataFrame([["A", 1, 5], 
                   ["B", 5, 1],
                   ["C", 9, 3], 
                   ["A", 5, 8]], 
                  columns=["col_A", "col_B", "col_C"])


## start your code below
new_df = df["col_A"].value_counts(normalize=True)
## end your code here

print(new_df)

col_A
A    0.50
B    0.25
C    0.25
Name: proportion, dtype: float64


### 194. GroupBy col_A, then find the sum of col_B and mean of col_C

In [93]:
df = pd.DataFrame([["A", 1, 5], 
                   ["B", 5, 1],
                   ["C", 9, 3], 
                   ["A", 5, 8]], 
                  columns=["col_A", "col_B", "col_C"])


## start your code below
new_df = df.groupby("col_A").aggregate(
    b_sum=pd.NamedAgg(column="col_B", aggfunc="sum"),
    c_mean=pd.NamedAgg(column="col_C", aggfunc="mean"),  # string alias; passing np.mean may warn on newer pandas
).reset_index()
## end your code here

print(new_df)

  col_A  b_sum  c_mean
0     A      6     6.5
1     B      5     1.0
2     C      9     3.0
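Named aggregation also accepts plain `(column, aggfunc)` tuples, which some readers find easier to scan than `pd.NamedAgg`. A minimal sketch on the same data, with the output names `b_sum` and `c_mean` reused from the solution above:

```python
import pandas as pd

df = pd.DataFrame([["A", 1, 5],
                   ["B", 5, 1],
                   ["C", 9, 3],
                   ["A", 5, 8]],
                  columns=["col_A", "col_B", "col_C"])

# each keyword is an output column; the tuple is (input column, aggregation)
summary = df.groupby("col_A").agg(
    b_sum=("col_B", "sum"),
    c_mean=("col_C", "mean"),
).reset_index()

print(summary)
```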


### 195. Find the correlation between every pair of column

In [94]:
df = pd.DataFrame([[4, 1, 5], 
                   [5, 5, 1],
                   [1, 9, 3], 
                   [8, 5, 8]], 
                  columns=["col_A", "col_B", "col_C"])


## start your code below
new_df = df.corr()
## end your code here

print(new_df)

          col_A     col_B     col_C
col_A  1.000000 -0.424264  0.599377
col_B -0.424264  1.000000 -0.273434
col_C  0.599377 -0.273434  1.000000


### 196. GroupBy col_A, then compute a new column as (10*col_B/col_C)

In [100]:
df = pd.DataFrame([[4, 1, 5], 
                   [5, 5, 1],
                   [1, 9, 3], 
                   [8, 5, 8]], 
                  columns=["col_A", "col_B", "col_C"])


## start your code below
new_df = df.copy()
# group_keys=False keeps the original row index, so the result aligns with the right rows on assignment
new_df["computed"] = new_df.groupby("col_A", group_keys=False).apply(lambda x: 10*x["col_B"]/x["col_C"])
## end your code here

print(new_df)

   col_A  col_B  col_C  computed
0      4      1      5      2.00
1      5      5      1     50.00
2      1      9      3     30.00
3      8      5      8      6.25


### 197. GroupBy col_A, then compute a new column as (10*col_B/col_C). Project the new column back to the original dataframe

In [102]:
df = pd.DataFrame([[4, 1, 5], 
                   [5, 5, 1],
                   [1, 9, 3], 
                   [8, 5, 8]], 
                  columns=["col_A", "col_B", "col_C"])


## start your code below
new_df = df.copy()
new_df["computed"] = 10*(new_df["col_B"]/new_df["col_C"])
## end your code here

print(new_df)

   col_A  col_B  col_C  computed
0      4      1      5      2.00
1      5      5      1     50.00
2      1      9      3     30.00
3      8      5      8      6.25
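When a group holds more than one row, "projecting back" usually means broadcasting a per-group result onto every row of that group. The sketch below uses hypothetical data with a repeated key and assumes the formula should be applied to the group totals, which is only one possible reading of the task. `GroupBy.transform` does the broadcasting.

```python
import pandas as pd

# hypothetical data with a repeated key ("A"), so one group has two rows
df = pd.DataFrame([["A", 1, 5],
                   ["B", 5, 1],
                   ["C", 9, 3],
                   ["A", 5, 8]],
                  columns=["col_A", "col_B", "col_C"])

# transform("sum") returns per-group sums aligned to the original row index
group_sums = df.groupby("col_A")[["col_B", "col_C"]].transform("sum")

new_df = df.copy()
# assumed reading of the task: apply the formula to the group totals
new_df["computed"] = 10 * group_sums["col_B"] / group_sums["col_C"]

print(new_df)
```

Both "A" rows receive the same value because they share the same group totals.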


### 198. Merge the two dataframes using the join() method on col_A.

In [106]:
df1 = pd.DataFrame([["A", 1], ["B", 2]], 
                  columns=["col_A", "col_B"])

df2 = pd.DataFrame([["A", 3], ["C", 4]], 
                  columns=["col_A", "col_C"])

## start your code below
df1.set_index('col_A', inplace=True)
df2.set_index('col_A', inplace=True)
new_df = df1.join(df2,on="col_A")
## end your code here

print(new_df)

       col_B  col_C
col_A              
A          1    3.0
B          2    NaN
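`join()` matches the caller against the index of the other frame, so only the right-hand frame strictly needs `col_A` as its index. A sketch that leaves `df1` untouched and keeps `col_A` as a regular column (the name `joined` is illustrative):

```python
import pandas as pd

df1 = pd.DataFrame([["A", 1], ["B", 2]],
                   columns=["col_A", "col_B"])

df2 = pd.DataFrame([["A", 3], ["C", 4]],
                   columns=["col_A", "col_C"])

# `on` names the df1 column that is matched against the index of the other frame
joined = df1.join(df2.set_index("col_A"), on="col_A")

print(joined)
```

The default is a left join, so "B" is kept with a missing `col_C` and "C" is dropped.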


### 199. Perform the full outer join on the two DataFrames. 

In [105]:
df1 = pd.DataFrame([["A", 1], ["B", 2]], 
                  columns=["col_A", "col_B"])

df2 = pd.DataFrame([["A", 3], ["C", 4]], 
                  columns=["col_A", "col_C"])

## start your code below
new_df = pd.merge(df1,df2,on="col_A",how="outer")
## end your code here

print(new_df)

  col_A  col_B  col_C
0     A    1.0    3.0
1     B    2.0    NaN
2     C    NaN    4.0


### 200. Convert the DataFrame to a dictionary.

In [108]:
df = pd.DataFrame([[4, 1, 5], 
                   [5, 5, 1],
                   [1, 9, 3], 
                   [8, 5, 8]], 
                  columns=["col_A", "col_B", "col_C"])


## start your code below
dict_df = df.to_dict()
## end your code here

print(dict_df)

{'col_A': {0: 4, 1: 5, 2: 1, 3: 8}, 'col_B': {0: 1, 1: 5, 2: 9, 3: 5}, 'col_C': {0: 5, 1: 1, 2: 3, 3: 8}}
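The default `to_dict()` layout nests the row index inside each column. Depending on what the dictionary is for, another `orient` may be a better fit. A short sketch of two common ones:

```python
import pandas as pd

df = pd.DataFrame([[4, 1, 5],
                   [5, 5, 1],
                   [1, 9, 3],
                   [8, 5, 8]],
                  columns=["col_A", "col_B", "col_C"])

# one dictionary per row, handy for JSON-style records
records = df.to_dict(orient="records")

# one list per column, with the index dropped
columns_as_lists = df.to_dict(orient="list")

print(records)
print(columns_as_lists)
```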


### 201-210: As your next exercise, frame 10 questions on your own and solve them. 

Great job solving this notebook. Go to NumPy Notebook 1: [Link](https://deepnote.com/workspace/avi-chawla-695b-aee6f4ef-2d50-4fb6-9ef2-20ee1022995a/project/Numpy-part-1-9b9979f2-b708-4292-b466-3d0157564c91/%2Fnotebook.ipynb)