This example demonstrates how np.where() can be used for conditional data wrangling on a Pandas DataFrame, transforming column values element-wise according to boolean conditions.

We start with a Pandas DataFrame that has three columns: A, B, and C.

We then use np.where() to create three new columns, D, E, and F, each based on a different condition.

For column D, we use np.where() to check if the value in column A is greater than 2 AND the value in column B is not equal to 'c'. If the condition is true, we multiply the value in column C by 2 and assign the result to column D. If the condition is false, we simply copy the value from column C to column D.
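The condition for column D can be looked at on its own before it is passed to np.where(). The following is a minimal sketch using the same data as the example; the names mask and d_values are illustrative and do not appear in the original code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': ['a', 'b', 'c', 'd', 'e'],
                   'C': [10, 20, 30, 40, 50]})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds more tightly than > and !=.
mask = (df['A'] > 2) & (df['B'] != 'c')  # 'mask' is an illustrative name

# np.where takes values from the second argument where mask is True,
# and from the third argument where it is False.
d_values = np.where(mask, df['C'] * 2, df['C'])

print(mask.tolist())
print(d_values.tolist())
```

Only the last two rows satisfy both parts of the condition, so only their C values are doubled. The row with A equal to 3 is excluded because its B value is 'c'.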

For column E, we use np.where() to check if the value in column A is less than or equal to 2 OR the value in column B is equal to 'd'. If the condition is true, we add 5 to the value in column C and assign the result to column E. If the condition is false, we subtract 5 from the value in column C and assign the result to column E.
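Because the two branches for column E differ only in the amount added to C, the same result can also be written by selecting the offset first. This is an equivalent sketch rather than the form used in the example; the names cond, offset, and e_values are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': ['a', 'b', 'c', 'd', 'e'],
                   'C': [10, 20, 30, 40, 50]})

# | is the element-wise OR; each comparison again needs its own parentheses.
cond = (df['A'] <= 2) | (df['B'] == 'd')

# Choose +5 or -5 per row, then add it to C in a single vectorized step.
offset = np.where(cond, 5, -5)
e_values = df['C'] + offset

print(e_values.tolist())
```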

For column F, we use nested np.where() calls to assign a label ('high', 'medium', or 'low') based on the value in column D. If the value in column D is greater than or equal to 70, we assign the label 'high'. If the value is at least 40 but less than 70, we assign the label 'medium'. Otherwise, we assign the label 'low'.
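When there are more than two outcomes, nested np.where() calls can become hard to read. One alternative, not used in the example itself, is np.select(), which takes a list of conditions and a matching list of choices and uses the first condition that is True for each element. The sketch below compares the two approaches on a hypothetical Series d that includes a value in the 'medium' range.

```python
import numpy as np
import pandas as pd

# Hypothetical values chosen so that all three labels occur.
d = pd.Series([10, 20, 30, 45, 80, 100])

# Conditions are checked in order; the first True one decides the label.
conditions = [d >= 70, d >= 40]
labels = ['high', 'medium']
f_select = np.select(conditions, labels, default='low')

# The nested np.where() form described above, for comparison.
f_nested = np.where(d >= 70, 'high', np.where(d >= 40, 'medium', 'low'))

print(f_select.tolist())
print((f_select == f_nested).all())
```

The order of the conditions matters in both forms: testing d >= 40 first would label every value of 70 or more as 'medium'.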

In [1]:
import pandas as pd
import numpy as np

# create a hardcoded Pandas DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 
                   'B': ['a', 'b', 'c', 'd', 'e'],
                   'C': [10, 20, 30, 40, 50]})

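# D: double C where A > 2 and B is not 'c'; otherwise keep C unchanged
df['D'] = np.where((df['A'] > 2) & (df['B'] != 'c'), df['C'] * 2, df['C'])

# E: add 5 to C where A <= 2 or B is 'd'; otherwise subtract 5
df['E'] = np.where((df['A'] <= 2) | (df['B'] == 'd'), df['C'] + 5, df['C'] - 5)

# F: label D as 'high' (>= 70), 'medium' (>= 40 and < 70), or 'low' (below 40)
df['F'] = np.where(df['D'] >= 70, 'high', np.where(df['D'] >= 40, 'medium', 'low'))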

print(df)


   A  B   C    D   E     F
0  1  a  10   10  15   low
1  2  b  20   20  25   low
2  3  c  30   30  25   low
3  4  d  40   80  45  high
4  5  e  50  100  45  high
