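This example demonstrates how to use np.select() to apply multi-condition data wrangling to a Pandas DataFrame.

np.select() evaluates a list of boolean conditions and, for each row, picks the value that corresponds to the first condition that is true, so multi-branch transformations can be written without row-by-row loops.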



Start with a Pandas DataFrame that has three columns: A, B, and C.


Create two new columns, D and E: D with np.select(), and E with pd.cut() applied to D.

For column D, define two conditions using boolean expressions that involve columns A and B. 

If the first condition is true, multiply the value in column C by 2. 

If the second condition is true and the first is not, add 5 to the value in column C.

If neither condition is true, simply copy the value from column C. 

Then pass the conditions and corresponding values to np.select(), along with a default value of df['C'], to create the new column D.
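
The conditions given to np.select() are allowed to overlap. When more than one is true for an element, the value for the first true condition is used. The sketch below illustrates this with hypothetical toy data; the array x and its thresholds are assumptions made for illustration and are not part of this example's DataFrame.

```python
import numpy as np

# Hypothetical toy data, not the DataFrame used in this example
x = np.array([1, 5, 10])

conditions = [
    x > 3,   # true for 5 and 10
    x > 7,   # true for 10 only, but 10 already matched the condition above
]
choices = [
    x * 2,    # used where the first condition is true
    x + 100,  # used only where the second is true and the first is not
]

# Elements that match no condition fall back to the default
result = np.select(conditions, choices, default=x)
print(result.tolist())  # [1, 10, 20]
```

The value 10 satisfies both conditions but is doubled rather than increased by 100, because the first condition takes priority.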

For column E, define four bin edges (which delimit three bins) and three corresponding labels.

Then pass the values in column D, along with the bins and labels, to pd.cut() to create a new column E that categorizes each value in D. With right=False, each bin includes its left edge and excludes its right edge.
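
The right argument of pd.cut() only matters for values that fall exactly on a bin edge. The sketch below compares right=False with the default right=True, using the same bins and labels as this example. The input values 39, 40, and 70 are hypothetical, chosen to sit on or next to the edges.

```python
import numpy as np
import pandas as pd

# Hypothetical values chosen to sit on or next to the bin edges
s = pd.Series([39, 40, 70])
bins = [-np.inf, 40, 70, np.inf]
labels = ['low', 'medium', 'high']

# right=False: the intervals are [-inf, 40), [40, 70), [70, inf)
left_closed = pd.cut(s, bins=bins, labels=labels, right=False)

# right=True (the default): the intervals are (-inf, 40], (40, 70], (70, inf]
right_closed = pd.cut(s, bins=bins, labels=labels)

print(left_closed.tolist())   # ['low', 'medium', 'high']
print(right_closed.tolist())  # ['low', 'low', 'medium']
```

None of the D values in this example fall exactly on 40 or 70, so either setting would give the same column E here.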



In [5]:
import pandas as pd
import numpy as np

# create a hardcoded Pandas DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 
                   'B': ['a', 'b', 'c', 'd', 'e'],
                   'C': [10, 20, 30, 40, 50]})

# build column D with np.select(); where both conditions hold, the first one wins
conditions = [
    (df['A'] > 2) & (df['B'] != 'c'),
    (df['A'] <= 2) | (df['B'] == 'd')
]
values = [
    df['C'] * 2,
    df['C'] + 5
]
df['D'] = np.select(conditions, values, default=df['C'])




In [3]:
# create the bins for the categorical variable
bins = [-np.inf, 40, 70, np.inf]

# create the labels for the categorical variable
labels = ['low', 'medium', 'high']

# use pd.cut() to create the categorical variable
df['E'] = pd.cut(df['D'], bins=bins, labels=labels, right=False)

In [4]:
df

   A  B   C    D     E
0  1  a  10   15   low
1  2  b  20   25   low
2  3  c  30   30   low
3  4  d  40   80  high
4  5  e  50  100  high
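
No row in the output is labeled 'medium', but the label is still part of column E. pd.cut() returns an ordered categorical that keeps every label as a category. The sketch below rebuilds E from the D values shown above, so that it runs on its own, and inspects the result. The names d and e are used only for illustration.

```python
import numpy as np
import pandas as pd

# Column D as it appears in the output above
d = pd.Series([15, 25, 30, 80, 100])
e = pd.cut(d, bins=[-np.inf, 40, 70, np.inf],
           labels=['low', 'medium', 'high'], right=False)

# All three labels remain categories, even though 'medium' is never assigned
print(list(e.cat.categories))  # ['low', 'medium', 'high']
print(e.cat.ordered)           # True

# Ordered categoricals can be compared against one of their labels
print((e > 'low').tolist())    # [False, False, False, True, True]
```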
