The Pandas DataFrameGroupBy transform() function returns a DataFrame with the same index as the original DataFrame,
filled with the transformed values.

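The values are computed by the function passed to DataFrameGroupBy transform(), which is applied to each group separately.
The result is a DataFrame with one row for every row of the original; the grouping column itself is not included.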


To call transform(), first group the DataFrame with Pandas groupby().

In [72]:
import pandas as pd
technologies = {
    'Courses': ["Spark", "PySpark", "PySpark", "Pandas"],
    'Fee': [20000, 22000, 22000, 30000],
    'hours': [30, 35, 30, 35],
}
# Create the DataFrame
df = pd.DataFrame(technologies)

In [73]:
df

   Courses    Fee  hours
0    Spark  20000     30
1  PySpark  22000     35
2  PySpark  22000     30
3   Pandas  30000     35


In [74]:
# Transform Groupby Object
df.groupby('Courses').transform(lambda x: x.sum())

     Fee  hours
0  20000     30
1  44000     65
2  44000     65
3  30000     35


In [75]:
df.groupby(['Courses']).transform('sum')

     Fee  hours
0  20000     30
1  44000     65
2  44000     65
3  30000     35
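
The lambda form and the string alias 'sum' give the same values here. The following is a minimal sketch that checks this on the example frame; the names by_lambda and by_name are only illustrative. The string alias is usually the better choice because it lets Pandas use its built-in implementation instead of calling a Python function per group.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "PySpark", "Pandas"],
    'Fee': [20000, 22000, 22000, 30000],
    'hours': [30, 35, 30, 35],
})

# Same transformation written two ways
by_lambda = df.groupby('Courses').transform(lambda x: x.sum())
by_name = df.groupby('Courses').transform('sum')

# Compare the two results cell by cell
same = bool((by_lambda == by_name).all().all())
print(same)
```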


df.groupby('Courses').aggregate('sum')

-------

## Difference between GroupBy aggregate() and GroupBy transform()

transform() broadcasts each group's result back to every row of that group, so it returns a DataFrame with the same number of rows and the same index as the original, holding the transformed values.

aggregate(), by contrast, returns one aggregated value per group for each column, so its result has one row per group.
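
The contrast can be checked by comparing the shape and index of the two results. This is a minimal sketch that rebuilds the first example frame; the variable names transformed and aggregated are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "PySpark", "Pandas"],
    'Fee': [20000, 22000, 22000, 30000],
    'hours': [30, 35, 30, 35],
})

# transform: one output row per input row, original index preserved
transformed = df.groupby('Courses').transform('sum')

# aggregate: one output row per group, indexed by the group key
aggregated = df.groupby('Courses').aggregate('sum')

print(transformed.shape)                    # rows match the original frame
print(aggregated.shape)                     # rows match the number of groups
print(transformed.index.equals(df.index))
print(list(aggregated.index))
```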

----

In [76]:
import pandas as pd
tech = {
    'Courses': ["BigData", "CyberSecurity", "IoT", "Robotics", "Analytics"],
    'Category': ["Software", "Software", "Hardware", "Hardware", "Software"],
    'Fee': [20000, 22000, 21000, 30000, 22000],
    'hours': [30, 35, 30, 35, 30],
}
# Create the DataFrame
df = pd.DataFrame(tech)

In [77]:
df

         Courses  Category    Fee  hours
0        BigData  Software  20000     30
1  CyberSecurity  Software  22000     35
2            IoT  Hardware  21000     30
3       Robotics  Hardware  30000     35
4      Analytics  Software  22000     30


In [78]:
df.groupby(['Courses','Category']).transform('sum')

     Fee  hours
0  20000     30
1  22000     35
2  21000     30
3  30000     35
4  22000     30


In [79]:
df.groupby(['Courses','Category']).aggregate('sum')

                          Fee  hours
Courses       Category
Analytics     Software  22000     30
BigData       Software  20000     30
CyberSecurity Software  22000     35
IoT           Hardware  21000     30
Robotics      Hardware  30000     35
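
In this example every (Courses, Category) pair occurs exactly once, so each group holds a single row. That is why transform('sum') leaves the values unchanged and aggregate('sum') only reorders the rows by group key. A small sketch to confirm this, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["BigData", "CyberSecurity", "IoT", "Robotics", "Analytics"],
    'Category': ["Software", "Software", "Hardware", "Hardware", "Software"],
    'Fee': [20000, 22000, 21000, 30000, 22000],
    'hours': [30, 35, 30, 35, 30],
})

grouped = df.groupby(['Courses', 'Category'])

# Number of rows in each group; every group has exactly one row here
sizes = grouped.size()
print(sizes.max())

# With single-row groups, the group sum equals the original value
summed = grouped.transform('sum')
unchanged = bool((summed == df[['Fee', 'hours']]).all().all())
print(unchanged)
```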


## Groupby aggregate() vs Groupby transform()




In [80]:
df1 = df.copy()
df1['HoursPlus'] = df1.groupby('Category')['hours'].transform(lambda x: x + 45)

In [81]:
df1

         Courses  Category    Fee  hours  HoursPlus
0        BigData  Software  20000     30         75
1  CyberSecurity  Software  22000     35         80
2            IoT  Hardware  21000     30         75
3       Robotics  Hardware  30000     35         80
4      Analytics  Software  22000     30         75


In [90]:
df.groupby('Category')['hours'].aggregate('sum')

Category
Hardware    65
Software    95
Name: hours, dtype: int64
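
One way to see how the two functions relate: mapping the aggregated totals back onto the Category column gives the same values as transform('sum'). This is only an illustrative sketch; the names totals, broadcast and via_transform are not from the original notebook.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["BigData", "CyberSecurity", "IoT", "Robotics", "Analytics"],
    'Category': ["Software", "Software", "Hardware", "Hardware", "Software"],
    'Fee': [20000, 22000, 21000, 30000, 22000],
    'hours': [30, 35, 30, 35, 30],
})

# One total per category (a Series indexed by Category)
totals = df.groupby('Category')['hours'].aggregate('sum')

# Look up each row's category total by hand
broadcast = df['Category'].map(totals)

# transform does the aggregation and the broadcast in one step
via_transform = df.groupby('Category')['hours'].transform('sum')

print(list(via_transform))
print(bool((broadcast == via_transform).all()))
```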

------

The following will not work - it throws ValueError: Must produce aggregated value
------

df.groupby('Category').hours.aggregate(lambda x: x + 45)
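
The call fails because aggregate() expects the function to reduce each group to a single value, while x + 45 returns a Series as long as the group. The sketch below catches the error and then shows a reducing function that aggregate() accepts. The exact error message may differ between Pandas versions, and the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ["Software", "Software", "Hardware", "Hardware", "Software"],
    'hours': [30, 35, 30, 35, 30],
})

error = None
try:
    # x + 45 returns a whole Series per group, not a single value
    df.groupby('Category')['hours'].aggregate(lambda x: x + 45)
except ValueError as exc:
    error = exc

print(type(error).__name__, error)

# A function that returns one scalar per group is accepted by aggregate()
reduced = df.groupby('Category')['hours'].aggregate(lambda x: x.max() + 45)
print(reduced)
```

For an element-wise change such as x + 45, transform() is the appropriate call, as in the HoursPlus example above.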

--------

### Applying an aggregate function to all available columns vs. a specific column

In [47]:
df.groupby(['Category']).aggregate('max')

                Courses    Fee  hours
Category
Hardware       Robotics  30000     35
Software  CyberSecurity  22000     35


In [48]:
df.groupby(['Category']).aggregate({'Fee':'max'})

            Fee
Category
Hardware  30000
Software  22000
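
When different columns need different functions, the dictionary form extends naturally. Pandas (0.25 and later) also supports named aggregation, which sets the output column names at the same time. The sketch below assumes the same example frame; the output names max_fee and total_hours are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["BigData", "CyberSecurity", "IoT", "Robotics", "Analytics"],
    'Category': ["Software", "Software", "Hardware", "Hardware", "Software"],
    'Fee': [20000, 22000, 21000, 30000, 22000],
    'hours': [30, 35, 30, 35, 30],
})

# Dictionary form: a different function for each selected column
per_column = df.groupby('Category').aggregate({'Fee': 'max', 'hours': 'sum'})
print(per_column)

# Named aggregation: output_name=(input_column, function)
summary = df.groupby('Category').agg(
    max_fee=('Fee', 'max'),
    total_hours=('hours', 'sum'),
)
print(summary)
```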


-----

## Find the maximum fee for each category

In [49]:
df['MaxFeeForCategory'] = df.groupby(['Category'])['Fee'].transform('max')

In [50]:
df

         Courses  Category    Fee  hours  MaxFeeForCategory
0        BigData  Software  20000     30              22000
1  CyberSecurity  Software  22000     35              22000
2            IoT  Hardware  21000     30              30000
3       Robotics  Hardware  30000     35              30000
4      Analytics  Software  22000     30              22000
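
Because the transformed column lines up row by row with the original frame, it can be used directly as a filter. The following sketch, which is an extension and not part of the original notebook, keeps the courses whose fee equals the maximum for their category; the names max_fee and top are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["BigData", "CyberSecurity", "IoT", "Robotics", "Analytics"],
    'Category': ["Software", "Software", "Hardware", "Hardware", "Software"],
    'Fee': [20000, 22000, 21000, 30000, 22000],
    'hours': [30, 35, 30, 35, 30],
})

# Per-row maximum fee of the row's category (same index as df)
max_fee = df.groupby('Category')['Fee'].transform('max')

# Keep only the rows that reach their category maximum; ties are all kept
top = df[df['Fee'] == max_fee]
print(top)
```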
