# Group and Pivot Table

In this section we'll take a look at a larger data set to continue exploring the *group by* operation. We'll also introduce the *pivot table* operation.

We consider the Data Science Jobs and Salaries dataset, which contains information on:

* Data Science job titles
* experience level: Entry-level (EN), Mid-level (MI), Senior-level (SE), Executive-level (EX)
* remote work ratio (the `remote_ratio` column, with values 0, 50, or 100)
* 2020 salary or 2021 expected salary (2021e)
* employment type: PT (part time), FT (full time), CT (contract), FL (Freelance)

(The data were taken from Saurabh Shahane at Kaggle, who gathered the information from ai-jobs.net; see that source for a more complete view of the data.)

Below we begin by constructing a string of these data in comma-separated value (CSV) format, assigning it the name `data_science_jobs_salaries_csv`. (Data of this type are most often stored in a `.csv` file and read directly from that file; for this exercise, we first treat the data as a plain string, then wrap the string in a file-like interface assigned to `data_science_jobs_salaries_file`.) The data are lengthy, so the cell is hidden by default, but it's worth toggling the cell to see what CSV data look like:

In [22]:
data_science_jobs_salaries_csv = '''\
work_year,experience_level,employment_type,job_title,salary,salary_currency,salary_in_usd,employee_residence,remote_ratio,company_location,company_size
2021e,EN,FT,Data Science Consultant,54000,EUR,64369,DE,50,DE,L
2020,SE,FT,Data Scientist,60000,EUR,68428,GR,100,US,L
2021e,EX,FT,Head of Data Science,85000,USD,85000,RU,0,RU,M
2021e,EX,FT,Head of Data,230000,USD,230000,RU,50,RU,L
2021e,EN,FT,Machine Learning Engineer,125000,USD,125000,US,100,US,S
2021e,SE,FT,Data Analytics Manager,120000,USD,120000,US,100,US,M
2020,MI,FT,Research Scientist,450000,USD,450000,US,0,US,M
2020,MI,FT,Data Analyst,41000,EUR,46759,FR,50,FR,L
2020,MI,FT,Data Engineer,65000,EUR,74130,AT,50,AT,L
2021e,SE,FT,Data Science Engineer,159500,CAD,127543,CA,50,CA,L
2021e,SE,FT,Manager Data Science,144000,USD,144000,US,100,US,L
2021e,EN,FT,Data Scientist,13400,USD,13400,UA,100,UA,L
2021e,MI,FT,Data Scientist,95000,CAD,75966,CA,100,CA,L
2021e,MI,FT,Data Scientist,150000,USD,150000,US,100,US,M
2020,MI,FT,Data Science Consultant,103000,USD,103000,US,100,US,L
2021e,SE,FT,Data Engineering Manager,153000,USD,153000,US,100,US,L
2021e,MI,FT,Data Engineer,90000,USD,90000,US,100,US,L
2021e,EN,FT,Data Analyst,90000,USD,90000,US,100,US,S
2021e,EN,FT,Data Analyst,60000,USD,60000,US,100,US,S
2021e,MI,FT,Data Scientist,50000,USD,50000,NG,100,NG,L
2021e,EN,PT,AI Scientist,12000,USD,12000,PK,100,US,M
2021e,MI,PT,3D Computer Vision Researcher,400000,INR,5423,IN,50,IN,M
2021e,MI,CT,ML Engineer,270000,USD,270000,US,100,US,L
2021e,MI,FT,Applied Data Scientist,68000,CAD,54376,GB,50,CA,L
2021e,MI,FT,Machine Learning Engineer,40000,EUR,47681,ES,100,ES,S
2021e,EX,FT,Director of Data Science,130000,EUR,154963,IT,100,PL,L
2021e,MI,FT,Data Engineer,110000,PLN,28801,PL,100,PL,L
2021e,MI,FT,Data Analytics Engineer,110000,USD,110000,US,100,US,L
2021e,EN,FT,Research Scientist,60000,GBP,83000,GB,50,GB,L
2020,EN,FT,Machine Learning Engineer,250000,USD,250000,US,50,US,L
2021e,EN,FT,Data Analyst,50000,EUR,59601,FR,50,FR,M
2021e,SE,FT,Data Analyst,80000,USD,80000,BG,100,US,S
2020,EN,FT,Data Analyst,10000,USD,10000,NG,100,NG,S
2020,EN,FT,Machine Learning Engineer,138000,USD,138000,US,100,US,S
2021e,MI,FT,Data Engineer,140000,USD,140000,US,100,US,L
2021e,SE,FT,Data Analytics Engineer,67000,EUR,79866,DE,100,DE,L
2021e,SE,FT,Lead Data Analyst,170000,USD,170000,US,100,US,L
2021e,EN,FT,Data Analyst,80000,USD,80000,US,100,US,M
2020,MI,FT,Data Scientist,45760,USD,45760,PH,100,US,S
2021e,MI,FT,BI Data Analyst,100000,USD,100000,US,100,US,M
2021e,SE,FT,Data Scientist,45000,EUR,53641,FR,50,FR,L
2021e,EX,FT,Head of Data,235000,USD,235000,US,100,US,L
2021e,EX,FT,BI Data Analyst,150000,USD,150000,IN,100,US,L
2020,EX,FT,Data Engineering Manager,70000,EUR,79833,ES,50,ES,L
2021e,EN,FT,Machine Learning Scientist,225000,USD,225000,US,100,US,L
2021e,EN,FT,Data Science Consultant,65000,EUR,77481,DE,100,DE,S
2020,MI,FT,Machine Learning Infrastructure Engineer,44000,EUR,50180,PT,0,PT,M
2021e,SE,FT,Marketing Data Analyst,75000,EUR,89402,GR,100,DK,L
2021e,SE,FT,Lead Data Engineer,75000,GBP,103750,GB,100,GB,S
2021e,SE,FT,Director of Data Engineering,82500,GBP,114125,GB,100,GB,M
2021e,SE,FT,Machine Learning Engineer,80000,EUR,95362,DE,50,DE,L
2021e,EN,FT,Data Engineer,2250000,INR,30509,IN,100,IN,L
2021e,SE,FT,Data Engineer,150000,USD,150000,US,100,US,M
2021e,SE,FT,Data Engineer,115000,USD,115000,US,100,US,S
2021e,MI,FT,Research Scientist,235000,CAD,187917,CA,100,CA,L
2021e,MI,FT,Data Analyst,37456,GBP,51814,GB,50,GB,L
2020,MI,FT,Data Engineer,106000,USD,106000,US,100,US,L
2020,MI,FT,Data Engineer,88000,GBP,112872,GB,50,GB,L
2021e,MI,FT,BI Data Analyst,11000000,HUF,36732,HU,50,US,L
2021e,SE,FT,Data Engineer,150000,USD,150000,US,100,US,L
2020,EN,PT,ML Engineer,14000,EUR,15966,DE,100,DE,S
2021e,MI,FT,Computer Vision Software Engineer,81000,EUR,96554,DE,100,US,S
2021e,EN,FT,Computer Vision Software Engineer,70000,USD,70000,US,100,US,M
2021e,MI,FT,Financial Data Analyst,450000,USD,450000,US,100,US,L
2020,MI,FT,Data Scientist,60000,GBP,76958,GB,100,GB,S
2021e,MI,FT,Cloud Data Engineer,120000,SGD,89514,SG,50,SG,L
2021e,EN,FT,Data Scientist,2200000,INR,29831,IN,50,IN,L
2021e,SE,FT,Lead Data Engineer,276000,USD,276000,US,0,US,L
2020,SE,FT,Data Engineer,188000,USD,188000,US,100,US,L
2021e,SE,FT,Cloud Data Engineer,160000,USD,160000,BR,100,US,S
2020,MI,FT,Data Scientist,105000,USD,105000,US,100,US,L
2021e,MI,FT,Data Engineer,200000,USD,200000,US,100,US,L
2021e,SE,FT,Data Engineering Manager,174000,USD,174000,US,100,US,L
2021e,MI,FT,Data Analyst,93000,USD,93000,US,100,US,L
2021e,EN,FT,Data Scientist,2100000,INR,28475,IN,100,IN,M
2021e,SE,FT,Research Scientist,51400,EUR,61270,PT,50,PT,L
2021e,EN,FT,Data Scientist,90000,USD,90000,US,100,US,S
2020,MI,FT,Data Engineer,61500,EUR,70139,FR,50,FR,L
2020,EN,FT,Data Analyst,450000,INR,6072,IN,0,IN,S
2020,SE,FT,Data Engineer,720000,MXN,33511,MX,0,MX,S
2021e,SE,FT,Principal Data Analyst,170000,USD,170000,US,100,US,M
2021e,SE,FT,Data Engineer,70000,GBP,96833,GB,50,GB,L
2021e,MI,FT,Data Engineer,108000,TRY,13105,TR,0,TR,M
2021e,EN,FT,Data Scientist,31000,EUR,36952,FR,50,FR,L
2021e,MI,FT,Data Engineer,52500,GBP,72625,GB,50,GB,L
2020,EN,FT,Data Analyst,91000,USD,91000,US,100,US,L
2021e,SE,FT,Big Data Architect,125000,CAD,99956,CA,50,CA,M
2021e,SE,FT,Data Scientist,165000,USD,165000,US,100,US,L
2021e,MI,FT,Data Analyst,80000,USD,80000,US,100,US,L
2021e,SE,FT,Data Scientist,130000,CAD,103954,CA,100,CA,L
2021e,EN,FT,Data Engineer,1600000,INR,21695,IN,50,IN,M
2020,EN,FT,Research Scientist,42000,USD,42000,NL,50,NL,L
2020,MI,FT,Lead Data Scientist,115000,USD,115000,AE,0,AE,L
2021e,MI,FT,Research Scientist,80000,CAD,63971,CA,100,CA,M
2020,SE,FT,Machine Learning Scientist,260000,USD,260000,JP,0,JP,S
2021e,MI,FT,Head of Data Science,110000,USD,110000,US,0,US,S
2021e,MI,FT,Data Architect,180000,USD,180000,US,100,US,L
2021e,SE,FT,Data Analyst,200000,USD,200000,US,100,US,L
2020,SE,FT,Big Data Engineer,85000,GBP,109024,GB,50,GB,M
2021e,SE,FT,Director of Data Engineering,200000,USD,200000,US,100,US,L
2021e,SE,FT,ML Engineer,256000,USD,256000,US,100,US,S
2021e,MI,FT,Data Engineer,110000,USD,110000,US,100,US,L
2020,MI,FT,Data Scientist,70000,EUR,79833,DE,0,DE,L
2021e,EN,FT,Data Engineer,72500,USD,72500,US,100,US,L
2021e,SE,FT,Machine Learning Engineer,185000,USD,185000,US,50,US,L
2021e,MI,PT,Data Engineer,59000,EUR,70329,NL,100,NL,L
2021e,EN,FT,Research Scientist,100000,USD,100000,JE,0,CN,L
2021e,MI,FT,Data Engineer,112000,USD,112000,US,100,US,L
2020,SE,FT,Machine Learning Engineer,150000,USD,150000,US,50,US,L
2021e,SE,FT,Data Scientist,180000,TRY,21843,TR,50,TR,L
2021e,SE,FT,AI Scientist,55000,USD,55000,ES,100,ES,L
2021e,EN,FT,Data Scientist,58000,USD,58000,US,50,US,L
2021e,EN,FT,Data Scientist,100000,USD,100000,US,100,US,M
2021e,SE,FT,Data Scientist,65720,EUR,78340,FR,50,FR,M
2021e,EN,FT,Machine Learning Engineer,85000,USD,85000,NL,100,DE,S
2021e,EN,FT,Data Science Consultant,65000,EUR,77481,DE,0,DE,L
2021e,SE,CT,Staff Data Scientist,105000,USD,105000,US,100,US,M
2020,EN,FT,Data Analyst,72000,USD,72000,US,100,US,L
2021e,EN,FT,Data Engineer,55000,EUR,65561,DE,50,DE,M
2021e,MI,FT,Data Engineer,250000,TRY,30337,TR,100,TR,M
2021e,MI,FT,Data Engineer,111775,USD,111775,US,0,US,M
2021e,MI,FT,Data Engineer,93150,USD,93150,US,0,US,M
2021e,SE,FT,Lead Data Engineer,160000,USD,160000,PR,50,US,S
2021e,MI,FT,Data Scientist,21600,EUR,25747,RS,100,DE,S
2021e,SE,FT,Machine Learning Engineer,4900000,INR,66442,IN,0,IN,L
2021e,MI,FT,Data Scientist,1250000,INR,16949,IN,100,IN,S
2021e,SE,FT,Data Analyst,54000,EUR,64369,DE,50,DE,L
2020,SE,FT,Lead Data Scientist,190000,USD,190000,US,100,US,S
2021e,EX,FT,Director of Data Science,120000,EUR,143043,DE,0,DE,L
2021e,EN,FT,Big Data Engineer,1200000,INR,16271,IN,100,IN,L
2021e,SE,FT,Data Analyst,90000,CAD,71968,CA,100,CA,M
2020,MI,FT,Data Scientist,11000000,HUF,35735,HU,50,HU,L
2021e,SE,FT,Data Scientist,135000,USD,135000,US,0,US,L
2021e,EN,FT,Machine Learning Engineer,21000,EUR,25032,DE,50,DE,M
2021e,SE,FT,Data Science Manager,4000000,INR,54238,IN,50,US,L
2021e,SE,FT,Machine Learning Engineer,1799997,INR,24407,IN,100,IN,L
2021e,EN,FT,BI Data Analyst,9272,USD,9272,KE,100,KE,S
2021e,MI,FT,Data Scientist,147000,USD,147000,US,50,US,L
2021e,SE,FT,Research Scientist,120500,CAD,96357,CA,50,CA,L
2021e,SE,FT,Data Science Manager,174000,USD,174000,US,100,US,L
2020,MI,FT,Business Data Analyst,135000,USD,135000,US,100,US,L
2021e,EN,FT,Machine Learning Engineer,21844,USD,21844,CO,50,CO,M
2020,SE,FT,Lead Data Engineer,125000,USD,125000,NZ,50,NZ,S
2020,EN,FT,Data Scientist,45000,EUR,51321,FR,0,FR,S
2020,MI,FT,Data Scientist,3000000,INR,40481,IN,0,IN,L
2021e,EX,FT,Data Science Consultant,59000,EUR,70329,FR,100,ES,S
2021e,SE,FT,Data Analytics Engineer,50000,USD,50000,VN,100,GB,M
2021e,MI,FT,Data Engineer,4000,USD,4000,IR,100,IR,M
2020,EN,FT,Data Scientist,35000,EUR,39916,FR,0,FR,M
2020,MI,FT,Lead Data Analyst,87000,USD,87000,US,100,US,L
2021e,MI,FT,Data Engineer,22000,EUR,26224,RO,0,US,L
2021e,MI,FT,Data Scientist,76760,EUR,91500,DE,50,DE,L
2021e,MI,FT,Big Data Engineer,1672000,INR,22671,IN,0,IN,L
2021e,MI,FT,Data Scientist,420000,INR,5695,IN,100,US,S
2021e,EN,FT,Machine Learning Engineer,81000,USD,81000,US,50,US,S
2021e,MI,FT,Data Scientist,30400000,CLP,40798,CL,100,CL,L
2021e,MI,FT,Data Scientist,58000,MXN,2876,MX,0,MX,S
2021e,EN,FT,Data Science Consultant,90000,USD,90000,US,100,US,S
2021e,MI,FT,Data Scientist,52000,EUR,61985,DE,50,AT,M
2021e,SE,FT,Machine Learning Infrastructure Engineer,195000,USD,195000,US,100,US,M
2021e,MI,FT,Data Scientist,32000,EUR,38144,ES,100,ES,L
2020,MI,FT,Data Analyst,85000,USD,85000,US,100,US,L
2021e,EX,CT,Principal Data Scientist,416000,USD,416000,US,100,US,S
2021e,SE,FT,Machine Learning Scientist,225000,USD,225000,US,100,CA,L
2021e,MI,FT,Data Scientist,40900,GBP,56578,GB,50,GB,L
2021e,MI,FT,Data Scientist,2500000,INR,33899,IN,0,IN,M
2021e,MI,FT,Data Scientist,85000,GBP,117583,GB,50,GB,L
2021e,MI,FT,Machine Learning Engineer,180000,PLN,47129,PL,100,PL,L
2020,MI,FT,Data Analyst,8000,USD,8000,PK,50,PK,L
2020,EN,FT,Data Engineer,4450000,JPY,41689,JP,100,JP,S
2020,SE,FT,Big Data Engineer,100000,EUR,114047,PL,100,GB,S
2021e,MI,FT,Machine Learning Engineer,75000,EUR,89402,BE,100,BE,M
2020,EN,FT,Data Science Consultant,423000,INR,5707,IN,50,IN,M
2020,MI,FT,Lead Data Engineer,56000,USD,56000,PT,100,US,M
2021e,EN,PT,Computer Vision Engineer,180000,DKK,28850,DK,50,DK,S
2021e,MI,FT,Data Scientist,75000,EUR,89402,DE,50,DE,L
2020,MI,FT,Machine Learning Engineer,299000,CNY,43331,CN,0,CN,M
2020,MI,FT,Product Data Analyst,450000,INR,6072,IN,100,IN,L
2020,SE,FT,Data Engineer,42000,EUR,47899,GR,50,GR,L
2020,MI,FT,BI Data Analyst,98000,USD,98000,US,0,US,M
2021e,MI,FT,Data Engineer,48000,GBP,66400,HK,50,GB,S
2021e,MI,FT,Research Scientist,48000,EUR,57217,FR,50,FR,S
2021e,MI,FT,Machine Learning Engineer,21000,EUR,25032,SI,50,SI,L
2021e,SE,FT,Data Analytics Manager,120000,USD,120000,US,0,US,L
2021e,MI,FL,Data Engineer,20000,USD,20000,IT,0,US,L
2020,EX,FT,Director of Data Science,325000,USD,325000,US,100,US,L
2021e,SE,FT,Machine Learning Engineer,200000,USD,200000,US,100,US,L
2020,EN,FT,AI Scientist,300000,DKK,45896,DK,50,DK,S
2021e,MI,FT,Data Scientist,160000,USD,160000,US,100,US,L
2021e,SE,FT,Research Scientist,50000,USD,50000,FR,100,US,S
2021e,MI,FT,Data Science Engineer,34000,EUR,40529,GR,100,GR,M
2021e,EX,FT,Principal Data Engineer,600000,USD,600000,US,100,US,L
2021e,MI,FT,Data Scientist,69600,BRL,13000,BR,0,BR,S
2021e,SE,FT,Data Engineer,165000,USD,165000,US,0,US,M
2021e,EN,FT,Big Data Engineer,435000,INR,5898,IN,0,CH,L
2020,MI,FT,Data Scientist,37000,EUR,42197,FR,50,FR,S
2021e,SE,FT,Principal Data Engineer,185000,USD,185000,US,100,US,L
2020,EN,FT,Data Scientist,55000,EUR,62726,DE,50,DE,S
2021e,MI,FT,Data Scientist,76760,EUR,91500,DE,50,DE,L
2020,EN,PT,Data Scientist,19000,EUR,21669,IT,50,IT,S
2020,MI,FT,Data Engineer,110000,USD,110000,US,100,US,L
2021e,SE,FT,Data Analytics Manager,140000,USD,140000,US,100,US,L
2020,SE,FT,Data Scientist,120000,USD,120000,US,50,US,L
2021e,SE,FT,Data Scientist,110000,CAD,87961,CA,100,CA,S
2021e,SE,FT,Finance Data Analyst,45000,GBP,62250,GB,50,GB,L
2021e,MI,FL,Machine Learning Scientist,12000,USD,12000,PK,50,PK,M
2021e,SE,FT,Data Engineer,65000,EUR,77481,RO,50,GB,S
2021e,MI,FT,Machine Learning Engineer,74000,USD,74000,JP,50,JP,S
2021e,SE,FT,Data Science Manager,152000,USD,152000,US,100,FR,L
2021e,MI,FT,Big Data Engineer,18000,USD,18000,MD,0,MD,S
2020,SE,FL,Computer Vision Engineer,60000,USD,60000,RU,100,US,S
2021e,MI,FT,Data Scientist,130000,USD,130000,US,50,US,L
2021e,SE,FT,Computer Vision Engineer,102000,BRL,19052,BR,0,BR,M
2021e,EN,FT,Business Data Analyst,50000,EUR,59601,LU,100,LU,L
2021e,SE,FT,Principal Data Scientist,147000,EUR,175228,DE,100,DE,M
2020,SE,FT,Principal Data Scientist,130000,EUR,148261,DE,100,DE,M
2020,MI,FT,Data Scientist,34000,EUR,38776,ES,100,ES,M
2021e,MI,FT,Data Scientist,39600,EUR,47204,ES,100,ES,M
2021e,EN,FT,Data Scientist,4000,USD,4000,VN,0,VN,M
2021e,EN,FT,AI Scientist,1335000,INR,18102,IN,100,AS,S
2020,SE,FT,Data Scientist,80000,EUR,91237,AT,0,AT,S
2020,MI,FT,Data Scientist,55000,EUR,62726,FR,50,LU,S
2021e,MI,FT,Data Scientist,115000,USD,115000,US,50,US,L
2021e,SE,FT,Principal Data Scientist,235000,USD,235000,US,100,US,L
2021e,MI,FT,Lead Data Analyst,1450000,INR,19661,IN,100,IN,L
2021e,EN,PT,AI Scientist,12000,USD,12000,BR,100,US,S
2021e,MI,FT,Data Analyst,75000,USD,75000,US,0,US,L
2021e,MI,FT,Data Analyst,62000,USD,62000,US,0,US,L
2021e,MI,FT,Data Scientist,73000,USD,73000,US,0,US,L
2021e,MI,FT,Data Engineer,38400,EUR,45773,NL,100,NL,L
2020,SE,FT,Data Science Manager,190200,USD,190200,US,100,US,M
2020,MI,FT,Data Scientist,118000,USD,118000,US,100,US,M
2020,MI,FT,Data Scientist,138350,USD,138350,US,100,US,M
2020,MI,FT,Data Engineer,130800,USD,130800,ES,100,US,M
2020,SE,FT,Machine Learning Engineer,40000,EUR,45618,HR,100,HR,S
2021e,SE,FT,Director of Data Science,168000,USD,168000,JP,0,JP,S
2021e,MI,FT,Data Scientist,160000,SGD,119353,SG,100,IL,M
2021e,MI,FT,Applied Machine Learning Scientist,423000,USD,423000,US,50,US,L
2021e,MI,FT,Data Engineer,24000,EUR,28608,MT,50,MT,L
2021e,SE,FT,Data Specialist,165000,USD,165000,US,100,US,L
2020,SE,FT,Data Scientist,412000,USD,412000,US,100,US,L
2021e,MI,FT,Principal Data Scientist,151000,USD,151000,US,100,US,L
2020,EN,FT,Data Scientist,105000,USD,105000,US,100,US,S
2020,EN,CT,Business Data Analyst,100000,USD,100000,US,100,US,L
2021e,SE,FT,Data Science Manager,7000000,INR,94917,IN,50,IN,L
'''

from io import StringIO

data_science_jobs_salaries_file = StringIO(data_science_jobs_salaries_csv)

And, as ever, we begin the real work by importing the pandas library, and assigning it the local name, `pd`:

In [4]:
import pandas as pd

## More on Groups

The pandas `read_csv` function expects CSV data to come from a file (or a file-like object), so we ask pandas to parse the file-like version of our data:

In [23]:
salary_data = pd.read_csv(data_science_jobs_salaries_file)

salary_data

|  | work_year | experience_level | employment_type | job_title | salary | salary_currency | salary_in_usd | employee_residence | remote_ratio | company_location | company_size |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021e | EN | FT | Data Science Consultant | 54000 | EUR | 64369 | DE | 50 | DE | L |
| 1 | 2020 | SE | FT | Data Scientist | 60000 | EUR | 68428 | GR | 100 | US | L |
| 2 | 2021e | EX | FT | Head of Data Science | 85000 | USD | 85000 | RU | 0 | RU | M |
| 3 | 2021e | EX | FT | Head of Data | 230000 | USD | 230000 | RU | 50 | RU | L |
| 4 | 2021e | EN | FT | Machine Learning Engineer | 125000 | USD | 125000 | US | 100 | US | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 240 | 2020 | SE | FT | Data Scientist | 412000 | USD | 412000 | US | 100 | US | L |
| 241 | 2021e | MI | FT | Principal Data Scientist | 151000 | USD | 151000 | US | 100 | US | L |
| 242 | 2020 | EN | FT | Data Scientist | 105000 | USD | 105000 | US | 100 | US | S |
| 243 | 2020 | EN | CT | Business Data Analyst | 100000 | USD | 100000 | US | 100 | US | L |


This is only the first few and the last few rows; this DataFrame contains a lot of information! We can extract or sort the information we want by grouping the data in a convenient way. The `groupby` method in pandas will help us do exactly that! 

For example, we can group `salary_data` by the experience level of the employees. Recall that we do this by first splitting the data into `experience_level` groups and then extracting the group of employees with `experience_level` equal to `EN`, or entry-level experience. The resulting DataFrame contains only entries from the specified group: entry-level employees.

In [24]:
experience_level_groups = salary_data.groupby('experience_level')

experience_level_groups.get_group('EN')

|  | work_year | experience_level | employment_type | job_title | salary | salary_currency | salary_in_usd | employee_residence | remote_ratio | company_location | company_size |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021e | EN | FT | Data Science Consultant | 54000 | EUR | 64369 | DE | 50 | DE | L |
| 4 | 2021e | EN | FT | Machine Learning Engineer | 125000 | USD | 125000 | US | 100 | US | S |
| 11 | 2021e | EN | FT | Data Scientist | 13400 | USD | 13400 | UA | 100 | UA | L |
| 17 | 2021e | EN | FT | Data Analyst | 90000 | USD | 90000 | US | 100 | US | S |
| 18 | 2021e | EN | FT | Data Analyst | 60000 | USD | 60000 | US | 100 | US | S |
| 20 | 2021e | EN | PT | AI Scientist | 12000 | USD | 12000 | PK | 100 | US | M |
| 28 | 2021e | EN | FT | Research Scientist | 60000 | GBP | 83000 | GB | 50 | GB | L |
| 29 | 2020 | EN | FT | Machine Learning Engineer | 250000 | USD | 250000 | US | 50 | US | L |
| 30 | 2021e | EN | FT | Data Analyst | 50000 | EUR | 59601 | FR | 50 | FR | M |
| 32 | 2020 | EN | FT | Data Analyst | 10000 | USD | 10000 | NG | 100 | NG | S |

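To make the splitting step concrete, here is a minimal sketch on a small made-up DataFrame. The name `toy` and its columns `level` and `pay` are illustrative assumptions, not part of the salary data. It inspects the groups that `groupby` forms and shows that `get_group` selects the same rows as a boolean filter:

```python
import pandas as pd

# Made-up rows for illustration; not taken from the salary dataset.
toy = pd.DataFrame({
    'level': ['EN', 'SE', 'EN', 'MI', 'SE'],
    'pay':   [50.0, 120.0, 70.0, 90.0, 100.0],
})

groups = toy.groupby('level')

# How many groups there are, and which row labels fall into each one.
n_groups = groups.ngroups
rows_per_group = {key: list(labels) for key, labels in groups.groups.items()}

# get_group picks out the same rows as filtering with a boolean mask.
entry_level = groups.get_group('EN')
same_rows = toy[toy['level'] == 'EN']

print(n_groups)
print(rows_per_group)
print(entry_level)
print(same_rows)
```

The boolean filter is often the simpler choice when only one group is needed; the grouped object pays off once we want to compute something for every group at once.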

We can group by more than one category as well. Below we group first by `employment_type`, and within each `employment_type` group we group by `experience_level`. The `first` method then returns the first entry of each group (more precisely, the first non-missing value in each column of the group).

In [25]:
two_groups = salary_data.groupby(['employment_type','experience_level']) 

two_groups.first()

| employment_type | experience_level | work_year | job_title | salary | salary_currency | salary_in_usd | employee_residence | remote_ratio | company_location | company_size |
|---|---|---|---|---|---|---|---|---|---|---|
| CT | EN | 2020 | Business Data Analyst | 100000 | USD | 100000 | US | 100 | US | L |
| CT | EX | 2021e | Principal Data Scientist | 416000 | USD | 416000 | US | 100 | US | S |
| CT | MI | 2021e | ML Engineer | 270000 | USD | 270000 | US | 100 | US | L |
| CT | SE | 2021e | Staff Data Scientist | 105000 | USD | 105000 | US | 100 | US | M |
| FL | MI | 2021e | Data Engineer | 20000 | USD | 20000 | IT | 0 | US | L |
| FL | SE | 2020 | Computer Vision Engineer | 60000 | USD | 60000 | RU | 100 | US | S |
| FT | EN | 2021e | Data Science Consultant | 54000 | EUR | 64369 | DE | 50 | DE | L |
| FT | EX | 2021e | Head of Data Science | 85000 | USD | 85000 | RU | 0 | RU | M |
| FT | MI | 2020 | Research Scientist | 450000 | USD | 450000 | US | 0 | US | M |
| FT | SE | 2020 | Data Scientist | 60000 | EUR | 68428 | GR | 100 | US | L |

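The sketch below illustrates grouping on two keys with a small made-up DataFrame. The name `toy` and its columns are assumptions for illustration only. It shows that only key combinations present in the data produce a group, and that `first` skips missing values column by column:

```python
import pandas as pd

# Made-up rows for illustration; the first title is deliberately missing.
toy = pd.DataFrame({
    'kind':  ['FT', 'FT', 'PT', 'FT', 'PT'],
    'level': ['EN', 'EN', 'EN', 'SE', 'EN'],
    'title': [None, 'Analyst', 'Intern', 'Lead', 'Tutor'],
    'pay':   [50.0, 60.0, 10.0, 120.0, 12.0],
})

by_kind_and_level = toy.groupby(['kind', 'level'])

# One row per (kind, level) pair that actually occurs in the data.
# For (FT, EN) the title comes from the second row, because the first is missing,
# while the pay comes from the first row.
first_rows = by_kind_and_level.first()

# size() reports how many rows fall into each group.
group_sizes = by_kind_and_level.size()

print(first_rows)
print(group_sizes)
```

Because `first` works column by column, its output row for a group may combine values from different original rows when some values are missing.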

Having grouped the data into single or multiple categories, we'll now apply functions to these categories!

Perhaps we want to know the average salary grouped by experience level. We use the line below, which groups the data by `experience_level` and takes the `mean` across the `salary_in_usd` column within each group.

In [26]:
salary_data.groupby('experience_level').salary_in_usd.mean() 

experience_level
EN     59753.462963
EX    226288.000000
MI     85738.135922
SE    128841.298701
Name: salary_in_usd, dtype: float64

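To see what the one-line expression does behind the scenes, the following sketch spells out the split, apply, and combine steps by hand on a small made-up DataFrame (`toy`, `level`, and `pay` are illustrative names, not part of the salary data):

```python
import pandas as pd

# Made-up rows for illustration; not taken from the salary dataset.
toy = pd.DataFrame({
    'level': ['EN', 'SE', 'EN', 'MI', 'SE'],
    'pay':   [50.0, 120.0, 70.0, 90.0, 100.0],
})

# Split: iterating over a groupby yields (key, sub-DataFrame) pairs.
# Apply: take the mean of one column within each piece.
# Combine: collect the per-group results into a single Series.
manual = pd.Series(
    {key: piece['pay'].mean() for key, piece in toy.groupby('level')},
    name='pay',
)

# The one-line groupby expression performs the same three steps.
direct = toy.groupby('level')['pay'].mean()

print(manual)
print(direct)
```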
We can also count how many people work (or receive salaries) at companies of each size: small, medium, or large.

First we group different company sizes and count all employees who receive a salary.

(Recall that the double brackets on `'salary'` ensure our result is a DataFrame and not a Series. The output above, for average salary by experience level, was a simple Series. See if you can write this as a DataFrame!)

In [28]:
salary_data.groupby('company_size').count()[['salary']]

| company_size | salary |
|---|---|
| L | 132 |
| M | 55 |
| S | 58 |

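As a small illustration of two details in this step, the sketch below uses a made-up DataFrame (`toy`, `firm`, and `pay` are illustrative names) to contrast single and double brackets, and to show that `count` ignores missing values while `size` counts every row:

```python
import pandas as pd

# Made-up rows for illustration; one pay value is deliberately missing.
toy = pd.DataFrame({
    'firm': ['L', 'S', 'L', 'M', 'L'],
    'pay':  [50.0, None, 70.0, 90.0, 100.0],
})

by_firm = toy.groupby('firm')

as_series = by_firm['pay'].count()    # single brackets give a Series
as_frame = by_firm[['pay']].count()   # double brackets give a DataFrame

# count() skips missing values, whereas size() counts every row in the group.
all_rows = by_firm.size()

print(type(as_series).__name__, type(as_frame).__name__)
print(as_series)
print(all_rows)
```

When a column has no missing values, as appears to be the case for `salary` here, the two counts agree.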

We can also use the `groupby` method to split the data into multiple groups, apply a function, and then reform it into a DataFrame.

Similar to the above example, we first group by multiple columns, subdividing into `experience_level` and then further by `company_size`:

In [29]:
experience_and_compsize = salary_data.groupby(['experience_level', 'company_size'])

experience_and_compsize.first()

| experience_level | company_size | work_year | employment_type | job_title | salary | salary_currency | salary_in_usd | employee_residence | remote_ratio | company_location |
|---|---|---|---|---|---|---|---|---|---|---|
| EN | L | 2021e | FT | Data Science Consultant | 54000 | EUR | 64369 | DE | 50 | DE |
| EN | M | 2021e | PT | AI Scientist | 12000 | USD | 12000 | PK | 100 | US |
| EN | S | 2021e | FT | Machine Learning Engineer | 125000 | USD | 125000 | US | 100 | US |
| EX | L | 2021e | FT | Head of Data | 230000 | USD | 230000 | RU | 50 | RU |
| EX | M | 2021e | FT | Head of Data Science | 85000 | USD | 85000 | RU | 0 | RU |
| EX | S | 2021e | FT | Data Science Consultant | 59000 | EUR | 70329 | FR | 100 | ES |
| MI | L | 2020 | FT | Data Analyst | 41000 | EUR | 46759 | FR | 50 | FR |
| MI | M | 2020 | FT | Research Scientist | 450000 | USD | 450000 | US | 0 | US |
| MI | S | 2021e | FT | Machine Learning Engineer | 40000 | EUR | 47681 | ES | 100 | ES |
| SE | L | 2020 | FT | Data Scientist | 60000 | EUR | 68428 | GR | 100 | US |


If we want to compute the average salary within each experience level, broken down by company size, we can take our two-key grouping and aggregate the average of `salary_in_usd`:

In [30]:
experience_and_compsize['salary_in_usd'].mean()

experience_level  company_size
EN                L                75148.000000
                  M                41063.923077
                  S                57502.000000
EX                L               239729.875000
                  M                85000.000000
                  S               243164.500000
MI                L                96285.451613
                  M                83982.800000
                  S                47610.000000
SE                L               134465.604651
                  M               122572.125000
                  S               120978.055556
Name: salary_in_usd, dtype: float64

From this we can see that the average salary for an entry-level employee at a large company is $75,148.

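Rather than reading such a value off the printed output, we can look it up by label. The sketch below uses a small made-up DataFrame (`toy`, `level`, `firm`, and `pay` are illustrative names) to show how a Series with a two-level index can be indexed:

```python
import pandas as pd

# Made-up rows for illustration; not taken from the salary dataset.
toy = pd.DataFrame({
    'level': ['EN', 'EN', 'EN', 'SE', 'SE'],
    'firm':  ['L', 'L', 'S', 'L', 'S'],
    'pay':   [50.0, 70.0, 40.0, 120.0, 100.0],
})

means = toy.groupby(['level', 'firm'])['pay'].mean()

# A tuple of (outer, inner) labels picks out a single value.
entry_large = means.loc[('EN', 'L')]

# A single outer label returns every inner entry beneath it.
entry_all = means.loc['EN']

print(entry_large)
print(entry_all)
```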
The result above is a Series with a two-level index. To turn it into a DataFrame, we use the `unstack` method, which pivots the innermost index level, `company_size`, into columns.

In [32]:
experience_and_compsize['salary_in_usd'].mean().unstack()

| experience_level \ company_size | L | M | S |
|---|---|---|---|
| EN | 75148.0 | 41063.923077 | 57502.0 |
| EX | 239729.875 | 85000.0 | 243164.5 |
| MI | 96285.451613 | 83982.8 | 47610.0 |
| SE | 134465.604651 | 122572.125 | 120978.055556 |

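By default `unstack` moves the innermost index level, but a different level can be named explicitly. A minimal sketch on made-up data (`toy`, `level`, `firm`, and `pay` are illustrative names):

```python
import pandas as pd

# Made-up rows for illustration; not taken from the salary dataset.
toy = pd.DataFrame({
    'level': ['EN', 'EN', 'EN', 'SE', 'SE'],
    'firm':  ['L', 'L', 'S', 'L', 'S'],
    'pay':   [50.0, 70.0, 40.0, 120.0, 100.0],
})

means = toy.groupby(['level', 'firm'])['pay'].mean()

# Default: the innermost index level (firm) becomes the columns.
firm_as_columns = means.unstack()

# Naming a level moves that level to the columns instead.
level_as_columns = means.unstack(level='level')

print(firm_as_columns)
print(level_as_columns)
```

The two results hold the same numbers with rows and columns swapped, so the choice comes down to which layout is easier to read.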

This idea of splitting a DataFrame into multiple groups and computing multi-dimensional aggregations of values is such an important feature that it has its own method in pandas, `pivot_table`! Its result is called a *pivot table*.

## Pivot Tables

A *pivot table* allows for cross-classification of groups in a DataFrame.

The general format for a *pivot table* is:

```python
df.pivot_table(values='column', index='group_1', columns='group_2', aggfunc='function')
```

where:

* each unique value in `index` gets its own row
* each unique value in `columns` gets its own column
* `values` names the column of the DataFrame to which we want to apply `aggfunc` (it is also the first positional argument, so the keyword may be omitted)

The default option for `aggfunc` is `mean`.

Above we used the `groupby` method to cross-classify experience level with company size using the code:

```python
salary_data.groupby(['experience_level', 'company_size'])['salary_in_usd'].mean().unstack()
```

We can reimplement this with a pivot table by specifying `experience_level` for the rows, `company_size` for the columns, and `salary_in_usd` for the values. (Again, the default for `pivot_table` is to aggregate the mean.)

In [33]:
salary_data.pivot_table('salary_in_usd', index='experience_level', columns='company_size')

| experience_level \ company_size | L | M | S |
|---|---|---|---|
| EN | 75148.0 | 41063.923077 | 57502.0 |
| EX | 239729.875 | 85000.0 | 243164.5 |
| MI | 96285.451613 | 83982.8 | 47610.0 |
| SE | 134465.604651 | 122572.125 | 120978.055556 |

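The equivalence between the two approaches can be checked directly. The sketch below does so on a small made-up DataFrame (`toy`, `level`, `firm`, and `pay` are illustrative names), using floating-point values so that both routes produce floating-point means:

```python
import pandas as pd

# Made-up rows for illustration; not taken from the salary dataset.
toy = pd.DataFrame({
    'level': ['EN', 'EN', 'EN', 'SE', 'SE'],
    'firm':  ['L', 'L', 'S', 'L', 'S'],
    'pay':   [50.0, 70.0, 40.0, 120.0, 100.0],
})

# Route 1: group on both keys, average, then pivot the inner key into columns.
via_groupby = toy.groupby(['level', 'firm'])['pay'].mean().unstack()

# Route 2: let pivot_table do the grouping and reshaping in one call.
via_pivot = toy.pivot_table(values='pay', index='level', columns='firm')

# Both tables carry the same row labels, column labels, and cell values.
same_values = bool((via_groupby.to_numpy() == via_pivot.to_numpy()).all())

print(via_groupby)
print(via_pivot)
print(same_values)
```

The `pivot_table` form tends to read more clearly because the roles of rows, columns, and values are spelled out by keyword.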

While taking the average is the default, we can also calculate the sum, maximum, minimum, or variance, to name a few.

We find the maximum salary with respect to experience level and company size below:

In [34]:
salary_data.pivot_table('salary_in_usd', index='experience_level', columns='company_size', aggfunc='max')

| experience_level \ company_size | L | M | S |
|---|---|---|---|
| EN | 250000 | 100000 | 138000 |
| EX | 600000 | 85000 | 416000 |
| MI | 450000 | 450000 | 110000 |
| SE | 412000 | 195000 | 260000 |

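Other aggregations follow the same pattern. The sketch below, on made-up data (`toy`, `level`, `firm`, and `pay` are illustrative names), uses `'sum'` and `'count'`, and also shows what happens when a row and column combination never occurs. The `fill_value` argument is not used elsewhere in this section; it is included here only to show one way of handling such gaps:

```python
import pandas as pd

# Made-up rows for illustration; there is no (SE, S) combination.
toy = pd.DataFrame({
    'level': ['EN', 'EN', 'EN', 'SE'],
    'firm':  ['L', 'L', 'S', 'L'],
    'pay':   [50.0, 70.0, 40.0, 120.0],
})

# Total pay per cell; the missing combination shows up as NaN.
totals = toy.pivot_table(values='pay', index='level', columns='firm',
                         aggfunc='sum')

# Number of entries per cell; fill_value replaces the missing cell with 0.
counts = toy.pivot_table(values='pay', index='level', columns='firm',
                         aggfunc='count', fill_value=0)

print(totals)
print(counts)
```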

We also have the option to calculate multiple statistics or aggregate functions in one line of code.

We pass a list of the functions we want to compute as the `aggfunc` argument, and the output is one large DataFrame containing each statistic.

Below we find the minimum, maximum, and mean salary dependent on experience level and company size:

In [35]:
salary_data.pivot_table('salary_in_usd', index='experience_level', columns='company_size', aggfunc=['min', 'max', 'mean'])

| experience_level \ (statistic, company_size) | (min, L) | (min, M) | (min, S) | (max, L) | (max, M) | (max, S) | (mean, L) | (mean, M) | (mean, S) |
|---|---|---|---|---|---|---|---|---|---|
| EN | 5898 | 4000 | 6072 | 250000 | 100000 | 138000 | 75148.0 | 41063.923077 | 57502.0 |
| EX | 79833 | 85000 | 70329 | 600000 | 85000 | 416000 | 239729.875 | 85000.0 | 243164.5 |
| MI | 6072 | 4000 | 2876 | 450000 | 450000 | 110000 | 96285.451613 | 83982.8 | 47610.0 |
| SE | 21843 | 19052 | 33511 | 412000 | 195000 | 260000 | 134465.604651 | 122572.125 | 120978.055556 |

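When several functions are passed, the resulting columns have two levels: the statistic on top and the column category beneath it. The sketch below, on made-up data (`toy`, `level`, `firm`, and `pay` are illustrative names), shows one way to pull pieces back out of such a table:

```python
import pandas as pd

# Made-up rows for illustration; not taken from the salary dataset.
toy = pd.DataFrame({
    'level': ['EN', 'EN', 'EN', 'SE', 'SE'],
    'firm':  ['L', 'L', 'S', 'L', 'S'],
    'pay':   [50.0, 70.0, 40.0, 120.0, 100.0],
})

summary = toy.pivot_table(values='pay', index='level', columns='firm',
                          aggfunc=['min', 'max', 'mean'])

# Selecting a top-level label returns the whole block for that statistic.
mean_block = summary['mean']

# A (statistic, category) tuple addresses a single column.
max_entry_large = summary.loc['EN', ('max', 'L')]

print(summary)
print(mean_block)
print(max_entry_large)
```

Selecting one block at a time keeps the wide table manageable when only one statistic is of interest.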

In general, the ability to merge, group, or pivot DataFrames makes it easy to gather statistics about a DataFrame. These operations also provide a way to better visualize or sort grouped data.