# Case Study on Measures of Central Tendency and Dispersion
An institution wishes to assess its students' ability in maths, reading and
writing. The institution wants an exploratory study to answer the
following questions.
1. Find out how many males and females participated in the test.
2. What do you think about the students' parental level of education?
3. Who scores the most on average for math, reading and writing based on
● Gender
● Test preparation course
4. What do you think about the scoring variation for math, reading and writing
based on
● Gender
● Test preparation course
5. The management wants to give bonus points to the top 25% of
students based on their maths score. How would you help the management
achieve this?

Submitted by : Vishnu Vidyadharan
Submitted on : 29-11-2021

In [280]:
import numpy as np
import pandas as pd
# Raw string so the backslashes in the Windows path are not read as escape sequences
sp=pd.read_excel(r'E:\PAATSHAALA\Assignments\StudentsPerformance1.xlsx')
sp.head()

Unnamed: 0,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score
0,female,group C,some high school,free/reduced,none,0,17,10
1,female,group B,high school,free/reduced,none,8,24,23
2,female,group B,some high school,free/reduced,none,18,32,28
3,female,group B,some college,standard,none,11,38,32
4,female,group C,some college,free/reduced,none,22,39,33


# 1. Find out how many males and females participated in the test.

In [281]:
df=pd.DataFrame(sp)

In [282]:
print('Number of females participated in the test :', df['gender'].value_counts()['female'])
print('Number of males participated in the test :', df['gender'].value_counts()['male'])

Number of females participated in the test : 518
Number of males participated in the test : 482
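
The counts above can also be read as shares of the whole group. A minimal sketch, assuming a toy DataFrame in place of the real file (only the `gender` column name comes from the notebook; the rows are made up for illustration):

```python
import pandas as pd

# Toy stand-in for the real data; these five rows are hypothetical.
toy = pd.DataFrame({"gender": ["female", "male", "female", "female", "male"]})

counts = toy["gender"].value_counts()                 # absolute counts per gender
shares = toy["gender"].value_counts(normalize=True)   # fractions that sum to 1

# Put counts and percentage shares side by side.
summary = pd.DataFrame({"count": counts, "share_pct": (shares * 100).round(1)})
print(summary)
```

Applied to the 1000 students in this study, the same calculation corresponds to 51.8% females (518) and 48.2% males (482).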


# 2. What do you think about the students' parental level of education?

In [283]:
sp2=pd.DataFrame(df['parental level of education'].value_counts().sort_values(ascending=False))

In [284]:
sp2

parental level of education,count
some college,226
associate's degree,222
high school,196
some high school,179
bachelor's degree,118
master's degree,59


In [285]:
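# numeric_only=True averages only the score columns (recent pandas versions raise an error on text columns otherwise)
sp.groupby('parental level of education').mean(numeric_only=True)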

parental level of education,math score,reading score,writing score
associate's degree,67.977477,71.018018,70.031532
bachelor's degree,69.288136,73.0,73.381356
high school,61.821429,64.602041,62.403061
master's degree,70.254237,75.949153,75.677966
some college,67.128319,69.566372,69.035398
some high school,63.134078,66.759777,64.888268


Insights:-
    
    Counting students by parental level of education, the majority of parents studied beyond high school: those with a
    master's degree, bachelor's degree, associate's degree or some college together account for 62.5% of students.
    
    Parents with only high school or some high school education account for the remaining 37.5%.
    
    The mean scores in each skill suggest that a higher parental level of education is associated with better student
    performance: students whose parents hold a master's degree have the highest means in all three skills.
    
    The trend is not strictly monotonic, though: the "high school" group averages slightly below "some high school".
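
The percentages quoted in these insights can be checked directly from the counts. A small sketch that rebuilds the counts from the `value_counts()` output above; the grouping of four levels as "beyond high school" is an assumption made for this illustration:

```python
import pandas as pd

# Counts copied from the value_counts() output in this notebook.
counts = pd.Series({
    "some college": 226,
    "associate's degree": 222,
    "high school": 196,
    "some high school": 179,
    "bachelor's degree": 118,
    "master's degree": 59,
})

# Share of each education level as a percentage of all students.
share_pct = counts / counts.sum() * 100

# Assumption: these four levels are treated as "beyond high school".
beyond_high_school = [
    "some college", "associate's degree", "bachelor's degree", "master's degree",
]
beyond_pct = share_pct[beyond_high_school].sum()

print(share_pct.round(1))
print(f"Beyond high school: {beyond_pct:.1f}%")
```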
    
    

# 3. Who scores the most on average for math, reading and writing based on gender and test preparation course?

In [286]:
sp.groupby('gender').mean(numeric_only=True)

gender,math score,reading score,writing score
female,63.376448,72.590734,72.467181
male,68.821577,65.545643,63.446058


Comparing the means by gender, females score higher on average in reading and writing, while
males score higher on average in maths.

In [287]:
sp.groupby('test preparation course').mean(numeric_only=True)

test preparation course,math score,reading score,writing score
completed,69.96648,74.175978,74.684358
none,63.78972,66.417445,64.457944


Students who completed the test preparation course score higher on average in all three areas than those who did not.
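
The "who scores the most" question can also be answered programmatically instead of by reading the tables. A minimal sketch that re-enters the group means shown above and uses `idxmax` to pick the top group per subject; the variable names are illustrative and not part of the original notebook:

```python
import pandas as pd

# Group means copied from the two tables above.
gender_means = pd.DataFrame(
    {
        "math score": [63.376448, 68.821577],
        "reading score": [72.590734, 65.545643],
        "writing score": [72.467181, 63.446058],
    },
    index=pd.Index(["female", "male"], name="gender"),
)
prep_means = pd.DataFrame(
    {
        "math score": [69.96648, 63.78972],
        "reading score": [74.175978, 66.417445],
        "writing score": [74.684358, 64.457944],
    },
    index=pd.Index(["completed", "none"], name="test preparation course"),
)

# idxmax returns, for each subject, the group label with the highest mean.
top_by_gender = gender_means.idxmax()
top_by_prep = prep_means.idxmax()

# Size of the advantage for students who completed the course.
gap_prep = prep_means.loc["completed"] - prep_means.loc["none"]

print(top_by_gender)
print(top_by_prep)
print(gap_prep.round(2))
```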

# 4. What do you think about the scoring variation for math, reading and writing based on gender and test preparation course?

In [288]:
sp.groupby('gender').mean(numeric_only=True)

gender,math score,reading score,writing score
female,63.376448,72.590734,72.467181
male,68.821577,65.545643,63.446058


In [289]:
sp.groupby('gender').std(numeric_only=True)

gender,math score,reading score,writing score
female,16.029928,14.411018,14.844842
male,14.556411,14.149594,14.227225


Based on the standard deviations, males are fairly consistent across all three skills (about 14.1 to 14.6).
Females show slightly higher variation in every skill, most notably in maths (16.0), which is also higher than the
variation in their own reading and writing scores.
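
Because the two groups have different means, a relative measure of spread can complement the raw standard deviations. The sketch below computes the coefficient of variation (standard deviation divided by mean) from the gender tables above. This is an additional measure introduced here for illustration, not one used in the original analysis:

```python
import pandas as pd

subjects = ["math score", "reading score", "writing score"]
groups = pd.Index(["female", "male"], name="gender")

# Means and standard deviations copied from the gender tables above.
means = pd.DataFrame(
    [[63.376448, 72.590734, 72.467181],
     [68.821577, 65.545643, 63.446058]],
    index=groups, columns=subjects,
)
stds = pd.DataFrame(
    [[16.029928, 14.411018, 14.844842],
     [14.556411, 14.149594, 14.227225]],
    index=groups, columns=subjects,
)

# Coefficient of variation: spread expressed as a fraction of the group mean.
cv = stds / means
print(cv.round(3))
```

A higher value means the scores are more spread out relative to the group's own average, so the ranking of groups by this measure need not match the ranking by standard deviation alone.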

In [290]:
sp.groupby('test preparation course').mean(numeric_only=True)

test preparation course,math score,reading score,writing score
completed,69.96648,74.175978,74.684358
none,63.78972,66.417445,64.457944


In [291]:
sp.groupby('test preparation course').std(numeric_only=True)

test preparation course,math score,reading score,writing score
completed,14.521847,13.537572,13.236412
none,15.705689,14.608896,15.041667


Students who did not complete the test preparation course show higher variation in their scores in all three skills,
whereas those who completed the course show lower and fairly similar variation across skills.
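
The mean and standard deviation tables above come from separate cells; they can also be produced in one call so the two statistics sit side by side. A minimal sketch on a toy DataFrame (the rows are hypothetical; only the column names follow the notebook):

```python
import pandas as pd

# Hypothetical toy rows; the real data has 1000 students and three score columns.
toy = pd.DataFrame({
    "test preparation course": ["completed", "completed", "none", "none", "none"],
    "math score": [70, 80, 50, 60, 70],
    "reading score": [75, 85, 55, 65, 60],
})

score_cols = ["math score", "reading score"]

# agg with a list of function names yields one (column, statistic) pair per result column.
summary = toy.groupby("test preparation course")[score_cols].agg(["mean", "std"])
print(summary)
```

Note that pandas computes the sample standard deviation (`ddof=1`) by default, which is also what the `std()` cells in this notebook report.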

# 5. The management needs your help to give bonus points to the top 25% of students based on their maths score, so how will you help the management to achieve this?

In [292]:
df2=df.sort_values(by =['math score'],ascending=False)

In [293]:
# Position of the quartile boundary among the len(df) = 1000 sorted rows
i=.25*(len(df)+1)

In [294]:
i

250.25

In [295]:
df2.head(250)

Unnamed: 0,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score
999,male,group E,bachelor's degree,standard,completed,100,100,100
996,male,group A,some college,standard,completed,100,96,86
515,female,group E,some college,standard,none,100,92,97
517,female,group E,associate's degree,standard,none,100,100,100
516,female,group E,bachelor's degree,standard,none,100,100,100
...,...,...,...,...,...,...,...,...
856,male,group E,some high school,standard,completed,77,76,77
855,male,group E,associate's degree,free/reduced,completed,77,69,68
854,male,group D,some high school,standard,completed,77,68,69
853,male,group D,associate's degree,free/reduced,none,77,78,73


Insights:-
    The data is sorted in descending order of math score and saved to a new dataframe.
    The quartile position is 0.25 * (1000 + 1) = 250.25, which is rounded down to 250.
    That means the first 250 students in the sorted dataframe are taken as the top 25% of students based on their maths score.
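
One caveat with taking a fixed number of rows: the output above ends with several students tied at 77, so a hard cut at row 250 may include some students with that score and exclude others. A minimal sketch of two alternatives that treat ties consistently, shown on a toy DataFrame (the `student` column and all rows are hypothetical):

```python
import pandas as pd

# Hypothetical toy data with a four-way tie at 77 around the top-25% boundary.
toy = pd.DataFrame({
    "student": list("ABCDEFGHIJKL"),
    "math score": [55, 90, 77, 77, 62, 100, 77, 40, 77, 30, 20, 10],
})

n_top = int(0.25 * len(toy))  # 25% of 12 students

# Fixed cut: exactly n_top rows, so tied students can be split arbitrarily.
fixed_cut = toy.sort_values("math score", ascending=False).head(n_top)

# Alternative 1: everyone at or above the 75th-percentile score.
cutoff = toy["math score"].quantile(0.75)
top_by_cutoff = toy[toy["math score"] >= cutoff]

# Alternative 2: the n_top largest scores, keeping all ties at the boundary.
top_with_ties = toy.nlargest(n_top, "math score", keep="all")

print(len(fixed_cut), len(top_by_cutoff), len(top_with_ties))
```

Both alternatives can return more than 25% of the students when scores are tied at the boundary, so which rule to use is a policy choice for the management rather than a purely technical one.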