# Case Study on Measures of Central Tendency and Dispersion

#### An institution wishes to assess its students' abilities in maths, reading and writing. It wants an exploratory study that answers the following questions.

#### 1. Find out how many males and females participated in the test.

#### 2. What do you think about the students' parental level of education?


#### 3. Who scores the most on average for math, reading and writing based on
● Gender
● Test preparation course

#### 4. What do you think about the scoring variation for math, reading and writing based on
● Gender
● Test preparation course

#### 5. The management needs your help to give bonus points to the top 25% of students based on their maths score. How will you help the management achieve this?

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

In [2]:
stud_data = pd.read_csv(r'C:\Users\Dell\Downloads\StudentsPerformance.csv')

In [3]:
stud_data

Unnamed: 0,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score
0,female,group C,some high school,free/reduced,none,0,17,10
1,female,group B,high school,free/reduced,none,8,24,23
2,female,group B,some high school,free/reduced,none,18,32,28
3,female,group B,some college,standard,none,11,38,32
4,female,group C,some college,free/reduced,none,22,39,33
...,...,...,...,...,...,...,...,...
995,male,group E,some college,standard,completed,99,87,81
996,male,group A,some college,standard,completed,100,96,86
997,male,group D,some college,standard,completed,100,97,99
998,male,group E,associate's degree,free/reduced,completed,100,100,93


## 1. Find out how many males and females participated in the test.

In [11]:
# Count the rows for each gender
male_count = (stud_data['gender'] == 'male').sum()
female_count = (stud_data['gender'] == 'female').sum()
print('No. of Male students =', male_count)
print('No. of Female students =', female_count)

No. of Male students = 482
No. of Female students = 518


### Insight:
There are 36 more female participants (518) than male participants (482) in the test.
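The arithmetic behind this insight can be checked with a short sketch. It assumes only the two counts printed above; the names `gender_counts`, `gap` and `gender_share` are introduced here for illustration and are not part of the original notebook.

```python
import pandas as pd

# Counts copied from the printed output above (not re-read from the CSV)
gender_counts = pd.Series({'male': 482, 'female': 518})

# Absolute gap between the two groups
gap = gender_counts['female'] - gender_counts['male']

# Share of each group, in percent of all participants
gender_share = (gender_counts / gender_counts.sum() * 100).round(1)

print('Difference =', gap)
print(gender_share)
```

On the full dataset, `stud_data['gender'].value_counts(normalize=True)` should give the same shares as fractions.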

## 2. What do you think about the students' parental level of education?

In [12]:
education_parent = stud_data["parental level of education"].value_counts()
education_parent

some college          226
associate's degree    222
high school           196
some high school      179
bachelor's degree     118
master's degree        59
Name: parental level of education, dtype: int64

### Insights:
The output above shows that 44.8% of parents have either attended some college or hold an associate's degree.

Only about 18% of parents (17.7%) hold a bachelor's or master's degree, the smallest share of any grouping.
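The percentages quoted here can be reproduced from the counts in the output. The following is a minimal sketch that assumes only those six counts; `education_counts` and the derived names are illustrative, not from the original notebook.

```python
import pandas as pd

# Counts copied from the value_counts() output above
education_counts = pd.Series({
    "some college": 226,
    "associate's degree": 222,
    "high school": 196,
    "some high school": 179,
    "bachelor's degree": 118,
    "master's degree": 59,
})

# Convert counts to percentages of all parents
education_share = education_counts / education_counts.sum() * 100

# Combine the categories discussed in the insight
college_or_associate = education_share[["some college", "associate's degree"]].sum()
bachelor_or_master = education_share[["bachelor's degree", "master's degree"]].sum()

print(round(college_or_associate, 1))
print(round(bachelor_or_master, 1))
```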


## 3. Who scores the most on average for math, reading and writing based on

● Gender
● Test preparation course

In [16]:
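# Select the score columns with a list (double brackets); tuple-style selection fails on recent pandas
gender_based_avg = stud_data.groupby('gender')[['math score','reading score','writing score']].mean()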
gender_based_avg

gender,math score,reading score,writing score
female,63.376448,72.590734,72.467181
male,68.821577,65.545643,63.446058


In [17]:
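# Select the score columns with a list (double brackets); tuple-style selection fails on recent pandas
prep_based_avg = stud_data.groupby('test preparation course')[['math score','reading score','writing score']].mean()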
prep_based_avg

test preparation course,math score,reading score,writing score
completed,69.96648,74.175978,74.684358
none,63.78972,66.417445,64.457944


### Insights:
Students who completed the test preparation course have higher average marks in all three subjects than students who did not complete it.

Male students have a higher average than female students in math, whereas female students performed better on average in reading and writing.
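To put numbers on these gaps, the sketch below rebuilds the two tables of means from the outputs above and reads off the leading group and the size of each difference. The frames `gender_avg` and `prep_avg` are reconstructed by hand for illustration; in the notebook itself the same steps would apply to `gender_based_avg` and `prep_based_avg`.

```python
import pandas as pd

subjects = ['math score', 'reading score', 'writing score']

# Means copied from the two groupby outputs above
gender_avg = pd.DataFrame(
    [[63.376448, 72.590734, 72.467181],
     [68.821577, 65.545643, 63.446058]],
    index=['female', 'male'], columns=subjects)

prep_avg = pd.DataFrame(
    [[69.96648, 74.175978, 74.684358],
     [63.78972, 66.417445, 64.457944]],
    index=['completed', 'none'], columns=subjects)

# Group with the highest mean in each subject
print(gender_avg.idxmax())
print(prep_avg.idxmax())

# Size of each gap in score points
male_minus_female = gender_avg.loc['male'] - gender_avg.loc['female']
completed_minus_none = prep_avg.loc['completed'] - prep_avg.loc['none']
print(male_minus_female.round(2))
print(completed_minus_none.round(2))
```

A positive value in `male_minus_female` means male students lead in that subject, and a negative value means female students lead.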

## 4. What do you think about the scoring variation for math, reading and writing based on

● Gender
● Test preparation course

In [18]:
# Keep only the grouping key and the numeric score columns
gender_based_std = stud_data[['gender',
      'math score',
      'writing score',
      'reading score']].groupby(['gender']).agg(['var','std'])
gender_based_std

,math score,math score,writing score,writing score,reading score,reading score
gender,var,std,var,std,var,std
female,256.958593,16.029928,220.369327,14.844842,207.677438,14.411018
male,211.889097,14.556411,202.413924,14.227225,200.21101,14.149594


In [19]:
# Keep only the grouping key and the numeric score columns
prep_based_std = stud_data[['test preparation course',
      'math score',
      'writing score',
      'reading score']].groupby(['test preparation course']).agg(['var','std'])
prep_based_std

,math score,math score,writing score,writing score,reading score,reading score
test preparation course,var,std,var,std,var,std
completed,210.884027,14.521847,175.202612,13.236412,183.265864,13.537572
none,246.668662,15.705689,226.251739,15.041667,213.419851,14.608896


### Insight:
In all three subjects, students who completed the test preparation course have more consistent scores than students who did not, since the standard deviation is lower for the former group.

For male students, the standard deviations of the math, writing and reading scores are nearly the same (14.56, 14.23, 14.15), which means the spread of scores is similar across all three subjects.

For female students, the standard deviation of the math score (16.03) is noticeably higher than that of the writing (14.84) and reading (14.41) scores. Female math scores are therefore somewhat more spread out than their scores in the other two subjects, a pattern that does not appear for male students.
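The comparison across subjects can be made explicit with a small sketch. It rebuilds the variance and standard deviation values for the gender table from the output above, confirms that each standard deviation is the square root of the corresponding variance, and measures how far apart the three standard deviations are within each gender. The names `gender_var`, `gender_std` and `std_spread` are illustrative only.

```python
import numpy as np
import pandas as pd

subjects = ['math score', 'writing score', 'reading score']

# Values copied from the gender-based var/std output above
gender_var = pd.DataFrame(
    [[256.958593, 220.369327, 207.677438],
     [211.889097, 202.413924, 200.21101]],
    index=['female', 'male'], columns=subjects)

gender_std = pd.DataFrame(
    [[16.029928, 14.844842, 14.411018],
     [14.556411, 14.227225, 14.149594]],
    index=['female', 'male'], columns=subjects)

# Standard deviation is the square root of variance (up to display rounding)
std_matches_var = np.allclose(np.sqrt(gender_var.to_numpy()),
                              gender_std.to_numpy(), atol=1e-5)
print(std_matches_var)

# Gap between the largest and smallest std across the three subjects
std_spread = gender_std.max(axis=1) - gender_std.min(axis=1)
print(std_spread.round(2))
```

A larger value in `std_spread` indicates that the variability differs more from subject to subject for that gender.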

## 5. The management needs your help to give bonus points to the top 25% of students based on their maths score. How will you help the management achieve this?

In [21]:
# Sort by math score, highest first
sorting_math = stud_data.sort_values(by=['math score'], ascending=False)
percent = int(input('Enter the required percentage of students eligible for bonus points: '))
# Keep the first `percent` of the sorted rows
sorting_math.head(int(len(sorting_math)*(percent/100)))

Enter the required percentage of students eligible for bonus points: 25


Unnamed: 0,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score
999,male,group E,bachelor's degree,standard,completed,100,100,100
996,male,group A,some college,standard,completed,100,96,86
515,female,group E,some college,standard,none,100,92,97
517,female,group E,associate's degree,standard,none,100,100,100
516,female,group E,bachelor's degree,standard,none,100,100,100
...,...,...,...,...,...,...,...,...
856,male,group E,some high school,standard,completed,77,76,77
855,male,group E,associate's degree,free/reduced,completed,77,69,68
854,male,group D,some high school,standard,completed,77,68,69
853,male,group D,associate's degree,free/reduced,none,77,78,73


### Insights:
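The result above lists the top 25% of students by math score (250 of the 1,000 students), who are eligible for bonus points. The lowest math score in this group is 77, so the cutoff falls at 77. Because `head` returns exactly 250 rows, any further students tied at 77 would not be included in the selection.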
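How ties at the cutoff are handled is a policy choice for the management. As one possible alternative to the approach above, the sketch below uses `DataFrame.nlargest` with `keep='all'`, which keeps every row tied with the last selected score. The `scores` frame and its `student` column are hypothetical data made up for illustration, not the case-study dataset.

```python
import pandas as pd

# Hypothetical scores for illustration only (not the case-study data)
scores = pd.DataFrame({
    'student': list('ABCDEFGH'),
    'math score': [100, 77, 77, 77, 60, 55, 40, 30],
})

percent = 25
n_top = int(len(scores) * percent / 100)  # 25% of 8 rows

# Strict selection: exactly n_top rows, ties at the cutoff are cut arbitrarily
top_strict = scores.nlargest(n_top, 'math score')

# Tie-inclusive selection: every student tied at the cutoff score is kept
top_with_ties = scores.nlargest(n_top, 'math score', keep='all')

print(top_strict)
print(top_with_ties)
```

With `keep='all'` the result can contain more than `n_top` rows, so the bonus goes to everyone at or above the cutoff score rather than to a fixed head count.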