# Analyzing Student Performance in Bangladesh: Insights for Academic Excellence

## **Overview**

This notebook analyzes the **Student Performance-BD Dataset**, which contains academic and demographic data on students in Bangladesh. The dataset includes 24 attributes covering demographics, parental involvement, academic performance, and extracurricular activities. By exploring this data, we seek actionable insights that can improve educational outcomes and help students choose the academic group (Science, Commerce, Arts) that fits them best.

---

## **Objectives**

1. Analyze student demographics, including gender and age distributions.
2. Examine academic performance across subjects and student groups.
3. Identify factors influencing academic outcomes, such as parental education, extracurricular activities, and attendance.
4. Provide insights to help educators and policymakers improve educational strategies.
5. Suggest actionable recommendations for students to achieve academic success.

---

## **Key Questions**

1. What does the dataset look like, and what are its key characteristics?
2. What is the gender and age distribution among students?
3. What are the average scores across different subjects?
4. How does academic performance vary among student groups (Science, Commerce, Arts)?
5. How do parental education levels impact academic performance?
6. What is the role of extracurricular activities in academic success?
7. How does attendance relate to performance in science?
8. Does internet access influence academic outcomes?
9. Which student group demonstrates the most consistent performance?

---

## **Methods**

To answer these questions, we will utilize:
- **Pandas**: For data manipulation and summary statistics.
- **Matplotlib & Seaborn**: For creating visualizations to understand trends and relationships.

---

## **Expected Outcomes**

By the end of this analysis, we will:
- Gain a comprehensive understanding of student performance patterns.
- Identify key factors influencing academic outcomes.
- Provide data-driven insights to improve educational strategies and support students in making informed academic decisions.


In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data = pd.read_csv('/kaggle/input/student-performance-bd/bd_students_per.csv')

data.head()


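| Unnamed: 0 | id | full_name | age | gender | location | family_size | mother_education | father_education | mother_job | father_job | ... | tutoring | school_type | attendance | extra_curricular_activities | english | math | science | social_science | art_culture | stu_group |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | Avi Biswas | 16 | Male | Urban | 6 | SSC | HSC | No | No | ... | Yes | Private | 95 | Yes | 95 | 98 | 92 | 94 | 98 | Science |
| 1 | 3 | Taslima Sultana | 18 | Female | Rural | 6 | SSC | HSC | No | Yes | ... | No | Semi_Govt | 92 | No | 65 | 71 | 40 | 78 | 80 | Commerce |
| 2 | 4 | Md Adilur Rahman | 15 | Male | Rural | 4 | SSC | SSC | Yes | Yes | ... | Yes | Govt | 81 | Yes | 64 | 78 | 58 | 86 | 74 | Commerce |
| 3 | 5 | Saleh Ahmed | 16 | Male | Rural | 6 | SSC | SSC | Yes | Yes | ... | Yes | Private | 90 | Yes | 84 | 90 | 85 | 86 | 88 | Science |
| 4 | 6 | Din Islam | 17 | Male | Urban | 5 | Honors | Masters | No | Yes | ... | Yes | Semi_Govt | 75 | Yes | 54 | 70 | 45 | 79 | 76 | Commerce |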


In [2]:
# The DataFrame was loaded as `data` above, so call info() on that name
data.info()
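
Beyond `info()`, a per-column summary can help answer what the dataset looks like. The following is a minimal sketch, not part of the original notebook: `summarize_columns` is a hypothetical helper, and the `toy` frame holds made-up values that stand in for `data` so the example runs on its own.

```python
import pandas as pd

def summarize_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing-value count, distinct-value count."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "missing": frame.isna().sum(),
        "unique": frame.nunique(),  # NaN is not counted as a distinct value
    })

# Toy stand-in for `data`; these values are invented for illustration only.
toy = pd.DataFrame({
    "age": [16, 18, 15, None],
    "gender": ["Male", "Female", "Male", "Male"],
})

summary = summarize_columns(toy)
print(summary)
```

In the notebook itself, the call would be `summarize_columns(data)`.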

## What is the gender distribution of students?

In [None]:
gender_count = data['gender'].value_counts()

# Plot gender distribution
sns.barplot(x=gender_count.index, y=gender_count.values, palette='pastel')
plt.title('Gender Distribution of Students')
plt.xlabel('Gender')
plt.ylabel('Number of Students')
plt.show()

## How is the age of students distributed?

In [None]:
sns.histplot(data['age'], kde=True, bins=10, color='skyblue')
plt.title('Age Distribution of Students')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()

## What are the average scores in each subject?

In [None]:
# Calculate average scores
average_scores = data[['english', 'math', 'science', 'social_science', 'art_culture']].mean()

# Plot average scores
average_scores.plot(kind='bar', color='purple')
plt.title('Average Scores by Subject')
plt.xlabel('Subjects')
plt.ylabel('Average Score')
plt.show()


## How does academic performance vary by student group (Science, Commerce, Arts)?

In [None]:
# Grouped subject averages
group_scores = data.groupby('stu_group')[['english', 'math', 'science', 'social_science', 'art_culture']].mean()

# Plot subject-wise averages
group_scores.T.plot(kind='bar', figsize=(10, 6))
plt.title('Subject Performance by Student Group')
plt.xlabel('Subjects')
plt.ylabel('Average Score')
plt.legend(title='Student Group')
plt.show()


## Does parental education level affect student performance?

In [None]:
# Box plot of math scores by mother's education level
sns.boxplot(data=data, x='mother_education', y='math', palette='Set3')
plt.title("Math Scores by Mother's Education Level")
plt.xlabel("Mother's Education Level")
plt.ylabel('Math Score')
plt.show()


In [None]:
# Box plot of math scores by father's education level
sns.boxplot(data=data, x='father_education', y='math', palette='Set2')
plt.title("Math Scores by Father's Education Level")
plt.xlabel("Father's Education Level")
plt.ylabel('Math Score')
plt.show()
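
The box plots above are easier to read when the education levels appear from lowest to highest. The sketch below shows one way to impose that order and tabulate the median math score per level. The level list is an assumption based only on the labels visible in the `data.head()` output (SSC, HSC, Honors, Masters); the full dataset may contain other labels, which this code would treat as missing. `median_score_by_education` is a hypothetical helper and the `toy` values are made up.

```python
import pandas as pd

# Assumed ordering, lowest to highest; only these four labels appear in the preview.
EDUCATION_ORDER = ["SSC", "HSC", "Honors", "Masters"]

def median_score_by_education(frame, edu_col, score_col="math"):
    """Median score per education level, with levels sorted by EDUCATION_ORDER."""
    ordered = pd.Categorical(frame[edu_col], categories=EDUCATION_ORDER, ordered=True)
    # Labels outside EDUCATION_ORDER become NaN and are dropped by groupby.
    return (
        frame.assign(**{edu_col: ordered})
        .groupby(edu_col, observed=True)[score_col]
        .median()
    )

# Toy stand-in for `data`; values are invented for illustration only.
toy = pd.DataFrame({
    "mother_education": ["Masters", "SSC", "HSC", "SSC", "Honors"],
    "math": [90, 60, 70, 80, 85],
})

medians = median_score_by_education(toy, "mother_education")
print(medians)
```

Passing the same list as `order=EDUCATION_ORDER` to `sns.boxplot` should arrange the boxes in the same sequence.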


## How does extracurricular activity participation affect performance?

In [None]:
# Average scores with/without extracurricular activities
extracurricular_performance = data.groupby('extra_curricular_activities')[['english', 'math', 'science']].mean()

# Plot results
extracurricular_performance.T.plot(kind='bar', figsize=(10, 6), color=['skyblue', 'salmon'])
plt.title('Performance vs. Extracurricular Activities')
plt.xlabel('Subjects')
plt.ylabel('Average Score')
plt.legend(title='Participation')
plt.show()
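
To put a number on the difference the bar chart shows, one option is to subtract the non-participants' mean from the participants' mean for each subject. This is a sketch under the assumption that `extra_curricular_activities` holds exactly the labels "Yes" and "No", as in the preview rows. `participation_gap` is a hypothetical helper and the `toy` values are made up.

```python
import pandas as pd

def participation_gap(frame, subjects, flag_col="extra_curricular_activities"):
    """Mean score of participants minus mean score of non-participants, per subject."""
    means = frame.groupby(flag_col)[subjects].mean()
    # Assumes the flag column contains exactly the labels "Yes" and "No".
    return means.loc["Yes"] - means.loc["No"]

# Toy stand-in for `data`; values are invented for illustration only.
toy = pd.DataFrame({
    "extra_curricular_activities": ["Yes", "Yes", "No", "No"],
    "english": [80, 90, 70, 60],
    "math": [75, 85, 80, 80],
})

gap = participation_gap(toy, ["english", "math"])
print(gap)
```

A positive gap only describes an association in the sample; it does not show that participation causes higher scores.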


## How does attendance relate to performance?

In [None]:
# Scatter plot of attendance vs. science score, colored by student group
sns.scatterplot(data=data, x='attendance', y='science', hue='stu_group', palette='coolwarm')
plt.title('Attendance vs Science Score')
plt.xlabel('Attendance (%)')
plt.ylabel('Science Score')
plt.show()
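
A scatter plot shows the shape of the relationship, and a correlation coefficient can summarize its strength. The sketch below computes Pearson's r between attendance and a score column, overall and within each student group. `attendance_correlation` is a hypothetical helper, and the `toy` frame uses made-up, perfectly linear values so the result is easy to verify.

```python
import pandas as pd

def attendance_correlation(frame, score_col="science"):
    """Pearson correlation of attendance with a score: overall and per student group."""
    overall = frame["attendance"].corr(frame[score_col])  # Pearson by default
    by_group = pd.Series(
        {
            name: grp["attendance"].corr(grp[score_col])
            for name, grp in frame.groupby("stu_group")
        },
        name="pearson_r",
    )
    return overall, by_group

# Toy stand-in for `data`; values are invented and perfectly linear on purpose.
toy = pd.DataFrame({
    "attendance": [60, 70, 80, 90],
    "science": [30, 40, 50, 60],
    "stu_group": ["Commerce", "Commerce", "Science", "Science"],
})

overall_r, group_r = attendance_correlation(toy)
print(overall_r)
print(group_r)
```

Pearson's r captures only linear association, so it is best read alongside the scatter plot rather than instead of it.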


## Does internet access affect academic performance?

In [None]:
# Boxplot of internet access vs math score
sns.boxplot(data=data, x='internet_access', y='math', palette='pastel')
plt.title('Math Scores by Internet Access')
plt.xlabel('Internet Access')
plt.ylabel('Math Score')
plt.show()
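
A box plot does not show how many students fall in each category, which matters when one group is much smaller than the other. The sketch below tabulates the count, mean, and median of a score per category. `score_summary_by` is a hypothetical helper; the "Yes"/"No" labels for `internet_access` and all `toy` values are assumptions made for illustration.

```python
import pandas as pd

def score_summary_by(frame, group_col, score_col):
    """Count, mean, and median of a score column for each category of group_col."""
    return frame.groupby(group_col)[score_col].agg(["count", "mean", "median"])

# Toy stand-in for `data`; labels and values are invented for illustration only.
toy = pd.DataFrame({
    "internet_access": ["Yes", "Yes", "Yes", "No"],
    "math": [80, 90, 70, 60],
})

internet_summary = score_summary_by(toy, "internet_access", "math")
print(internet_summary)
```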


## Which student group shows the most consistent performance?

In [None]:
# Calculate standard deviation of scores for each group
group_std = data.groupby('stu_group')[['english', 'math', 'science', 'social_science', 'art_culture']].std()

# Plot the consistency
group_std.T.plot(kind='bar', figsize=(10, 6), color=['gold', 'lightgreen', 'lightcoral'])
plt.title('Consistency of Performance by Group')
plt.xlabel('Subjects')
plt.ylabel('Standard Deviation (Lower = More Consistent)')
plt.legend(title='Student Group')
plt.show()
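
The chart gives one standard deviation per subject and group, so naming a single "most consistent" group needs some way to combine them. One simple option, sketched here as an assumption rather than the notebook's method, is to average the per-subject standard deviations within each group and sort ascending. `rank_group_consistency` is a hypothetical helper, and the `toy` frame uses made-up values with only two subjects to stay short.

```python
import pandas as pd

SUBJECTS = ["english", "math", "science", "social_science", "art_culture"]

def rank_group_consistency(frame, subjects=SUBJECTS):
    """Average the per-subject standard deviations per group; smallest comes first."""
    per_subject_std = frame.groupby("stu_group")[subjects].std()
    return per_subject_std.mean(axis=1).sort_values()

# Toy stand-in for `data`; values are invented for illustration only.
toy = pd.DataFrame({
    "stu_group": ["Science", "Science", "Commerce", "Commerce"],
    "math": [90, 92, 50, 80],
    "science": [88, 90, 40, 70],
})

ranking = rank_group_consistency(toy, subjects=["math", "science"])
print(ranking)
```

Averaging treats every subject equally and ignores differences in group size, so the ranking should be read as a rough summary.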
