# Student Academic Performance Analysis with Pandas
This Python project analyzes student academic performance using pandas, with Matplotlib and Seaborn for visualization. The dataset covers students from several African countries and includes scores in Math, Reading, and Writing.

🔹 Key Steps and Features
1. Data Loading
The dataset is read directly from a GitHub URL in CSV format.

It contains 100 student records with attributes such as name, age, gender, country, school, and academic scores.

2. Data Exploration
Displays all student records and column names.

Columns include:

StudentID, Name, Country, Gender, Age

Math_Score, Reading_Score, Writing_Score

School, Class

3. Feature Engineering
Adds a new column: total_score, which is the sum of Math, Reading, and Writing scores.

Computes percentage_score based on a maximum of 300 points, rounded to two decimal places.

Assigns a Status of "Passed" or "Failed" based on whether the percentage is 50 or above.

4. Visualization
Uses Seaborn and Matplotlib to draw a box plot of Math scores grouped by gender, showing how the score distributions compare between groups.

5. Data Cleaning and Renaming
Renames Math_Score to Mathematics_Score and Gender to Sex for clarity.

Drops the School column to simplify the dataset.

6. Saving Processed Data
The modified DataFrame is exported to multiple formats:

CSV: student_result.csv

Excel: my_excel.xlsm

You can also export it to SQL, JSON, or HTML if needed.

🔍 Final Outcome
The dataset now contains:

Cleaned and enriched information.

Added insights through total_score, percentage_score, and Status.

Visualization that helps to understand the gender-wise distribution in Math performance.

Data ready for reporting or further machine learning tasks.

In [None]:
# Install pandas into the notebook environment
!pip install pandas

In [1]:
# Import pandas library
import pandas as pd 

# Define the URL for the CSV file
url = "https://raw.githubusercontent.com/ritaafrica/data/refs/heads/main/student_scores.csv"

# Read the CSV file into a DataFrame
df = pd.read_csv(url)





# Displaying Student Academic Performance

In [2]:
# Display the 'Student Academic Performance' DataFrame
df

Unnamed: 0,StudentID,Name,Country,Gender,Age,Math_Score,Reading_Score,Writing_Score,School,Class
0,1,Ifeanyi Mugisha,Zimbabwe,Female,18,99,77,12,Nelson Mandela School,JSS1
1,2,Yemi Okeke,Tanzania,Male,16,60,29,98,Ubuntu Academy,JSS2
2,3,Fatou Mugisha,Zimbabwe,Male,15,49,46,84,Ubuntu Academy,SS3
3,4,Chinedu Okafor,Ethiopia,Female,17,34,57,45,Ubuntu Academy,SS2
4,5,Yemi Moyo,Senegal,Male,16,22,16,38,Ubuntu Academy,SS3
...,...,...,...,...,...,...,...,...,...,...
95,96,Thabo Mensah,Tanzania,Male,17,31,86,75,Freedom College,SS1
96,97,Jelani Obasi,Nigeria,Female,15,75,3,62,Freedom College,JSS3
97,98,Abdi Okafor,Nigeria,Male,16,57,68,54,Freedom College,JSS1
98,99,Zanele Banda,Zambia,Female,17,74,20,44,Freedom College,JSS2


In [20]:
# Display all the attributes (columns) of the student academic performance DataFrame
df.columns

Index(['StudentID', 'Name', 'Country', 'Gender', 'Age', 'Math_Score',
       'Reading_Score', 'Writing_Score', 'School', 'Class'],
      dtype='object')
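
Beyond listing column names, a quick structural check can help confirm the data loaded as expected. The following is a minimal sketch, not part of the original notebook: it rebuilds five rows from the output above in a small frame named `sample` (a name assumed here for illustration) so that it runs without network access.

```python
import pandas as pd

# Five rows copied from the displayed output; the notebook itself loads the full CSV.
sample = pd.DataFrame({
    "StudentID": [1, 2, 3, 4, 5],
    "Name": ["Ifeanyi Mugisha", "Yemi Okeke", "Fatou Mugisha",
             "Chinedu Okafor", "Yemi Moyo"],
    "Gender": ["Female", "Male", "Male", "Female", "Male"],
    "Math_Score": [99, 60, 49, 34, 22],
    "Reading_Score": [77, 29, 46, 57, 16],
    "Writing_Score": [12, 98, 84, 45, 38],
})

# Number of rows and columns
n_rows, n_cols = sample.shape

# Count of missing values in each column
missing_per_column = sample.isna().sum()

# Summary statistics for the three score columns
score_summary = sample[["Math_Score", "Reading_Score", "Writing_Score"]].describe()

print(n_rows, n_cols)
print(sample.dtypes)
print(missing_per_column)
print(score_summary)
```

The same three calls (`shape`, `isna().sum()`, `describe()`) should work unchanged on the full `df`.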

# Adding new columns to an existing DataFrame

In [32]:
# Calculate the total score by summing Math, Reading, and Writing scores
df["total_score"] = df['Math_Score'] + df['Reading_Score'] + df['Writing_Score']

In [22]:
# Display the first 10 rows to check the DataFrame with the new column
df.head(10)

Unnamed: 0,StudentID,Name,Country,Gender,Age,Math_Score,Reading_Score,Writing_Score,School,Class,total_score
0,1,Ifeanyi Mugisha,Zimbabwe,Female,18,99,77,12,Nelson Mandela School,JSS1,188
1,2,Yemi Okeke,Tanzania,Male,16,60,29,98,Ubuntu Academy,JSS2,187
2,3,Fatou Mugisha,Zimbabwe,Male,15,49,46,84,Ubuntu Academy,SS3,179
3,4,Chinedu Okafor,Ethiopia,Female,17,34,57,45,Ubuntu Academy,SS2,136
4,5,Yemi Moyo,Senegal,Male,16,22,16,38,Ubuntu Academy,SS3,76
5,6,Thabo Banda,Ethiopia,Male,16,73,60,9,Nelson Mandela School,JSS2,142
6,7,Fatou Toure,Ethiopia,Male,16,53,83,80,Nelson Mandela School,JSS1,216
7,8,Chinedu Maphosa,Zambia,Female,18,21,52,88,Freedom College,JSS2,161
8,9,Amina Abebe,Senegal,Male,16,3,62,16,Ubuntu Academy,SS2,81
9,10,Jelani Moyo,Zambia,Male,15,82,64,92,Ubuntu Academy,SS1,238
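
As an alternative to adding the three columns by hand, the total can be computed with a row-wise sum over a list of columns. This is a sketch under assumptions: `sample` and `score_columns` are illustrative names, and the rows are copied from the output above.

```python
import pandas as pd

# Five rows copied from the displayed output.
sample = pd.DataFrame({
    "Math_Score": [99, 60, 49, 34, 22],
    "Reading_Score": [77, 29, 46, 57, 16],
    "Writing_Score": [12, 98, 84, 45, 38],
})

# Columns that contribute to the total
score_columns = ["Math_Score", "Reading_Score", "Writing_Score"]

# Sum across each row (axis=1) instead of chaining '+'
sample["total_score"] = sample[score_columns].sum(axis=1)

print(sample)
```

One behavioural difference is worth knowing: `sum(axis=1)` skips missing values by default, whereas chaining `+` yields `NaN` for any row with a missing score.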


# Calculate the percentage score based on the total score

In [None]:
# Calculate the percentage score based on the total score
df["percentage_score"] = (df["total_score"] / 300) * 100

# Round the percentage score to two decimal places
df["percentage_score"] = df["percentage_score"].round(2)

# Display the first 7 rows of the updated DataFrame
df.head(7)
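
To make the arithmetic concrete, here is a small worked sketch using five totals that appear in the notebook output. `MAX_TOTAL` is a name introduced here for illustration; the value 300 comes from three subjects, assuming each is scored out of 100.

```python
import pandas as pd

# Maximum achievable total: three subjects, assumed to be out of 100 each
MAX_TOTAL = 300

# Totals copied from the displayed output
totals = pd.Series([188, 187, 179, 136, 76], name="total_score")

# Percentage of the maximum, rounded to two decimal places
percentage = (totals / MAX_TOTAL * 100).round(2)

print(percentage.tolist())
```

These values agree with the `percentage_score` column shown later in the notebook for the same students.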

# Computing student Status from the percentage score

In [36]:
# Initialize 'Status' column as an empty column 
df["Status"] = ''

# Loop through each row of the DataFrame
for i in range(len(df)):
    # Check if the current 'percentage_score' is greater than or equal to 50
    if df.at[i, "percentage_score"] >= 50:
        # Assign 'Passed' to the 'Status' column for scores 50 and above
        df.at[i, "Status"] = "Passed"
    else:
        # Assign 'Failed' to the 'Status' column for scores below 50
        df.at[i, "Status"] = "Failed"

# Display the updated DataFrame
df

Unnamed: 0,StudentID,Name,Country,Gender,Age,Math_Score,Reading_Score,Writing_Score,School,Class,total_score,percentage_score,Status
0,1,Ifeanyi Mugisha,Zimbabwe,Female,18,99,77,12,Nelson Mandela School,JSS1,188,62.67,Passed
1,2,Yemi Okeke,Tanzania,Male,16,60,29,98,Ubuntu Academy,JSS2,187,62.33,Passed
2,3,Fatou Mugisha,Zimbabwe,Male,15,49,46,84,Ubuntu Academy,SS3,179,59.67,Passed
3,4,Chinedu Okafor,Ethiopia,Female,17,34,57,45,Ubuntu Academy,SS2,136,45.33,Failed
4,5,Yemi Moyo,Senegal,Male,16,22,16,38,Ubuntu Academy,SS3,76,25.33,Failed
...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,96,Thabo Mensah,Tanzania,Male,17,31,86,75,Freedom College,SS1,192,64.00,Passed
96,97,Jelani Obasi,Nigeria,Female,15,75,3,62,Freedom College,JSS3,140,46.67,Failed
97,98,Abdi Okafor,Nigeria,Male,16,57,68,54,Freedom College,JSS1,179,59.67,Passed
98,99,Zanele Banda,Zambia,Female,17,74,20,44,Freedom College,JSS2,138,46.00,Failed
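
The row-by-row loop above works when the DataFrame has its default integer index. A vectorized form is a common alternative; the sketch below uses `numpy.where` and is not the notebook's own method. The values 50.00 and 49.99 are made up to illustrate the boundary; the other two come from the output above.

```python
import numpy as np
import pandas as pd

# Two percentages from the output plus two hypothetical boundary values
scores = pd.DataFrame({"percentage_score": [62.67, 45.33, 50.00, 49.99]})

# 'Passed' where the percentage is at least 50, otherwise 'Failed'
scores["Status"] = np.where(scores["percentage_score"] >= 50, "Passed", "Failed")

print(scores)
```

Because the comparison is applied to the whole column at once, this version does not depend on the index labels and avoids a Python-level loop.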


# Visualizing student academic performance using the Gender and Math_Score columns

In [None]:
# Import the pyplot module from Matplotlib for plotting
import matplotlib.pyplot as plt

# Import Seaborn for enhanced statistical visualizations
import seaborn as sns

# Create a new figure with specified dimensions (width, height)
plt.figure(figsize=(12, 8))

# Create a box plot to visualize the distribution of Math scores based on Gender
sns.boxplot(data=df, x='Gender', y='Math_Score')

# Add a title to the box plot for context
plt.title('Math Score Distribution by Gender')

# Display the plot
plt.show()
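
A box plot is easier to interpret alongside the numbers it summarizes. The sketch below is an addition rather than part of the original notebook: it computes per-gender counts, medians, and ranges with `groupby`, using five rows copied from the earlier output.

```python
import pandas as pd

# Five rows copied from the displayed output.
sample = pd.DataFrame({
    "Gender": ["Female", "Male", "Male", "Female", "Male"],
    "Math_Score": [99, 60, 49, 34, 22],
})

# Per-gender summary of Math scores
math_by_gender = sample.groupby("Gender")["Math_Score"].agg(
    ["count", "median", "min", "max"]
)

print(math_by_gender)
```

With only five rows the figures say nothing about the full dataset; the same call on `df` would give the values that correspond to the plotted boxes.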

# Renaming Columns in a given dataframe

In [7]:
# Rename the 'Math_Score' column to 'Mathematics_Score' for clarity and consistency
df.rename(columns={'Math_Score': 'Mathematics_Score'}, inplace=True)

# Rename the 'Gender' column to 'Sex' for clarity and consistency
df.rename(columns={'Gender':'Sex'}, inplace=True)
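
Both renames can also be expressed in a single call that returns a new DataFrame instead of modifying the original in place. This is a sketch of that option with illustrative names (`sample`, `renamed`), not a change to the notebook's approach.

```python
import pandas as pd

# Two rows copied from the displayed output.
sample = pd.DataFrame({
    "Gender": ["Female", "Male"],
    "Math_Score": [99, 60],
})

# One mapping covers both columns; without inplace=True the original is untouched
renamed = sample.rename(columns={"Math_Score": "Mathematics_Score", "Gender": "Sex"})

print(list(sample.columns))
print(list(renamed.columns))
```

Returning a new object makes a cell safe to re-run, since the source frame always keeps its original column names.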






In [38]:
df

Unnamed: 0,StudentID,Name,Country,Sex,Age,Mathematics_Score,Reading_Score,Writing_Score,School,Class,total_score,percentage_score,Status
0,1,Ifeanyi Mugisha,Zimbabwe,Female,18,99,77,12,Nelson Mandela School,JSS1,188,62.67,Passed
1,2,Yemi Okeke,Tanzania,Male,16,60,29,98,Ubuntu Academy,JSS2,187,62.33,Passed
2,3,Fatou Mugisha,Zimbabwe,Male,15,49,46,84,Ubuntu Academy,SS3,179,59.67,Passed
3,4,Chinedu Okafor,Ethiopia,Female,17,34,57,45,Ubuntu Academy,SS2,136,45.33,Failed
4,5,Yemi Moyo,Senegal,Male,16,22,16,38,Ubuntu Academy,SS3,76,25.33,Failed
...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,96,Thabo Mensah,Tanzania,Male,17,31,86,75,Freedom College,SS1,192,64.00,Passed
96,97,Jelani Obasi,Nigeria,Female,15,75,3,62,Freedom College,JSS3,140,46.67,Failed
97,98,Abdi Okafor,Nigeria,Male,16,57,68,54,Freedom College,JSS1,179,59.67,Passed
98,99,Zanele Banda,Zambia,Female,17,74,20,44,Freedom College,JSS2,138,46.00,Failed


# Dropping a column from a DataFrame

In [None]:
# Drop the 'School' column; pass the column names as a list
df.drop(columns=['School'], inplace=True)

In [61]:
df

Unnamed: 0,StudentID,Name,Country,Sex,Age,Mathematics_Score,Reading_Score,Writing_Score,Class,total_score,percentage_score,Status
0,1,Ifeanyi Mugisha,Zimbabwe,Female,18,99,77,12,JSS1,188,62.67,Passed
1,2,Yemi Okeke,Tanzania,Male,16,60,29,98,JSS2,187,62.33,Passed
2,3,Fatou Mugisha,Zimbabwe,Male,15,49,46,84,SS3,179,59.67,Passed
3,4,Chinedu Okafor,Ethiopia,Female,17,34,57,45,SS2,136,45.33,Failed
4,5,Yemi Moyo,Senegal,Male,16,22,16,38,SS3,76,25.33,Failed
...,...,...,...,...,...,...,...,...,...,...,...,...
95,96,Thabo Mensah,Tanzania,Male,17,31,86,75,SS1,192,64.00,Passed
96,97,Jelani Obasi,Nigeria,Female,15,75,3,62,JSS3,140,46.67,Failed
97,98,Abdi Okafor,Nigeria,Male,16,57,68,54,JSS1,179,59.67,Passed
98,99,Zanele Banda,Zambia,Female,17,74,20,44,JSS2,138,46.00,Failed
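
Running an in-place drop a second time raises a `KeyError`, because the column is already gone. One way to make the step safe to re-run is the `errors="ignore"` option of `DataFrame.drop`, sketched below on a small frame with illustrative names.

```python
import pandas as pd

# Two rows copied from the displayed output.
sample = pd.DataFrame({
    "Name": ["Ifeanyi Mugisha", "Yemi Okeke"],
    "School": ["Nelson Mandela School", "Ubuntu Academy"],
    "Class": ["JSS1", "JSS2"],
})

# Drop 'School' and return a new DataFrame
trimmed = sample.drop(columns=["School"], errors="ignore")

# Dropping again is a no-op instead of an error
trimmed_again = trimmed.drop(columns=["School"], errors="ignore")

print(list(trimmed_again.columns))
```

The trade-off is that a misspelled column name is silently ignored too, so this option suits re-runnable notebooks more than strict pipelines.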


# Writing a DataFrame to our system in CSV, Excel, SQL, HTML, or JSON format

In [62]:
# Save the DataFrame as a CSV file without the index column
df.to_csv('student_result.csv', index=False)

In [4]:
# Install openpyxl, which pandas uses to write Excel files
%pip install openpyxl

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


In [5]:
# Save the DataFrame as an Excel file without the index column
df.to_excel("my_excel.xlsm", index=False)
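
The notebook only demonstrates the CSV and Excel exports. For the other formats named in the heading, the sketch below shows one possible approach using `to_json`, `to_html`, and `to_sql`. The table name `student_result` and the in-memory SQLite database are assumptions made for illustration, and the output is kept in memory so nothing is written to disk.

```python
import json
import sqlite3

import pandas as pd

# Two rows copied from the displayed output.
sample = pd.DataFrame({
    "Name": ["Ifeanyi Mugisha", "Yemi Okeke"],
    "total_score": [188, 187],
    "Status": ["Passed", "Passed"],
})

# JSON: one object per row; pass a file path as the first argument to save to disk
json_text = sample.to_json(orient="records")
records = json.loads(json_text)

# HTML: a <table> element as a string
html_text = sample.to_html(index=False)

# SQL: write to a SQLite table (hypothetical name), then read it back
conn = sqlite3.connect(":memory:")
try:
    sample.to_sql("student_result", conn, index=False, if_exists="replace")
    round_trip = pd.read_sql("SELECT * FROM student_result", conn)
finally:
    conn.close()

print(records[0])
print(round_trip)
```

For a persistent database, replacing `":memory:"` with a file path would be the usual adjustment.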

In [None]:
# Print the lowest total score in the dataset
print(df['total_score'].min())
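
Knowing the minimum value is often less useful than knowing which student it belongs to. A short sketch, assuming the five rows copied from the earlier output rather than the full dataset, uses `idxmin` and `idxmax` to retrieve the matching rows.

```python
import pandas as pd

# Five rows copied from the displayed output; not the full dataset.
sample = pd.DataFrame({
    "Name": ["Ifeanyi Mugisha", "Yemi Okeke", "Fatou Mugisha",
             "Chinedu Okafor", "Yemi Moyo"],
    "total_score": [188, 187, 179, 136, 76],
})

# idxmin / idxmax return the index label of the lowest / highest value
lowest = sample.loc[sample["total_score"].idxmin()]
highest = sample.loc[sample["total_score"].idxmax()]

print(lowest["Name"], lowest["total_score"])
print(highest["Name"], highest["total_score"])
```

The results here describe only these five rows; on the full `df` the lowest and highest scorers may differ.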