# AI-Based Student Performance Prediction System
### Internship Program – Artificial Intelligence
**Presented by: SHIV (SKILL ORBIT)**

---

## ⚙️ Module Breakdown (Step-by-Step)

This project follows the official module structure for the AI/ML Internship submission:
1. **Data Collection**
2. **Data Preprocessing**
3. **Data Visualization**
4. **Model Selection**
5. **Model Training**
6. **Model Testing**
7. **Result & Conclusion**

## 1️⃣ Module 1: Data Collection

In this module, we collect (or create) student academic records, store them in CSV format, and load them into a pandas DataFrame.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
import pickle

# Load the dataset
data_path = '../data/students_data.csv'
df = pd.read_csv(data_path)

print(f"Dataset successfully loaded with {len(df)} student records.")
df.head()
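If the CSV has not been collected yet, a synthetic stand-in can be generated to exercise the rest of the notebook. The sketch below is illustrative only. The column names `study_hours`, `attendance`, `final_score` and `result` match those used later in this notebook. Everything else is an assumption and is not real student data: the value ranges, the scoring rule, the 50-mark pass threshold and the helper name `make_synthetic_students`.

```python
import io

import numpy as np
import pandas as pd


def make_synthetic_students(n: int = 200, seed: int = 42) -> pd.DataFrame:
    """Generate hypothetical student records (illustrative, not real data)."""
    rng = np.random.default_rng(seed)
    study_hours = rng.uniform(0, 10, n).round(1)    # assumed range: 0-10 hours/day
    attendance = rng.uniform(50, 100, n).round(1)   # assumed range: 50-100 %
    # Assumed scoring rule: score rises with study time and attendance, plus noise
    noise = rng.normal(0, 5, n)
    final_score = np.clip(20 + 5 * study_hours + 0.3 * attendance + noise, 0, 100).round(1)
    # Assumed pass mark of 50
    result = np.where(final_score >= 50, "Pass", "Fail")

    students = pd.DataFrame({
        "study_hours": study_hours,
        "attendance": attendance,
        "final_score": final_score,
        "result": result,
    })
    # Blank out 5% of attendance values so the imputer in Module 2 has something to fill
    missing_rows = rng.choice(n, size=n // 20, replace=False)
    students.loc[missing_rows, "attendance"] = np.nan
    return students


synthetic_df = make_synthetic_students()

# Round-trip through CSV text in memory; in the notebook you would write to disk,
# e.g. synthetic_df.to_csv(data_path, index=False)
buffer = io.StringIO()
synthetic_df.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)
print(reloaded.shape)
print(reloaded["result"].value_counts())
```

Missing values are written as empty CSV fields and read back as `NaN`, so the reloaded frame has the same gaps the preprocessing module expects.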

## 2️⃣ Module 2: Data Preprocessing

We handle missing values, encode the Pass/Fail target as 1/0, and build a pipeline that imputes and scales the numeric features for training.

In [None]:
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

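# 1. Separate the features from the two targets
X = df.drop(columns=['result', 'final_score'])
y_class = df['result'].map({'Pass': 1, 'Fail': 0})  # Classification target (Pass=1, Fail=0)
y_reg = df['final_score']                             # Regression target

# 2. Impute missing values with the column median, then standardize.
# The median imputer and scaler only work on numeric data, so select numeric columns;
# any non-numeric columns are dropped (ColumnTransformer's default remainder='drop').
num_cols = X.select_dtypes(include='number').columns.tolist()

num_pipeline = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])

preprocessor = ColumnTransformer([
    ('num_pipeline', num_pipeline, num_cols)
])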

print("Data Preprocessing pipeline ready.")
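To see what the pipeline does to actual numbers, here is a minimal sketch on a tiny hypothetical feature table (values invented for illustration). It uses the same imputer-then-scaler structure under new names (`toy_pipeline`, `toy_preprocessor`) so it does not overwrite the notebook's `preprocessor`. It also shows why the Pass/Fail mapping is worth checking: any label that is not exactly `'Pass'` or `'Fail'` becomes `NaN`.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy feature table with one missing attendance value
toy = pd.DataFrame({"study_hours": [1.0, 2.0, 3.0], "attendance": [60.0, np.nan, 80.0]})

toy_pipeline = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])
toy_preprocessor = ColumnTransformer([("num_pipeline", toy_pipeline, list(toy.columns))])

toy_processed = toy_preprocessor.fit_transform(toy)
# Median of [60, 80] is 70, so the NaN becomes 70 before scaling.
# Both columns are then evenly spaced, and StandardScaler (population std)
# maps each to approximately [-1.2247, 0, 1.2247].
print(toy_processed.round(4))

# Target encoding: labels outside the mapping silently become NaN
labels = pd.Series(["Pass", "Fail", "pass"])
encoded = labels.map({"Pass": 1, "Fail": 0})
print(encoded.isna().sum())  # 1 -> the lowercase "pass" was not recognized
```

Checking `y_class.isna().sum()` on the real data before training would catch such label mismatches early.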

## 3️⃣ Module 3: Data Visualization

We visualize how study habits and attendance relate to performance, as required by the project description.

In [None]:
sns.set_style('whitegrid')
plt.figure(figsize=(12, 5))

# Study hours vs marks
plt.subplot(1, 2, 1)
sns.regplot(x='study_hours', y='final_score', data=df, scatter_kws={'alpha':0.5})
plt.title('Study Hours vs Final Score')

# Attendance vs result
plt.subplot(1, 2, 2)
sns.boxplot(x='result', y='attendance', data=df)
plt.title('Attendance vs Result')

plt.tight_layout()
plt.show()
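The two plots can be complemented with numbers. The minimal sketch below uses a tiny hypothetical table (values invented for illustration) with the same column names. It computes the Pearson correlation between study hours and final score, which summarizes the trend the regression plot shows. It also computes the median attendance per result group, which is the centre line of each box in the box plot.

```python
import pandas as pd

# Hypothetical mini dataset with the same column names as students_data.csv
viz_toy = pd.DataFrame({
    "study_hours": [1, 2, 3, 4, 5, 6],
    "final_score": [40, 50, 60, 70, 80, 90],
    "attendance": [60, 70, 65, 80, 90, 95],
    "result": ["Fail", "Fail", "Fail", "Pass", "Pass", "Pass"],
})

# Pearson correlation: direction and strength of the linear trend in one number
r = viz_toy["study_hours"].corr(viz_toy["final_score"])
print(f"Pearson r (study_hours vs final_score): {r:.2f}")  # 1.00, the toy data is exactly linear

# Median attendance per result group, i.e. the centre line of each box
attendance_medians = viz_toy.groupby("result")["attendance"].median()
print(attendance_medians)
```

On the real `df`, the same two calls give a quick numeric check that the visual patterns are not an artefact of plot scaling.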

## 4️⃣ Module 4: Model Selection

We select models based on the project requirements:
- **Logistic Regression** (binary classification for Pass/Fail)
- **Linear Regression** (regression for Final Score prediction)
- *Note: Logistic Regression is used for the production website because the deployed task is Pass/Fail classification. The Linear Regression model is instantiated here but not trained in this notebook.*

In [None]:
from sklearn.linear_model import LogisticRegression, LinearRegression

log_reg = LogisticRegression()
lin_reg = LinearRegression()

print("Models selected: Logistic Regression and Linear Regression.")
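The difference between the two selected models can be seen on a single feature. Both compute a linear score $w x + b$. Linear Regression returns that score directly as a continuous prediction. Logistic Regression passes it through the sigmoid $1 / (1 + e^{-z})$ to get the probability of Pass. The sketch below is a minimal illustration on invented toy data (six hypothetical students, study hours only). It checks that scikit-learn's `predict_proba` for a binary problem equals the sigmoid of `decision_function`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical single feature: study hours for six students
hours = np.array([[0], [1], [2], [3], [4], [5]], dtype=float)
passed = np.array([0, 0, 0, 1, 1, 1])       # classification target (Fail=0, Pass=1)
scores = 35 + 10 * hours.ravel()            # regression target, exactly linear here

# Linear Regression: continuous score, y_hat = w * x + b
lin = LinearRegression().fit(hours, scores)
print(lin.coef_, lin.intercept_)            # approximately [10.] and 35.0

# Logistic Regression: the same kind of linear score squashed through a sigmoid
clf = LogisticRegression().fit(hours, passed)
z = clf.decision_function(hours)            # w * x + b for each student
p_pass = 1 / (1 + np.exp(-z))               # sigmoid -> probability of Pass
print(np.allclose(p_pass, clf.predict_proba(hours)[:, 1]))  # True
print(clf.predict(hours))                   # class 1 wherever p_pass > 0.5
```

This is why Logistic Regression fits the Pass/Fail task: its output is a probability with a natural 0.5 decision threshold. Linear Regression's output is an unbounded score, which suits `final_score`.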

## 5️⃣ Module 5: Model Training

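We split the data into 80% training and 20% testing sets, fit the preprocessor on the training split only (so no test-set statistics leak into training), and train the Logistic Regression model.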

In [None]:
from sklearn.model_selection import train_test_split

# Focus on Classification (Pass/Fail) using Logistic Regression
X_train, X_test, y_train, y_test = train_test_split(X, y_class, test_size=0.2, random_state=42)

# Fit Preprocessor
X_train_processed = preprocessor.fit_transform(X_train)
X_test_processed = preprocessor.transform(X_test)

# Train Logistic Regression Model
log_reg.fit(X_train_processed, y_train)
print("Model Training (Logistic Regression) Completed.")
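The split above is purely random. If Pass and Fail are imbalanced, a random 20% test set can end up with a different class ratio than the training set. One common option, not used in the original cell, is `train_test_split(..., stratify=...)`, which preserves the class proportions in both splits. The toy labels below (20 Fail, 80 Pass) are invented for illustration. The variables use a `demo_` prefix so they do not overwrite the notebook's `X_train`/`y_test`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 20 Fail (0) and 80 Pass (1)
demo_y = np.array([0] * 20 + [1] * 80)
demo_X = np.arange(100).reshape(-1, 1)  # placeholder feature column

demo_X_train, demo_X_test, demo_y_train, demo_y_test = train_test_split(
    demo_X, demo_y, test_size=0.2, random_state=42, stratify=demo_y
)

# Stratification keeps the 20/80 class ratio in both splits
print(len(demo_y_test), demo_y_test.mean())    # 20 test rows, 80% Pass
print(len(demo_y_train), demo_y_train.mean())  # 80 training rows, 80% Pass
```

In the notebook this would amount to adding `stratify=y_class` to the existing call. Whether it changes the reported accuracy depends on the real class balance.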

## 6️⃣ Module 6: Model Testing

We compute the accuracy on the held-out test set and print a per-class classification report (precision, recall and F1-score for Pass and Fail).

In [None]:
from sklearn.metrics import accuracy_score, classification_report

y_pred = log_reg.predict(X_test_processed)
accuracy = accuracy_score(y_test, y_pred)

print(f"Model Accuracy (Logistic Regression): {accuracy*100:.2f}%")
print("\nClassification Report:\n")
print(classification_report(y_test, y_pred))
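Accuracy alone hides *which* mistakes the model makes, such as predicting Pass for a student who actually failed. A confusion matrix shows this directly. The minimal sketch below uses five invented label/prediction pairs (prefixed `demo_` to avoid overwriting the notebook's `y_pred`). It also shows that accuracy is the diagonal of the matrix divided by the total.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical test labels and predictions (1 = Pass, 0 = Fail)
demo_true = [1, 1, 0, 0, 1]
demo_pred = [1, 0, 0, 1, 1]

# Rows are true classes, columns are predicted classes, in the order [Fail, Pass]
cm = confusion_matrix(demo_true, demo_pred, labels=[0, 1])
print(cm)

# Accuracy = correct predictions (diagonal) / all predictions
manual_accuracy = np.trace(cm) / cm.sum()
print(manual_accuracy, accuracy_score(demo_true, demo_pred))  # 0.6 0.6
```

Here one failing student is predicted to pass (top-right cell) and one passing student is predicted to fail (bottom-left cell). Running `confusion_matrix(y_test, y_pred)` on the real test set gives the same breakdown for the trained model.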

## 7️⃣ Module 7: Result & Conclusion

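The Logistic Regression model predicts each student's Pass/Fail result, with its test-set accuracy and per-class metrics reported in Module 6. We now save the trained model and the fitted preprocessor so the web application can reuse them.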

In [None]:
# Save artifacts (context managers ensure each file is flushed and closed)
os.makedirs('../artifacts', exist_ok=True)
with open('../artifacts/model.pkl', 'wb') as f:
    pickle.dump(log_reg, f)
with open('../artifacts/preprocessor.pkl', 'wb') as f:
    pickle.dump(preprocessor, f)

print("✅ Project Modules Completed. Logistic Regression model and Preprocessor saved.")
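Saving the model and the preprocessor as two files means the web application must load both and apply them in the right order. An alternative, not the approach used above, wraps both steps in one scikit-learn `Pipeline`. A single `predict` call then handles imputation, scaling and classification. The sketch below is a minimal illustration under assumptions. The toy training data, the file name `model_pipeline.pkl` and the temporary directory are hypothetical. Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical training data with the feature names used in the plots
demo_X = pd.DataFrame({
    "study_hours": [1.0, 2.0, 3.0, 6.0, 7.0, 8.0],
    "attendance": [55.0, np.nan, 65.0, 85.0, 90.0, 95.0],
})
demo_y = [0, 0, 0, 1, 1, 1]

# One object: impute -> scale -> classify
bundled_model = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression()),
])
bundled_model.fit(demo_X, demo_y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model_pipeline.pkl")  # hypothetical file name
    with open(path, "wb") as f:
        pickle.dump(bundled_model, f)
    with open(path, "rb") as f:
        loaded = pickle.load(f)

# A new student record, as a web form might submit it; missing attendance is imputed
new_student = pd.DataFrame({"study_hours": [7.5], "attendance": [np.nan]})
print(loaded.predict(new_student), loaded.predict_proba(new_student)[:, 1])
```

The trade-off is one artifact and no ordering mistakes in the web app. In exchange, the preprocessor can no longer be swapped independently of the model.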