# Task 01: Prediction using Supervised ML

## Submitted By: Yashuv Baskota
### Language: Python
### Data: http://bit.ly/w-data

#### Description:
To predict a student's percentage score from the number of hours they study, I use a `linear regression` model. Linear regression is a statistical model that predicts a continuous outcome variable (here, the percentage score) from one or more predictor variables (here, the number of study hours). With a single predictor this is simple linear regression: the model fits the straight line Score = m × Hours + b whose slope m and intercept b minimize the sum of squared errors.
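With a single predictor, the least-squares line has a closed form: the slope is $m = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and the intercept is $b = \bar{y} - m\bar{x}$. The pure-Python sketch below computes both. The function name `fit_simple_linear_regression` and the toy numbers are hypothetical illustrations, not the task data. On the same data it should agree with `LinearRegression().coef_[0]` and `intercept_` up to floating-point error.

```python
def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line y ≈ slope * x + intercept.

    Hypothetical helper for illustration; scikit-learn's LinearRegression
    solves the same least-squares problem.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2 points)")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations
    # (the 1/n factors of covariance and variance cancel in the ratio)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical toy data (not the task dataset): study hours vs. score
hours = [1.0, 2.0, 3.0, 4.0]
scores = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_simple_linear_regression(hours, scores)
print(f"Score = {slope:.2f} * Hours + {intercept:.2f}")  # Score = 2.00 * Hours + 1.00
```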

## 1. Load Libraries

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

## 2. Load Dataset

In [None]:
data = pd.read_csv('data/data.csv')
data.head()

## 3. Exploratory Data Analysis (EDA)

### Check for missing values

In [None]:
data.isnull().sum()

### Calculate summary statistics

In [None]:
data.describe()

### Check for outliers

In [None]:
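# Box plot of Scores; points drawn beyond the whiskers would indicate outliers
plt.boxplot(data.Scores)
plt.ylabel("Scores")
plt.show()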
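By default, `plt.boxplot` extends the whiskers to the farthest data points within 1.5 × IQR of the quartiles and draws anything beyond that as an individual point. A minimal sketch of the same rule, assuming the hypothetical helper name `iqr_outliers` and made-up sample values, lists the outliers instead of reading them off the plot. Quartiles use linear interpolation (`method="inclusive"`), which should match NumPy's default percentile method.

```python
import statistics


def iqr_outliers(values, whis=1.5):
    """Return the values lying outside [Q1 - whis*IQR, Q3 + whis*IQR].

    Hypothetical helper; whis=1.5 mirrors matplotlib's default whisker length.
    """
    # Quartiles with linear interpolation between sorted data points
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample values, not the task dataset
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(iqr_outliers(sample))  # [100]
```

Applied to the notebook's data, the call would be `iqr_outliers(list(data.Scores))`.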

### Data Visualization

In [None]:
plt.scatter(data.Hours, data.Scores)
plt.xlabel("Study hours")
plt.ylabel("Scores")
plt.show()
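The scatter plot suggests a roughly linear relationship; the Pearson correlation coefficient r quantifies it on a scale from -1 to 1. In pandas this is `data.Hours.corr(data.Scores)`. The pure-Python sketch below, using a hypothetical helper `pearson_r` and made-up values, shows the formula behind it.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences (hypothetical helper)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2 points)")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Cross-deviation sum and the two squared-deviation sums
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("both sequences need non-zero variance")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values, not the task dataset
hours = [1.5, 3.2, 5.1, 7.4, 8.9]
scores = [20, 31, 47, 69, 88]
print(f"r = {pearson_r(hours, scores):.3f}")
```

For simple linear regression fitted on the same data, the model's R² equals r², so a strong correlation here already hints at a good linear fit.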

### Plot Line of Best Fit

In [None]:
# Fit a model on the full dataset (for visualization only; section 4 refits on a training split)
model = LinearRegression()
X = data.Hours.values.reshape(-1, 1)
y = data.Scores.values
model.fit(X, y)

# Build the line equation from the fitted slope and y-intercept
m = model.coef_[0]
b = model.intercept_
equation = f"Y = {m:.2f}X + {b:.2f}"

# Plot the data and the regression line
plt.scatter(data.Hours, data.Scores)
plt.plot(X, model.predict(X), color='red')

# Label the axes and annotate the plot with the line equation
plt.title("Regression Line")
plt.xlabel("Study hours")
plt.ylabel("Scores")
plt.text(6, 50, equation, fontsize=14, fontweight='bold')
plt.show()

## 4. Model Building and Performance Evaluation

In [None]:
# feature variable (Hours)
X = data.Hours.values.reshape(-1, 1)

# target variable (Score)
y = data.Scores.values

In [None]:
# Split data into train (80%) and test (20%) sets; a fixed random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
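`train_test_split` with `test_size=0.2` shuffles the rows and holds out 20% of them (rounded up) for testing. The sketch below mimics that index logic in plain Python. `shuffle_split` is a hypothetical helper built on Python's `random` module, so it will not reproduce scikit-learn's exact indices. Because the split is random, fixing the seed (`random_state` in scikit-learn) is what makes the evaluation metrics reproducible from run to run.

```python
import math
import random


def shuffle_split(n_samples, test_size=0.2, seed=None):
    """Return (train_indices, test_indices) for a random hold-out split (hypothetical helper)."""
    if not 0 < test_size < 1:
        raise ValueError("test_size must be a fraction between 0 and 1")
    # Round the test-set size up, as scikit-learn does for a float test_size
    n_test = math.ceil(test_size * n_samples)
    indices = list(range(n_samples))
    # A dedicated Random instance keeps the shuffle reproducible for a given seed
    random.Random(seed).shuffle(indices)
    return indices[n_test:], indices[:n_test]


# Hypothetical 25-row dataset
train_idx, test_idx = shuffle_split(25, test_size=0.2, seed=0)
print(len(train_idx), len(test_idx))  # 20 5
```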

In [None]:
# Fit the model to the training data
model = LinearRegression()
model.fit(X_train, y_train)

In [None]:
# Predict scores for the held-out test set
y_pred = model.predict(X_test)

### Evaluation Metrics

In [None]:
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Compare the test-set predictions (y_pred) with the true scores
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"MAE: {mae:.2f}")
print(f"MSE: {mse:.2f}")
print(f"R^2: {r2:.2f}")
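The three metrics can be written out directly: $\text{MAE} = \frac{1}{n}\sum |y_i - \hat{y}_i|$, $\text{MSE} = \frac{1}{n}\sum (y_i - \hat{y}_i)^2$, and $R^2 = 1 - \sum (y_i - \hat{y}_i)^2 / \sum (y_i - \bar{y})^2$. A minimal pure-Python sketch follows, using a hypothetical helper `regression_metrics` and illustrative values only. It also reports RMSE, the square root of MSE, which is in the same units as the score (percentage points) and is therefore easier to interpret.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for paired true/predicted values (hypothetical helper)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    y_mean = sum(y_true) / n
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    mse = ss_res / n
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1 - ss_res / ss_tot,
    }


# Illustrative values, not results from this notebook
metrics = regression_metrics([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```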

## 5. Making a Prediction (predict a student's percentage score from study hours per day)

In [None]:
# Predict the percentage score for a given number of study hours per day
def make_prediction(hours):
    prediction = model.predict([[hours]])
    print(f"Predicted percentage score: {prediction[0]:.2f}")
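For a single feature, `LinearRegression.predict` returns `coef_[0] * x + intercept_`, so a prediction is just the fitted line evaluated at the input. The sketch below spells that out with hypothetical slope and intercept values. The input check (hours per day between 0 and 24) is an added assumption, not part of the original function.

```python
def predict_score(hours, slope, intercept):
    """Evaluate the fitted line at `hours` (hypothetical helper).

    The 0-24 range check is an assumption added for illustration.
    """
    if not 0 <= hours <= 24:
        raise ValueError("study hours per day must be between 0 and 24")
    return slope * hours + intercept


# Hypothetical coefficients, not the ones fitted in this notebook
print(f"{predict_score(9.25, slope=2.0, intercept=1.0):.2f}")  # 19.50
```

The line is unbounded, so for large inputs a linear model can predict a percentage above 100; clamping the output would be a separate design choice and is left out here.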

In [None]:
# Make a prediction for a student who studies for 9.25 hours per day
make_prediction(9.25)

__Thank You!__