# Sales Prediction (Codsoft)

Importing the necessary dependencies

In [60]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

Mounting Google Drive to access the dataset

In [61]:
from google.colab import drive
drive.mount("/content/drive", force_remount=True)

Mounted at /content/drive


Load the dataset

In [62]:
path = '/content/drive/MyDrive/CodSoft/advertising.csv'
data = pd.read_csv(path, encoding='latin-1')

Exploring the data


Display the first few rows of the dataset

In [63]:
data.head()

      TV  Radio  Newspaper  Sales
0  230.1   37.8       69.2   22.1
1   44.5   39.3       45.1   10.4
2   17.2   45.9       69.3   12.0
3  151.5   41.3       58.5   16.5
4  180.8   10.8       58.4   17.9


Display concise summary of the dataset

In [64]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   TV         200 non-null    float64
 1   Radio      200 non-null    float64
 2   Newspaper  200 non-null    float64
 3   Sales      200 non-null    float64
dtypes: float64(4)
memory usage: 6.4 KB


Display the shape of the dataset

In [65]:
data.shape

(200, 4)

Display the size of the dataset (total number of elements: rows × columns)

In [66]:
data.size

800

Checking the summary statistics of the data

In [67]:
data.describe()

               TV       Radio   Newspaper       Sales
count  200.000000  200.000000  200.000000  200.000000
mean   147.042500   23.264000   30.554000   15.130500
std     85.854236   14.846809   21.778621    5.283892
min      0.700000    0.000000    0.300000    1.600000
25%     74.375000    9.975000   12.750000   11.000000
50%    149.750000   22.900000   25.750000   16.000000
75%    218.825000   36.525000   45.100000   19.050000
max    296.400000   49.600000  114.000000   27.000000


Checking for missing values in the dataset

In [68]:
data.isnull().sum()

TV           0
Radio        0
Newspaper    0
Sales        0


Splitting the dataset into features X and target y

In [69]:
X = data[['TV', 'Radio', 'Newspaper']]
y = data['Sales']

Splitting the dataset into training and testing data (80% training, 20% testing)

In [70]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
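
To make the 80/20 split concrete, here is a minimal sketch on synthetic data with the same shape as the features above (200 rows, 3 columns). The arrays `X_demo` and `y_demo` are made-up stand-ins, not the advertising data. Fixing `random_state` makes the split reproducible across runs.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a 200-row, 3-feature dataset (values are made up).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 300, size=(200, 3))
y_demo = rng.uniform(1, 27, size=200)

# Same arguments as the notebook cell: 20% held out, fixed seed.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Repeating the call with the same seed yields the identical split.
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

print(X_tr.shape, X_te.shape)        # (160, 3) (40, 3)
print(np.array_equal(X_te, X_te2))   # True
```

With 200 rows, `test_size=0.2` leaves 160 rows for training and 40 for testing.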

Training the Linear Regression model

In [71]:
model = LinearRegression()

In [72]:
model.fit(X_train, y_train)
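
A fitted `LinearRegression` exposes its learned parameters through `intercept_` and `coef_`, which can help in judging how much each channel contributes. The sketch below is only illustrative: it fits a separate model (`demo_model`) on synthetic data generated from a made-up linear relation, so the numbers it prints say nothing about the advertising dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic features with the same column names as the notebook (values are made up).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "TV": rng.uniform(0, 300, 100),
    "Radio": rng.uniform(0, 50, 100),
    "Newspaper": rng.uniform(0, 100, 100),
})

# Hypothetical noise-free relation: Newspaper has no effect on the target.
demo_sales = 5.0 + 0.05 * demo["TV"] + 0.1 * demo["Radio"]

demo_model = LinearRegression().fit(demo, demo_sales)

# Pair each coefficient with its feature name for readability.
coef_table = pd.Series(demo_model.coef_, index=demo.columns)
print("Intercept:", demo_model.intercept_)
print(coef_table)
```

Running the same two `print` lines against the notebook's `model` (with `X_train.columns` as the index) would show the coefficients learned from the real data.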

Making the predictions

In [73]:
y_pred = model.predict(X_test)

Evaluating the model

In [74]:
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("\nModel Evaluation:")
print("Mean Squared Error:", mse)
print("R-squared:", r2)


Model Evaluation:
Mean Squared Error: 2.9077569102710896
R-squared: 0.9059011844150826
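
For reference, both metrics can be computed by hand. The sketch below uses a tiny made-up example (`y_true`, `y_hat`) to show the definitions: MSE is the mean of the squared residuals, and R-squared is one minus the ratio of the residual sum of squares to the total sum of squares. It also derives RMSE, which is expressed in the same units as the target.

```python
import numpy as np

# Made-up true values and predictions, chosen so the arithmetic is easy to follow.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.0, 5.0, 8.0, 9.0])

residuals = y_true - y_hat                       # [1, 0, -1, 0]

# Mean squared error: average of squared residuals.
mse_demo = np.mean(residuals ** 2)               # 2 / 4 = 0.5

# Root mean squared error: same units as the target.
rmse_demo = np.sqrt(mse_demo)

# R-squared: 1 - SS_res / SS_tot.
ss_res = np.sum(residuals ** 2)                  # 2
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # 9 + 1 + 1 + 9 = 20
r2_demo = 1 - ss_res / ss_tot                    # 0.9

print("MSE:", mse_demo)
print("RMSE:", rmse_demo)
print("R-squared:", r2_demo)
```

Applied to the result above, an MSE of about 2.908 corresponds to an RMSE of about 1.705, meaning predictions on the test set are off by roughly 1.7 Sales units on average in the root-mean-square sense.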


Checking the model with user-defined inputs

In [76]:
tv_ad = float(input("Enter the TV advertising expenditure: "))
radio_ad = float(input("Enter the Radio advertising expenditure: "))
newspaper_ad = float(input("Enter the Newspaper advertising expenditure: "))

# Creating a new DataFrame with the user input
new_data = pd.DataFrame({'TV': [tv_ad], 'Radio': [radio_ad], 'Newspaper': [newspaper_ad]})

# Making predictions using the trained model
predicted_sales = model.predict(new_data)


print("\nPredicted Sales for the given advertising expenditures:")
print(f"${predicted_sales[0]:,.3f}")

Enter the TV advertising expenditure: 7
Enter the Radio advertising expenditure: 7
Enter the Newspaper advertising expenditure: 7

Predicted Sales for the given advertising expenditures:
$5.833
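
The interactive cell above only works where `input()` is available. A possible alternative, sketched below, wraps the same steps in a function so predictions can be made programmatically. The helper `predict_sales`, the `FEATURES` list, and the small training table are hypothetical and exist only for this illustration. The non-negativity check is an added assumption about what counts as a valid expenditure.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Column order the model was trained on (matches the notebook's X).
FEATURES = ["TV", "Radio", "Newspaper"]

def predict_sales(fitted_model, tv, radio, newspaper):
    """Hypothetical helper: predict sales for one set of expenditures."""
    values = {"TV": tv, "Radio": radio, "Newspaper": newspaper}
    # Assumption: expenditures cannot be negative.
    for name, value in values.items():
        if value < 0:
            raise ValueError(f"{name} expenditure must be non-negative, got {value}")
    # Build a one-row DataFrame with the training column names and order.
    row = pd.DataFrame([values], columns=FEATURES)
    return float(fitted_model.predict(row)[0])

# Tiny made-up training set following an exact linear relation.
train = pd.DataFrame({
    "TV": [10.0, 20.0, 30.0, 40.0, 50.0],
    "Radio": [1.0, 3.0, 2.0, 5.0, 4.0],
    "Newspaper": [4.0, 1.0, 3.0, 2.0, 5.0],
})
target = 2.0 + 0.1 * train["TV"] + 0.5 * train["Radio"]
demo_model = LinearRegression().fit(train, target)

demo_prediction = predict_sales(demo_model, 7, 7, 7)
print(f"{demo_prediction:.3f}")
```

In the notebook itself, the call would be `predict_sales(model, tv_ad, radio_ad, newspaper_ad)` with the model trained earlier.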
