# Linear Regression
Linear regression is a statistical modeling technique for analyzing the relationship between a dependent variable and one or more independent variables. In machine learning it is treated as a supervised learning algorithm and is commonly used for predictive modeling, where the goal is to predict the value of the dependent variable from the values of the independent variables.

Linear regression assumes a linear relationship between the dependent variable and the independent variables. With one independent variable the relationship is a straight line; with several it is a plane or hyperplane. The best-fit line is usually found by ordinary least squares, which chooses the coefficients that minimize the sum of squared differences (residuals) between the predicted and actual values of the dependent variable.

The equation of the simple linear regression line is typically written as y = a + bx, where y is the dependent variable, x is the independent variable, a is the intercept (the value of y when x is 0), and b is the slope (the change in y per unit change in x).
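In the single-variable case, the least-squares slope and intercept have a closed form: b = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))**2) and a = mean(y) - b * mean(x). The sketch below applies these formulas to a small made-up dataset (the numbers are illustrative and are not from the ride data). It then cross-checks the result against NumPy's np.polyfit.

```python
import numpy as np

# Made-up toy data: y is roughly 2 + 3x plus a little noise
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 4.9, 8.2, 10.9, 14.1])

# Closed-form least squares for y = a + b*x
x_mean, y_mean = x.mean(), y.mean()
b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)  # slope
a = y_mean - b * x_mean                                              # intercept

print(f"a = {a:.3f}, b = {b:.3f}")  # a = 2.040, b = 3.000

# Cross-check: np.polyfit with deg=1 returns [slope, intercept]
b_np, a_np = np.polyfit(x, y, deg=1)
assert np.isclose(a, a_np) and np.isclose(b, b_np)
```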

Linear regression is called simple or multiple depending on the number of independent variables. Simple linear regression has one independent variable, while multiple linear regression has two or more.
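With p independent variables, the model becomes y = b0 + b1*x1 + b2*x2 + ... + bp*xp, and the coefficients are again chosen by least squares. The following is a minimal sketch that assumes synthetic, noise-free data generated from coefficients we picked ourselves. It shows that both NumPy's np.linalg.lstsq (with an explicit column of ones for the intercept) and scikit-learn's LinearRegression recover those coefficients.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data; the coefficients are chosen for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))            # two independent variables x1, x2
true_coef = np.array([1.5, 2.0, -0.5])   # b0 (intercept), b1, b2
y = true_coef[0] + X @ true_coef[1:]

# NumPy: prepend a column of ones so the first coefficient acts as the intercept
X_design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# scikit-learn: fit_intercept=True (the default) handles the intercept internally
model = LinearRegression().fit(X, y)

print(coef)                           # approximately [1.5, 2.0, -0.5]
print(model.intercept_, model.coef_)  # approximately 1.5 and [2.0, -0.5]
```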

Linear regression is widely used in fields such as finance, economics, biology, the social sciences, and engineering to analyze and predict relationships between variables. Its coefficients are relatively easy to interpret, so it is useful both for prediction and for describing how each independent variable relates to the outcome.





# Uber Price Prediction

In [33]:
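#Importing the libraries
import numpy as np
import pandas as pd
#model_selection: splitting the data into training and test sets
from sklearn.model_selection import train_test_split
#StandardScaler: standardizing features to zero mean and unit variance
from sklearn.preprocessing import StandardScaler
#linear model: ordinary least squares regression
from sklearn.linear_model import LinearRegression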

# Loading the Datasets

In [34]:
#loading the ride data and the weather data
cab_df=pd.read_csv('/kaggle/input/uber-lyft-cab-prices/cab_rides.csv')
weather_df=pd.read_csv('/kaggle/input/uber-lyft-cab-prices/weather.csv')
#showing the weather dataset
weather_df

|  | temp | location | clouds | pressure | rain | time_stamp | humidity | wind |
|---|---|---|---|---|---|---|---|---|
| 0 | 42.42 | Back Bay | 1.00 | 1012.14 | 0.1228 | 1545003901 | 0.77 | 11.25 |
| 1 | 42.43 | Beacon Hill | 1.00 | 1012.15 | 0.1846 | 1545003901 | 0.76 | 11.32 |
| 2 | 42.50 | Boston University | 1.00 | 1012.15 | 0.1089 | 1545003901 | 0.76 | 11.07 |
| 3 | 42.11 | Fenway | 1.00 | 1012.13 | 0.0969 | 1545003901 | 0.77 | 11.09 |
| 4 | 43.13 | Financial District | 1.00 | 1012.14 | 0.1786 | 1545003901 | 0.75 | 11.49 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6271 | 44.72 | North Station | 0.89 | 1000.69 | NaN | 1543819974 | 0.96 | 1.52 |
| 6272 | 44.85 | Northeastern University | 0.88 | 1000.71 | NaN | 1543819974 | 0.96 | 1.54 |
| 6273 | 44.82 | South Station | 0.89 | 1000.70 | NaN | 1543819974 | 0.96 | 1.54 |
| 6274 | 44.78 | Theatre District | 0.89 | 1000.70 | NaN | 1543819974 | 0.96 | 1.54 |


In [35]:
cab_df

|  | distance | cab_type | time_stamp | destination | source | price | surge_multiplier | id | product_id | name |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.44 | Lyft | 1544952607890 | North Station | Haymarket Square | 5.0 | 1.0 | 424553bb-7174-41ea-aeb4-fe06d4f4b9d7 | lyft_line | Shared |
| 1 | 0.44 | Lyft | 1543284023677 | North Station | Haymarket Square | 11.0 | 1.0 | 4bd23055-6827-41c6-b23b-3c491f24e74d | lyft_premier | Lux |
| 2 | 0.44 | Lyft | 1543366822198 | North Station | Haymarket Square | 7.0 | 1.0 | 981a3613-77af-4620-a42a-0c0866077d1e | lyft | Lyft |
| 3 | 0.44 | Lyft | 1543553582749 | North Station | Haymarket Square | 26.0 | 1.0 | c2d88af2-d278-4bfd-a8d0-29ca77cc5512 | lyft_luxsuv | Lux Black XL |
| 4 | 0.44 | Lyft | 1543463360223 | North Station | Haymarket Square | 9.0 | 1.0 | e0126e1f-8ca9-4f2e-82b3-50505a09db9a | lyft_plus | Lyft XL |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 693066 | 1.00 | Uber | 1543708385534 | North End | West End | 13.0 | 1.0 | 616d3611-1820-450a-9845-a9ff304a4842 | 6f72dfc5-27f1-42e8-84db-ccc7a75f6969 | UberXL |
| 693067 | 1.00 | Uber | 1543708385534 | North End | West End | 9.5 | 1.0 | 633a3fc3-1f86-4b9e-9d48-2b7132112341 | 55c66225-fbe7-4fd5-9072-eab1ece5e23e | UberX |
| 693068 | 1.00 | Uber | 1543708385534 | North End | West End | NaN | 1.0 | 64d451d0-639f-47a4-9b7c-6fd92fbd264f | 8cf7e821-f0d3-49c6-8eba-e679c0ebcf6a | Taxi |
| 693069 | 1.00 | Uber | 1543708385534 | North End | West End | 27.0 | 1.0 | 727e5f07-a96b-4ad1-a2c7-9abc3ad55b4e | 6d318bcc-22a3-4af6-bddd-b409bfce1546 | Black SUV |


# Checking for Preliminary Information

In [36]:
cab_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 693071 entries, 0 to 693070
Data columns (total 10 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   distance          693071 non-null  float64
 1   cab_type          693071 non-null  object 
 2   time_stamp        693071 non-null  int64  
 3   destination       693071 non-null  object 
 4   source            693071 non-null  object 
 5   price             637976 non-null  float64
 6   surge_multiplier  693071 non-null  float64
 7   id                693071 non-null  object 
 8   product_id        693071 non-null  object 
 9   name              693071 non-null  object 
dtypes: float64(3), int64(1), object(6)
memory usage: 52.9+ MB


# Dropping the Rows with Missing Values in the Target Column

In [37]:
#only 'price' has missing values, so dropna is restricted to the target column
cab_df=cab_df.dropna(subset=['price']).reset_index(drop=True)
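The cab_df.info() output shows that price is the only column with missing values (637,976 non-null out of 693,071 rows). Dropping rows with a NaN anywhere and dropping rows with a NaN in price therefore remove exactly the same rows here. The toy frame below uses made-up values to illustrate the two forms of dropna.

```python
import numpy as np
import pandas as pd

# Made-up mini frame: as in cab_df, only 'price' has a missing value
rides = pd.DataFrame({
    "distance": [0.44, 1.00, 2.5],
    "price": [5.0, np.nan, 13.0],
})

# dropna() with no arguments drops every row that has a NaN in any column
any_col = rides.dropna().reset_index(drop=True)
# dropna(subset=[...]) only checks the listed column(s)
target_only = rides.dropna(subset=["price"]).reset_index(drop=True)

# Because 'price' is the only column with gaps, both give the same result
assert any_col.equals(target_only)
print(target_only)
```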

# Checking for Missing Values

In [38]:
cab_df.isna().sum()

distance            0
cab_type            0
time_stamp          0
destination         0
source              0
price               0
surge_multiplier    0
id                  0
product_id          0
name                0
dtype: int64


# Checking for Missing Values in the Weather Data

In [40]:
weather_df.isna().sum()

temp             0
location         0
clouds           0
pressure         0
rain          5382
time_stamp       0
humidity         0
wind             0
dtype: int64

# Dropping the Column with Missing Values

In [41]:
#axis=1 drops columns rather than rows; only 'rain' has missing values
weather_df=weather_df.dropna(axis=1).reset_index(drop=True)
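The axis argument decides what dropna removes. The default axis=0 drops rows that contain a NaN, while axis=1 drops whole columns. In weather_df, rain is the only column with gaps (5,382 missing values), so axis=1 removes just that column and keeps every row. Here is a small sketch using made-up values.

```python
import numpy as np
import pandas as pd

# Made-up mini weather table: 'rain' is partly missing, as in weather_df
weather = pd.DataFrame({
    "location": ["Back Bay", "Fenway", "North End"],
    "temp": [42.42, 42.11, 44.72],
    "rain": [0.1228, np.nan, np.nan],
})

rows_dropped = weather.dropna()        # axis=0 (default): drop rows containing NaN
cols_dropped = weather.dropna(axis=1)  # axis=1: drop columns containing NaN

print(rows_dropped.shape)              # (1, 3): only the fully populated row survives
print(cols_dropped.columns.tolist())   # ['location', 'temp']
```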

# Averaging the Weather Readings for Each Location

# Groupby
groupby is a method in the pandas library that splits a DataFrame into groups based on the values in one or more columns and then applies a function, such as mean, to each group. This is useful for computing statistics on subsets of the data defined by those values.
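The following is a minimal sketch of the split-apply-combine pattern. The made-up readings below are grouped by location and each numeric column is averaged. Then reset_index() turns the group keys back into an ordinary column, mirroring what the next cell does to weather_df. If a frame still holds non-numeric columns besides the key, pandas 2.0 and later need mean(numeric_only=True).

```python
import pandas as pd

# Made-up readings: two per location
readings = pd.DataFrame({
    "location": ["Back Bay", "Back Bay", "Fenway", "Fenway"],
    "temp": [40.0, 42.0, 38.0, 39.0],
    "wind": [10.0, 12.0, 6.0, 8.0],
})

# Split by location, average each numeric column, then restore 'location' as a column
avg = readings.groupby("location").mean().reset_index()
print(avg)
```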

In [42]:
#averaging every weather column per location; reset_index turns 'location' back into a column
aver_weather_df=weather_df.groupby('location').mean().reset_index(drop=False)


In [43]:

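#renaming 'location' to 'source' (the merge key) and prefixing the weather columns
aver_weather_df=aver_weather_df.rename(columns={'location':'source',
                                               'temp':'source_temp','clouds':'source_clouds','pressure':'source_pressure',
                                               'humidity':'source_humidity','wind':'source_wind'})
#dropping the time_stamp column, since an averaged timestamp is not meaningful
aver_weather_df=aver_weather_df.drop('time_stamp',axis=1)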

In [44]:
aver_weather_df

|  | source | source_temp | source_clouds | source_pressure | source_humidity | source_wind |
|---|---|---|---|---|---|---|
| 0 | Back Bay | 39.082122 | 0.678432 | 1008.44782 | 0.764073 | 6.778528 |
| 1 | Beacon Hill | 39.047285 | 0.677801 | 1008.448356 | 0.765048 | 6.810325 |
| 2 | Boston University | 39.047744 | 0.679235 | 1008.459254 | 0.763786 | 6.69218 |
| 3 | Fenway | 38.964379 | 0.679866 | 1008.453289 | 0.767266 | 6.711721 |
| 4 | Financial District | 39.410822 | 0.67673 | 1008.435793 | 0.754837 | 6.860019 |
| 5 | Haymarket Square | 39.067897 | 0.676711 | 1008.445239 | 0.764837 | 6.843193 |
| 6 | North End | 39.090841 | 0.67673 | 1008.441912 | 0.764054 | 6.853117 |
| 7 | North Station | 39.035315 | 0.676998 | 1008.442811 | 0.765545 | 6.835755 |
| 8 | Northeastern University | 38.975086 | 0.678317 | 1008.444168 | 0.767648 | 6.749426 |
| 9 | South Station | 39.394092 | 0.677495 | 1008.438031 | 0.755468 | 6.848948 |


# Merging the Weather Averages into cab_df on the Source and Destination Columns

In [45]:
#attaching the pickup (source) weather averages to each ride
df=pd.merge(cab_df,aver_weather_df,on='source',how='left')

In [46]:
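#attaching the drop-off (destination) weather averages; the clashing column names get pandas'
#default suffixes, so *_x holds the source weather and *_y holds the destination weather
df=pd.merge(df,aver_weather_df.rename(columns={'source':'destination'}),on='destination',how='left')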
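The second merge reuses aver_weather_df, so its weather columns have the same names as the ones added by the first merge. pandas disambiguates them with its default suffixes ('_x', '_y'). That is why df ends up with source_temp_x (pickup weather) and source_temp_y (drop-off weather). The toy example below uses hypothetical locations A and B. It also shows one way to get more descriptive names through the suffixes parameter of pd.merge. The suffix names "_pickup" and "_dropoff" are illustrative choices, not something the notebook uses.

```python
import pandas as pd

# Hypothetical per-location averages (already renamed to use 'source' as the key)
weather_avg = pd.DataFrame({"source": ["A", "B"], "source_temp": [39.0, 41.0]})
trips = pd.DataFrame({"source": ["A", "B"], "destination": ["B", "A"], "price": [7.0, 9.0]})

# Same two-step merge as the notebook: pickup weather, then drop-off weather
out = pd.merge(trips, weather_avg, on="source", how="left")
out = pd.merge(out, weather_avg.rename(columns={"source": "destination"}),
               on="destination", how="left")
print(out.columns.tolist())
# ['source', 'destination', 'price', 'source_temp_x', 'source_temp_y']

# Alternative: choose the suffixes explicitly so each name says which end of the trip it describes
named = pd.merge(trips, weather_avg, on="source", how="left")
named = pd.merge(named, weather_avg.rename(columns={"source": "destination"}),
                 on="destination", how="left", suffixes=("_pickup", "_dropoff"))
print(named.columns.tolist())
# ['source', 'destination', 'price', 'source_temp_pickup', 'source_temp_dropoff']
```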

In [47]:
df

|  | distance | cab_type | time_stamp | destination | source | price | surge_multiplier | id | product_id | name | source_temp_x | source_clouds_x | source_pressure_x | source_humidity_x | source_wind_x | source_temp_y | source_clouds_y | source_pressure_y | source_humidity_y | source_wind_y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.44 | Lyft | 1544952607890 | North Station | Haymarket Square | 5.0 | 1.0 | 424553bb-7174-41ea-aeb4-fe06d4f4b9d7 | lyft_line | Shared | 39.067897 | 0.676711 | 1008.445239 | 0.764837 | 6.843193 | 39.035315 | 0.676998 | 1008.442811 | 0.765545 | 6.835755 |
| 1 | 0.44 | Lyft | 1543284023677 | North Station | Haymarket Square | 11.0 | 1.0 | 4bd23055-6827-41c6-b23b-3c491f24e74d | lyft_premier | Lux | 39.067897 | 0.676711 | 1008.445239 | 0.764837 | 6.843193 | 39.035315 | 0.676998 | 1008.442811 | 0.765545 | 6.835755 |
| 2 | 0.44 | Lyft | 1543366822198 | North Station | Haymarket Square | 7.0 | 1.0 | 981a3613-77af-4620-a42a-0c0866077d1e | lyft | Lyft | 39.067897 | 0.676711 | 1008.445239 | 0.764837 | 6.843193 | 39.035315 | 0.676998 | 1008.442811 | 0.765545 | 6.835755 |
| 3 | 0.44 | Lyft | 1543553582749 | North Station | Haymarket Square | 26.0 | 1.0 | c2d88af2-d278-4bfd-a8d0-29ca77cc5512 | lyft_luxsuv | Lux Black XL | 39.067897 | 0.676711 | 1008.445239 | 0.764837 | 6.843193 | 39.035315 | 0.676998 | 1008.442811 | 0.765545 | 6.835755 |
| 4 | 0.44 | Lyft | 1543463360223 | North Station | Haymarket Square | 9.0 | 1.0 | e0126e1f-8ca9-4f2e-82b3-50505a09db9a | lyft_plus | Lyft XL | 39.067897 | 0.676711 | 1008.445239 | 0.764837 | 6.843193 | 39.035315 | 0.676998 | 1008.442811 | 0.765545 | 6.835755 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 637971 | 1.00 | Uber | 1543708385534 | North End | West End | 9.5 | 1.0 | 353e6566-b272-479e-a9c6-98bd6cb23f25 | 9a0e7b09-b92b-4c41-9779-2ad22b4d779d | WAV | 38.983403 | 0.677247 | 1008.441090 | 0.767266 | 6.816233 | 39.090841 | 0.676730 | 1008.441912 | 0.764054 | 6.853117 |
| 637972 | 1.00 | Uber | 1543708385534 | North End | West End | 13.0 | 1.0 | 616d3611-1820-450a-9845-a9ff304a4842 | 6f72dfc5-27f1-42e8-84db-ccc7a75f6969 | UberXL | 38.983403 | 0.677247 | 1008.441090 | 0.767266 | 6.816233 | 39.090841 | 0.676730 | 1008.441912 | 0.764054 | 6.853117 |
| 637973 | 1.00 | Uber | 1543708385534 | North End | West End | 9.5 | 1.0 | 633a3fc3-1f86-4b9e-9d48-2b7132112341 | 55c66225-fbe7-4fd5-9072-eab1ece5e23e | UberX | 38.983403 | 0.677247 | 1008.441090 | 0.767266 | 6.816233 | 39.090841 | 0.676730 | 1008.441912 | 0.764054 | 6.853117 |
| 637974 | 1.00 | Uber | 1543708385534 | North End | West End | 27.0 | 1.0 | 727e5f07-a96b-4ad1-a2c7-9abc3ad55b4e | 6d318bcc-22a3-4af6-bddd-b409bfce1546 | Black SUV | 38.983403 | 0.677247 | 1008.441090 | 0.767266 | 6.816233 | 39.090841 | 0.676730 | 1008.441912 | 0.764054 | 6.853117 |


In [48]:
df.isna().sum()

distance             0
cab_type             0
time_stamp           0
destination          0
source               0
price                0
surge_multiplier     0
id                   0
product_id           0
name                 0
source_temp_x        0
source_clouds_x      0
source_pressure_x    0
source_humidity_x    0
source_wind_x        0
source_temp_y        0
source_clouds_y      0
source_pressure_y    0
source_humidity_y    0
source_wind_y        0
dtype: int64


# Counting the Unique Values in Each Object Column

In [50]:
{column:len(df[column].unique()) for column in df.columns if df[column].dtypes=='object'}

{'cab_type': 2,
 'destination': 12,
 'source': 12,
 'id': 637976,
 'product_id': 12,
 'name': 12}
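You can get the same counts with DataFrame.nunique on the non-numeric columns. One difference matters: nunique ignores NaN by default, whereas len(series.unique()) counts NaN as a value. df has no missing values at this point, so both give the same counts here. Here is a sketch on a made-up frame.

```python
import pandas as pd

# Made-up frame with two text columns and one numeric column
toy = pd.DataFrame({
    "cab_type": ["Lyft", "Uber", "Uber"],
    "name": ["Shared", "UberX", "UberXL"],
    "price": [5.0, 9.5, 13.0],
})

# Keep only non-numeric columns, then count distinct values per column
counts = toy.select_dtypes(exclude="number").nunique().to_dict()
print(counts)

# Matches a len(unique()) dict comprehension like the one above when there are no NaNs
manual = {c: len(toy[c].unique()) for c in ["cab_type", "name"]}
assert counts == manual
```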

# Listing the Unique Values in Each Object Column

In [51]:
#{column:list(df[column].unique()) for column in df.columns if df[column].dtypes=='object'}

# Preprocessing Function

In [52]:
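def onehot_encode(df,columns):
    df=df.copy()
    for column in columns:
        #one indicator column per category, named '<column>_<category>'
        dummies=pd.get_dummies(df[column],prefix=column)
        df=pd.concat([df,dummies],axis=1)
        #the original categorical column is no longer needed
        df=df.drop(column,axis=1)
    return df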
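onehot_encode keeps one indicator column for every category. LinearRegression also fits an intercept, so the indicators derived from a single column always sum to 1 and are perfectly collinear with the intercept (the dummy-variable trap). scikit-learn still returns a least-squares fit and the predictions are unaffected, but the individual coefficients are no longer uniquely determined. One common remedy is to pass drop_first=True to pd.get_dummies. The sketch below uses made-up values to show the difference; the notebook itself does not use drop_first.

```python
import pandas as pd

# Made-up categorical column
toy = pd.DataFrame({"name": ["Shared", "Lux", "Shared", "Lyft"]})

# Full encoding: one column per category (categories are sorted)
full = pd.get_dummies(toy["name"], prefix="name", dtype=int)
# Reduced encoding: the first category becomes the implicit baseline
reduced = pd.get_dummies(toy["name"], prefix="name", drop_first=True, dtype=int)

print(full.columns.tolist())     # ['name_Lux', 'name_Lyft', 'name_Shared']
print(reduced.columns.tolist())  # ['name_Lyft', 'name_Shared']

# Every row of the full encoding sums to 1, duplicating the intercept
assert (full.sum(axis=1) == 1).all()
```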

In [57]:
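def preprocess_inputs(df):
    df=df.copy()
    #dropping the id column (a unique ride identifier, not useful as a feature)
    df=df.drop('id',axis=1)
    #binary-encoding cab_type; map returns an integer column directly
    df['cab_type']=df['cab_type'].map({'Lyft':0,
                                      'Uber':1})
    #one-hot encoding the remaining object columns (destination, source, product_id, name)
    onehot_columns=[column for column in df.columns if df[column].dtypes=='object']
    df=onehot_encode(df,onehot_columns)
    #separating the target (price) from the features
    y=df['price']
    x=df.drop('price',axis=1)
    #train_test_split: 70% train / 30% test; no random_state is set, so the split
    #and the scores below change from run to run
    x_train,x_test,y_train,y_test=train_test_split(x,y,train_size=0.7)
    #fitting the scaler on the training set only, then applying it to both sets
    scaler=StandardScaler()
    scaler.fit(x_train)
    x_train=pd.DataFrame(scaler.transform(x_train),columns=x_train.columns)
    x_test=pd.DataFrame(scaler.transform(x_test),columns=x_test.columns)
    return x_train,x_test,y_train,y_test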
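preprocess_inputs fits the StandardScaler on the training split only. It then reuses the learned means and standard deviations for the test split, so no test-set statistics leak into training. Each feature is transformed as z = (x - mean) / std, using the population standard deviation (ddof=0). The sketch below reproduces the scaler's output by hand on a made-up 3-row training set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])  # made-up training rows
x_test = np.array([[4.0, 30.0]])                              # made-up test row

# The scaler learns per-column mean_ and scale_ from the training rows only
scaler = StandardScaler().fit(x_train)

mean = x_train.mean(axis=0)
std = x_train.std(axis=0)  # NumPy's default ddof=0, matching StandardScaler
manual = (x_test - mean) / std

assert np.allclose(scaler.mean_, mean)
assert np.allclose(scaler.transform(x_test), manual)
```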

In [58]:
x_train,x_test,y_train,y_test=preprocess_inputs(df)
print(x_train.shape)
print(x_test.shape)
print(y_train.shape)
print(y_test.shape)

(446583, 62)
(191393, 62)
(446583,)
(191393,)
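The shapes follow from simple arithmetic. This assumes scikit-learn's handling of a float train_size: the training count is rounded down and, when test_size is not given, the test set receives the remaining rows. The 62 feature columns should be the 14 numeric columns (distance, cab_type, time_stamp, surge_multiplier and the 10 merged weather columns) plus 48 indicator columns, 12 each for destination, source, product_id and name.

```python
import math

n_rows = 637976                     # rows in df after dropping missing prices
n_train = math.floor(0.7 * n_rows)  # float train_size, rounded down
n_test = n_rows - n_train           # remaining rows go to the test set

# 4 ride columns + 2 x 5 weather columns + 4 categorical columns x 12 categories
n_features = 4 + 2 * 5 + 4 * 12

print(n_train, n_test, n_features)  # 446583 191393 62
```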


In [60]:
x_train

|  | distance | cab_type | time_stamp | surge_multiplier | source_temp_x | source_clouds_x | source_pressure_x | source_humidity_x | source_wind_x | source_temp_y | ... | name_Lux | name_Lux Black | name_Lux Black XL | name_Lyft | name_Lyft XL | name_Shared | name_UberPool | name_UberX | name_UberXL | name_WAV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.909312 | 0.964033 | 1.223067 | -0.157413 | -0.794975 | 0.552856 | -0.166549 | 0.873281 | -0.977959 | -0.382519 | ... | -0.295540 | -0.295101 | -0.295527 | -0.295818 | -0.295518 | -0.29519 | -0.307385 | -0.307411 | -0.308296 | -0.307067 |
| 1 | 0.265945 | -1.037309 | -0.227676 | -0.157413 | -0.299821 | 0.025540 | 0.513515 | 0.255840 | 0.136277 | -0.795109 | ... | -0.295540 | -0.295101 | -0.295527 | -0.295818 | 3.383894 | -0.29519 | -0.307385 | -0.307411 | -0.308296 | -0.307067 |
| 2 | -0.597752 | 0.964033 | -0.464071 | -0.157413 | -0.299821 | 0.025540 | 0.513515 | 0.255840 | 0.136277 | 2.075209 | ... | -0.295540 | -0.295101 | -0.295527 | -0.295818 | -0.295518 | -0.29519 | -0.307385 | 3.252969 | -0.308296 | -0.307067 |
| 3 | -0.562499 | -1.037309 | -0.620787 | -0.157413 | -0.299821 | 0.025540 | 0.513515 | 0.255840 | 0.136277 | -0.002150 | ... | 3.383637 | -0.295101 | -0.295527 | -0.295818 | -0.295518 | -0.29519 | -0.307385 | -0.307411 | -0.308296 | -0.307067 |
| 4 | -0.315729 | -1.037309 | -0.365242 | -0.157413 | -0.001103 | -1.068154 | -0.532976 | 0.019760 | 0.919215 | -0.300524 | ... | -0.295540 | -0.295101 | 3.383791 | -0.295818 | -0.295518 | -0.29519 | -0.307385 | -0.307411 | -0.308296 | -0.307067 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 446578 | -0.386235 | -1.037309 | -0.606422 | 2.456281 | -0.715246 | -0.013521 | -0.073389 | 0.955001 | 0.574974 | -0.159327 | ... | -0.295540 | -0.295101 | -0.295527 | 3.380462 | -0.295518 | -0.29519 | -0.307385 | -0.307411 | -0.308296 | -0.307067 |
| 446579 | -0.509620 | 0.964033 | -0.780995 | -0.157413 | -0.715246 | -0.013521 | -0.073389 | 0.955001 | 0.574974 | -0.159327 | ... | -0.295540 | -0.295101 | -0.295527 | -0.295818 | -0.295518 | -0.29519 | -0.307385 | -0.307411 | -0.308296 | -0.307067 |
| 446580 | -1.382131 | 0.964033 | 1.341909 | -0.157413 | -0.158461 | -1.087684 | 0.007349 | 0.205900 | 0.737649 | -0.738132 | ... | -0.295540 | -0.295101 | -0.295527 | -0.295818 | -0.295518 | -0.29519 | -0.307385 | -0.307411 | -0.308296 | -0.307067 |
| 446581 | 1.059137 | 0.964033 | -0.309960 | -0.157413 | -0.296674 | 1.490308 | 2.283545 | -0.043800 | -2.025375 | -0.382519 | ... | -0.295540 | -0.295101 | -0.295527 | -0.295818 | -0.295518 | -0.29519 | -0.307385 | -0.307411 | -0.308296 | -0.307067 |


# Training the Model

In [63]:
model=LinearRegression()
model.fit(x_train,y_train)
print("R^2 Score:",model.score(x_test,y_test))

R^2 Score: 0.9280815089240182
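For a regressor, model.score returns the coefficient of determination, R^2 = 1 - SS_res / SS_tot. Here SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the true values from their mean. An R^2 of about 0.93 therefore means the model explains roughly 93% of the variance in price on this test split. The sketch below computes R^2 by hand on made-up numbers and checks it against sklearn.metrics.r2_score.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([5.0, 11.0, 7.0, 26.0])  # made-up prices
y_pred = np.array([6.0, 10.0, 7.5, 24.0])  # made-up predictions

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
r2 = 1 - ss_res / ss_tot

assert np.isclose(r2, r2_score(y_true, y_pred))
print(r2)
```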


In [64]:
y_pred=model.predict(x_test)
root_mean_square_error=np.sqrt(np.mean((y_pred-y_test)**2))
print("Root Mean Square Error",root_mean_square_error)

Root Mean Square Error 2.5029014186682055
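The RMSE is in the same units as price (presumably US dollars). On this split, the predictions are off by roughly 2.5 price units in a root-mean-square sense. The manual formula gives the same value as taking the square root of scikit-learn's mean_squared_error, as this sketch with made-up values shows.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([5.0, 11.0, 7.0, 26.0])  # made-up prices
y_pred = np.array([6.0, 10.0, 7.5, 24.0])  # made-up predictions

rmse_manual = np.sqrt(np.mean((y_pred - y_true) ** 2))      # same formula as the cell above
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_pred))  # square root of the MSE

assert np.isclose(rmse_manual, rmse_sklearn)
print(rmse_manual)  # 1.25
```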


