# Linear Regression

Linear regression predicts the value of one variable from the value of another. The variable you want to predict is called the dependent variable. The variable you use to make the prediction is called the independent variable.

This week, your task is to perform multiple linear regression on batsmen salaries. You'll use the average runs scored per game and the strike rate as independent variables, and the salary as the dependent variable. You'll also group the data by year and fit a separate model for each year.
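
As a rough illustration of the setup described above, the sketch below fits a model of the form salary ≈ b0 + b1 × average runs + b2 × strike rate by least squares. The data here is synthetic and made up for illustration only (it is not taken from Data_Mendeley.csv), and the names `avg_runs`, `true_beta`, and `beta_hat` are hypothetical.

```python
import numpy as np

# Synthetic stand-in data (NOT from Data_Mendeley.csv), used only to show the mechanics
rng = np.random.default_rng(0)
avg_runs = rng.uniform(5, 50, size=100)
strike_rate = rng.uniform(80, 160, size=100)

# Hypothetical "true" coefficients: intercept, avg_runs slope, strike_rate slope
true_beta = np.array([1000.0, 200.0, 50.0])
noise = rng.normal(0, 10, size=100)
salary = true_beta[0] + true_beta[1] * avg_runs + true_beta[2] * strike_rate + noise

# Design matrix: a column of ones for the intercept, then one column per feature
X = np.column_stack((np.ones_like(avg_runs), avg_runs, strike_rate))

# Least-squares fit; lstsq also copes with rank-deficient design matrices
beta_hat, *_ = np.linalg.lstsq(X, salary, rcond=None)
print("Estimated coefficients:", beta_hat)
```

With enough data and little noise, the estimated coefficients should land close to the ones used to generate the data. On the real dataset the same steps would be repeated once per year.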

The dataset, Data_Mendeley.csv, is provided on GitHub. Feel free to create any new functions required.

In [2]:
# Import the required libraries
import numpy as np
from sklearn import datasets
import matplotlib.pyplot as plt
import pandas as pd

Preparing data

In [3]:
# Mount Google Drive and load the dataset
from google.colab import drive
drive.mount('/content/drive')
file_path = '/content/drive/My Drive/wids/Data_Mendeley.csv'
data=pd.read_csv(file_path)

Mounted at /content/drive


Implement Linear regression here :)

In [11]:
# Re-read the CSV as a NumPy structured array so columns can be accessed by name
data = np.genfromtxt(file_path, delimiter=',', dtype=None, names=True, encoding='utf-8')
years = data['Year']
average_runs = data['Ave']
strike_rate = data['StrRate']
salary = data['Final_Price']

# Replace missing values with the mean of each column
average_runs = np.nan_to_num(average_runs, nan=np.nanmean(average_runs))
strike_rate = np.nan_to_num(strike_rate, nan=np.nanmean(strike_rate))
salary = np.nan_to_num(salary, nan=np.nanmean(salary))

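def compute_coefficients(X, y):
    """Return least-squares coefficients [intercept, slopes...] via the normal equations."""
    X_with_intercept = np.hstack((np.ones((X.shape[0], 1)), X))  # Add intercept column
    # Solve (X^T X) beta = X^T y directly rather than forming an explicit inverse
    beta = np.linalg.solve(X_with_intercept.T @ X_with_intercept, X_with_intercept.T @ y)
    return beta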


unique_years = np.unique(years)
models = {}
performance = []

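for year in unique_years:
    # Select the rows belonging to this year
    mask = years == year
    X_year = np.column_stack((average_runs[mask], strike_rate[mask]))
    y_year = salary[mask]

    # Fit a separate model for this year
    beta = compute_coefficients(X_year, y_year)
    models[year] = beta

    # Record the in-sample mean squared error for this year
    X_with_intercept = np.hstack((np.ones((X_year.shape[0], 1)), X_year))
    y_pred = X_with_intercept @ beta
    performance.append((year, np.mean((y_year - y_pred) ** 2)))
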

# Inputs
year = 2008
average_runs_input = 2
strike_rate_input = 23.52

# Combine inputs into a feature vector with an intercept
X_input = np.array([1, average_runs_input, strike_rate_input])

# Perform prediction
if year in models:
    beta = models[year]
    predicted_salary = X_input @ beta
    print(f"Predicted Salary for Year {year}: {predicted_salary:.2f}")
else:
    print(f"No model found for Year {year}")



Predicted Salary for Year 2008: 8210624.47


# Logistic Regression

Logistic regression is a process of modeling the probability of a discrete outcome given an input variable. The most common logistic regression models a binary outcome; something that can take two values such as true/false, yes/no, and so on.
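
To make the idea of "modeling a probability" concrete, here is a minimal sketch of how a binary logistic model turns a weighted sum of inputs into a probability using the sigmoid function. The weights `w`, bias `b`, and input `x` are hypothetical values chosen for illustration, not fitted to any dataset.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and input, chosen only for illustration
w = np.array([0.8, -0.5])
b = 0.1
x = np.array([2.0, 1.0])

# Probability of the positive class, then a hard label using a 0.5 threshold
p = sigmoid(w @ x + b)
label = int(p >= 0.5)
print(f"P(y=1 | x) = {p:.3f}, predicted label = {label}")
```

Training a logistic regression model amounts to choosing `w` and `b` so that these probabilities match the observed labels as closely as possible.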

This week you will perform logistic regression on the breast cancer dataset using the sklearn library. Feel free to create any new functions required.

In [None]:
# Import libraries
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

Prepare Data

In [None]:
breast_cancer = datasets.load_breast_cancer()
X, y = breast_cancer.data, breast_cancer.target

In [None]:
# Split the data for training and testing, then standardize the features
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1234)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

Implement Logistic Regression here :)
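
One possible starting point, offered as a sketch rather than the intended solution, is sklearn's `LogisticRegression` classifier. The block below repeats the data preparation so that it runs on its own. The `max_iter=1000` setting is an assumption made to give the solver room to converge, and the variable names `clf` and `accuracy` are illustrative.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load and split the data the same way as in the preparation cells
breast_cancer = datasets.load_breast_cancer()
X, y = breast_cancer.data, breast_cancer.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1234
)

# Fit the scaler on the training split only, then apply it to both splits
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Fit the classifier; max_iter is raised as a precaution against convergence warnings
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Evaluate on the held-out test split
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```

Accuracy is only one way to judge the model. For a medical dataset it may also be worth looking at precision and recall, for example with `sklearn.metrics.classification_report`.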