# Logistic regression:
This notebook follows [Machine learning from scratch](https://github.com/AssemblyAI-Examples/Machine-Learning-From-Scratch/tree/main/03%20Logistic%20Regression) and adds a personal study of how the learning rate affects accuracy, with the aim of finding a learning rate that gives better accuracy.
I used the algorithm on a small, clean dataset because I wanted to focus on learning how logistic regression works. The data was downloaded from [Kaggle](https://www.kaggle.com/datasets/akshaydattatraykhare/diabetes-dataset).

**About the dataset:**

This dataset is originally from the National Institute of Diabetes and Digestive and Kidney
Diseases. The objective of the dataset is to diagnostically predict whether a patient has diabetes,
based on certain diagnostic measurements included in the dataset. Several constraints were placed
on the selection of these instances from a larger database. In particular, all patients here are females
at least 21 years old of Pima Indian heritage.
The dataset's .csv file contains several independent variables (medical predictor variables)
and a single dependent target variable (Outcome).

**What I am trying to achieve:**
From this dataset, I will develop an algorithm that predicts whether or not a patient has diabetes based on the data provided.

In [1]:
# data manipulation
import pandas as pd
import numpy as np
import scipy.stats as stats

# visualisation
import seaborn as sns
import matplotlib.pyplot as plt
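sns.set_style('darkgrid')

# silence warnings (note: this also hides NumPy overflow warnings raised by np.exp)
import warnings
warnings.filterwarnings('ignore')

# sklearn
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
# sklearn's LogisticRegression is not imported: the class defined later in this
# notebook uses the same name and would shadow it.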

In [10]:
data_path="/content/drive/MyDrive/Personal Project/diabetes project/diabetes.csv"

In [11]:
df = pd.read_csv(data_path)
# features: every column except the target
predict = df.loc[:, df.columns != 'Outcome']
# target: the Outcome column
target = df['Outcome']
# hold out 15% of the rows for testing; random_state makes the split reproducible
x_train, x_test, y_train, y_test = train_test_split(predict, target, test_size=0.15, random_state=42)

In [30]:
df.head()

   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction  Age  Outcome
0            6      148             72             35        0  33.6                     0.627   50        1
1            1       85             66             29        0  26.6                     0.351   31        0
2            8      183             64              0        0  23.3                     0.672   32        1
3            1       89             66             23       94  28.1                     0.167   21        0
4            0      137             40             35      168  43.1                     2.288   33        1


In [12]:
import numpy as np

def sigmoid(x):
    """Map any real value to the interval (0, 1)."""
    return 1/(1+np.exp(-x))

class LogisticRegression():
    """Logistic regression trained with batch gradient descent."""

    def __init__(self, lr=0.001, n_iters=1000):
        self.lr = lr            # learning rate (step size)
        self.n_iters = n_iters  # number of gradient descent steps
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        n_samples, n_features = X.shape
        # start from all-zero parameters
        self.weights = np.zeros(n_features)
        self.bias = 0

        for _ in range(self.n_iters):
            # predicted probability of class 1 for every sample
            linear_pred = np.dot(X, self.weights) + self.bias
            predictions = sigmoid(linear_pred)

            # gradients with respect to the weights and the bias
            dw = (1/n_samples) * np.dot(X.T, (predictions - y))
            db = (1/n_samples) * np.sum(predictions-y)

            # gradient descent step
            self.weights = self.weights - self.lr*dw
            self.bias = self.bias - self.lr*db

    def predict(self, X):
        linear_pred = np.dot(X, self.weights) + self.bias
        y_pred = sigmoid(linear_pred)
        # threshold the probabilities at 0.5 to get class labels
        return np.where(y_pred <= 0.5, 0, 1)
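
The `dw` and `db` expressions in `fit` match the gradients of the mean binary cross-entropy loss. The notebook does not name the loss, so reading it this way is an assumption, but the formulas agree term by term with the code. Here $n$ is `n_samples`, $\alpha$ is `self.lr`, and $\hat{y}$ is `predictions`:

```latex
\begin{aligned}
\hat{y}_i &= \sigma(w^\top x_i + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}} \\
L(w, b) &= -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right] \\
\frac{\partial L}{\partial w} &= \frac{1}{n} X^\top (\hat{y} - y) \\
\frac{\partial L}{\partial b} &= \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i) \\
w &\leftarrow w - \alpha \frac{\partial L}{\partial w}, \qquad b \leftarrow b - \alpha \frac{\partial L}{\partial b}
\end{aligned}
```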

In [13]:
def accuracy(y_pred, y_test):
    return np.sum(y_pred==y_test)/len(y_test)
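
Accuracy is easier to interpret next to a trivial reference point. The sketch below is not part of the original notebook: `majority_baseline_accuracy` is a hypothetical helper and `y_demo` is made-up data. It restates `accuracy` so the block runs on its own, then computes the accuracy obtained by always predicting the most frequent class.

```python
import numpy as np

def accuracy(y_pred, y_true):
    """Fraction of predictions that match the true labels."""
    return np.mean(np.asarray(y_pred) == np.asarray(y_true))

def majority_baseline_accuracy(y_true):
    """Accuracy of a model that always predicts the most frequent class.

    Hypothetical helper, added for illustration only.
    """
    y_true = np.asarray(y_true)
    values, counts = np.unique(y_true, return_counts=True)
    majority = values[np.argmax(counts)]
    return accuracy(np.full_like(y_true, majority), y_true)

# Made-up labels: 6 negatives and 4 positives
y_demo = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
baseline = majority_baseline_accuracy(y_demo)
print(baseline)
```

A classifier whose accuracy is at or below this baseline has probably not learned anything useful from the features.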

In [29]:
accuracy_result = {}
lr_p = np.arange(0.01, 1, 0.01)
for lr in lr_p:
    clf = LogisticRegression(lr=lr)
    clf.fit(x_train, y_train)
    y_pred = clf.predict(x_test)
    accuracy_result[lr] = accuracy(y_pred, y_test)

# pick the best learning rate once, after every candidate has been evaluated
learning_rate = max(accuracy_result, key=accuracy_result.get)

print(accuracy_result)
print(f"The highest accuracy is with the learning rate {learning_rate}, with accuracy of: {accuracy_result[learning_rate]}")

{0.01: 0.3793103448275862, 0.02: 0.6896551724137931, 0.03: 0.6896551724137931, 0.04: 0.35344827586206895, 0.05: 0.3706896551724138, 0.060000000000000005: 0.3706896551724138, 0.06999999999999999: 0.6551724137931034, 0.08: 0.6810344827586207, 0.09: 0.6551724137931034, 0.09999999999999999: 0.6379310344827587, 0.11: 0.35344827586206895, 0.12: 0.6551724137931034, 0.13: 0.6551724137931034, 0.14: 0.6724137931034483, 0.15000000000000002: 0.6896551724137931, 0.16: 0.6637931034482759, 0.17: 0.6551724137931034, 0.18000000000000002: 0.5344827586206896, 0.19: 0.6551724137931034, 0.2: 0.6551724137931034, 0.21000000000000002: 0.4396551724137931, 0.22: 0.6551724137931034, 0.23: 0.6551724137931034, 0.24000000000000002: 0.4396551724137931, 0.25: 0.6551724137931034, 0.26: 0.4051724137931034, 0.27: 0.6551724137931034, 0.28: 0.6551724137931034, 0.29000000000000004: 0.3793103448275862, 0.3: 0.6551724137931034, 0.31: 0.6551724137931034, 0.32: 0.6379310344827587, 0.33: 0.3793103448275862, 0.34: 0.706896551724
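
The accuracies above swing between roughly 0.35 and 0.70 from one learning rate to the next. A likely contributor is that the raw features sit on very different scales (in the rows shown earlier, Insulin reaches 168 while DiabetesPedigreeFunction is mostly below 1), so large steps push the sigmoid into saturation. The sketch below is one possible remedy, not the notebook's method: it standardizes features using training-set statistics and clips the sigmoid input. `standardize` and `fit_gd` are hypothetical names, and the data is synthetic rather than the diabetes dataset.

```python
import numpy as np

def sigmoid(z):
    # Clip the input so np.exp cannot overflow for large |z|
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def standardize(X_train, X_test):
    """Z-score both sets using the mean and std of the training set only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    return (X_train - mean) / std, (X_test - mean) / std

def fit_gd(X, y, lr=0.1, n_iters=1000):
    """Same gradient descent updates as the class above, in condensed form."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

# Synthetic data (an assumption for illustration): one large-scale feature
# that determines the label and one small-scale noise feature.
rng = np.random.default_rng(0)
n = 400
X = np.column_stack([rng.normal(120, 30, n), rng.normal(0.5, 0.3, n)])
y = (X[:, 0] > 120).astype(float)
X_train, X_test, y_train, y_test = X[:300], X[300:], y[:300], y[300:]

X_train_s, X_test_s = standardize(X_train, X_test)
w, b = fit_gd(X_train_s, y_train)
pred = (sigmoid(X_test_s @ w + b) > 0.5).astype(float)
acc = np.mean(pred == y_test)
print(acc)
```

With the notebook's data, the equivalent step would be to pass `x_train` and `x_test` through a function like `standardize` before calling `fit` and `predict`.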

In [28]:
np.arange(0.01, 1, 0.01)

array([0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1 , 0.11,
       0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2 , 0.21, 0.22,
       0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3 , 0.31, 0.32, 0.33,
       0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4 , 0.41, 0.42, 0.43, 0.44,
       0.45, 0.46, 0.47, 0.48, 0.49, 0.5 , 0.51, 0.52, 0.53, 0.54, 0.55,
       0.56, 0.57, 0.58, 0.59, 0.6 , 0.61, 0.62, 0.63, 0.64, 0.65, 0.66,
       0.67, 0.68, 0.69, 0.7 , 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77,
       0.78, 0.79, 0.8 , 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88,
       0.89, 0.9 , 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99])
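
The array display above shows tidy two-decimal values, yet the dictionary keys printed earlier include entries such as `0.060000000000000005`. That happens because `np.arange` with a float step accumulates binary rounding error, and the array printout hides it. A minimal sketch of one way to get clean keys, assuming the same 0.01 to 0.99 grid, is to build the values from integers (`learning_rates` is a name introduced here for illustration):

```python
import numpy as np

# Dividing an integer by 100 gives the float closest to the two-decimal value,
# so the grid has no accumulated rounding error.
learning_rates = [k / 100 for k in range(1, 100)]

# The arange grid from the notebook, for comparison
raw = np.arange(0.01, 1, 0.01)

print(learning_rates[:7])
print(raw[:7].tolist())
```

Using `learning_rates` as the loop values keeps the dictionary keys readable and makes lookups such as `accuracy_result[0.06]` work as expected.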