# Logistic Regression

## Questions:

### What is the wikipedia link to the algorithm?

- [Link](https://en.wikipedia.org/wiki/Logistic_regression)

### Which type of machine learning algorithm is this?

- Supervised learning

### What is the best video tutorial on this algorithm?

- [Video](https://www.youtube.com/watch?v=yIYKR4sgzI8)

### What is the best text?

- [Link](https://ml-cheatsheet.readthedocs.io/en/latest/logistic_regression.html)

### What is the best picture which describes the algorithm?

- ![Logistic Regression](Images/BinaryLogisticRegression.png)

### What is one use case for the algorithm?

- Determining the probability of a patient developing a particular disease.

## Sample data on which the algorithm is tested:

## From scratch implementation:

### Steps for reproducing the algorithm:
1. **Hypothesis**
    - The formula from linear regression is plugged into the sigmoid function, which outputs a probability between 0 and 1; a set threshold then turns it into class 0 or 1.
2. **Cost function**
    - Cross-entropy, which is divided into two functions: one for $y = 0$ and one for $y = 1$.
3. **Gradient descent**
    - Using the partial derivative of the cost function with the sigmoid hypothesis plugged in.
4. **Logistic regression**
    - Putting everything into one model.
5. **Validation**
    - Comparison with scikit-learn.

In [359]:
'''
Logistic regression:
getting input

Same as multiple linear regression, but this time the hypothesis outputs 0 or 1.
The cost function, and with it the gradient, looks different, but the logic is the same.
'''

In [360]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

### Data

**Task:** The Iris dataset contains three classes (three different types of the iris flower) and four features. Given the four features of an iris flower, I need to use logistic regression to output the respective class.

**Dataset**: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html

In [361]:
X, y = load_iris(return_X_y=True)
xTest = X[130:] 
yTest = y[130:] 
X = X[:130]
y = y[:130]
thetas = np.ones(X[0].size)
learningRate = 0.01
iterations = 1000

In [378]:
print(y)

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]
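
The printed labels are sorted by class, so slicing off the last 20 rows gives a test set that contains only class 2. A shuffled, stratified split avoids this. The following is a minimal sketch, assuming scikit-learn's `train_test_split` is acceptable here; the names `XTrain`, `XHeldOut`, `yTrain`, `yHeldOut` and the test size of 30 are illustrative choices, not part of the notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

XAll, yAll = load_iris(return_X_y=True)

# stratify=yAll keeps the class proportions the same in both parts;
# random_state is fixed only to make the sketch reproducible.
XTrain, XHeldOut, yTrain, yHeldOut = train_test_split(
    XAll, yAll, test_size=30, stratify=yAll, random_state=0
)

# Count how many samples of each class ended up in each part.
trainCounts = np.bincount(yTrain)
heldOutCounts = np.bincount(yHeldOut)
print(trainCounts, heldOutCounts)
```

With a split like this, every class appears in the held-out data, so a model that always predicts one class can no longer look correct by accident.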


### Hypothesis

$S(z)= \frac{1}{1+e^{-z}}$ for binary classification and $\sigma(z)_i= \frac{e^{z_i}}{\sum^{K}_{j=1}e^{z_j}}$ for multi-class classification
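
Both formulas can be sketched in NumPy as follows. This is a minimal sketch, not the notebook's own implementation: the function names and the example scores are made up, and subtracting the maximum before exponentiating is an added safeguard against overflow that does not change the result.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Turn a vector of K class scores into K probabilities that sum to 1."""
    # Subtracting the maximum leaves the result unchanged
    # but avoids overflow in np.exp for large scores.
    shifted = z - np.max(z)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

scores = np.array([2.0, 1.0, 0.1])  # hypothetical scores for three classes
probs = softmax(scores)
print(probs, probs.sum())
```

Note that the softmax input is one score per class, not the raw feature vector of a sample.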

In [362]:
#my first idea: implement the decision boundary with the hypothesis somehow; with three classes, class one is <= 0.33, class two is <= 0.66, etc.
#need to check whether this is correct; look up in the sheets how the decision boundary works exactly
#can I not just take the max value of my predictions? the first entry always corresponds to the first class, etc.

"""
In words: we apply the standard exponential function to each element 𝑧𝑖 of the input vector 𝑧 and normalize these values by dividing by the sum of all these
exponentials; this normalization ensures that the sum of the components of the output vector σ(𝑧) is 1
"""

def hypothesis2(x):
    """Softmax: converts input values into probabilities.

    Args:
        x (np.array): input vector

    Returns:
        np.array: probabilities for every element of the input vector
    """
    exps = np.exp(x)
    return exps / np.sum(exps)

In [363]:

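def hypothesis(x, thetas):
    """Attempted softmax for one sample (see the note on its known flaw).

    Args:
        x (np.array): feature vector of one sample
        thetas (np.array): weights

    Returns:
        np.array: one value per feature of x (not one per class)
    """
    # NOTE: the numerator exponentiates the raw features and the denominator sums over
    # the global training matrix X, so the result has one entry per feature, does not
    # sum to 1, and does not represent the three classes.
    l=[]
    for i in range(len(x)):
        softmax = np.exp(x[i]) / sum(np.exp(np.dot(X, thetas)))
        l.append(softmax)
    return np.array(l)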

In [377]:
hypothesis2(X[0])
#problem: this result does not represent the three classes, which it normally should
#how do I know how many classes exist?
#do I need to count the distinct label values and then loop over that count? yes and also

array([0.81032902, 0.16360261, 0.02003419, 0.00603418])

### Prediction 

Returns the class identifier that has the highest probability in the hypothesis output.

In [364]:
def prediction(probabilities):
    """For a given array of probabilities, returns the class with the highest probability.

    Args:
        probabilities (np.array): Array of probabilities

    Returns:
        int: Class
    """
    # np.argmax returns the index of the first maximum
    highestValueClass = np.argmax(probabilities)
    return highestValueClass

### Cost function

$J(\theta)=-\frac{1}{m} \sum^{m}_{i=1}[y^{(i)}\log(h_{\theta}(x^{(i)}))+(1-y^{(i)})\log(1-h_{\theta}(x^{(i)}))]$
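
The formula above is the binary cross-entropy and assumes $y \in \{0, 1\}$. The Iris labels take the values 0, 1 and 2, so a multi-class variant is needed: the cost is the mean negative log-probability that the model assigns to the true class. A minimal sketch, assuming the probabilities are already available as one row per sample (the function name, `eps`, and the example numbers are illustrative):

```python
import numpy as np

def crossEntropy(probs, labels):
    """Mean negative log-probability assigned to the true class.

    probs:  (m, K) array, each row a probability distribution over K classes
    labels: (m,) array of integer class labels in 0..K-1
    """
    eps = 1e-12  # guards against log(0)
    m = probs.shape[0]
    # Pick, for every row, the probability of that row's true class.
    trueClassProbs = probs[np.arange(m), labels]
    return -np.mean(np.log(trueClassProbs + eps))

# Hypothetical predictions for two samples and three classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
cost = crossEntropy(probs, labels)
print(cost)
```

For two classes this reduces to the binary formula, because the probability of class 0 is then $1 - h_\theta(x)$.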

In [365]:
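def costFunction(x, y, theta):
    """Cross-entropy between the hypothesis for one sample and its actual label.
    
    Args:
        x (np.array): Feature vector of one sample
        y (int): Actual class label
        theta (np.array): Weights

    Returns:
        np.array: Cost for every element returned by the hypothesis.
    """
    # NOTE: the formula above assumes y is 0 or 1; the Iris labels also contain 2.
    j = y * np.log(hypothesis(x, theta)) + (1 - y) * np.log(1 - hypothesis(x, theta))
    return -j  # leading minus sign from J(theta)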

### Gradient descent

$\theta_j \leftarrow \theta_j - \alpha \frac{1}{m} \sum^{m}_{i=1}(h_\theta(x^{(i)})-y^{(i)})x^{(i)}_j$

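For a single sample, the code below uses this derivative with the linear hypothesis $h_\theta(x) = \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$; for logistic regression, the sigmoid of this sum belongs in its place:

$f'(\theta_1) = -x_1(y - (\theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n))$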
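
The update rule above, with the sigmoid as $h_\theta$, can be sketched for the binary case as follows. This is a minimal sketch under stated assumptions: the bias is handled by a constant column of ones, the toy data and the names `gradientStep` and `predicted` are made up, and the learning rate and iteration count are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradientStep(X, y, theta, learningRate):
    """One batch update: theta_j <- theta_j - alpha * (1/m) * sum((h - y) * x_j)."""
    m = X.shape[0]
    h = sigmoid(X @ theta)          # predicted probabilities, shape (m,)
    gradient = X.T @ (h - y) / m    # one entry per weight, shape (n,)
    return theta - learningRate * gradient

# Hypothetical toy data: a constant 1 column (bias) plus one feature.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

theta = np.zeros(2)
for _ in range(5000):
    theta = gradientStep(X, y, theta, learningRate=0.1)

# Threshold the probabilities at 0.5 to get class labels.
predicted = (sigmoid(X @ theta) >= 0.5).astype(int)
print(theta, predicted)
```

All weights are updated at once from the same prediction error, which matches the summation over $i$ in the formula.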



In [366]:
def updateTheta(x, y, thetas, learningRate):
    """Updates weights for finding optimum. https://www.baeldung.com/cs/gradient-descent-logistic-regression
    Args:
        x (np.array): Feature vector of one sample
        y (int): Actual class label
        thetas (np.array): Weights which get updated
        learningRate (float): Determines step size

    Returns:
        list: Updated weights
    """
    thetaList = []

    for i in range(len(thetas)):
        #x = X[i] should be a single value but is a whole array; y = y[i]
        # NOTE: np.dot(x, thetas) is the linear hypothesis; the sigmoid is not applied here.
        newTheta = thetas[i] - learningRate*(-x[i]*(y-(np.dot(x,thetas))))
        #thetaNew = thetas[i] - learningRate*(-X[i]*(y-(np.dot(X,thetas))))
        #newTheta = thetas[i] - learningRate * 1/len(x) * sum((hypothesis(x)-y) * x)
        thetaList.append(newTheta)
    return thetaList

### Building the model

In [367]:
#I am still unsure how to implement my decision boundary
def logisticRegression(X, y, theta, learningRate, iterations):
    resultsCostFunction = []
    resultsTheta = []
    for _ in range(iterations):
        #is there one constant bias term, or does it take a different value for every x?
        #I have two partial derivatives, one for the weight and one for the bias
        #I am only given one gradient descent equation - which one is the right one?
        for k in range(len(X)):  #separate index name so the outer loop variable is not shadowed
            j = costFunction(X[k], y[k], theta)
            theta = updateTheta(X[k], y[k], theta, learningRate)
            resultsCostFunction.append(j)
            resultsTheta.append(theta)
    return resultsCostFunction, resultsTheta
    

### Training model

In [368]:
costFunctionResults, thetasResults = logisticRegression(X, y, thetas, learningRate, iterations)

  j = y * np.log(hypothesis(x, theta)) + (1 - y) * np.log(1 - hypothesis(x, theta))


### Predicting dependent variables with trained model

In [369]:
def val(X, thetas):
    yPred = []
    for i in range(len(X)):
        y = prediction(hypothesis(X[i], thetas))
        yPred.append(y)
    print(yPred)
    return np.array(yPred)

In [370]:
print(thetasResults[-1])

[0.037586007019736335, -0.09810514835691256, 0.23112464993204357, 0.4248515844717129]


In [371]:
"""
1. check that the hypothesis is correct (calling it on a sample from xTest should always return 3 probabilities)
    - now I call it like
    - 
2. compare the hypothesis output for xTest with the actual labels (do I get the right results?)
"""

<function __main__.hypothesis(x, thetas)>

In [372]:
val(xTest, thetasResults[-1])
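#get wrong results: hypothesis returns one value per feature instead of one per class,
#so prediction just picks the index of the largest feature (0 here); theta has no effect on that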

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]


array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [373]:
#separate name so the logisticRegression function defined above is not overwritten
sklearnModel = LogisticRegression().fit(X, y)

resultsScikitLearn = sklearnModel.predict(xTest)
print(resultsScikitLearn)

[2 2 2 1 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2]


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
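
The warning itself suggests two remedies: scaling the data and raising `max_iter`. A minimal sketch that applies both, assuming a `StandardScaler` in a pipeline is acceptable here; `max_iter=1000` and the names `scaledModel`, `XFit`, `XEval` are illustrative choices, and the row slices mirror the split used earlier in this notebook.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

XAll, yAll = load_iris(return_X_y=True)
XFit, yFit = XAll[:130], yAll[:130]   # first 130 rows for fitting
XEval = XAll[130:]                    # last 20 rows for evaluation

# The pipeline standardises the features (fitted on XFit only)
# before they reach the classifier.
scaledModel = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaledModel.fit(XFit, yFit)

scaledPredictions = scaledModel.predict(XEval)
print(scaledPredictions)
```

Putting the scaler inside the pipeline means the same transformation is applied automatically at prediction time.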


### Notes

--> Really good code explanation: https://mlcorner.com/multiple-logistic-regression-explained-for-machine-learning/ (I should save it)

1. Go again from top to bottom through every function and fix the code

1. Pick dataset
    - Iris from example page https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
2. Understand what I need to do for a logistic regression with this dataset
    1. load data and play around to understand
    2. Put data creation section on top because this helps a lot for understanding the problem
    3. Link to description of dataset 
    4. Own description of the task which needs to be accomplished
3. Make a plan
    1. hypothesis
        - implement softmax instead of sigmoid?
        1. Implementing softmax
        2. Adding docstrings 
    2. decision boundary
    3. cost
    4. gradient
    5. logistic regression
    6. validation


### Starting over and understanding the whole workflow

1. Z = product of my weights and the input features
2. Softmax with $e^{Z}$ in the numerator
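
The two steps above, extended by the gradient update and an argmax prediction, might look like this end to end. This is a minimal sketch, assuming batch gradient descent, one weight column and one bias per class, and synthetic toy data instead of Iris; `fitSoftmaxRegression`, `predictClasses` and the hyperparameters are illustrative and not taken from the notebook.

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax: each row of scores becomes probabilities summing to 1."""
    shifted = Z - Z.max(axis=1, keepdims=True)  # for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)

def fitSoftmaxRegression(X, y, numClasses, learningRate=0.1, iterations=2000):
    """Batch gradient descent on the multi-class cross-entropy."""
    m, n = X.shape
    W = np.zeros((n, numClasses))   # one weight column per class
    b = np.zeros(numClasses)        # one bias per class
    Y = np.eye(numClasses)[y]       # one-hot labels, shape (m, K)
    for _ in range(iterations):
        Z = X @ W + b               # step 1: scores from weights and features
        P = softmax(Z)              # step 2: scores to class probabilities
        W -= learningRate * (X.T @ (P - Y)) / m
        b -= learningRate * (P - Y).mean(axis=0)
    return W, b

def predictClasses(X, W, b):
    """Pick the class with the highest probability for every row."""
    return np.argmax(softmax(X @ W + b), axis=1)

# Hypothetical toy data: three well-separated 2-D clusters, 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
toyX = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers])
toyY = np.repeat(np.arange(3), 20)

W, b = fitSoftmaxRegression(toyX, toyY, numClasses=3)
accuracy = np.mean(predictClasses(toyX, W, b) == toyY)
print(accuracy)
```

The key difference from a single weight vector is the shape of `W`: with one column per class, the hypothesis returns one probability per class, so the argmax is a class index.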