**Answer-1**
---
---

In [3]:
import numpy as np
print(np.__version__)
np.random.seed(200)

1.23.5


In [12]:
def Generator(n, m, theta = 0.0):
    beta = np.random.randn(m+1, 1)
    X = np.random.randn(n, m+1)
    X[:, 0] = 1
    h = 1 / (1 + np.exp(-np.matmul(X, beta)))
    Y = np.round(h)
    noise = np.random.binomial(n=1, p=theta, size = (len(Y), 1))
    Y_noisy = np.logical_xor(Y, noise).astype(int)

    return X, Y_noisy, beta

In [13]:
X, Y, beta = Generator(5, 5, theta=0.3)
print(f"X :{X}\nY ={Y}\n\nβ ={beta}")

X :[[ 1.          1.27186338 -0.01231281 -0.26572854 -0.50239708 -0.03190136]
 [ 1.          0.97012715  1.18645564  0.1833277  -0.15419088  0.25202674]
 [ 1.          0.59786374 -0.95718825  1.04975291 -1.29226185  0.11316318]
 [ 1.         -0.59901089  2.01641332  0.1851378  -0.57764425 -0.1445562 ]
 [ 1.         -0.08129256 -0.46365774  0.01162035 -0.24131524  0.23495992]]
Y =[[0]
 [1]
 [1]
 [1]
 [1]]

β =[[ 0.32714315]
 [ 0.58067026]
 [-0.12032434]
 [ 0.52964066]
 [-0.02615513]
 [ 0.26900105]]


**Answer-2**
---
---

In [14]:
def Logistic_Regression(X, Y, k, t, alpha = 0.05):
    beta = np.random.randn(X.shape[1], 1)
    for i in range(k):
        h = 1 / (1 + np.exp(-np.matmul(X, beta)))
        cost = np.average(- Y * np.log(h) - (1 - Y) * np.log(1 - h))
        gradient = np.matmul(X.T, (h - Y)) / len(Y)
        beta -= alpha * gradient
        new_h =  1 / (1 + np.exp(-np.matmul(X, beta)))
        new_cost = np.average(- Y * np.log(new_h) - (1 - Y) * np.log(1 - new_h))
        if np.abs(new_cost - cost) <= t:
            break
    return beta, new_cost

In [33]:
print("Original parameters : ")
print(f"β :{beta}\n")
print("Learned parameters : ")
(beta,f_cost)=Logistic_Regression(X, Y, 10000, 0.0001, 0.9)
print(f"β :{beta}\nThe final cost function value is : {f_cost}")

Original parameters : 
β :[[ 3.68509465]
 [-4.60310549]
 [ 2.40816574]
 [ 4.31111469]
 [-0.90283285]
 [ 1.52595119]]

Learned parameters : 
β :[[ 3.55583661]
 [-3.95697134]
 [ 1.94085857]
 [ 4.99602135]
 [ 0.12338401]
 [ 1.95500641]]
The final cost function value is : 0.024769326753600655


**Answer-3**
---
---

**Impact of Dataset Size (n)** :

Larger dataset sizes generally improve the logistic regression model's ability to learn coefficients β, leading to better predictive performance due to increased diversity and generalization capacity.

**Impact of Label Noise (θ)** :

Higher levels of label noise (θ) can hinder the model's learning process and degrade prediction accuracy, emphasizing the importance of clean, reliable data for effective logistic regression modeling.
```
The derivation of gradient of the cost function with respect to the parameters of the model is submitted through a scanned pdf in Edu-collab.
```

**Answer-4**
---
---

Adding L1 and L2 regularization to the Logistic Regression cost function helps prevent overfitting by penalizing large coefficients. The impact of regularization on the learned models and the β vector depends on the choice of the regularization constant.

**L1 Regularization (Lasso)** :
L1 regularization adds the sum of the absolute values of the coefficients to the cost function. This encourages sparsity in the coefficients, as it tends to push some coefficients to zero.
With higher values of the regularization constant (λ), more coefficients are likely to be pushed to zero, resulting in a simpler model with fewer features contributing to the prediction.
The choice of λ balances between model simplicity (higher λ) and fitting the training data well (lower λ).

**L2 Regularization (Ridge)** :

L2 regularization adds the sum of the squared values of the coefficients to the cost function. It penalizes large coefficients but does not usually lead to sparsity in the coefficients. Increasing the regularization constant (λ) shrinks the coefficients towards zero, reducing the impact of individual features on the model's output. L2 regularization tends to distribute the weight more evenly among features compared to L1 regularization.

In [37]:
def logistic_regression_regularized(X, Y, k, t, alpha, reg = "L1"):
    beta = np.random.randn(X.shape[1], 1)
    lambda_reg  = 1
    for i in range(k):
        h = 1 / (1 + np.exp(-np.matmul(X, beta)))
        if reg == "L1":
            cost = np.average(- Y * np.log(h) - (1 - Y) * np.log(1 - h)) + lambda_reg * np.sum(np.abs(beta))
            gradient = np.matmul(X.T, (h - Y)) / len(Y) + lambda_reg * np.sign(beta)
        elif reg == "L2":
            cost = np.average(- Y * np.log(h) - (1 - Y) * np.log(1 - h)) + lambda_reg * np.sum(np.square(beta))
            gradient = np.matmul(X.T, (h - Y)) / len(Y) + lambda_reg * beta
        beta -= alpha * gradient
        new_h =  1 / (1 + np.exp(-np.matmul(X, beta)))
        new_cost = np.average(- Y * np.log(new_h) - (1 - Y) * np.log(1 - new_h))
        if np.abs(new_cost - cost) <= t:
            break
    return beta, new_cost

print("Original parameters : ")
print(f"β :{beta}\n")
print("Learned parameters : ")
(new_beta1, cost1) = logistic_regression_regularized(X, Y, 100000, 0.0001, 0.01, reg="L1")
print(f"β :{new_beta1}\nThe final cost function value is : {cost1}")

Original parameters : 
β :[[ 3.55583661]
 [-3.95697134]
 [ 1.94085857]
 [ 4.99602135]
 [ 0.12338401]
 [ 1.95500641]]

Learned parameters : 
β :[[ 0.00492722]
 [ 0.0005495 ]
 [-0.00795176]
 [-0.00791548]
 [-0.00508957]
 [ 0.0080309 ]]
The final cost function value is : 0.6931830726171009


**The choice of the regularization constant (λ) impacts the β vector learned as follows** :

For both L1 and L2 regularization, smaller values of λ result in less regularization, allowing the model to fit the training data more closely. This can lead to overfitting if λ is too small.

Larger values of λ increase the amount of regularization, which can lead to simpler models with smaller coefficients. However, if λ is too large, the model may underfit the data.

The optimal value of λ is usually determined through techniques like cross-validation, where different values of λ are tried and the one that yields the best performance on a validation set is selected.

**Answer-5**
---
---

In [41]:
class Regression:
    def __init__(self, type="linear"):
        self.type = type

    def train(self, X, Y, k, t, alpha=0.5):
        beta = np.random.randn(X.shape[1], 1)
        for i in range(k):
            cost = self._cost(X, Y, beta)
            gradient = self._gradient(X, Y, beta)
            beta -= alpha * gradient
            new_cost = self._cost(X, Y, beta)
            if np.abs(new_cost - cost) <= t:
                break

        return beta

    def _gradient(self, X, Y, beta):
        if self.type == "linear":
            error = np.matmul(X, beta) - Y
            return X.T.dot(error)/len(Y)
        elif self.type == "logistic":
            h = 1 / (1 + np.exp(-np.matmul(X, beta)))
            return np.matmul(X.T, (h - Y)) / len(Y)

    def _cost(self, X, Y, beta):
        if self.type == "linear":
            error = np.matmul(X, beta) - Y
            return np.sum(np.square(error))/(2*len(Y))
        elif self.type == "logistic":
            h = 1 / (1 + np.exp(-np.matmul(X, beta)))
            return np.average(- Y * np.log(h) - (1 - Y) * np.log(1 - h))

In [42]:
X, Y, beta = Generator(5, 5, theta=0.3)

regression = Regression()
regression.train(X,Y,1000,0.0001,alpha=0.5)

array([[ 1.04491996],
       [ 0.35643135],
       [-0.31211607],
       [ 0.23966457],
       [ 0.06084374],
       [-0.06743868]])