# Binary Classification

**Definition:**
Binary classification is a supervised machine learning task in which each example is assigned to one of **two possible classes only**.


**Explanation:**
In binary classification, the model learns from labeled training data and predicts one of two outcomes, such as **Yes/No**, **True/False**, or **0/1**.


**Example:**
Email spam detection is a binary classification problem:

* Spam → **1**
* Not Spam → **0**

Here, the model decides whether an email is **spam** or **not spam**, so there are only two classes.

**Conclusion:**
Thus, binary classification involves assigning each data point to one of exactly **two distinct categories**.



# Sigmoid Function 

**Definition:**  
The sigmoid function is an S-shaped function that converts any real number into a value **between 0 and 1**.

**Formula:**  
sigma(x) = 1 / (1 + e^(-x))

**Behavior:**  
- As x → +∞, sigma(x) → 1 (output approaches 1)  
- As x → -∞, sigma(x) → 0 (output approaches 0)  
- At x = 0, sigma(x) = 0.5 (output is exactly halfway)  

**Real-Life Example:**  
- **Email Spam Detection:**  
  - Suppose a model calculates a score x = 3 for an email.  
  - sigma(3) ≈ 0.95 → 95% probability the email is spam → likely “Yes, Spam”  
  - If x = -2, sigma(-2) ≈ 0.12 → 12% probability → likely “No, Not Spam”
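The two spam scores above can be checked directly. This is a minimal sketch. The `decide` helper and its 0.5 cut-off are illustrative assumptions, not part of the original notes.

```python
import math

def sigmoid(x):
    # Logistic function: maps any real score to (0, 1)
    return 1 / (1 + math.exp(-x))

def decide(score, threshold=0.5):
    # Hypothetical helper: turn a raw score into a probability and a label
    p = sigmoid(score)
    return p, ("Spam" if p >= threshold else "Not Spam")

for score in (3, -2, 0):
    p, label = decide(score)
    print(f"x = {score:>2} -> sigma(x) = {p:.2f} -> {label}")
```

Because the comparison uses `>=`, a score of exactly 0 (probability 0.5) is labeled Spam. That tie-breaking rule is a design choice.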

**Conclusion:**  
- Sigmoid is widely used in **binary classification** problems.  
- It **converts any number into a probability between 0 and 1**, helping in decisions like Yes/No, True/False, or Spam/Not Spam.


# Log-Loss (Binary Cross-Entropy)

**Definition:**  
Log-loss, also called **Binary Cross-Entropy**, is a loss function used in **binary classification** to measure how well a model's predicted probabilities match the actual labels.  
- Lower log-loss → better predictions  
- Higher log-loss → worse predictions

**Formula:**  
For a single example:  
LogLoss = - [y * log(p) + (1 - y) * log(1 - p)]

Where:  
- y = actual label (0 or 1)  
- p = predicted probability that y = 1  
- log = natural logarithm  



**Behavior:**  
- If the model predicts a probability close to **1 for an actual 1** → loss is small  
- If the model predicts a probability close to **0 for an actual 1** → loss is large  
- It **penalizes wrong predictions more heavily the more confident the model is**  



**Real-Life Example:**  
**Email Spam Detection:**  
- Suppose an email is actually spam (y = 1)  
- Model predicts p = 0.9 → Log-loss = -[1 * log(0.9) + 0 * log(0.1)] ≈ 0.105 → good prediction  
- Model predicts p = 0.1 → Log-loss = -[1 * log(0.1) + 0 * log(0.9)] ≈ 2.303 → bad prediction  
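The two numbers above can be reproduced with Python's natural logarithm (`math.log`). Adding more predictions for the same spam email shows how quickly the loss grows as a wrong prediction becomes more confident. This is a minimal sketch, and `bce` is a hypothetical helper name.

```python
import math

def bce(y, p):
    # Binary cross-entropy for one example (natural log); assumes 0 < p < 1
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Actual label: spam (y = 1); predictions range from good to confidently wrong
for p in (0.9, 0.5, 0.1, 0.01):
    print(f"p = {p:<4} -> log-loss = {bce(1, p):.3f}")
```

The four losses come out as 0.105, 0.693, 2.303 and 4.605. Moving from p = 0.1 to p = 0.01 adds about as much loss as moving from 0.9 to 0.1.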



**Conclusion:**  
- Log-loss is used to **evaluate probability predictions in binary classification**.  
- Lower log-loss means the **model's predicted probabilities are closer to actual outcomes**.


In [1]:
# 16. Write a sigmoid(z) function using only math.exp().

import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Example usage
print(sigmoid(2))   
print(sigmoid(-3))  


0.8807970779778823
0.04742587317756678
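The `sigmoid` above raises `OverflowError` for very negative inputs, for example `sigmoid(-1000)`. The cause is that `math.exp(-z)` overflows once `-z` exceeds roughly 709. A common workaround, still using only `math.exp()`, is to branch on the sign of `z` so the exponent passed to `math.exp` is never positive. This is a sketch, and `stable_sigmoid` is a hypothetical name.

```python
import math

def stable_sigmoid(z):
    # For z >= 0, exp(-z) <= 1, so the usual formula cannot overflow
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    # For z < 0, rewrite 1 / (1 + e^-z) as e^z / (1 + e^z); here exp(z) < 1
    ez = math.exp(z)
    return ez / (1 + ez)

print(stable_sigmoid(2))      # same value as sigmoid(2)
print(stable_sigmoid(-1000))  # 0.0 instead of OverflowError
print(stable_sigmoid(1000))   # 1.0
```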


In [2]:
# 17. Implement predict_proba(X, w, b) for a single feature.

import math

# Sigmoid function
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Predict probability for single feature
def predict_proba(X, w, b):
    z = w * X + b
    return sigmoid(z)

# Example
X = 2
w = 0.5
b = -1
print(predict_proba(X, w, b)) 


0.5


In [3]:
# 18. Convert probabilities into class labels using threshold 0.5.

# Assume predict_proba function is already defined
def predict_class(X, w, b):
    prob = predict_proba(X, w, b)  # Get probability
    if prob >= 0.5:
        return 1  # Class 1
    else:
        return 0  # Class 0

# Example
X = 2
w = 0.5
b = -1

print(predict_class(X, w, b))  


1


In [13]:
# 19. Create a small binary dataset and print predicted probabilities.

import csv
import math

# Sigmoid function
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Path to uploaded dataset
filename ="C:/Users/shres/OneDrive/Documents/titanic.csv"

X = []  # Age
y = []  # Survived

# Read CSV file
with open(filename, 'r') as file:
    reader = csv.reader(file)
    header = next(reader)  # skip header

    for row in reader:
        age = row[1]          # Age column
        survived = row[-1]    # Last column = Survived

        if age != "":         # ignore missing age
            X.append(float(age))
            y.append(int(survived))

# Hand-picked (untrained) weight and bias
w = -0.05
b = 1.5

# Print predicted probabilities
for i in range(len(X)):
    z = w * X[i] + b
    prob = sigmoid(z)
    print(f"Age: {X[i]:>4} | Actual: {y[i]} | Predicted Probability: {prob:.2f}")



Age: 22.0 | Actual: 0 | Predicted Probability: 0.60
Age: 38.0 | Actual: 1 | Predicted Probability: 0.40
Age: 35.0 | Actual: 1 | Predicted Probability: 0.44
Age: 35.0 | Actual: 0 | Predicted Probability: 0.44
Age: 54.0 | Actual: 0 | Predicted Probability: 0.23
Age:  2.0 | Actual: 0 | Predicted Probability: 0.80
Age: 27.0 | Actual: 1 | Predicted Probability: 0.54
Age: 14.0 | Actual: 1 | Predicted Probability: 0.69
Age:  4.0 | Actual: 1 | Predicted Probability: 0.79
Age: 58.0 | Actual: 1 | Predicted Probability: 0.20
Age: 20.0 | Actual: 0 | Predicted Probability: 0.62
Age: 39.0 | Actual: 0 | Predicted Probability: 0.39
Age: 14.0 | Actual: 0 | Predicted Probability: 0.69
Age: 55.0 | Actual: 1 | Predicted Probability: 0.22
Age:  2.0 | Actual: 0 | Predicted Probability: 0.80
Age: 28.0 | Actual: 1 | Predicted Probability: 0.52
Age: 31.0 | Actual: 0 | Predicted Probability: 0.49
Age: 28.0 | Actual: 1 | Predicted Probability: 0.52
Age: 35.0 | Actual: 0 | Predicted Probability: 0.44
Age: 34.0 | 
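Reading columns by position (`row[1]`, `row[-1]`) silently picks the wrong data if the CSV's column order differs. In the widely used Kaggle Titanic `train.csv`, for instance, the second column is `Survived`, not `Age`. The sketch below uses `csv.DictReader` to select columns by header name instead. The header names `Age` and `Survived` and the inline sample rows are assumptions for illustration, so adjust them to the actual file.

```python
import csv
import io

def load_age_survived(file_obj, age_col="Age", label_col="Survived"):
    # Select columns by header name rather than by position
    X, y = [], []
    for row in csv.DictReader(file_obj):
        age = row[age_col].strip()
        if age == "":          # skip rows with a missing age
            continue
        X.append(float(age))
        y.append(int(row[label_col]))
    return X, y

# Hypothetical sample in place of the real file; with a real file use
# open(filename, newline="") instead of io.StringIO
sample = io.StringIO("PassengerId,Survived,Age\n1,0,22\n2,1,38\n3,1,\n4,1,35\n")
X, y = load_age_survived(sample)
print(X, y)   # [22.0, 38.0, 35.0] [0, 1, 1]
```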

In [14]:
# 20. Write log_loss(y_true, y_pred) from scratch.
import math

def log_loss(y_true, y_pred):
    n = len(y_true)
    loss = 0

    for i in range(n):
        y = y_true[i]
        p = y_pred[i]

        # avoid log(0)
        if p == 0:
            p = 1e-15
        if p == 1:
            p = 1 - 1e-15

        loss += y * math.log(p) + (1 - y) * math.log(1 - p)

    return -loss / n

y_true = [1, 0, 1, 1]
y_pred = [0.9, 0.2, 0.8, 0.7]

print(log_loss(y_true, y_pred))


0.22708064055624455


In [15]:
# 21. Compute gradients for Logistic Regression using one training sample.
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# One training sample
x = 25      
y = 1        

# Initial parameters
w = -0.05
b = 1.2

# Forward pass
z = w * x + b
y_hat = sigmoid(z)

# Compute gradients
dw = (y_hat - y) * x
db = (y_hat - y)

print("Predicted probability:", round(y_hat, 3))
print("Gradient w.r.t weight (dw):", round(dw, 3))
print("Gradient w.r.t bias (db):", round(db, 3))


Predicted probability: 0.488
Gradient w.r.t weight (dw): -12.812
Gradient w.r.t bias (db): -0.512
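The formulas `dw = (y_hat - y) * x` and `db = y_hat - y` come from the chain rule.

For L = -[y log p + (1 - y) log(1 - p)] with p = sigma(z):
- dL/dp = (p - y) / (p(1 - p))
- dp/dz = p(1 - p)
- so dL/dz = p - y.

Since z = w * x + b, this is multiplied by x for w and by 1 for b.

A central finite-difference check confirms this numerically. This is a sketch; the step size `h = 1e-6` is an assumed value.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def sample_loss(x, y, w, b):
    # Log-loss of a single training sample
    p = sigmoid(w * x + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def analytic_grads(x, y, w, b):
    # Closed-form gradients used in the cell above
    err = sigmoid(w * x + b) - y
    return err * x, err

def numeric_grads(x, y, w, b, h=1e-6):
    # Central differences: (L(theta + h) - L(theta - h)) / (2h)
    dw = (sample_loss(x, y, w + h, b) - sample_loss(x, y, w - h, b)) / (2 * h)
    db = (sample_loss(x, y, w, b + h) - sample_loss(x, y, w, b - h)) / (2 * h)
    return dw, db

print("analytic:", analytic_grads(25, 1, -0.05, 1.2))
print("numeric: ", numeric_grads(25, 1, -0.05, 1.2))
```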


In [20]:
# 22. Extend gradient computation to the full dataset.
import math
import csv

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def compute_gradients(X, y, w, b):
    n = len(X)
    dw = 0
    db = 0

    for i in range(n):
        z = w * X[i] + b
        y_hat = sigmoid(z)

        error = y_hat - y[i]

        dw += error * X[i]
        db += error

    dw = dw / n
    db = db / n

    return dw, db
    
# Path to uploaded dataset
filename ="C:/Users/shres/OneDrive/Documents/titanic.csv"

X = []  # Age
y = []  # Survived

# Read CSV file
with open(filename, 'r') as file:
    reader = csv.reader(file)
    header = next(reader)  # skip header

    for row in reader:
        age = row[1]          # Age column
        survived = row[-1]    # Last column = Survived

        if age != "":         # ignore missing age
            X.append(float(age))
            y.append(int(survived))

w = -0.05
b = 1.2

dw, db = compute_gradients(X, y, w, b)

print("dw:", round(dw, 3))
print("db:", round(db, 3))


dw: 3.781
db: 0.178


In [21]:
# 23. Implement one Gradient Descent update step.

def gradient_descent_step(w, b, dw, db, lr):
    w = w - lr * dw
    b = b - lr * db
    return w, b
    
# Initial values
w = -0.05
b = 1.2

# dw and db are reused from the previous cell (full-dataset gradients)

# Learning rate
lr = 0.01

# One gradient descent step
w, b = gradient_descent_step(w, b, dw, db, lr)

print("Updated weight:", round(w, 3))
print("Updated bias:", round(b, 3))


Updated weight: -0.088
Updated bias: 1.198


In [34]:
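# 24. Train the model for 100 epochs and store loss values.
# Reuses sigmoid, compute_gradients, gradient_descent_step and X, y from earlier
# cells, so the output depends on which cells ran before this one.
import math

# Per-sample log-loss (replaces the list-based log_loss from exercise 20)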
def log_loss(y_true, y_pred):
    epsilon = 1e-15
    y_pred = max(min(y_pred, 1 - epsilon), epsilon)
    return - (y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))
    
w = 0.0
b = 0.0
learning_rate = 0.1
epochs = 100
loss_history = []

for epoch in range(epochs):
    # Compute gradients
    dw, db = compute_gradients(X, y, w, b)
    
    # Update weights
    w, b = gradient_descent_step(w, b, dw, db, learning_rate)
    
    # Compute average loss for this epoch
    total_loss = 0
    for xi, yi in zip(X, y):
        y_hat = sigmoid(w * xi + b)
        total_loss += log_loss(yi, y_hat)
    avg_loss = total_loss / len(X)
    loss_history.append(avg_loss)
    
    print(f"Epoch {epoch+1}: Loss = {avg_loss:.4f}")

Epoch 1: Loss = 0.6842
Epoch 2: Loss = 0.6782
Epoch 3: Loss = 0.6737
Epoch 4: Loss = 0.6699
Epoch 5: Loss = 0.6665
Epoch 6: Loss = 0.6634
Epoch 7: Loss = 0.6603
Epoch 8: Loss = 0.6573
Epoch 9: Loss = 0.6544
Epoch 10: Loss = 0.6515
Epoch 11: Loss = 0.6486
Epoch 12: Loss = 0.6458
Epoch 13: Loss = 0.6430
Epoch 14: Loss = 0.6402
Epoch 15: Loss = 0.6374
Epoch 16: Loss = 0.6347
Epoch 17: Loss = 0.6319
Epoch 18: Loss = 0.6292
Epoch 19: Loss = 0.6266
Epoch 20: Loss = 0.6239
Epoch 21: Loss = 0.6213
Epoch 22: Loss = 0.6187
Epoch 23: Loss = 0.6161
Epoch 24: Loss = 0.6135
Epoch 25: Loss = 0.6110
Epoch 26: Loss = 0.6084
Epoch 27: Loss = 0.6059
Epoch 28: Loss = 0.6034
Epoch 29: Loss = 0.6010
Epoch 30: Loss = 0.5985
Epoch 31: Loss = 0.5961
Epoch 32: Loss = 0.5937
Epoch 33: Loss = 0.5913
Epoch 34: Loss = 0.5890
Epoch 35: Loss = 0.5866
Epoch 36: Loss = 0.5843
Epoch 37: Loss = 0.5820
Epoch 38: Loss = 0.5797
Epoch 39: Loss = 0.5774
Epoch 40: Loss = 0.5752
Epoch 41: Loss = 0.5729
Epoch 42: Loss = 0.5707
E
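Because the cell above reuses `w`, `b`, `X` and `y` from whatever ran earlier, its output depends on execution order. Wrapping the loop in a function that initializes its own parameters makes each run reproducible. This is a self-contained sketch. `train` is a hypothetical helper, and the toy dataset is the one used in exercises 26 and 30.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def train(X, y, lr=0.1, epochs=100, eps=1e-15):
    # Fresh parameters on every call, so results do not depend on earlier cells
    w, b = 0.0, 0.0
    n = len(X)
    history = []
    for _ in range(epochs):
        # Full-batch gradients of the mean log-loss
        errors = [sigmoid(w * xi + b) - yi for xi, yi in zip(X, y)]
        dw = sum(e * xi for e, xi in zip(errors, X)) / n
        db = sum(errors) / n
        w -= lr * dw
        b -= lr * db
        # Mean log-loss after the update, clipped to avoid log(0)
        loss = 0.0
        for xi, yi in zip(X, y):
            p = min(max(sigmoid(w * xi + b), eps), 1 - eps)
            loss -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
        history.append(loss / n)
    return w, b, history

w, b, history = train([1, 2, 3, 4, 5], [0, 0, 0, 1, 1])
print(f"Epoch 1: {history[0]:.4f}  Epoch 100: {history[-1]:.4f}")
```

On this small, well-scaled dataset, a learning rate of 0.1 lowers the loss at every epoch.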

In [42]:
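# 25. Print loss every 10 epochs.
# Note: w, b and loss_history are not reset here, so training continues from
# whatever values earlier cells left in memory.
for epoch in range(epochs):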
    # Compute gradients
    dw, db = compute_gradients(X, y, w, b)
    
    # Update weights
    w, b = gradient_descent_step(w, b, dw, db, learning_rate)
    
    # Compute average loss for this epoch
    total_loss = 0
    for xi, yi in zip(X, y):
        y_hat = sigmoid(w * xi + b)
        total_loss += log_loss(yi, y_hat)
    avg_loss = total_loss / len(X)
    loss_history.append(avg_loss)
    
    # Print loss every 10 epochs
    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch+1}: Loss = {avg_loss:.4f}")
print("Trained weight w:", w)

Epoch 10: Loss = 8.0598
Epoch 20: Loss = 1.5495
Epoch 30: Loss = 8.0580
Epoch 40: Loss = 1.5483
Epoch 50: Loss = 8.0570
Epoch 60: Loss = 1.5520
Epoch 70: Loss = 8.0562
Epoch 80: Loss = 1.5573
Epoch 90: Loss = 8.0555
Epoch 100: Loss = 1.5631
Trained weight w: -0.13949432321604083
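The losses above jump between roughly 8 and 1.5 instead of decreasing. That pattern most likely means the step size is too large for the feature scale. With raw ages up to about 80, the gradient with respect to `w` is multiplied by the age, so `learning_rate = 0.1` overshoots.

A common fix is to standardize the feature (subtract the mean, divide by the standard deviation) before training, or to use a much smaller learning rate. The sketch below uses the 19 complete (Age, Survived) rows printed in exercise 19 as sample data. `standardize` and `mean_log_loss` are hypothetical helpers.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def standardize(values):
    # Rescale to mean 0 and (population) standard deviation 1
    mu = sum(values) / len(values)
    sd = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return [(v - mu) / sd for v in values], mu, sd

def mean_log_loss(X, y, w, b, eps=1e-15):
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(sigmoid(w * xi + b), eps), 1 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)

# Sample rows copied from the exercise 19 output (Age, Survived)
ages = [22, 38, 35, 35, 54, 2, 27, 14, 4, 58, 20, 39, 14, 55, 2, 28, 31, 28, 35]
survived = [0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0]

X_std, mu, sd = standardize(ages)
w, b, lr = 0.0, 0.0, 0.1
history = []
for _ in range(100):
    # Both updates use errors computed from the old w and b
    errors = [sigmoid(w * xi + b) - yi for xi, yi in zip(X_std, survived)]
    w -= lr * sum(e * xi for e, xi in zip(errors, X_std)) / len(X_std)
    b -= lr * sum(errors) / len(X_std)
    history.append(mean_log_loss(X_std, survived, w, b))

print(f"first loss {history[0]:.4f}, last loss {history[-1]:.4f}")
# To score a new age a, apply the same scaling: sigmoid(w * (a - mu) / sd + b)
```

The stored `mu` and `sd` must be reused at prediction time. A model trained on standardized ages expects standardized inputs.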


In [36]:
# 26. Change the classification threshold from 0.5 to 0.7 and compare predictions.

# Assumes w and b were trained in an earlier cell; X and y are a small toy dataset
X = [1, 2, 3, 4, 5]
y = [0, 0, 0, 1, 1]

print(w)
print(b)

threshold_1 = 0.5
threshold_2 = 0.7

predictions_0_5 = []
predictions_0_7 = []

for xi in X:
    y_hat = 1 / (1 + math.exp(-(w * xi + b)))  # sigmoid
    
    # Default threshold 0.5
    pred_0_5 = 1 if y_hat >= threshold_1 else 0
    predictions_0_5.append(pred_0_5)
    
    # New threshold 0.7
    pred_0_7 = 1 if y_hat >= threshold_2 else 0
    predictions_0_7.append(pred_0_7)

print("Predictions with threshold 0.5:", predictions_0_5)
print("Predictions with threshold 0.7:", predictions_0_7)



0.7543269050132171
-2.348310426667632
Predictions with threshold 0.5: [0, 0, 0, 1, 1]
Predictions with threshold 0.7: [0, 0, 0, 0, 1]
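Sweeping several thresholds over the same probabilities makes the trade-off clearer. Raising the threshold can only turn 1s into 0s, never the reverse. This sketch plugs in the `w` and `b` values printed above; the extra thresholds 0.3 and 0.9 are illustrative choices.

```python
import math

w = 0.7543269050132171   # values printed by the cell above
b = -2.348310426667632
X = [1, 2, 3, 4, 5]

# Probability of class 1 for each input
probs = [1 / (1 + math.exp(-(w * xi + b))) for xi in X]
print("probabilities:", [round(p, 3) for p in probs])

predictions = {}
for t in (0.3, 0.5, 0.7, 0.9):
    predictions[t] = [1 if p >= t else 0 for p in probs]
    print(f"threshold {t}: {predictions[t]}")
```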


In [37]:
# 27. Write code to count false positives and false negatives.

# Example: predictions and true labels
y_true = [0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 1]  

false_positives = 0
false_negatives = 0

for true, pred in zip(y_true, y_pred):
    if pred == 1 and true == 0:
        false_positives += 1
    elif pred == 0 and true == 1:
        false_negatives += 1

print("False Positives:", false_positives)
print("False Negatives:", false_negatives)


False Positives: 0
False Negatives: 1
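The same loop extends naturally to all four cells of the confusion matrix, from which precision and recall follow. This sketch uses the 0.7-threshold predictions above. `confusion_counts` is a hypothetical helper, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def confusion_counts(y_true, y_pred):
    # Count (TP, TN, FP, FN), assuming labels are 0 or 1
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1
        elif p == 0 and t == 0:
            tn += 1
        elif p == 1 and t == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

y_true = [0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp) if tp + fp else 0.0   # share of predicted 1s that are correct
recall = tp / (tp + fn) if tp + fn else 0.0      # share of actual 1s that were found
print(f"TP={tp} TN={tn} FP={fp} FN={fn} precision={precision:.2f} recall={recall:.2f}")
```

Here the stricter 0.7 threshold keeps precision at 1.0 but halves recall, because one actual 1 is missed.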


In [40]:
# 28. Train the model without a bias term and observe the results.
import csv
import math

# Sigmoid function
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Loss function
def log_loss(y_true, y_pred):
    epsilon = 1e-15
    y_pred = max(min(y_pred, 1 - epsilon), epsilon)
    return - (y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))

# Compute gradient without bias
def compute_gradient_no_bias(X, y, w):
    n = len(X)
    dw = 0
    for i in range(n):
        z = w * X[i]  # no bias
        y_hat = sigmoid(z)
        error = y_hat - y[i]
        dw += error * X[i]
    dw /= n
    return dw

# Gradient descent step
def gradient_descent_step_no_bias(w, dw, lr):
    w -= lr * dw
    return w

# Dataset (reads the CSV path stored in filename by an earlier cell)
X = []  # Age
y = []  # Survived

# Read CSV file
with open(filename, 'r') as file:
    reader = csv.reader(file)
    header = next(reader)  # skip header

    for row in reader:
        age = row[1]          # Age column
        survived = row[-1]    # Last column = Survived

        if age != "":         # ignore missing age
            X.append(float(age))
            y.append(int(survived))
            
# Training parameters
w = 0.0
learning_rate = 0.1
epochs = 100
loss_history = []

# Training loop
for epoch in range(epochs):
    dw = compute_gradient_no_bias(X, y, w)
    w = gradient_descent_step_no_bias(w, dw, learning_rate)
    
    # Compute average loss
    total_loss = 0
    for xi, yi in zip(X, y):
        y_hat = sigmoid(w * xi)  # no bias
        total_loss += log_loss(yi, y_hat)
    avg_loss = total_loss / len(X)
    loss_history.append(avg_loss)
    
    # Print loss every 10 epochs
    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch+1}: Loss = {avg_loss:.4f}")

print("Trained weight w (no bias):", w)


Epoch 10: Loss = 8.0194
Epoch 20: Loss = 2.1185
Epoch 30: Loss = 8.0289
Epoch 40: Loss = 2.2141
Epoch 50: Loss = 8.0354
Epoch 60: Loss = 2.2809
Epoch 70: Loss = 8.0401
Epoch 80: Loss = 2.3294
Epoch 90: Loss = 8.0435
Epoch 100: Loss = 2.3652
Trained weight w (no bias): -0.3175724677246916
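Without a bias, z = w * x, so sigma(z) is exactly 0.5 at x = 0 and the decision boundary is pinned there. Every age is positive, so the model can only predict one class for all passengers: class 1 when w > 0 and class 0 when w < 0. The loss oscillation is most likely the same step-size problem seen with raw ages in exercise 25.

The short check below illustrates the first point. It uses a few sample ages from the exercise 19 output, the trained weight printed above, and some illustrative weights.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

ages = [22, 38, 35, 2, 58, 14]   # sample ages from the exercise 19 output

results = {}
for w in (-0.3175724677246916, -0.01, 0.01, 0.5):
    # No bias: the prediction depends only on the sign of w * age
    preds = [1 if sigmoid(w * a) >= 0.5 else 0 for a in ages]
    results[w] = preds
    print(f"w = {w:+.4f}: {preds}")
```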


In [43]:
# 29. Modify the model to support multiple input features.
import math

# Sigmoid function
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Loss function
def log_loss(y_true, y_pred):
    epsilon = 1e-15
    y_pred = max(min(y_pred, 1 - epsilon), epsilon)
    return - (y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))

# Compute gradients for multiple features
def compute_gradients_multi(X, y, w, b):
    n = len(X)
    num_features = len(X[0])
    dw = [0.0] * num_features
    db = 0.0
    
    for i in range(n):
        z = sum(w[j] * X[i][j] for j in range(num_features)) + b
        y_hat = sigmoid(z)
        error = y_hat - y[i]
        for j in range(num_features):
            dw[j] += error * X[i][j]
        db += error
    
    dw = [d / n for d in dw]
    db /= n
    return dw, db

# Gradient descent step for multiple features
def gradient_descent_step_multi(w, b, dw, db, lr):
    w = [w[j] - lr * dw[j] for j in range(len(w))]
    b -= lr * db
    return w, b

# Example dataset (3 features)
X = [
    [1, 2, 1],
    [2, 1, 3],
    [3, 0, 2],
    [4, 1, 0],
    [5, 2, 1]
]
y = [0, 0, 0, 1, 1]

# Initialize weights and bias
num_features = len(X[0])
w = [0.0] * num_features
b = 0.0
learning_rate = 0.1
epochs = 100
loss_history = []

# Training loop
for epoch in range(epochs):
    dw, db = compute_gradients_multi(X, y, w, b)
    w, b = gradient_descent_step_multi(w, b, dw, db, learning_rate)
    
    # Compute average loss
    total_loss = 0
    for xi, yi in zip(X, y):
        z = sum(w[j] * xi[j] for j in range(num_features)) + b
        y_hat = sigmoid(z)
        total_loss += log_loss(yi, y_hat)
    avg_loss = total_loss / len(X)
    loss_history.append(avg_loss)
    
    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch+1}: Loss = {avg_loss:.4f}")

print("Trained weights:", w)
print("Trained bias:", b)


Epoch 10: Loss = 0.4514
Epoch 20: Loss = 0.3345
Epoch 30: Loss = 0.2675
Epoch 40: Loss = 0.2239
Epoch 50: Loss = 0.1931
Epoch 60: Loss = 0.1701
Epoch 70: Loss = 0.1521
Epoch 80: Loss = 0.1377
Epoch 90: Loss = 0.1258
Epoch 100: Loss = 0.1159
Trained weights: [0.8725170411861485, -0.06927138844192729, -1.822118656934662]
Trained bias: -0.5497184706197988
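To use the trained multi-feature model, compute z as the dot product of the weights and a feature row plus the bias. Then apply the sigmoid and a threshold. This sketch plugs in the weights and bias printed above; `predict_multi` is a hypothetical helper.

```python
import math

def predict_multi(row, w, b, threshold=0.5):
    # z = w . x + b, then sigmoid and threshold
    z = sum(wj * xj for wj, xj in zip(w, row)) + b
    p = 1 / (1 + math.exp(-z))
    return p, int(p >= threshold)

w = [0.8725170411861485, -0.06927138844192729, -1.822118656934662]
b = -0.5497184706197988
X = [[1, 2, 1], [2, 1, 3], [3, 0, 2], [4, 1, 0], [5, 2, 1]]

labels = [predict_multi(row, w, b)[1] for row in X]
print(labels)   # [0, 0, 0, 1, 1], matching y
```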


In [44]:
# 30. Compare predictions before training and after training.
import math

# Sigmoid function
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Dataset
X = [1, 2, 3, 4, 5]
y = [0, 0, 0, 1, 1]

# Initialize weights and bias
w = 0.0
b = 0.0
learning_rate = 0.1
epochs = 100

# Predictions BEFORE training 
pred_before = []
for xi in X:
    y_hat = sigmoid(w * xi + b)
    pred = 1 if y_hat >= 0.5 else 0
    pred_before.append(pred)

print("Predictions before training:", pred_before)

# Training loop (Gradient Descent) 
for epoch in range(epochs):
    dw = 0
    db = 0
    n = len(X)
    for xi, yi in zip(X, y):
        y_hat = sigmoid(w * xi + b)
        error = y_hat - yi
        dw += error * xi
        db += error
    dw /= n
    db /= n
    
    # Update weights and bias
    w -= learning_rate * dw
    b -= learning_rate * db

# Predictions AFTER training 
pred_after = []
for xi in X:
    y_hat = sigmoid(w * xi + b)
    pred = 1 if y_hat >= 0.5 else 0
    pred_after.append(pred)

print("Predictions after training:", pred_after)


Predictions before training: [1, 1, 1, 1, 1]
Predictions after training: [0, 0, 1, 1, 1]
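Accuracy, the fraction of correct predictions, summarizes the before/after comparison in one number. This sketch uses the prediction lists printed above; `accuracy` is a hypothetical helper.

```python
def accuracy(y_true, y_pred):
    # Fraction of positions where the prediction equals the label
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

y = [0, 0, 0, 1, 1]
pred_before = [1, 1, 1, 1, 1]
pred_after = [0, 0, 1, 1, 1]

print("Accuracy before training:", accuracy(y, pred_before))  # 0.4
print("Accuracy after training:", accuracy(y, pred_after))    # 0.8
```

The one remaining error after training is at x = 3, the input closest to the decision boundary.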
