# Machine Learning
- Machine learning (ML) is a branch of artificial intelligence (AI) focused on enabling computers and machines to imitate the way that humans learn.
- ML lets machines perform tasks autonomously and improve their performance and accuracy through experience and exposure to more data.
## Types of Machine Learning
- Supervised Machine Learning
- Unsupervised Machine Learning
- Reinforcement Learning

## Supervised Machine Learning
- It is a machine learning technique that uses labeled datasets to train models to identify the underlying patterns and relationships between input features and outputs.
- Supervised Machine Learning has two main sub-types: Regression and Classification.

### Regression
Regression is used for predicting a continuous value.
- Simple Linear Regression
- Multiple Linear Regression
- Polynomial Regression
### Classification
Classification is used for predicting categorical (discrete) values (e.g. Yes/No, image classification, etc.).
- Classical Machine Learning Algorithms
    - Naive Bayes Classifier (e.g. Gaussian Naive Bayes, GNB)
    - Decision Tree Classifier
    - Random Forest Classifier
    - Support Vector Machine (SVM)
    - Logistic Regression: Used for binary classification problems

- Neural Networks: Deep Learning based classification



## Simple Linear Regression
Simple Linear Regression has exactly one independent variable (x) and one dependent variable (y).
- x: input
- y: target

We use the equation of a straight line to find the value of the dependent variable:
``y = mx + c``

Here ``m`` is the slope of the line and ``c`` is its intercept.
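
One common way to find the slope and intercept of the line y = mx + c is the least-squares fit. The sketch below is only an illustration: the `hours` and `marks` values are hypothetical (they are not the data from the figure), and `fit_line` is a helper name introduced here.

```python
# Least-squares fit of y = m*x + c using only the standard library.
# NOTE: this data is hypothetical and chosen only for illustration.
hours = [1, 2, 3, 4, 5]
marks = [35, 45, 55, 65, 75]

def fit_line(xs, ys):
    """Return slope m and intercept c of the least-squares line."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # m = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    m = numerator / denominator
    # The fitted line always passes through the point (x_mean, y_mean)
    c = y_mean - m * x_mean
    return m, c

m, c = fit_line(hours, marks)
predicted = m * 6 + c
print(f"m = {m:.2f}, c = {c:.2f}, predicted marks for 6 hours = {predicted:.2f}")
```

With this made-up data the points lie exactly on a line, so the fit recovers it exactly. Real data would leave some residual error.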

**Example**
Below is the data for the hours students studied and the marks they obtained.

<img src="./assets/linear-regression.jpg" alt="simple linear regression" style="width: 700px; height: auto;">


### Solve the Linear Regression and Predict the marks for students who studied 6 hours

In [9]:
# Predict the marks when hours of studied = 6
x = 6
print(f" Marks when hours of studied is 6 = ?")

 Marks when hours of studied is 6 = ?


### HW -  **Predict Demand using the given data**
| Price | Demand |
|-------|---------|
|   10	    |   500    |
|   15	    |   450    |
|   20	    |   400    |
|   25	    |   350   | 
|   30	    |   300   | 

- Find the linear equation for the above data (D = mP + c), where price is the independent variable (P) and demand is the dependent variable (D)
- Predict the Demand when Price is 37.6
- Attach a screenshot of your handwritten solution
- Also solve the problem using Python (with pandas or numpy)


## Classification
### Naive Bayes Classifier
- The Naive Bayes classifier uses Bayes' Theorem to solve classification problems by computing the probability of each class given the feature values.
- It is used for Spam Detection, Sentiment Analysis, Text Classification, etc.

### Bayes' Theorem
$$
P(H|E) = \frac{P(E|H) \cdot P(H)}{P(E)}
$$
Where:

- $P(H|E)$ = **Posterior probability** (Probability of hypothesis $H$ given evidence $E$)
- $P(E|H)$ = **Likelihood** (Probability of evidence $E$ given hypothesis $H$)
- $P(H)$ = **Prior probability** (Initial probability of hypothesis $H$)
- $P(E)$ = **Evidence probability** (Overall probability of evidence $E$)

### Example of Bayes' Theorem
If the test result for a disease is positive, what is the probability of actually having the disease, based on the historical data below?

| Disease  | Test Positive | Test Negative | Total |
|----------|--------------|--------------|-------|
| Yes (D)  | 40           | 10           | 50    |
| No (~D)  | 30           | 120          | 150   |
| **Total** | 70           | 130          | 200   |


<img src="./assets/Bayes-theorem.jpg" alt="bayes theorem" style="width: 700px; height: auto;">
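
Reading the counts directly off the table, the calculation can be worked as follows (here $+$ denotes a positive test result and $D$ denotes having the disease):

$$
P(D \mid +) = \frac{P(+ \mid D) \cdot P(D)}{P(+)} = \frac{\frac{40}{50} \cdot \frac{50}{200}}{\frac{70}{200}} = \frac{40}{70} \approx 0.571
$$

So, under this historical data, a positive test corresponds to roughly a 57% chance of having the disease.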


#### **HW - Create a function in Python that accepts the above historical table as a pandas DataFrame and calculates the probability of having the disease given a positive test, using Bayes' Theorem**

### Naive Bayes Classifier Example

Suppose we have the following sentences from our emails, each labeled as "Spam" or "Not Spam".

|Text   | Label|
|-------|------|
|Win a free prize now|Spam|
|Meeting at 5PM tomorrow|Not Spam|
|Claim your discount today| Spam|
|Reminder: Project deadline| Not Spam|

Now we have to predict whether an email containing **Win a discount today** is spam or not.

#### Step 1: Represent the above data as a pandas DataFrame

In [11]:
import pandas as pd

email_data = pd.DataFrame({
    'text':['Win a free prize now','Meeting at 5PM tomorrow','Claim your discount today','Reminder: Project deadline'],
    'label':['Spam','Not Spam','Spam','Not Spam']
})

#### Step 2: Calculate the probability of the email being spam given the words (features) it contains
$$
P(\text{Spam} \mid \text{Features}) = \frac{P(\text{Features} \mid \text{Spam}) \cdot P(\text{Spam})}{P(\text{Features})}
$$

- For this, let's treat each word as a feature and represent each sentence as a binary (0/1) vector.


In [18]:
# extract all unique features (words)
all_features  = set()
for text in email_data['text']:
    words = text.lower().split()
    all_features.update(words)

all_features = sorted(list(all_features))

def create_binary_vector(sentence):
    words = sentence.lower().split()  
    return [1 if word in words else 0 for word in all_features]  

email_data['sentence_vector'] = email_data['text'].apply(create_binary_vector)
email_data

|   | text | label | sentence_vector |
|---|------|-------|-----------------|
| 0 | Win a free prize now | Spam | [0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0] |
| 1 | Meeting at 5PM tomorrow | Not Spam | [1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0] |
| 2 | Claim your discount today | Spam | [0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1] |
| 3 | Reminder: Project deadline | Not Spam | [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0] |


#### Step 3: Apply the Naive Bayes classifier to predict whether the email containing the sentence "**Win a discount today**" is spam or not

Sub-Steps
##### Step 1: Calculate Prior Probabilities
##### Step 2: Calculate the Likelihoods
##### Step 3: Use Bayes' Theorem to Classify the Sentence "Win a discount today"
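Using the likelihoods and prior probabilities, calculate the posterior probabilities for Spam and Not Spam. The products below run over the words present in the sentence. The code in this section additionally multiplies in a factor of $1 - P(\text{word} \mid \text{class})$ for every vocabulary word that is absent from the sentence.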
$$ P(\text{Spam} \mid \text{Features}) \propto P(\text{Spam}) \cdot \prod_{i=1}^{n} P(\text{word}_i \mid \text{Spam})$$
$$ P(\text{Not Spam} \mid \text{Features}) \propto P(\text{Not Spam}) \cdot \prod_{i=1}^{n} P(\text{word}_i \mid \text{Not Spam}) $$

After calculating both values, choose the class with the higher posterior probability.
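
As a worked sketch on the four training emails, the priors are:

$$ P(\text{Spam}) = \frac{2}{4} = 0.5 \qquad P(\text{Not Spam}) = \frac{2}{4} = 0.5 $$

Assuming add-one (Laplace) smoothing with two possible outcomes per word (present or absent), each likelihood can be estimated as:

$$ P(\text{word} \mid \text{class}) = \frac{(\text{number of emails in the class containing the word}) + 1}{(\text{number of emails in the class}) + 2} $$

For example, "win" appears in 1 of the 2 Spam emails and in 0 of the 2 Not Spam emails:

$$ P(\text{win} \mid \text{Spam}) = \frac{1 + 1}{2 + 2} = 0.5 \qquad P(\text{win} \mid \text{Not Spam}) = \frac{0 + 1}{2 + 2} = 0.25 $$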


In [19]:
# Calculate prior probabilities
spam_count = len(email_data[email_data['label'] == 'Spam'])
not_spam_count = len(email_data[email_data['label'] == 'Not Spam'])
total_count = len(email_data)

P_spam = spam_count / total_count
P_not_spam = not_spam_count / total_count


# Calculate likelihoods for each feature (word) given Spam and Not Spam
spam_data = email_data[email_data['label'] == 'Spam']
not_spam_data = email_data[email_data['label'] == 'Not Spam']


# Initialize dictionaries to hold probabilities
P_word_given_spam = {}
P_word_given_not_spam = {}

# For each word, estimate P(word|Spam) and P(word|Not Spam).
# Laplace (add-one) smoothing: +1 in the numerator and +2 in the denominator
# (two outcomes: word present or absent) so unseen words never get probability 0.
for idx, word in enumerate(all_features):
    P_word_given_spam[word] = (spam_data['sentence_vector'].apply(lambda v: v[idx]).sum() + 1) / (len(spam_data) + 2)
    P_word_given_not_spam[word] = (not_spam_data['sentence_vector'].apply(lambda v: v[idx]).sum() + 1) / (len(not_spam_data) + 2)

### Predict for new sentence (Win a discount today)

In [21]:
# New sentence to classify
new_sentence = "Win a discount today"
new_vector = create_binary_vector(new_sentence)

# Calculate P(Spam | Features) and P(Not Spam | Features)
P_spam_given_features = P_spam
P_not_spam_given_features = P_not_spam

for i, word in enumerate(all_features):
    if new_vector[i] == 1:
        P_spam_given_features *= P_word_given_spam[word]
        P_not_spam_given_features *= P_word_given_not_spam[word]
    else:
        P_spam_given_features *= (1 - P_word_given_spam[word])
        P_not_spam_given_features *= (1 - P_word_given_not_spam[word])

# Normalize the results
total_prob = P_spam_given_features + P_not_spam_given_features
P_spam_given_features /= total_prob
P_not_spam_given_features /= total_prob


if P_spam_given_features > P_not_spam_given_features:
    print("The sentence is classified as Spam.")
else:
    print("The sentence is classified as Not Spam.")

The sentence is classified as Spam.


### Decision Tree Classifier
<img src="./assets/decision-tree.png" alt="Decision Tree" style="width: 400px; height: auto;">


| Day   | Rain | Cloudy | Bring Umbrella? |
|-------|------|--------|-----------------|
| Day 1 | Yes  | Yes    | Yes             |
| Day 2 | Yes  | No     | Yes             |
| Day 3 | No   | Yes    | Yes             |
| Day 4 | No   | No     | No              |
| Day 5 | Yes  | Yes    | Yes             |



- **Training**: The tree learns from past data.
- It tries to split the data into different categories (e.g., "Bring Umbrella: Yes" or "No") by asking questions that best separate the data points.
- **Classifying new data**: When we get new weather data, the decision tree asks the same questions, for example:

Is it raining? → If yes, the tree says bring an umbrella.

### Mathematics Behind Decision Tree
- In a Decision Tree Classifier, we split the data such that the Information Gain is as large as possible
- Information Gain is computed from Entropy, which measures how mixed (impure) the class labels in a set are

<img src="./assets/better-split.png" alt="Decision Tree" style="width: 300px; height: auto;">
<img src="./assets/Decision-tree.png" alt="Decision Tree" style="width: 400px; height: auto;">



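$$ Entropy(S) = - \sum_{i=1}^{c} p_i \log_2 p_i $$

$$ IG(S, A) = Entropy(S) - \sum_{i} \omega_i Entropy(S_i) $$

Here $p_i$ is the proportion of samples in $S$ that belong to class $i$, $c$ is the number of classes, $S_i$ is the $i$-th subset produced by splitting $S$ on attribute $A$, and $\omega_i = |S_i| / |S|$ is the fraction of samples that fall into $S_i$.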
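
The two formulas can be applied to the umbrella table from this section. The following is a minimal sketch using only the standard library. The helper names `entropy` and `information_gain` are introduced here for illustration and are not from any particular library.

```python
from collections import Counter
from math import log2

# Rows from the umbrella table: (rain, cloudy, bring_umbrella)
days = [
    ("Yes", "Yes", "Yes"),
    ("Yes", "No", "Yes"),
    ("No", "Yes", "Yes"),
    ("No", "No", "No"),
    ("Yes", "Yes", "Yes"),
]

def entropy(labels):
    """Entropy(S) = -sum(p_i * log2(p_i)) over the classes present in labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, feature_index, label_index=2):
    """IG(S, A) = Entropy(S) - sum(w_i * Entropy(S_i)) for a split on one feature."""
    labels = [row[label_index] for row in rows]
    gain = entropy(labels)
    for value in set(row[feature_index] for row in rows):
        subset = [row[label_index] for row in rows if row[feature_index] == value]
        # w_i is the fraction of rows that fall into this subset
        gain -= (len(subset) / len(rows)) * entropy(subset)
    return gain

base_entropy = entropy([row[2] for row in days])
gain_rain = information_gain(days, 0)
gain_cloudy = information_gain(days, 1)
print(f"Entropy(S) = {base_entropy:.4f}")
print(f"IG(S, Rain) = {gain_rain:.4f}, IG(S, Cloudy) = {gain_cloudy:.4f}")
```

On this small table both features give the same information gain, because each one leaves a pure "Yes" subset and a mixed two-row subset.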


### Random Forest Classifier
<img src="./assets/random-forest.jpg" alt="Random Forest" style="width: 500px; height: auto;">

- Random Forest is an ensemble machine learning technique that builds multiple decision trees and combines their results (by majority vote for classification) to improve accuracy.
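
The "combine their results" step can be sketched as a majority vote. In the sketch below the three "trees" are hypothetical hand-written rules used only to illustrate voting. A real random forest would learn each tree from a random sample of the training data.

```python
from collections import Counter

# Three hypothetical "trees", written as simple hand-made rules for illustration only.
def tree_rain(sample):
    return "Yes" if sample["rain"] == "Yes" else "No"

def tree_cloudy(sample):
    return "Yes" if sample["cloudy"] == "Yes" else "No"

def tree_either(sample):
    return "Yes" if "Yes" in (sample["rain"], sample["cloudy"]) else "No"

def forest_predict(trees, sample):
    """Combine the individual tree predictions by majority vote."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]

trees = [tree_rain, tree_cloudy, tree_either]
sample = {"rain": "No", "cloudy": "Yes"}
prediction = forest_predict(trees, sample)
print(f"Bring umbrella? {prediction}")
```

Here one rule votes "No" and two vote "Yes", so the combined prediction is "Yes".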


### Logistic Regression Classifier
- Logistic Regression is similar to Linear Regression, but it predicts a probability and uses it to classify data into categories such as Yes/No or 1/0.

##### Working
- First compute a linear output from the input, as in Linear Regression
- y = mx + c

- Then apply the Sigmoid function to y to obtain a probability
- If the probability is greater than 0.5, predict Yes (class 1); otherwise predict No (class 0)

$$
LR(y) = \frac{1}{1 + e^{-y}}
$$

Now the decision is based on $LR(y) = P(Y=1)$ as follows:
$$
\text{If } P(Y=1) > 0.5 \Rightarrow \text{ Predict Class } 1
$$

$$
\text{If } P(Y=1) \leq 0.5 \Rightarrow \text{ Predict Class } 0
$$
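
The working described in this section can be sketched in a few lines of Python. The values of `m` and `c` below are hypothetical, picked only for illustration. In practice they would be learned from training data.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(x, m, c, threshold=0.5):
    """Return (probability, label) for input x using y = m*x + c and the sigmoid."""
    y = m * x + c
    probability = sigmoid(y)
    # Class 1 only when the probability is strictly greater than the threshold
    label = 1 if probability > threshold else 0
    return probability, label

# Hypothetical parameters, not learned from any data
m, c = 1.5, -3.0
for x in (1.0, 2.0, 4.0):
    probability, label = predict_class(x, m, c)
    print(f"x = {x}, P(Y=1) = {probability:.4f}, predicted class = {label}")
```

With these parameters, x = 2.0 gives y = 0 and a probability of exactly 0.5, which falls on the class 0 side of the rule.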


### Deep Learning
- Let's create a Neural Network with
    - One Input Neuron
    - One Hidden Layer (a single neuron with ReLU activation)
    - One Output Neuron

- Train this Neural Network to find the pattern in our dataset
- Our dataset contains
    - x = Investment in the Stock Market
    - y = Return (= x*3)
- Train the Neural Network and later predict the Return for a new investment


In [3]:
import numpy as np

np.random.seed(42) 
x_train = np.random.normal(0, 1, (1000, 1))
y_train = 3 * x_train 

# Initialize weights and biases
w1 = np.random.randn(1, 1) 
b1 = np.zeros((1,)) 
w2 = np.random.randn(1, 1)  
b2 = np.zeros((1,)) 

def relu(x):
    return np.maximum(0, x)

def relu_derivative(x):
    return (x > 0).astype(float)

# Learning rate
lr = 0.01
epochs = 1000
print(w1,b1,w2,b2)
# Training loop
for epoch in range(epochs):
    # Forward pass
    hidden_input = np.dot(x_train, w1) + b1
    hidden_output = relu(hidden_input)  # Hidden layer activation
    
    output = np.dot(hidden_output, w2) + b2  # Output layer (linear activation)
    
    # Compute loss (Mean Squared Error)
    loss = np.mean((output - y_train) ** 2)

    # Backpropagation
    d_loss_output = 2 * (output - y_train) / len(y_train)  # dL/dy
    d_w2 = np.dot(hidden_output.T, d_loss_output)  # Gradient for w2
    d_b2 = np.sum(d_loss_output, axis=0)  # Gradient for b2

    d_hidden = np.dot(d_loss_output, w2.T) * relu_derivative(hidden_input)  # Backprop through ReLU
    d_w1 = np.dot(x_train.T, d_hidden)  # Gradient for w1
    d_b1 = np.sum(d_hidden, axis=0)  # Gradient for b1

    # Update weights and biases
    w1 -= lr * d_w1
    b1 -= lr * d_b1
    w2 -= lr * d_w2
    b2 -= lr * d_b2

    # Print loss every 100 epochs
    if epoch % 100 == 0:
        print(f"Epoch {epoch}, Loss: {loss:.4f}")

# Testing the trained model
x_test = np.array([[2.0], [-1.5], [0.0]])  # Test inputs (matching the printed output)
hidden_input_test = np.dot(x_test, w1) + b1
hidden_output_test = relu(hidden_input_test)
y_pred = np.dot(hidden_output_test, w2) + b2  # Output layer prediction

# Print results
for i, x_val in enumerate(x_test):
    print(f"Input: {x_val[0]}, Predicted: {y_pred[i][0]:.4f}, Expected: {3 * x_val[0]:.4f}")


[[1.39935544]] [0.] [[0.92463368]] [0.]
Epoch 0, Loss: 5.5317
Epoch 100, Loss: 1.8797
Epoch 200, Loss: 1.1632
Epoch 300, Loss: 0.7821
Epoch 400, Loss: 0.5618
Epoch 500, Loss: 0.4258
Epoch 600, Loss: 0.3359
Epoch 700, Loss: 0.2741
Epoch 800, Loss: 0.2299
Epoch 900, Loss: 0.1971
Input: 2.0, Predicted: 6.0045, Expected: 6.0000
Input: -1.5, Predicted: -4.2607, Expected: -4.5000
Input: 0.0, Predicted: -0.0115, Expected: 0.0000


In [4]:
print(w1,b1,w2,b2)

[[1.39004524]] [1.96361602] [[2.16395982]] [-4.2607315]
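
As a sanity check, the forward pass can be reproduced by hand from the trained parameters printed above. This is a minimal sketch in plain Python. The `predict` helper is a name introduced here, and the parameter values are copied from that printed output.

```python
# Trained parameters copied from the printed output of the notebook
w1, b1 = 1.39004524, 1.96361602
w2, b2 = 2.16395982, -4.2607315

def predict(x):
    """Forward pass: one ReLU hidden neuron followed by a linear output neuron."""
    hidden = max(0.0, w1 * x + b1)  # ReLU activation
    return w2 * hidden + b2

for x in (2.0, -1.5, 0.0):
    print(f"Input: {x}, Predicted: {predict(x):.4f}, Expected: {3 * x:.4f}")
```

For x = -1.5 the hidden pre-activation `w1 * x + b1` is negative, so ReLU outputs 0 and the prediction is simply `b2`. This is why that prediction differs most from the expected value.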


- Assignment:
    - Use scikit-learn (https://scikit-learn.org/stable/) to apply the following algorithms for regression and classification
        - Regression (Predict House Price): any publicly available data can be used
        - GNB (Classify the Email as Spam or Not)
        - Decision Tree Classifier (Any classification task)

- Guidelines:
    - Apply data cleaning and pre-processing
    - Prepare the Test and Train datasets
    - Train the model with the processed data
    - Use the classifier/regressor on the test dataset and calculate the accuracy
    - Use the trained model for inference