# Assignment 9

# Please save your file as your name.ipynb and share this Jupyter notebook with solutions

In [None]:
# Question 1
# What is the role of convolutional layer in CNN in case of image classification model?

Role of Convolutional Layer in CNN for Image Classification
The convolutional layer is the fundamental building block of a Convolutional Neural Network (CNN). It is designed to extract features from the input image and plays a crucial role in enabling the network to perform tasks like image classification. Here's a detailed explanation of its role:

1. Feature Extraction
The convolutional layer applies convolutional filters (kernels) to the input image.
These filters scan the image in small regions (receptive fields) and detect specific patterns, such as edges, corners, textures, or shapes.
Early layers typically capture low-level features like edges and gradients, while deeper layers capture high-level features like object parts or entire objects.
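
As a minimal sketch of how a filter responds to a pattern, the NumPy code below slides a hand-written vertical-edge kernel over a toy image. The helper `conv2d_valid`, the toy image, and the kernel values are illustrative assumptions; in a real CNN the kernel weights are learned, and frameworks implement this operation (strictly a cross-correlation) far more efficiently.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]           # receptive field
            feature_map[i, j] = np.sum(patch * kernel)  # filter response
    return feature_map

# Toy 5x6 image: dark left half (0), bright right half (1)
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# Hand-written vertical-edge kernel (hypothetical values, not learned)
vertical_edge = np.array([[-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0]])

feature_map = conv2d_valid(image, vertical_edge)
print(feature_map)
```

The response is non-zero only in the columns where the window straddles the dark-to-bright boundary, which is the sense in which a filter "detects" an edge.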
2. Spatial Hierarchy of Features
By stacking multiple convolutional layers, CNNs build a hierarchical representation of the input image:
Low-level features: Detected by the initial layers (e.g., edges, textures).
Mid-level features: Captured by intermediate layers (e.g., shapes, contours).
High-level features: Learned by deeper layers (e.g., object parts, semantic patterns).
3. Translational Invariance
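Because the same filter is applied at every position, a feature is detected regardless of where it appears in the image; the response simply shifts with it in the feature map (strictly, this is translation equivariance, and pooling adds a degree of invariance). For example, if a cat's ear appears in a different location in two images, the same filter still responds to it.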
4. Reducing Complexity with Weight Sharing
The convolutional layer uses the same filter weights across the entire image (weight sharing). This:
Reduces the number of parameters compared to fully connected layers.
Makes the network more efficient and less prone to overfitting.
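
To make the saving from weight sharing concrete, the sketch below counts parameters for a convolutional layer and for a fully connected layer that produces the same number of output values. The layer sizes (32x32 RGB input, 16 filters of size 3x3) are hypothetical and chosen only for illustration.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights plus one bias per filter for a square-kernel conv layer."""
    return out_channels * (in_channels * kernel_size * kernel_size + 1)

def dense_params(n_inputs, n_outputs):
    """Weights plus one bias per output unit for a fully connected layer."""
    return n_outputs * (n_inputs + 1)

# Hypothetical sizes: 32x32 RGB input, 16 filters of size 3x3
conv_count = conv2d_params(in_channels=3, out_channels=16, kernel_size=3)

# Dense layer mapping the flattened image to 16 maps of 32x32 values
dense_count = dense_params(n_inputs=32 * 32 * 3, n_outputs=32 * 32 * 16)

print(f"conv layer parameters:  {conv_count:,}")
print(f"dense layer parameters: {dense_count:,}")
```

Under these assumptions the convolutional layer needs a few hundred parameters, while the dense layer needs tens of millions.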
5. Dimensionality Reduction
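When used with a stride greater than 1 or without padding, the convolution operation reduces the spatial dimensions of the feature map; with "same" padding and a stride of 1, the dimensions are preserved.
Reducing the dimensions helps focus on essential features and reduces computational overhead.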
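
The effect of stride and padding on the output size follows the standard formula floor((n + 2p - k) / s) + 1 for input size n, kernel size k, padding p, and stride s. The helper name `conv_output_size` below is an assumption made for this sketch.

```python
def conv_output_size(n, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * padding - kernel_size) // stride + 1

no_padding = conv_output_size(32, kernel_size=3)                       # shrinks slightly
same_padding = conv_output_size(32, kernel_size=3, padding=1)          # size preserved
strided = conv_output_size(32, kernel_size=3, stride=2, padding=1)     # roughly halved

print(no_padding, same_padding, strided)
```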
6. Enabling Non-Linearity
After applying the convolution operation, an activation function (like ReLU) is applied element-wise to introduce non-linearity, enabling the network to learn complex patterns.
Example in Image Classification
For an image classification task (e.g., recognizing whether an image contains a cat or a dog):

Initial Layers: Detect low-level features such as edges and corners of the cat or dog.
Intermediate Layers: Identify specific shapes like eyes, ears, or tails.
Deeper Layers: Recognize the arrangement of these features to classify the entire object as a cat or dog.
Key Benefits in Image Classification:
Efficient Feature Learning: Automatically learns relevant features without manual engineering.
Hierarchical Understanding: Builds a layered understanding of the image, from basic patterns to complex structures.
Improved Generalization: Captures universal patterns that generalize well to unseen data.
In summary, the convolutional layer is critical for feature extraction, dimensionality reduction, and enabling translational invariance, forming the foundation of accurate image classification models.

In [None]:
# Question 2
# Explain the use of pooling layer and dropout layer?

A pooling layer is used in Convolutional Neural Networks (CNNs) to reduce the spatial dimensions of feature maps (width and height) while retaining the most important information. This helps in reducing computational complexity, memory usage, and the risk of overfitting.

Types of Pooling:

Max Pooling:
Takes the maximum value from each patch of the feature map.
Captures the most prominent features.
Helps detect sharp features like edges or textures.

Average Pooling:
Takes the average of all values in each patch.
Captures smoother features and reduces noise.

Global Pooling:
Reduces the entire feature map to a single value (e.g., Global Max Pooling or Global Average Pooling).
Often used as a bridge between convolutional layers and fully connected layers.
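
The three pooling types can be sketched in NumPy as follows. The helper `pool2d` and the 4x4 feature map are illustrative assumptions; the helper only handles non-overlapping windows whose size divides the feature map evenly.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling; assumes height and width are divisible by `size`."""
    h, w = feature_map.shape
    # Split the map into (size x size) blocks, then reduce each block
    blocks = feature_map.reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 6],
               [2, 1, 7, 8]], dtype=float)

max_pooled = pool2d(fm, mode="max")   # strongest response per 2x2 patch
avg_pooled = pool2d(fm, mode="avg")   # mean response per 2x2 patch
global_avg = fm.mean()                # global average pooling: one value per map

print(max_pooled)
print(avg_pooled)
print(global_avg)
```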

Benefits:
Dimensionality Reduction: Makes the model computationally efficient.
Translation Invariance: Helps the network focus on the presence of features rather than their exact location.
Feature Extraction: Reduces the feature map size while retaining meaningful information.

Dropout Layer
The dropout layer is a regularization technique used to prevent overfitting in neural networks by randomly "dropping out" (setting to zero) a fraction of neurons during training.

How It Works:
At each training iteration, a specified percentage of neurons are temporarily deactivated.
These neurons do not contribute to the forward pass or backpropagation for that iteration; at inference time, dropout is switched off and all neurons are active.
This forces the network to learn redundant representations and reduces reliance on specific neurons.

Key Parameters:
Dropout Rate (p): The fraction of neurons to deactivate (e.g., 0.5 means 50% of neurons are dropped).

Benefits:
Reduces Overfitting: Prevents the model from becoming too reliant on specific neurons.
Encourages Robust Learning: Ensures the network learns distributed representations of data.
Improves Generalization: Leads to better performance on unseen data.
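
A minimal NumPy sketch of dropout is shown below. It assumes the common "inverted dropout" variant, in which the surviving activations are scaled by 1 / (1 - p) during training so that no rescaling is needed at inference; the function name and array sizes are hypothetical.

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return activations                 # inference: all neurons stay active
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob  # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
x = np.ones((1000, 100))

train_out = dropout(x, rate=0.5, training=True, rng=rng)
eval_out = dropout(x, rate=0.5, training=False, rng=rng)

print("fraction dropped in training:", np.mean(train_out == 0))
print("mean activation in training: ", train_out.mean())
print("unchanged at inference:      ", np.array_equal(eval_out, x))
```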

In Summary:
Pooling Layer: Primarily reduces spatial dimensions and retains important features, making the model efficient and invariant to minor translations.

Dropout Layer: Helps mitigate overfitting by randomly deactivating neurons during training, improving generalization and robustness.

In [None]:
# Question 3
# What are the applications of CNN?


Applications of Convolutional Neural Networks (CNNs)
CNNs have revolutionized numerous fields due to their ability to automatically extract features from data, particularly images, with minimal preprocessing. Here are some prominent applications:

1. Computer Vision
CNNs are widely used for visual data processing and analysis.

a. Image Classification
Identifying the class of an object in an image.
Example: Recognizing cats, dogs, cars, etc., in photos.

b. Object Detection
Locating and identifying multiple objects in an image.
Example: Detecting pedestrians, vehicles, and road signs in autonomous driving.

c. Image Segmentation
Dividing an image into meaningful regions (pixel-level classification).
Example: Medical imaging for tumor segmentation or scene understanding.

d. Face Recognition
Identifying or verifying a person's identity from facial features.
Example: Face ID systems on smartphones or security systems.

e. Optical Character Recognition (OCR)
Extracting text from images.
Example: Digitizing printed documents or reading vehicle license plates.

2. Healthcare and Medical Imaging
CNNs have become essential tools for diagnosing diseases and analyzing medical images.

Disease Detection: Identifying abnormalities in X-rays, MRIs, and CT scans (e.g., cancer, pneumonia).
Histopathology: Analyzing tissue samples for diseases.
Retinal Analysis: Diagnosing diabetic retinopathy or glaucoma.
Endoscopy and Ultrasound: Enhancing image clarity and detecting issues.

3. Autonomous Vehicles
CNNs play a critical role in enabling self-driving cars by:

Lane Detection: Identifying road lanes and markings.
Obstacle Detection: Recognizing pedestrians, other vehicles, and road hazards.
Traffic Sign Recognition: Reading and understanding traffic signs.

4. Natural Language Processing (NLP)
Although RNNs and Transformers are common in NLP, CNNs are used for:

Text Classification: Sentiment analysis, spam detection, or topic categorization.
Document Summarization: Extracting key points from text.
Language Translation: Extracting visual text features for image-based translation (e.g., translating text on signs or scanned documents).

5. Robotics
Scene Understanding: Enabling robots to perceive and navigate environments.
Grasp Detection: Assisting robotic arms in picking and manipulating objects.

6. Retail and E-commerce
Product Recommendation: Using visual search to find similar products.
Visual Inspection: Detecting defects in manufacturing.
Inventory Management: Monitoring stock using image recognition.

7. Agriculture
Crop Monitoring: Analyzing aerial or satellite images to detect crop health and growth patterns.
Weed Detection: Identifying and differentiating weeds from crops for precision farming.
Livestock Monitoring: Detecting diseases or behaviors in animals.

In [None]:
# Question 4
# What are the disadvantages of traditional RNN?

Disadvantages of Traditional RNNs (Recurrent Neural Networks)
While traditional RNNs are powerful for processing sequential data, they come with several limitations that make them less effective for many real-world applications. Below are the key disadvantages:

1. Vanishing Gradient Problem
In RNNs, gradients are propagated back through time during training.
Over long sequences, gradients can shrink exponentially, leading to vanishing gradients, where the network fails to update its weights effectively with respect to earlier time steps.
This limits the RNN’s ability to learn long-term dependencies in the data.

2. Exploding Gradient Problem
Conversely, gradients can also grow exponentially during backpropagation, leading to unstable training and large weight updates.
While gradient clipping can mitigate this issue, it doesn't eliminate the fundamental limitation.
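
The exponential shrinking and growth described in points 1 and 2 can be illustrated with a simplified linear RNN, where backpropagation through time multiplies the gradient by the transposed recurrent weight matrix at every step. The setup below (a scaled orthogonal matrix, activation derivatives ignored) is an assumption chosen to make the effect easy to see, not a model of any particular network.

```python
import numpy as np

def backprop_norms(recurrent_weights, steps):
    """Gradient norm after each backward step in a linear RNN h_t = W h_{t-1}."""
    grad = np.ones(recurrent_weights.shape[0])
    norms = []
    for _ in range(steps):
        grad = recurrent_weights.T @ grad   # one step back in time
        norms.append(np.linalg.norm(grad))
    return norms

rng = np.random.default_rng(0)
# Random orthogonal matrix: multiplying by it preserves vector norms
orthogonal, _ = np.linalg.qr(rng.normal(size=(4, 4)))

shrinking = backprop_norms(0.9 * orthogonal, steps=50)  # scale below 1: vanishing
growing = backprop_norms(1.1 * orthogonal, steps=50)    # scale above 1: exploding

print(f"scale 0.9, norm after 50 steps: {shrinking[-1]:.4f}")
print(f"scale 1.1, norm after 50 steps: {growing[-1]:.1f}")
```

Because the orthogonal part preserves norms, each step multiplies the gradient norm by exactly the scale factor, so after T steps it has changed by a factor of scale**T.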

3. Difficulty in Learning Long-Term Dependencies
Traditional RNNs struggle to retain information over long sequences due to their architecture.
As a result, they are ineffective for tasks that require remembering context from far back in time, such as long-form text processing or time series analysis.

4. Lack of Parallelization
RNNs process input sequentially, meaning the computation for the next time step depends on the previous one.
This makes training and inference slower compared to architectures like CNNs or Transformers, which allow for parallel processing.
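
The sequential dependency is visible in a plain NumPy forward pass of a vanilla RNN: the loop cannot be parallelized across time steps because each hidden state needs the previous one. The dimensions and random weights below are hypothetical.

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, b):
    """Vanilla RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    h = np.zeros(W_h.shape[0])
    hidden_states = []
    for x_t in inputs:  # must run in order: step t needs h from step t-1
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        hidden_states.append(h)
    return np.stack(hidden_states)

rng = np.random.default_rng(0)
seq_len, input_dim, hidden_dim = 6, 3, 4
inputs = rng.normal(size=(seq_len, input_dim))
W_x = 0.5 * rng.normal(size=(hidden_dim, input_dim))
W_h = 0.5 * rng.normal(size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

states = rnn_forward(inputs, W_x, W_h, b)
print(states.shape)
```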

5. Inefficiency with Long Sequences
Processing long sequences is computationally expensive due to sequential dependencies.
Memory usage increases significantly, as RNNs need to store intermediate states for each time step.

6. Limited Flexibility in Handling Variable-Length Sequences
Although RNNs can process variable-length input, batching sequences of very different lengths requires padding or truncation, and performance often deteriorates when the sequence length varies significantly across samples.

7. Susceptibility to Overfitting
Although weights are shared across time steps, RNNs with large hidden states can still have many parameters, making them prone to overfitting, especially when the training data is limited.

8. Difficulty in Capturing Hierarchical Structures
Traditional RNNs cannot effectively model hierarchical or multiscale dependencies in data, such as paragraph structures in text or multi-level patterns in time series.

9. Poor Memory Retention
The "memory" of a traditional RNN is limited because the hidden state is overwritten at every time step and there is no explicit mechanism to control what is retained or forgotten.

10. Challenges in Gradient Descent Optimization
The recurrent connections and deep time-step unfolding make optimization harder, requiring advanced techniques like gradient clipping or specialized optimizers.

Solutions to Overcome RNN Limitations
LSTMs (Long Short-Term Memory Networks):
Introduce memory cells and gates to retain information over long sequences and handle vanishing gradients.
GRUs (Gated Recurrent Units):
Simplify LSTMs while still addressing long-term dependencies effectively.
Transformers:
Leverage self-attention mechanisms for parallel processing, overcoming the sequential nature of RNNs and enabling efficient learning of long-range dependencies.

In summary, traditional RNNs are limited by issues like vanishing gradients, inefficient handling of long-term dependencies, and slow training. These challenges have largely been addressed by more advanced architectures like LSTMs, GRUs, and Transformers, which are now preferred for most sequence modeling tasks.

In [None]:
# Question 5
# Why is LSTM preferred for time series modelling?




LSTM (Long Short-Term Memory) networks are preferred for time series modeling because of their unique ability to handle sequential data and capture temporal dependencies effectively. Here's why they are advantageous:

1. Handling Long-Term Dependencies
Time series data often exhibit long-term dependencies, where future values depend not only on recent observations but also on events that occurred further back in time.
LSTMs use memory cells and gates (input, output, and forget gates) to selectively remember or forget information over long sequences, making them well-suited for capturing these dependencies.

2. Mitigating the Vanishing Gradient Problem
Traditional RNNs (Recurrent Neural Networks) face challenges like the vanishing gradient problem, where gradients diminish over long sequences, preventing the network from learning effectively.
LSTMs mitigate this with their internal architecture: the cell state is updated additively, which lets gradients flow across many more time steps before fading.

3. Adaptability to Irregular Patterns
Time series data can have varying trends, seasonality, or noise. LSTMs can learn these complex patterns due to their recurrent nature and flexibility in updating or retaining states.

4. Multivariate Time Series Capability
LSTMs can handle multivariate time series, where multiple variables interact and influence one another. This is crucial for tasks involving multiple interdependent time series data streams.

5. Robustness to Missing Data
Time series datasets often have missing or irregular data points. With suitable preprocessing (such as imputation or masking), LSTMs can tolerate such gaps by leveraging temporal correlations.

6. Prediction of Sequential Outputs
LSTMs can predict not just single values but also sequences of values, making them suitable for forecasting tasks like stock prices, weather, or energy consumption.

7. Support for Variable-Length Input
Unlike traditional models that may require fixed input sizes, LSTMs can process variable-length sequences, making them versatile for datasets whose series differ in length.

Practical Applications:
Financial Forecasting: Stock prices, revenue, or sales predictions.
Energy Demand Modeling: Predicting power consumption trends.
Weather Forecasting: Estimating future climatic conditions.
Anomaly Detection: Identifying unusual patterns in time series data, such as fraud detection.

In summary, LSTMs are preferred for time series modeling because of their ability to model complex, long-term dependencies while being resilient to challenges like vanishing gradients and missing data.

In [None]:
# Question 6
# Describe the role of forget gate and cell state in LSTM

Role of Forget Gate and Cell State in LSTM
Long Short-Term Memory (LSTM) networks are a type of Recurrent Neural Network (RNN) designed to effectively learn long-term dependencies. The forget gate and cell state are central components of an LSTM, enabling it to control the flow of information and address the limitations of traditional RNNs, such as the vanishing gradient problem.

1. Forget Gate
Purpose:
The forget gate determines which information from the previous cell state should be retained or discarded. This is essential for managing the long-term memory of the LSTM.

Mechanism:
The forget gate is implemented as a sigmoid activation function ($\sigma$), which outputs values between 0 and 1 for each value in the cell state.
1: Completely retain the information.
0: Completely forget the information.
The forget gate uses the current input ($x_t$) and the previous hidden state ($h_{t-1}$) to compute its decision:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

$f_t$: Forget gate's output (values between 0 and 1).
$W_f, b_f$: Learnable weights and bias.
$h_{t-1}$: Previous hidden state.
$x_t$: Current input.
Role:
Filters out unnecessary information in the cell state.
Ensures that only relevant information is propagated through time.
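
The forget-gate equation can be sketched in NumPy as below. The dimensions, random weights, and example cell state are hypothetical; the point is only that each gate value lies strictly between 0 and 1 and scales the corresponding entry of the previous cell state.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(h_prev, x_t, W_f, b_f):
    """f_t = sigmoid(W_f . [h_{t-1}, x_t] + b_f)"""
    concat = np.concatenate([h_prev, x_t])  # [h_{t-1}, x_t]
    return sigmoid(W_f @ concat + b_f)

hidden_dim, input_dim = 3, 2
rng = np.random.default_rng(0)
W_f = rng.normal(size=(hidden_dim, hidden_dim + input_dim))
b_f = np.zeros(hidden_dim)
h_prev = rng.normal(size=hidden_dim)
x_t = rng.normal(size=input_dim)
C_prev = np.array([2.0, -1.0, 0.5])  # example previous cell state

f_t = forget_gate(h_prev, x_t, W_f, b_f)
retained = f_t * C_prev  # element-wise: how much of each memory entry survives

print("f_t:     ", f_t)
print("retained:", retained)
```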
2. Cell State
Purpose:
The cell state acts as the long-term memory of the LSTM. It carries information across many time steps, allowing the network to maintain context over long sequences.

Mechanism:
The cell state is updated in three steps:
Forget Unnecessary Information:
The forget gate determines which parts of the previous cell state ($C_{t-1}$) should be discarded, by scaling it element-wise:
$$f_t \odot C_{t-1}$$

Add New Information:
The input gate decides which new information to add to the cell state.
A candidate update ($\tilde{C}_t$) is computed and scaled by the input gate's output ($i_t$), then added to the retained memory:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

Pass Updated State:
The resulting $C_t$ is passed to the next time step, enabling the LSTM to retain and update long-term memory dynamically.
Role:
Maintains a consistent flow of information across time steps.
Enables the LSTM to selectively store or discard information, adapting to the needs of the task.
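
A single LSTM time step, including the cell-state update, can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the weight dictionary layout, the dimensions, and the extreme bias values are hypothetical. The biases are chosen so that the forget gate is close to 1 and the input gate close to 0, which makes the cell state carry its contents across steps almost unchanged.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM time step. W and b are dicts keyed by 'f', 'i', 'c', 'o'."""
    concat = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ concat + b["f"])      # forget gate
    i_t = sigmoid(W["i"] @ concat + b["i"])      # input gate
    C_tilde = np.tanh(W["c"] @ concat + b["c"])  # candidate update
    C_t = f_t * C_prev + i_t * C_tilde           # cell-state update
    o_t = sigmoid(W["o"] @ concat + b["o"])      # output gate
    h_t = o_t * np.tanh(C_t)                     # new hidden state
    return h_t, C_t

rng = np.random.default_rng(0)
hidden_dim, input_dim = 3, 2
W = {k: 0.1 * rng.normal(size=(hidden_dim, hidden_dim + input_dim)) for k in "fico"}
b = {k: np.zeros(hidden_dim) for k in "fico"}
b["f"] += 20.0  # forget gate saturates near 1: keep everything
b["i"] -= 20.0  # input gate saturates near 0: write nothing

h = np.zeros(hidden_dim)
C = np.array([1.0, -2.0, 0.5])
C_start = C.copy()
for x_t in rng.normal(size=(5, input_dim)):
    h, C = lstm_step(x_t, h, C, W, b)

print("initial cell state:", C_start)
print("after 5 steps:     ", C)
```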
Interaction Between Forget Gate and Cell State:
The forget gate acts as a filter, deciding what parts of the previous memory ($C_{t-1}$) are no longer relevant.
The cell state updates itself by combining the retained memory (from the forget gate) with new information (from the input gate).
This interaction allows LSTMs to manage both short-term and long-term dependencies effectively.
Why They Are Important:
Forget Gate ensures that irrelevant or outdated information does not clutter the memory.
Cell State provides a mechanism to store and propagate important information across long sequences.
Together, they enable LSTMs to address the vanishing gradient problem and learn dependencies over varying time scales, making them highly effective for tasks like time series forecasting, text generation, and speech recognition.

In [None]:
# Question 7
# Build time series forecasting model using LSTM for AbbVie Inc ( Symbol: ABBV )
# Data file: 'prices_split_adjusted.csv'



In [10]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.optimizers import Adam

# Load and preprocess the data
def load_and_preprocess_data(file_path='content/sample_data/prices_split_adjusted.csv',
                             symbol='ABBV'):
    # Read the CSV file
    df = pd.read_csv(file_path)

    # Display general information about the dataset and its first few rows
    df.info()
    print(df.head())

    # Filter for ABBV stock
    df_symbol = df[df['symbol'] == symbol].copy()

    # Convert date column to datetime
    df_symbol['date'] = pd.to_datetime(df_symbol['date'], format='%Y-%m-%d')

    # Sort by date
    df_symbol = df_symbol.sort_values('date')

    # Reset index
    df_symbol.reset_index(drop=True, inplace=True)

    return df_symbol

# Create sequences for LSTM
def create_sequences(data, seq_length):
    X, y = [], []
    for i in range(len(data) - seq_length):
        X.append(data[i:(i + seq_length)])
        y.append(data[i + seq_length])
    return np.array(X), np.array(y)

# Build LSTM model
def build_lstm_model(input_shape):
    model = Sequential([
        LSTM(50, activation='relu', input_shape=input_shape, return_sequences=True),
        Dropout(0.2),
        LSTM(50, activation='relu'),
        Dropout(0.2),
        Dense(25, activation='relu'),
        Dense(1)
    ])

    model.compile(optimizer=Adam(learning_rate=0.001), loss='mean_squared_error')
    return model

# Main forecasting function
def forecast_stock_price(file_path, symbol='ABBV', seq_length=10, test_size=0.2):
    # Load data
    df = load_and_preprocess_data(file_path, symbol)

    # Select close prices for forecasting
    close_prices = df['close'].values.reshape(-1, 1)

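    # Chronological split point, counted in sequences
    split = int((len(close_prices) - seq_length) * (1 - test_size))

    # Normalize the data; fit the scaler on the training period only so that
    # test-set prices do not leak into the scaling
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaler.fit(close_prices[:split + seq_length])
    scaled_prices = scaler.transform(close_prices)

    # Create sequences
    X, y = create_sequences(scaled_prices, seq_length)

    # Split into train and test sets (no shuffling, to preserve time order)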
    X_train, X_test = X[:split], X[split:]
    y_train, y_test = y[:split], y[split:]

    # Build and train the model
    model = build_lstm_model((X_train.shape[1], 1))
    history = model.fit(
        X_train, y_train,
        epochs=50,
        batch_size=32,
        validation_split=0.2,
        verbose=0
    )

    # Make predictions
    train_predict = model.predict(X_train)
    test_predict = model.predict(X_test)

    # Inverse transform predictions
    train_predict = scaler.inverse_transform(train_predict)
    y_train_inv = scaler.inverse_transform(y_train)
    test_predict = scaler.inverse_transform(test_predict)
    y_test_inv = scaler.inverse_transform(y_test)

    # Calculate RMSE
    train_rmse = np.sqrt(np.mean((train_predict - y_train_inv)**2))
    test_rmse = np.sqrt(np.mean((test_predict - y_test_inv)**2))

    # Plotting
    plt.figure(figsize=(12,6))
    plt.plot(history.history['loss'], label='Training Loss')
    plt.plot(history.history['val_loss'], label='Validation Loss')
    plt.title('Model Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()
    plt.show()

    # Plot actual vs predicted
    plt.figure(figsize=(12,6))
    plt.subplot(1,2,1)
    plt.plot(y_train_inv, label='Actual Train')
    plt.plot(train_predict, label='Predicted Train', alpha=0.7)
    plt.title('Train Predictions')
    plt.legend()

    plt.subplot(1,2,2)
    plt.plot(y_test_inv, label='Actual Test')
    plt.plot(test_predict, label='Predicted Test', alpha=0.7)
    plt.title('Test Predictions')
    plt.legend()
    plt.tight_layout()
    plt.show()

    print(f"Train RMSE: {train_rmse}")
    print(f"Test RMSE: {test_rmse}")

    return model, scaler

# Run the forecasting
model, scaler = forecast_stock_price('prices_split_adjusted_1.csv')
