<a href="https://colab.research.google.com/github/wekann/Assignment/blob/main/CNN_Architecture.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Theory

In [None]:
'''1. What is a Convolutional Neural Network (CNN), and why is it used for image processing?
A Convolutional Neural Network (CNN) is a type of deep learning neural network designed specifically for processing data with a grid-like topology, such as images. CNNs are inspired by the structure of the human visual cortex and are especially powerful in recognizing spatial hierarchies and patterns in images.

Key Components of CNN:

1. Convolutional Layers:
   Apply filters (kernels) to extract features like edges, textures, and shapes from images.

2. ReLU (Activation Function):
   Introduces non-linearity to help the network learn complex patterns.

3. Pooling Layers:
   Reduce the spatial dimensions (e.g., width and height) of the data, helping to reduce computational load and prevent overfitting.

4. Fully Connected Layers (Dense Layers):
   Make predictions based on features extracted by convolutional and pooling layers.

Why CNNs Are Used for Image Processing:

1. Automatic Feature Extraction
   CNNs automatically learn which features (like corners, lines, textures) are important, eliminating the need for manual feature engineering.

2. Parameter Sharing
   Filters are shared across the image, which greatly reduces the number of parameters compared to traditional neural networks.

3. Translation Invariance
   CNNs are robust to slight shifts and distortions in image content—critical for real-world image recognition tasks.

4. Hierarchical Learning
   Early layers detect simple patterns (edges), while deeper layers recognize complex patterns (faces, objects).

Common Applications:
* Object and face recognition
* Image classification
* Medical imaging (e.g., tumor detection)
* Self-driving cars (visual input)
* Surveillance and security

CNNs are well-suited for image processing because they efficiently capture spatial dependencies, require fewer parameters than fully connected networks, and can generalize well to new image data.


In [None]:
'''2. What are the key components of CNN architecture?
A Convolutional Neural Network (CNN) architecture is composed of several layers, each serving a specific purpose in learning and extracting features from images.

Key Components of CNN Architecture:
1. Input Layer
* Receives the raw image data (e.g., a 28×28 grayscale image or a 224×224×3 RGB image).
* It reshapes and passes the image data into the network.

2. Convolutional Layer
* Purpose: Automatically extract features (like edges, corners, textures).
* Applies filters (kernels) that slide over the input image to create feature maps.
* Each filter detects a different pattern.
Example: A 3×3 filter scanning a 28×28 image produces a 26×26 feature map.

3. Activation Function (ReLU)
* Purpose: Adds non-linearity so the network can learn complex patterns.
* The most common activation is ReLU (Rectified Linear Unit):
  `ReLU(x) = max(0, x)`

4. Pooling Layer (Subsampling or Downsampling)
* Purpose: Reduces spatial dimensions (height and width), making computation faster and reducing overfitting.
* Common types:
  * Max Pooling: Takes the maximum value in a patch.
  * Average Pooling: Takes the average value.
  > Example: A 2×2 max pooling reduces a 26×26 feature map to 13×13.

5. Flatten Layer
* Converts the 2D feature maps into a 1D vector.
* This is necessary before connecting to the fully connected (dense) layer.

6. Fully Connected Layer (Dense Layer)
* Purpose: Performs high-level reasoning and classification.
* Connects every neuron from the previous layer to every neuron in this layer.
* Usually the last few layers of the CNN.

7. Output Layer
* Produces the final prediction (e.g., class probabilities).
* Softmax activation is used for multi-class classification.
* Sigmoid is used for binary classification.

Optional:
* Dropout Layers: Randomly turn off neurons during training to prevent overfitting.
* Batch Normalization: Normalizes the inputs to layers to speed up training and improve stability.

Table:
| Component               | Function                                 |
| ----------------------- | ---------------------------------------- |
| Input Layer             | Accepts image data                       |
| Convolutional Layer     | Extracts features with filters           |
| Activation (ReLU)       | Adds non-linearity                       |
| Pooling Layer           | Reduces size and computation             |
| Flatten Layer           | Prepares data for dense layers           |
| Fully Connected Layer   | Learns patterns and makes predictions    |
| Output Layer            | Produces final classification results    |
| Dropout/BatchNorm (opt) | Prevents overfitting, speeds up training |
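
The sizes quoted in the examples above (28×28 → 26×26 → 13×13) can be checked with a short sketch. It assumes a stride-1 convolution without padding and non-overlapping pooling; the helper names and the filter count of 8 are hypothetical, chosen only for illustration.

```python
# Hypothetical helpers; they assume a stride-1 convolution with no padding
# and non-overlapping pooling, matching the examples above.
def conv_output_size(input_size, kernel_size):
    # A k×k filter fits in (n - k + 1) positions along each axis.
    return input_size - kernel_size + 1

def pool_output_size(input_size, pool_size):
    # Non-overlapping pooling divides each axis by the pool size.
    return input_size // pool_size

after_conv = conv_output_size(28, 3)          # 26
after_pool = pool_output_size(after_conv, 2)  # 13
num_filters = 8                               # illustrative value only
flattened = after_pool * after_pool * num_filters
print(after_conv, after_pool, flattened)      # 26 13 1352
```

The last number is the length of the 1D vector the Flatten layer would hand to the first dense layer under these assumptions.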

In [None]:
'''3. What is the role of the convolutional layer in CNNs?

The Convolutional Layer is the core building block of a Convolutional Neural Network (CNN). Its main role is to extract meaningful features from the input image by applying a set of learnable filters (also called kernels).

Primary Role: Feature Extraction
* The convolutional layer applies filters that scan (or convolve over) the input image.
* Each filter detects a specific feature such as edges, lines, textures, or shapes.
* These filters slide across the image, performing a mathematical operation called convolution, generating a feature map (also known as activation map).

How It Works:
1. Input: A 2D image (e.g., 28×28 pixels).
2. Filter/Kernels: A small matrix (e.g., 3×3 or 5×5) that moves over the image.
3. Convolution Operation:
   * Multiply the filter with the image region it covers.
   * Sum the result to get a single value in the output feature map.
4. Feature Map: The output highlights where the filter’s pattern (like an edge) appears in the image.

Example:
Let’s say we use a 3×3 edge detection filter:
Image Patch:     Filter:         Output (1 value):
1 1 1            -1 -1 -1
0 1 0             0  0  0        (-3) + 0 + 2 = -1
1 0 1             1  1  1

The filter emphasizes horizontal edges and outputs a number that reflects how strong that pattern is at that location.
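
As a minimal sketch, the single output value for this patch can be computed by multiplying matching entries and summing them. The variable names are illustrative only.

```python
image_patch = [[1, 1, 1],
               [0, 1, 0],
               [1, 0, 1]]
edge_filter = [[-1, -1, -1],
               [ 0,  0,  0],
               [ 1,  1,  1]]

# Multiply element-wise, then sum everything into a single value.
response = sum(
    image_patch[r][c] * edge_filter[r][c]
    for r in range(3)
    for c in range(3)
)
print(response)  # -1
```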

Key Benefits:
* Local Connectivity: Each neuron is connected only to a small region of the input, reducing the number of parameters.
* Translation Invariance: The same filter is applied across the entire image, so the network can recognize patterns regardless of their location.
* Hierarchical Learning:
  * Early layers detect low-level features (edges, colors),
  * Deeper layers detect high-level features (faces, objects).

In Summary:
| Role                   | Description                                 |
| ---------------------- | ------------------------------------------- |
| Feature Extraction     | Detects edges, shapes, textures, etc.       |
| Parameter Efficiency   | Reduces model complexity via shared filters |
| Spatial Understanding  | Preserves spatial relationship of pixels    |
| Input to Deeper Layers | Passes learned features to next layers      |

The convolutional layer is crucial for enabling CNNs to understand images in a structured, efficient, and scalable way.


In [None]:
'''4. What is a filter (kernel) in CNNs?
In Convolutional Neural Networks (CNNs), a filter (also called a kernel) is a small matrix used to scan over the input image to extract specific features such as edges, corners, or textures.

Definition:
A filter is a set of learnable weights, typically a small matrix (e.g., 3×3, 5×5), that is applied to the input data via the convolution operation. The result of this operation is a feature map that highlights the presence of a particular pattern in different regions of the image.

How It Works:
1. The filter slides across the image (from left to right, top to bottom).
2. At each position, it performs an element-wise multiplication between the filter and the part of the image it overlaps (called a receptive field), then sums the result.
3. This sum becomes one pixel in the output feature map.

Example:
Suppose you apply a 3×3 filter to a grayscale image:
Image Patch:     Filter (Edge Detector):
1 2 1            -1 -1 -1
0 1 0             0  0  0
1 0 1             1  1  1

Multiply corresponding values and sum them to get the result at one pixel in the feature map.
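
The sliding procedure described above can be sketched in plain Python. This assumes stride 1 and no padding, and, like most deep learning libraries, it does not flip the kernel (strictly a cross-correlation). The function name and the 4×4 test image are hypothetical.

```python
def convolve2d_valid(image, kernel):
    # Slide the kernel left to right, top to bottom (stride 1, no padding).
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise multiply the kernel with the region it covers, then sum.
            total = sum(
                image[i + r][j + c] * kernel[r][c]
                for r in range(k)
                for c in range(k)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map

edge_filter = [[-1, -1, -1],
               [ 0,  0,  0],
               [ 1,  1,  1]]

# The 3×3 patch from the example yields a single output pixel.
patch = [[1, 2, 1],
         [0, 1, 0],
         [1, 0, 1]]
print(convolve2d_valid(patch, edge_filter))  # [[-2]]

# Hypothetical 4×4 image: dark top half, bright bottom half.
image = [[0, 0, 0, 0],
         [0, 0, 0, 0],
         [1, 1, 1, 1],
         [1, 1, 1, 1]]
feature_map = convolve2d_valid(image, edge_filter)
print(feature_map)  # [[3, 3], [3, 3]]
```

The strong positive responses in the second case reflect the dark-to-bright horizontal edge that this filter is tuned to.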

Purpose of Filters:
Each filter is trained to detect a specific feature:
* Edge detection
* Texture patterns
* Color gradients
* Shapes or corners

As you go deeper in the CNN:
* Early layers detect **simple features** (e.g., vertical lines).
* Later layers detect **complex patterns** (e.g., eyes, faces, objects).

Multiple Filters:
A CNN uses many filters in each convolutional layer:
* Each filter produces one feature map.
* Stacking multiple feature maps allows the CNN to learn multiple aspects of the image.

Summary:
| Term          | Description                                             |
| ------------- | ------------------------------------------------------- |
| Filter/Kernel | Small matrix of weights used for feature detection      |
| Size          | Commonly 3×3, 5×5, etc.                                 |
| Operation     | Slides over input to produce feature maps               |
| Purpose       | Detect patterns like edges, corners, textures           |
| Learnable     | Yes — the values of filters are learned during training |

Filters are the eyes of a CNN — they allow the network to see and understand the structure of the image.


In [None]:
'''5. What is pooling in CNNs, and why is it important?

Pooling in Convolutional Neural Networks (CNNs) is a technique used to **reduce the spatial dimensions (height and width)** of feature maps while retaining the most important information.

What Is Pooling?
Pooling is applied after a convolutional layer and works by:
* Dividing the input into small regions (e.g., 2×2 or 3×3).
* Applying an operation to each region to extract a summary statistic.

Types of Pooling:
1. Max Pooling (most common)
   * Takes the maximum value from each patch.
   * Example: From [2, 4, 1, 3], max pooling returns 4.

2. Average Pooling
   * Takes the average value from each patch.
   * Example: From [2, 4, 1, 3], average pooling returns 2.5.

Why Is Pooling Important?
| Benefit                            | Explanation                                                                  |
| ---------------------------------- | ---------------------------------------------------------------------------- |
| Reduces Dimensionality             | Shrinks the size of feature maps, reducing computation and memory usage.     |
| Retains Key Features               | Keeps the most important information (e.g., strongest features in a region). |
| Improves Efficiency                | Smaller feature maps → less downstream computation, faster training and inference. |
| Adds Translation Invariance        | Makes the model more robust to small movements/shifts in the image.          |
| Prevents Overfitting               | Acts as a form of regularization by simplifying representations.             |

Example: Max Pooling 2×2
Suppose you apply 2×2 max pooling to a 4×4 feature map:

Original Feature Map:      After Max Pooling (2×2):
[1, 3, 2, 4]                [6, 4]
[5, 6, 1, 2]      →         [8, 7]
[4, 8, 7, 1]
[3, 5, 6, 0]
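
A minimal sketch of 2×2 max pooling on the same 4×4 feature map, assuming non-overlapping windows and dimensions divisible by the window size. The function name is hypothetical.

```python
def max_pool2d(feature_map, size=2):
    # Non-overlapping pooling: the stride equals the window size.
    # Assumes height and width are divisible by `size`.
    pooled = []
    for i in range(0, len(feature_map), size):
        row = []
        for j in range(0, len(feature_map[0]), size):
            window = [
                feature_map[i + r][j + c]
                for r in range(size)
                for c in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled

feature_map = [[1, 3, 2, 4],
               [5, 6, 1, 2],
               [4, 8, 7, 1],
               [3, 5, 6, 0]]
pooled = max_pool2d(feature_map)
print(pooled)  # [[6, 4], [8, 7]]
```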

In Summary:
| Term            | Description                                                           |
| --------------- | --------------------------------------------------------------------- |
| Pooling         | Downsampling technique to reduce spatial dimensions                   |
| Max Pooling     | Keeps the most prominent feature                                      |
| Average Pooling | Averages feature values in each region                                |
| Importance      | Reduces size, computation, overfitting, and boosts feature robustness |

Pooling is a crucial step in CNNs that helps models focus on the most relevant information, while being more efficient and generalizable.


In [None]:
'''6. What are the common types of pooling used in CNNs?
Pooling is a downsampling technique used in CNNs to reduce the spatial size of feature maps and retain important features. The most common types of pooling are:
1. Max Pooling (Most Common)

* How it works: Selects the maximum value from each patch of the feature map.
* Purpose: Captures the most prominent feature in a region.

Example (2×2 Max Pooling):
Patch:
[2, 4]
[1, 3]
→ Output: 4

Advantages:
* Helps preserve sharp features.
* Reduces computation and overfitting.

2. Average Pooling
* How it works: Calculates the average value of each patch.
* Purpose: Gives a more generalized or smoothed representation.
Example (2×2 Average Pooling):
Patch:
[2, 4]
[1, 3]
→ Output: (2+4+1+3)/4 = 2.5

Used when: You want a more generalized representation of the features.

3. Global Max Pooling
* How it works: Takes the maximum value from the entire feature map.
* Used in: Replacing fully connected layers at the end of CNNs for classification.

4. Global Average Pooling
* How it works: Averages all values in the entire feature map.
* Used in: Modern architectures (e.g., GoogLeNet, ResNet) to reduce overfitting and replace dense layers.
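
The two global variants collapse an entire feature map to one number. A small sketch using the same 2×2 patch as the earlier examples (function names are illustrative):

```python
def global_max_pool(feature_map):
    # Largest value anywhere in the map.
    return max(value for row in feature_map for value in row)

def global_average_pool(feature_map):
    # Mean of every value in the map.
    values = [value for row in feature_map for value in row]
    return sum(values) / len(values)

feature_map = [[2, 4],
               [1, 3]]
print(global_max_pool(feature_map))      # 4
print(global_average_pool(feature_map))  # 2.5
```

In a real network this would be applied once per channel, producing a vector with one entry per feature map.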

Comparison Table:
| Pooling Type           | Operation           | Output              | Common Use                           |
| ---------------------- | ------------------- | ------------------- | ------------------------------------ |
| Max Pooling            | Max value in patch  | Sharp, key features | Most CNNs for feature detection      |
| Average Pooling        | Avg. value in patch | Smoothed values     | Simpler or noise-tolerant tasks      |
| Global Max Pooling     | Max of entire map   | Single max value    | Model simplification, classification |
| Global Average Pooling | Avg. of entire map  | Single avg value    | Reduces overfitting, final layers    |

Summary:
Pooling layers help CNNs become faster, more robust, and less prone to overfitting by reducing the size of feature maps while keeping key information. Among these, Max Pooling is the most widely used due to its ability to preserve prominent features effectively.


In [None]:
'''Q7. How does the backpropagation algorithm work in CNNs?

Backpropagation in Convolutional Neural Networks (CNNs) is the process used to train the network by adjusting the weights (including those of filters/kernels) to minimize the prediction error. It's an extension of the same algorithm used in standard neural networks, but adapted to the convolutional and pooling operations.

Main Goal of Backpropagation:
To compute the gradient of the loss function with respect to each weight in the network and use that gradient to update the weights using optimization techniques like Stochastic Gradient Descent (SGD).

Backpropagation Steps in CNNs:

1. Forward Pass
* The input image is passed through convolutional layers, activation functions (e.g., ReLU), pooling layers, and fully connected layers.
* The final output is compared to the true label using a loss function (e.g., cross-entropy).

2. Compute Loss
* The loss function calculates how far the predicted output is from the actual label.
* Example:
  `Loss = CrossEntropy(predicted, actual)`

3. Backward Pass (Gradient Computation)
This is done layer by layer in reverse:
a. Output Layer to Last Fully Connected Layer
* Calculate the gradient of the loss w.r.t. the weights and activations.
* Apply chain rule of calculus to propagate the error backward.

b. Fully Connected Layers
* Compute gradients w.r.t. weights, biases, and input activations.

c. Pooling Layers
* Max Pooling: Only the max value in the window contributes to the gradient.
* Average Pooling: The gradient is equally distributed across all elements in the window.

d. Convolutional Layers
* Compute the gradient of the loss w.r.t.:
  * Filter weights (kernels): By convolving the input with the backpropagated error.
  * Input feature maps: So errors can be passed further back to earlier layers.

4. Update Weights
Use an optimizer (e.g., SGD, Adam) to update all learnable parameters:
  `weight = weight - learning_rate × gradient`

Special Considerations in CNNs:
| Component         | Gradient Flow                                |
| ----------------- | -------------------------------------------- |
| Convolution Layer | Gradient w.r.t. filter weights & input       |
| Pooling Layer     | Special handling (e.g., mask in max pooling) |
| ReLU Activation   | Passes gradients only if input > 0           |
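
The gradient rules in this table, together with the weight update, can be sketched on flat lists. This is an illustration under simplifying assumptions (one pooling window, scalar weights), not a full backpropagation implementation; all function names are hypothetical.

```python
def relu_backward(inputs, upstream):
    # ReLU passes the upstream gradient only where the input was positive.
    return [g if x > 0 else 0.0 for x, g in zip(inputs, upstream)]

def max_pool_backward(window, upstream):
    # Only the position that held the max receives the gradient.
    # Ties go to the first max here (an arbitrary choice for this sketch).
    winner = window.index(max(window))
    return [upstream if i == winner else 0.0 for i in range(len(window))]

def avg_pool_backward(window, upstream):
    # The gradient is shared equally by every element in the window.
    return [upstream / len(window)] * len(window)

def sgd_step(weight, gradient, learning_rate):
    # weight = weight - learning_rate × gradient
    return weight - learning_rate * gradient

print(relu_backward([-1.0, 2.0, 0.5], [0.3, 0.3, 0.3]))  # [0.0, 0.3, 0.3]
print(max_pool_backward([2, 4, 1, 3], 1.0))              # [0.0, 1.0, 0.0, 0.0]
print(avg_pool_backward([2, 4, 1, 3], 1.0))              # [0.25, 0.25, 0.25, 0.25]
print(sgd_step(1.0, 0.5, 0.5))                           # 0.75
```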

In Summary:
| Step          | Description                                          |
| ------------- | ---------------------------------------------------- |
| Forward Pass  | Compute output and loss                              |
| Compute Loss  | Measure difference between predicted and true values |
| Backward Pass | Calculate gradients using chain rule                 |
| Weight Update | Adjust weights using gradients and learning rate     |

Backpropagation Enables Learning

Backpropagation allows CNNs to learn from data by gradually adjusting filters and weights so that the network improves its predictions over time.

In [None]:
'''Q8. What is the role of activation functions in CNNs?
Activation functions in CNNs introduce non-linearity into the model, allowing it to learn and represent complex patterns in data.

Main Role:
To transform the output of neurons in each layer so the CNN can:
* Model non-linear relationships
* Learn complex features (not just straight lines)
* Make better predictions on tasks like image classification

Without activation functions, the CNN would behave like a linear model no matter how many layers it has, and would fail to learn complex tasks.

How It Works:
Each neuron (after convolution or fully connected layer) outputs a value. The activation function then transforms this value before passing it to the next layer.

Common Activation Functions in CNNs:
| Activation Function | Formula                            | Characteristics                                    |
| ------------------- | ---------------------------------- | -------------------------------------------------- |
| ReLU                | `f(x) = max(0, x)`                 | Most common; fast and effective; sparsity          |
| Sigmoid             | `f(x) = 1 / (1 + e^(-x))`          | Smooth, used in binary classification              |
| Tanh                | `f(x) = (e^x - e^(-x))/(e^x + e^(-x))` | Outputs between -1 and 1                       |
| Leaky ReLU          | `f(x) = x if x>0 else αx`          | Fixes ReLU’s “dead neuron” problem                 |
| Softmax             | `f(x_i) = e^(x_i)/Σe^(x_j)`        | Used in final layer for multi-class classification |
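
The formulas in the table translate directly into code. The sketch below uses only the standard library; the default `alpha=0.01` for Leaky ReLU is an assumed value, since the table leaves α unspecified.

```python
import math

def relu(x):
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    # alpha is an assumed default; the table leaves its value open.
    return x if x > 0 else alpha * x

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def softmax(scores):
    # Subtracting the max is a common numerical-stability step;
    # it does not change the resulting probabilities.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-2.0), relu(3.0))  # 0.0 3.0
print(leaky_relu(-1.0))       # -0.01
print(sigmoid(0.0))           # 0.5
print(tanh(0.0))              # 0.0
probs = softmax([1.0, 2.0, 3.0])
print(probs)                  # three probabilities that sum to 1
```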

Why ReLU is the Default in CNNs:
* Simple and fast to compute
* Introduces non-linearity efficiently
* Helps mitigate the vanishing gradient problem (its gradient is 1 for every positive input)
* Sparse activation (many outputs are 0), improving efficiency

Where Are They Used in CNNs?
* After each convolutional layer
* After fully connected (dense) layers
* In the output layer, depending on the task:
  * Softmax → Multi-class classification
  * Sigmoid → Binary classification

In Summary:
| Role                    | Explanation                                   |
| ----------------------- | --------------------------------------------- |
| Introduce Non-Linearity | Allow CNNs to learn complex patterns          |
| Enable Deep Learning    | Without them, the network behaves linearly    |
| Control Output Range    | E.g., sigmoid between 0–1; tanh between -1–1  |
| Task-Specific Behavior  | Softmax or sigmoid for classification outputs |

Activation functions are crucial to the power of CNNs — they give the model the ability to understand images in a deep, non-linear way.


In [None]:
'''Q9. What is the concept of receptive fields in CNNs?
The receptive field in a Convolutional Neural Network (CNN) is the region of the input image that a particular neuron (or feature) in a given layer is "looking at" or influenced by.

Simple Definition:
A receptive field is the area of the input image that affects the output of a neuron in a particular CNN layer.

Why It Matters:
* It defines how much context a neuron has to make a decision.
* In early layers, the receptive field is small (e.g., 3×3).
* In deeper layers, neurons have a larger receptive field — they can "see" more of the image and detect higher-level features (e.g., faces, objects).

How It Grows:
The receptive field increases with each layer, depending on:
* Kernel (filter) size
* Stride
* Padding (changes where the field falls near image borders, not its size)
* Number of layers

For example:

* Layer 1: 3×3 kernel → Receptive field = 3×3
* Layer 2 (on top of Layer 1): another 3×3 → Receptive field = 5×5
* Layer 3: another 3×3 → Receptive field = 7×7
* ...and so on.

This stacking effect allows neurons in deeper layers to capture global context.
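
The growth pattern above (3×3 → 5×5 → 7×7) follows a standard recurrence: each layer adds `(kernel_size - 1)` times the accumulated stride of the layers before it. A minimal sketch, with a hypothetical function name and layers given as `(kernel_size, stride)` pairs:

```python
def receptive_field(layers):
    # layers: list of (kernel_size, stride) pairs, ordered input to output.
    rf = 1    # a single input pixel sees only itself
    jump = 1  # distance, in input pixels, between neighbouring outputs
    sizes = []
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
        sizes.append(rf)
    return sizes

# Three stacked 3×3 convolutions with stride 1, as in the list above.
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # [3, 5, 7]

# A stride-2, 2×2 pooling layer in the middle makes later layers grow faster.
print(receptive_field([(3, 1), (2, 2), (3, 1)]))  # [3, 4, 8]
```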
Example Analogy:
* Think of the **receptive field like a window** through which a neuron sees the input image.
* A small window = local details (edges, textures)
* A large window = broader context (eyes, faces, shapes)

In Summary:
| Term            | Description                                               |
| --------------- | --------------------------------------------------------- |
| Receptive Field | Region of input image influencing a neuron's output       |
| Small RF        | Early layers → detect edges, colors, textures             |
| Large RF        | Deeper layers → detect shapes, objects, semantics         |
| Importance      | Helps the network learn **context-aware** representations |

Note: Designing CNN architectures with the right receptive field size is key. If it is too small, the model can't understand large patterns; if it grows too big too soon, the model might miss fine details.

In [None]:
'''Q10. Explain the concept of tensor space in CNNs
In Convolutional Neural Networks (CNNs), tensor space refers to the multi-dimensional structure used to represent data — such as images, feature maps, and weights — throughout the network. Tensors are the core data format that CNNs use to store and process information.

What is a Tensor?
A tensor is a generalization of scalars, vectors, and matrices to higher dimensions:
| Type      | Example                            | Dimensions |
| --------- | ---------------------------------- | ---------- |
| Scalar    | 5                                  | 0D         |
| Vector    | [5, 10, 15]                        | 1D         |
| Matrix    | [[1, 2], [3, 4]]                   | 2D         |
| 3D Tensor | Multiple matrices (like RGB image) | 3D         |
| nD Tensor | Data with more dimensions          | nD         |

Why Tensors in CNNs?
CNNs handle data in multiple dimensions, such as:

* Image height and width
* Color channels (RGB)
* Batch size (number of images)
* Feature maps (output of layers)

All this data is efficiently represented using tensors, allowing deep learning libraries like TensorFlow or PyTorch to handle complex operations easily.

Example: Image as a Tensor
A single RGB image of size 32×32 pixels:
Tensor shape: [Height, Width, Channels] = [32, 32, 3]
A **batch of 64 such images** becomes:
Tensor shape: [Batch, Height, Width, Channels] = [64, 32, 32, 3]
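
To make these shapes concrete, the sketch below uses plain nested lists as a stand-in for a tensor library (in practice NumPy, TensorFlow, or PyTorch arrays would be used). The `shape` and `zeros` helpers are hypothetical.

```python
def shape(tensor):
    # Walk down the first element of each nesting level to read off the shape.
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return dims

def zeros(*dims):
    # Build a nested list of zeros with the requested shape.
    if len(dims) == 1:
        return [0] * dims[0]
    return [zeros(*dims[1:]) for _ in range(dims[0])]

image = zeros(32, 32, 3)                       # [Height, Width, Channels]
batch = [zeros(32, 32, 3) for _ in range(64)]  # 64 such images
print(shape(image))  # [32, 32, 3]
print(shape(batch))  # [64, 32, 32, 3]
```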

Tensors in CNN Layers:
| Layer Type            | Input Tensor Shape                     | Output Tensor Shape          |
| --------------------- | -------------------------------------- | ---------------------------- |
| Input Layer           | `[Batch, H, W, Channels]`              | Same                         |
| Convolution Layer     | `[Batch, H, W, Channels]`              | `[Batch, H', W', Filters]`   |
| Pooling Layer         | `[Batch, H', W', Filters]`             | `[Batch, H'', W'', Filters]` |
| Fully Connected Layer | `[Batch, Features]` (after flattening) | `[Batch, Classes]`           |

Tensor Space in CNNs = Multidimensional Feature Representation
As data flows through the CNN:
* Tensors change shape based on operations (e.g., convolutions, pooling, flattening).
* Each tensor layer represents the state of the image features at a different level of abstraction.
* The tensor space is the evolving multidimensional space in which all learning and transformations happen.

In Summary:
| Concept      | Description                                             |
| ------------ | ------------------------------------------------------- |
| Tensor       | A multi-dimensional array used to store data            |
| Tensor Space | All possible values and shapes tensors can take in CNNs |
| Use in CNNs  | Represent images, filters, feature maps, weights, etc.  |
| Importance   | Enables efficient deep learning operations at scale     |

Understanding tensor space is essential for building, debugging, and optimizing CNN models; it is the language of data in deep learning.


In [None]:
'''Q11. What is LeNet-5, and how does it contribute to the development of CNNs?

LeNet-5 is one of the earliest and most influential Convolutional Neural Networks (CNNs), developed by Yann LeCun and his team in 1998 for handwritten digit recognition, especially for the MNIST dataset.

What is LeNet-5?
LeNet-5 is a 7-layer CNN architecture (excluding input) that introduced many of the core ideas still used in modern CNNs today. It was designed to classify digits (0–9) from grayscale images of size 32×32 pixels.

LeNet-5 Architecture Overview:
| Layer  | Type                      | Output Shape | Description                     |
| ------ | ------------------------- | ------------ | ------------------------------- |
| Input  | Image                     | 32×32×1      | Grayscale input image           |
| C1     | Convolutional             | 28×28×6      | 6 filters of size 5×5           |
| S2     | Subsampling (Avg Pooling) | 14×14×6      | Downsampling                    |
| C3     | Convolutional             | 10×10×16     | Deeper feature extraction       |
| S4     | Subsampling               | 5×5×16       | Another pooling layer           |
| C5     | Fully Connected Conv      | 1×1×120      | Connects to all 5×5×16 features |
| F6     | Fully Connected           | 84           | Classic dense layer             |
| Output | Softmax                   | 10           | Digit classification (0–9); the original paper used RBF output units, modern reimplementations use softmax |
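
As a rough check on how small this network is, the learnable parameters implied by the table can be counted. This sketch makes several assumptions that differ from the original paper: C3 is treated as fully connected to all 6 input maps, pooling layers are treated as parameter-free, and the output is a plain dense layer. The helper names are hypothetical.

```python
def conv_params(kernel_size, in_channels, out_channels):
    # One k×k×in_channels weight block plus one bias per output channel.
    return kernel_size * kernel_size * in_channels * out_channels + out_channels

def dense_params(in_features, out_features):
    # One weight per input-output pair plus one bias per output.
    return in_features * out_features + out_features

params = {
    "C1": conv_params(5, 1, 6),
    "C3": conv_params(5, 6, 16),     # assumes full connectivity
    "C5": conv_params(5, 16, 120),
    "F6": dense_params(120, 84),
    "Output": dense_params(84, 10),  # assumes a plain dense layer
}
total = sum(params.values())
print(params)
print(total)  # 61706
```

Under these assumptions the whole model has roughly 62 thousand parameters, most of them in C5.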

Key Innovations in LeNet-5:

1. Introduced Convolutional Layers
   – Filters learned edge-like patterns automatically from data.

2. Used Pooling/Subsampling Layers
   – Reduced dimensionality while retaining important features.

3. Stacked Layers for Hierarchical Feature Learning
   – Low-level features (edges) → mid-level (shapes) → high-level (digits).

4. Used Backpropagation
   – For training weights across all layers, including convolutional.

5. Combined CNN with Fully Connected Layers
   – For robust classification.

Contribution to CNN Development:
| Contribution            | Impact                                                                   |
| ------------------------| ------------------------------------------------------------------------ |
|  Early CNN Model        | Paved the way for modern deep learning models                            |
|  Digit Recognition      | Achieved state-of-the-art results on MNIST                               |
|  Architectural Template | Inspired networks like AlexNet, VGG, ResNet                              |
|  Modular Design         | Introduced clear blocks: Conv → Pool → FC → Softmax                      |
|  Real-World Use Case    | Used in early OCR (Optical Character Recognition) systems (e.g., checks) |

Visual Summary of LeNet-5:
Input (32×32) → [C1: Conv] → [S2: Pool] → [C3: Conv] → [S4: Pool]
→ [C5: FC Conv] → [F6: FC] → [Output: Softmax]

In Summary:
| Feature      | Details                                 |
| ------------ | --------------------------------------- |
| Model Name   | LeNet-5                                 |
| Inventor     | Yann LeCun (1998)                       |
| Purpose      | Digit recognition (MNIST dataset)       |
| Key Layers   | Conv, Pooling, Fully Connected, Softmax |
| Contribution | Foundation of modern CNNs               |

LeNet-5 was a breakthrough in computer vision and is still taught today as the starting point for understanding CNNs.


In [None]:
'''Q12. What is AlexNet, and why was it a breakthrough in deep learning?
What is AlexNet?
AlexNet is a deep Convolutional Neural Network (CNN) architecture developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, which won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012 by a huge margin.
It was designed to classify images into 1,000 categories using the ImageNet dataset, which contains over 1.2 million images.

Key Features of AlexNet:
| Component        | Description                                         |
| ---------------- | --------------------------------------------------- |
| Input            | 227×227×3 (RGB image)                               |
| Conv Layers      | 5 convolutional layers to extract image features    |
| ReLU Activations | Popularized ReLU (faster training than sigmoid/tanh) |
| Max Pooling      | Used for downsampling                               |
| Fully Connected  | 3 dense layers to classify image features           |
| Dropout          | Regularization to prevent overfitting               |
| GPU Training     | Trained using 2 GPUs in parallel (NVIDIA GTX 580)   |
| Output           | Softmax over 1000 classes                           |

Architecture Overview (Simplified):
```
Input → Conv1 → ReLU → MaxPool → Conv2 → ReLU → MaxPool →
Conv3 → ReLU → Conv4 → ReLU → Conv5 → ReLU → MaxPool →
FC1 → Dropout → FC2 → Dropout → FC3 → Softmax
```
Why Was AlexNet a Breakthrough?
| Reason                      | Description                                                                 |
| ----------------------------| --------------------------------------------------------------------------- |
| ImageNet Victory            | Top-5 error of 15.3%, more than 10.8 percentage points below the runner-up  |
| ReLU Activation             | Allowed faster training and better gradient flow than sigmoid/tanh          |
| GPU Training                | Trained on 2 GPUs, enabling deeper architecture to be practically used      |
| Deep Architecture           | 8 layers with 60 million parameters                                         |
| Dropout for Regularization  | Helped prevent overfitting in dense layers                                  |
| Popularized Deep Learning   | Sparked global interest in CNNs and deep learning research and applications |

Impact of AlexNet on Deep Learning:
* Triggered the deep learning revolution in computer vision.
* Led to development of more powerful architectures like VGG, GoogLeNet, ResNet.
* Widely adopted in industry and research (e.g., facial recognition, medical imaging, autonomous vehicles).

In Summary:
| Feature             | AlexNet Impact                           |
| ------------------- | ---------------------------------------- |
| Inventors           | Krizhevsky, Sutskever, Hinton (2012)     |
| Dataset             | ImageNet (1.2M images, 1000 classes)     |
| Architecture        | 8 layers (Conv + FC + Softmax)           |
| Breakthrough Reason | Deep architecture + GPU + ReLU + Dropout |
| Legacy              | Sparked modern deep learning movement    |

AlexNet proved that deep CNNs trained with GPUs could dramatically outperform traditional computer vision methods — and laid the foundation for today's AI-powered vision systems.


In [None]:
'''Q13. What is VGGNet, and how does it differ from AlexNet?

What is VGGNet?
VGGNet is a deep Convolutional Neural Network developed by the Visual Geometry Group (VGG) at the University of Oxford, led by Karen Simonyan and Andrew Zisserman, and introduced in 2014.
It gained fame for achieving excellent performance on the ImageNet Challenge (ILSVRC 2014) and is known for its simplicity and depth.

Key Characteristics of VGGNet:
| Feature                | Details                                                        |
| ---------------------- | -------------------------------------------------------------- |
| Versions               | VGG-11, VGG-13, VGG-16, VGG-19 (based on number of layers)     |
| Input Image Size       | 224×224×3 (RGB)                                                |
| Convolutional Layers   | Uses only 3×3 filters with stride 1                            |
| Pooling                | 2×2 max pooling with stride 2                                  |
| Fully Connected Layers | 3 dense layers at the end (4096, 4096, 1000 units)             |
| Activation Function    | ReLU                                                           |
| Parameters             | VGG-16 has \~138 million parameters                            |

Architecture Overview of VGG-16:
```
Input → [Conv3×3 ×2] → MaxPool → [Conv3×3 ×2] → MaxPool →
[Conv3×3 ×3] → MaxPool → [Conv3×3 ×3] → MaxPool →
[Conv3×3 ×3] → MaxPool → FC → FC → FC → Softmax
```
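
As a rough check on the \~138 million figure, the parameter count can be tallied by hand. The sketch below assumes the standard VGG-16 configuration (13 conv layers followed by dense layers of 4096, 4096, and 1000 units); the names `VGG16_CFG` and `count_vgg16_params` are made up for illustration.

```python
# Channel widths of the 13 conv layers; "M" marks a 2x2 max-pool (no parameters).
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]

def count_vgg16_params(cfg=VGG16_CFG, in_channels=3, image_size=224, num_classes=1000):
    total = 0
    channels, size = in_channels, image_size
    for item in cfg:
        if item == "M":
            size //= 2                                 # pooling halves height and width
        else:
            total += 3 * 3 * channels * item + item    # 3x3 weights + biases
            channels = item
    # Three dense layers: flattened features -> 4096 -> 4096 -> num_classes
    fan_in = size * size * channels                    # 7 * 7 * 512 = 25088
    for units in (4096, 4096, num_classes):
        total += fan_in * units + units
        fan_in = units
    return total

print(f"{count_vgg16_params():,}")  # 138,357,544
```

Roughly 103 million of these parameters sit in the first dense layer alone, which is why the convolutional part of the network is comparatively small.
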
Difference Between VGGNet and AlexNet:

| Feature            | AlexNet (2012)                  | VGGNet (2014)                 |
| ------------------ | ------------------------------- | ----------------------------- |
| Year               | 2012                            | 2014                          |
| Depth              | 8 layers                        | 16 or 19 layers               |
| Filter Size        | Varies (11×11, 5×5, 3×3)        | Consistent 3×3 filters        |
| Stride & Padding   | Stride = 4 (early layers)       | Stride = 1, Padding = same    |
| Pooling            | Max Pooling (overlapping)       | Max Pooling (non-overlapping) |
| Architecture Style | Mixed layer types, less regular | Very uniform and simple       |
| Parameters         | \~60 million                    | \~138 million (VGG-16)        |
| Training GPUs      | 2 GPUs                          | 4 GPUs (data parallelism)     |
| Performance        | Top-5 error: \~15.3%            | Top-5 error: \~7.3% (VGG-16)  |

Why VGGNet Was Important:
* Proved that deeper networks with small filters improve performance.
* Its modular design made it easy to generalize and extend.
* Inspired future architectures like ResNet, Inception, and MobileNet.
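
The case for small filters can be checked with simple arithmetic: a stack of 3×3 convolutions covers the same receptive field as one larger filter while using fewer weights. The sketch below is illustrative only; the helper names and the channel count `C = 256` are assumptions, and biases are ignored.

```python
def stacked_3x3_receptive_field(num_layers):
    # Each extra 3x3 conv (stride 1) widens the receptive field by 2 pixels.
    return 1 + 2 * num_layers

def conv_weights(kernel_size, channels):
    # Weights of one conv layer with `channels` inputs and outputs (biases ignored).
    return kernel_size * kernel_size * channels * channels

C = 256  # illustrative channel count
for n, k in [(2, 5), (3, 7)]:
    assert stacked_3x3_receptive_field(n) == k
    stacked = n * conv_weights(3, C)
    single = conv_weights(k, C)
    print(f"{n} x (3x3): {stacked:,} weights vs one {k}x{k}: {single:,}")
```

The stacked version also applies a ReLU after every 3×3 layer, so it adds non-linearity as well as saving parameters.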

Summary:
| Question                | Answer                                               |
| ----------------------- | ---------------------------------------------------- |
| What is VGGNet?         | A deep CNN with many small (3×3) filters             |
| Invented By             | Oxford VGG Group (2014)                              |
| Difference from AlexNet | Deeper, simpler, smaller filters, better performance |
| Legacy                  | Foundation for modern CNN designs                    |

VGGNet showed that depth + simplicity can significantly improve CNN performance — and became a benchmark model in computer vision research and applications.


In [None]:
'''Q14. What is GoogLeNet, and what is its main innovation?
What is GoogLeNet?
GoogLeNet is a deep Convolutional Neural Network architecture developed by Google researchers (Szegedy et al.) and introduced in 2014. It won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC 2014) with a top-5 error rate of 6.67%, outperforming models like VGGNet.

It is also known as Inception-v1, the first in a series of "Inception" networks.
Main Innovation: The Inception Module
The core idea behind GoogLeNet is the Inception module, which allows the network to analyze visual information at multiple scales simultaneously.

Inception Module Includes:
* 1×1 convolutions (for dimensionality reduction and computational efficiency)
* 3×3 convolutions
* 5×5 convolutions
* 3×3 max pooling

All of these run in parallel and their outputs are concatenated along the depth dimension.
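
To make the two ideas concrete, the sketch below counts the multiplications of a 5×5 branch with and without a 1×1 reduction, then concatenates four branch outputs along the depth axis. The feature-map size and channel counts are illustrative assumptions, and `conv_mults` is a hypothetical helper.

```python
import numpy as np

H = W = 28          # spatial size of the feature map (illustrative)
C_IN = 192          # input channels (illustrative)

def conv_mults(out_channels, kernel, in_channels, h=H, w=W):
    # Multiplications for a stride-1, same-padded convolution.
    return h * w * out_channels * kernel * kernel * in_channels

# 5x5 branch producing 32 maps, with and without a 1x1 reduction to 16 channels.
direct = conv_mults(32, 5, C_IN)
reduced = conv_mults(16, 1, C_IN) + conv_mults(32, 5, 16)
print(f"direct 5x5: {direct:,}  with 1x1 reduction: {reduced:,}")

# Branch outputs share height and width, so they can be stacked along depth.
branches = [np.zeros((H, W, c)) for c in (64, 128, 32, 32)]  # 1x1, 3x3, 5x5, pool
module_out = np.concatenate(branches, axis=-1)
print(module_out.shape)  # (28, 28, 256)
```

With these numbers the 1×1 reduction cuts the cost of the 5×5 branch by roughly a factor of ten.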

Architecture Highlights:
| Feature                | Description                                     |
| ---------------------- | ----------------------------------------------- |
| Depth                  | 22 layers (deeper than VGG and AlexNet)         |
| Parameter Efficiency   | \~5 million parameters (compared to VGG’s 138M) |
| Inception Module       | Processes input at multiple scales              |
| Auxiliary Classifiers  | Added during training to improve gradient flow  |
| Global Average Pooling | Replaces the large fully connected layers       |

How GoogLeNet Differs from Earlier Models:
| Feature        | GoogLeNet                             | Earlier Models (AlexNet, VGG)      |
| -------------- | ------------------------------------- | ---------------------------------- |
| Depth          | 22 layers                             | 8–19 layers                        |
| Key Innovation | Inception module                      | Stacked conv + pooling             |
| FC Layers      | Global Average Pooling + one linear classifier | Multiple dense FC layers  |
| Parameters     | \~5 million                           | VGG: \~138 million                 |
| Efficiency     | High accuracy with fewer parameters   | High memory and computational cost |

Why It Was a Breakthrough:
* Multi-scale feature extraction in a single module
* Reduced computation with 1×1 convolutions
* No large fully connected layers (only a final linear classifier) → smaller model, faster training
* Inspired Inception-v2, v3, v4, and Inception-ResNet

In Summary:
| Item            | Detail                                        |
| --------------- | --------------------------------------------- |
| Model Name      | GoogLeNet (Inception-v1)                      |
| Year Introduced | 2014                                          |
| Innovation      | Inception module (multi-scale parallel paths) |
| Layers          | 22 deep                                       |
| Parameter Count | \~5 million                                   |
| Impact          | High accuracy + efficiency; scalable design   |

GoogLeNet marked a major step forward in creating deep, efficient, and scalable neural networks, paving the way for many advanced architectures in modern computer vision.


In [None]:
'''Q15. What is ResNet, and what problem does it solve?
What is ResNet?
ResNet (Residual Network) is a deep convolutional neural network architecture introduced by Kaiming He et al. from Microsoft Research in 2015. It won the ImageNet Challenge (ILSVRC 2015) with a top-5 error of 3.57%, below the commonly cited human error rate of about 5% on this classification task.
The Problem ResNet Solves:
As neural networks grow deeper, they should, in theory, perform better. But in practice, very deep networks often suffer from:
| Problem                             | Description                                                              |
| ----------------------------------- | ------------------------------------------------------------------------ |
| Degradation Problem                 | Accuracy gets worse as layers are added (not due to overfitting).        |
| Vanishing/Exploding Gradients       | Gradients become too small or large during backpropagation in deep nets. |

Despite having enough data and computing power, training deep models was ineffective beyond a certain depth due to these issues.

ResNet's Key Innovation: Residual Learning
ResNet solves the degradation problem by introducing skip connections (also called residual connections).

Residual Block:
Instead of learning a direct mapping `H(x)`, the stacked layers learn a residual function `F(x) = H(x) - x`, so the block computes:

$$
H(x) = F(x) + x
$$

Where:
* `x` = input
* `F(x)` = output of a few stacked layers
* `x` is added back to `F(x)` via a shortcut connection

This helps gradients flow more easily during training and allows the network to learn identity mappings, effectively bypassing unnecessary layers if needed.
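
A minimal NumPy sketch of a residual block follows. It uses two dense matrices as a stand-in for the convolution layers (an assumption made to keep the example short), and the function names are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def residual_block(x, w1, w2):
    # F(x): two linear layers with a ReLU in between (stand-in for conv layers).
    f = relu(x @ w1) @ w2
    # Shortcut connection: add the input back, then apply the final ReLU.
    return relu(f + x)

rng = np.random.default_rng(0)
x = rng.random((4, 8))                 # 4 samples, 8 features, all non-negative

# With zero weights F(x) = 0, so the block reduces to the identity mapping.
zeros = np.zeros((8, 8))
print(np.allclose(residual_block(x, zeros, zeros), x))  # True

# With non-zero weights the block outputs x plus a learned correction.
w1 = rng.normal(scale=0.1, size=(8, 8))
w2 = rng.normal(scale=0.1, size=(8, 8))
print(residual_block(x, w1, w2).shape)  # (4, 8)
```

The zero-weight case shows why extra residual blocks need not hurt: a block that has nothing useful to add can fall back to passing its input through.
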
ResNet Architecture Overview:
| Model Variant | Number of Layers |
| ------------- | ---------------- |
| ResNet-18     | 18 layers        |
| ResNet-34     | 34 layers        |
| ResNet-50     | 50 layers        |
| ResNet-101    | 101 layers       |
| ResNet-152    | 152 layers       |

All versions use residual blocks as the core building unit.

Comparison with Earlier Models:
| Feature        | ResNet                        | VGG / AlexNet                  |
| -------------- | ----------------------------- | ------------------------------ |
| Depth          | Up to 152+ layers             | 8–19 layers                    |
| Key Innovation | Residual (skip) connections   | Stacked convolution layers     |
| Gradient Flow  | Improved                      | Weak in very deep networks     |
| Training Ease  | Easier for deep networks      | Difficult beyond certain depth |
| Performance    | Top-1 error \~23–25% (ResNet-50) | Top-1 error \~27–29% (VGG-16), \~37–41% (AlexNet) |

In Summary:
| Feature        | Description                                                                        |
| -------------- | ---------------------------------------------------------------------------------- |
| Full Name      | ResNet (Residual Network)                                                          |
| Introduced By  | Kaiming He et al. (Microsoft Research) in 2015                                     |
| Solved Problem | Degradation in deep networks                                                       |
| Key Idea       | Skip/residual connections                                                          |
| Depth Achieved | 18 to 152+ layers                                                                  |
| Impact         | Made very deep networks trainable; residual connections are now standard (e.g., in Transformers, EfficientNet) |


In [None]:
'''Q16. What is DenseNet, and how does it differ from ResNet?
What is DenseNet?
DenseNet (Dense Convolutional Network) is a deep learning architecture introduced by Gao Huang et al. in 2017, designed to improve information flow and feature reuse in very deep neural networks.
DenseNet connects each layer to every other layer in a feed-forward manner, unlike traditional CNNs where layers are connected sequentially.

Key Concept: Dense Connectivity
In DenseNet, every layer receives input from all previous layers and passes its own output to all subsequent layers.
Mathematically:
$$
x_l = H_l([x_0, x_1, x_2, ..., x_{l-1}])
$$

Where:
* $x_l$ = output of the *l*-th layer
* $H_l$ = composite function (e.g., BatchNorm → ReLU → Conv)
* $[x_0, x_1, ..., x_{l-1}]$ = concatenation of feature maps from all previous layers

This encourages feature reuse, efficient gradient flow, and compact models.
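
The channel bookkeeping behind dense connectivity can be sketched in NumPy. Here `H_l` is replaced by a ReLU followed by a random 1×1 channel-mixing step, which is a simplification of the real BN → ReLU → Conv function; `dense_block` and the sizes used are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_block(x0, num_layers, growth_rate):
    features = [x0]                                  # x_0
    for _ in range(num_layers):
        stacked = np.concatenate(features, axis=-1)  # [x_0, ..., x_{l-1}]
        # H_l stand-in: ReLU followed by a random 1x1 convolution (channel mixing).
        w = rng.normal(scale=0.1, size=(stacked.shape[-1], growth_rate))
        x_l = np.maximum(0, stacked) @ w             # adds `growth_rate` new maps
        features.append(x_l)
    return np.concatenate(features, axis=-1)

x0 = rng.random((8, 8, 64))                          # height, width, channels
out = dense_block(x0, num_layers=6, growth_rate=32)
print(out.shape)  # (8, 8, 256): 64 + 6 * 32 channels
```

Each layer contributes only `growth_rate` new feature maps, which is how the network stays compact even though every layer sees all earlier outputs.
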
DenseNet Architecture Features:
| Component              | Description                                         |
| ---------------------- | --------------------------------------------------- |
| Dense Blocks           | Groups of layers with dense connections             |
| Transition Layers      | 1×1 conv + pooling between dense blocks             |
| Composite Function     | BN → ReLU → Conv                                    |
| Growth Rate (k)        | Number of feature maps each layer adds (e.g., k=32) |
| Depth Options          | DenseNet-121, 169, 201, 264 (number = total layers) |

Difference Between DenseNet and ResNet:

| Feature         | ResNet                                | DenseNet                                         |
| --------------- | ------------------------------------- | ------------------------------------------------ |
| Year Introduced | 2015                                  | 2017                                             |
| Key Idea        | Residual/Skip connections: $x + F(x)$ | Dense connections: concatenate all previous maps |
| Connection Type | Additive skip connections             | Concatenation of feature maps                    |
| Feature Reuse   | Limited (only previous layer reused)  | High (all previous layers reused)                |
| Gradient Flow   | Improved                              | Even better due to direct connections            |
| Model Size      | Large                                 | More compact due to feature reuse                |
| Redundancy      | May learn redundant features          | Less redundancy, more efficient                  |

Performance & Efficiency:
* DenseNet typically achieves better accuracy than ResNet with fewer parameters.
* It reduces overfitting by reusing features.
* However, due to concatenation, it may require more memory and slightly more complex implementation.

In Summary:
| Aspect             | DenseNet                                                   |
| ------------------ | ---------------------------------------------------------- |
| Full Name          | Dense Convolutional Network                                |
| Proposed By        | Gao Huang et al., 2017                                     |
| Key Feature        | Dense connectivity (each layer connected to all before it) |
| Advantage          | Efficient feature reuse, compact models                    |
| Compared to ResNet | Better gradient flow, fewer parameters, more connections   |


In [None]:
'''Q17. What are the main steps involved in training a Convolutional Neural Network (CNN) from scratch?

Training a CNN from scratch involves a structured pipeline to ensure the model learns effectively from raw data. Here's a step-by-step breakdown:
Step 1: Prepare the Dataset
* Collect Data: Gather a labeled dataset (e.g., images with class labels).
* Split Dataset: Divide into:
  * Training set
  * Validation set
  * Test set
* Preprocessing:
  * Resize/reshape images
  * Normalize pixel values (e.g., scale to \[0, 1] or \[-1, 1])
  * Data augmentation (e.g., flip, rotate, zoom) to increase dataset variability
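
The normalization and augmentation steps can be sketched with NumPy alone. The random batch below is a stand-in for real images, and `random_horizontal_flip` is a hypothetical helper rather than a library function.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in batch: 4 RGB images, 32x32, with 8-bit pixel values.
images = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

scaled_01 = images.astype(np.float32) / 255.0      # range [0, 1]
scaled_pm1 = scaled_01 * 2.0 - 1.0                 # range [-1, 1]

def random_horizontal_flip(batch, rng, p=0.5):
    # Flip each image left-to-right with probability p.
    flip = rng.random(len(batch)) < p
    out = batch.copy()
    out[flip] = out[flip][:, :, ::-1, :]
    return out

augmented = random_horizontal_flip(scaled_01, rng)
print(scaled_01.min(), scaled_01.max(), augmented.shape)
```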

Step 2: Design the CNN Architecture
Build the CNN by stacking the following layers:
* Input Layer: Accepts images in the shape `(height, width, channels)`
* Convolutional Layers: Extract spatial features using filters
* Activation Functions: Typically ReLU (to introduce non-linearity)
* Pooling Layers: Downsample feature maps (e.g., Max Pooling)
* Fully Connected (Dense) Layers: For classification
* Output Layer:
  * Softmax for multi-class
  * Sigmoid for binary classification
Step 3: Configure the Model
* Loss Function:
  * Cross-entropy (for classification)
  * MSE (for regression tasks)
* Optimizer:
  * SGD, Adam, RMSProp, etc.
* Learning Rate: Small value (e.g., 0.001) to control step size during updates
* Metrics: Accuracy, precision, recall, etc.

Step 4: Train the Model
Use forward and backward passes in epochs:
* Forward Propagation: Pass input through the network and compute predictions
* Loss Computation: Calculate error between prediction and actual label
* Backpropagation: Compute gradients using the loss
* Update Weights: Optimizer adjusts weights using gradients

Repeat over multiple epochs, using mini-batches (batch size = 16, 32, etc.).
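
The loop below sketches these four steps with mini-batches in plain NumPy. To stay short it trains a softmax classifier on synthetic features rather than a full CNN, so the model, data, and hyperparameters are all assumptions; the structure of the loop is the same.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 3-class problem (stand-in for image features).
X = rng.normal(size=(300, 5))
y = np.argmax(X @ rng.normal(size=(5, 3)), axis=1)
Y = np.eye(3)[y]                                   # one-hot labels

W = np.zeros((5, 3))
b = np.zeros(3)
lr, batch_size, epochs = 0.1, 32, 20

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)           # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, targets):
    return -np.mean(np.sum(targets * np.log(probs + 1e-12), axis=1))

losses = []
for epoch in range(epochs):
    order = rng.permutation(len(X))                # shuffle each epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx], Y[idx]
        probs = softmax(xb @ W + b)                # forward propagation
        grad_logits = (probs - yb) / len(xb)       # backpropagation (softmax + CE)
        W -= lr * xb.T @ grad_logits               # weight update
        b -= lr * grad_logits.sum(axis=0)
    losses.append(cross_entropy(softmax(X @ W + b), Y))   # loss after the epoch

print(f"loss after epoch 1: {losses[0]:.3f}, after epoch {epochs}: {losses[-1]:.3f}")
```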

Step 5: Validate During Training
* Use the validation set to monitor performance after each epoch.
* Track metrics like validation loss and accuracy.
* Apply techniques like early stopping or learning rate decay if needed.
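
Early stopping can be sketched as a small patience counter over the validation losses. The function name, the `patience` default, and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3):
    # Return the 0-based epoch at which training would stop, or None if it never triggers.
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Hypothetical validation losses: improvement stalls after epoch 3.
val_losses = [0.90, 0.70, 0.60, 0.55, 0.56, 0.58, 0.57, 0.60]
print(early_stopping_epoch(val_losses))  # 6
```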

Step 6: Evaluate the Model
* Test the trained model on unseen test data.
* Report final performance metrics (accuracy, F1-score, confusion matrix, etc.)

Step 7: Save & Deploy the Model
* Save the trained model (`.h5`, `.pt`, etc.)
* Deploy to production or embed in applications (web, mobile, etc.)

Optional Enhancements
* Regularization (Dropout, L2)
* Batch Normalization
* Transfer learning if dataset is small

Summary Table:
| Step             | Description                          |
| ---------------- | ------------------------------------ |
| 1. Prepare Data  | Collect, clean, augment, normalize   |
| 2. Build CNN     | Stack Conv → Pool → Dense → Output   |
| 3. Configure     | Set loss, optimizer, learning rate   |
| 4. Train         | Forward pass, loss, backprop, update |
| 5. Validate      | Tune performance on validation set   |
| 6. Evaluate      | Test on new data                     |
| 7. Save & Deploy | Use model in real-world scenarios    |

PRACTICAL

In [None]:
'''Q.1: Implementing a basic convolution using a 5x5 image (matrix) and a filter (kernel) step-by-step without any deep learning library, just using basic Python and NumPy.
Problem Statement:
Perform 2D convolution on a **5x5 grayscale image matrix** using a **3x3 filter (kernel)**. No padding, stride = 1.

Step-by-Step:
1. Define the image (5x5 matrix):
```python
import numpy as np

image = np.array([
    [10, 20, 30, 40, 50],
    [60, 70, 80, 90, 100],
    [110, 120, 130, 140, 150],
    [160, 170, 180, 190, 200],
    [210, 220, 230, 240, 250]
])
```
2. Define the filter (3x3 kernel):
Example: edge detection filter
```python
kernel = np.array([
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1]
])
```
3. Perform convolution (no padding, stride = 1):
Output size will be (5 - 3 + 1) × (5 - 3 + 1) = 3 × 3
```python
output = np.zeros((3, 3))

for i in range(3):  # Rows
    for j in range(3):  # Columns
        region = image[i:i+3, j:j+3]  # Extract 3x3 region
        conv_value = np.sum(region * kernel)  # Element-wise multiply and sum
        output[i, j] = conv_value
```
4. Print the output matrix:
```python
print("Output after Convolution:\n", output)
```
Complete Output Example:
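Given the above image and kernel, you'll get this `3x3` matrix:

```
Output after Convolution:
 [[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
```

(Note: the output is all zeros because the image is a smooth linear ramp. The kernel's weights sum to zero and every center pixel equals the average of its eight neighbors, so 8 × center minus the neighbor sum cancels exactly. An image containing a real edge would give nonzero responses.)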
Summary:
* Used a 5×5 input image
* Applied a 3×3 kernel
* Performed convolution with stride 1, no padding
* Produced a 3×3 output matrix
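
The same loop can be wrapped in a reusable function that also handles stride and zero padding. The sketch below is one possible generalization, not part of the original exercise; `conv2d` and `edge_image` are illustrative names, and the test image with a vertical edge is made up to show a nonzero response.

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    # Sliding-window sum of products (no kernel flip, as in deep learning libraries).
    if padding:
        image = np.pad(image, padding, mode="constant")
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            output[i, j] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return output

kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]])

# Hypothetical test image: dark left half, bright right half (a vertical edge).
edge_image = np.zeros((5, 5))
edge_image[:, 3:] = 100

print(conv2d(edge_image, kernel))                   # each row: 0, -300, 300
print(conv2d(edge_image, kernel, padding=1).shape)  # (5, 5)
```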

In [None]:
'''Q2: Max Pooling on a 4x4 feature map using a 2x2 window, stride = 2, and no padding — which is standard for pooling operations.
What is Max Pooling?
Max pooling selects the maximum value from a sub-region (window) of the feature map.
For a 4x4 feature map with a 2x2 window and stride = 2, the output will be a 2x2 matrix.

Step-by-step Implementation in Python:
1. Define the 4x4 Feature Map
```python
import numpy as np
feature_map = np.array([
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16]
])
```
2. Define Max Pooling Parameters
* Pool size: `2x2`
* Stride: `2`
* Output size: `2x2` (because: (4-2)/2 + 1 = 2)
```python
pool_size = 2
stride = 2
output = np.zeros((2, 2))
```
3. Apply Max Pooling
```python
for i in range(0, feature_map.shape[0], stride):
    for j in range(0, feature_map.shape[1], stride):
        region = feature_map[i:i+pool_size, j:j+pool_size]
        output[i//stride, j//stride] = np.max(region)
```
4. Print the Output
```python
print("Output after Max Pooling:\n", output)
```

Expected Output:
Given the `feature_map`, the regions and their max values are:
| Region | Values          | Max |
| ------ | --------------- | --- |
| (0,0)  | \[1 3; 5 6]     | 6   |
| (0,1)  | \[2 4; 7 8]     | 8   |
| (1,0)  | \[9 10; 13 14]  | 14  |
| (1,1)  | \[11 12; 15 16] | 16  |
So, final output:
```
[[ 6.  8.]
 [14. 16.]]
```
Summary:
* Input: 4x4 feature map
* Pooling Window: 2x2
* Stride: 2
* Output: 2x2 matrix with maximum values from each 2x2 region
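
The loop above relies on the feature map size being a multiple of the stride. A slightly more general sketch computes the output size from the formula `(n - pool_size) // stride + 1`, so overlapping windows also work; `max_pool2d` is an illustrative name.

```python
import numpy as np

def max_pool2d(feature_map, pool_size=2, stride=2):
    h, w = feature_map.shape
    out_h = (h - pool_size) // stride + 1
    out_w = (w - pool_size) // stride + 1
    output = np.zeros((out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            output[i, j] = feature_map[r:r + pool_size, c:c + pool_size].max()
    return output

feature_map = np.array([
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16]
])

print(max_pool2d(feature_map))                               # maxima 6, 8, 14, 16
print(max_pool2d(feature_map, pool_size=2, stride=1).shape)  # (3, 3)
```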

In [None]:
'''Q3. Implement the ReLU Activation Function on a Feature Map

What is ReLU?
ReLU (Rectified Linear Unit) is an activation function defined as:
$$
\text{ReLU}(x) = \max(0, x)
$$

It replaces negative values with 0, and keeps positive values unchanged.

Step-by-Step Implementation in Python
1. Define a sample feature map (with positive and negative values):
```python
import numpy as np

feature_map = np.array([
    [2, -3, 0],
    [-1, 5, -6],
    [4, -2, 7]
])
```
2. Apply ReLU:
There are two ways:
Method 1: Using NumPy’s `np.maximum()`
```python
relu_output = np.maximum(0, feature_map)
```
Method 2: Define a ReLU function
```python
def relu(x):
    return np.maximum(0, x)

relu_output = relu(feature_map)
```
3. Print the Output
```python
print("Output after ReLU:\n", relu_output)
```
Expected Output
Given the input:

```
[[  2  -3   0]
 [ -1   5  -6]
 [  4  -2   7]]
```

ReLU output will be:

```
[[2 0 0]
 [0 5 0]
 [4 0 7]]
```

In [None]:
'''Q4: Creating a Simple CNN model with:

* One Convolutional Layer
* One Fully Connected (Dense) Layer
* Using Random Data

We’ll use TensorFlow (Keras) to implement this. You can run this in a Python environment like Jupyter Notebook, Google Colab, or any Python IDE with TensorFlow installed.

Step-by-Step Code: Simple CNN with Random Data
```python
import tensorflow as tf
from tensorflow.keras import layers, models
import numpy as np
```
Step 1: Generate Random Input Data
Assume we have 10 images, each of size 28x28 with 1 channel (like grayscale images):
```python
# Random input: 10 samples of 28x28 grayscale images
X = np.random.rand(10, 28, 28, 1).astype(np.float32)

# Random labels for 10 samples and 3 classes
y = tf.keras.utils.to_categorical(np.random.randint(0, 3, size=(10,)), num_classes=3)
```
Step 2: Define a Simple CNN Model
```python
model = models.Sequential([
    layers.Conv2D(16, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(32, activation='relu'),
    layers.Dense(3, activation='softmax')  # 3-class classification
])
```
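
The shapes and parameter counts of this model can be worked out by hand before running it. The arithmetic below is a sketch that assumes the Keras defaults used above ('valid' padding, stride 1 for the convolution, and a pooling stride equal to the pool size); `conv_output_size` is an illustrative helper.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    return (size + 2 * padding - kernel) // stride + 1

side = conv_output_size(28, kernel=3)               # Conv2D, 'valid' padding -> 26
conv_params = 3 * 3 * 1 * 16 + 16                   # weights + biases = 160

side = conv_output_size(side, kernel=2, stride=2)   # MaxPooling2D -> 13
flat = side * side * 16                             # Flatten -> 2704

dense_params = flat * 32 + 32                       # Dense(32) -> 86,560
output_params = 32 * 3 + 3                          # Dense(3)  -> 99

total = conv_params + dense_params + output_params
print(side, flat, total)  # 13 2704 86819
```

Nearly all of the parameters come from the first dense layer, because it connects to every value of the flattened feature maps.
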
Step 3: Compile the Model
```python
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```
Step 4: Train on Random Data
```python
model.fit(X, y, epochs=3)
```

In [None]:
'''Q5: Generate a synthetic dataset using random noise and train a simple CNN on it.

Objective:
* Create a dataset with random noise as images
* Assign random class labels
* Train a simple CNN model to demonstrate the end-to-end training process (though the accuracy will be low due to randomness)

Why Do This?
This exercise is useful to:
* Understand CNN architecture setup
* Test model training pipeline
* Practice working with synthetic data

Step-by-Step Code (Using TensorFlow/Keras)
```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
```
Step 1: Generate Random Noise Data
Let’s say:
* 100 samples
* Each image is 28x28 grayscale
* 3 random classes

```python
# Generate random noise images
X = np.random.rand(100, 28, 28, 1).astype(np.float32)

# Generate random labels (0, 1, or 2)
y = np.random.randint(0, 3, size=(100,))
y = tf.keras.utils.to_categorical(y, num_classes=3)
```
Step 2: Create a Simple CNN Model
```python
model = models.Sequential([
    layers.Conv2D(8, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(16, activation='relu'),
    layers.Dense(3, activation='softmax')  # 3-class output
])
```
Step 3: Compile the Model
```python
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```
Step 4: Train the Model
```python
model.fit(X, y, epochs=5, batch_size=10)
```


In [None]:
'''Q6: Create a simple CNN using Keras with:
* One Convolutional Layer
* One Max Pooling Layer
This is a minimal CNN architecture, useful for learning purposes.

Step-by-Step Code: Simple CNN (Keras)
```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
```
Step 1: Generate Dummy Input Data
Let’s simulate grayscale images of size 28x28, 100 samples, and 3 classes:

```python
# 100 grayscale images (28x28)
X = np.random.rand(100, 28, 28, 1).astype(np.float32)

# Random labels (3 classes)
y = np.random.randint(0, 3, size=(100,))
y = tf.keras.utils.to_categorical(y, num_classes=3)
```
Step 2: Define the CNN Model

```python
model = models.Sequential([
    layers.Conv2D(16, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dense(32, activation='relu'),
    layers.Dense(3, activation='softmax')  # 3-class output
])
```
Step 3: Compile the Model

```python
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```
Step 4: Train the Model

```python
model.fit(X, y, epochs=5, batch_size=10)
```
Optional: View the Model Structure

```python
model.summary()
```
Summary

* **Conv2D layer** extracts features using filters
* **MaxPooling layer** reduces spatial dimensions
* **Flatten + Dense** layers are used for classification
* Trained on **random data** just to demonstrate architecture

In [None]:
'''Q7: Add a Fully Connected (Dense) Layer after Convolution and Max-Pooling layers in a CNN using Keras.

* One `Conv2D` layer
* One `MaxPooling2D` layer
* One `Dense` (fully connected) layer after flattening

Example Code in Keras (TensorFlow)
```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
```
Step 1: Generate Random Data (100 grayscale images, 28×28, 3 classes)

```python
# Input images: 100 samples, 28x28, 1 channel
X = np.random.rand(100, 28, 28, 1).astype(np.float32)

# Random labels (0, 1, 2) → one-hot encoded
y = tf.keras.utils.to_categorical(np.random.randint(0, 3, size=(100,)), num_classes=3)
```
Step 2: Build CNN Model with Fully Connected Layer

```python
model = models.Sequential([
    layers.Conv2D(16, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),

    layers.Flatten(),  # Flatten before dense layer

    layers.Dense(64, activation='relu'),  # Fully connected layer
    layers.Dense(3, activation='softmax')  # Output layer for 3 classes
])
```
Step 3: Compile the Model

```python
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```
Step 4: Train the Model

```python
model.fit(X, y, epochs=5, batch_size=10)
```
Summary of Architecture:

| Layer Type        | Details                  |
| ----------------- | ------------------------ |
| Conv2D            | 16 filters, 3x3, ReLU    |
| MaxPooling2D      | 2x2 pool                 |
| Flatten           | Converts 2D to 1D vector |
| Dense (Hidden FC) | 64 units, ReLU           |
| Dense (Output FC) | 3 units, Softmax         |

In [None]:
'''Q8: Add Batch Normalization to a simple CNN model using TensorFlow Keras.

What is Batch Normalization?
Batch Normalization:
* Normalizes the outputs of a layer
* Helps **stabilize** and **speed up** training
* Often added **after a Conv or Dense layer, before activation**
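
What the layer computes during training can be sketched in a few lines of NumPy: each feature is normalized over the batch, then scaled by `gamma` and shifted by `beta`. This is a simplified illustration with a hypothetical function name; a real layer also tracks running statistics for use at inference time.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    # Normalize each feature over the batch dimension, then scale and shift.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(10, 4))   # batch of 10, 4 features

out = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(3))   # close to 0 for every feature
print(out.std(axis=0).round(3))    # close to 1 for every feature
```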

Step-by-Step Code: Simple CNN with Batch Normalization
```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
```
Step 1: Generate Dummy Input Data
```python
# 100 grayscale images (28x28)
X = np.random.rand(100, 28, 28, 1).astype(np.float32)

# Random labels for 3 classes
y = tf.keras.utils.to_categorical(np.random.randint(0, 3, size=(100,)), num_classes=3)
```
Step 2: Define CNN with Batch Normalization
```python
model = models.Sequential([
    layers.Conv2D(32, (3, 3), input_shape=(28, 28, 1), padding='same'),
    layers.BatchNormalization(),             # Batch Normalization
    layers.Activation('relu'),               # Activation after normalization
    layers.MaxPooling2D(pool_size=(2, 2)),

    layers.Flatten(),
    layers.Dense(64),
    layers.BatchNormalization(),             # BN before activation in Dense too
    layers.Activation('relu'),
    layers.Dense(3, activation='softmax')    # Output layer
])
```
Step 3: Compile the Model
```python
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```
Step 4: Train the Model
```python
model.fit(X, y, epochs=5, batch_size=10)
```

In [None]:
'''Q9: Add Dropout Regularization to a simple CNN model using TensorFlow Keras.
What is Dropout?
* Dropout is a regularization technique that randomly sets a fraction of input units to 0 during training.
* It helps prevent overfitting.
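
The mechanism can be sketched in NumPy as "inverted" dropout: units are zeroed at random during training and the survivors are rescaled so the expected activation stays the same. The function below is an illustration with assumed names, not the Keras implementation itself.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    if not training:
        return x                                  # no-op at inference time
    keep = rng.random(x.shape) >= rate            # keep each unit with prob 1 - rate
    return x * keep / (1.0 - rate)                # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
x = np.ones((1000, 100))

out = dropout(x, rate=0.5, rng=rng)
print((out == 0).mean())    # roughly 0.5 of the units are zeroed
print(out.mean())           # stays close to 1.0 because of the rescaling
```
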
Step-by-Step Code: Simple CNN with Dropout
```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
```
Step 1: Generate Dummy Input Data
```python
# 100 grayscale images of 28x28
X = np.random.rand(100, 28, 28, 1).astype(np.float32)

# 3 random class labels (one-hot encoded)
y = tf.keras.utils.to_categorical(np.random.randint(0, 3, size=(100,)), num_classes=3)
```
Step 2: Define CNN with Dropout Layers
```python
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Dropout(0.25),                        # Dropout after pooling

    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.5),                         # Dropout before output layer
    layers.Dense(3, activation='softmax')        # Output layer for 3 classes
])
```
Step 3: Compile the Model
```python
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```
Step 4: Train the Model
```python
model.fit(X, y, epochs=5, batch_size=10)
```
Summary
| Layer Type    | Description              |
| ------------- | ------------------------ |
| Conv2D        | Feature extraction       |
| MaxPooling2D  | Downsampling             |
| Dropout(0.25) | Regularize conv layer    |
| Flatten       | Prepare for dense layers |
| Dense (64)    | Fully connected hidden   |
| Dropout(0.5)  | Regularize dense layer   |
| Dense (3)     | Output softmax layer     |

In [None]:
'''Q10. Write code to print the architecture of the VGG16 model in Keras.

Step-by-Step Code to Load and Print VGG16 Model Architecture
```python
from tensorflow.keras.applications import VGG16
# Load the VGG16 model (pretrained on ImageNet)
model = VGG16(weights='imagenet')
# Print the architecture summary
model.summary()
```
Output:
* This will print:

  * Each layer's **type** (Conv2D, MaxPooling2D, Dense, etc.)
  * **Output shape** at each stage
  * **Number of parameters** in each layer
* The model ends with **fully connected layers** for classification.

Optional: Load Without Top (For Feature Extraction)
```python
model = VGG16(weights='imagenet', include_top=False)
model.summary()
```

In [None]:
'''Q11: Plot the accuracy and loss graphs after training a CNN model using Matplotlib and Keras' `model.fit()` history.

Step-by-Step Code to Plot Accuracy and Loss
Step 1: Train a Simple CNN Model (with history)
```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
import matplotlib.pyplot as plt

# Generate dummy data
X = np.random.rand(100, 28, 28, 1).astype(np.float32)
y = tf.keras.utils.to_categorical(np.random.randint(0, 3, size=(100,)), num_classes=3)

# Define a simple CNN model
model = models.Sequential([
    layers.Conv2D(16, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(32, activation='relu'),
    layers.Dense(3, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train and store the training history
history = model.fit(X, y, epochs=10, batch_size=10, validation_split=0.2)
```
Step 2: Plot Accuracy and Loss
```python
# Plot training & validation accuracy
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

# Plot training & validation loss
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

plt.tight_layout()
plt.show()
```
Output:
* Left graph: Training vs. Validation Accuracy over epochs
* Right graph: Training vs. Validation Loss over epochs

In [None]:
'''Q12. Write code to print the architecture of the ResNet50 model in Keras.

Step-by-Step Code to Print ResNet50 Architecture
```python
from tensorflow.keras.applications import ResNet50

# Load ResNet50 model with pre-trained ImageNet weights
model = ResNet50(weights='imagenet')

# Print model architecture
model.summary()
```
Optional: Load ResNet50 Without Top Layer (for Feature Extraction)
```python
model = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
model.summary()
```
* `include_top=False` removes the final fully connected classification layers.
* Useful when you want to **use ResNet50 as a feature extractor** in custom models.
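
As a sketch of the feature-extractor use described above, the base model can be frozen and given a small classification head. The 3-class head and the variable names `base` and `clf` are assumptions for illustration. `weights=None` is used here only so the example runs without downloading anything; for actual transfer learning you would pass `weights='imagenet'`.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

# weights=None skips the ImageNet download so this sketch runs offline;
# use weights='imagenet' for real transfer learning.
base = ResNet50(weights=None, include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the convolutional base

# Hypothetical head: pool the feature maps, then classify into 3 classes
x = layers.GlobalAveragePooling2D()(base.output)
outputs = layers.Dense(3, activation='softmax')(x)
clf = models.Model(inputs=base.input, outputs=outputs)

print(base.output_shape)  # feature maps produced by the base
print(clf.output_shape)   # class probabilities from the new head
```

For a 224x224 input the base ends in 7x7 feature maps with 2048 channels, which the pooling layer reduces to a 2048-dimensional vector per image before the dense layer.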

Note:
If you want a **visual diagram** of the model:
```python
from tensorflow.keras.utils import plot_model

# Requires the pydot package and Graphviz to be installed
plot_model(model, show_shapes=True, to_file='resnet50_architecture.png')
```


In [None]:
'''Q13. Write code to train a basic CNN model and print the training loss and accuracy after each epoch.

Step-by-Step Code: Train CNN & Print Loss/Accuracy per Epoch
```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Step 1: Create dummy data
X = np.random.rand(100, 28, 28, 1).astype(np.float32)  # 100 grayscale images (28x28)
y = tf.keras.utils.to_categorical(np.random.randint(0, 3, size=(100,)), num_classes=3)  # 3 classes

# Step 2: Define a basic CNN model
model = models.Sequential([
    layers.Conv2D(16, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(32, activation='relu'),
    layers.Dense(3, activation='softmax')  # Output layer for 3 classes
])

# Step 3: Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Step 4: Train the model and store history
history = model.fit(X, y, epochs=5, batch_size=10, verbose=1)  # verbose=1 shows loss/accuracy each epoch
```

Output Example:
With `verbose=1`, the training log prints something like the following (exact values and formatting vary by run and Keras version):

```
Epoch 1/5
10/10 [==============================] - 1s 20ms/step - loss: 1.1247 - accuracy: 0.2900
Epoch 2/5
10/10 [==============================] - 0s 6ms/step - loss: 1.0861 - accuracy: 0.3600
...
```
Optional: Manual Print (one `fit()` call per epoch)
If you want to print manually (for more control):

```python
for epoch in range(5):
    history = model.fit(X, y, epochs=1, batch_size=10, verbose=0)  # Silent training
    loss = history.history['loss'][0]
    acc = history.history['accuracy'][0]
    print(f"Epoch {epoch+1} - Loss: {loss:.4f}, Accuracy: {acc:.4f}")
```
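
Calling `fit()` once per epoch works, but each call returns a fresh `History` object. A common alternative is a Keras callback that prints from inside a single `fit()` call. The following is a minimal sketch, assuming the same kind of dummy data and model as above; the class name `EpochLogger` is hypothetical.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

class EpochLogger(tf.keras.callbacks.Callback):
    """Print loss and accuracy at the end of every epoch."""
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        loss = logs.get('loss', float('nan'))
        acc = logs.get('accuracy', float('nan'))
        print(f"Epoch {epoch + 1} - Loss: {loss:.4f}, Accuracy: {acc:.4f}")

# Dummy data: 100 grayscale 28x28 images, 3 classes
X = np.random.rand(100, 28, 28, 1).astype(np.float32)
y = tf.keras.utils.to_categorical(np.random.randint(0, 3, size=(100,)), num_classes=3)

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(16, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(32, activation='relu'),
    layers.Dense(3, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# verbose=0 silences the default progress bar; the callback does the printing
history = model.fit(X, y, epochs=3, batch_size=10, verbose=0, callbacks=[EpochLogger()])
```

Because training happens in one `fit()` call, `history.history` keeps the values for all epochs, so the same object can still be used for plotting afterwards.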