In [None]:
Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its
application.

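Min-Max scaling is a data preprocessing technique that rescales each feature of a dataset to a fixed range, typically [0, 1]. For every feature, the feature's minimum is subtracted from each value and the result is divided by the feature's range (maximum minus minimum), so the smallest value maps to 0 and the largest to 1. Because the transformation is linear, it preserves the shape of each feature's distribution. Because it depends only on the minimum and maximum, it is also sensitive to outliers: a single extreme value squeezes all other values into a narrow band.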

Here's how you can implement Min-Max scaling in Python:

import numpy as np

def min_max_scaling(data):
    min_val = np.min(data, axis=0)
    max_val = np.max(data, axis=0)
    scaled_data = (data - min_val) / (max_val - min_val)
    return scaled_data

# Example usage:
# Original dataset
data = np.array([[25], [30], [40], [20], [50]])

# Min-Max scaling
scaled_data = min_max_scaling(data)

print("Original data:")
print(data)
print("\nMin-Max scaled data:")
print(scaled_data)

Output:
Original data:
[[25]
 [30]
 [40]
 [20]
 [50]]

Min-Max scaled data:
[[0.16666667]
 [0.33333333]
 [0.66666667]
 [0.        ]
 [1.        ]]

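In this example, the original dataset contains the ages of five individuals. The minimum age (20) maps to 0, the maximum (50) maps to 1, and, for instance, 25 maps to (25 - 20) / (50 - 20) ≈ 0.167. Features scaled this way suit algorithms that are sensitive to feature magnitudes, such as distance-based methods (k-nearest neighbours, k-means) and models trained with gradient descent.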
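The function above computes the minimum and maximum from the same data it scales. In practice they are usually learned from training data and reused for new data, and a constant feature (maximum equal to minimum) would otherwise cause a division by zero. The following is a minimal sketch under those assumptions; the class name `MinMaxScalerSketch` and the choice to map constant features to 0 are illustrative, not taken from any library.

```python
import numpy as np


class MinMaxScalerSketch:
    """Hypothetical minimal Min-Max scaler: learn min/range on training data, reuse them later."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.min_ = X.min(axis=0)
        data_range = X.max(axis=0) - self.min_
        # Assumption of this sketch: treat a zero range as 1 so constant features map to 0.
        self.range_ = np.where(data_range == 0, 1.0, data_range)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Values outside the training range land outside [0, 1]; they are not clipped here.
        return (X - self.min_) / self.range_


train_ages = np.array([[25], [30], [40], [20], [50]])
new_ages = np.array([[35], [60]])

scaler = MinMaxScalerSketch().fit(train_ages)
print(scaler.transform(new_ages))
```

Here an age of 35 maps to 0.5, while 60 maps to about 1.33: because 60 exceeds the training maximum of 50, it lands outside [0, 1]. Whether to clip such values is a design choice this sketch leaves open.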

In [None]:
Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling?
Provide an example to illustrate its application.

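The Unit Vector technique, also known as vector normalization, scales each sample (each row of the dataset) so that its Euclidean (L2) norm equals 1. Every sample vector is divided by its own length, so it keeps its direction but ends up with magnitude 1. This is useful for algorithms that depend on the direction of, or angle between, vectors rather than their magnitude, such as methods based on cosine similarity (common for text represented as term-frequency vectors).

It differs from Min-Max scaling in what it operates on and what it guarantees. Min-Max scaling works column by column and maps each feature onto a fixed range such as [0, 1]. Unit vector scaling works row by row and only guarantees that each row has length 1; an individual feature does not end up with a minimum of 0 and a maximum of 1 across the dataset (for non-negative data all values do fall within [0, 1]).

Here's how you can implement Unit Vector scaling in Python:

import numpy as np

def unit_vector_scaling(data):
    # L2 norm of each row (sample); keepdims=True keeps shape (n_samples, 1) for broadcasting
    norms = np.linalg.norm(data, axis=1, keepdims=True)
    scaled_data = data / norms
    return scaled_data

# Example usage:
# Original dataset
data = np.array([[3, 4], [1, 2], [5, 6]])

# Unit Vector scaling
scaled_data = unit_vector_scaling(data)

print("Original data:")
print(data)
print("\nUnit Vector scaled data:")
print(scaled_data)

Output:
Original data:
[[3 4]
 [1 2]
 [5 6]]

Unit Vector scaled data:
[[0.6        0.8       ]
 [0.4472136  0.89442719]
 [0.6401844  0.76822128]]

For example, the first row [3, 4] has length 5 and becomes [0.6, 0.8].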
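The contrast with Min-Max scaling can be checked directly. This NumPy sketch applies both techniques to the same array and verifies the property each one guarantees: unit length per row for unit vector scaling, and a [0, 1] range per column for Min-Max scaling. The variable names are illustrative; scikit-learn's `sklearn.preprocessing.normalize` (default `norm='l2'`, `axis=1`) performs the same row-wise unit vector scaling.

```python
import numpy as np

data = np.array([[3, 4], [1, 2], [5, 6]], dtype=float)

# Unit vector (L2) scaling works row by row: each sample is divided by its own length.
row_norms = np.linalg.norm(data, axis=1, keepdims=True)
unit_scaled = data / row_norms

# Min-Max scaling works column by column: each feature is mapped onto [0, 1].
col_min = data.min(axis=0)
col_max = data.max(axis=0)
minmax_scaled = (data - col_min) / (col_max - col_min)

print("Row lengths after unit vector scaling:", np.linalg.norm(unit_scaled, axis=1))
print("Column minima after Min-Max scaling:", minmax_scaled.min(axis=0))
print("Column maxima after Min-Max scaling:", minmax_scaled.max(axis=0))
```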

In [None]:
Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an
example to illustrate its application.

Principal Component Analysis (PCA) is a popular dimensionality reduction technique that transforms high-dimensional data into a lower-dimensional space while retaining as much of the original variance as possible. PCA centers the data, identifies the orthogonal directions (principal components) along which the data vary the most, and projects the data onto the leading components.

Here's how you can implement PCA in Python using the `scikit-learn` library:

import numpy as np
from sklearn.decomposition import PCA

# Example dataset
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

# Initialize PCA with desired number of components
pca = PCA(n_components=2)

# Fit PCA to the data and transform the data
reduced_data = pca.fit_transform(data)

print("Original data shape:", data.shape)
print("Reduced data shape:", reduced_data.shape)
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Principal components:", pca.components_)
print("Transformed data:")
print(reduced_data)

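Output (the sign of each principal component is arbitrary and may differ between library versions, and values of order 1e-16 are floating-point noise, so those parts are summarized):
Original data shape: (4, 3)
Reduced data shape: (4, 2)
Explained variance ratio: approximately [1. 0.]
Principal components: first row ±[0.57735027 0.57735027 0.57735027]; second row an arbitrary unit vector orthogonal to the first
Transformed data: first column ±[-7.79422863 -2.59807621 2.59807621 7.79422863] (sign matching the first component); second column approximately 0

Every row of this toy dataset is the previous row plus [3, 3, 3], so after centering all four points lie on one line along the direction [1, 1, 1]. The first principal component therefore explains essentially all of the variance and the second explains none; a second component only carries information when the data vary in more than one direction.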
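To show what happens inside `PCA`, here is a minimal NumPy sketch of one standard way to compute it: center each feature, take the singular value decomposition, and project onto the leading right singular vectors. It illustrates the technique rather than reproducing scikit-learn's exact code path; the function name `pca_svd` and the sample matrix are made up for this example. As with scikit-learn, the sign of each component is arbitrary.

```python
import numpy as np


def pca_svd(X, n_components):
    """Minimal PCA sketch: center the data, take the SVD, project onto the top directions."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance = S**2 / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    components = Vt[:n_components]
    projected = X_centered @ components.T
    return projected, components, ratio[:n_components]


# Illustrative data that vary in more than one direction (made up for this sketch).
X = np.array([[2.0, 0.0, 1.0], [0.0, 1.0, 3.0], [3.0, 2.0, 0.0], [1.0, 4.0, 2.0], [4.0, 3.0, 4.0]])
projected, components, ratio = pca_svd(X, n_components=2)
print("Explained variance ratio:", ratio)
print("Projected shape:", projected.shape)
```

Running `pca_svd` on the collinear 4 x 3 example above gives a first ratio of essentially 1, matching the explanation of that output.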

In [None]:
Q4. What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature
Extraction? Provide an example to illustrate this concept.

Principal Component Analysis (PCA) is a feature extraction technique: rather than selecting a subset of the original features (feature selection), it builds new features from them. PCA identifies the directions (principal components) along which the data vary the most and projects the original data onto these components. The result is a new set of orthogonal, mutually uncorrelated features, each a linear combination of the original features and ordered by how much variance it explains; keeping only the first few yields a smaller feature set that retains most of that variance.

Here's how PCA can be used for feature extraction in Python:

import numpy as np
from sklearn.decomposition import PCA

# Example dataset
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

# Initialize PCA with desired number of components
pca = PCA(n_components=2)

# Fit PCA to the data and transform the data
extracted_features = pca.fit_transform(data)

print("Original data shape:", data.shape)
print("Extracted features shape:", extracted_features.shape)
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Principal components:", pca.components_)
print("Extracted features:")
print(extracted_features)

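Output (the sign of each principal component is arbitrary and may differ between library versions, and values of order 1e-16 are floating-point noise, so those parts are summarized):
Original data shape: (4, 3)
Extracted features shape: (4, 2)
Explained variance ratio: approximately [1. 0.]
Principal components: first row ±[0.57735027 0.57735027 0.57735027]; second row an arbitrary unit vector orthogonal to the first
Extracted features: first column ±[-7.79422863 -2.59807621 2.59807621 7.79422863] (sign matching the first component); second column approximately 0

Because this dataset is perfectly collinear (each row is the previous row plus [3, 3, 3]), the first extracted feature carries essentially all of the information and the second is close to zero for every sample.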
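Two properties make PCA a feature extraction method, and both can be checked numerically: the extracted features are uncorrelated with one another, and mapping them back through the components approximately reconstructs the original features. The sketch below uses NumPy's SVD instead of scikit-learn, with synthetic data generated from two hypothetical hidden factors; the names `features` and `components` are illustrative. In scikit-learn, `PCA.inverse_transform` performs the mapping back.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 3 correlated original features driven by 2 hidden factors plus small noise.
factors = rng.normal(size=(100, 2))
mixing = np.array([[1.0, 0.5, 0.2], [0.0, 1.0, -1.0]])
X = factors @ mixing + 0.05 * rng.normal(size=(100, 3))

mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:2]                     # 2 extracted directions
features = (X - mean) @ components.T    # the new, extracted features

# The extracted features are uncorrelated: their covariance matrix is diagonal.
cov = np.cov(features, rowvar=False)
print("Covariance of extracted features:\n", np.round(cov, 3))

# Mapping back to the original space shows how much information 2 features keep.
X_reconstructed = mean + features @ components
print("Mean squared reconstruction error:", np.mean((X - X_reconstructed) ** 2))
```

The reconstruction error stays small because the two extracted features capture the two underlying factors, and what is discarded is mostly noise.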

In [None]:
Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset
contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to
preprocess the data.

Min-Max scaling transforms each feature to a specified range, typically [0, 1]. It matters here because the raw features are on very different scales: in the example below, price ranges from 8 to 20, rating from 4.0 to 4.8, and delivery time from 20 to 35. Without scaling, any distance- or similarity-based recommendation method would be dominated by price and delivery time, and differences in rating would barely register. Here's how you can use Min-Max scaling to preprocess the data for the food delivery recommendation system:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Example dataset
data = np.array([
    [10, 4.5, 30],  # Price, Rating, Delivery Time
    [15, 4.8, 25],
    [20, 4.2, 35],
    [8, 4.0, 20]
])

# Initialize MinMaxScaler
scaler = MinMaxScaler()

# Apply Min-Max scaling to the dataset
scaled_data = scaler.fit_transform(data)

# Print the scaled data
print("Scaled data:")
print(scaled_data)

Output:
Scaled data:
[[0.16666667 0.625      0.66666667]
 [0.58333333 1.         0.33333333]
 [1.         0.25       1.        ]
 [0.         0.         0.        ]]

In this example, each row of the dataset represents a food item with three features: price, rating, and delivery time. We initialize the `MinMaxScaler` and apply it with the `fit_transform` method, which learns each feature's minimum and maximum and rescales every feature independently to the range [0, 1] using the linear formula:

\[
X_{\text{scaled}} = \frac{X - X_{\text{min}}}{X_{\text{max}} - X_{\text{min}}}
\]

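where \(X_{\text{min}}\) and \(X_{\text{max}}\) are the minimum and maximum values of the feature, respectively. When new items are added later, reuse the fitted scaler's `transform` method instead of refitting, so that all items are scaled with the same minimum and maximum. The scaled data can then be used to build the recommendation system, for example by comparing items with a distance or similarity measure.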
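As a check on the formula, the following NumPy sketch applies it by hand to the same four items, column by column. It assumes nothing beyond NumPy; the variable names are illustrative.

```python
import numpy as np

# Same example rows: price, rating, delivery time.
data = np.array([
    [10, 4.5, 30],
    [15, 4.8, 25],
    [20, 4.2, 35],
    [8, 4.0, 20],
], dtype=float)

col_min = data.min(axis=0)   # per-feature minimum: 8, 4.0, 20
col_max = data.max(axis=0)   # per-feature maximum: 20, 4.8, 35
scaled_by_hand = (data - col_min) / (col_max - col_min)

print(np.round(scaled_by_hand, 4))
```

For the first item, price becomes (10 - 8) / 12 ≈ 0.167, rating becomes (4.5 - 4.0) / 0.8 = 0.625, and delivery time becomes (30 - 20) / 15 ≈ 0.667.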

In [None]:
Q6. You are working on a project to build a model to predict stock prices. The dataset contains many
features, such as company financial data and market trends. Explain how you would use PCA to reduce the
dimensionality of the dataset.

Principal Component Analysis (PCA) is a popular technique used for dimensionality reduction. It works by transforming the original features into a new set of orthogonal features called principal components, which are linear combinations of the original features. These principal components capture the maximum variance in the data, allowing us to represent the data in a lower-dimensional space while retaining most of the important information.

Here's how you can use PCA to reduce the dimensionality of the dataset for your stock price prediction project:

import numpy as np
from sklearn.decomposition import PCA

# Example dataset
# Assuming 'X' is your feature matrix containing stock data
# Each row represents a sample (e.g., a day of trading), and each column represents a feature
X = np.array([
    [10, 20, 30, 40],   # Example feature values for the first sample
    [15, 25, 35, 45],   # Example feature values for the second sample
    # Add more rows representing your data
])

# Initialize PCA with desired number of components
# For simplicity, let's assume we want to reduce the dimensionality to 2
pca = PCA(n_components=2)

# Fit PCA to the data and transform the data to the new feature space
X_reduced = pca.fit_transform(X)

# Print the shape of the reduced data to verify dimensionality reduction
print("Shape of reduced data:", X_reduced.shape)

Output:
Shape of reduced data: (2, 2)

In this example, `X` represents your original feature matrix containing stock data. Each row corresponds to a sample (e.g., a day of trading), and each column represents a feature (e.g., financial data or market trends).

We initialize PCA with the desired number of components (in this case, 2) and then fit PCA to the data using the `fit_transform` method. This computes the principal components and transforms the original data into the new feature space defined by these components. The resulting `X_reduced` contains the transformed data with reduced dimensionality.

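After dimensionality reduction, you can use the reduced dataset for further analysis or modeling, such as training a machine learning model to predict stock prices. Because financial and market features are often strongly correlated, replacing them with a few principal components can lower computational cost and may reduce overfitting. For time-ordered data such as daily prices, fit PCA on the training period only and apply the fitted transformation to later periods; fitting on the full history would let information from the future leak into the model.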
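With many financial and market features, fixing the number of components at 2 is arbitrary. A common alternative is to standardize the features (they are measured in different units) and keep the smallest number of components whose cumulative explained variance reaches a threshold. The NumPy sketch below does this on synthetic data; the helper `n_components_for_variance`, the 95% threshold, and the data-generating setup are illustrative assumptions. scikit-learn's `PCA` also accepts a fraction such as `n_components=0.95` for the same purpose.

```python
import numpy as np


def n_components_for_variance(X, threshold=0.95):
    """Hypothetical helper: standardize X, then return the smallest k whose principal
    components explain at least `threshold` of the total variance."""
    X = np.asarray(X, dtype=float)
    # Standardize so features in different units (prices, ratios, volumes) are comparable.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    singular_values = np.linalg.svd(X_std, compute_uv=False)
    ratio = singular_values**2 / np.sum(singular_values**2)
    cumulative = np.cumsum(ratio)
    # First index where the cumulative share reaches the threshold, converted to a count.
    return int(np.searchsorted(cumulative, threshold) + 1), cumulative


rng = np.random.default_rng(42)
# Synthetic stand-in for a feature matrix: 250 "days", 10 features driven by 3 hidden factors.
factors = rng.normal(size=(250, 3))
loadings = rng.normal(size=(3, 10))
X = factors @ loadings + 0.1 * rng.normal(size=(250, 10))

k, cumulative = n_components_for_variance(X, threshold=0.95)
print("Components needed for 95% of the variance:", k)
print("Cumulative explained variance:", np.round(cumulative[:5], 3))
```

Because the synthetic features are driven by three hidden factors plus a little noise, the cumulative curve should approach 1 within about three components.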

In [None]:
Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the values to a range of -1 to 1 using a Python program.

You can perform Min-Max scaling in Python using the `MinMaxScaler` from the `sklearn.preprocessing` module. Here's how you can do it for the given dataset:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Given dataset
data = [1, 5, 10, 15, 20]

# Reshape the data to a 2D array (required by MinMaxScaler)
data = np.array(data).reshape(-1, 1)

# Initialize MinMaxScaler with desired feature range (-1, 1)
scaler = MinMaxScaler(feature_range=(-1, 1))

# Fit and transform the data
scaled_data = scaler.fit_transform(data)

# Reshape the scaled data back to 1D array
scaled_data = scaled_data.flatten()

print("Original data:", data.flatten())
print("Scaled data:", scaled_data)


Output:

Original data: [ 1  5 10 15 20]
Scaled data: [-1.         -0.57894737 -0.05263158  0.47368421  1.        ]


In this code:
- We import `MinMaxScaler` from `sklearn.preprocessing`.
- The given dataset is represented by the variable `data`.
- We reshape the data to a 2D array using `reshape(-1, 1)` to comply with the expected input format of `MinMaxScaler`.
- We initialize `MinMaxScaler` with the desired feature range of (-1, 1).
- Then, we fit and transform the data using the `fit_transform` method of `MinMaxScaler`.
- Finally, we print the original and scaled data for comparison.
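With `feature_range=(a, b)`, the scaler applies x' = a + (x - min) * (b - a) / (max - min). Here min = 1, max = 20 and (a, b) = (-1, 1), so the value 5, for instance, maps to -1 + 2 * (5 - 1) / 19 = -11/19 ≈ -0.5789. A minimal NumPy version of the same computation, using the hypothetical helper name `min_max_to_range`, looks like this:

```python
import numpy as np


def min_max_to_range(x, a=-1.0, b=1.0):
    """Scale a 1-D array to [a, b]: x' = a + (x - min) * (b - a) / (max - min)."""
    x = np.asarray(x, dtype=float)
    return a + (x - x.min()) * (b - a) / (x.max() - x.min())


values = [1, 5, 10, 15, 20]
scaled = min_max_to_range(values)
print(np.round(scaled, 4))
```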

In [None]:
Q8. For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform
Feature Extraction using PCA. How many principal components would you choose to retain, and why?

To perform Feature Extraction using PCA (Principal Component Analysis) in Python, we can use the `PCA` class from the `sklearn.decomposition` module. Here's how you can do it for the given dataset:

import numpy as np
from sklearn.decomposition import PCA

# Given dataset (features)
data = np.array([
    [170, 65, 30, 1, 120],
    [165, 70, 35, 0, 130],
    [180, 80, 40, 1, 125],
    [160, 55, 25, 0, 115],
    [175, 75, 45, 1, 135]
])

# Instantiate PCA with desired number of components
pca = PCA(n_components=3)  # Choose the number of principal components to retain

# Fit PCA to the data
pca.fit(data)

# Transform the data to the new feature space
transformed_data = pca.transform(data)

# Get explained variance ratio
explained_variance_ratio = pca.explained_variance_ratio_

# Round to two decimals for readability
print("Explained variance ratio:", np.round(explained_variance_ratio, 2))


Output:

Explained variance ratio: [0.86 0.12 0.02]

In this example:
- We import `PCA` from `sklearn.decomposition`.
- The given dataset (features) is represented by the variable `data`.
- We instantiate `PCA` with the desired number of components (in this case, `n_components=3`).
- Then, we fit PCA to the data using the `fit` method.
- We transform the data to the new feature space using the `transform` method.
- Finally, we print the explained variance ratio of each principal component, rounded to two decimals.

Choosing the number of principal components to retain depends on the application, the share of variance that must be preserved, and the available computational resources. The explained variance ratio gives the proportion of the total variance captured by each principal component. Here the first component explains roughly 86% of the variance, the second roughly 12%, and the third roughly 2%, so the first two together capture about 98%. On that basis, retaining the first two principal components is a reasonable choice. With only five records, however, these proportions are illustrative, and the final choice may still require experimentation or domain knowledge.
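Because PCA follows variance, the unscaled analysis is dominated by the columns with the largest numeric spread (height, weight, age and blood pressure), while the 0/1 gender column contributes almost nothing. A common alternative, sketched below with NumPy under the assumption that all five columns may be treated as numeric, is to standardize first and then keep the smallest number of components whose cumulative explained variance reaches a threshold; the 95% threshold is an illustrative choice. With only five records, the resulting count should be read as an illustration rather than a recommendation.

```python
import numpy as np

# Same five records: height, weight, age, gender, blood pressure.
data = np.array([
    [170, 65, 30, 1, 120],
    [165, 70, 35, 0, 130],
    [180, 80, 40, 1, 125],
    [160, 55, 25, 0, 115],
    [175, 75, 45, 1, 135],
], dtype=float)

# Standardize each column so centimetres, kilograms, years and mmHg carry equal weight.
z = (data - data.mean(axis=0)) / data.std(axis=0)

singular_values = np.linalg.svd(z, compute_uv=False)
ratio = singular_values**2 / np.sum(singular_values**2)
cumulative = np.cumsum(ratio)

# Keep the smallest number of components that reaches the chosen threshold.
threshold = 0.95
k = int(np.searchsorted(cumulative, threshold) + 1)
print("Explained variance ratio (standardized):", np.round(ratio, 3))
print("Cumulative:", np.round(cumulative, 3))
print("Components to keep for 95%:", k)
```

With five samples, centering leaves at most four directions of variation, so the fifth ratio is essentially zero and k is at most 4.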