## Q1. Write Python code to implement the KNN classifier algorithm on the load_iris dataset from sklearn.datasets.

In [5]:
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

# Load the Iris dataset
iris = load_iris()
X = iris.data  # Features
y = iris.target  # Target variable

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the KNN classifier
knn_classifier = KNeighborsClassifier(n_neighbors=5, algorithm='auto')  # You can adjust the value of 'n_neighbors'

# Train the classifier on the training set
knn_classifier.fit(X_train, y_train)

# Make predictions on the test set
y_pred = knn_classifier.predict(X_test)

# Evaluate the classifier (scikit-learn metrics take y_true first, then y_pred)
print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))




[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
1.0
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



## Q2. Write Python code to implement the KNN regressor algorithm on the load_boston dataset from sklearn.datasets.

In [6]:
# Import necessary libraries
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error
import numpy as np

# Load the Boston Housing dataset
# Note: load_boston was removed in scikit-learn 1.2, so this cell only runs on older versions
boston = load_boston()
X = boston.data  # Features
y = boston.target  # Target variable

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the KNN regressor
knn_regressor = KNeighborsRegressor(n_neighbors=3)  # You can adjust the value of 'n_neighbors'

# Train the regressor on the training set
knn_regressor.fit(X_train, y_train)

# Make predictions on the test set
y_pred = knn_regressor.predict(X_test)

# Evaluate the performance of the regressor using Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")

# Optionally, you can test the regressor with custom input
custom_input = np.array([[0.00632, 18.0, 2.31, 0.0, 0.538, 6.575, 65.2, 4.0900, 1.0, 296.0, 15.3, 396.90, 4.98]])
custom_prediction = knn_regressor.predict(custom_input)
print(f"Custom Prediction: {custom_prediction[0]:.2f}")


ImportError: 
`load_boston` has been removed from scikit-learn since version 1.2.

The Boston housing prices dataset has an ethical problem: as
investigated in [1], the authors of this dataset engineered a
non-invertible variable "B" assuming that racial self-segregation had a
positive impact on house prices [2]. Furthermore the goal of the
research that led to the creation of this dataset was to study the
impact of air quality but it did not give adequate demonstration of the
validity of this assumption.

The scikit-learn maintainers therefore strongly discourage the use of
this dataset unless the purpose of the code is to study and educate
about ethical issues in data science and machine learning.

In this special case, you can fetch the dataset from the original
source::

    import pandas as pd
    import numpy as np

    data_url = "http://lib.stat.cmu.edu/datasets/boston"
    raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
    data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
    target = raw_df.values[1::2, 2]

Alternative datasets include the California housing dataset and the
Ames housing dataset. You can load the datasets as follows::

    from sklearn.datasets import fetch_california_housing
    housing = fetch_california_housing()

for the California housing dataset and::

    from sklearn.datasets import fetch_openml
    housing = fetch_openml(name="house_prices", as_frame=True)

for the Ames housing dataset.

[1] M Carlisle.
"Racist data destruction?"
<https://medium.com/@docintangible/racist-data-destruction-113e3eff54a8>

[2] Harrison Jr, David, and Daniel L. Rubinfeld.
"Hedonic housing prices and the demand for clean air."
Journal of environmental economics and management 5.1 (1978): 81-102.
<https://www.researchgate.net/publication/4974606_Hedonic_housing_prices_and_the_demand_for_clean_air>
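
Because `load_boston` is no longer available, the same KNN regression workflow can be run on another dataset. The following is a minimal sketch that swaps in `load_diabetes`, which ships with scikit-learn and needs no download. That choice is an assumption of this sketch, not part of the original exercise. `fetch_california_housing`, named in the error message, should work the same way but downloads its data on first use.

```python
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Assumption: load_diabetes stands in for the removed load_boston dataset
X, y = load_diabetes(return_X_y=True)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit a KNN regressor and predict on the held-out set
knn_regressor = KNeighborsRegressor(n_neighbors=3)
knn_regressor.fit(X_train, y_train)
y_pred = knn_regressor.predict(X_test)

# Evaluate with Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
```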


## Q3. Write a Python code snippet to find the optimal value of K for the KNN classifier algorithm using cross-validation on the load_iris dataset from sklearn.datasets.

In [7]:
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, KFold
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

# Load the Iris dataset
iris = load_iris()
X = iris.data  # Features
y = iris.target  # Target variable

# Define a range of k values to evaluate
k_values = list(range(1, 21))  # You can adjust the range as needed

# Initialize an empty list to store mean cross-validated accuracies
mean_accuracies = []

# Perform cross-validation for each k value
for k in k_values:
    knn_classifier = KNeighborsClassifier(n_neighbors=k)
    
    # Use 5-fold cross-validation
    kfold = KFold(n_splits=5, shuffle=True, random_state=42)
    
    # Calculate mean cross-validated accuracy
    accuracies = cross_val_score(knn_classifier, X, y, cv=kfold, scoring='accuracy')
    mean_accuracies.append(np.mean(accuracies))

# Find the optimal k value with the highest mean accuracy
optimal_k = k_values[np.argmax(mean_accuracies)]
optimal_accuracy = max(mean_accuracies)

print(f"Optimal k value: {optimal_k}")
print(f"Highest mean accuracy: {optimal_accuracy:.2f}")


Optimal k value: 13
Highest mean accuracy: 0.98
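
The manual loop over k can also be expressed with `GridSearchCV`, which runs the cross-validation for every candidate and keeps the best one. This is a minimal sketch that assumes stratified folds (`StratifiedKFold`) rather than the plain `KFold` used in the loop. Because the folds differ, the selected k need not match the value printed by the loop.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Assumption: stratified folds keep the class proportions equal in each split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Evaluate k = 1..20 with 5-fold cross-validated accuracy
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": list(range(1, 21))},
    cv=cv,
    scoring="accuracy",
)
search.fit(X, y)

best_k = search.best_params_["n_neighbors"]
print(f"Optimal k value: {best_k}")
print(f"Highest mean accuracy: {search.best_score_:.2f}")
```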


## Q4. Implement the KNN regressor algorithm with feature scaling on the load_boston dataset from sklearn.datasets.

In [8]:
# Import necessary libraries
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
import numpy as np

# Load the Boston Housing dataset
# Note: load_boston was removed in scikit-learn 1.2, so this cell only runs on older versions
boston = load_boston()
X = boston.data  # Features
y = boston.target  # Target variable

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the StandardScaler for feature scaling
scaler = StandardScaler()

# Fit and transform the scaler on the training set
X_train_scaled = scaler.fit_transform(X_train)

# Transform the testing set using the same scaler
X_test_scaled = scaler.transform(X_test)

# Initialize the KNN regressor
knn_regressor = KNeighborsRegressor(n_neighbors=3)  # You can adjust the value of 'n_neighbors'

# Train the regressor on the scaled training set
knn_regressor.fit(X_train_scaled, y_train)

# Make predictions on the scaled test set
y_pred = knn_regressor.predict(X_test_scaled)

# Evaluate the performance of the regressor using Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")

# Optionally, you can test the regressor with custom input (scaled)
custom_input_scaled = scaler.transform(np.array([[0.00632, 18.0, 2.31, 0.0, 0.538, 6.575, 65.2, 4.0900, 1.0, 296.0, 15.3, 396.90, 4.98]]))
custom_prediction = knn_regressor.predict(custom_input_scaled)
print(f"Custom Prediction (scaled): {custom_prediction[0]:.2f}")


ImportError: 
`load_boston` has been removed from scikit-learn since version 1.2.



## Q5. Write a Python code snippet to implement the KNN classifier algorithm with weighted voting on the load_iris dataset from sklearn.datasets.

In [10]:
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Load the Iris dataset
iris = load_iris()
X = iris.data  # Features
y = iris.target  # Target variable

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the KNN classifier with weighted voting
knn_classifier_weighted = KNeighborsClassifier(n_neighbors=5, weights='distance')  # You can adjust the value of 'n_neighbors'

# Train the classifier on the training set
knn_classifier_weighted.fit(X_train, y_train)

# Make predictions on the test set
y_pred_weighted = knn_classifier_weighted.predict(X_test)

# Evaluate the classifier with weighted voting (y_true first, then y_pred)
print(confusion_matrix(y_test, y_pred_weighted))
print(accuracy_score(y_test, y_pred_weighted))
print(classification_report(y_test, y_pred_weighted))

[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
1.0
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
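
With `weights='distance'`, scikit-learn weights each neighbor's vote by the inverse of its distance, so closer neighbors count for more. The toy sketch below shows how this can change the result compared with a plain majority vote. The helper names and the hand-picked distances are hypothetical, and the sketch assumes no neighbor sits at distance zero.

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Return the most common label (uniform weights)."""
    return Counter(labels).most_common(1)[0][0]


def distance_weighted_vote(distances, labels):
    """Return the label with the largest sum of 1/distance weights.

    Assumes every distance is strictly positive.
    """
    scores = defaultdict(float)
    for distance, label in zip(distances, labels):
        scores[label] += 1.0 / distance
    return max(scores, key=scores.get)


# Three neighbors: one very close neighbor of class 0, two farther of class 1
neighbor_distances = [0.2, 1.0, 1.0]
neighbor_labels = [0, 1, 1]

uniform_prediction = majority_vote(neighbor_labels)
weighted_prediction = distance_weighted_vote(neighbor_distances, neighbor_labels)

print(f"Uniform vote: {uniform_prediction}")
print(f"Distance-weighted vote: {weighted_prediction}")
```

Here the two class-1 neighbors win the uniform vote, while the single close class-0 neighbor (weight 1/0.2 = 5 against 1 + 1 = 2) wins the weighted vote.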



## Q6. Implement a function to standardise the features before applying KNN classifier.

In [11]:
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

def knn_classifier_with_standardization(X_train, X_test, y_train, y_test, n_neighbors=3):
    # Initialize the StandardScaler for feature scaling
    scaler = StandardScaler()

    # Fit and transform the scaler on the training set
    X_train_scaled = scaler.fit_transform(X_train)

    # Transform the testing set using the same scaler
    X_test_scaled = scaler.transform(X_test)

    # Initialize the KNN classifier
    knn_classifier = KNeighborsClassifier(n_neighbors=n_neighbors)

    # Train the classifier on the scaled training set
    knn_classifier.fit(X_train_scaled, y_train)

    # Make predictions on the scaled test set
    y_pred = knn_classifier.predict(X_test_scaled)

    # Evaluate the accuracy of the classifier
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Accuracy: {accuracy:.2f}")

# Load the Iris dataset
iris = load_iris()
X = iris.data  # Features
y = iris.target  # Target variable

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Apply the KNN classifier with feature standardization
knn_classifier_with_standardization(X_train, X_test, y_train, y_test, n_neighbors=3)


Accuracy: 1.00


## Q7. Write a Python function to calculate the Euclidean distance between two points.

In [12]:
import math

def euclidean_distance(point1, point2):
    """
    Calculate the Euclidean distance between two points with the same number of coordinates.

    Parameters:
    - point1: Tuple or list of coordinates of the first point, e.g. (x1, y1).
    - point2: Tuple or list of coordinates of the second point, e.g. (x2, y2).

    Returns:
    - Euclidean distance between the two points.
    """
    if len(point1) != len(point2):
        raise ValueError("Both points must have the same number of coordinates.")

    squared_distances = [(p2 - p1) ** 2 for p1, p2 in zip(point1, point2)]
    distance = math.sqrt(sum(squared_distances))
    
    return distance

# Example usage:
point_a = (1, 2)
point_b = (4, 6)

distance_ab = euclidean_distance(point_a, point_b)
print(f"Euclidean Distance between {point_a} and {point_b}: {distance_ab:.2f}")


Euclidean Distance between (1, 2) and (4, 6): 5.00
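
The same calculation works for points of any dimension. As a cross-check, the short sketch below compares the hand-written formula with the standard library's `math.dist`, which is available from Python 3.8. The three-dimensional points are made up for illustration.

```python
import math

# Hypothetical three-dimensional points
point_a = (1, 2, 3)
point_b = (4, 6, 3)

# math.dist (Python 3.8+) computes the Euclidean distance for any dimension
distance_builtin = math.dist(point_a, point_b)

# The same result written out by hand: square root of the summed squared differences
distance_manual = math.sqrt(sum((b - a) ** 2 for a, b in zip(point_a, point_b)))

print(f"math.dist: {distance_builtin:.2f}, manual: {distance_manual:.2f}")
```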


## Q8. Write a Python function to calculate the Manhattan distance between two points.

In [13]:
def manhattan_distance(point1, point2):
    """
    Calculate the Manhattan distance between two points with the same number of coordinates.

    Parameters:
    - point1: Tuple or list of coordinates of the first point, e.g. (x1, y1).
    - point2: Tuple or list of coordinates of the second point, e.g. (x2, y2).

    Returns:
    - Manhattan distance between the two points.
    """
    if len(point1) != len(point2):
        raise ValueError("Both points must have the same number of coordinates.")

    distances = [abs(p2 - p1) for p1, p2 in zip(point1, point2)]
    distance = sum(distances)

    return distance

# Example usage:
point_a = (1, 2)
point_b = (4, 6)

distance_ab = manhattan_distance(point_a, point_b)
print(f"Manhattan Distance between {point_a} and {point_b}: {distance_ab}")


Manhattan Distance between (1, 2) and (4, 6): 7
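
Manhattan and Euclidean distance are both special cases of the Minkowski distance, which is also the default metric family of scikit-learn's KNN estimators (`metric='minkowski'`, with `p=2` by default). The sketch below uses a hypothetical helper, `minkowski_distance`, to show how the parameter `p` switches between the two.

```python
def minkowski_distance(point1, point2, p=2):
    """Minkowski distance of order p between two points of equal dimension.

    p=1 gives the Manhattan distance, p=2 the Euclidean distance.
    """
    if len(point1) != len(point2):
        raise ValueError("Both points must have the same number of coordinates.")
    if p < 1:
        raise ValueError("p must be at least 1.")

    # Sum the p-th powers of the absolute differences, then take the p-th root
    total = sum(abs(b - a) ** p for a, b in zip(point1, point2))
    return total ** (1 / p)


point_a = (1, 2)
point_b = (4, 6)

manhattan = minkowski_distance(point_a, point_b, p=1)
euclidean = minkowski_distance(point_a, point_b, p=2)

print(f"p=1 (Manhattan): {manhattan:.2f}")
print(f"p=2 (Euclidean): {euclidean:.2f}")
```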
