Q1. Write Python code to implement the KNN classifier algorithm on the load_iris dataset in
sklearn.datasets.


ANS-1



The following Python code implements the KNN classifier algorithm on the `load_iris` dataset from `sklearn.datasets`:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize the features to have mean=0 and variance=1
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initialize the KNN classifier with k=3 (you can change the value of k as needed)
knn_classifier = KNeighborsClassifier(n_neighbors=3)

# Train the classifier on the training data
knn_classifier.fit(X_train, y_train)

# Make predictions on the test data
y_pred = knn_classifier.predict(X_test)

# Calculate the accuracy of the classifier
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Generate a classification report and confusion matrix
print("Classification Report:")
print(classification_report(y_test, y_pred))

print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
```

In this code, we first load the iris dataset and split it into training and testing sets. Then, we standardize the features to have mean=0 and variance=1 using `StandardScaler`. Next, we initialize the KNN classifier with k=3 (you can change the value of `n_neighbors` to adjust the k value). We then train the KNN classifier on the training data and make predictions on the test data. Finally, we calculate the accuracy of the classifier, generate a classification report, and print the confusion matrix to evaluate the model's performance.
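To make the mechanics behind `KNeighborsClassifier` more concrete, here is a minimal from-scratch sketch of the prediction step: compute the distance from the query to every training point, keep the k nearest, and take a majority vote. The function name `knn_predict` and the toy data are made up for illustration; this is not how scikit-learn implements the search internally.

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point, paired with that point's label
    distances = [(math.dist(x, query), label) for x, label in zip(X_train, y_train)]
    # Keep the k closest points
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbors' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): two well-separated clusters
X_toy = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
y_toy = [0, 0, 0, 1, 1, 1]

print(knn_predict(X_toy, y_toy, (1.1, 1.0)))  # query near the first cluster
print(knn_predict(X_toy, y_toy, (5.1, 5.0)))  # query near the second cluster
```

There is no real training phase in this sketch: the "model" is just the stored training data, which is why KNN is often described as a lazy learner.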





Q2. Write Python code to implement the KNN regressor algorithm on the load_boston dataset in
sklearn.datasets.



ANS-2



The following Python code implements the KNN regressor algorithm on the `load_boston` dataset from `sklearn.datasets`:

```python
import numpy as np
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error, r2_score

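# Load the Boston dataset
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this script requires scikit-learn < 1.2
boston = load_boston()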
X = boston.data
y = boston.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize the features to have mean=0 and variance=1
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initialize the KNN regressor with k=5 (you can change the value of k as needed)
knn_regressor = KNeighborsRegressor(n_neighbors=5)

# Train the regressor on the training data
knn_regressor.fit(X_train, y_train)

# Make predictions on the test data
y_pred = knn_regressor.predict(X_test)

# Calculate the mean squared error and R-squared score of the regressor
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("R-squared score:", r2)
```

In this code, we first load the Boston dataset and split it into training and testing sets. Then, we standardize the features to have mean=0 and variance=1 using `StandardScaler`. Next, we initialize the KNN regressor with k=5 (you can change the value of `n_neighbors` to adjust the k value). We then train the KNN regressor on the training data and make predictions on the test data. Finally, we calculate the mean squared error and R-squared score of the regressor to evaluate its performance.
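Because `load_boston` is not available in recent scikit-learn releases, the same workflow can be tried on a dataset that still ships with the library. The sketch below swaps in `load_diabetes` purely as a stand-in (an assumption for illustration, not the dataset the question asks for) and wraps scaling and the regressor in a pipeline.

```python
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: load_diabetes is used here only as a bundled substitute for load_boston
X, y = load_diabetes(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# The pipeline fits the scaler on the training data only and reuses it at prediction time
model = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("R-squared score:", r2)
```

The metric values will differ from anything obtained on the Boston data, since the features and target are different.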




Q3. Write a Python code snippet to find the optimal value of K for the KNN classifier algorithm using
cross-validation on load_iris dataset in sklearn.datasets.



ANS-3



To find the optimal value of K for the KNN classifier algorithm using cross-validation on the `load_iris` dataset from `sklearn.datasets`, you can use the `GridSearchCV` class from `sklearn.model_selection`. `GridSearchCV` performs an exhaustive search over the specified parameter values and selects the best hyperparameter setting using cross-validation. Here's a Python code snippet to achieve this:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize the features to have mean=0 and variance=1
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Define the range of K values to search
k_values = np.arange(1, 21)  # You can adjust this range as needed

# Initialize the KNN classifier
knn_classifier = KNeighborsClassifier()

# Create a dictionary with the hyperparameters to tune
param_grid = {'n_neighbors': k_values}

# Initialize GridSearchCV with the KNN classifier and the parameter grid
grid_search = GridSearchCV(knn_classifier, param_grid, cv=5, scoring='accuracy')

# Perform the grid search to find the best K value using cross-validation
grid_search.fit(X_train, y_train)

# Get the best K value and its corresponding accuracy score
best_k = grid_search.best_params_['n_neighbors']
best_accuracy = grid_search.best_score_

print("Best K value:", best_k)
print("Best Accuracy:", best_accuracy)
```

In this code, we first load the iris dataset and split it into training and testing sets. Then, we standardize the features using `StandardScaler`. Next, we define the range of K values (in this case, from 1 to 20) that we want to search over. We then initialize the KNN classifier and create a dictionary with the hyperparameters to tune, where `n_neighbors` is the hyperparameter representing K. We use `GridSearchCV` with 5-fold cross-validation (`cv=5`) to perform an exhaustive search over the specified K values and find the best K value based on the accuracy metric.

After performing the grid search, we extract the best K value and its corresponding accuracy score. The best K value is the one with the highest mean cross-validated accuracy, that is, the accuracy averaged over the five validation folds drawn from the training data. You can use this optimal K value to train the final KNN classifier and evaluate its performance on the test data.
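One possible refinement, sketched here under the assumption that a pipeline is acceptable for this exercise, is to place the scaler inside a `Pipeline` so that it is refit on each cross-validation training fold rather than once on the full training set. The sketch also evaluates the selected model on the held-out test data. The step names `"scaler"` and `"knn"` are arbitrary labels chosen for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scaling lives inside the pipeline, so each fold scales with its own training statistics
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Step parameters are addressed as <step name>__<parameter name>
param_grid = {"knn__n_neighbors": list(range(1, 21))}

grid_search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
grid_search.fit(X_train, y_train)

best_k = grid_search.best_params_["knn__n_neighbors"]
# With the default refit=True, the best pipeline is refit on the whole training set,
# so score() evaluates that final model on the held-out test data
test_accuracy = grid_search.score(X_test, y_test)

print("Best K value:", best_k)
print("Mean cross-validated accuracy:", grid_search.best_score_)
print("Test accuracy:", test_accuracy)
```

The test accuracy is the number to report as the final estimate, since the cross-validated score was itself used to pick K.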




Q4. Implement the KNN regressor algorithm with feature scaling on load_boston dataset in
sklearn.datasets.



ANS-4


To implement the KNN regressor algorithm with feature scaling on the `load_boston` dataset from `sklearn.datasets`, we need to follow these steps:

1. Load the dataset.
2. Split the data into training and testing sets.
3. Perform feature scaling on the features.
4. Initialize the KNN regressor.
5. Train the regressor on the training data.
6. Make predictions on the test data.
7. Evaluate the performance of the regressor using metrics such as mean squared error (MSE) and R-squared score.

Here's the Python code to achieve this:

```python
import numpy as np
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error, r2_score

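# Load the Boston dataset
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this script requires scikit-learn < 1.2
boston = load_boston()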
X = boston.data
y = boston.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize the features to have mean=0 and variance=1
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initialize the KNN regressor with k=5 (you can change the value of k as needed)
knn_regressor = KNeighborsRegressor(n_neighbors=5)

# Train the regressor on the training data
knn_regressor.fit(X_train, y_train)

# Make predictions on the test data
y_pred = knn_regressor.predict(X_test)

# Calculate the mean squared error and R-squared score of the regressor
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("R-squared score:", r2)
```

In this code, we first load the Boston dataset and split it into training and testing sets. Then, we perform feature scaling on the features using `StandardScaler`. Next, we initialize the KNN regressor with k=5 (you can change the value of `n_neighbors` to adjust the k value). We then train the KNN regressor on the training data and make predictions on the test data. Finally, we calculate the mean squared error and R-squared score to evaluate the performance of the regressor.

Note: Feature scaling is crucial for KNN algorithms because they rely on the distance between data points. Standardizing the features prevents features with larger scales from dominating the distance calculations.
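A small numeric illustration of this effect, using made-up points (height in metres, income in dollars) that are not part of any dataset in this notebook: on the raw features the income column decides which point is nearest, while after standardization the height difference matters again.

```python
import math
from statistics import mean, pstdev

# Hypothetical data: (height in metres, income in dollars)
points = [
    (1.60, 50_000.0),  # a: the query point
    (1.95, 50_100.0),  # b: very different height, almost the same income
    (1.61, 51_000.0),  # c: almost the same height, slightly different income
    (1.50, 30_000.0),
    (1.90, 80_000.0),
    (1.75, 65_000.0),
]

def standardize(rows):
    """Standardize each column to mean 0 and (population) standard deviation 1."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]

a, b, c = points[:3]
raw_ab, raw_ac = math.dist(a, b), math.dist(a, c)

sa, sb, sc = standardize(points)[:3]
scaled_ab, scaled_ac = math.dist(sa, sb), math.dist(sa, sc)

print("Raw:    b is closer to a than c is:", raw_ab < raw_ac)
print("Scaled: c is closer to a than b is:", scaled_ac < scaled_ab)
```

Both comparisons come out true here, so the nearest neighbor of `a` changes from `b` to `c` once the features share a common scale.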




Q5. Write a Python code snippet to implement the KNN classifier algorithm with weighted voting on
load_iris dataset in sklearn.datasets.



ANS-5



To implement the KNN classifier algorithm with weighted voting on the `load_iris` dataset from `sklearn.datasets`, we need to follow these steps:

1. Load the dataset.
2. Split the data into training and testing sets.
3. Scale the features so that no single feature dominates the distance calculations.
4. Initialize the KNN classifier with weighted voting.
5. Train the classifier on the training data.
6. Make predictions on the test data.
7. Evaluate the performance of the classifier using metrics such as accuracy, classification report, and confusion matrix.

Here's the Python code to implement the KNN classifier with weighted voting:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize the features to have mean=0 and variance=1 (distance weights depend on feature scale)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initialize the KNN classifier with weighted voting
knn_classifier = KNeighborsClassifier(n_neighbors=3, weights='distance')

# Train the classifier on the training data
knn_classifier.fit(X_train, y_train)

# Make predictions on the test data
y_pred = knn_classifier.predict(X_test)

# Calculate the accuracy of the classifier
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Generate a classification report and confusion matrix
print("Classification Report:")
print(classification_report(y_test, y_pred))

print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
```

In this code, we first load the iris dataset and split it into training and testing sets. We then scale the features using `StandardScaler`. Weighted voting does not remove the need for feature scaling: the weights are computed from the same distances used to find the neighbors, so a feature with a larger scale would dominate both the neighbor search and the weights.

Next, we initialize the KNN classifier with weighted voting by setting the `weights` parameter to `'distance'`. Each neighbor's vote is then weighted by the inverse of its distance to the query point, so closer neighbors have more influence on the prediction.

We then train the KNN classifier on the training data and make predictions on the test data. Finally, we calculate the accuracy of the classifier and generate a classification report and confusion matrix to evaluate its performance.

By using weighted voting, the KNN classifier gives more importance to closer neighbors, which can be particularly useful when data points that are closer to the query point are more likely to be similar and relevant for making accurate predictions.
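The difference between uniform and distance-weighted voting can be seen on a tiny hand-made example. The sketch below is a simplified from-scratch version (the function name `knn_vote` and the data are hypothetical, and the handling of zero distances is a simplification of what a library would do): with k=3, two farther neighbors outvote one close neighbor under uniform voting, but not under inverse-distance weighting.

```python
import math
from collections import defaultdict

def knn_vote(X_train, y_train, query, k=3, weighted=False):
    """Return the predicted label using uniform or inverse-distance-weighted voting."""
    nearest = sorted(
        ((math.dist(x, query), label) for x, label in zip(X_train, y_train)),
        key=lambda pair: pair[0],
    )[:k]
    if weighted:
        # Simplification: an exact match (distance 0) decides the prediction outright,
        # which also avoids dividing by zero
        for distance, label in nearest:
            if distance == 0:
                return label
    scores = defaultdict(float)
    for distance, label in nearest:
        scores[label] += 1.0 / distance if weighted else 1.0
    return max(scores, key=scores.get)

# Hypothetical data: one close neighbor of class "A", two farther neighbors of class "B"
X_toy = [(0.0, 0.0), (3.0, 0.0), (3.5, 0.0)]
y_toy = ["A", "B", "B"]
query = (1.0, 0.0)  # distances: 1.0 to "A", 2.0 and 2.5 to the "B" points

uniform_label = knn_vote(X_toy, y_toy, query, k=3, weighted=False)
weighted_label = knn_vote(X_toy, y_toy, query, k=3, weighted=True)

print("Uniform voting: ", uniform_label)   # "B" wins 2 votes to 1
print("Weighted voting:", weighted_label)  # "A" wins 1.0 against 0.5 + 0.4 = 0.9
```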





Q6. Implement a function to standardise the features before applying KNN classifier.



ANS-6



To standardize the features before applying the KNN classifier, you can create a function that takes the feature matrix X as input and returns the standardized feature matrix. The standardization process involves subtracting the mean from each feature and dividing by the standard deviation, so that the features have mean=0 and variance=1.

Here's a Python function to perform feature standardization:

```python
import numpy as np

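def standardize_features(X, mean=None, std=None):
    """
    Standardize the features in the feature matrix X.

    Parameters:
    X (numpy array): Feature matrix of shape (n_samples, n_features).
    mean (numpy array, optional): Per-feature means to use; computed from X if None.
    std (numpy array, optional): Per-feature standard deviations to use; computed from X if None.

    Returns:
    X_standardized (numpy array): Standardized feature matrix of shape (n_samples, n_features).
    mean (numpy array): Per-feature means that were used.
    std (numpy array): Per-feature standard deviations that were used.
    """
    if mean is None:
        mean = np.mean(X, axis=0)
    if std is None:
        std = np.std(X, axis=0)
    # Guard against division by zero for constant features
    std = np.where(std == 0, 1.0, std)
    X_standardized = (X - mean) / std
    return X_standardized, mean, std
```

You can use this function to standardize the features before applying the KNN classifier:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize both sets using the mean and standard deviation of the training set
X_train_standardized, train_mean, train_std = standardize_features(X_train)
X_test_standardized, _, _ = standardize_features(X_test, train_mean, train_std)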

# Initialize the KNN classifier
knn_classifier = KNeighborsClassifier(n_neighbors=3)

# Train the classifier on the standardized training data
knn_classifier.fit(X_train_standardized, y_train)

# Make predictions on the standardized test data
y_pred = knn_classifier.predict(X_test_standardized)

# Calculate the accuracy of the classifier
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```

In this code, we first define the `standardize_features` function, which standardizes each feature by subtracting its mean and dividing by its standard deviation. We then standardize the training and testing sets before applying the KNN classifier. The test set should be transformed with the mean and standard deviation computed on the training set, so that both sets share one transformation and no information from the test data leaks into preprocessing. Standardization makes KNN more effective because the algorithm relies on distance metrics such as Euclidean or Manhattan distance, which features with larger scales would otherwise dominate.
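As a sanity check, manual standardization can be compared against scikit-learn's `StandardScaler`. The sketch below uses small made-up matrices (an assumption for illustration, not data from this notebook) and inlines the arithmetic so that it runs on its own.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training and test matrices
X_train = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0], [4.0, 700.0]])
X_test = np.array([[2.5, 350.0], [5.0, 100.0]])

# Manual standardization using training-set statistics only
train_mean = X_train.mean(axis=0)
train_std = X_train.std(axis=0)  # population standard deviation (ddof=0)
manual_train = (X_train - train_mean) / train_std
manual_test = (X_test - train_mean) / train_std

# StandardScaler fitted on the same training data
scaler = StandardScaler().fit(X_train)
train_matches = np.allclose(manual_train, scaler.transform(X_train))
test_matches = np.allclose(manual_test, scaler.transform(X_test))

print("Training set matches StandardScaler:", train_matches)
print("Test set matches StandardScaler:", test_matches)
```

Both checks should agree because `StandardScaler` also divides by the population standard deviation, which is what `np.std` computes by default.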





Q7. Write a Python function to calculate the euclidean distance between two points.



ANS-7


