# **ML Lab - Machine Learning Techniques**
Urlana Suresh Kumar - 22071A6662

In this notebook, we explore different machine learning techniques, grouped into supervised and unsupervised learning. We cover the following topics:

1. **Supervised Learning - Classification**: This involves training a model to assign data to distinct categories. We use the Iris dataset and apply the Random Forest Classifier to predict the species of a flower from its four sepal and petal measurements.

2. **Supervised Learning - Regression**: Regression tasks involve predicting a continuous value. We use the California housing dataset and apply the Random Forest Regressor to predict median house values from features such as median income, average number of rooms, and latitude/longitude.

3. **Unsupervised Learning - Clustering**: Clustering is a type of unsupervised learning where the goal is to group similar data points. We use K-Means clustering on the Iris dataset and evaluate the clustering performance using Silhouette Score and Adjusted Rand Index.

The following code blocks demonstrate how to apply these algorithms using Python's scikit-learn library, followed by model evaluation metrics.


# **1. Supervised Learning - Classification**

In [1]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load data
data = load_iris()
X = data.data
y = data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train model
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

# Make predictions
y_pred = clf.predict(X_test)

# Evaluate model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print("Classification Report:\n", classification_report(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

Accuracy: 1.0
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45

Confusion Matrix:
 [[19  0  0]
 [ 0 13  0]
 [ 0  0 13]]
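
A perfect score on a 45-sample test set reflects a single train/test split, so it is worth checking how stable the accuracy is across several splits. The sketch below is one way to do that with 5-fold stratified cross-validation. The choice of 5 folds and the `cv` and `scores` names are assumptions for illustration and are not part of the original cell.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Load the same Iris data used in the cell above
X, y = load_iris(return_X_y=True)

# Same model configuration as the original cell
clf = RandomForestClassifier(random_state=42)

# Assumption: 5 stratified folds, shuffled with a fixed seed for repeatability
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# One accuracy value per fold; each fold is held out once
scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")

print("Fold accuracies:", scores)
print("Mean accuracy: %.3f (std %.3f)" % (scores.mean(), scores.std()))
```

If the mean cross-validated accuracy is slightly below 1.0, that usually says more about the small size of the test set above than about a problem with the model.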


# **2. Supervised Learning - Regression**

In [2]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Load the California housing dataset
data = fetch_california_housing()
X = data.data
y = data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train model
regressor = RandomForestRegressor(random_state=42)
regressor.fit(X_train, y_train)

# Make predictions
y_pred = regressor.predict(X_test)

# Evaluate model
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Absolute Error (MAE):", mae)
print("Mean Squared Error (MSE):", mse)
print("R² Score:", r2)

Mean Absolute Error (MAE): 0.33228759407299757
Mean Squared Error (MSE): 0.25650512920799395
R² Score: 0.8045734925119942
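
MSE is expressed in squared target units, which makes it hard to read directly. Taking its square root gives the RMSE, which is in the same units as the target. The sketch below does this arithmetic on the values printed above. It assumes the target is the median house value in units of $100,000, as described in the scikit-learn documentation for `fetch_california_housing`; the `UNIT_DOLLARS` constant is a name introduced here for illustration.

```python
import math

# Metric values copied from the output of the cell above
mae = 0.33228759407299757
mse = 0.25650512920799395

# RMSE is the square root of MSE, in the same units as the target
rmse = math.sqrt(mse)

# Assumption: the target is expressed in units of $100,000
UNIT_DOLLARS = 100_000

print(f"RMSE: {rmse:.4f} (about ${rmse * UNIT_DOLLARS:,.0f})")
print(f"MAE:  {mae:.4f} (about ${mae * UNIT_DOLLARS:,.0f})")
```

RMSE is larger than MAE here because squaring weights large errors more heavily, so a gap between the two suggests that some predictions are off by considerably more than the typical error.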


# **3. Unsupervised Learning - Clustering**

In [3]:
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, adjusted_rand_score

# Load data
data = load_iris()
X = data.data
y_true = data.target # True labels for evaluation purposes

# Train K-Means clustering model
kmeans = KMeans(n_clusters=3, random_state=42)
y_pred = kmeans.fit_predict(X)

# Evaluate model
silhouette_avg = silhouette_score(X, y_pred)
ari = adjusted_rand_score(y_true, y_pred)

print("Silhouette Score:", silhouette_avg)
print("Adjusted Rand Index (ARI):", ari)

Silhouette Score: 0.551191604619592
Adjusted Rand Index (ARI): 0.7163421126838476
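
The cell above scores the clusters with the Adjusted Rand Index rather than accuracy because K-Means assigns arbitrary cluster ids: cluster 0 has no reason to correspond to species 0. The toy example below, with made-up labels that are not taken from the Iris run, illustrates the difference. ARI compares groupings and ignores the ids, whereas accuracy compares ids position by position.

```python
from sklearn.metrics import accuracy_score, adjusted_rand_score

# Hypothetical labels: three groups of two points each
y_true = [0, 0, 1, 1, 2, 2]

# Same grouping as y_true, but with the cluster ids permuted
y_perm = [2, 2, 0, 0, 1, 1]

ari_perm = adjusted_rand_score(y_true, y_perm)
acc_perm = accuracy_score(y_true, y_perm)

print("ARI:", ari_perm)       # 1.0: the two partitions are identical
print("Accuracy:", acc_perm)  # 0.0: no id matches position by position
```

An ARI of 1.0 means the partitions match exactly, and values near 0 are what random labelling would be expected to give.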


# Conclusion

In this notebook, we implemented and evaluated three machine learning models using Python's scikit-learn library:

- **Supervised Learning (Classification)**: The Random Forest Classifier reached an accuracy of 1.0 on the 45-sample Iris test set. The test set is small, so this score describes this one split rather than guaranteeing the same result on new data.
- **Supervised Learning (Regression)**: The Random Forest Regressor on the California housing dataset achieved an R² score of 0.805 (MAE 0.332, MSE 0.257), meaning the model explains about 80% of the variance in the test-set target.
- **Unsupervised Learning (Clustering)**: K-Means with three clusters on the Iris dataset gave a Silhouette Score of 0.55 and an Adjusted Rand Index of 0.72. The clusters are moderately well separated and agree substantially, but not perfectly, with the true species labels.

These examples show the same scikit-learn workflow (load the data, fit a model, evaluate it with metrics suited to the task) applied to classification, regression, and clustering.