# From Baseline to Better Model: A Journey in Machine Learning

## Introduction
In this notebook, we start from a simple baseline classifier and then try two alternative models to see whether they perform better. We compare accuracy, per-class precision and recall, and confusion matrices at each stage to understand the impact of each change.

## Setup
We will use the Iris dataset for this demonstration. First, let's install and import the necessary libraries.

In [None]:
# Install necessary libraries
!pip install -U scikit-learn pandas numpy seaborn matplotlib

# Import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC


## Load and Explore the Dataset
The Iris dataset is a classic dataset in machine learning. Let's load it and take a look.

In [None]:
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Convert to DataFrame for easier handling
df = pd.DataFrame(data=X, columns=iris.feature_names)
df['target'] = y

# Display the first few rows
df.head()
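
Before modeling, it can help to confirm the shape of the data, the class balance, and the absence of missing values. The following is a minimal sketch of such a check; the `species` column and the `class_counts` and `missing` names are introduced here for illustration and are not part of the original notebook.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the DataFrame so this cell runs on its own
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['target'] = iris.target

# Hypothetical helper column: map integer labels to species names
df['species'] = df['target'].map(dict(enumerate(iris.target_names)))

# Number of rows and columns
print(df.shape)

# Samples per class (a balanced dataset makes accuracy easier to interpret)
class_counts = df['species'].value_counts()
print(class_counts)

# Total number of missing values across all columns
missing = int(df.isna().sum().sum())
print('Missing values:', missing)

# Summary statistics for the numeric features
print(df[iris.feature_names].describe())
```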

## Step 1: Baseline Model with Logistic Regression
We'll start by creating a simple baseline model using Logistic Regression.

In [None]:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize and train the baseline model
baseline_model = LogisticRegression(max_iter=200)
baseline_model.fit(X_train, y_train)

# Make predictions
y_pred_baseline = baseline_model.predict(X_test)

# Evaluate the baseline model
print('Baseline Model Performance')
print('Accuracy:', accuracy_score(y_test, y_pred_baseline))
print('Classification Report:\n', classification_report(y_test, y_pred_baseline))
print('Confusion Matrix:\n', confusion_matrix(y_test, y_pred_baseline))
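
To put the baseline accuracy in context, one option is to compare it with a trivial classifier that always predicts the most frequent training class. The sketch below assumes the same split as above and uses scikit-learn's `DummyClassifier`; the names `dummy_model`, `dummy_acc`, and `baseline_acc` are illustrative and not part of the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Same data and split as the baseline cell
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Trivial reference: always predict the most frequent training class
dummy_model = DummyClassifier(strategy='most_frequent')
dummy_model.fit(X_train, y_train)
dummy_acc = accuracy_score(y_test, dummy_model.predict(X_test))

# Logistic Regression baseline, configured as in the cell above
baseline_model = LogisticRegression(max_iter=200)
baseline_model.fit(X_train, y_train)
baseline_acc = accuracy_score(y_test, baseline_model.predict(X_test))

print('Dummy accuracy:', dummy_acc)
print('Logistic Regression accuracy:', baseline_acc)
```

If the Logistic Regression score is not clearly above the dummy score, that usually points to a problem in the data or the pipeline rather than in the choice of model.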

## Step 2: Improving the Model with Random Forest
Next, we'll implement a Random Forest Classifier to see if we can improve performance.

In [None]:
# Initialize and train the Random Forest model
rf_model = RandomForestClassifier(random_state=42)
rf_model.fit(X_train, y_train)

# Make predictions
y_pred_rf = rf_model.predict(X_test)

# Evaluate the Random Forest model
print('Random Forest Model Performance')
print('Accuracy:', accuracy_score(y_test, y_pred_rf))
print('Classification Report:\n', classification_report(y_test, y_pred_rf))
print('Confusion Matrix:\n', confusion_matrix(y_test, y_pred_rf))
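
A fitted Random Forest also exposes impurity-based feature importances, which give a rough view of which measurements the trees rely on. This is a small sketch under the same split and settings as above; the `importances` Series is an illustrative name, and impurity-based importances should be read as a heuristic rather than a definitive ranking.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Same data and split as the cells above
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# Fit the Random Forest with the same settings as above
rf_model = RandomForestClassifier(random_state=42)
rf_model.fit(X_train, y_train)

# Impurity-based importances, one value per feature, sorted high to low
importances = pd.Series(
    rf_model.feature_importances_, index=iris.feature_names
).sort_values(ascending=False)
print(importances)
```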

## Step 3: Further Improvement with Support Vector Machine (SVM)
Now, we'll try a Support Vector Machine with a linear kernel and see how it compares with the two previous models.

In [None]:
# Initialize and train the SVM model
svm_model = SVC(kernel='linear', random_state=42)
svm_model.fit(X_train, y_train)

# Make predictions
y_pred_svm = svm_model.predict(X_test)

# Evaluate the SVM model
print('SVM Model Performance')
print('Accuracy:', accuracy_score(y_test, y_pred_svm))
print('Classification Report:\n', classification_report(y_test, y_pred_svm))
print('Confusion Matrix:\n', confusion_matrix(y_test, y_pred_svm))
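
SVMs are sensitive to feature scale, so a common variation is to standardize the features before fitting. The sketch below wraps `StandardScaler` and `SVC` in a pipeline so that the scaler is fitted on the training data only. This is a variation on the cell above, not the notebook's original method, and the names `scaled_svm` and `scaled_acc` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Same data and split as the cells above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Standardize features, then fit a linear-kernel SVM.
# The pipeline learns the scaling parameters from X_train only.
scaled_svm = make_pipeline(StandardScaler(), SVC(kernel='linear', random_state=42))
scaled_svm.fit(X_train, y_train)

# The same scaling is applied to X_test automatically at predict time
scaled_acc = accuracy_score(y_test, scaled_svm.predict(X_test))
print('Scaled SVM accuracy:', scaled_acc)
```

On Iris the four features share similar units and ranges, so scaling may change little here; it tends to matter more on datasets with features on very different scales.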

## Step 4: Comparison of Models
Now, let's compare the test accuracy of all three models visually.

In [None]:
# Collecting accuracies for comparison
model_names = ['Baseline (Logistic Regression)', 'Random Forest', 'SVM']
accuracies = [accuracy_score(y_test, y_pred_baseline),
              accuracy_score(y_test, y_pred_rf),
              accuracy_score(y_test, y_pred_svm)]

# Create a bar plot for comparison
plt.figure(figsize=(10, 6))
sns.barplot(x=model_names, y=accuracies)
plt.title('Model Performance Comparison')
plt.ylabel('Accuracy')
plt.ylim(0, 1)
plt.show()
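
A single 70/30 split of 150 samples leaves only 45 test points, so accuracy differences of one or two predictions can look larger than they are. One way to get a steadier comparison is stratified k-fold cross-validation, sketched below with the same three model configurations. The choice of 5 folds and the `cv_results` dictionary are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Same model configurations as in the steps above
models = {
    'Baseline (Logistic Regression)': LogisticRegression(max_iter=200),
    'Random Forest': RandomForestClassifier(random_state=42),
    'SVM': SVC(kernel='linear', random_state=42),
}

# Stratified folds keep the class proportions equal in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# One array of per-fold accuracies for each model
cv_results = {
    name: cross_val_score(model, X, y, cv=cv, scoring='accuracy')
    for name, model in models.items()
}

for name, scores in cv_results.items():
    print(f'{name}: {scores.mean():.3f} +/- {scores.std():.3f}')
```

If the fold-to-fold standard deviation is of the same order as the gap between two models, the data do not clearly favor either one.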

## Conclusion
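In this notebook, we started with a simple baseline model using Logistic Regression and then tried a Random Forest and a linear-kernel Support Vector Machine on the same train/test split. Whether each step actually increases the performance metrics depends on the outputs of the cells above: Iris is a small, well-separated dataset with only 45 test samples in this split, so the three models often score very close to one another and small differences should be read with caution. The workflow itself illustrates the importance of evaluating each candidate model against a baseline before treating it as an improvement.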