# Fool's Gold Classification Model

Congratulations on creating your model in the Fool's Gold lesson! This Jupyter Notebook contains the Python code that reproduces the results of the model you just built. You can run this notebook as is or modify it to experiment further.


## Setting Up Your Environment

First, let's import the necessary libraries and ensure your dataset is ready.

In [10]:
# Import Necessary Libraries
# These libraries handle the data and provide the machine learning models.
# Add any other libraries you need.
import pandas as pd
import numpy as np
# patch_sklearn() must run before the scikit-learn imports below
# so that the accelerated implementations are used.
from sklearnex import patch_sklearn
patch_sklearn()
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)


## Loading Your Dataset

In the Fool's Gold lesson, you worked with a dataset to create your model. Now it's time to bring that dataset into this notebook and see the model in action. Make sure the `train.csv` and `test.csv` files are in the same directory as this Jupyter notebook so the data can be loaded for training and testing.


In [23]:
# Read Dataset
# These lines load your training and testing datasets.
# Ensure 'train.csv' and 'test.csv' are in this notebook's directory.
train_df = pd.read_csv('train.csv')
test_df = pd.read_csv('test.csv')


## Selecting Features for Your Model

In the Fool's Gold lesson, you learned how to tell real gold from fool's gold using various features. Now, let's decide which characteristics (features) of the samples the model should use to make predictions. Fill in the `features` list with the attributes you believe are most indicative of real gold.

Remember, the target variable `Label` indicates whether the sample is real gold (`1`) or fool's gold (`0`). Let's set up our features and target for the model.


In [18]:
# Define Features and Target Variable
# Possible features: "Hardness", "Density", "Conductivity", "Shininess", "Shape", "Texture"
features = ["Hardness", "Density", "Conductivity", "Texture"]
target = 'Label'  # 'Label' indicates if the sample is real gold (1) or fool's gold (0)
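
Column names are case-sensitive, so a quick check that every chosen name exists in the data can catch a slip before training. The sketch below is only an illustration: `missing_columns` is a hypothetical helper, and `sample_df` is a stand-in with invented values rather than the lesson's dataset.

```python
import pandas as pd

def missing_columns(df: pd.DataFrame, names: list[str]) -> list[str]:
    """Return the names that are not columns of df (hypothetical helper)."""
    return [name for name in names if name not in df.columns]

# Stand-in data with invented values; the notebook would pass train_df instead.
sample_df = pd.DataFrame({
    "Hardness": [2.5, 6.0],
    "Density": [19.3, 5.0],
    "Conductivity": [4.5, 0.1],
    "Texture": ["smooth", "rough"],
    "Label": [1, 0],
})

# "density" is deliberately misspelled to show what the check reports.
wanted = ["Hardness", "density", "Conductivity", "Texture", "Label"]
print(missing_columns(sample_df, wanted))  # ['density']
```

An empty list means every requested feature and the target are present.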


## Understanding One-Hot Encoding

Before we train our model, we need to prepare our data. Some of our features, like `Shape` and `Texture`, are categorical, meaning they hold categories (e.g., 'round', 'square') rather than numbers. The scikit-learn classifiers used in this notebook expect numeric input, so we use a technique called one-hot encoding to convert each category into its own column. Every new column marks the presence (1) or absence (0) of one category for a given sample. Let's apply one-hot encoding to our dataset.
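
As a tiny illustration of the idea, the sketch below encodes a single invented `Shape` column. The category values are assumptions made for the example; the real dataset's categories may differ.

```python
import pandas as pd

# Invented example values, not taken from the lesson's dataset.
toy = pd.DataFrame({"Shape": ["round", "square", "round"]})

# One new column per category; dtype=int requests 0/1 integers
# (recent pandas versions return booleans by default).
encoded = pd.get_dummies(toy, columns=["Shape"], dtype=int)
print(encoded)
```

The result has two columns, `Shape_round` and `Shape_square`, and each row has a 1 in exactly one of them.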


In [19]:
# One-Hot Encoding Function
def one_hot_encoding(df: pd.DataFrame) -> pd.DataFrame:
    cat_columns = df.select_dtypes(include=["object"]).columns  # Identify categorical columns
    for col in cat_columns:
        # For each categorical column, create dummy/indicator variables
        # (dtype=int gives 0/1 values instead of booleans)
        dummies = pd.get_dummies(df[col], prefix=col, dtype=int)
        # Drop the original categorical column from dataframe
        df = df.drop(col, axis=1)
        # Append the new one-hot encoded columns to the original dataframe
        df = pd.concat([df, dummies], axis=1)
    return df

# Now, we apply the one-hot encoding to our training and testing data
train_df_encoded = one_hot_encoding(train_df)
test_df_encoded = one_hot_encoding(test_df)
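
Encoding the training and testing sets separately can produce different columns if a category appears in only one of the files, and the model would then fail or misread its inputs at prediction time. One possible guard, sketched here with invented stand-in frames, is to reindex the test frame to the training frame's columns. The column names and values below are assumptions for illustration only.

```python
import pandas as pd

# Invented stand-ins for the encoded frames.
train_encoded = pd.DataFrame({
    "Hardness": [2.5, 6.0, 3.0],
    "Shape_oval": [0, 0, 1],      # category missing from the test data
    "Shape_round": [1, 0, 0],
    "Label": [1, 0, 1],
})
test_encoded = pd.DataFrame({
    "Hardness": [2.8, 6.2],
    "Shape_round": [1, 0],
    "Shape_star": [0, 1],         # category never seen in training
    "Label": [1, 0],
})

# Keep exactly the training columns, in the training order.
# Columns absent from the test data are created and filled with 0;
# test-only columns such as "Shape_star" are dropped.
test_aligned = test_encoded.reindex(columns=train_encoded.columns, fill_value=0)
print(test_aligned.columns.tolist())
```

In the notebook, the same call would take `train_df_encoded.columns` and be applied to `test_df_encoded`.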


## Training Your Model

Now that our data is ready, it's time to choose and train a machine learning model. You explored different types of classifiers in the Fool's Gold lesson, and you can pick any of them based on what you learned. The cell below uses a `DecisionTreeClassifier` as an example and shows how to initialize the other classifiers for comparison or further experimentation.


In [21]:
# Model Training
X_train = train_df_encoded.drop(target, axis=1)
y_train = train_df_encoded[target]
X_test = test_df_encoded.drop(target, axis=1)
y_test = test_df_encoded[target]


# For example, to use a DecisionTreeClassifier, you would write:
model = DecisionTreeClassifier()

# Other models you learned about:
# For K-Nearest Neighbors: model = KNeighborsClassifier()
# For Logistic Regression: model = LogisticRegression()

# Fit the model to the training data
model.fit(X_train, y_train)
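
As written, the cell above trains on every column except the target, so the `features` list is not applied. One way to restrict the inputs is sketched below with a hypothetical helper, `select_feature_columns`, and invented data. It keeps encoded columns whose name equals a chosen feature or starts with that feature plus an underscore, which matches the `prefix=col` naming used by the encoding function.

```python
import pandas as pd

def select_feature_columns(encoded_df: pd.DataFrame, features: list[str]) -> list[str]:
    """Hypothetical helper: map original feature names to encoded column names."""
    selected = []
    for col in encoded_df.columns:
        for feat in features:
            # Numeric features keep their name; one-hot columns look like "Texture_rough".
            if col == feat or col.startswith(feat + "_"):
                selected.append(col)
                break
    return selected

# Invented stand-in for train_df_encoded.
encoded = pd.DataFrame({
    "Hardness": [2.5, 6.0],
    "Shininess": [0.9, 0.4],
    "Texture_rough": [0, 1],
    "Texture_smooth": [1, 0],
    "Label": [1, 0],
})

cols = select_feature_columns(encoded, ["Hardness", "Texture"])
X_subset = encoded[cols]  # a frame like this would take the place of X_train
print(cols)  # ['Hardness', 'Texture_rough', 'Texture_smooth']
```

The same column list should be used for both the training and testing frames. Prefix matching is a simplification: it would also pick up an unrelated column that happens to start with a feature name and an underscore.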

## Evaluating Your Model's Performance

After training your model, it's important to measure how well it predicts on data it has not seen. This step computes the accuracy on the test data, that is, the fraction of test samples the model classifies correctly. Remember, accuracy is just one way to measure performance; depending on your project, other metrics might be important too.


In [22]:
# Model Evaluation
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Model Accuracy: {np.round(accuracy, 3)}")

# For additional evaluation metrics, you might consider:
# from sklearn.metrics import confusion_matrix, classification_report
# print(confusion_matrix(y_test, predictions))
# print(classification_report(y_test, predictions))

Model Accuracy: 1.0
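
To see what the additional metrics report, the sketch below runs them on a small set of invented labels rather than on the notebook's predictions. A confusion matrix separates the two kinds of mistakes (fool's gold called gold, and gold called fool's gold), which a single accuracy number cannot do.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Invented labels for illustration: 1 = real gold, 0 = fool's gold.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes, ordered [0, 1].
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

print(cm)
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))  # tp / (tp + fp)
print("recall   :", recall_score(y_true, y_pred))     # tp / (tp + fn)
```

With these invented labels, one real-gold sample is missed and one fool's-gold sample is mistaken for gold, so accuracy, precision, and recall all come out to 0.75.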
