# An Introduction to Experiment Tracking with Weights & Biases

<!--- @wandbcode{mlops-zoomcamp} -->

### Set Up Dependencies

You can install all the dependencies individually:

```shell
pip install pandas matplotlib scikit-learn pyarrow
pip install wandb
```

Or install them all at once from the requirements file:

```shell
pip install -r requirements.txt
```

### Import Libraries

```python
import wandb

import pickle

from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, mean_squared_error
```

### Initialize a Weights & Biases Run

Calling `wandb.init()` at the beginning of a script or notebook starts a background process that syncs and logs data to W&B as a Run.

```python
wandb.init(project="mlops-zoomcamp-wandb", name="experiment-1")
```

### Load the Iris Dataset

This dataset contains the sepal length, sepal width, petal length, and petal width of 150 irises from 3 species (Setosa, Versicolour, and Virginica), stored in a 150x4 `numpy.ndarray`. To learn more about this dataset, check out the [official docs for `sklearn`](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html).

```python
X, y = load_iris(return_X_y=True)
label_names = ["Setosa", "Versicolour", "Virginica"]
```
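Before training, it can help to confirm what `load_iris` returned. A minimal sketch, reusing the same `load_iris(return_X_y=True)` call; `np.bincount` is used here only to count samples per class:

```python
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# 150 samples, 4 features: sepal length, sepal width, petal length, petal width (cm)
print(X.shape)  # (150, 4)

# Class labels are encoded as 0, 1, 2 (Setosa, Versicolour, Virginica)
print(np.bincount(y))  # [50 50 50]
```

The three classes are perfectly balanced, which is worth keeping in mind when reading the precision-recall plot later in this notebook.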

### Model Training and Experiment Tracking

Record the model configuration and other hyperparameters with `wandb.config` so they are saved alongside the run.

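```python
# Log your model configs to Weights & Biases
params = {"C": 0.1, "random_state": 42}
# Update the active run's config in place; assigning a new dict to wandb.config
# would only rebind the name instead of recording the values on the run
wandb.config.update(params)
```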

Define and train a logistic regression model:

```python
model = LogisticRegression(**params).fit(X, y)
y_pred = model.predict(X)
y_probas = model.predict_proba(X)
```
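`y_pred` and `y_probas` are closely related: `predict_proba` returns one column per class, ordered as in `model.classes_`, with each row summing to 1, and for this model the predicted label is the most probable class. A quick sanity check, assuming the same data and hyperparameters as above:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(C=0.1, random_state=42).fit(X, y)
y_pred = model.predict(X)
y_probas = model.predict_proba(X)

# One row per sample, one column per class (in the order of model.classes_)
print(y_probas.shape)  # (150, 3)

# Each row is a probability distribution over the three classes
print(np.allclose(y_probas.sum(axis=1), 1.0))  # True

# The predicted label is the class with the highest probability
print(np.array_equal(model.classes_[y_probas.argmax(axis=1)], y_pred))  # True
```

This is why the plotting helpers take `y_probas` for the threshold-based curves but `y_pred` for the confusion matrix.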

Log your metrics to Weights & Biases using `wandb.log`.

```python
wandb.log({
    "accuracy": accuracy_score(y, y_pred),
    "mean_squared_error": mean_squared_error(y, y_pred)
})
```
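Both logged numbers are simple averages over samples. The plain-Python sketch below (the function names are illustrative, not part of any library) mirrors what `accuracy_score` and `mean_squared_error` compute on label arrays. Because the squared error treats the class indices 0, 1, 2 as numbers, mistaking Setosa (0) for Virginica (2) costs four times as much as mistaking it for Versicolour (1), so it is a rough signal rather than a standard classification metric.

```python
def accuracy(y_true, y_pred):
    # Fraction of samples whose predicted label matches the actual label
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    # Mean of squared differences, treating class indices as plain numbers
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [0, 1, 2, 2]
y_pred = [0, 2, 2, 0]
print(accuracy(y_true, y_pred))  # 0.5
print(mse(y_true, y_pred))       # 1.25
```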

### Visualize and Compare Plots using Weights & Biases

The [**ROC curves**](https://docs.wandb.ai/guides/integrations/scikit#roc) plot the true positive rate (y-axis) against the false positive rate (x-axis) as the decision threshold varies. The ideal point is `TPR = 1` and `FPR = 0`, in the top-left corner. A common summary is the area under the ROC curve (AUC-ROC): the larger the AUC-ROC, the better.

```python
wandb.sklearn.plot_roc(y, y_probas, labels=label_names)
```
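To make the ROC and AUC description concrete, here is a plain-Python sketch for a binary problem (the helper names are hypothetical). It lowers the decision threshold from the highest score down, records one (FPR, TPR) point per distinct score, and integrates the curve with the trapezoidal rule. For a multiclass problem such as Iris, each class can be treated as the positive class in turn (one-vs-rest), using that class's column of `y_probas` as the score.

```python
def roc_curve_points(y_true, scores):
    """Return (fpr, tpr) lists for binary labels (1 = positive), one point per distinct score.

    Assumes both classes are present in y_true.
    """
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    # Highest scores first, so lowering the threshold admits samples in this order
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Admit every sample tied at this threshold before emitting a point
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def trapezoid_auc(xs, ys):
    # Sum of trapezoid areas between consecutive curve points
    return sum((xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1]) / 2 for k in range(1, len(xs)))


fpr, tpr = roc_curve_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(trapezoid_auc(fpr, tpr))  # 0.75
```

An AUC of 0.75 here matches the share of (positive, negative) pairs in which the positive sample scores higher: 3 of the 4 pairs.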

The [**precision-recall**](https://docs.wandb.ai/guides/integrations/scikit#precision-recall-curve) curve shows the tradeoff between precision and recall across decision thresholds. A large area under the curve means both high precision and high recall: high precision relates to few false positives, and high recall relates to a low false negative rate. High scores for both show that the classifier returns accurate results (high precision) and also returns most of the actual positives (high recall). The PR curve is especially useful when the classes are highly imbalanced.

```python
wandb.sklearn.plot_precision_recall(y, y_probas, labels=label_names)
```
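The two axes of the precision-recall curve are precision = TP / (TP + FP) and recall = TP / (TP + FN). A plain-Python sketch (hypothetical helper name) that computes one point per threshold for binary labels, reusing each distinct score as a threshold so there is always at least one predicted positive:

```python
def precision_recall_points(y_true, scores):
    """Return (threshold, precision, recall) for each distinct score, highest first."""
    positives = sum(1 for t in y_true if t == 1)  # TP + FN, fixed for all thresholds
    points = []
    for threshold in sorted(set(scores), reverse=True):
        # Predict positive for every sample scoring at or above the threshold
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, t in zip(predicted, y_true) if p and t == 1)
        fp = sum(1 for p, t in zip(predicted, y_true) if p and t != 1)
        precision = tp / (tp + fp)  # never zero: the top-scoring sample is always predicted positive
        recall = tp / positives
        points.append((threshold, precision, recall))
    return points


for threshold, precision, recall in precision_recall_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]):
    print(threshold, round(precision, 3), recall)
```

Lowering the threshold from 0.8 to 0.1 raises recall from 0.5 to 1.0 while precision falls from 1.0 to 0.5, which is the tradeoff the curve plots.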

The [**confusion matrix**](https://docs.wandb.ai/guides/integrations/scikit#confusion-matrix) counts, for each actual class, how many samples the classifier assigned to each predicted class. It is useful for assessing the quality of model predictions and for finding patterns in the predictions the model gets wrong. The diagonal holds the correct predictions, i.e. the cases where the actual label equals the predicted label.

```python
wandb.sklearn.plot_confusion_matrix(y, y_pred, labels=label_names)
```
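A confusion matrix is just a table of counts. The plain-Python sketch below (hypothetical helper name) puts actual classes on rows and predicted classes on columns, the same orientation scikit-learn's `confusion_matrix` uses, and shows that the diagonal share equals accuracy:

```python
def confusion_matrix(y_true, y_pred, n_classes):
    # matrix[i][j] counts samples whose actual class is i and predicted class is j
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


y_true = [0, 1, 2, 2, 1]
y_pred = [0, 2, 2, 2, 1]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
for row in cm:
    print(row)

# Correct predictions sit on the diagonal, so their share equals accuracy
correct = sum(cm[i][i] for i in range(3))
print(correct / len(y_true))  # 0.8
```

The single off-diagonal count, in row 1 and column 2, is the one sample of class 1 that was predicted as class 2.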

To learn more about the other features of the scikit-learn integration with Weights & Biases, check the [official docs](https://docs.wandb.ai/guides/integrations/scikit).

### Log the Model to Weights & Biases

Use [Weights & Biases Artifacts](https://docs.wandb.ai/guides/artifacts) to track datasets, models, dependencies, and results through each step of your machine learning pipeline. Artifacts make it easy to keep a complete, auditable history of changes to your files.

```python
# Save your model to disk
with open("logistic_regression.pkl", "wb") as f:
    pickle.dump(model, f)

# Log the model file as a versioned Weights & Biases Artifact
artifact = wandb.Artifact("iris-logistic-regression-model", type="model")
artifact.add_file("logistic_regression.pkl")
wandb.log_artifact(artifact)
```
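Since the artifact stores the pickled file as-is, it is worth confirming that the file loads back into an equivalent model. A minimal local sketch, assuming the same data and hyperparameters as above and writing to a temporary directory rather than the working directory:

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(C=0.1, random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "logistic_regression.pkl")
    # Serialize the fitted model the same way the cell above does
    with open(path, "wb") as f:
        pickle.dump(model, f)
    # Load it back from disk
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored model should make identical predictions
print(np.array_equal(restored.predict(X), model.predict(X)))  # True
```

In a later run, the logged file could be fetched with `run.use_artifact("iris-logistic-regression-model:latest")` followed by `artifact.download()`, which returns the local directory holding `logistic_regression.pkl`, and then loaded the same way.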

### Finish the Experiment

```python
wandb.finish()
```