## MLOps Zoomcamp - Week 2
- [Questions](https://github.com/DataTalksClub/mlops-zoomcamp/blob/main/cohorts/2023/02-experiment-tracking/wandb.md)

### Q1. Install the Package

In [1]:
!wandb --version

wandb, version 0.15.3


### Q2. Download and preprocess the data

In [2]:
data_url = "https://d37ci6vzurychx.cloudfront.net/trip-data/green_tripdata_2022-"
jan, feb, mar = "01.parquet", "02.parquet", "03.parquet"
out_dir = "../data/hw2-data-wb/green_tripdata_2022-"

for file in [jan, feb, mar]:
    !wget {data_url}{file} -O {out_dir}{file} -q
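
The `wget` loop above relies on the output directory already existing and on `wget` being on the PATH. A minimal pure-Python sketch of the same download, assuming the same URL prefix and file naming; `build_targets` and `download_all` are hypothetical helper names, not part of the homework scripts:

```python
from pathlib import Path
from urllib.request import urlretrieve

# Same URL prefix as in the cell above.
BASE_URL = "https://d37ci6vzurychx.cloudfront.net/trip-data/green_tripdata_2022-"
MONTHS = ["01", "02", "03"]  # January to March 2022


def build_targets(out_dir, months=MONTHS):
    """Return (url, destination_path) pairs for each requested month."""
    out_dir = Path(out_dir)
    return [
        (f"{BASE_URL}{m}.parquet", out_dir / f"green_tripdata_2022-{m}.parquet")
        for m in months
    ]


def download_all(out_dir):
    """Create the output directory if needed and fetch any missing files."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for url, dest in build_targets(out_dir):
        if not dest.exists():  # skip files that were already downloaded
            urlretrieve(url, dest)


# Uncomment to fetch the files (requires network access):
# download_all("../data/hw2-data-wb")
```

Skipping files that already exist makes the cell safe to re-run without downloading the data again.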


In [5]:
# configure WandB API Key
import wandb
from getpass import getpass

WANDB_API_KEY = getpass()
wandb.login(key=WANDB_API_KEY)

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
[34m[1mwandb[0m: W&B API key is configured. Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: C:\Users\uditm/.netrc


True

In [6]:
!python hw-wandb/preprocess_data.py \
    --wandb_project "mlops-zc-wb" \
    --wandb_entity "uditmanav17" \
    --raw_data_path "../data/hw2-data-wb/" \
    --dest_path ./output

wandb: Currently logged in as: uditmanav17. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.3
wandb: Run data is saved locally in d:\Programming\Git\mlops\mlops-zoomcamp\02-experiment-tracking\wandb\run-20230602_120903-6chgnn4x
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run frosty-wave-1
wandb:  View project at https://wandb.ai/uditmanav17/mlops-zc-wb
wandb:  View run at https://wandb.ai/uditmanav17/mlops-zc-wb/runs/6chgnn4x
wandb: Adding directory to artifact (.\output)... Done. 0.0s


In [9]:
!ls -lah ./output | grep dv.pkl

-rw-r--r-- 1 uditm 197609 151K Jun  2 12:09 dv.pkl


### Q3. Train a model with Weights & Biases logging

In [10]:
# check updated script
!cat -n hw-wandb/train.py | sed '52,62!d'

    52	    mse = mean_squared_error(y_val, y_pred, squared=False)
    53	    # TODO: Log `mse` to Weights & Biases under the key `"MSE"`
    54	    wandb.log({"MSE": mse})
    55	
    56	    with open("regressor.pkl", "wb") as f:
    57	        pickle.dump(rf, f)
    58	
    59	    # TODO: Log `regressor.pkl` as an artifact of type `model`
    60	    artifact_new = wandb.Artifact(name="reg-model", type="model")
    61	    artifact_new.add_file("regressor.pkl")
    62	    wandb.log_artifact(artifact_new)
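
One detail worth noting in the script above: with `squared=False`, scikit-learn's `mean_squared_error` returns the square root of the MSE, so the value logged under the key `"MSE"` is really an RMSE. Newer scikit-learn releases also provide `root_mean_squared_error` for this. A small standard-library sketch of the difference, using made-up numbers rather than the taxi data:

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: what `squared=False` yields in scikit-learn."""
    return math.sqrt(mse(y_true, y_pred))


# Illustrative values only, not taken from the homework data.
y_val = [10.0, 12.0, 8.0, 15.0]
y_pred = [11.0, 10.0, 8.0, 18.0]

print("MSE :", mse(y_val, y_pred))
print("RMSE:", rmse(y_val, y_pred))
```

The RMSE is in the same units as the target, which is why it is usually the number reported for trip-duration models.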


In [11]:
!python hw-wandb/train.py \
  --wandb_project "mlops-zc-wb" \
  --wandb_entity "uditmanav17" \
  --data_artifact "uditmanav17/mlops-zc-wb/NYC-Taxi:v0"

wandb: Currently logged in as: uditmanav17. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.3
wandb: Run data is saved locally in d:\Programming\Git\mlops\mlops-zoomcamp\02-experiment-tracking\wandb\run-20230602_131837-rb851p7k
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run comfy-field-2
wandb:  View project at https://wandb.ai/uditmanav17/mlops-zc-wb
wandb:  View run at https://wandb.ai/uditmanav17/mlops-zc-wb/runs/rb851p7k
wandb:   4 of 4 files downloaded.  


In [15]:
api = wandb.Api()
run = api.run("uditmanav17/mlops-zc-wb/rb851p7k")
run.config

{'max_depth': 10, 'random_state': 0}

### Q4. Tune model hyperparameters

In [16]:
# check updated script
!cat -n hw-wandb/sweep.py | sed '29,35!d'

    29	    rf = RandomForestRegressor(
    30	        max_depth=config.max_depth,
    31	        n_estimators=config.n_estimators,
    32	        min_samples_split=config.min_samples_split,
    33	        min_samples_leaf=config.min_samples_leaf,
    34	        random_state=0,
    35	    )
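
The `config.*` attributes used above are filled in by a W&B sweep. A rough sketch of what such a sweep definition might look like, assuming a Bayesian search that minimizes the logged `MSE` key. The parameter ranges are hypothetical (they are not read from `sweep.py`), and `within_search_space` and `launch` are illustrative helper names:

```python
# Hypothetical search space: the ranges are assumptions, not taken from sweep.py.
SWEEP_CONFIG = {
    "method": "bayes",
    "metric": {"name": "MSE", "goal": "minimize"},
    "parameters": {
        "max_depth": {"distribution": "int_uniform", "min": 1, "max": 20},
        "n_estimators": {"distribution": "int_uniform", "min": 10, "max": 50},
        "min_samples_split": {"distribution": "int_uniform", "min": 2, "max": 10},
        "min_samples_leaf": {"distribution": "int_uniform", "min": 1, "max": 4},
    },
}


def within_search_space(run_config, sweep_config=SWEEP_CONFIG):
    """Check that every swept parameter in run_config lies inside its range."""
    params = sweep_config["parameters"]
    return all(
        name in run_config
        and spec["min"] <= run_config[name] <= spec["max"]
        for name, spec in params.items()
    )


def launch(train_fn, project, entity, count=5):
    """Register the sweep and run `count` trials of train_fn with an agent."""
    import wandb  # imported lazily so the config can be inspected without wandb

    sweep_id = wandb.sweep(SWEEP_CONFIG, project=project, entity=entity)
    wandb.agent(sweep_id, function=train_fn, project=project, entity=entity, count=count)
    return sweep_id
```

Inside `train_fn`, calling `wandb.init()` makes the sampled values available as `wandb.config`, which is what the `config.max_depth` style accesses rely on.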


In [18]:
!python hw-wandb/sweep.py \
  --wandb_project "mlops-zc-wb" \
  --wandb_entity "uditmanav17" \
  --data_artifact "uditmanav17/mlops-zc-wb/NYC-Taxi:v0"

Create sweep with ID: nfnn0efu
Sweep URL: https://wandb.ai/uditmanav17/mlops-zc-wb/sweeps/nfnn0efu


wandb: Agent Starting Run: t8tzw8gc with config:
wandb: 	max_depth: 4
wandb: 	min_samples_leaf: 3
wandb: 	min_samples_split: 3
wandb: 	n_estimators: 45
wandb: Currently logged in as: uditmanav17. Use `wandb login --relogin` to force relogin
wandb: Waiting for wandb.init()...
wandb: Tracking run with wandb version 0.15.3
wandb: Run data is saved locally in d:\Programming\Git\mlops\mlops-zoomcamp\02-experiment-tracking\wandb\run-20230602_134948-t8tzw8gc
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run electric-sweep-1
wandb:  View project at https://wandb.ai/uditmanav17/mlops-zc-wb
wandb:  View sweep at https://wandb.ai/uditmanav17/mlops-zc-wb/sweeps/nfnn0efu
wandb:  View run at https://wandb.ai/uditmanav17/mlops-zc-wb/runs/t8tzw8gc
wandb:   4 of 4 files downloaded.  
wandb: Waiting for W&B process to finish... (success).
wandb: 
wandb: Run history:
wandb: MSE ▁
wandb: 
wandb: Run summary:
wandb: MSE 2.4641
wandb: 
wandb:  View run

### Q5. Link the best model to the model registry

![Screenshot from W&B for Q5](wandb.png)