# Week 2 Project

Once again, we’ll be using the [Women's Ecommerce Clothing Reviews Dataset from Kaggle](https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews), which Kaggle states is a 

> dataset revolving around the reviews written by customers. Its nine supportive features offer a great environment to parse out the text through its multiple dimensions. Because this is real commercial data, it has been anonymized, and references to the company in the review text and body have been replaced with “retailer”. 

The machine learning task will be sentiment analysis, classifying each review as having positive or negative sentiment.
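
The task statement does not say how a star rating becomes a sentiment label. The notebook code in this project treats ratings of 4 and above as positive, so a labeling rule could be sketched as follows. The function name `label_from_rating` and the `threshold` argument are illustrative assumptions, not part of the course template.

```python
def label_from_rating(rating: int, threshold: int = 4) -> int:
    """Return 1 (positive) when rating >= threshold, otherwise 0 (negative)."""
    return 1 if rating >= threshold else 0


# Toy ratings for illustration, not rows from the dataset.
ratings = [5, 4, 3, 1]
labels = [label_from_rating(r) for r in ratings]
print(labels)  # [1, 1, 0, 0]
```

Keeping the threshold as a parameter makes it easy to check how sensitive the class balance is to where the positive/negative cut is placed.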

## Task 1: Training and Evaluating Sentiment Analysis Models Using Metaflow

In this task, you'll use Metaflow to build two machine learning models for sentiment analysis: a baseline *"majority class"* classifier and your own custom model. You'll then train both models in parallel and experiment with different hyperparameters to optimize their performance. Finally, you'll use this notebook and the Metaflow Client API to analyze the results of your different models and hyperparameters. Here's what you'll need to do:

### Step 1: Build the Workflows
The first step in this task is to build the workflow(s) for sentiment analysis using the Metaflow framework. Start by creating a new flow in Metaflow and implementing the baseline *"majority class"* classifier. Then, build your own custom classifier using techniques you learned in Week 1, or any [helpful resources](https://outerbounds.com/docs/nlp-tutorial-L2/) you'd like. For your custom model, be sure to include steps for data preprocessing, model training, and evaluation.

### Step 2: Train Both Models in Parallel
Once you've built your models, train both in parallel using Metaflow: have one step fan out into a separate step per model, then join the branches in a later step. If you get stuck, review the [FlowSpec branching documentation](https://docs.metaflow.org/metaflow/basics#branch).

### Step 3: Experiment with Hyperparameters
After you've trained both models in parallel, the next step is to experiment with different hyperparameters to optimize their performance. Try different values for hyperparameters such as learning rate, batch size, and number of epochs, and record the results for each combination of hyperparameters as Data Artifacts in Metaflow.
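
One way to enumerate every combination of hyperparameters before looping over them is a small grid built with the standard library. This is only a sketch: the values in `search_space` are placeholders chosen for illustration, not tuned recommendations.

```python
from itertools import product

# Hypothetical search space; the values are placeholders, not recommendations.
search_space = {
    "learning_rate": [0.001, 0.002],
    "batch_size": [32, 64],
    "epochs": [5, 10],
}

# Build one dict per combination, e.g. to loop over inside a training step.
names = list(search_space)
hyperparam_set = [
    dict(zip(names, values)) for values in product(*search_space.values())
]

print(len(hyperparam_set))  # 8
print(hyperparam_set[0])    # {'learning_rate': 0.001, 'batch_size': 32, 'epochs': 5}
```

Each dict can be stored next to its scores, so the combination that produced a result is always recoverable from the saved artifact.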

### Step 4: Analyze the Results
Finally, use this notebook and the Metaflow Client API to analyze the results of your different models and hyperparameters. Create visualizations to compare the performance of the two models and identify the best hyperparameters for each one.

By completing this task, you'll gain experience working with the Metaflow framework and learn how to build and optimize machine learning workflows for sentiment analysis.

In [2]:
from collections import Counter
import pandas as pd
import numpy as np 
from termcolor import colored
import matplotlib.pyplot as plt
import seaborn as sns
import string
from sklearn.model_selection import train_test_split

# You can style your plots here, but it is not part of the project.
YELLOW = '#FFBC00'
GREEN = '#37795D'
PURPLE = '#5460C0'
BACKGROUND = '#F4EBE6'
colors = [GREEN, PURPLE]
custom_params = {
    'axes.spines.right': False, 'axes.spines.top': False,
    'axes.facecolor':BACKGROUND, 'figure.facecolor': BACKGROUND, 
    'figure.figsize':(8, 8)
}
sns_palette = sns.color_palette(colors, len(colors))
sns.set_theme(style='ticks', rc=custom_params)

In [3]:
# TODO: load the data. 
df = pd.read_csv("/home/workspace/workspaces/full-stack-ml-metaflow-corise-week-2/data/Womens Clothing E-Commerce Reviews.csv", index_col=0)

# transformations
df.columns = ["_".join(name.lower().strip().split()) for name in df.columns]
df = df[~df.review_text.isna()]
df['review'] = df['review_text'].astype('str')
_has_review_df = df[df['review_text'] != 'nan']
reviews = _has_review_df['review_text']
labels = np.where(_has_review_df.rating >= 4, 1, 0)
df = pd.DataFrame({'label': labels, **_has_review_df})

# split into training and validation.
_df = pd.DataFrame({'review': reviews, 'label': labels})
traindf, valdf = train_test_split(_df, test_size=0.2)

In [7]:
df.head()

| | label | clothing_id | age | title | review_text | rating | recommended_ind | positive_feedback_count | division_name | department_name | class_name | review |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 767 | 33 | | Absolutely wonderful - silky and sexy and comf... | 4 | 1 | 0 | Initmates | Intimate | Intimates | Absolutely wonderful - silky and sexy and comf... |
| 1 | 1 | 1080 | 34 | | Love this dress! it's sooo pretty. i happene... | 5 | 1 | 4 | General | Dresses | Dresses | Love this dress! it's sooo pretty. i happene... |
| 2 | 0 | 1077 | 60 | Some major design flaws | I had such high hopes for this dress and reall... | 3 | 0 | 0 | General | Dresses | Dresses | I had such high hopes for this dress and reall... |
| 3 | 1 | 1049 | 50 | My favorite buy! | I love, love, love this jumpsuit. it's fun, fl... | 5 | 1 | 0 | General Petite | Bottoms | Pants | I love, love, love this jumpsuit. it's fun, fl... |
| 4 | 1 | 847 | 47 | Flattering shirt | This shirt is very flattering to all due to th... | 5 | 1 | 6 | General | Tops | Blouses | This shirt is very flattering to all due to th... |


In [8]:
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score as acc
from sklearn.metrics import roc_auc_score as rocauc

# TODO: build the majority class baseline model.
dummy_clf = DummyClassifier(strategy="most_frequent")
X_train, y_train = traindf['review'], traindf['label']
X_val, y_val = valdf['review'], valdf['label']
dummy_clf.fit(X_train, y_train)
# TODO: find the majority class in the labels. 🤔
preds = dummy_clf.predict(X_val)
scores = dummy_clf.predict_proba(X_val)
# TODO: score the model on valdf with a 2D metric space: sklearn.metrics.accuracy_score, sklearn.metrics.roc_auc_score
# Documentation on suggested model-scoring approach: https://scikit-learn.org/stable/modules/model_evaluation.html
base_acc = acc(y_val, preds)
base_rocauc = rocauc(y_val, scores[:,1])
print(f"Baseline model performance:\n{base_acc} Accuracy\n{base_rocauc} ROC AUC")

Baseline model performance:
0.7644071538971076 Accuracy
0.5 ROC AUC
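
Both numbers are what a constant predictor should give: its accuracy equals the share of the majority class in the validation set, and its ROC AUC is 0.5 because every example receives the same score, so positives and negatives cannot be ranked apart. The sketch below reproduces that behaviour on toy labels using only the standard library. The helper names `majority_baseline` and `pairwise_auc` are illustrative and are not scikit-learn functions.

```python
from collections import Counter


def majority_baseline(train_labels, val_labels):
    """Predict the most common training label for every validation example."""
    majority, _ = Counter(train_labels).most_common(1)[0]
    preds = [majority] * len(val_labels)
    accuracy = sum(p == y for p, y in zip(preds, val_labels)) / len(val_labels)
    return majority, accuracy


def pairwise_auc(labels, scores):
    """ROC AUC as P(score_pos > score_neg), with ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels for illustration, not the review data.
train = [1, 1, 1, 0]
val = [1, 1, 1, 0, 0]

majority, accuracy = majority_baseline(train, val)
auc = pairwise_auc(val, [1.0] * len(val))  # constant score for every example
print(majority, accuracy, auc)  # 1 0.6 0.5
```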


In [9]:
# Cleanup
import gc
gc.collect()

0

In [11]:
%%writefile model.py
# TODO: modify this custom model to your liking. Check out this tutorial for more on this class: https://outerbounds.com/docs/nlp-tutorial-L2/
# TODO: train the model on traindf.
# TODO: score the model on valdf with _the same_ 2D metric space you used in previous cell.
# TODO: test your model works by importing the model module in notebook cells, and trying to fit traindf and score predictions on the valdf data!

import tensorflow as tf
from tensorflow.keras import layers, optimizers, regularizers
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.feature_extraction.text import CountVectorizer

class NbowModel():
    def __init__(self, vocab_sz):

        self.vocab_sz = vocab_sz

        # Instantiate the CountVectorizer
        self.cv = CountVectorizer(
            min_df=.005, max_df = .75, stop_words='english', 
            strip_accents='ascii', max_features=self.vocab_sz
        )

        # Define the keras model
        inputs = tf.keras.Input(shape=(self.vocab_sz,), 
                                name='input')
        x = layers.Dropout(0.10)(inputs)
        x = layers.Dense(
            15, activation="relu",
            kernel_regularizer=regularizers.L1L2(l1=1e-5, l2=1e-4)
        )(x)
        predictions = layers.Dense(1, activation="sigmoid",)(x)
        self.model = tf.keras.Model(inputs, predictions)
        opt = optimizers.Adam(learning_rate=0.002)
        self.model.compile(loss="binary_crossentropy", 
                           optimizer=opt, metrics=["accuracy"])

    def fit(self, X, y):
        # Learn the vocabulary from the raw text, then train on the dense count matrix.
        res = self.cv.fit_transform(X).toarray()
        self.model.fit(x=res, y=y, batch_size=32, 
                       epochs=10, validation_split=.2)
    
    def predict(self, X):
        # Reuse the fitted vocabulary; the vectorizer is not refit at prediction time.
        res = self.cv.transform(X).toarray()
        return self.model.predict(res)
    
    def eval_acc(self, X, labels, threshold=.5):
        return accuracy_score(labels, 
                              self.predict(X) > threshold)
    
    def eval_rocauc(self, X, labels):
        return roc_auc_score(labels,  self.predict(X))

    @property
    def model_dict(self): 
        return {'vectorizer':self.cv, 'model': self.model}

    @classmethod
    def from_dict(cls, model_dict):
        "Get Model from dictionary"
        nbow_model = cls(len(
            model_dict['vectorizer'].vocabulary_
        ))
        nbow_model.model = model_dict['model']
        nbow_model.cv = model_dict['vectorizer']
        return nbow_model

    def __str__(self):
        return "NbowModel"

Overwriting model.py


In [20]:
%%writefile baseline_challenge.py
# TODO: In this cell, write your BaselineChallenge flow in the baseline_challenge.py file.

from metaflow import FlowSpec, step, Flow, current, Parameter, IncludeFile, card
from metaflow.cards import Table, Markdown, Artifact, Image
import numpy as np 
from dataclasses import dataclass

labeling_function = ... # TODO: Define your labeling function here.

@dataclass
class ModelResult:
    "A custom struct for storing model evaluation results."
    name: str
    params: object  # str for the baseline, dict of hyperparameters for custom models
    pathspec: str
    acc: float
    rocauc: float

class BaselineChallenge(FlowSpec):

    split_size = Parameter('split-sz', default=0.2)
    data = IncludeFile('data', default='Womens Clothing E-Commerce Reviews.csv')
    kfold = Parameter('k', default=5)
    scoring = Parameter('scoring', default='accuracy')

    @step
    def start(self):

        import pandas as pd
        import io 
        from sklearn.model_selection import train_test_split
        
        # load dataset packaged with the flow.
        # this technique is convenient when working with small datasets that need to move to remote tasks.
        df = self.data 
        # TODO: load the data.
        df = pd.read_csv(io.StringIO(self.data)) 
        # Look up a few lines to the IncludeFile('data', default='Womens Clothing E-Commerce Reviews.csv'). 
        # You can find documentation on IncludeFile here: https://docs.metaflow.org/scaling/data#data-in-local-files


        # filter down to reviews and labels 
        df.columns = ["_".join(name.lower().strip().split()) for name in df.columns]
        df = df[~df.review_text.isna()]
        df['review'] = df['review_text'].astype('str')
        _has_review_df = df[df['review_text'] != 'nan']
        reviews = _has_review_df['review_text']
        labels = np.where(_has_review_df.rating >= 4, 1, 0)
        self.df = pd.DataFrame({'label': labels, **_has_review_df})

        # split the data 80/20, or by using the flow's split-sz CLI argument
        _df = pd.DataFrame({'review': reviews, 'label': labels})
        self.traindf, self.valdf = train_test_split(_df, test_size=self.split_size)
        print(f'num of rows in train set: {self.traindf.shape[0]}')
        print(f'num of rows in validation set: {self.valdf.shape[0]}')

        self.next(self.baseline, self.model)

    @step
    def baseline(self):
        "Compute the baseline"
        from sklearn.dummy import DummyClassifier       
        from sklearn.metrics import accuracy_score as acc
        from sklearn.metrics import roc_auc_score as rocauc
        
        self._name = "baseline"
        params = "Predict majority class"
        pathspec = f"{current.flow_name}/{current.run_id}/{current.step_name}/{current.task_id}"
        dummy_clf = DummyClassifier(strategy="most_frequent")
        X_train, y_train = self.traindf['review'], self.traindf['label']
        X_val, y_val = self.valdf['review'], self.valdf['label']
        dummy_clf.fit(X_train, y_train)
        # TODO: find the majority class in the labels. 🤔
        # TODO: score the model on valdf with a 2D metric space: sklearn.metrics.accuracy_score, sklearn.metrics.roc_auc_score
        # Documentation on suggested model-scoring approach: https://scikit-learn.org/stable/modules/model_evaluation.html
        self.predictions = dummy_clf.predict(X_val)
        self.scores = dummy_clf.predict_proba(X_val)
        self.baseline_accuracy = acc(y_val, self.predictions)
        self.baseline_roc_auc = rocauc(y_val, self.scores[:,1])
        print(f"Baseline model performance:\n{self.baseline_accuracy} Accuracy\n{self.baseline_roc_auc} ROC AUC")
        self.result = ModelResult("Baseline", params, pathspec, self.baseline_accuracy, self.baseline_roc_auc)
        self.next(self.aggregate)

    @step
    def model(self):

        # TODO: import your model if it is defined in another file.
        from model import NbowModel
        self._name = "model"  # str(NbowModel) would yield "<class 'model.NbowModel'>", not a readable name
        # NOTE: If you followed the link above to find a custom model implementation, 
            # you will have noticed your model's vocab_sz hyperparameter.
            # Too big of vocab_sz causes an error. Can you explain why? 
        self.hyperparam_set = [{'vocab_sz': 100}, {'vocab_sz': 300}, {'vocab_sz': 500}]  
        pathspec = f"{current.flow_name}/{current.run_id}/{current.step_name}/{current.task_id}"

        self.results = []
        for params in self.hyperparam_set:
            model = NbowModel(**params) # TODO: instantiate your custom model here!
            model.fit(X=self.traindf['review'], y=self.traindf['label'])  # fit on the training split only, so valdf stays unseen
            acc = model.eval_acc(self.valdf["review"],self.valdf["label"]) # TODO: evaluate your custom model in an equivalent way to accuracy_score.
            rocauc = model.eval_rocauc(self.valdf["review"],self.valdf["label"]) # TODO: evaluate your custom model in an equivalent way to roc_auc_score.
            self.results.append(ModelResult(f"NbowModel - vocab_sz: {params['vocab_sz']}", params, pathspec, acc, rocauc))

        self.next(self.aggregate)

    @step
    def aggregate(self, inputs):
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == '__main__':
    BaselineChallenge()

Overwriting baseline_challenge.py


In [21]:
! python baseline_challenge.py run --data "/home/workspace/workspaces/full-stack-ml-metaflow-corise-week-2/data/Womens Clothing E-Commerce Reviews.csv"

[35m[1mMetaflow 2.8.4+ob(v1)[0m[35m[22m executing [0m[31m[1mBaselineChallenge[0m[35m[22m[0m[35m[22m for [0m[31m[1muser:sandbox[0m[35m[22m[K[0m[35m[22m[0m
[35m[22mValidating your flow...[K[0m[35m[22m[0m
[32m[1m    The graph looks good![K[0m[32m[1m[0m
[35m[22mRunning pylint...[K[0m[35m[22m[0m
[32m[1m    Pylint is happy![K[0m[32m[1m[0m
[22mCreating local datastore in current directory (/home/workspace/workspaces/full-stack-ml-metaflow-corise-week-2/project/.metaflow)[K[0m[22m[0m
[22mIncluding file /home/workspace/workspaces/full-stack-ml-metaflow-corise-week-2/data/Womens Clothing E-Commerce Reviews.csv of size 8MB [K[0m[22m[0m
[35m2023-05-07 12:38:12.421 [0m[1mWorkflow starting (run-id 123), see it in the UI at https://ui-pw-363809672.outerbounds.dev/BaselineChallenge/123[0m
[35m2023-05-07 12:38:12.595 [0m[32m[123/start/717 (pid 20648)] [0m[1mTask is starting.[0m
[35m2023-05-07 12:38:15.313 [0m[32m[123/start/717 

## Task 2: Anticipating Failure in Your Machine Learning Project

In this task, you'll practice anticipating potential failure modes in a sentiment analysis classifier and develop strategies to mitigate them. Here's what you'll need to do:

### Step 1: Identify Potential Failure Modes
The first step in anticipating failure in your machine learning project is to identify potential failure modes. Start by brainstorming ways in which your project could fail from an engineering point of view. For example, your model could overfit to the training data or suffer from data bias.

- Data Bias: The dataset could be biased towards a particular group or sentiment, leading to a model that is not representative of the broader population.
- Overfitting: The model could perform well on the training data but poorly on new data due to overfitting.
- Outliers: The presence of outliers in the data could skew the model's predictions.
- Insufficient Data: There may not be enough data to build a robust model.
- Feature Engineering: The model's performance could be limited by poor feature engineering.
- Inappropriate Model Selection: Choosing the wrong type of model for the problem could lead to poor performance.
- Lack of domain knowledge: Lack of knowledge in the domain the model is built for can lead to poor model performance.


### Step 2: Develop Strategies to Mitigate Failure Modes


Once you've identified potential failure modes, the next step is to develop strategies to mitigate them. Think about what measures you could take to fix the issue if it were to occur. For example, if your model is overfitting to the training data, you could try regularization techniques such as L1 or L2 regularization to reduce the complexity of the model.

- Data bias: Collect a diverse set of training data that includes samples from various demographics and backgrounds to avoid bias in the model.
- Overfitting: Use regularization techniques such as L1 or L2 regularization to reduce the complexity of the model and prevent overfitting to the training data.
- Lack of data: Use data augmentation techniques such as synonym replacement, back-translation, or synthetic data generation to increase the amount of training data, or start from pretrained word embeddings so the model needs fewer labeled examples.
- Outliers: Use robust statistical techniques such as median absolute deviation or interquartile range to identify and remove outliers from the dataset before training the model.
- Model complexity: Experiment with different model architectures and hyperparameters to find the simplest model that still achieves good performance on the validation set.
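
As an illustration of the outlier strategy above, the interquartile range rule can be applied to a simple numeric feature such as review length. This is a minimal sketch: the word counts are made up, and the `iqr_bounds` helper and the conventional `k=1.5` multiplier are assumptions rather than part of the project code.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences at k * IQR below Q1 and above Q3."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up word counts per review; 400 is the obvious outlier.
word_counts = [42, 55, 61, 48, 50, 58, 45, 400]
low, high = iqr_bounds(word_counts)
outliers = [n for n in word_counts if n < low or n > high]
print(outliers)  # [400]
```

Whether flagged reviews should be dropped, truncated, or kept is a judgment call; an unusually long review can still carry a clear sentiment signal.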

### Step 3: Plan Ahead to Avoid Failure Modes

Finally, it's important to plan ahead to avoid potential failure modes in the first place. Think about what you could have done initially to avoid these failure modes. For example, you could have collected a diverse set of training data to reduce data bias or experimented with different model architectures to find the best solution for your problem.

The key to anticipating failure in your machine learning project is to be proactive rather than reactive. By identifying potential failure modes ahead of time and developing strategies to mitigate them, you'll be better equipped to build a successful machine learning project.

- Data Collection: Collect a diverse set of training data to reduce data bias. Ensure that the data is representative of the real-world scenarios and covers a wide range of use cases.
- Feature Engineering: Use feature engineering techniques to transform the raw data into meaningful features that capture the important aspects of the problem. This can help in reducing the complexity of the model and prevent overfitting.
- Model Selection: Experiment with different model architectures to find the best solution for your problem. Try out different algorithms and models to determine which one works best for your dataset.
- Hyperparameter Tuning: Tune the hyperparameters of your model to optimize its performance. Use techniques like cross-validation to ensure that your model is generalizing well and not overfitting to the training data.
- Regularization Techniques: Use regularization techniques such as L1 or L2 regularization to reduce the complexity of the model and prevent overfitting.
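
To make the cross-validation point concrete, here is a small sketch of how k-fold splitting partitions row indices so that each example is used for validation exactly once. The `kfold_indices` helper is hypothetical and uses contiguous, unshuffled folds for readability. In practice you would usually shuffle first, or use a library implementation.

```python
def kfold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, unshuffled folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


folds = list(kfold_indices(n_samples=10, k=5))
print(len(folds))   # 5
print(folds[0][1])  # [0, 1]
```

Averaging a metric over the folds gives a steadier estimate than a single train/validation split, at the cost of training the model k times.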

## Task 3: Visualizing ML Results with MF Cards
Now it is time to iterate. Extend the flow in your `baseline_challenge.py` file to include a step that aggregates all of the results from the hyperparameter tuning jobs and logs the results and a data visualization in a Metaflow card.
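
Before wiring anything into a card, it can help to think of aggregation as plain list manipulation: collect one result object per model, sort, and turn each into a table row. The sketch below uses a trimmed-down stand-in for the flow's `ModelResult` dataclass and made-up scores. Neither the numbers nor the candidate names come from an actual run.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    """Trimmed stand-in for the flow's result struct (illustrative only)."""
    name: str
    acc: float
    rocauc: float


# Made-up numbers for illustration only, not results from this project.
results = [
    ModelResult("Baseline", 0.76, 0.50),
    ModelResult("Candidate A", 0.80, 0.85),
    ModelResult("Candidate B", 0.82, 0.88),
]

# Rank by accuracy, breaking ties with ROC AUC, then format rows for a table.
ranked = sorted(results, key=lambda r: (r.acc, r.rocauc), reverse=True)
rows = [[r.name, f"{r.acc:.3f}", f"{r.rocauc:.3f}"] for r in ranked]
print(rows[0])  # ['Candidate B', '0.820', '0.880']
```

Rows shaped like this are a list of lists, which is the general shape a results table expects: one inner list per model, one entry per column.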

In [28]:
%%writefile baseline_challenge.py
# TODO: In this cell, write your BaselineChallenge flow in the baseline_challenge.py file.

from metaflow import FlowSpec, step, Flow, current, Parameter, IncludeFile, card
from metaflow.cards import Table, Markdown, Artifact, Image
import numpy as np 
from dataclasses import dataclass

labeling_function = ... # TODO: Define your labeling function here.

@dataclass
class ModelResult:
    "A custom struct for storing model evaluation results."
    name: str
    params: object  # str for the baseline, dict of hyperparameters for custom models
    pathspec: str
    acc: float
    rocauc: float

class BaselineChallenge(FlowSpec):

    split_size = Parameter('split-sz', default=0.2)
    data = IncludeFile('data', default='Womens Clothing E-Commerce Reviews.csv')
    kfold = Parameter('k', default=5)
    scoring = Parameter('scoring', default='accuracy')

    @step
    def start(self):

        import pandas as pd
        import io 
        from sklearn.model_selection import train_test_split
        
        # load dataset packaged with the flow.
        # this technique is convenient when working with small datasets that need to move to remote tasks.
        df = self.data 
        # TODO: load the data.
        df = pd.read_csv(io.StringIO(self.data)) 
        # Look up a few lines to the IncludeFile('data', default='Womens Clothing E-Commerce Reviews.csv'). 
        # You can find documentation on IncludeFile here: https://docs.metaflow.org/scaling/data#data-in-local-files


        # filter down to reviews and labels 
        df.columns = ["_".join(name.lower().strip().split()) for name in df.columns]
        df = df[~df.review_text.isna()]
        df['review'] = df['review_text'].astype('str')
        _has_review_df = df[df['review_text'] != 'nan']
        reviews = _has_review_df['review_text']
        labels = np.where(_has_review_df.rating >= 4, 1, 0)
        self.df = pd.DataFrame({'label': labels, **_has_review_df})

        # split the data 80/20, or by using the flow's split-sz CLI argument
        _df = pd.DataFrame({'review': reviews, 'label': labels})
        self.traindf, self.valdf = train_test_split(_df, test_size=self.split_size)
        print(f'num of rows in train set: {self.traindf.shape[0]}')
        print(f'num of rows in validation set: {self.valdf.shape[0]}')

        self.next(self.baseline, self.model)

    @step
    def baseline(self):
        "Compute the baseline"
        from sklearn.dummy import DummyClassifier       
        from sklearn.metrics import accuracy_score as acc
        from sklearn.metrics import roc_auc_score as rocauc
        
        self._name = "baseline"
        params = "Predict majority class"
        pathspec = f"{current.flow_name}/{current.run_id}/{current.step_name}/{current.task_id}"
        dummy_clf = DummyClassifier(strategy="most_frequent")
        X_train, y_train = self.traindf['review'], self.traindf['label']
        X_val, y_val = self.valdf['review'], self.valdf['label']
        dummy_clf.fit(X_train, y_train)
        # TODO: find the majority class in the labels. 🤔
        # TODO: score the model on valdf with a 2D metric space: sklearn.metrics.accuracy_score, sklearn.metrics.roc_auc_score
        # Documentation on suggested model-scoring approach: https://scikit-learn.org/stable/modules/model_evaluation.html
        self.predictions = dummy_clf.predict(X_val)
        self.scores = dummy_clf.predict_proba(X_val)
        self.baseline_accuracy = acc(y_val, self.predictions)
        self.baseline_roc_auc = rocauc(y_val, self.scores[:,1])
        print(f"Baseline model performance:\n{self.baseline_accuracy} Accuracy\n{self.baseline_roc_auc} ROC AUC")
        self.result = ModelResult("Baseline", params, pathspec, self.baseline_accuracy, self.baseline_roc_auc)
        self.next(self.aggregate)

    @step
    def model(self):

        # TODO: import your model if it is defined in another file.
        from model import NbowModel
        self._name = "model"
        # NOTE: If you followed the link above to find a custom model implementation, 
            # you will have noticed your model's vocab_sz hyperparameter.
            # Too big of vocab_sz causes an error. Can you explain why? 
        self.hyperparam_set = [{'vocab_sz': 100}, {'vocab_sz': 300}, {'vocab_sz': 500}]  
        pathspec = f"{current.flow_name}/{current.run_id}/{current.step_name}/{current.task_id}"

        self.results = []
        for params in self.hyperparam_set:
            model = NbowModel(**params) # TODO: instantiate your custom model here!
            model.fit(X=self.traindf['review'], y=self.traindf['label'])  # fit on the training split only, so valdf stays unseen
            acc = model.eval_acc(self.valdf["review"],self.valdf["label"]) # TODO: evaluate your custom model in an equivalent way to accuracy_score.
            rocauc = model.eval_rocauc(self.valdf["review"],self.valdf["label"]) # TODO: evaluate your custom model in an equivalent way to roc_auc_score.
            self.results.append(ModelResult(f"NbowModel - vocab_sz: {params['vocab_sz']}", params, pathspec, acc, rocauc))

        self.next(self.aggregate)

    def add_one(self, rows, result, df):
        "A helper function to load results."
        rows.append([
            Markdown(result.name),
            Artifact(result.params),
            Artifact(result.pathspec),
            Artifact(result.acc),
            Artifact(result.rocauc)
        ])
        df['name'].append(result.name)
        df['accuracy'].append(result.acc)
        return rows, df

    @card(type="corise") # TODO: Set your card type to "corise". 
            # I wonder what other card types there are?
            # https://docs.metaflow.org/metaflow/visualizing-results
            # https://github.com/outerbounds/metaflow-card-altair/blob/main/altairflow.py
    @step
    def aggregate(self, inputs):

        import seaborn as sns
        import matplotlib.pyplot as plt
        from matplotlib import rcParams 
        rcParams.update({'figure.autolayout': True})

        rows = []
        violin_plot_df = {'name': [], 'accuracy': []}
        for task in inputs:
            if task._name == "model": 
                for result in task.results:
                    print(result)
                    rows, violin_plot_df = self.add_one(rows, result, violin_plot_df)
            elif task._name == "baseline":
                print(task.result)
                rows, violin_plot_df = self.add_one(rows, task.result, violin_plot_df)
            else:
                raise ValueError("Unknown task._name type. Cannot parse results.")
            
        current.card.append(Markdown("# All models from this flow run"))

        # TODO: Add a Table of the results to your card! 
        current.card.append(
            Table(
                rows, # TODO: What goes here to populate the Table in the card? 
                headers=["Model name", "Params", "Task pathspec", "Accuracy", "ROCAUC"]
            )
        )
        
        fig, ax = plt.subplots(1,1)
        plt.xticks(rotation=40)
        sns.violinplot(data=violin_plot_df, x="name", y="accuracy", ax=ax)
        
        # TODO: Append the matplotlib fig to the card
        # Docs: https://docs.metaflow.org/metaflow/visualizing-results/easy-custom-reports-with-card-components#showing-plots
        current.card.append(Image.from_matplotlib(fig))
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == '__main__':
    BaselineChallenge()

Overwriting baseline_challenge.py


In [29]:
! python baseline_challenge.py run --data "/home/workspace/workspaces/full-stack-ml-metaflow-corise-week-2/data/Womens Clothing E-Commerce Reviews.csv" 

[35m[1mMetaflow 2.8.4+ob(v1)[0m[35m[22m executing [0m[31m[1mBaselineChallenge[0m[35m[22m[0m[35m[22m for [0m[31m[1muser:sandbox[0m[35m[22m[K[0m[35m[22m[0m
[35m[22mValidating your flow...[K[0m[35m[22m[0m
[32m[1m    The graph looks good![K[0m[32m[1m[0m
[35m[22mRunning pylint...[K[0m[35m[22m[0m
[32m[1m    Pylint is happy![K[0m[32m[1m[0m
[22mIncluding file /home/workspace/workspaces/full-stack-ml-metaflow-corise-week-2/data/Womens Clothing E-Commerce Reviews.csv of size 8MB [K[0m[22m[0m
[35m2023-05-07 12:46:52.874 [0m[1mWorkflow starting (run-id 126), see it in the UI at https://ui-pw-363809672.outerbounds.dev/BaselineChallenge/126[0m
[35m2023-05-07 12:46:53.052 [0m[32m[126/start/733 (pid 22275)] [0m[1mTask is starting.[0m
[35m2023-05-07 12:46:55.778 [0m[32m[126/start/733 (pid 22275)] [0m[22mnum of rows in train set: 18112[0m
[35m2023-05-07 12:46:58.437 [0m[32m[126/start/733 (pid 22275)] [0m[22mnum of rows in val