![header](https://i.imgur.com/sAPM7Yy.png)

# Instructions and Starter Code for the DAVIS Contest - PyTorch

This notebook demonstrates how to structure your code
and results for the DAVIS contest
by means of an end-to-end example using
the
[PyTorch](https://pytorch.org/docs/stable/index.html)
and
[PyTorch Lightning](https://pytorch-lightning.readthedocs.io/en/latest/)
deep learning frameworks.
See [this Colab notebook](http://wandb.me/davis-starter-keras)
for the same in TensorFlow/Keras.

You should feel free to make use of the code here and in
[the contest repo](https://github.com/wandb/davis-contest)
(installed via `pip` below and imported as `contest`)
to build your data engineering and model training pipelines,
but that's not strictly necessary to compete in the contest.
All that you need to do is produce your results
in an appropriately-formatted
Weights & Biases [Artifact](https://docs.wandb.ai/artifacts),
as described below,
and follow the instructions in the
[submission notebook](http://wandb.me/davis-submit).

In [None]:
%%capture

!pip install git+https://github.com/wandb/davis-contest.git#egg=contest[torch]

In [None]:
import os 

import wandb

import contest  # utilities for working with contest data
from contest.utils import clips, paths

## 0️⃣ Create a Weights & Biases account if you don't have one.

[Weights & Biases](https://wandb.ai/site)
is a developer toolkit for machine learning --
kind of like GitHub, but specialized
to the particular problems that come up in machine learning.

We'll be using it throughout the contest
to organize datasets,
track models during training,
and evaluate model performance for submission.

Run the cell below to either log in to Weights & Biases
or create a new account.
If you're participating in the contest,
make sure to sign up under your company email address.

In [None]:
wandb.login()

## 1️⃣ Download the training data

First, we need to download the training data
onto the machine we're using.
This same code will work on Google Colab and on your own machine.

The data is stored as a Weights & Biases
[Artifact](https://docs.wandb.ai/artifacts).
The Artifacts system allows you
to track the large binary files that are inputs to
and outputs of machine learning experiments.
Think of Artifacts like GitHub repositories,
but for data and models instead of code!

Your final submission in the contest
will be in the form of an Artifact.
Check out [this video tutorial](http://wandb.me/artifacts-video)
to learn more about how to use Artifacts,
or read the docs [here](https://docs.wandb.ai/artifacts/).

In [None]:
# picking out the training data artifact by name

entity = "charlesfrye"  # artifacts are associated with an entity -- a user or team
project = "davis"  # artifacts are associated with a project -- a collection of ML experiments
split = "train"  # the train and val data are both stored in the same format
tag = "latest"  # different versions of an Artifact have different tags

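# artifact ids always use forward slashes, so build the string directly
# rather than with os.path.join, which uses backslashes on Windows
training_data_artifact_id = f"{entity}/{project}/davis2016-{split}:{tag}"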
training_data_artifact_id

Calling `run.use_artifact` and then `.download()`
during a script downloads the Artifact and its files to a local directory,
if they aren't already present.

This cell contains the minimal code you need to get the training data.
Below, we'll see how to integrate Artifacts into your pipeline more fully,
so that you can, e.g., track which inputs a model was trained on.

In [None]:
with wandb.init(project=project, job_type="download") as run:
  training_data_artifact = run.use_artifact(training_data_artifact_id)
  training_data_dir = training_data_artifact.download()
  print("\ntraining data downloaded to " + training_data_dir)
  !ls {training_data_dir}

### Dataset format and exploration

You can view the training data
in the format used by all of the datasets,
including the test set
and submitted results,
[here](http://wandb.me/davis-train-data).
A short description of that format follows.

Every artifact you use or make for the contest
should have, at the top-level directory,
a file called `paths.json`,
which contains information on the paths to data files in the artifact.

This file is intended to be read as a
[pandas `DataFrame`](https://pandas.pydata.org/).
The resulting columns may include
- `"raw"`, for the input image files
- `"annotation"`, for the ground truth segmentation masks, as PNG files, and
- `"output"`, for model predictions. These will only be present for results saved as Artifacts.

Note that the test set, when provided,
will not have an `"annotation"` column,
so make sure your model can run on datasets that don't have that column and only have `"raw"` images!

![data-artifact-format](https://i.imgur.com/WQIXC0O.png)

The prefixes of paths are arbitrary and may have differing depths
(the examples below have three directories,
but other datasets may have a different number).

However, every path will have, at the end, two elements:
`{clip_name}/{12345}.jpg`
where
- `clip_name` is a string identifying the video clip to which the image belongs
and
- `12345` is a five-digit, [zero-filled](https://docs.python.org/3/library/stdtypes.html#str.zfill) number indicating the frame index of the image.
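
As a minimal sketch of how these two trailing elements can be recovered,
the helper below splits a path into its clip name and frame index.
The function names and the example path are hypothetical
(they are not part of the `contest` package),
and the sketch assumes paths use forward slashes.

```python
from pathlib import PurePosixPath


def parse_frame_path(path):
    """Split a data path into (clip_name, frame_index).

    Only the last two path elements are inspected, so prefixes
    of any depth are handled the same way.
    """
    p = PurePosixPath(path)
    clip_name = p.parent.name  # second-to-last path element
    frame_index = int(p.stem)  # e.g. "00042" -> 42
    return clip_name, frame_index


def frame_filename(frame_index, suffix=".jpg"):
    """Inverse direction: build the zero-filled file name for a frame index."""
    return str(frame_index).zfill(5) + suffix


# hypothetical path, for illustration only
example = "some/arbitrary/prefix/bear/00042.jpg"
clip_name, frame_index = parse_frame_path(example)
```

Because only the final two elements are used,
the same helper works for `"raw"`, `"annotation"`, and `"output"` paths alike.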

The columns are assumed to be indexed by integers,
and these integers are used
to match `"raw"` and `"annotation"` in the starter code and
to match `"output"` and `"annotation"`
in the submission evaluation code.

See below for an example.

![paths-content](https://i.imgur.com/Bh7EKte.png)
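
The index-based matching described above can be sketched with the standard library alone.
This example assumes `paths.json` uses the column-oriented layout
that `DataFrame.to_json` writes by default (`{column: {index: value}}`);
the file contents and the `matched_pairs` helper are made up for illustration.

```python
import json

# a tiny, made-up paths.json payload in the assumed layout:
# {column: {index: value}}
paths_json = """
{
  "raw":        {"0": "imgs/bear/00000.jpg", "1": "imgs/bear/00001.jpg"},
  "annotation": {"0": "anns/bear/00000.png", "1": "anns/bear/00001.png"}
}
"""


def matched_pairs(columns, left="raw", right="annotation"):
    """Pair entries of two columns that share the same integer index.

    Returns a list of (index, left_path, right_path) sorted by index.
    If the right-hand column is absent (as in a test set without
    annotations), right_path is None for every row.
    """
    left_col = columns[left]
    right_col = columns.get(right, {})
    pairs = []
    for key in sorted(left_col, key=int):  # JSON object keys are strings
        pairs.append((int(key), left_col[key], right_col.get(key)))
    return pairs


columns = json.loads(paths_json)
pairs = matched_pairs(columns)
```

Returning `None` for a missing column is one way
to keep the same code path working on data without annotations.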

For convenience, the data has also been packaged up into a
Weights & Biases Dataset Visualization Table [here](http://wandb.me/davis-train-table).
This format, pictured below, is convenient for exploring the data
and getting to know it better.

You can read more about DSviz Tables [here](https://docs.wandb.ai/datasets-and-predictions).

![data-table-format](https://i.imgur.com/mliFzqc.png)

## 2️⃣ Set up your data pipeline

Now that the data is downloaded to the filesystem,
we need to define a method for getting the data onto the GPU
and into the model.

This is much more complicated for big datasets,
like this one, that can't fit in GPU memory
comfortably alongside our model.

In the [GitHub repo for this contest](https://github.com/wandb/davis-contest),
we provide tools for loading data from disk using the
[PyTorch Lightning library](https://pytorch-lightning.readthedocs.io/en/stable/),
which helps organize and optimize complex PyTorch pipelines.

For more on using PyTorch Lightning with Weights & Biases,
check out
[this tutorial video](http://wandb.me/lit-video)
and [colab notebook](http://wandb.me/lit-colab)
or read the [W&B docs](https://docs.wandb.ai/integrations/lightning)
or the [PyTorch Lightning docs](https://pytorch-lightning.readthedocs.io/en/latest/api/pytorch_lightning.loggers.wandb.html).

In [None]:
print(contest.torch.data.VidSegDataModule.__doc__)

In [None]:
print(contest.torch.data.VidSegDataset.__doc__)

These tools load images without regard to which video they come from,
and so it's difficult if not impossible to build a model that
can make use of information over time,
which is very useful for this task.

One easy win over this baseline would be to rewrite this data-loading code
to load clips and then construct a model architecture that makes use of temporal sequence information.
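
One possible starting point for clip-aware loading, sketched below,
is to group frame paths by clip and cut each clip into short windows of neighboring frames.
This is not how the provided `VidSegDataset` works;
the helper names, the window length, and the example paths are all assumptions,
and actually reading the images is left out.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_clip(frame_paths):
    """Map clip name -> list of that clip's paths, sorted by frame index."""
    clips = defaultdict(list)
    for path in frame_paths:
        clips[PurePosixPath(path).parent.name].append(path)
    for paths_in_clip in clips.values():
        paths_in_clip.sort(key=lambda p: int(PurePosixPath(p).stem))
    return dict(clips)


def frame_windows(frame_paths, window=3):
    """Return every run of `window` neighboring frames.

    Windows never cross a clip boundary, and clips shorter
    than `window` contribute nothing.
    """
    windows = []
    for paths_in_clip in group_by_clip(frame_paths).values():
        for start in range(len(paths_in_clip) - window + 1):
            windows.append(paths_in_clip[start:start + window])
    return windows


# made-up paths: a 4-frame clip and a 2-frame clip
frame_paths = (
    [f"raw/bear/{i:05d}.jpg" for i in range(4)]
    + [f"raw/goat/{i:05d}.jpg" for i in range(2)]
)
windows = frame_windows(frame_paths, window=3)
```

Each window could then be stacked into a single model input,
so that the model sees a few frames of temporal context at once.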

### Splitting up the data

The small size of this dataset,
relative to the difficulty of the task,
increases the danger of over-fitting.

To help track this during training,
we'll split off some data into a holdout set
and track our performance on that data.

But we can't just randomly subsample specific frames,
the way holdout sets are constructed in image datasets.
That's because certain frames come from the same video,
or _clip_, and holding out, say,
every third frame from each clip
doesn't prevent over-fitting nearly as effectively
as holding out a third of the clips.

To make working with clips easier,
we provide utilities for splitting datasets
at the level of clips.
Use these tools as a blueprint
for setting up your own tools that are "clip-aware" --
for example, to build a model that makes use of temporal information.
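
To make the idea concrete, here is a minimal sketch of a clip-level split
using only the standard library.
It is not the implementation of `clips.split_on_clips`;
the function name, its arguments, and the example paths are hypothetical.

```python
import random
from pathlib import PurePosixPath


def clip_of(path):
    """The clip name is the second-to-last element of the path."""
    return PurePosixPath(path).parent.name


def split_paths_on_clips(frame_paths, training_fraction=0.8, seed=0):
    """Randomly assign whole clips to a training set or a holdout set.

    Returns (train_paths, holdout_paths). No clip contributes
    frames to both sides of the split.
    """
    clip_names = sorted({clip_of(p) for p in frame_paths})
    rng = random.Random(seed)  # local RNG so the split is reproducible
    rng.shuffle(clip_names)

    n_train = round(training_fraction * len(clip_names))
    train_clips = set(clip_names[:n_train])

    train_paths = [p for p in frame_paths if clip_of(p) in train_clips]
    holdout_paths = [p for p in frame_paths if clip_of(p) not in train_clips]
    return train_paths, holdout_paths


# made-up dataset: 5 clips with 3 frames each
frame_paths = [f"raw/clip{c}/{i:05d}.jpg" for c in range(5) for i in range(3)]
train_paths, holdout_paths = split_paths_on_clips(frame_paths)
```

The key design choice is that the random draw happens over clip names, not over frames,
so near-duplicate neighboring frames can never leak across the split.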

In [None]:
print(clips.split_on_clips.__doc__)

The code below will create a random split into training and holdout validation data,
at the level of clips, and then log the result
to a Weights & Biases artifact.
Notice the addition of a `paths.json` file to the artifact,
so that it matches the format of other artifacts.

`log_datasplit_artifact` demonstrates two steps needed to register an artifact on Weights & Biases:
1. `add_file`s or `add_dir`s to the artifact to build it, and then
2. upload the artifact to W&B servers using `run.log_artifact`.

See the [documentation](https://docs.wandb.ai/artifacts/api)
or the [tutorial video](http://wandb.me/artifacts-video) for more details on Artifacts.

In [None]:
def log_holdout_split(data_artifact, train_split_df, holdout_split_df):
  log_datasplit_artifact(data_artifact, train_split_df, "train")
  log_datasplit_artifact(data_artifact, holdout_split_df, "holdout")


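def log_datasplit_artifact(data_artifact, split_df, splitname, folder="wandb"):
  # data_artifact is accepted to mirror log_holdout_split, but is not used here
  dataset_artifact = wandb.Artifact(name=f"davis2016-split-{splitname}", type="split-data")
  os.makedirs(folder, exist_ok=True)  # make sure the output folder exists
  path = os.path.join(folder, splitname + ".json")
  split_df.to_json(path)
  # all artifacts in the contest need a paths.json file
  dataset_artifact.add_file(path, "paths.json")

  # upload the artifact to W&B via the currently active run
  wandb.run.log_artifact(dataset_artifact)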

In [None]:
config = {"training_fraction": 0.8}

with wandb.init(project=project,
                job_type="split-data", config=config) as run:
  training_data_artifact = run.use_artifact(training_data_artifact_id)
  paths_df = paths.artifact_paths(training_data_artifact)

  training_paths_df, holdout_paths_df = clips.split_on_clips(paths_df)
  log_holdout_split(training_data_artifact,
                    training_paths_df,
                    holdout_paths_df)

Notice also that this code makes use of the `training_data_artifact` with `run.use_artifact`.

Logging where data came from
(while simultaneously downloading it if need be!)
makes it easier to understand and reproduce your work later,
track down bugs or identify the cause of model regressions,
and otherwise understand how the data influenced your model.

For example, if you check the Artifacts tab
on the run page for this run on Weights & Biases
(see the auto-generated link produced when you run the cell above;
the Artifacts tab is accessed by clicking the icon
that looks like three hockey pucks in a stack),
you can see which artifacts were used during the run
and which were produced by it.

![artifact-io](https://i.imgur.com/Q7HzzF4.png)

This information is collated into a graph, as pictured below,
that can be used to survey the entire pipeline of your project all at once.
Use the Explode button to track individual runs and artifacts.

This graph is
accessible via the Graph View tab on an individual artifact's page
(see [here](http://wandb.me/davis-artifacts-graph-eg) for an example).

The Graph View is also covered in the [video tutorial for Artifacts](http://wandb.me/artifacts-video).

![artifacts-dag](https://i.imgur.com/F5sQIjz.png)

## 3️⃣ Define a model and train it

Now that the data pipeline is set up,
we can define a model that consumes the data
and learns the task.

It will need to take in images of arbitrary shape
and then return outputs of the same shape,
with values between 0 and 1,
with high values corresponding to
pixels that are more likely to be a part of the segmentation mask.

This notebook demonstrates the absolute simplest model
that can be applied to this data:
a spatial convolution that looks at only one pixel at a time.
This is fed into a `sigmoid` nonlinearity
so that the output values are normalized.
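
To see what "one pixel at a time" means, the sketch below
spells out a 1x1 convolution followed by a sigmoid in plain Python.
It is an illustration only, not the model code used in this notebook:
the weights and the image are made up,
and the image is laid out as height x width x channel for readability,
whereas PyTorch puts the channel dimension first.

```python
import math


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def pointwise_model(image, weights, bias):
    """Apply a 1x1 convolution followed by a sigmoid.

    image: nested sequences with shape [height][width][3] (RGB values)
    weights: three numbers, one per input channel
    bias: a single number
    Returns nested lists with shape [height][width], values in (0, 1).
    """
    return [
        [sigmoid(sum(w * c for w, c in zip(weights, pixel)) + bias)
         for pixel in row]
        for row in image
    ]


# a made-up 2x2 RGB image with values in [0, 1]
image = [[(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
         [(0.5, 0.2, 0.1), (0.9, 0.1, 0.4)]]
output = pointwise_model(image, weights=(1.0, -2.0, 0.5), bias=0.0)
```

Each output value depends only on the color of the corresponding input pixel,
which is why this model works on images of any height and width
and also why it cannot use any spatial or temporal context.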

### Model Code

In [None]:
import pytorch_lightning as pl
import torch
import torch.nn.functional as F

Four of the methods defined for this model -- `__init__`, `forward`, `training_step`, and `configure_optimizers` --
are the minimum required to define a PyTorch
[LightningModule](https://pytorch-lightning.readthedocs.io/en/stable/lightning_module.html),
which augments a typical PyTorch module with
some extra hooks and callbacks that enable
flexible but organized training loops.

In [None]:
class DummyModel(pl.LightningModule):

  def __init__(self):
    super().__init__()
    self.conv = torch.nn.Conv2d(in_channels=3, out_channels=1, kernel_size=1)

  def forward(self, xs):
    return torch.sigmoid(self.conv(xs))

  def training_step(self, batch, batch_idx):
    loss = self.forward_on_batch(batch)
    return loss

  def validation_step(self, batch, batch_idx):
    loss = self.forward_on_batch(batch)
    return loss

  def forward_on_batch(self, batch):
    xs, ys = batch
    y_hats = self.forward(xs)
    loss = F.binary_cross_entropy(y_hats, ys)
    return loss

  def configure_optimizers(self):
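    # note: with lr=0.0 the weights never change;
    # set a positive learning rate to actually train the model
    return torch.optim.SGD(self.parameters(), lr=0.0)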

### Training Code

The cell below uses a PyTorch Lightning
[Trainer](https://pytorch-lightning.readthedocs.io/en/stable/trainer.html)
to train the model.
Trainers handle plumbing tasks like
ensuring the data and the model are on the same device, e.g. the GPU,
turning on and off DropOut and BatchNorm as needed,
and coordinating the data module and the model.

We also use the Weights & Biases integration with PyTorch Lightning,
[`WandbLogger`](https://pytorch-lightning.readthedocs.io/en/stable/generated/pytorch_lightning.loggers.WandbLogger.html),
to track training and log the run to W&B.
Head to the run page
(the link appears once you run the cell)
to watch this information come in live
or review it afterwards --
system metrics, gradient information,
and loss metrics all get logged without any extra effort.

For more on using Weights & Biases with PyTorch Lightning,
check out [this video tutorial](http://wandb.me/lit-video).

Note that the model is saved as an artifact as well.
This makes it easier to run the model on evaluation data,
e.g. on a different machine,
and will make it easier to share your model with the judges
at the end of the contest.

In [None]:
model_artifact_name = "dummy-baseline"

config = {"batch_size": 32,
          "max_epochs": 1,
          "gpus": 1}

with wandb.init(project=project, config=config, job_type="train") as run:

  training_data_artifact = run.use_artifact(training_data_artifact_id)
  training_data_artifact.download()

  trainsplit_artifact = run.use_artifact("davis2016-split-train:latest")
  trainsplit_paths = paths.get_paths(trainsplit_artifact)

  holdoutsplit_artifact = run.use_artifact("davis2016-split-holdout:latest")
  holdoutsplit_paths = paths.get_paths(holdoutsplit_artifact)

  datamodule = contest.torch.data.VidSegDataModule(
      trainsplit_paths, holdoutsplit_paths,
      batch_size=wandb.config["batch_size"])
  datamodule.setup()

  model = DummyModel()
  wandb.config["nparams"] = contest.torch.profile.count_params(model)
  wandb.config["nflops"] = contest.torch.profile.count_flops(model, torch.cuda.device(0))
  
  logger = pl.loggers.wandb.WandbLogger(experiment=run)
  logger.watch(model, log_freq=2)

  trainer = pl.Trainer(
    gpus=wandb.config["gpus"], max_epochs=wandb.config["max_epochs"],
    logger=logger, log_every_n_steps=1)
  
  trainer.fit(model, datamodule)

  model_artifact_id = contest.torch.utils.save_model_to_artifact(
    model, "wandb/final_model", model_artifact_name)

Two things to point out here:
1. Hyperparameters for the run are stored in the `config` dictionary at the top,
which is passed to `wandb.init` so that the values are logged to W&B.
The code then reads them back from `wandb.config`,
which guarantees that the logged values are the values actually used.
2. The `n`umber of `param`eters in the model is added to `wandb.config` later,
calculated using the `torch.profile` tools provided for the contest.
While you don't need to track this during training,
this information _must_ be included with your submission
(as described in the next section)
and be under the limits in the contest description,
or else the submission is invalid.
If your model's parameters cannot be counted with the methods we provide,
you're responsible for ensuring they are counted correctly.
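
If you need to sanity-check a parameter count by hand,
the arithmetic for a standard convolutional layer is short.
The helper below is a hypothetical hand check, not the contest's `count_params`,
and it covers only ordinary (ungrouped) 2D convolutions.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size, bias=True):
    """Number of learnable parameters in a standard (groups=1) 2D convolution."""
    kernel_h, kernel_w = kernel_size
    # one kernel_h x kernel_w filter per (input channel, output channel) pair
    n_weights = out_channels * in_channels * kernel_h * kernel_w
    # one bias per output channel, if biases are used
    n_biases = out_channels if bias else 0
    return n_weights + n_biases


# the single layer in DummyModel:
# Conv2d(in_channels=3, out_channels=1, kernel_size=1)
dummy_model_nparams = conv2d_param_count(3, 1, (1, 1))
```

For the `DummyModel` above this gives three weights plus one bias.
Summing the same quantity over every layer gives the total for a purely convolutional model.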

## 4️⃣ Run your model on the evaluation data

As the contest runs,
you can submit your performance on the validation data
to be included on a public leaderboard.

Final standings will be determined based on performance on a test set,
not on this validation set.
The test set will be released,
without labels,
in the last 72 hours of the contest
(the "testing phase").
During that time,
participants will submit their model's results
to be ranked on a private leaderboard.

It's a well-known phenomenon that the best performers on validation data
are not always the best performers on new test data,
even in restricted settings like Kaggle competitions.
The difficulty of the task and the heterogeneity of video data
make this especially likely for this contest,
as is common in production machine learning.

In order to provide a framework-independent format for results
that can be used for both validation and test data,
the submission generation process has been split into two steps:
1. Execute the model on the evaluation data, logging a "result" artifact and run to Weights & Biases with a specific structure, described below.
2. Submit the results to a Weights & Biases benchmark

During the training phase,
use [this notebook](http://wandb.me/davis-submit)
to submit to the benchmark.
Once the testing phase opens, follow the provided instructions.

Note that the validation data contains labels,
but the test data will not!
Take care to write your result generation code
so that it will run even if no annotations are provided.

We provide starter code for both PyTorch and Keras,
using the simple dataloaders provided for the training loop above.
If you change the data pipeline,
you may need to write your own code here.
You should aim to still produce a `pd.Series` of `output_paths`
that can be passed to `contest.evaluate.make_result_artifact`
so you can make sure your results are in the right format.

In [None]:
evaluation_artifact_name = os.path.join(entity, project, "davis2016-val" +":" + tag)

model_tag = "latest"

result_artifact_name = model_artifact_name + "-result"

output_dir = os.path.join("outputs")
!rm -rf {output_dir}
!mkdir -p {output_dir}

In [None]:
with wandb.init(project=project, job_type="run-val") as run:
  evaluation_data_artifact = run.use_artifact(evaluation_artifact_name)
  evaluation_data_paths = paths.artifact_paths(evaluation_data_artifact)

  evaluation_dataset = contest.torch.data.VidSegDataset(
    evaluation_data_paths, has_annotations=False)
  num_images = len(evaluation_dataset)

  evaluation_dataloader = torch.utils.data.DataLoader(
    evaluation_dataset, batch_size=1)

  model = contest.torch.utils.load_model_from_artifact(
    model_artifact_name + ":" + model_tag, DummyModel) 

  print("\n")
  device = torch.cuda.device(0)
  nparams = contest.torch.profile.count_params(model)
  nflops = contest.torch.profile.count_flops(model, device)

  # the number of parameters in the model must be logged
  profiling_info = {"nparams": nparams, "nflops": nflops}
  wandb.log(profiling_info)

  output_paths = contest.torch.evaluate.run(
    model, evaluation_dataloader, num_images, output_dir)

  # the number of parameters in the model should also be included in the result artifact
  result_artifact = contest.evaluate.make_result_artifact(
    output_paths, result_artifact_name, metadata=profiling_info)
  run.log_artifact(result_artifact)

Check out the page for the result artifact associated with this run
(link appears above after executing the cell)
in order to see an example of a formatted result.

A result artifact looks much like a dataset artifact
-- it has a `paths.json` file,
along with files that are pointed to by the contents of that file --
but it need only contain a single key: `"output"`.
The output files are black and white PNG files with unsigned 8-bit pixel values between 0 and 255 that represent the model's confidence that a given pixel
in the image is part of the segmentation mask.
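
If you write your own result-generation code,
model outputs between 0 and 1 need to be converted to 8-bit pixel values before saving.
The sketch below shows one way to do that, using a linear scaling with rounding.
Whether the contest's own `contest.torch.evaluate.run` scales values in exactly this way
is an assumption, and the helper names are hypothetical.
Writing the PNG file itself is left out.

```python
def confidence_to_uint8(confidence):
    """Map a confidence in [0, 1] to an integer pixel value in [0, 255]."""
    clipped = min(max(confidence, 0.0), 1.0)  # guard against values outside [0, 1]
    return int(round(clipped * 255))


def mask_to_uint8(mask):
    """Convert a [height][width] grid of confidences to 8-bit pixel values."""
    return [[confidence_to_uint8(c) for c in row] for row in mask]


# a made-up 2x2 grid of model confidences
pixels = mask_to_uint8([[0.0, 0.25], [0.999, 1.0]])
```

Clipping first means that small numerical overshoots
cannot wrap around when the values are stored as unsigned 8-bit integers.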

Models are required to obey a parameter count constraint,
and the parameter count information must be reported as part of the result.
If your result does not have the parameter count both logged
with the run and associated with the artifact,
it will be declared invalid.
See the discussion above in the model training section or
[the GitHub repository for the contest](https://github.com/wandb/davis-contest)
for more.

## 5️⃣ Submit your results to the leaderboard on Weights & Biases

Once you've run an evaluation job like the one above and produced a results artifact,
you're almost ready to submit to the contest.

Head over to [this notebook](http://wandb.me/davis-submit) for the last two steps.