<a href="https://colab.research.google.com/github/wandb/davis-contest/blob/main/colabs/starter.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Instructions and Starter Code for Submitting Results in the DAVIS Contest

In [1]:
%%capture

!pip install wandb
!pip install --ignore-installed git+https://github.com/wandb/davis-contest.git#egg=contest
!pip install ptflops pytorch_lightning

In [2]:
import os 

import wandb

import contest
from contest.utils import clips, paths

## 0️⃣ Create a Weights & Biases account if you don't have one.

## 1️⃣ Download the training data from Weights & Biases

In [3]:
entity = "charlesfrye"
project = "davis"
mode = "train"
tag = "latest"

training_data_artifact_name = os.path.join(entity, project, f"davis2016-{mode}") + ":" + tag
training_data_artifact_name

'charlesfrye/davis/davis2016-train:latest'

In [4]:
with wandb.init(project=project, job_type="download") as run:
  training_data_artifact = run.use_artifact(training_data_artifact_name)
  training_data_dir = training_data_artifact.download()
  print("\ntraining data downloaded to " + training_data_dir)

[34m[1mwandb[0m: Currently logged in as: [33mcharlesfrye[0m (use `wandb login --relogin` to force relogin)


[34m[1mwandb[0m: Downloading large artifact davis2016-train:latest, 428.99MB. 6904 files... 


training data downloaded to ./artifacts/davis2016-train:v1


VBox(children=(Label(value=' 0.00MB of 0.00MB uploaded (0.00MB deduped)\r'), FloatProgress(value=1.0, max=1.0)…

### Viewing the Dataset in Weights & Biases

Link to dsviz version, include screenshots.

## 2️⃣ Define and train a model on the data

### Splitting up the data

In [5]:
print(clips.split_on_clips.__doc__)

Splits a DataFrame of paths to images into two pieces,
  train and holdout, while respecting clip differences.
  See get_clips for information on how clip differences are defined.
  
  Parameters:
    paths_df: pd.DataFrame
      DataFrame whose columns are collections of paths to files.
    columns: None or iterable of strings
      If None, uses default of get_clips, otherwise checks for clips
      based on information in all of columns (see get_clip).
      Inferred clip identity must agree across columns for all rows.
    split: float
      Fraction of clips to put into train split. Non-integer totals are rounded down.
      
  Returns:
    train_split: pd.DataFrame
      DataFrame of paths for training set. Index is reset to integers.
    holdout_split: pd.DataFrame
      DataFrame of paths for holdout set. Index is reset to integers.
  


In [6]:
print(contest.torch.data.VidSegDataModule.__doc__)

From a pd.DataFrame of paths to training images and their annotations,
  and optionally another pd.DataFrame of paths to holdout images and their annotations,
  generates a pl.LightningDataModule suitable for training a torch.Module on the
  training images and validating it on the holdout images.
  
  If only a single pd.DataFrame is provided, that pd.DataFrame is split into two,
  with the fraction put into the training split given by the split argument.
  
  See the PyTorch Lightning docs for details on pl.LightningDataModule:
    https://pytorch-lightning.readthedocs.io/en/stable/datamodules.html?highlight=lightningdatamodule
    
  and so is not suitable for multi-GPU training.
  


In [7]:
print(contest.torch.data.VidSegDataset.__doc__)

From a pd.DataFrame of paths to image files and (optionally)
  to segmentation annotation images for those images,
  creates a simple subclass of torch.utils.data.Dataset suitable for use in
  a Video Segmentation task.
  


First, set up the validation split, at a clipwise level.

In [8]:
def log_holdout_split(data_artifact, train_split_df, holdout_split_df):
  log_datasplit_artifact(data_artifact, train_split_df, "train")
  log_datasplit_artifact(data_artifact, holdout_split_df, "holdout")


def log_datasplit_artifact(data_artifact, split_df, splitname, folder="wandb"):
  dataset_artifact = wandb.Artifact(name=f"davis2016-split-{splitname}", type="split-data")
  path = os.path.join(folder, splitname + ".json")
  split_df.to_json(path)
  dataset_artifact.add_file(path, "paths.json")

  wandb.run.log_artifact(dataset_artifact)

In [9]:
config = {"training_fraction": 0.8}

with wandb.init(project=project,
                job_type="split-data", config=config) as run:
  training_data_artifact = run.use_artifact(training_data_artifact_name)
  paths_df = paths.artifact_paths(training_data_artifact)

  training_paths_df, holdout_paths_df = clips.split_on_clips(paths_df)
  log_holdout_split(training_data_artifact,
                    training_paths_df,
                    holdout_paths_df)

[34m[1mwandb[0m: Currently logged in as: [33mcharlesfrye[0m (use `wandb login --relogin` to force relogin)


[34m[1mwandb[0m: Downloading large artifact davis2016-train:latest, 428.99MB. 6904 files... 

VBox(children=(Label(value=' 0.24MB of 0.31MB uploaded (0.00MB deduped)\r'), FloatProgress(value=0.79879907908…

### Model Code

In [4]:
import pytorch_lightning as pl
import torch
import torch.nn.functional as F

In [5]:
class DummyModel(pl.LightningModule):

  def __init__(self):
    super().__init__()
    self.conv = torch.nn.Conv2d(in_channels=3, out_channels=1, kernel_size=1)

  def forward(self, xs):
    return torch.sigmoid(self.conv(xs))

  def training_step(self, batch, batch_idx):
    loss = self.forward_on_batch(batch)
    return loss

  def validation_step(self, batch, batch_idx):
    loss = self.forward_on_batch(batch)
    return loss

  def forward_on_batch(self, batch):
    xs, ys = batch
    y_hats = self.forward(xs)
    loss = F.binary_cross_entropy(y_hats, ys)
    return loss

  def configure_optimizers(self):
    return torch.optim.SGD(self.parameters(), lr=0.1)

  def count_params(self):
    return sum(p.numel() for p in self.parameters())

For a more realistic model, see _this notebook_.

### Training Code

#### Training the model

In [6]:
model_artifact_name = "dummy-baseline"

In [11]:
config = {"batch_size": 32,
          "max_epochs": 1,
          "gpus": 1}

with wandb.init(project=project, config=config, job_type="train") as run:

  training_data_artifact = run.use_artifact(training_data_artifact_name)
  training_data_artifact.download()

  trainsplit_artifact = run.use_artifact("davis2016-split-train:latest")
  trainsplit_paths = paths.get_paths(trainsplit_artifact)

  holdoutsplit_artifact = run.use_artifact("davis2016-split-holdout:latest")
  holdoutsplit_paths = paths.get_paths(holdoutsplit_artifact)

  datamodule = contest.torch.data.VidSegDataModule(
      trainsplit_paths, holdoutsplit_paths,
      batch_size=wandb.config["batch_size"])
  datamodule.setup()

  model = DummyModel()
  wandb.config["nparams"] = contest.torch.profile.count_params(model)
  wandb.config["nflops"] = contest.torch.profile.count_flops(model, torch.cuda.device(0))
  
  logger = pl.loggers.wandb.WandbLogger(experiment=run)
  logger.watch(model, log_freq=2)

  trainer = pl.Trainer(
    gpus=wandb.config["gpus"], max_epochs=wandb.config["max_epochs"],
    logger=logger, log_every_n_steps=1) 
  
  trainer.fit(model, datamodule)

  model_artifact_name = contest.torch.utils.save_model_to_artifact(
    model, "wandb/final_model", model_artifact_name)

GPU available: True, used: True
TPU available: None, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


DummyModel(
  0.0 M, 100.000% Params, 0.002 GMac, 100.000% MACs, 
  (conv): Conv2d(0.0 M, 100.000% Params, 0.002 GMac, 100.000% MACs, 3, 1, kernel_size=(1, 1), stride=(1, 1))
)



  | Name | Type   | Params
--------------------------------
0 | conv | Conv2d | 4     
--------------------------------
4         Trainable params
0         Non-trainable params
4         Total params


HBox(children=(FloatProgress(value=1.0, bar_style='info', description='Validation sanity check', layout=Layout…



HBox(children=(FloatProgress(value=1.0, bar_style='info', description='Training', layout=Layout(flex='2'), max…

HBox(children=(FloatProgress(value=1.0, bar_style='info', description='Validating', layout=Layout(flex='2'), m…




VBox(children=(Label(value=' 0.00MB of 0.00MB uploaded (0.00MB deduped)\r'), FloatProgress(value=0.42520080321…

0,1
_step,0
_runtime,52
_timestamp,1611895272


0,1
_step,▁
_runtime,▁
_timestamp,▁


## 3️⃣ Run your model on the evaluation data

Once you've run your model on the evaluation data,
there's two steps to submission:

1. Log an "evaluation run" to W&B, using _this notebook_.
2. Submit the results to _the benchmark_.

Describe format of the results.

In [7]:
evaluation_artifact_name = os.path.join(entity, project, "davis2016-val" +":" + tag)

In [8]:
model_tag = "latest"

In [9]:
output_dir = os.path.join("outputs")
!rm -rf output_dir
!mkdir -p {output_dir}

In [10]:
result_artifact_name = model_artifact_name + "-result"

In [11]:
with wandb.init(project=project, job_type="run-val") as run:
  evaluation_data_artifact = run.use_artifact(evaluation_artifact_name)
  evaluation_data_paths = paths.artifact_paths(evaluation_data_artifact)

  evaluation_dataset = contest.torch.data.VidSegDataset(
    evaluation_data_paths, has_annotations=False)
  num_images = len(evaluation_dataset)

  evaluation_dataloader = torch.utils.data.DataLoader(
    evaluation_dataset, batch_size=1)

  model = contest.torch.utils.load_model_from_artifact(
    model_artifact_name + ":" + model_tag, DummyModel) 

  print("\n")
  device = torch.cuda.device(0)
  nparams = contest.torch.profile.count_params(model)
  nflops = contest.torch.profile.count_flops(model, device)

  wandb.log({"nparams": nparams, "nflops": nflops})

  output_paths = contest.torch.evaluate.run(
    model, evaluation_dataloader, num_images, output_dir)

  result_artifact = contest.evaluate.make_result_artifact(
    output_paths, result_artifact_name)
  run.log_artifact(result_artifact)

[34m[1mwandb[0m: Currently logged in as: [33mcharlesfrye[0m (use `wandb login --relogin` to force relogin)


[34m[1mwandb[0m: Downloading large artifact davis2016-val:latest, 428.93MB. 6904 files... 



DummyModel(
  0.0 M, 100.000% Params, 0.002 GMac, 100.000% MACs, 
  (conv): Conv2d(0.0 M, 100.000% Params, 0.002 GMac, 100.000% MACs, 3, 1, kernel_size=(1, 1), stride=(1, 1))
)


[34m[1mwandb[0m: Adding directory to artifact (./outputs)... Done. 0.7s


VBox(children=(Label(value=' 0.00MB of 79.95MB uploaded (0.00MB deduped)\r'), FloatProgress(value=6.8944009770…

0,1
nparams,4
nflops,818880
_step,0
_runtime,4
_timestamp,1611896043


0,1
nparams,▁
nflops,▁
_step,▁
_runtime,▁
_timestamp,▁


In [None]:
  ii = 0
  output_paths = pd.DataFrame([np.nan] * len(evaluation_dataset), columns=["output"])
  with torch.no_grad():
    for eval_batch in iter(evaluation_dataloader):
      outputs = model.forward(eval_batch)
      outputs = contest.torch.utils.to_numpy_int_arrays(outputs)

      for output in outputs:
        path = Path(contest.utils.image.save_from_array(output, output_dir, ii))
        output_paths["output"].iloc[ii] = str(path.relative_to(Path(output_dir).parent))
        ii += 1

In [None]:
 output_paths.to_json(output_paths_path)
  result_artifact = wandb.Artifact(name=result_artifact_name, type="result",
                                   metadata={"nparams": nparams, "nflops": nflops})
  result_artifact.add_dir(output_dir, "outputs")
  result_artifact.add_file(output_paths_path, "paths.json")

In [22]:
output_paths_path = os.path.join("wandb", "output_paths.json")

## 4️⃣ Submit your results to the leaderboard on Weights & Biases

Once you've run an evaluation job like the one above and produced a results Artifact,
you're almost ready to submit to the contest.

Head over to _this notebook_ for the last two steps.