# Evaluating Novel View Synthesis Models

In this notebook, we show how to obtain metrics for a given trained novel view synthesis model.

We compute the metrics as follows:
- We calculate the `PSNR`, `SSIM`, and `LPIPS` metrics for each image separately. These metric values are then averaged over all images in a given scene.
- The `FID` metric is calculated over all predicted and ground-truth images of a scene in a single pass.
- To obtain the final metrics for the full dataset, we average the per-scene metrics over all scenes. Every scene receives the same weight.
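
The aggregation scheme above can be sketched in a few lines of Python. This is a minimal illustration with made-up numbers, not the code used by the evaluation suite; the function names `scene_mean` and `dataset_mean` and the dictionary layout are assumptions made for this example. `FID` is left out because it is computed once per scene rather than per image.

```python
from statistics import mean


def scene_mean(per_image_values):
    """Average a per-image metric (e.g. PSNR) over all images of one scene."""
    return mean(per_image_values)


def dataset_mean(per_scene_images):
    """Average the per-scene means, giving every scene the same weight."""
    return mean(scene_mean(values) for values in per_scene_images.values())


# Hypothetical PSNR values; scene_b has more images than scene_a.
psnr = {
    "scene_a": [20.0, 30.0],
    "scene_b": [10.0, 10.0, 10.0, 10.0],
}

print(dataset_mean(psnr))  # 17.5
```

Because every scene has the same weight, a scene with many images does not dominate the result: pooling all six images in this example would give 15.0 instead of 17.5.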


## Model prediction directory structure

We assume that the rendered images from your model follow the same directory structure as the downloaded scenes.

If an image to be evaluated has the location

```bash
/path/to/wayve_scenes_101/<scene_name>/images/<camera>/<imname>.jpeg
```

we expect the corresponding predicted image to be located at

```bash
/path/to/your/predictions/<scene_name>/images/<camera>/<imname>.jpeg
```
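
The mapping between the two locations can be sketched as follows. This is only an illustration of the path convention, assuming POSIX-style paths; the helper `predicted_path` and the image name `000001` are hypothetical and not part of the `wayve_scenes` package.

```python
from pathlib import PurePosixPath


def predicted_path(target_image, dir_target, dir_pred):
    """Map a ground-truth image path to the expected prediction path."""
    # Keep the <scene_name>/images/<camera>/<imname>.jpeg part unchanged
    relative = PurePosixPath(target_image).relative_to(dir_target)
    return PurePosixPath(dir_pred) / relative


# Hypothetical ground-truth image used only for illustration
target = "/path/to/wayve_scenes_101/scene_002/images/front-forward/000001.jpeg"
pred = predicted_path(target, "/path/to/wayve_scenes_101", "/path/to/your/predictions")
print(pred)
```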

## Evaluation on full dataset

In the following, we show how to run our evaluation suite on the full WayveScenes101 dataset.

We generate nested dictionaries of metrics, breaking down the metrics per image, per scene, and on the full dataset. We generate separate metrics for:
- The train split of each scene (cameras: `left-forward`, `right-forward`, `left-backward`, `right-backward`)
- The test split (`front-forward` camera)
- All cameras
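
The split assignment listed above depends only on the camera name. A minimal sketch of that rule, assuming the camera names match the directory names, is shown below; the constants and the helper `split_for_camera` are hypothetical and are not taken from the evaluation suite.

```python
# Camera names as listed above; constant names are assumptions for this sketch
TRAIN_CAMERAS = {"left-forward", "right-forward", "left-backward", "right-backward"}
TEST_CAMERAS = {"front-forward"}


def split_for_camera(camera):
    """Return the split ("train" or "test") that a camera belongs to."""
    if camera in TRAIN_CAMERAS:
        return "train"
    if camera in TEST_CAMERAS:
        return "test"
    raise ValueError(f"Unknown camera: {camera}")


print(split_for_camera("front-forward"))
```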

In [None]:
import json

from wayve_scenes.evaluation import evaluate_submission


# Replace these paths with the locations of the dataset and your predictions on your machine
dir_target = '/path/to/wayve_scenes_101/'
dir_pred = '/path/to/model_predictions/'

# Perform the evaluation
metrics_dict_all, metrics_dict_train, metrics_dict_test = evaluate_submission(dir_pred, dir_target)

# Print the final metrics
print(json.dumps(metrics_dict_all, indent=4))

## Evaluation on specific scenes

With the WayveScenes101 scene metadata, we can also analyze model performance on specific subsets of scenes.

In the example below, we obtain the metrics for the scene `scene_002`.

In [None]:
metrics_all, metrics_train, metrics_test = evaluate_submission(dir_pred, dir_target, scene_list=["scene_002"])

### Evaluation on scenes with specific properties

We can also use the scene metadata file to select a specific subset of scenes for evaluation. In the example below, we obtain the metrics for nighttime scenes and compare them with the metrics for daytime scenes.

> Note: In this example, we use the `scene_metadata.csv` to select scenes to evaluate. Thus, we assume that all dataset scenes are downloaded and extracted in the `wayve_scenes_101` directory.

In [None]:
import pandas as pd

# Load the metadata dataframe
scenes_df = pd.read_csv("../data/scene_metadata.csv")

# Select the scenes recorded at a specific time of day
scenes_df_daytime = scenes_df[scenes_df["Time of Day"] == "Day"]
scenes_df_nighttime = scenes_df[scenes_df["Time of Day"] == "Night"]

# Evaluate the daytime and nighttime subsets separately
metrics_daytime_all, metrics_daytime_train, metrics_daytime_test = evaluate_submission(dir_pred, dir_target, scene_list=scenes_df_daytime["scene_id"].values)
metrics_nighttime_all, metrics_nighttime_train, metrics_nighttime_test = evaluate_submission(dir_pred, dir_target, scene_list=scenes_df_nighttime["scene_id"].values)

print("Daytime metrics test split: ")
print(json.dumps(metrics_daytime_test, indent=4))

print("Nighttime metrics test split: ")
print(json.dumps(metrics_nighttime_test, indent=4))