<a href="https://colab.research.google.com/github/yasu-k2/multimodal-active-inference/blob/main/sound_spaces.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# SoundSpaces

[web site](https://soundspaces.org/)

[main repo](https://github.com/facebookresearch/sound-spaces)


## Description

- Tasks
  - PointGoal
  - AudioGoal
  - AudioPointGoal

- [Challenge](https://github.com/facebookresearch/soundspaces-challenge)
  - AudioNav Task
  - Metric is ['Success weighted by Path Length' (SPL)](https://eval.ai/web/challenges/challenge-page/1621/evaluation)

- Datasets
  - **[Replica-Dataset (Replica Dataset v1)](https://github.com/facebookresearch/Replica-Dataset)**
    - 18 scenes
      - **apartment 0-2**
      - office 0-4
      - room 0-2
      - hotel 0
      - FRL apartment 0-5
    - ReplicaSDK
      - ReplicaViewer
      - ReplicaRenderer
    - Smaller in size
    - download script [available](https://raw.githubusercontent.com/facebookresearch/Replica-Dataset/main/download.sh)
  - [Matterport3D](https://niessner.github.io/Matterport/)
    - 90 scenes
    - Used for challenge
    - Need to request access
  - Note: keep the total dataset size below 100GB on Colab.

- Data
  - audio renderings (room impulse responses; RIRs), 867GB
    - Replica
      - full binaural, 81GB
    - Matterport
      - full binaural, 682GB
      - full ambisonic, 3.6T
  - metadata of each scene, 1MB
  - episode datasets, 77MB -> 115MB
  - mono sound files, 13MB -> 640MB
  - pretrained weights, 303MB

- Baselines
  - `av-nav` Audio-Visual Navigation (AV-Nav) Model
  - `av-wan` Audio-Visual Waypoints (AV-WaN) Model
  - `savi` Semantic Audio-Visual Navigation (SAVi) Model
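
The SPL metric named under Challenge above can be sketched as follows. This is a minimal illustration of the standard definition (success multiplied by the ratio of the shortest-path length to the longer of the agent's path and the shortest path, averaged over episodes). The function name and the tuple layout are hypothetical and are not taken from the challenge's evaluation code.

```python
from typing import Iterable, Tuple


def success_weighted_path_length(
    episodes: Iterable[Tuple[bool, float, float]]
) -> float:
    """Average SPL over (success, shortest_path_length, agent_path_length) tuples."""
    scores = []
    for success, shortest, taken in episodes:
        denominator = max(taken, shortest)
        if not success or denominator <= 0:
            # Failed episodes score 0. A zero-length episode also scores 0 here,
            # which is an arbitrary choice made for this sketch.
            scores.append(0.0)
            continue
        scores.append(shortest / denominator)
    return sum(scores) / len(scores) if scores else 0.0


# Three made-up episodes: optimal success, success with a detour, failure.
example = [(True, 5.0, 5.0), (True, 5.0, 10.0), (False, 5.0, 3.0)]
print(success_weighted_path_length(example))
```

A successful episode that takes twice the shortest path contributes 0.5, so detours are penalised even when the goal is reached.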


## Installation

The entire process took around 1.5 hours on Colab.

### habitat-sim (v0.1.7)

- simulator for embodied AI
- requires Python>=3.7.
- latest: v0.2.1

```
!conda create -n habitat python=3.7 cmake=3.14.0
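# In a notebook each `!` command runs in its own subshell, so this activation does
# not carry over to later lines; run these commands in one terminal session instead.
!conda activate habitat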
# Installation for a machine without an attached display
!conda install habitat-sim=0.1.7 withbullet headless -c conda-forge -c aihabitat
```

Test the habitat-sim installation (options include `--enable_physics` and `--save_png`)
```
!python habitat-sim/examples/example.py --scene /data/scene_datasets/habitat-test-scenes/skokloster-castle.glb
```

### habitat-lab (v0.1.7)

- embodied AI tasks and agents
- `Env`, `Dataset`, `Episode`, `Task`, `Sensor`, `Observation`
- requires Python>=3.7. Python 3.7 preferred.
- latest: v0.2.1

```
!git clone https://github.com/facebookresearch/habitat-lab.git --branch v0.1.7
# Use %cd, not !cd: a `!` command runs in its own subshell, so `!cd` would not persist
%cd habitat-lab
# Install only core of Habitat Lab
!pip install -e .
# Include habitat_baselines (PPO, SLAM, utilities)
!pip install -r requirements.txt
!python setup.py develop --all
# Return to the parent directory
%cd ..
```

Test habitat-lab installation
```
!python habitat-lab/examples/example.py
```
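
Before running the example scripts, a quick importability check can save a long traceback. The sketch below uses only the standard library. The helper name `check_importable` is hypothetical, and a module being found does not guarantee that the simulator actually runs.

```python
import importlib.util


def check_importable(modules):
    """Return {module_name: bool} telling whether each top-level module can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


status = check_importable(["habitat_sim", "habitat"])
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```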

### Helper script and settings

The installation script is based on [`colab_install.sh`](https://github.com/facebookresearch/habitat-sim/blob/main/examples/colab_utils/colab_install.sh) from the official repo, and the settings come from the official examples for [habitat-sim](https://github.com/facebookresearch/habitat-sim/blob/main/examples/tutorials/colabs/) and [habitat-lab](https://github.com/facebookresearch/habitat-lab/blob/main/examples/tutorials/colabs/).

In [None]:
!curl -L https://raw.githubusercontent.com/yasu-k2/multimodal-active-inference/main/colab_install_habitat.sh | bash -s

In [None]:
%cd /content/habitat-sim

In [None]:
# !wget -c http://dl.fbaipublicfiles.com/habitat/habitat-test-scenes.zip && unzip -o habitat-test-scenes.zip
## !wget -c http://dl.fbaipublicfiles.com/habitat/objects_v0.2.zip && unzip -o objects_v0.2.zip -d data/objects/
## !wget -c http://dl.fbaipublicfiles.com/habitat/locobot_merged_v0.2.zip && unzip -o locobot_merged_v0.2.zip -d data/objects

In [None]:
#!rm habitat-test-scenes.zip
## !rm objects_v0.2.zip
## !rm locobot_merged_v0.2.zip

In [None]:
# !python examples/example.py --scene data/scene_datasets/habitat-test-scenes/skokloster-castle.glb

In [None]:
%cd /content/habitat-lab

In [None]:
# Some errors with habitat_baselines
# !python setup.py test

In [None]:
# !python examples/example.py

In [None]:
# !python examples/benchmark.py

In [None]:
%cd /content/habitat-sim

```bash
# !pip uninstall --yes pyopenssl
# !pip install pyopenssl
```

```python
# reload the cffi version
# import sys
# if "google.colab" in sys.modules:
#     import importlib
#     import cffi
#     importlib.reload(cffi)
```

```python
import math
import os
import random
import sys

import git
import imageio
import magnum as mn
import numpy as np
%matplotlib inline
from matplotlib import pyplot as plt
from PIL import Image

# You need to restart runtime before importing habitat
import habitat
import habitat_sim

try:
    import ipywidgets as widgets
    from IPython.display import display as ipydisplay
    # For using jupyter/ipywidget IO components
    HAS_WIDGETS = True
except ImportError:
    HAS_WIDGETS = False

if "google.colab" in sys.modules:
    os.environ["IMAGEIO_FFMPEG_EXE"] = "/usr/bin/ffmpeg"

repo = git.Repo(".", search_parent_directories=True)
dir_path = repo.working_tree_dir
%cd $dir_path

data_path = os.path.join(dir_path, "data")
output_directory = "output/"  ## Based on your preference
output_path = os.path.join(dir_path, output_directory)
if not os.path.exists(output_path):
    os.mkdir(output_path)

# define some globals the first time we run.
if "sim" not in globals():
    global sim
    sim = None
    global obj_attr_mgr
    obj_attr_mgr = None
    global prim_attr_mgr
    prim_attr_mgr = None
    global stage_attr_mgr
    stage_attr_mgr = None
    global rigid_obj_mgr
    rigid_obj_mgr = None
```

## Install SoundSpaces

In [None]:
%cd /content

In [None]:
!git clone https://github.com/facebookresearch/sound-spaces.git

In [None]:
%cd sound-spaces

In [None]:
!pip install -e .

## Download dataset

In [None]:
!mkdir data

In [None]:
%cd data

In [None]:
# !wget http://dl.fbaipublicfiles.com/SoundSpaces/binaural_rirs.tar && tar xvf binaural_rirs.tar
!wget http://dl.fbaipublicfiles.com/SoundSpaces/metadata.tar.xz && tar xvf metadata.tar.xz
!wget http://dl.fbaipublicfiles.com/SoundSpaces/sounds.tar.xz && tar xvf sounds.tar.xz
!wget http://dl.fbaipublicfiles.com/SoundSpaces/datasets.tar.xz && tar xvf datasets.tar.xz
!wget http://dl.fbaipublicfiles.com/SoundSpaces/pretrained_weights.tar.xz && tar xvf pretrained_weights.tar.xz

In [None]:
# !rm binaural_rirs.tar
!rm metadata.tar.xz
!rm sounds.tar.xz
!rm datasets.tar.xz
!rm pretrained_weights.tar.xz

In [None]:
# Replica-Dataset
!apt-get install pigz

In [None]:
# replica_v1_0.tar.gz.partaa ~ .partap 1.86GB, .partaq 1.73GB -> 17 files (31.5GB) in total, takes about 45min to download
# -> 43GB after extraction
!curl -L https://raw.githubusercontent.com/yasu-k2/multimodal-active-inference/main/download_replica.sh | bash -s 

In [None]:
!rm replica_v1_0.tar.gz.parta*

In [None]:
%cd data

In [None]:
!rm -r room_0 room_1 room_2
!rm -r office_0 office_1 office_2 office_3 office_4
!rm -r hotel_0
!rm -r frl_apartment_0 frl_apartment_1 frl_apartment_2 frl_apartment_3 frl_apartment_4 frl_apartment_5

In [None]:
%cd ..

In [None]:
%cd metadata/replica/

In [None]:
!rm -r room_0 room_1 room_2
!rm -r office_0 office_1 office_2 office_3 office_4
!rm -r hotel_0
!rm -r frl_apartment_0 frl_apartment_1 frl_apartment_2 frl_apartment_3 frl_apartment_4 frl_apartment_5

In [None]:
%cd ../..
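
The repeated `rm -r` cells above keep only the `apartment_*` scenes. The same pruning can be sketched in Python. `prune_scenes` and its arguments are hypothetical names, and the function deletes directories permanently when `dry_run=False`, so try the default dry run first.

```python
import shutil
from pathlib import Path

# Scenes to keep, matching the ones highlighted in the Description section.
KEEP = {"apartment_0", "apartment_1", "apartment_2"}


def prune_scenes(root, keep=KEEP, dry_run=True):
    """Remove every sub-directory of `root` whose name is not in `keep`.

    Returns the sorted names that were (or, in a dry run, would be) removed.
    """
    removed = []
    for entry in sorted(Path(root).iterdir()):
        if entry.is_dir() and entry.name not in keep:
            removed.append(entry.name)
            if not dry_run:
                shutil.rmtree(entry)
    return removed


# Usage (paths as used in this notebook, relative to sound-spaces/data):
# print(prune_scenes("metadata/replica"))            # preview only
# prune_scenes("metadata/replica", dry_run=False)    # actually delete
```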

In [None]:
# Matterport3D

In [None]:
%cd metadata/mp3d/

In [None]:
!rm -r *

In [None]:
%cd ../..

In [None]:
# Organize relevant files > FROM HERE
%cd /content/sound-spaces/data

In [None]:
!rm -r datasets/audionav/mp3d/
!rm -r datasets/semantic_audionav/mp3d/
!rm -r metadata/mp3d/
!rm -r pretrained_weights/audionav/av_nav/mp3d/
!rm -r pretrained_weights/audionav/av_wan/mp3d/
# !rm -r pretrained_weights/semantic_audionav/
# !rm -r sounds/semantic_splits/

In [None]:
!rm -r datasets/audionav/replica/v1

In [None]:
!du -sh
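
To compare the result against the 100GB Colab budget noted in the Description, the size check can also be done in Python. This is a rough sketch: the helper names are hypothetical, and it sums file sizes, whereas `du` reports disk blocks, so the two numbers can differ slightly.

```python
import os

COLAB_BUDGET_GB = 100  # budget noted in the Description section


def directory_size_gb(path):
    """Sum the sizes of all regular files under `path`, in gigabytes (10**9 bytes)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):  # skip symlinks to avoid double counting
                total += os.path.getsize(full)
    return total / 1e9


def fits_budget(path, budget_gb=COLAB_BUDGET_GB):
    """True if everything under `path` is smaller than `budget_gb`."""
    return directory_size_gb(path) < budget_gb


# Usage, from /content/sound-spaces/data:
# print(f"{directory_size_gb('.'):.1f} GB, fits budget: {fits_budget('.')}")
```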

In [None]:
%cd /content

In [None]:
!git clone https://github.com/yasu-k2/multimodal-active-inference.git
# %cd multimodal-active-inference/
# !git pull origin main
# %cd ..

In [None]:
!cp -R multimodal-active-inference/datasets/audionav/replica/v1 sound-spaces/data/datasets/audionav/replica/

In [None]:
%cd /content/sound-spaces/data
# TO HERE < Organize relevant files

In [None]:
%cd ..

In [None]:
!pwd

```bash
# Download full RIRs
!python scripts/download_data.py --dataset mp3d --rir-type binaural_rirs
!python scripts/download_data.py --dataset replica --rir-type binaural_rirs
```

In [None]:
import os
from scripts.download_data import download_and_uncompress

output_dir = 'data'
dataset = 'replica'  # 'mp3d', 'replica'
rir_type = 'binaural_rirs'  # 'binaural_rirs', 'ambisonic_rirs'

dataset_rir_dir = os.path.join(output_dir, rir_type, dataset)
aws_root_dir = 'http://dl.fbaipublicfiles.com/SoundSpaces/'
# Select subset of available scenes
scenes = os.listdir(os.path.join('data/metadata/', dataset))
print(scenes)

In [None]:
# scenes = ['apartment_0', 'apartment_1', 'apartment_2']
for scene in scenes:
  scene_file = os.path.join(aws_root_dir, rir_type, dataset, scene + '.tar.gz')
  if os.path.exists(os.path.join(dataset_rir_dir, scene)):
    continue
  else:
    download_and_uncompress(scene_file, output_dir)

```python
from scripts.cache_observations import main
# Iterate over scenes in metadata dir and cache observations
#   default config path is 'ss_baselines/av_nav/config/audionav/{}/train_telephone/pointgoal_rgb.yaml'.format(dataset)
#   config.TASK_CONFIG.SIMULATOR.AGENT_0.SENSORS = ["RGB_SENSOR", "DEPTH_SENSOR"]
#   config.TASK_CONFIG.SIMULATOR.USE_RENDERED_OBSERVATIONS = False
print('Caching Replica observations ...')
main('replica')
print('Caching Matterport3D observations ...')
main('mp3d')
```

In [None]:
!mkdir data/scene_datasets
!mv data/data data/scene_datasets/replica

In [None]:
!sed -i -e "/.*Matterport3D.*/d" scripts/cache_observations.py
!sed -i -e "/.*mp3d.*/d" scripts/cache_observations.py

In [None]:
# Cache observations
!python scripts/cache_observations.py --config-path ss_baselines/av_nav/config/audionav/replica/train_telephone/pointgoal_rgb.yaml
!python scripts/cache_observations.py --config-path ss_baselines/av_nav/config/audionav/replica/val_telephone/pointgoal_rgb.yaml
!python scripts/cache_observations.py --config-path ss_baselines/av_nav/config/audionav/replica/test_telephone/pointgoal_rgb.yaml

## Test SoundSpaces

1. Training

```bash
!python ss_baselines/av_nav/run.py \
  --exp-config ss_baselines/av_nav/config/audionav/replica/train_telephone/audiogoal_depth.yaml \
  --model-dir data/models/replica/audiogoal_depth
```

2. Validation

```bash
# No checkpoint path is passed here; the checkpoints under --model-dir are evaluated to produce the validation curve
!python ss_baselines/av_nav/run.py \
  --run-type eval \
  --exp-config ss_baselines/av_nav/config/audionav/replica/val_telephone/audiogoal_depth.yaml \
  --model-dir data/models/replica/audiogoal_depth
```

3. Test the best validation checkpoint based on validation curve

```bash
# EDIT ckpt.XXX.pth
!python ss_baselines/av_nav/run.py \
  --run-type eval \
  --exp-config ss_baselines/av_nav/config/audionav/replica/test_telephone/audiogoal_depth.yaml \
  --model-dir data/models/replica/audiogoal_depth \
  EVAL_CKPT_PATH_DIR data/models/replica/audiogoal_depth/data/ckpt.XXX.pth
```
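
Filling in `ckpt.XXX.pth` means picking the checkpoint with the best validation score. A minimal sketch, assuming you have collected one validation SPL value per checkpoint index yourself (for example by reading the validation curve). The function name is hypothetical, the numbers are made up for illustration, and the path pattern follows the `<model-dir>/data/ckpt.XXX.pth` form used in the command above.

```python
def best_checkpoint_path(val_spl_by_ckpt, model_dir):
    """Return the path of the checkpoint index with the highest validation SPL."""
    if not val_spl_by_ckpt:
        raise ValueError("no validation results given")
    best_index = max(val_spl_by_ckpt, key=val_spl_by_ckpt.get)
    return f"{model_dir}/data/ckpt.{best_index}.pth"


# Illustrative numbers only, not measured results.
val_spl = {0: 0.12, 1: 0.31, 2: 0.27}
print(best_checkpoint_path(val_spl, "data/models/replica/audiogoal_depth"))
```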

4. Generate demo video

```bash
# EDIT ckpt.XXX.pth
!python ss_baselines/av_nav/run.py \
  --run-type eval \
  --exp-config ss_baselines/av_nav/config/audionav/replica/test_telephone/audiogoal_depth.yaml \
  --model-dir data/models/replica/audiogoal_depth \
  EVAL_CKPT_PATH_DIR data/models/replica/audiogoal_depth/data/ckpt.XXX.pth \
  VIDEO_OPTION [\"disk\"] \
  TASK_CONFIG.SIMULATOR.USE_RENDERED_OBSERVATIONS False \
  TASK_CONFIG.TASK.SENSORS [\"POINTGOAL_WITH_GPS_COMPASS_SENSOR\",\"SPECTROGRAM_SENSOR\",\"AUDIOGOAL_SENSOR\"] \
  SENSORS [\"RGB_SENSOR\",\"DEPTH_SENSOR\"] \
  EXTRA_RGB True \
  TASK_CONFIG.SIMULATOR.CONTINUOUS_VIEW_CHANGE True \
  DISPLAY_RESOLUTION 512 \
  TEST_EPISODE_COUNT 1
```

5. Evaluating the pretrained model

```bash
!python ss_baselines/av_nav/run.py \
  --run-type eval \
  --exp-config ss_baselines/av_nav/config/audionav/replica/test_telephone/audiogoal_depth.yaml \
  EVAL_CKPT_PATH_DIR data/pretrained_weights/audionav/av_nav/replica/heard.pth
!python ss_baselines/av_nav/run.py \
  --run-type eval \
  --exp-config ss_baselines/av_nav/config/audionav/replica/test_telephone/audiogoal_depth.yaml \
  EVAL_CKPT_PATH_DIR data/pretrained_weights/audionav/av_nav/replica/unheard.pth \
  EVAL.SPLIT test_multiple_unheard
```

6. Interactive demo

```bash
!python scripts/interactive_demo.py
```

In [None]:
!sed -i -e "s/.*'apartment_0'.*/REPLICA_SCENES = ['apartment_0', 'apartment_1', 'apartment_2']/g" ss_baselines/common/env_utils.py
!sed -i -e "/.*'frl_apartment_3'.*/d" ss_baselines/common/env_utils.py
!sed -i -e "/.*'office_3'.*/d" ss_baselines/common/env_utils.py

!sed -i -e "s/CONTENT_SCENES:.*]/CONTENT_SCENES: ['apartment_0', 'apartment_1', 'apartment_2']/" configs/audionav/av_nav/replica/audiogoal.yaml

!sed -i -e "s/NUM_PROCESSES.*/NUM_PROCESSES: 1/g" ss_baselines/av_nav/config/audionav/replica/train_telephone/audiogoal_depth.yaml
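
The `NUM_PROCESSES` edit above can also be done from Python, which makes it easier to fail loudly when the key is missing. This is a sketch with a hypothetical helper name. Unlike the `sed` expression, it only matches keys at the start of a line (after indentation), and it works on the text directly rather than parsing YAML.

```python
import re


def set_yaml_scalar(text, key, value):
    """Replace the value on every `key: ...` line, keeping the line's indentation."""
    pattern = re.compile(rf"^([ \t]*){re.escape(key)}[ \t]*:.*$", flags=re.MULTILINE)
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}{key}: {value}", text)
    if count == 0:
        raise KeyError(f"{key} not found")
    return new_text


# Made-up config fragment for illustration.
sample = "NUM_PROCESSES: 5\nRL:\n  PPO:\n    num_steps: 150\n"
print(set_yaml_scalar(sample, "NUM_PROCESSES", 1))

# Usage on the real file (path as used in this notebook):
# path = "ss_baselines/av_nav/config/audionav/replica/train_telephone/audiogoal_depth.yaml"
# with open(path) as f:
#     updated = set_yaml_scalar(f.read(), "NUM_PROCESSES", 1)
# with open(path, "w") as f:
#     f.write(updated)
```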

In [None]:
!python ss_baselines/av_nav/run.py --exp-config ss_baselines/av_nav/config/audionav/replica/train_telephone/audiogoal_depth.yaml --model-dir data/models/replica/audiogoal_depth

In [None]:
# !python ss_baselines/av_nav/run.py --run-type eval --exp-config ss_baselines/av_nav/config/audionav/replica/val_telephone/audiogoal_depth.yaml --model-dir data/models/replica/audiogoal_depth

In [None]:
## !python ss_baselines/av_nav/run.py --run-type eval --exp-config ss_baselines/av_nav/config/audionav/replica/test_telephone/audiogoal_depth.yaml --model-dir data/models/replica/audiogoal_depth --eval-best
# !python ss_baselines/av_nav/run.py --run-type eval --exp-config ss_baselines/av_nav/config/audionav/replica/test_telephone/audiogoal_depth.yaml --model-dir data/models/replica/audiogoal_depth EVAL_CKPT_PATH_DIR data/models/replica/audiogoal_depth/data/ckpt.0.pth

In [None]:
## !python ss_baselines/av_nav/run.py --run-type eval --exp-config ss_baselines/av_nav/config/audionav/replica/test_telephone/audiogoal_depth.yaml --model-dir data/models/replica/audiogoal_depth --eval-best VIDEO_OPTION [\"disk\"] TASK_CONFIG.SIMULATOR.USE_RENDERED_OBSERVATIONS False TASK_CONFIG.TASK.SENSORS [\"POINTGOAL_WITH_GPS_COMPASS_SENSOR\",\"SPECTROGRAM_SENSOR\",\"AUDIOGOAL_SENSOR\"] SENSORS [\"RGB_SENSOR\",\"DEPTH_SENSOR\"] EXTRA_RGB True TASK_CONFIG.SIMULATOR.CONTINUOUS_VIEW_CHANGE True DISPLAY_RESOLUTION 512 TEST_EPISODE_COUNT 1
# !python ss_baselines/av_nav/run.py --run-type eval --exp-config ss_baselines/av_nav/config/audionav/replica/test_telephone/audiogoal_depth.yaml --model-dir data/models/replica/audiogoal_depth EVAL_CKPT_PATH_DIR data/models/replica/audiogoal_depth/data/ckpt.0.pth VIDEO_OPTION [\"disk\"] TASK_CONFIG.SIMULATOR.USE_RENDERED_OBSERVATIONS False TASK_CONFIG.TASK.SENSORS [\"POINTGOAL_WITH_GPS_COMPASS_SENSOR\",\"SPECTROGRAM_SENSOR\",\"AUDIOGOAL_SENSOR\"] SENSORS [\"RGB_SENSOR\",\"DEPTH_SENSOR\"] EXTRA_RGB True TASK_CONFIG.SIMULATOR.CONTINUOUS_VIEW_CHANGE True DISPLAY_RESOLUTION 512 TEST_EPISODE_COUNT 1

In [None]:
!python ss_baselines/av_nav/run.py --run-type eval --exp-config ss_baselines/av_nav/config/audionav/replica/val_telephone/audiogoal_depth.yaml EVAL_CKPT_PATH_DIR data/pretrained_weights/audionav/av_nav/replica/heard.pth
# !python ss_baselines/av_nav/run.py --run-type eval --exp-config ss_baselines/av_nav/config/audionav/replica/test_telephone/audiogoal_depth.yaml EVAL_CKPT_PATH_DIR data/pretrained_weights/audionav/av_nav/replica/heard.pth
## !python ss_baselines/av_nav/run.py --run-type eval --exp-config ss_baselines/av_nav/config/audionav/replica/test_telephone/audiogoal_depth.yaml EVAL_CKPT_PATH_DIR data/pretrained_weights/audionav/av_nav/replica/unheard.pth EVAL.SPLIT test_multiple_unheard

## Implementing a new agent

1. Simple agent

You can use `RandomAgent` in [`simple_agents.py`](https://github.com/facebookresearch/sound-spaces/blob/main/ss_baselines/common/simple_agents.py), included in `ss_baselines`, as an example of a relatively simple agent. It shows that you only need to implement an agent class that inherits from `habitat.Agent` and defines its own `__init__()`, `reset()`, `is_goal_reached()`, and `act()` methods. The script takes three arguments: `task_config` (default `configs/tasks/pointnav.yaml`; this default probably should be something like [`configs/audionav/av_nav/replica/pointgoal.yaml`](https://github.com/facebookresearch/sound-spaces/blob/main/configs/audionav/av_nav/replica/pointgoal.yaml)) to construct the task, and `success_distance` (default `0.2`) and `agent_class` (default `RandomAgent`) to instantiate your agent. Set them according to your setup.
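
The four methods named above can be sketched as follows. To stay runnable without habitat installed, the block uses a stand-in `Agent` base class; in practice you would inherit from `habitat.Agent`. The class name, the string action names, the observation key, and the assumption that the first element of the goal observation is the distance to the goal are all illustrative and should be checked against `simple_agents.py`.

```python
import random


class Agent:
    """Stand-in for `habitat.Agent` so this sketch runs without habitat installed."""

    def reset(self):
        raise NotImplementedError

    def act(self, observations):
        raise NotImplementedError


class ForwardBiasedAgent(Agent):
    """Hypothetical agent: stops near the goal, otherwise mostly moves forward."""

    ACTIONS = ["MOVE_FORWARD", "TURN_LEFT", "TURN_RIGHT"]  # assumed action names

    def __init__(self, success_distance=0.2,
                 goal_sensor_uuid="pointgoal_with_gps_compass", seed=0):
        self.success_distance = success_distance
        self.goal_sensor_uuid = goal_sensor_uuid  # assumed observation key
        self._rng = random.Random(seed)

    def reset(self):
        # Nothing to clear: this agent keeps no per-episode state.
        pass

    def is_goal_reached(self, observations):
        # Assumes element 0 of the goal observation is the distance to the goal.
        return observations[self.goal_sensor_uuid][0] <= self.success_distance

    def act(self, observations):
        if self.is_goal_reached(observations):
            return {"action": "STOP"}
        action = self._rng.choices(self.ACTIONS, weights=[0.8, 0.1, 0.1])[0]
        return {"action": action}


agent = ForwardBiasedAgent()
print(agent.act({"pointgoal_with_gps_compass": [0.1, 0.0]}))
```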

The evaluation happens in the `evaluate()` method of the `Benchmark` class defined in [`benchmark.py`](https://github.com/facebookresearch/sound-spaces/blob/main/ss_baselines/common/benchmark.py). The code for a single episode is [here](https://github.com/facebookresearch/sound-spaces/blob/f11fef81db0c6b05d42fd062faa4929195de4ddf/ss_baselines/common/benchmark.py#L80-L98).
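
The general shape of such a per-episode evaluation loop is sketched below. The method names (`reset`, `step`, `episode_over`, `get_metrics`) follow the habitat `Env` interface as an assumption to verify against `benchmark.py`. `CountdownEnv` and `NoOpAgent` are toy stand-ins so the block runs on its own.

```python
from collections import defaultdict


def evaluate(env, agent, num_episodes):
    """Run `num_episodes` episodes and return the average of each reported metric."""
    if num_episodes <= 0:
        raise ValueError("num_episodes must be positive")
    totals = defaultdict(float)
    for _ in range(num_episodes):
        observations = env.reset()
        agent.reset()
        while not env.episode_over:
            action = agent.act(observations)
            observations = env.step(action)
        for name, value in env.get_metrics().items():
            totals[name] += value
    return {name: total / num_episodes for name, total in totals.items()}


class CountdownEnv:
    """Toy environment: every episode ends after `horizon` steps."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self._steps = 0

    @property
    def episode_over(self):
        return self._steps >= self.horizon

    def reset(self):
        self._steps = 0
        return {"step": 0}

    def step(self, action):
        self._steps += 1
        return {"step": self._steps}

    def get_metrics(self):
        return {"steps_taken": float(self._steps)}


class NoOpAgent:
    """Toy agent that always issues the same action."""

    def reset(self):
        pass

    def act(self, observations):
        return {"action": "MOVE_FORWARD"}


print(evaluate(CountdownEnv(horizon=3), NoOpAgent(), num_episodes=2))
```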

2. Sophisticated agent

If you opt for a more sophisticated agent, you can refer to the structure of the baseline agents in `ss_baselines`, such as [`av_nav`](https://github.com/facebookresearch/sound-spaces/tree/main/ss_baselines/av_nav). The scripts in **bold** are the main points of modification.

- `config/`
  - **`audionav/`** This directory contains experiment configs specifying task, model parameters, and training options for each of train & val & test.
  - `__init__.py` Imports from `default`.
  - **`default.py`** Defines defaults for experiment config and task config.
- `models/` Defines neural network components.
- `ppo/`
  - `policy.py` Defines policy module with neural network components.
  - `ppo.py` Defines PPO module.
  - **`ppo_trainer.py`** Defines trainer implementing `train()` and `eval()` for PPO inheriting from [`BaseRLTrainer`](https://github.com/facebookresearch/sound-spaces/blob/f11fef81db0c6b05d42fd062faa4929195de4ddf/ss_baselines/common/base_trainer.py#L42).
- `__init__.py` Imports from `ppo.ppo_trainer`.
- **`run.py`** Main script for running experiments. It calls the `train()` or `eval()` method of the trainer. Specify the appropriate experiment config together with your desired task config. (Note: the help strings are partially incorrect.)

For your reference
- `run.py`
  - exp_config [`audiogoal_depth.yaml`](https://github.com/facebookresearch/sound-spaces/blob/main/ss_baselines/av_nav/config/audionav/replica/train_telephone/audiogoal_depth.yaml)
    - BASE_TASK_CONFIG [`audiogoal.yaml`](https://github.com/facebookresearch/sound-spaces/blob/main/configs/audionav/av_nav/replica/audiogoal.yaml)
      - ENVIRONMENT
      - SIMULATOR
        - HABITAT_SIM_V0
          - GPU_DEVICE_ID: `0`
        - TYPE: `"SoundSpacesSim"`
        - ACTION_SPACE_CONFIG: `"v0"`
        - SCENE_DATASET: `"replica"`
      - TASK
        - TYPE: `AudioNav`
      - DATASET
        - TYPE: `"AudioNav"`
        - SPLIT: `"train_telephone"`
        - CONTENT_SCENES: `["*"]`
          - `ss_baselines.common.env_utils`
          - `habitat.datasets.registration`
            - `habitat.datasets.pointnav.pointnav_dataset`
              - `habitat.core.dataset`
        - VERSION: `'v1'`
        - SCENES_DIR: `"data/scene_datasets/replica"`
        - DATA_PATH: `"data/datasets/audionav/replica/{version}/{split}/{split}.json.gz"`
  - `get_config()` from `ss_baselines.av_nav.config.default`
    - `from habitat import get_config as get_task_config`
    - `from habitat.config import Config as CN`
    - experiment config
      - BASE_TASK_CONFIG_PATH `pointgoal.yaml`
      - TRAINER_NAME `"AVNavTrainer"`
      - ENV_NAME `"AudioNavRLEnv"`
      - VIDEO_OPTION `["disk", "tensorboard"]`
      - SENSORS `["RGB_SENSOR", "DEPTH_SENSOR"]`
      - RL.PPO
    - task config
      - AUDIOGOAL_SENSOR.TYPE `"AudioGoalSensor"`
      - SPECTROGRAM_SENSOR.TYPE `"SpectrogramSensor"`
      - SIMULATOR.SCENE_DATASET `'replica'`
      - DATASET.VERSION `'v1'`
    - `config.merge_from_file()` from `yacs`

  - `baseline_registry()` from `ss_baselines.common.baseline_registry`
    - `from habitat.core.registry import Registry`
  - `trainer_init()`
  - `trainer.train()`
  - `trainer.eval()`
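
The layering outlined above (defaults from `default.py`, then the experiment YAML, then `KEY VALUE` pairs given after the flags on the command line) can be illustrated with plain dictionaries. This sketch does not use yacs and is not the repo's implementation. The helper names are hypothetical and the values are chosen for illustration. yacs itself is stricter: it rejects keys that are absent from the defaults, whereas this sketch creates them.

```python
import copy


def merge_dicts(base, override):
    """Recursively merge `override` into a copy of `base`; `override` wins."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_dicts(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def apply_opts(config, opts):
    """Apply ['A.B', value, 'C', value, ...] pairs with dotted keys to a copy of `config`."""
    if len(opts) % 2 != 0:
        raise ValueError("opts must be KEY VALUE pairs")
    result = copy.deepcopy(config)
    for dotted_key, value in zip(opts[0::2], opts[1::2]):
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            # Creates missing intermediate levels (yacs would raise instead).
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


defaults = {
    "SENSORS": ["RGB_SENSOR", "DEPTH_SENSOR"],
    "TASK_CONFIG": {"DATASET": {"VERSION": "v1"}},
}
from_file = {"SENSORS": ["DEPTH_SENSOR"]}  # illustrative YAML content
opts = ["TEST_EPISODE_COUNT", 1, "EVAL.SPLIT", "test_multiple_unheard"]

config = apply_opts(merge_dicts(defaults, from_file), opts)
print(config)
```

Later layers win, which is why trailing arguments such as `EVAL_CKPT_PATH_DIR ...` in the commands above override whatever the YAML file sets.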



## Playground