<a href="https://colab.research.google.com/github/yasu-k2/multimodal-active-inference/blob/main/sound_spaces.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# SoundSpaces

[web site](https://soundspaces.org/)

[main repo](https://github.com/facebookresearch/sound-spaces)


## Description

- Tasks
  - PointGoal
  - AudioGoal
  - AudioPointGoal

- [Challenge](https://github.com/facebookresearch/soundspaces-challenge)
  - AudioNav Task
  - Metric is ['Success weighted by Path Length' (SPL)](https://eval.ai/web/challenges/challenge-page/1621/evaluation)

- Datasets
  - **[Replica-Dataset (Replica Dataset v1)](https://github.com/facebookresearch/Replica-Dataset)**
    - 18 scenes
      - **apartment 0-2**
      - office 0-4
      - room 0-2
      - hotel 0
      - FRL apartment 0-5
    - ReplicaSDK
      - ReplicaViewer
      - ReplicaRenderer
    - Smaller in size
    - download script [available](https://raw.githubusercontent.com/facebookresearch/Replica-Dataset/main/download.sh)
  - [Matterport3D](https://niessner.github.io/Matterport/)
    - 90 scenes
    - Used for challenge
    - Need to request access
  - Note: keep the total dataset size under 100GB for Colab.

- Data
  - audio renderings (room impulse responses; RIRs), 867GB
    - Replica
      - full binaural, 81GB
    - Matterport
      - full binaural, 682GB
      - full ambisonic, 3.6T
  - metadata of each scene, 1MB
  - episode datasets, 77MB -> 115MB
  - mono sound files, 13MB -> 640MB
  - pretrained weights, 303MB

- Baselines
  - `av-nav` Audio-Visual Navigation (AV-Nav) Model
  - `av-wan` Audio-Visual Waypoints (AV-WaN) Model
  - `savi` Semantic Audio-Visual Navigation (SAVi) Model
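
The SPL metric listed under the challenge above is commonly defined as the mean over episodes of `S_i * l_i / max(p_i, l_i)`, where `S_i` is the binary success flag, `l_i` the shortest-path length, and `p_i` the length of the path the agent actually took. The sketch below computes it for three made-up episodes. The function name `spl` and the episode values are illustrative assumptions, not taken from the SoundSpaces code.

```python
def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by Path Length, averaged over episodes.

    Assumes every shortest-path length is > 0.
    """
    if not successes:
        raise ValueError("need at least one episode")
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, path_lengths):
        # A successful episode scores l / max(p, l); a failed one scores 0.
        total += s * l / max(p, l)
    return total / len(successes)


# Hypothetical episodes: optimal success, success with a 2x detour, failure.
print(spl([1, 1, 0], [5.0, 4.0, 6.0], [5.0, 8.0, 3.0]))  # 0.5
```

A detour halves the credit for the second episode, and the failed episode contributes nothing regardless of how short its path was.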


## Installation

The entire process took around 1.5 hours in Colab.

### habitat-sim (v0.1.7)

- simulator for embodied AI
- requires Python>=3.7.
- latest: v0.2.1

```
!conda create -n habitat python=3.7 cmake=3.14.0
!conda activate habitat
# Installation for a machine without an attached display
!conda install habitat-sim=0.1.7 withbullet headless -c conda-forge -c aihabitat
```

Test habitat-sim installation (options incl. --enable_physics, --save_png)
```
!python habitat-sim/examples/example.py --scene /data/scene_datasets/habitat-test-scenes/skokloster-castle.glb
```

### habitat-lab (v0.1.7)

- embodied AI tasks and agents
- `Env`, `Dataset`, `Episode`, `Task`, `Sensor`, `Observation`
- requires Python>=3.7. Python 3.7 preferred.
- latest: v0.2.1

```
!git clone https://github.com/facebookresearch/habitat-lab.git --branch v0.1.7
# Use %cd rather than !cd: each `!` command runs in its own subshell, so !cd does not persist
%cd habitat-lab
# Install only core of Habitat Lab
!pip install -e .
# Include habitat_baselines (PPO, SLAM, utilities)
!pip install -r requirements.txt
!python setup.py develop --all
```

Test habitat-lab installation
```
!python habitat-lab/examples/example.py
```

### Helper script and settings

The installation script below is based on [`colab_install.sh`](https://github.com/facebookresearch/habitat-sim/blob/main/examples/colab_utils/colab_install.sh) from the official repo, with settings taken from the official examples for [habitat-sim](https://github.com/facebookresearch/habitat-sim/blob/main/examples/tutorials/colabs/) and [habitat-lab](https://github.com/facebookresearch/habitat-lab/blob/main/examples/tutorials/colabs/).

In [1]:
!curl -L https://raw.githubusercontent.com/yasu-k2/multimodal-active-inference/main/colab_install_habitat.sh | bash -s

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3110  100  3110    0     0   8162      0 --:--:-- --:--:-- --:--:--  8184
--2022-04-18 00:26:18--  https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
Resolving repo.continuum.io (repo.continuum.io)... 104.18.201.79, 104.18.200.79, 2606:4700::6812:c84f, ...
Connecting to repo.continuum.io (repo.continuum.io)|104.18.201.79|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh [following]
--2022-04-18 00:26:18--  https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
Resolving repo.anaconda.com (repo.anaconda.com)... 104.16.130.3, 104.16.131.3, 2606:4700::6810:8203, ...
Connecting to repo.anaconda.com (repo.anaconda.com)|104.16.130.3|:443... connected.
HTTP request sent, awaiting response..

In [2]:
%cd /content/habitat-sim

/content/habitat-sim


In [3]:
# !wget -c http://dl.fbaipublicfiles.com/habitat/habitat-test-scenes.zip && unzip -o habitat-test-scenes.zip
## !wget -c http://dl.fbaipublicfiles.com/habitat/objects_v0.2.zip && unzip -o objects_v0.2.zip -d data/objects/
## !wget -c http://dl.fbaipublicfiles.com/habitat/locobot_merged_v0.2.zip && unzip -o locobot_merged_v0.2.zip -d data/objects

In [4]:
#!rm habitat-test-scenes.zip
## !rm objects_v0.2.zip
## !rm locobot_merged_v0.2.zip

In [5]:
# !python examples/example.py --scene data/scene_datasets/habitat-test-scenes/skokloster-castle.glb

In [6]:
%cd /content/habitat-lab

/content/habitat-lab


In [7]:
# Some errors with habitat_baselines
# !python setup.py test

In [8]:
# !python examples/example.py

In [9]:
# !python examples/benchmark.py

In [10]:
%cd /content/habitat-sim

/content/habitat-sim


```bash
# !pip uninstall --yes pyopenssl
# !pip install pyopenssl
```

```python
# reload the cffi version
# import sys
# if "google.colab" in sys.modules:
#     import importlib
#     import cffi
#     importlib.reload(cffi)
```

```python
import math
import os
import random
import sys

import git
import imageio
import magnum as mn
import numpy as np
%matplotlib inline
from matplotlib import pyplot as plt
from PIL import Image

# You need to restart runtime before importing habitat
import habitat
import habitat_sim

try:
    import ipywidgets as widgets
    from IPython.display import display as ipydisplay
    # For using jupyter/ipywidget IO components
    HAS_WIDGETS = True
except ImportError:
    HAS_WIDGETS = False

if "google.colab" in sys.modules:
    os.environ["IMAGEIO_FFMPEG_EXE"] = "/usr/bin/ffmpeg"

repo = git.Repo(".", search_parent_directories=True)
dir_path = repo.working_tree_dir
%cd $dir_path

data_path = os.path.join(dir_path, "data")
output_directory = "output/"  ## Based on your preference
output_path = os.path.join(dir_path, output_directory)
if not os.path.exists(output_path):
    os.mkdir(output_path)

# define some globals the first time we run.
if "sim" not in globals():
    global sim
    sim = None
    global obj_attr_mgr
    obj_attr_mgr = None
    global prim_attr_mgr
    prim_attr_mgr = None
    global stage_attr_mgr
    stage_attr_mgr = None
    global rigid_obj_mgr
    rigid_obj_mgr = None
```

## Install SoundSpaces

In [11]:
%cd /content

/content


In [12]:
!git clone https://github.com/facebookresearch/sound-spaces.git

Cloning into 'sound-spaces'...
remote: Enumerating objects: 963, done.
remote: Counting objects: 100% (963/963), done.
remote: Compressing objects: 100% (500/500), done.
remote: Total 963 (delta 639), reused 751 (delta 444), pack-reused 0
Receiving objects: 100% (963/963), 8.31 MiB | 7.52 MiB/s, done.
Resolving deltas: 100% (639/639), done.


In [13]:
%cd sound-spaces

/content/sound-spaces


In [14]:
!pip install -e .

Obtaining file:///content/sound-spaces
  Preparing metadata (setup.py) ... done
Collecting pydub
  Downloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)
Collecting getch
  Downloading getch-1.0.tar.gz (1.3 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: getch
  Building wheel for getch (setup.py) ... done
  Created wheel for getch: filename=getch-1.0-cp37-cp37m-linux_x86_64.whl size=15108 sha256=31e47df6e4ec2736ea80c218e49b208a0872ce94e5c1a2d4a79363a2deb6bf65
  Stored in directory: /root/.cache/pip/wheels/f1/12/2d/cda22b14c0da6e39eca4a204585db4f7ea4e5c478207dfe1b3
Successfully built getch
Installing collected packages: pydub, getch, sound-spaces
  Running setup.py develop for sound-spaces
Successfully installed getch-1.0 pydub-0.25.1 sound-spaces-0.1.1

## Download dataset

In [15]:
!mkdir data

In [16]:
%cd data

/content/sound-spaces/data


In [17]:
# !wget http://dl.fbaipublicfiles.com/SoundSpaces/binaural_rirs.tar && tar xvf binaural_rirs.tar
!wget http://dl.fbaipublicfiles.com/SoundSpaces/metadata.tar.xz && tar xvf metadata.tar.xz
!wget http://dl.fbaipublicfiles.com/SoundSpaces/sounds.tar.xz && tar xvf sounds.tar.xz
!wget http://dl.fbaipublicfiles.com/SoundSpaces/datasets.tar.xz && tar xvf datasets.tar.xz
!wget http://dl.fbaipublicfiles.com/SoundSpaces/pretrained_weights.tar.xz && tar xvf pretrained_weights.tar.xz

--2022-04-18 00:35:29--  http://dl.fbaipublicfiles.com/SoundSpaces/metadata.tar.xz
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 172.67.9.4, 104.22.74.142, 104.22.75.142, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|172.67.9.4|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 962450 (940K) [application/x-bzip2]
Saving to: ‘metadata.tar.xz’


2022-04-18 00:35:30 (969 KB/s) - ‘metadata.tar.xz’ saved [962450/962450]

metadata/
metadata/replica/
metadata/replica/room_0/
metadata/replica/room_0/graph.pkl
metadata/replica/room_0/points.txt
metadata/replica/frl_apartment_2/
metadata/replica/frl_apartment_2/points.txt
metadata/replica/frl_apartment_2/graph.pkl
metadata/replica/room_1/
metadata/replica/room_1/points.txt
metadata/replica/room_1/graph.pkl
metadata/replica/frl_apartment_3/
metadata/replica/frl_apartment_3/graph.pkl
metadata/replica/frl_apartment_3/points.txt
metadata/replica/office_3/
metadata/replica/office_3/graph.pkl
me

In [18]:
# !rm binaural_rirs.tar
!rm metadata.tar.xz
!rm sounds.tar.xz
!rm datasets.tar.xz
!rm pretrained_weights.tar.xz

In [19]:
# Replica-Dataset
!apt-get install pigz

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  pigz
0 upgraded, 1 newly installed, 0 to remove and 39 not upgraded.
Need to get 57.4 kB of archives.
After this operation, 259 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 pigz amd64 2.4-1 [57.4 kB]
Fetched 57.4 kB in 1s (53.0 kB/s)
Selecting previously unselected package pigz.
(Reading database ... 155455 files and directories currently installed.)
Preparing to unpack .../archives/pigz_2.4-1_amd64.deb ...
Unpacking pigz (2.4-1) ...
Setting up pigz (2.4-1) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...


In [21]:
# replica_v1_0.tar.gz.partaa ~ .partap 1.86GB, .partaq 1.73GB -> 17 files (31.5GB) in total, takes about 45min to download
# -> 43GB after extraction
!curl -L https://raw.githubusercontent.com/yasu-k2/multimodal-active-inference/main/download_replica.sh | bash -s 

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   619  100   619    0     0   1528      0 --:--:-- --:--:-- --:--:--  1528

Downloading and decompressing Replica to data. The script can resume
partial downloads -- if your download gets interrupted, simply run it again.

--2022-04-18 01:49:03--  https://github.com/facebookresearch/Replica-Dataset/releases/download/v1.0/replica_v1_0.tar.gz.partaa
Resolving github.com (github.com)... 52.192.72.89
Connecting to github.com (github.com)|52.192.72.89|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/169771349/9fd1ec80-8e7a-11e9-8b79-92a548b347e3?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20220418%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220418T014904Z&X-Amz-Expires=300&X-Amz-Signature=81d998bd

In [22]:
!rm replica_v1_0.tar.gz.parta*

In [23]:
%cd data

/content/sound-spaces/data/data


In [24]:
!rm -r room_0 room_1 room_2
!rm -r office_0 office_1 office_2 office_3 office_4
!rm -r hotel_0
!rm -r frl_apartment_0 frl_apartment_1 frl_apartment_2 frl_apartment_3 frl_apartment_4 frl_apartment_5

In [25]:
%cd ..

/content/sound-spaces/data


In [26]:
%cd metadata/replica/

/content/sound-spaces/data/metadata/replica


In [27]:
!rm -r room_0 room_1 room_2
!rm -r office_0 office_1 office_2 office_3 office_4
!rm -r hotel_0
!rm -r frl_apartment_0 frl_apartment_1 frl_apartment_2 frl_apartment_3 frl_apartment_4 frl_apartment_5

In [28]:
%cd ../..

/content/sound-spaces/data


In [29]:
# Matterport3D

In [30]:
%cd metadata/mp3d/

/content/sound-spaces/data/metadata/mp3d


In [31]:
!rm -r *

In [32]:
%cd ../..

/content/sound-spaces/data


In [44]:
# Organize relevant files > FROM HERE
%cd /content/sound-spaces/data

/content/sound-spaces/data


In [34]:
!rm -r datasets/audionav/mp3d/
!rm -r datasets/semantic_audionav/mp3d/
!rm -r metadata/mp3d/
!rm -r pretrained_weights/audionav/av_nav/mp3d/
!rm -r pretrained_weights/audionav/av_wan/mp3d/
# !rm -r pretrained_weights/semantic_audionav/
# !rm -r sounds/semantic_splits/

In [45]:
!rm -r datasets/audionav/replica/v1

In [35]:
!du -sh

15G	.
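
Since the RIR downloads that follow are large, it can help to check the free disk space before starting them. The sketch below uses the standard library's `shutil.disk_usage`; the helper name `has_room` is an assumption for illustration, and the 81 GB figure is the size listed in the Description for the full Replica binaural RIRs.

```python
import shutil


def has_room(path, required_gb):
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb


# 81 GB: full Replica binaural RIRs (downloading only three scenes needs less).
print(has_room(".", 81))
```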


In [56]:
%cd /content

/content


In [58]:
!git clone https://github.com/yasu-k2/multimodal-active-inference.git
# %cd multimodal-active-inference/
# !git pull origin main
# %cd ..

/content/multimodal-active-inference
remote: Enumerating objects: 39, done.
remote: Counting objects: 100% (39/39), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 26 (delta 1), reused 26 (delta 1), pack-reused 0
Unpacking objects: 100% (26/26), done.
From https://github.com/yasu-k2/multimodal-active-inference
 * branch            main       -> FETCH_HEAD
   2e444b4..25bc9ef  main       -> origin/main
Updating 2e444b4..25bc9ef
Fast-forward
 .../replica/v1/test_telephone/content/apartment_1.json |   1 -
 .../v1/test_telephone/content/apartment_1.json.gz      | Bin 0 -> 12716 bytes
 .../replica/v1/test_telephone/test_telephone.json      |   1 -
 .../replica/v1/test_telephone/test_telephone.json.gz   | Bin 0 -> 61 bytes
 .../replica/v1/train_multiple/content/apartment_0.json |   1 -
 .../v1/train_multiple/content/apartment_0.json.gz      | Bin 0 -> 894039 bytes
 .../replica/v1/train_mu

In [59]:
!cp -R multimodal-active-inference/datasets/audionav/replica/v1 sound-spaces/data/datasets/audionav/replica/

In [60]:
%cd /content/sound-spaces/data
# TO HERE < Organize relevant files

/content/sound-spaces/data


In [61]:
%cd ..

/content/sound-spaces


In [53]:
!pwd

/content/sound-spaces


```bash
# Download full RIRs
!python scripts/download_data.py --dataset mp3d --rir-type binaural_rirs
!python scripts/download_data.py --dataset replica --rir-type binaural_rirs
```

In [39]:
import os
from scripts.download_data import download_and_uncompress

output_dir = 'data'
dataset = 'replica'  # 'mp3d', 'replica'
rir_type = 'binaural_rirs'  # 'binaural_rirs', 'ambisonic_rirs'

dataset_rir_dir = os.path.join(output_dir, rir_type, dataset)
aws_root_dir = 'http://dl.fbaipublicfiles.com/SoundSpaces/'
# Select subset of available scenes
scenes = os.listdir(os.path.join('data/metadata/', dataset))
print(scenes)

['apartment_0', 'apartment_2', 'apartment_1']


In [40]:
# scenes = ['apartment_0', 'apartment_1', 'apartment_2']
for scene in scenes:
  scene_file = os.path.join(aws_root_dir, rir_type, dataset, scene + '.tar.gz')
  if os.path.exists(os.path.join(dataset_rir_dir, scene)):
    continue
  else:
    download_and_uncompress(scene_file, output_dir)

Downloading http://dl.fbaipublicfiles.com/SoundSpaces/binaural_rirs/replica/apartment_0.tar.gz ...
Downloading http://dl.fbaipublicfiles.com/SoundSpaces/binaural_rirs/replica/apartment_2.tar.gz ...
Downloading http://dl.fbaipublicfiles.com/SoundSpaces/binaural_rirs/replica/apartment_1.tar.gz ...


```python
from scripts.cache_observations import main
# Iterate over scenes in metadata dir and cache observations
#   default config path is 'ss_baselines/av_nav/config/audionav/{}/train_telephone/pointgoal_rgb.yaml'.format(dataset)
#   config.TASK_CONFIG.SIMULATOR.AGENT_0.SENSORS = ["RGB_SENSOR", "DEPTH_SENSOR"]
#   config.TASK_CONFIG.SIMULATOR.USE_RENDERED_OBSERVATIONS = False
print('Caching Replica observations ...')
main('replica')
print('Caching Matterport3D observations ...')
main('mp3d')
```

In [41]:
!mkdir data/scene_datasets
!mv data/data data/scene_datasets/replica

In [42]:
!sed -i -e "/.*Matterport3D.*/d" scripts/cache_observations.py
!sed -i -e "/.*mp3d.*/d" scripts/cache_observations.py

In [43]:
# Cache observations
!python scripts/cache_observations.py --config-path ss_baselines/av_nav/config/audionav/replica/train_telephone/pointgoal_rgb.yaml
!python scripts/cache_observations.py --config-path ss_baselines/av_nav/config/audionav/replica/val_telephone/pointgoal_rgb.yaml
!python scripts/cache_observations.py --config-path ss_baselines/av_nav/config/audionav/replica/test_telephone/pointgoal_rgb.yaml

Caching Replica observations ...
I0418 03:04:07.753610  7595 ManagedContainerBase.cpp:19] ManagedContainerBase::convertFilenameToJSON : Filename : default changed to proposed JSON configuration filename : default.scene_dataset_config.json
I0418 03:04:07.753674  7595 AttributesManagerBase.h:283] AttributesManager<T>::createFromJsonOrDefaultInternal  (Dataset) : Proposing JSON name : default.scene_dataset_config.json from original name : default | This file  does not exist.
I0418 03:04:07.754091  7595 AssetAttributesManager.cpp:117] Asset attributes (capsule3DSolid : capsule3DSolid_hemiRings_4_cylRings_1_segments_12_halfLen_0.75_useTexCoords_false_useTangents_false) created and registered.
I0418 03:04:07.754341  7595 AssetAttributesManager.cpp:117] Asset attributes (capsule3DWireframe : capsule3DWireframe_hemiRings_8_cylRings_1_segments_16_halfLen_1) created and registered.
I0418 03:04:07.754446  7595 AssetAttributesManager.cpp:117] Asset attributes (coneSolid : coneSolid_segments_12_hal

## Test SoundSpaces

1. Training

```bash
!python ss_baselines/av_nav/run.py \
  --exp-config ss_baselines/av_nav/config/audionav/replica/train_telephone/audiogoal_depth.yaml \
  --model-dir data/models/replica/audiogoal_depth
```

2. Validation

```bash
# EDIT ckpt.XXX.pth
!python ss_baselines/av_nav/run.py \
  --run-type eval \
  --exp-config ss_baselines/av_nav/config/audionav/replica/val_telephone/audiogoal_depth.yaml \
  --model-dir data/models/replica/audiogoal_depth
```

3. Test the best validation checkpoint based on validation curve

```bash
# EDIT ckpt.XXX.pth
!python ss_baselines/av_nav/run.py \
  --run-type eval \
  --exp-config ss_baselines/av_nav/config/audionav/replica/test_telephone/audiogoal_depth.yaml \
  --model-dir data/models/replica/audiogoal_depth \
  EVAL_CKPT_PATH_DIR data/models/replica/audiogoal_depth/data/ckpt.XXX.pth
```

4. Generate demo video

```bash
# EDIT ckpt.XXX.pth
!python ss_baselines/av_nav/run.py \
  --run-type eval \
  --exp-config ss_baselines/av_nav/config/audionav/replica/test_telephone/audiogoal_depth.yaml \
  --model-dir data/models/replica/audiogoal_depth \
  EVAL_CKPT_PATH_DIR data/models/replica/audiogoal_depth/data/ckpt.XXX.pth \
  VIDEO_OPTION [\"disk\"] \
  TASK_CONFIG.SIMULATOR.USE_RENDERED_OBSERVATIONS False \
  TASK_CONFIG.TASK.SENSORS [\"POINTGOAL_WITH_GPS_COMPASS_SENSOR\",\"SPECTROGRAM_SENSOR\",\"AUDIOGOAL_SENSOR\"] \
  SENSORS [\"RGB_SENSOR\",\"DEPTH_SENSOR\"] \
  EXTRA_RGB True \
  TASK_CONFIG.SIMULATOR.CONTINUOUS_VIEW_CHANGE True \
  DISPLAY_RESOLUTION 512 \
  TEST_EPISODE_COUNT 1
```

5. Evaluating the pretrained model

```bash
!python ss_baselines/av_nav/run.py \
  --run-type eval \
  --exp-config ss_baselines/av_nav/config/audionav/replica/test_telephone/audiogoal_depth.yaml \
  EVAL_CKPT_PATH_DIR data/pretrained_weights/audionav/av_nav/replica/heard.pth
!python ss_baselines/av_nav/run.py \
  --run-type eval \
  --exp-config ss_baselines/av_nav/config/audionav/replica/test_telephone/audiogoal_depth.yaml \
  EVAL_CKPT_PATH_DIR data/pretrained_weights/audionav/av_nav/replica/unheard.pth \
  EVAL.SPLIT test_multiple_unheard
```

6. Interactive demo

```bash
!python scripts/interactive_demo.py
```

In [54]:
!sed -i -e "s/.*'apartment_0'.*/REPLICA_SCENES = ['apartment_0', 'apartment_1', 'apartment_2']/g" ss_baselines/common/env_utils.py
!sed -i -e "/.*'frl_apartment_3'.*/d" ss_baselines/common/env_utils.py
!sed -i -e "/.*'office_3'.*/d" ss_baselines/common/env_utils.py

!sed -i -e "s/CONTENT_SCENES:.*]/CONTENT_SCENES: ['apartment_0', 'apartment_1', 'apartment_2']/" configs/audionav/av_nav/replica/audiogoal.yaml

!sed -i -e "s/NUM_PROCESSES.*/NUM_PROCESSES: 1/g" ss_baselines/av_nav/config/audionav/replica/train_telephone/audiogoal_depth.yaml
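
The `sed` commands above edit files in place, so a typo in the pattern is easy to miss. The `NUM_PROCESSES` substitution can be previewed on a string in Python first. This is a minimal sketch; the sample YAML text and its original value of `10` are made up for illustration.

```python
import re


def set_num_processes(yaml_text, value):
    """Mirror `sed "s/NUM_PROCESSES.*/NUM_PROCESSES: <value>/g"` on a string."""
    # `.` does not match newlines by default, so only the rest of the line is replaced.
    return re.sub(r"NUM_PROCESSES.*", f"NUM_PROCESSES: {value}", yaml_text)


sample = "NUM_PROCESSES: 10\nNUM_UPDATES: 10000\n"  # hypothetical config excerpt
print(set_num_processes(sample, 1))
```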

In [62]:
!python ss_baselines/av_nav/run.py --exp-config ss_baselines/av_nav/config/audionav/replica/train_telephone/audiogoal_depth.yaml --model-dir data/models/replica/audiogoal_depth

Streaming output truncated to the last 5000 lines.
      LINEAR_FRICTION: 0.5
      MASS: 32.0
      RADIUS: 0.1
      SENSORS: ['RGB_SENSOR']
      START_POSITION: [0, 0, 0]
      START_ROTATION: [0, 0, 0, 1]
    AUDIO:
      BINAURAL_RIR_DIR: data/binaural_rirs
      EVERLASTING: True
      GRAPH_FILE: graph.pkl
      HAS_DISTRACTOR_SOUND: False
      METADATA_DIR: data/metadata
      POINTS_FILE: points.txt
      RIR_SAMPLING_RATE: 44100
      SCENE: 
      SOURCE_SOUND_DIR: data/sounds/1s_all
    CONTINUOUS_VIEW_CHANGE: False
    DEFAULT_AGENT_ID: 0
    DEPTH_SENSOR:
      HEIGHT: 128
      HFOV: 90
      MAX_DEPTH: 10.0
      MIN_DEPTH: 0.0
      NORMALIZE_DEPTH: True
      ORIENTATION: [0.0, 0.0, 0.0]
      POSITION: [0, 1.25, 0]
      TYPE: HabitatSimDepthSensor
      WIDTH: 128
    FORWARD_STEP_SIZE: 0.5
    GRID_SIZE: 0.5
    HABITAT_SIM_V0:
      ALLOW_SLIDING: True
      ENABLE_PHYSICS: False
      GPU_DEVICE_ID: 0
      GPU_GPU: False
      PHYSICS_CONFIG_FILE: ./data/default.ph

In [44]:
# !python ss_baselines/av_nav/run.py --run-type eval --exp-config ss_baselines/av_nav/config/audionav/replica/val_telephone/audiogoal_depth.yaml --model-dir data/models/replica/audiogoal_depth

In [45]:
## !python ss_baselines/av_nav/run.py --run-type eval --exp-config ss_baselines/av_nav/config/audionav/replica/test_telephone/audiogoal_depth.yaml --model-dir data/models/replica/audiogoal_depth --eval-best
# !python ss_baselines/av_nav/run.py --run-type eval --exp-config ss_baselines/av_nav/config/audionav/replica/test_telephone/audiogoal_depth.yaml --model-dir data/models/replica/audiogoal_depth EVAL_CKPT_PATH_DIR data/models/replica/audiogoal_depth/data/ckpt.0.pth

In [46]:
## !python ss_baselines/av_nav/run.py --run-type eval --exp-config ss_baselines/av_nav/config/audionav/replica/test_telephone/audiogoal_depth.yaml --model-dir data/models/replica/audiogoal_depth --eval-best VIDEO_OPTION [\"disk\"] TASK_CONFIG.SIMULATOR.USE_RENDERED_OBSERVATIONS False TASK_CONFIG.TASK.SENSORS [\"POINTGOAL_WITH_GPS_COMPASS_SENSOR\",\"SPECTROGRAM_SENSOR\",\"AUDIOGOAL_SENSOR\"] SENSORS [\"RGB_SENSOR\",\"DEPTH_SENSOR\"] EXTRA_RGB True TASK_CONFIG.SIMULATOR.CONTINUOUS_VIEW_CHANGE True DISPLAY_RESOLUTION 512 TEST_EPISODE_COUNT 1
# !python ss_baselines/av_nav/run.py --run-type eval --exp-config ss_baselines/av_nav/config/audionav/replica/test_telephone/audiogoal_depth.yaml --model-dir data/models/replica/audiogoal_depth EVAL_CKPT_PATH_DIR data/models/replica/audiogoal_depth/data/ckpt.0.pth VIDEO_OPTION [\"disk\"] TASK_CONFIG.SIMULATOR.USE_RENDERED_OBSERVATIONS False TASK_CONFIG.TASK.SENSORS [\"POINTGOAL_WITH_GPS_COMPASS_SENSOR\",\"SPECTROGRAM_SENSOR\",\"AUDIOGOAL_SENSOR\"] SENSORS [\"RGB_SENSOR\",\"DEPTH_SENSOR\"] EXTRA_RGB True TASK_CONFIG.SIMULATOR.CONTINUOUS_VIEW_CHANGE True DISPLAY_RESOLUTION 512 TEST_EPISODE_COUNT 1

In [None]:
!python ss_baselines/av_nav/run.py --run-type eval --exp-config ss_baselines/av_nav/config/audionav/replica/test_telephone/audiogoal_depth.yaml EVAL_CKPT_PATH_DIR data/pretrained_weights/audionav/av_nav/replica/heard.pth
## !python ss_baselines/av_nav/run.py --run-type eval --exp-config ss_baselines/av_nav/config/audionav/replica/test_telephone/audiogoal_depth.yaml EVAL_CKPT_PATH_DIR data/pretrained_weights/audionav/av_nav/replica/unheard.pth EVAL.SPLIT test_multiple_unheard

2022-04-18 03:28:30,709 env config: BASE_TASK_CONFIG_PATH: configs/audionav/av_nav/replica/audiogoal.yaml
CHECKPOINT_FOLDER: data/models/output/data
CHECKPOINT_INTERVAL: 50
CMD_TRAILING_OPTS: ['EVAL_CKPT_PATH_DIR', 'data/pretrained_weights/audionav/av_nav/replica/heard.pth']
DEBUG: False
DISPLAY_RESOLUTION: 128
ENV_NAME: AudioNavRLEnv
EVAL:
  SPLIT: test_telephone
  USE_CKPT_CONFIG: True
EVAL_CKPT_PATH_DIR: data/pretrained_weights/audionav/av_nav/replica/heard.pth
EXTRA_RGB: False
LOG_FILE: data/models/output/train.log
LOG_INTERVAL: 10
NUM_PROCESSES: 1
NUM_UPDATES: 10000
RL:
  DISTANCE_REWARD_SCALE: 1.0
  PPO:
    clip_param: 0.2
    entropy_coef: 0.01
    eps: 1e-05
    gamma: 0.99
    hidden_size: 512
    lr: 0.0007
    max_grad_norm: 0.5
    num_mini_batch: 16
    num_steps: 5
    ppo_epoch: 4
    reward_window_size: 50
    tau: 0.95
    use_gae: True
    use_linear_clip_decay: False
    use_linear_lr_decay: False
    value_loss_coef: 0.5
  SLACK_REWARD: -0.01
  SUCCESS_REWARD: 10.0

## Implementing a new agent

1. Simple agent

You can use `RandomAgent` in [`simple_agents.py`](https://github.com/facebookresearch/sound-spaces/blob/main/ss_baselines/common/simple_agents.py), included in `ss_baselines`, as an example of a relatively simple agent. It shows that you only need to implement an agent class that inherits from `habitat.Agent` and defines your own `__init__()`, `reset()`, `is_goal_reached()`, and `act()` methods. The script takes three arguments: `task_config` to construct the task (default `configs/tasks/pointnav.yaml`; I think this is supposed to be something like [`configs/audionav/av_nav/replica/pointgoal.yaml`](https://github.com/facebookresearch/sound-spaces/blob/main/configs/audionav/av_nav/replica/pointgoal.yaml)), plus `success_distance` (default `0.2`) and `agent_class` (default `RandomAgent`) to instantiate your agent. Set them according to your setup.

The evaluation happens in the `evaluate()` method of the `Benchmark` class defined in [`benchmark.py`](https://github.com/facebookresearch/sound-spaces/blob/main/ss_baselines/common/benchmark.py). The code corresponding to one episode is [here](https://github.com/facebookresearch/sound-spaces/blob/f11fef81db0c6b05d42fd062faa4929195de4ddf/ss_baselines/common/benchmark.py#L80-L98).
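
To make the agent interface concrete without installing habitat, the sketch below reproduces the four methods named above (`__init__()`, `reset()`, `is_goal_reached()`, `act()`) against a toy environment and runs one episode. Everything here is a stand-in: `ToyCorridorEnv`, `RandomToyAgent`, `run_episode`, the string actions, and the `distance_to_goal` observation key are assumptions for illustration. A real agent would subclass `habitat.Agent` and return habitat actions.

```python
import random


class ToyCorridorEnv:
    """Hypothetical stand-in environment: move forward along a line to reach the goal."""

    def __init__(self, length=5, max_steps=50):
        self.length = length
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        self.position = 0
        self.steps = 0
        return {"distance_to_goal": self.length - self.position}

    def step(self, action):
        if action == "FORWARD":
            self.position = min(self.position + 1, self.length)
        self.steps += 1
        return {"distance_to_goal": self.length - self.position}

    @property
    def episode_over(self):
        return self.steps >= self.max_steps


class RandomToyAgent:
    """Has the same method set as the text describes; not a habitat.Agent subclass."""

    def __init__(self, success_distance, seed=0):
        self.success_distance = success_distance
        self.rng = random.Random(seed)

    def reset(self):
        pass  # a stateful agent would clear its memory here

    def is_goal_reached(self, observations):
        return observations["distance_to_goal"] <= self.success_distance

    def act(self, observations):
        if self.is_goal_reached(observations):
            return "STOP"
        return self.rng.choice(["FORWARD", "TURN_LEFT", "TURN_RIGHT"])


def run_episode(env, agent):
    """Run one episode; return True if the agent calls STOP at the goal in time."""
    agent.reset()
    observations = env.reset()
    while not env.episode_over:
        action = agent.act(observations)
        if action == "STOP":
            return True
        observations = env.step(action)
    return False


print(run_episode(ToyCorridorEnv(length=3, max_steps=200), RandomToyAgent(0.2)))
```

The loop has the same reset, act, step shape as a typical per-episode evaluation, but it omits metric collection entirely.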

2. Sophisticated agent

If you opt for sophisticated agents, you can refer to the structure of the baseline agents in `ss_baselines` such as [`av_nav`](https://github.com/facebookresearch/sound-spaces/tree/main/ss_baselines/av_nav). The scripts in **bold** are the main points of modification.

- `config/`
  - **`audionav/`** This directory contains experiment configs specifying task, model parameters, and training options for each of train & val & test.
  - `__init__.py` Imports from `default`.
  - **`default.py`** Defines defaults for experiment config and task config.
- `models/` Defines neural network components.
- `ppo/`
  - `policy.py` Defines policy module with neural network components.
  - `ppo.py` Defines PPO module.
  - **`ppo_trainer.py`** Defines trainer implementing `train()` and `eval()` for PPO inheriting from [`BaseRLTrainer`](https://github.com/facebookresearch/sound-spaces/blob/f11fef81db0c6b05d42fd062faa4929195de4ddf/ss_baselines/common/base_trainer.py#L42).
- `__init__.py` Imports from `ppo.ppo_trainer`.
- **`run.py`** Main script for running experiments. It calls the `train()` or `eval()` method of the trainer. Specify the appropriate experiment config together with your desired task config. (Note: the help strings are partially incorrect.)

For your reference
- `run.py`
  - exp_config [`audiogoal_depth.yaml`](https://github.com/facebookresearch/sound-spaces/blob/main/ss_baselines/av_nav/config/audionav/replica/train_telephone/audiogoal_depth.yaml)
    - BASE_TASK_CONFIG [`audiogoal.yaml`](https://github.com/facebookresearch/sound-spaces/blob/main/configs/audionav/av_nav/replica/audiogoal.yaml)
      - ENVIRONMENT
      - SIMULATOR
        - HABITAT_SIM_V0
          - GPU_DEVICE_ID: `0`
        - TYPE: `"SoundSpacesSim"`
        - ACTION_SPACE_CONFIG: `"v0"`
        - SCENE_DATASET: `"replica"`
      - TASK
        - TYPE: `AudioNav`
      - DATASET
        - TYPE: `"AudioNav"`
        - SPLIT: `"train_telephone"`
        - CONTENT_SCENES: `["*"]`
          - `ss_baselines.common.env_utils`
          - `habitat.datasets.registration`
            - `habitat.datasets.pointnav.pointnav_dataset`
              - `habitat.core.dataset`
        - VERSION: `'v1'`
        - SCENES_DIR: `"data/scene_datasets/replica"`
        - DATA_PATH: `"data/datasets/audionav/replica/{version}/{split}/{split}.json.gz"`
  - `get_config()` from `ss_baselines.av_nav.config.default`
    - `from habitat import get_config as get_task_config`
    - `from habitat.config import Config as CN`
    - experiment config
      - BASE_TASK_CONFIG_PATH `pointgoal.yaml`
      - TRAINER_NAME `"AVNavTrainer"`
      - ENV_NAME `"AudioNavRLEnv"`
      - VIDEO_OPTION `["disk", "tensorboard"]`
      - SENSORS `["RGB_SENSOR", "DEPTH_SENSOR"]`
      - RL.PPO
    - task config
      - AUDIOGOAL_SENSOR.TYPE `"AudioGoalSensor"`
      - SPECTROGRAM_SENSOR.TYPE `"SpectrogramSensor"`
      - SIMULATOR.SCENE_DATASET `'replica'`
      - DATASET.VERSION `'v1'`
    - `config.merge_from_file()` from `yacs`

  - `baseline_registry()` from `ss_baselines.common.baseline_registry`
    - `from habitat.core.registry import Registry`
  - `trainer_init()`
  - `trainer.train()`
  - `trainer.eval()`
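
The trailing `KEY VALUE` arguments passed to `run.py` (for example `EVAL.SPLIT test_multiple_unheard`) show up in the logged config as `CMD_TRAILING_OPTS` and override values from the YAML files. The sketch below illustrates how such dotted-key overrides work on a plain nested dict. It is an illustration only: `apply_overrides` is a hypothetical function, and the real config objects are yacs-based and, presumably, merged with yacs's own list-merging logic, which also performs type checks.

```python
def apply_overrides(config, opts):
    """Apply ['KEY.SUBKEY', value, ...] pairs to a nested dict, in place."""
    if len(opts) % 2 != 0:
        raise ValueError("opts must be a flat list of KEY VALUE pairs")
    for dotted_key, value in zip(opts[0::2], opts[1::2]):
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            node = node[name]  # KeyError if the section does not exist
        if leaf not in node:
            raise KeyError(dotted_key)  # refuse to create unknown keys
        node[leaf] = value
    return config


config = {"EVAL": {"SPLIT": "test_telephone"}, "EVAL_CKPT_PATH_DIR": ""}
apply_overrides(config, ["EVAL.SPLIT", "test_multiple_unheard"])
print(config["EVAL"]["SPLIT"])  # test_multiple_unheard
```

Rejecting unknown keys catches misspelled option names early instead of silently adding a setting that nothing reads.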



## Playground