
See, Hear, Explore: Curiosity via Audio-Visual Association

[Website]    [Video]

Victoria Dean    Shubham Tulsiani    Abhinav Gupta

Carnegie Mellon University, Facebook AI Research

This is an implementation of our paper on curiosity via audio-visual association. In this paper, we introduce a form of curiosity that rewards novel associations between different sensory modalities. Our approach exploits multiple modalities to provide a stronger signal for more efficient exploration. Our method is inspired by the fact that, for humans, both sight and sound play a critical role in exploration. We present results on Atari and Habitat (a photorealistic navigation simulator), showing the benefits of using an audio-visual association model for intrinsically guiding learning agents in the absence of external rewards.
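As a rough illustration only (not the paper's actual implementation), the intrinsic reward can be thought of as the surprise of a discriminator that predicts whether an audio clip and a visual frame come from the same moment. A minimal numpy sketch, with arbitrary feature sizes and an untrained linear discriminator standing in for the learned association model:

```python
import numpy as np

def fft_audio_features(audio_window, n_bins=64):
    """Magnitude spectrum of a raw audio window (a stand-in for the
    paper's FFT feature space; sizes here are arbitrary)."""
    spectrum = np.abs(np.fft.rfft(audio_window))
    return spectrum[:n_bins]

def association_reward(visual_feat, audio_feat, weights, bias):
    """Intrinsic reward sketch: a logistic discriminator scores how
    likely the (visual, audio) pair is to be aligned; low predicted
    alignment (high surprise) yields high reward."""
    x = np.concatenate([visual_feat, audio_feat])
    p_aligned = 1.0 / (1.0 + np.exp(-(x @ weights + bias)))
    return -np.log(p_aligned + 1e-8)  # surprise = negative log-likelihood

# Toy usage with random features (hypothetical shapes).
rng = np.random.default_rng(0)
visual = rng.standard_normal(128)               # e.g. CNN embedding of a frame
audio = fft_audio_features(rng.standard_normal(512))
w = rng.standard_normal(visual.size + audio.size) * 0.01
r = association_reward(visual, audio, w, 0.0)
```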

This code trains an audio-visual exploration agent in Atari environments. It does not yet have support for the Habitat navigation setting, as the underlying environment is not open-sourced.


Installation

This installation requires a machine with a GPU.

git clone
cd audio-curiosity
conda env create -f environment.yml

Retro Setup

You will need to download the Atari 2600 game ROMs and import them into retro. The commands below should do this automatically (you may need to install unrar first). For more details, see openai/retro#53

wget && unrar x Roms.rar && unzip Roms/
python3 -m retro.import ROMS/
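If the import step fails, a quick stdlib check of the extracted ROM directory can help. The directory name `ROMS/` follows the commands above; adjust the path and extensions if your archive extracts differently:

```python
import pathlib

def count_roms(rom_dir):
    """Count files with common Atari 2600 ROM extensions under rom_dir,
    so you can confirm extraction worked before importing into retro."""
    root = pathlib.Path(rom_dir)
    exts = {".a26", ".bin", ".zip"}
    return sum(1 for p in root.rglob("*") if p.suffix.lower() in exts)

# Example: print(count_roms("ROMS"))  # should be nonzero before importing
```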

To add audio support, copy our modified into retro. If you set up a conda environment as instructed above, this command should work:

cp $CONDA_PREFIX/lib/python3.7/site-packages/retro/

Baselines Setup

Modify the following line in $CONDA_PREFIX/lib/python3.7/site-packages/baselines/

summary = tf.Summary(value=[summary_val(k, v) for k, v in kvs.items()])

to:

summary = tf.Summary(value=[summary_val(k, v) for k, v in kvs.items() if v is not None])
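The edit simply drops None-valued entries so that TensorBoard summary writing does not fail on missing metrics. In isolation (plain Python, no TensorFlow), the filtering behaves like this:

```python
# Logged key/value pairs, where some metrics may be missing (None).
kvs = {"episode_reward": 12.0, "aux_loss": None, "steps": 4096}

# Without filtering, None values would reach the summary writer and fail.
# The patched line keeps only entries with real values:
filtered = {k: v for k, v in kvs.items() if v is not None}
```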



Training

The following command trains an audio-visual exploration agent on Breakout with default experiment parameters.

python --env_kind=Breakout --feature_space=fft --train_discriminator=True --discriminator_weighted=True

To train a visual prediction baseline agent on Breakout:

python --env_kind=Breakout --feature_space=visual
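For contrast, the visual baseline follows prediction-error curiosity in the style of [1]: a forward model predicts the next visual embedding, and its error is the reward. A minimal numpy sketch, where the linear forward model and feature sizes are illustrative only:

```python
import numpy as np

def prediction_error_reward(phi_t, action_onehot, phi_next, W):
    """Forward-dynamics curiosity sketch: reward is the squared error
    between predicted and actual next-state features."""
    x = np.concatenate([phi_t, action_onehot])
    phi_pred = x @ W                       # illustrative linear forward model
    return float(np.mean((phi_pred - phi_next) ** 2))

# Toy usage with random features (hypothetical sizes).
rng = np.random.default_rng(1)
phi_t, phi_next = rng.standard_normal(32), rng.standard_normal(32)
action = np.eye(4)[2]                      # one-hot action, 4 discrete actions
W = rng.standard_normal((36, 32)) * 0.1
r = prediction_error_reward(phi_t, action, phi_next, W)
```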

Creating Plots

To create a figure with the 12 Atari environments we used (after you have trained), run:

python --all=True --mean=True
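The aggregation underlying a mean curve across seeds amounts to the following (illustrative numpy, not the repo's plotting code; runs are truncated to the shortest seed so the arrays align):

```python
import numpy as np

def mean_curve(runs):
    """Average several reward curves (one per seed), truncating to the
    shortest run so the arrays stack cleanly."""
    n = min(len(r) for r in runs)
    stacked = np.stack([np.asarray(r[:n], dtype=float) for r in runs])
    return stacked.mean(axis=0)

# Three hypothetical seeds of different lengths.
seeds = [[0.0, 1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
avg = mean_curve(seeds)
```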


Acknowledgements

This code builds on the open-source repository from Large-Scale Study of Curiosity-Driven Learning [1]:

[1] Yuri Burda, Harri Edwards, Deepak Pathak, Amos Storkey, Trevor Darrell, and Alexei A Efros. Large-scale study of curiosity-driven learning. arXiv preprint arXiv:1808.04355, 2018.

