Addressing Goal Misgeneralization with Natural Language Interfaces

Official repository for my MSc thesis in Artificial Intelligence at the University of Amsterdam:

Addressing Goal Misgeneralization with Natural Language Interfaces

Find and read the thesis here or here.

Requirements and Setup

Python and package versions are listed in the generated pyproject.toml and poetry.lock files.

We recommend using an environment manager such as conda. After setting up your environment with the correct Python version, install the required packages as follows.

First, install the pre-requirements (see footnote 1):

pip install -r requirements/pre-reqs.txt

Then install the rest of the requirements:

pip install -r requirements/requirements-complete.txt
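The two installation steps must run in this order, and the second should not run if the first fails. A minimal sketch of scripting that from Python is given here. The names `REQUIREMENT_FILES`, `build_pip_commands`, and `install_requirements` are hypothetical helpers for illustration and are not part of this repository. The sketch calls `python -m pip` rather than the bare `pip` command so that packages go into the interpreter running the script.

```python
import subprocess
import sys

# Order matters: the pre-requirements must be installed before the rest.
# These paths are the ones used in the commands above.
REQUIREMENT_FILES = [
    "requirements/pre-reqs.txt",
    "requirements/requirements-complete.txt",
]


def build_pip_commands(python=sys.executable):
    """Return one `pip install -r <file>` command per requirements file, in order."""
    return [[python, "-m", "pip", "install", "-r", path] for path in REQUIREMENT_FILES]


def install_requirements(runner=subprocess.run):
    """Run each install in order; check=True raises at the first failing step."""
    for command in build_pip_commands():
        runner(command, check=True)
```

Calling `install_requirements()` from the repository root inside the activated environment would then perform both steps. The `runner` parameter exists only so the sequence can be inspected without installing anything.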

These requirements.txt files are generated by running

Project Organization

    ├── LICENSE
    ├──          <- The top-level README
    ├── data/              <- Datasets
    ├── checkpoints/       <- Trained and serialized models.
    ├── notebooks/         <- Jupyter notebooks.
    ├── documents/         <- Documents as HTML, PDF, LaTeX, etc.
    ├── pyproject.toml     <- Project metadata, handled by poetry.
    ├── poetry.lock        <- Resolving and locking dependencies, handled by poetry.
    ├── requirements.txt   <- For non-poetry users.
    ├──    <- For generating the pip requirements.txt file
    ├── tests/             <- Tests
    ├── outputs/           <- Output files. Not committed.
    └── src/nlgoals/       <- Source code for use in this project.
        ├──    <- Makes src a Python module
        ├── babyai/        <- (modified) code from babyai repo
        ├── configs/       <- Miscellaneous configuration
        ├──       <- Miscellaneous utils
        ├── data/          <- Data processing and handling
        ├── interfaces/    <- Logic for connecting datasets and models
        ├── models/        <- Model definitions
        ├── losses/        <- Custom loss function implementations
        ├── trainer/       <- Configurations for training
        └── run/           <- Scripts to train, evaluate and use models

The project structure is largely based on the cookiecutter data-science template. It is purposely opinionated so that paths align across collaborators without anyone having to edit config files. Users may find the cookiecutter data-science opinions page relevant.

The top-level data/ and checkpoints/ directories are in version control only to show structure. Their contents are ignored via .gitignore and will not be committed.

Demo videos

"turn on the light bulb"


"go to the key"



  1. This annoying first step is necessary because of stupid packages we depend on like pyhash and multicoretsne that don't know how to get their dependencies sorted out like everyone else. See this issue and this issue. These two packages are the main culprits but other packages haven't been very good at defining their dependencies either.

