Concreteness

An implementation of "Quantifying the Visual Concreteness of Words and Topics in Multimodal Datasets" (Hessel, Mimno, and Lee, NAACL 2018) in PyTorch.

It uses a ResNet50 along with Spotify's Annoy library for approximate nearest-neighbor search to compute the visual concreteness scores of words in the MIRFLICKR dataset.
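
To make the idea concrete, here is a minimal sketch of a neighbor-based concreteness score as commonly described for the paper: for each word, count how many of an image's k nearest neighbors in feature space carry the same word, average over the word's images, and divide by the count expected if the word were assigned at random. This is not the code in concreteness.py. It assumes toy 2-D points in place of ResNet50 features, swaps Annoy for an exact brute-force search, and approximates the random baseline as k * |V_w| / N; the function names are hypothetical.

```python
from collections import defaultdict


def knn_indices(features, k):
    """Exact k nearest neighbours of every point (self excluded), by squared Euclidean distance."""
    neighbours = []
    for i, a in enumerate(features):
        dists = sorted(
            (sum((x - y) ** 2 for x, y in zip(a, b)), j)
            for j, b in enumerate(features)
            if j != i
        )
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def concreteness_scores(features, tags, k):
    """Map each word to observed / expected same-word neighbour count."""
    n = len(features)
    neighbours = knn_indices(features, k)

    # Invert the image -> words mapping into word -> set of image indices.
    images_with = defaultdict(set)
    for idx, words in enumerate(tags):
        for word in words:
            images_with[word].add(idx)

    scores = {}
    for word, members in images_with.items():
        # Average number of neighbours that share the word.
        observed = sum(len(members.intersection(neighbours[i])) for i in members) / len(members)
        # Assumed baseline: neighbours sharing the word under random assignment.
        expected = k * len(members) / n
        scores[word] = observed / expected
    return scores


# Toy data: two tight clusters; every image is also tagged "photo".
features = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
tags = [{"dog", "photo"}] * 3 + [{"cat", "photo"}] * 3
scores = concreteness_scores(features, tags, k=2)
```

In this toy setup a word that clusters visually ("dog") scores above 1, while a word attached to every image ("photo") scores exactly 1, which is the behavior the ratio is meant to capture.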

Requirements

To install the basic requirements, run this:

pip install -r requirements.txt

If you'd like to use a Jupyter Notebook to interact with the concreteness scores after computing them, you'll also need:

pip install -r requirements-notebook.txt

So far, the code has only been tested with Python 3.6.

Usage

Downloading the dataset

Before running, you'll need to download the MIRFLICKR dataset. You can do that with:

cd data
./get_mirflickr.sh

The dataset is 120 GB, so the download may take a while.
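
Given the size, it can be worth checking free disk space before starting the download. The following is a small sketch using only the standard library; the helper name has_room is hypothetical, and the 120 GB figure is read as decimal gigabytes, which is an assumption.

```python
import shutil

# Size quoted in this README, treated here as decimal gigabytes (an assumption).
MIRFLICKR_BYTES = 120 * 10**9


def has_room(path, required_bytes=MIRFLICKR_BYTES):
    """Return True if the filesystem holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


# Check the current directory; run this from wherever the data will be stored.
if not has_room("."):
    print("Less than 120 GB free here; the MIRFLICKR download may not fit.")
```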

Shell usage

Once your download is finished, you can compute the concreteness scores with:

python main.py -d <mirflickr_directory> -c <cache_directory> -v

Replace <mirflickr_directory> with the path where the MIRFLICKR dataset was downloaded and <cache_directory> with a directory of your choice to use for caching.
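
For readers unfamiliar with this flag style, here is a hypothetical sketch of how options like these could be parsed with argparse. It is not taken from main.py: the destination names, the help texts, and the reading of -v as a verbosity switch are all assumptions, and the paths are placeholders.

```python
import argparse


def build_parser():
    """Build a parser mirroring the -d / -c / -v flags shown in the command above."""
    parser = argparse.ArgumentParser(
        description="Compute visual concreteness scores (illustrative sketch)."
    )
    parser.add_argument("-d", dest="dataset_dir", required=True,
                        help="path to the downloaded MIRFLICKR dataset")
    parser.add_argument("-c", dest="cache_dir", required=True,
                        help="directory used for caching")
    # Assumption: -v is a boolean switch that turns on progress output.
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="print progress information")
    return parser


# Placeholder paths; substitute your own dataset and cache locations.
args = build_parser().parse_args(["-d", "data/mirflickr", "-c", "cache", "-v"])
```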

Jupyter Notebook

If you prefer, you can also run the provided Jupyter Notebook:

jupyter notebook concreteness.ipynb

TODO

  • Add support for MSCOCO dataset
  • Improve Jupyter Notebook formatting

Thanks to

@jmhessel for helpful pointers and a great paper.

Citation:

@inproceedings{hessel2018concreteness,
  title={Quantifying the visual concreteness of words and topics in multimodal datasets},
  author={Hessel, Jack and Mimno, David and Lee, Lillian},
  booktitle={NAACL},
  year={2018}
}