viswordembeddings is a web-based application to help researchers and practitioners explore and analyze word vector embeddings. It combines implementations of multiple interactive visualization designs that support tasks to understand embeddings. The tasks, designs and the rationale behind them are described in a publication presented at the EuroVis 2018 conference.
viswordembeddings has a back-end written in Python, version 3.6. The back-end runs a server implemented using the flask library. Processing of embeddings and vector operations are done using the numpy and scipy libraries. The python-sharearray library is used to make sure that only one copy of an embedding is kept in memory, even if multiple instances of the server are running.
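The idea of keeping a single copy of an embedding in memory across several server processes can be illustrated with a small standard-library sketch. This is not the python-sharearray API and not the project's actual code: it swaps in a read-only memory-mapped file as a stand-in, and the helper names `write_vectors`, `open_shared` and `read_vector`, as well as the raw float32 file layout, are assumptions made for illustration.

```python
import mmap
import struct


def write_vectors(path, vectors):
    """Write equal-length vectors to `path` as raw little-endian float32.

    Returns the number of dimensions per vector.
    """
    dims = len(vectors[0])
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(struct.pack(f"<{dims}f", *vec))
    return dims


def open_shared(path):
    """Map the file read-only.

    Every process that maps the same file reads from the same pages in
    the operating system's page cache instead of holding a private copy.
    The mapping stays valid after the file object is closed.
    """
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def read_vector(buf, index, dims):
    """Return the vector stored at row `index` as a tuple of floats."""
    offset = index * dims * 4  # 4 bytes per float32 value
    return struct.unpack_from(f"<{dims}f", buf, offset)
```

Each worker process would call `open_shared` on the same file and read rows with `read_vector`; the data itself is only held once by the operating system.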
The Python dependencies are listed in the requirements.txt file and can be installed using pip:
pip install -r requirements.txt
Running the server locally
The server expects a /data folder containing embeddings and additional data files.
A description of its structure and an example for download is available on the project webpage.
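A quick sanity check of the data folder can catch a missing or empty directory early. The sketch below is a hypothetical helper, not part of viswordembeddings: `check_data_folder` is an assumed name, and it only verifies that the folder exists and is non-empty. It does not validate the directory structure described on the project webpage.

```python
import os


def check_data_folder(path):
    """Return the sorted entry names of `path`.

    Raises FileNotFoundError if the folder does not exist and
    ValueError if it exists but contains nothing.
    """
    if not os.path.isdir(path):
        raise FileNotFoundError(f"data folder not found: {path}")
    entries = sorted(os.listdir(path))
    if not entries:
        raise ValueError(f"data folder is empty: {path}")
    return entries
```

Calling `check_data_folder("/data")` would then list what the server is going to find, or fail with a readable message.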
Once the folder has been created, the server can be started with:
python app/main.py demo
The front-end is accessible with a browser (preferably Google Chrome) at
Alternatively, we provide a docker image of viswordembeddings that contains all dependencies. It can be easily set up locally, or on public server infrastructure. The image is based on the tiangolo/uwsgi-nginx-flask:python3.6-index image. It uses uWSGI to run the Python back-end, and nginx to distribute traffic between uWSGI instances. uWSGI grows and shrinks its pool of workers on demand through its cheaper subsystem.
Installing the docker image
After installing docker on your system, the image can be installed using:
docker pull fheimerl/viswordembeddings
Running the docker image
Before running the image, make sure that a folder containing the embeddings to analyze exists (more information about the directory structure that viswordembeddings expects is available on the project webpage).
This folder should be mounted into the docker container as the
To access the front-end, the docker container exposes port 80, which can be mapped to any local port.
At runtime, all embeddings are loaded into shared memory.
The available shared memory (--shm-size) must therefore be set to a large enough value.
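A rough way to pick a value is to add up the raw size of the embedding matrices. The sketch below assumes dense storage with 4 bytes per value (float32) and an arbitrary 20% headroom; both are assumptions rather than documented properties of viswordembeddings, and the function names and vocabulary sizes are hypothetical.

```python
import math


def embedding_bytes(vocab_size, dims, bytes_per_value=4):
    """Raw size in bytes of one dense embedding matrix."""
    return vocab_size * dims * bytes_per_value


def suggest_shm_size(embeddings, headroom=1.2):
    """Suggest a --shm-size value for a list of (vocab_size, dims) pairs.

    Sums the raw matrix sizes, applies the headroom factor, and rounds
    up to whole gigabytes (1024**3 bytes), returned as a string such as
    "5g".
    """
    total = sum(embedding_bytes(v, d) for v, d in embeddings)
    gigabytes = math.ceil(total * headroom / 1024 ** 3)
    return f"{gigabytes}g"
```

For example, one embedding with 3,000,000 words and 300 dimensions takes 3,000,000 × 300 × 4 = 3.6 billion bytes. Together with a second embedding of 400,000 words and 100 dimensions, `suggest_shm_size([(3_000_000, 300), (400_000, 100)])` returns `"5g"`.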
The following example runs the image with a shared memory size of 10 GB, makes the front-end available at local port 8080, and mounts the local folder /embeddings_data as the data folder in the docker container:
docker run --shm-size=10GB -p 8080:80 -v /embeddings_data/:/data/ fheimerl/viswordembeddings