fastText is a library for training text classification and word representation models, created by Facebook AI Research.
From the official fastText documentation:
What is fastText?
FastText is an open-source, free, lightweight library that allows users to learn text representations and text classifiers. It works on standard, generic hardware.
Compiling fastText requires a C++ toolchain and other build dependencies that are usually not installed on machines that are not set up for C++ development, and that differ between operating systems. This Dockerfile creates a small image that compiles everything for you. All you need to do is supply the data (in the right format) and run fastText.
What sets this image apart from other fastText containers is that it is based only on the basic Ubuntu image and does not require Python, which keeps it very small (less than 380MB).
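For supervised training, fastText expects one example per line, with each label marked by the prefix __label__ (the library's default label prefix). The following is a minimal sketch of preparing such a file. The directory name, the file name, and the sample sentences are made up for illustration and are not part of this repository.

```shell
# Hypothetical location for the training data; adjust to your setup.
mkdir -p data

# One example per line: one or more __label__<name> tokens, then the text.
cat > data/train.txt <<'EOF'
__label__positive great battery life and a sharp screen
__label__negative stopped working after two days
__label__positive fast delivery and works as described
__label__negative the charger overheats after a few minutes
EOF

# Sanity check: count the lines that start with a label.
grep -c '^__label__' data/train.txt
```

A validation file for autotune would follow the same format.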
The simplest way to get started is to pull the image from Docker Hub:
docker pull yarongon/fasttext
Now you can train a model.
Alternatively, you can build the image.
To build the image yourself, first clone the repository from GitHub:
git clone https://github.com/yarongon/fasttext_docker.git
Then, from inside the cloned fasttext_docker directory, build the Docker image for training:
docker build -t fasttext .
The entrypoint in the Dockerfile is the fastText binary, so to use the image you simply pass the parameters that fastText itself accepts. If you built the image locally with the command above, replace yarongon/fasttext with fasttext in the examples. A few examples follow.
Run the container by mounting two volumes, /data and /models, and then supplying the parameters that fastText accepts. For example:
docker run \
--rm \
-v $DATA_DIR:/data \
-v $MODELS_DIR:/models \
yarongon/fasttext supervised \
-input /data/<training data> \
-output /models/<output model filename>
DATA_DIR is the full path of the input directory that contains the training data, e.g., DATA_DIR=/data/train/.
MODELS_DIR is the full path of the output directory where the trained fastText model file (e.g., model.bin) will be generated, e.g., MODELS_DIR=/data/models/.
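Both variables should hold absolute paths, because Docker interprets a host value that does not start with / on the left side of -v as a named volume rather than a host directory. The following sketch shows one way to set them from relative directories. The directory names ./data and ./models are assumptions chosen for illustration.

```shell
# Hypothetical relative locations; replace with your own directories.
mkdir -p ./data ./models

# Resolve each directory to an absolute path for use with docker run -v.
DATA_DIR="$(cd ./data && pwd)"
MODELS_DIR="$(cd ./models && pwd)"

echo "data:   $DATA_DIR"
echo "models: $MODELS_DIR"
```

With these variables set in the current shell, the docker run commands in this section can be used as written.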
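Another example, using fastText's automatic hyperparameter tuning (autotune) with a validation set. The -autotune-duration value is given in seconds: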
docker run \
--rm \
-v $DATA_DIR:/data \
-v $MODELS_DIR:/models \
yarongon/fasttext supervised \
-input /data/<training data> \
-output /models/<output model filename> \
-autotune-validation /data/<validation data> \
-autotune-duration 10000 \
-verbose 5
Run the container by mounting only the models directory. The following command loads a trained model and classifies text typed at the command prompt (the trailing - tells fastText to read from standard input):
docker run \
--rm \
-it \
-v $MODELS_DIR:/models \
yarongon/fasttext predict /models/<model filename> -
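Predictions can also be made non-interactively by piping text into the container. The sketch below wraps the command in a small helper. The function name predict_lines, the DOCKER override variable, and the model file name model.bin are hypothetical and used only for illustration. It passes -i without -t, because Docker refuses to allocate a terminal when standard input is a pipe.

```shell
# Assumes MODELS_DIR holds the absolute path of the directory with the model.
# DOCKER may be overridden (e.g., DOCKER=echo) to print the arguments
# instead of starting a container.
predict_lines() {
  # $1 is the model file name inside MODELS_DIR.
  # -i keeps stdin open for the pipe; -t is omitted on purpose.
  ${DOCKER:-docker} run --rm -i \
    -v "$MODELS_DIR:/models" \
    yarongon/fasttext predict "/models/$1" -
}

# Usage (one sentence per input line):
# printf '%s\n' 'great battery life' 'stopped working' | predict_lines model.bin
```

Each input line should produce one output line with the predicted label.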