Add fairseq to PyPI (facebookresearch#495)
Summary:
- fairseq can now be installed via pip: `pip install fairseq`
- command-line tools are globally accessible: `fairseq-preprocess`, `fairseq-train`, `fairseq-generate`, etc.
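A minimal sketch of the new workflow (the dataset path and arguments below are only illustrative):

```
pip install fairseq
fairseq-preprocess --help   # console entry points are now on PATH
fairseq-train data-bin/my-dataset --arch fconv_iwslt_de_en   # illustrative arguments
```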
Pull Request resolved: facebookresearch#495

Differential Revision: D14017761

Pulled By: myleott

fbshipit-source-id: 10c9f6634a3056074eac2f33324b4f1f404d4235
myleott authored and facebook-github-bot committed Feb 9, 2019
1 parent cea0e4b commit fbd4cef
Showing 30 changed files with 143 additions and 136 deletions.
14 changes: 11 additions & 3 deletions README.md
@@ -45,10 +45,18 @@ Please follow the instructions here: https://github.com/pytorch/pytorch#installa
If you use Docker, make sure to increase the shared memory size either with
`--ipc=host` or `--shm-size` as command line options to `nvidia-docker run`.

After PyTorch is installed, you can install fairseq with:
After PyTorch is installed, you can install fairseq with `pip`:
```
pip install -r requirements.txt
python setup.py build develop
pip install fairseq
```

**Installing from source**

To install fairseq from source and develop locally:
```
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable .
```
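Either installation route should place the `fairseq-*` console scripts on your `PATH`. A quick, informal check (not part of the official instructions):

```
pip show fairseq        # confirms the package is installed and prints its version
fairseq-train --help    # should list the training options
```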

# Getting Started
66 changes: 33 additions & 33 deletions docs/command_line_tools.rst
@@ -5,81 +5,81 @@ Command-line Tools

Fairseq provides several command-line tools for training and evaluating models:

- :ref:`preprocess.py`: Data pre-processing: build vocabularies and binarize training data
- :ref:`train.py`: Train a new model on one or multiple GPUs
- :ref:`generate.py`: Translate pre-processed data with a trained model
- :ref:`interactive.py`: Translate raw text with a trained model
- :ref:`score.py`: BLEU scoring of generated translations against reference translations
- :ref:`eval_lm.py`: Language model evaluation
- :ref:`fairseq-preprocess`: Data pre-processing: build vocabularies and binarize training data
- :ref:`fairseq-train`: Train a new model on one or multiple GPUs
- :ref:`fairseq-generate`: Translate pre-processed data with a trained model
- :ref:`fairseq-interactive`: Translate raw text with a trained model
- :ref:`fairseq-score`: BLEU scoring of generated translations against reference translations
- :ref:`fairseq-eval-lm`: Language model evaluation


.. _preprocess.py:
.. _fairseq-preprocess:

preprocess.py
~~~~~~~~~~~~~
fairseq-preprocess
~~~~~~~~~~~~~~~~~~
.. automodule:: preprocess

.. argparse::
:module: preprocess
:func: get_parser
:prog: preprocess.py
:module: fairseq.options
:func: get_preprocessing_parser
:prog: fairseq-preprocess


.. _train.py:
.. _fairseq-train:

train.py
~~~~~~~~
fairseq-train
~~~~~~~~~~~~~
.. automodule:: train

.. argparse::
:module: fairseq.options
:func: get_training_parser
:prog: train.py
:prog: fairseq-train


.. _generate.py:
.. _fairseq-generate:

generate.py
~~~~~~~~~~~
fairseq-generate
~~~~~~~~~~~~~~~~
.. automodule:: generate

.. argparse::
:module: fairseq.options
:func: get_generation_parser
:prog: generate.py
:prog: fairseq-generate


.. _interactive.py:
.. _fairseq-interactive:

interactive.py
~~~~~~~~~~~~~~
fairseq-interactive
~~~~~~~~~~~~~~~~~~~
.. automodule:: interactive

.. argparse::
:module: fairseq.options
:func: get_interactive_generation_parser
:prog: interactive.py
:prog: fairseq-interactive


.. _score.py:
.. _fairseq-score:

score.py
~~~~~~~~
fairseq-score
~~~~~~~~~~~~~
.. automodule:: score

.. argparse::
:module: score
:module: fairseq_cli.score
:func: get_parser
:prog: score.py
:prog: fairseq-score


.. _eval_lm.py:
.. _fairseq-eval-lm:

eval_lm.py
~~~~~~~~~~
fairseq-eval-lm
~~~~~~~~~~~~~~~
.. automodule:: eval_lm

.. argparse::
:module: fairseq.options
:func: get_eval_lm_parser
:prog: eval_lm.py
:prog: fairseq-eval-lm
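The net effect of the renames above: each former top-level script now appears as an installed console command whose options are documented from parsers in `fairseq.options` (and, for scoring, `fairseq_cli.score`). A before/after sketch of the same preprocessing call, with flags borrowed from the translation example later in this commit:

```
# before: run the script from a fairseq checkout
python preprocess.py --source-lang de --target-lang en ...
# after: call the installed console script from any directory
fairseq-preprocess --source-lang de --target-lang en ...
```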
4 changes: 2 additions & 2 deletions docs/conf.py
@@ -60,9 +60,9 @@
# built documents.
#
# The short X.Y version.
version = '0.6.0'
version = '0.6.1'
# The full version, including alpha/beta/rc tags.
release = '0.6.0'
release = '0.6.1'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
2 changes: 0 additions & 2 deletions docs/data.rst
@@ -46,8 +46,6 @@ Dictionary
Iterators
---------

.. autoclass:: fairseq.data.BufferedIterator
:members:
.. autoclass:: fairseq.data.CountingIterator
:members:
.. autoclass:: fairseq.data.EpochBatchIterator
26 changes: 13 additions & 13 deletions docs/getting_started.rst
@@ -15,17 +15,17 @@ done with the
script using the ``wmt14.en-fr.fconv-cuda/bpecodes`` file. ``@@`` is
used as a continuation marker and the original text can be easily
recovered with e.g. ``sed s/@@ //g`` or by passing the ``--remove-bpe``
flag to :ref:`generate.py`. Prior to BPE, input text needs to be tokenized
flag to :ref:`fairseq-generate`. Prior to BPE, input text needs to be tokenized
using ``tokenizer.perl`` from
`mosesdecoder <https://github.com/moses-smt/mosesdecoder>`__.
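As a concrete illustration of the continuation marker (the example sentence is made up):

```
echo "newco@@ mer arrives" | sed "s/@@ //g"
# -> newcomer arrives
```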

Let's use :ref:`interactive.py` to generate translations
Let's use :ref:`fairseq-interactive` to generate translations
interactively. Here, we use a beam size of 5:

.. code-block:: console
> MODEL_DIR=wmt14.en-fr.fconv-py
> python interactive.py \
> fairseq-interactive \
--path $MODEL_DIR/model.pt $MODEL_DIR \
--beam 5 --source-lang en --target-lang fr
| loading model(s) from wmt14.en-fr.fconv-py/model.pt
@@ -66,7 +66,7 @@ datasets: IWSLT 2014 (German-English), WMT 2014 (English-French) and WMT
> bash prepare-iwslt14.sh
> cd ../..
> TEXT=examples/translation/iwslt14.tokenized.de-en
> python preprocess.py --source-lang de --target-lang en \
> fairseq-preprocess --source-lang de --target-lang en \
--trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \
--destdir data-bin/iwslt14.tokenized.de-en
@@ -76,17 +76,17 @@ This will write binarized data that can be used for model training to
Training
--------

Use :ref:`train.py` to train a new model. Here are a few example settings that work
Use :ref:`fairseq-train` to train a new model. Here are a few example settings that work
well for the IWSLT 2014 dataset:

.. code-block:: console
> mkdir -p checkpoints/fconv
> CUDA_VISIBLE_DEVICES=0 python train.py data-bin/iwslt14.tokenized.de-en \
> CUDA_VISIBLE_DEVICES=0 fairseq-train data-bin/iwslt14.tokenized.de-en \
--lr 0.25 --clip-norm 0.1 --dropout 0.2 --max-tokens 4000 \
--arch fconv_iwslt_de_en --save-dir checkpoints/fconv
By default, :ref:`train.py` will use all available GPUs on your machine. Use the
By default, :ref:`fairseq-train` will use all available GPUs on your machine. Use the
``CUDA_VISIBLE_DEVICES`` environment variable to select specific GPUs and/or to
change the number of GPU devices that will be used.
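For instance, to train on two specific GPUs, reusing the IWSLT settings from above:

```
CUDA_VISIBLE_DEVICES=0,1 fairseq-train data-bin/iwslt14.tokenized.de-en \
    --lr 0.25 --clip-norm 0.1 --dropout 0.2 --max-tokens 4000 \
    --arch fconv_iwslt_de_en --save-dir checkpoints/fconv
```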

@@ -98,12 +98,12 @@ Generation
----------

Once your model is trained, you can generate translations using
:ref:`generate.py` **(for binarized data)** or
:ref:`interactive.py` **(for raw text)**:
:ref:`fairseq-generate` **(for binarized data)** or
:ref:`fairseq-interactive` **(for raw text)**:

.. code-block:: console
> python generate.py data-bin/iwslt14.tokenized.de-en \
> fairseq-generate data-bin/iwslt14.tokenized.de-en \
--path checkpoints/fconv/checkpoint_best.pt \
--batch-size 128 --beam 5
| [de] dictionary: 35475 types
@@ -136,7 +136,7 @@ to training on 8 GPUs:

.. code-block:: console
> CUDA_VISIBLE_DEVICES=0 python train.py --update-freq 8 (...)
> CUDA_VISIBLE_DEVICES=0 fairseq-train --update-freq 8 (...)
Training with half precision floating point (FP16)
--------------------------------------------------
@@ -152,7 +152,7 @@ Fairseq supports FP16 training with the ``--fp16`` flag:

.. code-block:: console
> python train.py --fp16 (...)
> fairseq-train --fp16 (...)
Lazily loading large training datasets
--------------------------------------
@@ -178,7 +178,7 @@ replacing ``node_rank=0`` with ``node_rank=1`` on the second node:
> python -m torch.distributed.launch --nproc_per_node=8 \
--nnodes=2 --node_rank=0 --master_addr="192.168.1.1" \
--master_port=1234 \
train.py data-bin/wmt16_en_de_bpe32k \
$(which fairseq-train) data-bin/wmt16_en_de_bpe32k \
--arch transformer_vaswani_wmt_en_de_big --share-all-embeddings \
--optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
--lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 \
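One note on the distributed example above: `torch.distributed.launch` spawns each worker as `python <script> ...`, so it expects a path to a Python file rather than a console-command name, which is why the snippet resolves the installed entry point with `$(which fairseq-train)`. A trimmed sketch of the pattern:

```
python -m torch.distributed.launch --nproc_per_node=8 \
    $(which fairseq-train) data-bin/wmt16_en_de_bpe32k (...)
```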
2 changes: 1 addition & 1 deletion docs/lr_scheduler.rst
@@ -29,6 +29,6 @@ epoch boundaries via :func:`step`.
.. autoclass:: fairseq.optim.lr_scheduler.reduce_lr_on_plateau.ReduceLROnPlateau
:members:
:undoc-members:
.. autoclass:: fairseq.optim.lr_scheduler.reduce_angular_lr_scheduler.TriangularSchedule
.. autoclass:: fairseq.optim.lr_scheduler.triangular_lr_scheduler.TriangularSchedule
:members:
:undoc-members:
9 changes: 6 additions & 3 deletions docs/overview.rst
@@ -49,7 +49,10 @@ new plug-ins.

**Loading plug-ins from another directory**

New plug-ins can be defined in a custom module stored on the user's system. In order to import the module and make the plug-in available to *fairseq*, the command line supports the ``--user-dir`` flag, which specifies a custom location for additional modules to load into *fairseq*.
New plug-ins can be defined in a custom module stored on the user's system. In
order to import the module and make the plug-in available to *fairseq*, the
command line supports the ``--user-dir`` flag, which specifies a custom
location for additional modules to load into *fairseq*.

For example, assuming this directory tree::

@@ -65,6 +68,6 @@ with ``__init__.py``::
def transformer_mmt_big(args):
transformer_vaswani_wmt_en_de_big(args)

it is possible to invoke the ``train.py`` script with the new architecture with::
it is possible to invoke the :ref:`fairseq-train` script with the new architecture with::

python3 train.py ... --user-dir /home/user/my-module -a my_transformer --task translation
fairseq-train ... --user-dir /home/user/my-module -a my_transformer --task translation
10 changes: 5 additions & 5 deletions docs/tutorial_classifying_names.rst
@@ -28,7 +28,7 @@ train, valid and test sets.
Download and extract the data from here:
`tutorial_names.tar.gz <https://dl.fbaipublicfiles.com/fairseq/data/tutorial_names.tar.gz>`_

Once extracted, let's preprocess the data using the :ref:`preprocess.py`
Once extracted, let's preprocess the data using the :ref:`fairseq-preprocess`
command-line tool to create the dictionaries. While this tool is primarily
intended for sequence-to-sequence problems, we're able to reuse it here by
treating the label as a "target" sequence of length 1. We'll also output the
@@ -37,7 +37,7 @@ enhance readability:

.. code-block:: console
> python preprocess.py \
> fairseq-preprocess \
--trainpref names/train --validpref names/valid --testpref names/test \
--source-lang input --target-lang label \
--destdir names-bin --output-format raw
@@ -324,19 +324,19 @@ following contents::
4. Training the Model
---------------------

Now we're ready to train the model. We can use the existing :ref:`train.py`
Now we're ready to train the model. We can use the existing :ref:`fairseq-train`
command-line tool for this, making sure to specify our new Task (``--task
simple_classification``) and Model architecture (``--arch
pytorch_tutorial_rnn``):

.. note::

You can also configure the dimensionality of the hidden state by passing the
``--hidden-dim`` argument to :ref:`train.py`.
``--hidden-dim`` argument to :ref:`fairseq-train`.

.. code-block:: console
> python train.py names-bin \
> fairseq-train names-bin \
--task simple_classification \
--arch pytorch_tutorial_rnn \
--optimizer adam --lr 0.001 --lr-shrink 0.5 \
12 changes: 6 additions & 6 deletions docs/tutorial_simple_lstm.rst
@@ -341,7 +341,7 @@ function decorator. Thereafter this named architecture can be used with the
3. Training the Model
---------------------

Now we're ready to train the model. We can use the existing :ref:`train.py`
Now we're ready to train the model. We can use the existing :ref:`fairseq-train`
command-line tool for this, making sure to specify our new Model architecture
(``--arch tutorial_simple_lstm``).

@@ -352,7 +352,7 @@

.. code-block:: console
> python train.py data-bin/iwslt14.tokenized.de-en \
> fairseq-train data-bin/iwslt14.tokenized.de-en \
--arch tutorial_simple_lstm \
--encoder-dropout 0.2 --decoder-dropout 0.2 \
--optimizer adam --lr 0.005 --lr-shrink 0.5 \
@@ -362,12 +362,12 @@
| epoch 052 | valid on 'valid' subset | valid_loss 4.74989 | valid_ppl 26.91 | num_updates 20852 | best 4.74954
The model files should appear in the :file:`checkpoints/` directory. While this
model architecture is not very good, we can use the :ref:`generate.py` script to
model architecture is not very good, we can use the :ref:`fairseq-generate` script to
generate translations and compute our BLEU score over the test set:

.. code-block:: console
> python generate.py data-bin/iwslt14.tokenized.de-en \
> fairseq-generate data-bin/iwslt14.tokenized.de-en \
--path checkpoints/checkpoint_best.pt \
--beam 5 \
--remove-bpe
@@ -498,7 +498,7 @@ Finally, we can rerun generation and observe the speedup:
# Before
> python generate.py data-bin/iwslt14.tokenized.de-en \
> fairseq-generate data-bin/iwslt14.tokenized.de-en \
--path checkpoints/checkpoint_best.pt \
--beam 5 \
--remove-bpe
@@ -508,7 +508,7 @@
# After
> python generate.py data-bin/iwslt14.tokenized.de-en \
> fairseq-generate data-bin/iwslt14.tokenized.de-en \
--path checkpoints/checkpoint_best.pt \
--beam 5 \
--remove-bpe
Expand Down
6 changes: 3 additions & 3 deletions examples/language_model/README.md
@@ -24,20 +24,20 @@ $ cd ../..
# Binarize the dataset:
$ TEXT=examples/language_model/wikitext-103
$ python preprocess.py --only-source \
$ fairseq-preprocess --only-source \
--trainpref $TEXT/wiki.train.tokens --validpref $TEXT/wiki.valid.tokens --testpref $TEXT/wiki.test.tokens \
--destdir data-bin/wikitext-103
# Train the model:
# If it runs out of memory, try to reduce max-tokens and max-target-positions
$ mkdir -p checkpoints/wikitext-103
$ python train.py --task language_modeling data-bin/wikitext-103 \
$ fairseq-train --task language_modeling data-bin/wikitext-103 \
--max-epoch 35 --arch fconv_lm_dauphin_wikitext103 --optimizer nag \
--lr 1.0 --lr-scheduler reduce_lr_on_plateau --lr-shrink 0.5 \
--clip-norm 0.1 --dropout 0.2 --weight-decay 5e-06 --criterion adaptive_loss \
--adaptive-softmax-cutoff 10000,20000,200000 --max-tokens 1024 --tokens-per-sample 1024
# Evaluate:
$ python eval_lm.py data-bin/wikitext-103 --path 'checkpoints/wiki103/checkpoint_best.pt'
$ fairseq-eval-lm data-bin/wikitext-103 --path 'checkpoints/wiki103/checkpoint_best.pt'
```
