Setting up development environment (recommendation)
This section shows a way to configure a development environment that allows you to run tests and build documentation.
virtualenv env
source env/bin/activate
pip install -U pip setuptools
pip install -e .[opencv,tf,test,torch]
To run unit tests:
pytest -v petastorm
NOTE: Java 1.8 must be installed for the tests to pass (it is a dependency of Spark).
pytest has multiple useful plugins. Consider installing the following plugins:
pip install pytest-xdist pytest-repeat pytest-pycharm
which enable you to run tests in parallel (-n switch) and repeat tests multiple times.
Caching test datasets
Some unit tests rely on mock data. Generating these datasets is slow because it spins up a local Spark instance.
Use the -Y switch to cache these datasets. Be careful: dataset generation exercises Petastorm code, so
in some cases you need to invalidate the cache for the tests to take all code changes into account.
Use the --cache-clear switch to do so.
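The trade-off described above can be illustrated with a minimal, generic sketch of a generate-once cache. This is not Petastorm's implementation: `get_or_create_dataset`, its parameters, and the file it writes are hypothetical names chosen for illustration. It shows why a cached dataset keeps being reused even after the generating code changes, and why an explicit clear is needed.

```python
import os
import shutil
import tempfile


def get_or_create_dataset(cache_dir, name, generate, clear=False):
    """Return the path of a cached dataset, generating it only when needed.

    Hypothetical helper: `generate(path)` stands in for the slow,
    Spark-backed dataset generation; `clear` plays the role of a
    cache-invalidation switch.
    """
    path = os.path.join(cache_dir, name)
    if clear and os.path.isdir(path):
        # Drop the stale copy so that the current code regenerates it.
        shutil.rmtree(path)
    if not os.path.isdir(path):
        os.makedirs(path)
        generate(path)
    return path


calls = []  # records every time the (pretend) slow generator runs


def slow_generate(path):
    calls.append(path)
    with open(os.path.join(path, "data.txt"), "w") as f:
        f.write("mock rows\n")


cache_root = tempfile.mkdtemp()
get_or_create_dataset(cache_root, "mock_dataset", slow_generate)
get_or_create_dataset(cache_root, "mock_dataset", slow_generate)  # cache hit
runs_without_clear = len(calls)
get_or_create_dataset(cache_root, "mock_dataset", slow_generate, clear=True)
runs_after_clear = len(calls)
```

In this sketch the second call never reaches `slow_generate`, so any change to the generator would go unnoticed until the cache is cleared.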
The petastorm project uses sphinx autodoc capabilities, along with free documentation hosting by ReadTheDocs.org (RTD), to serve up auto-generated API docs on http://petastorm.rtfd.io .
The RTD site is configured via webhooks to trigger sphinx doc builds from changes in the petastorm github repo. Documents are configured to build the same locally or on RTD.
All the source files needed to generate the autodocs reside under
To make documents locally:
pip install -e .[docs]
cd docs/autodoc

# To nuke all generated HTMLs
make clean

# Each run incrementally updates HTML based on file changes
make html
Once the HTML build process completes successfully, navigate your browser to
Some changes may require build and deployment to see, including:
- Changes to
- Changes to
- A change that makes RTD build different from a local build
To see the above documentation changes:
- One needs to create a petastorm branch and push it
- Then configure RTD to activate a version for that branch
- A project maintainer will need to effect such version activation
- The status of a built version, as well as the resulting docs, can then be viewed
By default, RTD defines the
latest version, which can be pointed at master
or another branch. Additionally, each release may have an associated RTD build
version, which must be explicitly activated in the
Versions settings page.
As with any source file, once a release is tagged, it is essentially immutable, so be sure that all the desired documentation changes are in place before tagging a release.
conf.py defines a
version property. For ease
of maintenance, we've set that to be the same version string as defined in
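One common way to keep a Sphinx version setting in step with a package's version string is to parse it out of a source file instead of importing the package. The sketch below assumes the version is declared as a `__version__ = '...'` assignment; that variable name, the `read_version` helper, and the sample text are assumptions for illustration, not necessarily what petastorm's conf.py does.

```python
import re


def read_version(source_text):
    """Extract the value of a `__version__ = '...'` assignment from source text."""
    match = re.search(
        r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", source_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


# Hypothetical file contents, for illustration only.
sample = "# package init\n__version__ = '1.2.3'\n"

# The value a conf.py could then assign to its version setting.
version = read_version(sample)
```

Parsing the text rather than importing the package means the doc build does not need the package's dependencies just to learn the version.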
Known doc-build caveats and issues
- Due to RTD's build resource limitations, we are unable to pip install any of the petastorm extra-required library packages.
- Since Sphinx must be able to load a Python module to read its docstrings,
the doc page for any module that imports
torch will, unfortunately, fail to build.
- The alabaster Sphinx theme defaults to using
travis-ci.org for the Travis CI build badge, whereas the petastorm project is served on
.com, so we don't currently have a working Travis CI build status.
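For the import failure in particular, Sphinx's autodoc extension offers an `autodoc_mock_imports` setting that substitutes stand-in objects for modules that cannot be imported at build time. The fragment below is a sketch of how that could look in a conf.py. It is an assumption, not the project's current configuration, and whether it resolves the issue for petastorm's RTD build has not been verified here.

```python
# Sketch of a conf.py fragment (an assumption, not the project's actual settings).
extensions = ["sphinx.ext.autodoc"]

# Modules listed here are replaced with mocks while autodoc imports the code,
# so their absence from the build environment does not abort the import.
autodoc_mock_imports = ["torch"]
```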
Future: auto-generate with
Sphinx has the ability to auto-generate the entire API, either via the
autosummary extension or via sphinx-apidoc. The following
sphinx-apidoc invocation will autogenerate an output
subdirectory of rST files for each of the petastorm modules. Those files can
then be glob'd into a TOC tree.
cd docs/autodoc
sphinx-apidoc -fTo api ../.. ../../setup.py
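To make the "glob'd into a TOC tree" step concrete, the sketch below collects rST files from an output directory and renders an explicit toctree directive for them. Sphinx's toctree directive also has its own `:glob:` option, so this is only an illustration of what gets collected; `build_toctree`, the directory layout, and the stand-in file names are hypothetical.

```python
import glob
import os
import tempfile


def build_toctree(api_dir, maxdepth=2):
    """Render a toctree directive listing every .rst file found in api_dir."""
    pattern = os.path.join(api_dir, "*.rst")
    # toctree entries are document names: paths without the .rst suffix,
    # written with forward slashes. This assumes the document holding the
    # toctree sits in the parent directory of api_dir.
    dirname = os.path.basename(os.path.normpath(api_dir))
    entries = sorted(
        dirname + "/" + os.path.splitext(os.path.basename(p))[0]
        for p in glob.glob(pattern)
    )
    lines = [".. toctree::", "   :maxdepth: %d" % maxdepth, ""]
    lines.extend("   " + entry for entry in entries)
    return "\n".join(lines) + "\n"


# Demo with empty stand-in files; real ones would come from sphinx-apidoc.
root = tempfile.mkdtemp()
api_dir = os.path.join(root, "api")
os.makedirs(api_dir)
for name in ("petastorm.rst", "petastorm.etl.rst"):
    open(os.path.join(api_dir, name), "w").close()

toctree = build_toctree(api_dir)
print(toctree)
```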
- Code package reorganization
- Experimentation with sphinx settings, if available, to shorten link names
- Configuration change to auto-run
sphinx-apidoc in RTD build, as opposed to committing the