PixLib - training library

pixlib is built on top of a framework whose core principles are:

  • modularity: it is easy to add a new dataset or model with custom loss and metrics;
  • reusability: components like geometric primitives, training loop, or experiment tools are reused across projects;
  • reproducibility: a training run is parametrized by a configuration, which is saved and reused for evaluation;
  • simplicity: it has few external dependencies, and can be easily grasped by a new user.

Framework structure

pixlib includes the following components:

  • datasets/ contains the dataloaders, all inheriting from BaseDataset. Each loader is configurable and produces a set of batched data dictionaries.
  • models/ contains the deep networks and learned blocks, all inheriting from BaseModel. Each model is configurable, takes data as input, and outputs predictions. It also exposes its own loss and evaluation metrics.
  • geometry/ groups NumPy/PyTorch primitives for 3D vision: poses and camera models, linear algebra, optimization, etc.
  • utils/ contains various utilities, for example to manage experiments.

Datasets, models, and training runs are parametrized by omegaconf configurations. See the .yaml files in configs/ for examples of training configurations.
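For illustration, a hypothetical training configuration could look like the sketch below; apart from train.lr and data.batch_size (which appear as override examples later in this document), all keys are assumptions, so refer to the files in configs/ for the actual structure.

data:
  name: my_dataset       # hypothetical dataloader defined in datasets/
  batch_size: 8
model:
  name: my_model         # hypothetical model defined in models/
train:
  lr: 0.001
  epochs: 50             # hypothetical entry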

Workflow

Training:

The following command starts a new training run:

python3 -m pixloc.pixlib.train experiment_name \
    --conf pixloc/pixlib/configs/config_name.yaml

It creates a new directory experiment_name/ in TRAINING_PATH and dumps the configuration, model checkpoints, logs of stdout, and TensorBoard summaries.

Extra flags can be given (see the combined example after this list):

  • --overfit loops the training and validation sets on a single batch (useful to test losses and metrics).
  • --restore restarts the training from the last checkpoint (last epoch) of the same experiment.
  • --distributed uses all GPUs available with multiple processes and batch norm synchronization.
  • individual configuration entries to override the YAML entries. Examples: train.lr=0.001 or data.batch_size=8.
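For example, the flags can be combined with configuration overrides in a single run (a sketch, assuming overrides are passed after the flags):

python3 -m pixloc.pixlib.train experiment_name \
    --conf pixloc/pixlib/configs/config_name.yaml \
    --distributed \
    train.lr=0.001 data.batch_size=8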

Monitoring the training: launch a TensorBoard session with tensorboard --logdir=path/to/TRAINING_PATH to visualize losses and metrics, and to compare them across experiments. Press Ctrl+C to gracefully interrupt the training.

Inference with a trained model:

After training, you can easily load a model to evaluate it:

from pixloc.pixlib.utils.experiments import load_experiment

test_conf = {}  # will overwrite the training and default configurations
model = load_experiment('name_of_my_experiment', test_conf)
model = model.eval().cuda()  # optionally move the model to GPU
predictions = model(data)  # data is a dictionary of tensors

Adding new datasets or models:

We simply need to create a new file in datasets/ or models/; this makes it easy to collaborate on the same codebase. Each class should inherit from its base class, declare a default_conf, and define a few specific methods (see the sketch below). Have a look at the base files BaseDataset and BaseModel for more details. Please follow PEP 8 and use relative imports.
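A minimal sketch of a new model, assuming the interface methods are named _init, _forward, loss, and metrics and that losses and metrics are returned as dictionaries; these names are assumptions, so consult BaseModel for the actual signatures:

import torch
from torch.nn import functional as F

from .base_model import BaseModel  # assumed relative location of the base class


class MyModel(BaseModel):
    # default entries, overridable from the YAML configuration
    default_conf = {
        'feature_dim': 128,  # hypothetical entry
    }

    def _init(self, conf):
        # build the layers from the merged configuration (assumed hook name)
        self.encoder = torch.nn.Linear(conf.feature_dim, conf.feature_dim)

    def _forward(self, data):
        # data is a batched data dictionary; return a dictionary of predictions
        return {'features': self.encoder(data['input'])}

    def loss(self, pred, data):
        # hypothetical loss dictionary; key names are illustrative only
        return {'total': F.mse_loss(pred['features'], data['target'])}

    def metrics(self, pred, data):
        # hypothetical evaluation metrics
        return {'mean_error': (pred['features'] - data['target']).abs().mean()}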