# Quick start guide

This notebook details the minimal steps needed to train a particle detection model and then use that model to extract predicted particle coordinates with Topaz. For a more detailed walkthrough with visualization of outputs from the different steps, please see: https://github.com/tbepler/topaz/blob/master/tutorial/02_walkthrough.ipynb

__Topaz is assumed to be installed in a conda environment named "topaz" for purposes of running topaz commands within bash cells.__ If topaz was installed in some other way, then the "source activate topaz" lines will need to be removed or changed below. See https://github.com/tbepler/topaz#installation for details on installing Topaz.

### Demo dataset

This guide uses a demo dataset that can be downloaded [here](http://bergerlab-downloads.csail.mit.edu/topaz/topaz-tutorial-data.tar.gz) and should be unpacked directly in this (the tutorial) directory.

```
wget http://bergerlab-downloads.csail.mit.edu/topaz/topaz-tutorial-data.tar.gz
tar -xzvf topaz-tutorial-data.tar.gz
```
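As an optional sanity check after unpacking, the sketch below counts the `.mrc` files in a directory. The helper name `count_mrc` is hypothetical (it is not part of Topaz), and the path is the raw micrograph directory used later in this guide. The training log further down reports 30 micrographs, so a count close to that would be expected here.

```shell
# count_mrc is a hypothetical helper, not a Topaz command.
# It prints the number of .mrc files directly inside the given directory.
count_mrc() {
    find "$1" -maxdepth 1 -type f -name '*.mrc' | wc -l | tr -d ' '
}

# Only run the check if the demo data has been unpacked here.
if [ -d data/EMPIAR-10025/rawdata/micrographs ]; then
    echo "raw micrographs: $(count_mrc data/EMPIAR-10025/rawdata/micrographs)"
fi
```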

## The Topaz pipeline

The Topaz particle picking pipeline proceeds as follows:

(Before running Topaz): label a small number of particles (100-1000; more is likely to give better results) on your micrographs using the software of your choice.

1. (Preprocessing) Micrographs are downsampled and normalized. Any labeled particle coordinates must also be scaled to match.
2. (Training) The particle detection model is trained on the preprocessed micrographs using the labeled particle coordinates. This requires setting the expected number of particles per micrograph.
3. (Extraction) Using the trained model, particle coordinates and their associated scores are extracted from the micrographs. This requires knowing the particle radius in pixels on the downsampled micrographs.

(Optional postprocessing): examine classifier performance, rescale particle coordinates, extract particle stack, change particle file format, filter particles by model score, etc.

### In this guide

We assume that the user already has a file containing their labeled particle coordinates (data/EMPIAR-10025/rawdata/particles.txt in this case), that the expected number of particles per micrograph is 300, and that the particle radius on the processed micrographs is 7 pixels.
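To make the relationship between these numbers concrete, here is a small sketch that converts the 7-pixel radius back to pixels on the original micrographs, assuming the 16x downsampling factor used in the preprocessing step of this guide. The variable names are illustrative only.

```shell
# Illustrative variable names; the values come from this guide.
downsample=16        # downsampling factor used in preprocessing
radius_processed=7   # particle radius in pixels on the downsampled micrographs

# Radius expressed in pixels of the original (full-size) micrographs.
radius_original=$((radius_processed * downsample))
echo "radius on original micrographs: ${radius_original} px"
```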

# 1. Preprocessing

Downsample the micrographs by a factor of 16 and also scale the labeled coordinates to match.

```
%%bash
source activate topaz

# we'll store the processed data in data/EMPIAR-10025/processed
# so we need to make these directories first
mkdir -p data/EMPIAR-10025/processed
mkdir -p data/EMPIAR-10025/processed/micrographs

# to run the preprocess command, we pass the input micrographs as command line arguments
# preprocess will write the processed images to the directory specified with the -o argument
# -s sets the downsampling amount (in this case, we downsample by a factor of 16)
topaz preprocess -s 16 -o data/EMPIAR-10025/processed/micrographs/ data/EMPIAR-10025/rawdata/micrographs/*.mrc

# this command takes the particle coordinates matched to the original micrographs
# and scales them by 1/16 (-s is downscaling)
# the -x option applies upscaling instead
topaz convert -s 16 -o data/EMPIAR-10025/processed/particles.txt data/EMPIAR-10025/rawdata/particles.txt
```
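The coordinate scaling performed by `topaz convert -s 16` amounts to dividing each x and y value by 16. The sketch below illustrates that arithmetic with `awk` on a made-up table. It assumes a tab-delimited file with a header row and the columns `image_name`, `x_coord`, `y_coord`, and it rounds to the nearest integer, which may differ from how Topaz rounds. `scale_coords` is a hypothetical helper, not a Topaz command.

```shell
# scale_coords FACTOR: read a tab-delimited particle table on stdin and
# divide the x and y columns (2 and 3) by FACTOR. Hypothetical helper.
scale_coords() {
    awk -v s="$1" 'BEGIN { FS = OFS = "\t" }
        NR == 1 { print; next }     # keep the header row unchanged
        { $2 = int($2 / s + 0.5); $3 = int($3 / s + 0.5); print }'
}

# Made-up example row (not taken from the demo dataset).
printf 'image_name\tx_coord\ty_coord\nmic_001\t1600\t3210\n' | scale_coords 16
```

For real data, prefer `topaz convert`; the helper only shows what the scaling does to a single row.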

# 2. Model training

Given the preprocessed micrographs and particle coordinates, we train the particle detection model using positive-unlabeled learning. This requires us to specify the expected number of particles per micrograph, which is 300 in this case.

```
%%bash
source activate topaz

# first, make sure we have the folders where we want to put the saved models
# store the saved models in saved_models/EMPIAR-10025
mkdir -p saved_models
mkdir -p saved_models/EMPIAR-10025

# Now, we train the model
# We set -n 300 to tell Topaz that we expect there to be on average 300 particles per micrograph
# and --num-workers=8 to speed up training
# the models will be saved to the saved_models/EMPIAR-10025 directory
topaz train -n 300 \
            --num-workers=8 \
            --train-images data/EMPIAR-10025/processed/micrographs/ \
            --train-targets data/EMPIAR-10025/processed/particles.txt \
            --save-prefix=saved_models/EMPIAR-10025/model \
            -o saved_models/EMPIAR-10025/model_training.txt
```

Running the training command prints output similar to the following:

```
# Loading model: resnet8
# Model parameters: units=32, dropout=0.0, bn=on
# Receptive field: 71
# Using device=0 with cuda=True
# Loaded 30 training micrographs with 1500 labeled particles
# source	split	p_observed	num_positive_regions	total_regions
# 0	train	0.00654	43500	6652800
# Specified expected number of particle per micrograph = 300.0
# With radius = 3
# Setting pi = 0.03923160173160173
# minibatch_size=256, epoch_size=5000, num_epochs=10
# Done!
```
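The value of `pi` reported in the log can be reproduced from the other numbers printed there. The following is a sketch of that arithmetic as read from the log, not Topaz's actual code: each labeled particle covers 43500 / 1500 = 29 regions (at radius 3), so 300 expected particles on each of 30 micrographs correspond to 300 × 30 × 29 positive regions out of 6652800 in total. The function and variable names are illustrative.

```shell
# compute_pi is a hypothetical helper; the numbers are copied from the training log.
compute_pi() {
    awk 'BEGIN {
        micrographs = 30; labeled = 1500
        positive_regions = 43500; total_regions = 6652800
        expected_per_micrograph = 300

        # regions covered by one labeled particle (29 at radius 3)
        regions_per_particle = positive_regions / labeled

        # fraction of regions that carry a label
        p_observed = positive_regions / total_regions

        # expected fraction of regions that are truly positive
        pi = expected_per_micrograph * micrographs * regions_per_particle / total_regions

        printf "p_observed = %.5f\n", p_observed
        printf "pi = %.5f\n", pi
    }'
}

compute_pi
```

Both printed values agree with the log (`p_observed` 0.00654 and `pi` 0.03923 after rounding), which shows how the `-n` setting feeds into the class prior used during training.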


# 3. Extract particle coordinates

Now that we have a trained model, we use it to extract predicted particle coordinates using a particle radius of 7 pixels.

```
%%bash
source activate topaz

## make a directory to write the topaz particles to
mkdir -p data/EMPIAR-10025/topaz

## extract particle coordinates using the trained model
## we set the radius parameter to 7 (-r 7)
## to prevent extracting particle coordinates closer than the radius of the particle
## i.e. we don't want multiple predictions for a single particle
## we also set -x 16 in order to scale the coordinates back to the original micrograph size

topaz extract -r 7 -x 16 -m saved_models/EMPIAR-10025/model_epoch10.sav \
              -o data/EMPIAR-10025/topaz/predicted_particles_all_upsampled.txt \
              data/EMPIAR-10025/processed/micrographs/*.mrc
```
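Once extraction finishes, it can be useful to see how many predicted particles each micrograph received and compare that with the expected 300. The sketch below assumes the output is a tab-delimited table with a header row and the micrograph name in the first column. `count_per_micrograph` is a hypothetical helper, not a Topaz command. Counts taken before any score filtering need not match the expected number.

```shell
# count_per_micrograph FILE: print "<micrograph name><TAB><row count>" per micrograph.
# Hypothetical helper; assumes a header row and the name in column 1.
count_per_micrograph() {
    awk 'BEGIN { FS = "\t" }
        NR > 1 { n[$1]++ }          # skip the header, tally rows per name
        END { for (name in n) print name "\t" n[name] }' "$1" | sort
}

# Only run on the real output if the extract step has already produced it.
predictions=data/EMPIAR-10025/topaz/predicted_particles_all_upsampled.txt
if [ -f "$predictions" ]; then
    count_per_micrograph "$predictions"
fi
```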

# (Optional) change format of particle coordinates file

```
%%bash
source activate topaz

# we can convert the particles file to .star format (and others) by changing the file extension
# of the output file (data/EMPIAR-10025/topaz/predicted_particles_all_upsampled.txt)
# to .star (data/EMPIAR-10025/topaz/predicted_particles_all_upsampled.star)
# the convert command can also filter the particle table by model score using the -t argument
# e.g. -t 0 would only keep particles with scores >= 0

topaz convert -o data/EMPIAR-10025/topaz/predicted_particles_all_upsampled.star \
              data/EMPIAR-10025/topaz/predicted_particles_all_upsampled.txt
```
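To illustrate what score filtering does, the sketch below keeps only the rows whose score is at least a given threshold, mirroring the `-t` behavior described in the comments above. It assumes a tab-delimited table with a header row and the score in the fourth column. `filter_by_score` is a hypothetical helper and is not how Topaz implements `-t`.

```shell
# filter_by_score THRESHOLD FILE: keep the header and rows with score >= THRESHOLD.
# Hypothetical helper; assumes the score is in column 4.
filter_by_score() {
    awk -v t="$1" 'BEGIN { FS = "\t" }
        NR == 1 || $4 + 0 >= t + 0' "$2"
}

# Only run on the real output if the extract step has already produced it.
predictions=data/EMPIAR-10025/topaz/predicted_particles_all_upsampled.txt
if [ -f "$predictions" ]; then
    filter_by_score 0 "$predictions" | head
fi
```

For actual processing, `topaz convert -t` is the supported route; the helper is only meant for quick inspection of the table.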

# That's it!

We now have a table containing the predicted particle coordinates for each micrograph, along with their model scores.