# More: Speeding up loading by pre-parsing
Since `bioimageloader` is designed for computer vision ML/DL, it expects image
arrays for both an image and its annotation. But datasets sometimes come with encoded
annotations or in formats other than image formats. By design, `bioimageloader`
does not transform or modify the original source. As you may guess, decoding and
parsing such annotations into image arrays on every access takes a while and easily
becomes a bottleneck. The solution is simply to **pre-parse them once and save the result**.

Let's see an example. The [*ComputationalPathology*](https://ieeexplore.ieee.org/document/7872382)
dataset comes with fully annotated instance masks. It is one of the high-quality
datasets you can find for instance segmentation tasks. But its annotations are stored
in `.xml` format and thus need a parsing step. Conveniently, you do not have to worry
about how to parse them, because parsing is already implemented in `bioimageloader`. As
mentioned, however, parsing these masks one by one on every iteration is a huge bottleneck.
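
The "parse once, then reuse" idea can be sketched independently of `bioimageloader`. The snippet below is only an illustration under stated assumptions: the XML layout (`Region` and `Vertex` elements with `X`/`Y` attributes), the helper names `parse_polygons` and `load_polygons`, and the JSON cache format are all hypothetical, and it caches polygon coordinates rather than mask images so that it needs nothing beyond the standard library.

```python
import json
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def parse_polygons(xml_path):
    """Slow step: decode every polygon from a (hypothetical) XML layout."""
    root = ET.parse(xml_path).getroot()
    return [
        [[float(v.get("X")), float(v.get("Y"))] for v in region.iter("Vertex")]
        for region in root.iter("Region")
    ]


def load_polygons(xml_path):
    """Return polygons, parsing the XML only if no cached copy exists."""
    xml_path = Path(xml_path)
    cache_path = xml_path.with_suffix(".json")  # cache sits next to the source
    if cache_path.exists():
        return json.loads(cache_path.read_text())
    polygons = parse_polygons(xml_path)
    cache_path.write_text(json.dumps(polygons))
    return polygons


# Demo on a throwaway file, so no real dataset is touched
demo_dir = Path(tempfile.mkdtemp())
demo_xml = demo_dir / "demo.xml"
demo_xml.write_text(
    "<Annotations><Region>"
    '<Vertex X="0" Y="0"/><Vertex X="4" Y="0"/><Vertex X="4" Y="3"/>'
    "</Region></Annotations>"
)
first = load_polygons(demo_xml)   # parses the XML and writes demo.json
second = load_polygons(demo_xml)  # reads demo.json, no XML parsing
```

The original `.xml` file is left unmodified; the cache is an extra file written alongside it, which mirrors the approach taken in the rest of this section.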


In [1]:
from bioimageloader.collections import ComputationalPathology

In [2]:
# `mask_tif` is specific to the ComputationalPathology dataset
compath = ComputationalPathology(
    '../../Data/ComputationalPathology',
    mask_tif=False  # by default
)
print(compath, len(compath))

ComPath 30


In [3]:
%%timeit
for data in compath:
    ...

16.1 s ± 41.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [4]:
# You can see that the annotation is stored in .xml format
compath.anno_dict[0]

PosixPath('../../Data/ComputationalPathology/Annotations/TCGA-18-5592-01Z-00-DX1.xml')

The `save_xml_to_tif()` method below is specific and bound to `ComputationalPathology`.
What it does is clear. Let's print out its documentation.

In [5]:
compath.save_xml_to_tif?

Signature: compath.save_xml_to_tif()
Docstring:
Parse .xml to mask and write it as tiff file

Having masks in images is much faster than parsing .xml for each call.
This func iterates through ``anno_dict``, parse and save each in .tif
format in the same annotation directory. Re-initiate an instance with
``mask_tif`` argument to load them.
File:      ~/workspace/bioimageloader/bioimageloader/collections/_compath.py
Type:      method


Let's execute it.

In [6]:
compath.save_xml_to_tif()

[0/29] Wrote '../../Data/ComputationalPathology/Annotations/TCGA-18-5592-01Z-00-DX1.tif'
[1/29] Wrote '../../Data/ComputationalPathology/Annotations/TCGA-21-5784-01Z-00-DX1.tif'
[2/29] Wrote '../../Data/ComputationalPathology/Annotations/TCGA-21-5786-01Z-00-DX1.tif'
[3/29] Wrote '../../Data/ComputationalPathology/Annotations/TCGA-38-6178-01Z-00-DX1.tif'
[4/29] Wrote '../../Data/ComputationalPathology/Annotations/TCGA-49-4488-01Z-00-DX1.tif'
[5/29] Wrote '../../Data/ComputationalPathology/Annotations/TCGA-50-5931-01Z-00-DX1.tif'
[6/29] Wrote '../../Data/ComputationalPathology/Annotations/TCGA-A7-A13E-01Z-00-DX1.tif'
[7/29] Wrote '../../Data/ComputationalPathology/Annotations/TCGA-A7-A13F-01Z-00-DX1.tif'
[8/29] Wrote '../../Data/ComputationalPathology/Annotations/TCGA-AR-A1AK-01Z-00-DX1.tif'
[9/29] Wrote '../../Data/ComputationalPathology/Annotations/TCGA-AR-A1AS-01Z-00-DX1.tif'
[10/29] Wrote '../../Data/ComputationalPathology/Annotations/TCGA-AY-A8YK-01A-01-TS1.tif'
[11/29] Wrote '../..

We will re-initialize an instance with `mask_tif=True` to load pre-parsed masks in `.tif` format.

In [7]:
compath_tif = ComputationalPathology(
    '../../Data/ComputationalPathology',
    mask_tif=True
)

In [8]:
%%timeit
for data in compath_tif:
    ...

1.21 s ± 7.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


Iteration that took **16.1 seconds** now takes **1.21 seconds**, roughly a 13-fold speedup!
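
`%%timeit` is an IPython magic and is not available in a plain Python script. A comparable measurement can be sketched with the standard `timeit` module. The function name `time_full_pass` is hypothetical, and `range(1000)` merely stands in for a dataset object such as `compath_tif`.

```python
import statistics
import timeit


def time_full_pass(dataset, repeat=7):
    """Time one complete iteration over `dataset`, `repeat` times."""
    def full_pass():
        for _ in dataset:
            pass
    # number=1: each measurement is a single pass ("1 loop each")
    return timeit.repeat(full_pass, number=1, repeat=repeat)


timings = time_full_pass(range(1000), repeat=3)  # stand-in for a dataset
print(
    f"{statistics.mean(timings):.6f} s ± {statistics.stdev(timings):.6f} s "
    f"per loop (mean ± std. dev. of {len(timings)} runs)"
)
```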