File curator is a utility gear that runs a user-provided custom curation script on a single file.
- curator: A Python script that implements a subclass of the FileCurator class (see below).
- file-input: File to curate.
- additional-input-one, additional-input-two, additional-input-three: Optional additional inputs. For example, a CSV of reference data could be passed in for the curator to check against when classifying a file.
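As an illustration of the CSV use case, a curator could load a lookup table from one of the additional inputs and use it to choose a classification. The sketch below assumes a hypothetical CSV with `file_name` and `measurement` columns; the column names and the helpers `load_lookup` and `classify` are illustrative only and are not part of the gear.

```python
import csv
import io
from typing import Dict, List, Optional, TextIO


def load_lookup(csv_file: TextIO) -> Dict[str, str]:
    """Map file names to measurements, assuming hypothetical
    'file_name' and 'measurement' columns."""
    reader = csv.DictReader(csv_file)
    return {row["file_name"]: row["measurement"] for row in reader}


def classify(file_name: str, lookup: Dict[str, str]) -> Optional[Dict[str, List[str]]]:
    """Return a classification dict for the file, or None if it is not listed."""
    measurement = lookup.get(file_name)
    if measurement is None:
        return None
    return {"Measurement": [measurement]}


if __name__ == "__main__":
    # In a real curator the CSV would be read from an additional input's path;
    # an in-memory file keeps this example self-contained.
    data = io.StringIO("file_name,measurement\nscan1.dcm,T1\nscan2.dcm,T2\n")
    lookup = load_lookup(data)
    print(classify("scan2.dcm", lookup))  # {'Measurement': ['T2']}
    print(classify("other.dcm", lookup))  # None
```

The returned dictionary has the same shape as the `classification` value written in the curate.py example, so it could be dropped into that file's metadata.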
- debug (boolean, default False): Include debug statements in output.
- tag (string, default ""): The tag to add to the input file upon run completion.
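Gear options like these reach the gear through its config.json, under the `config` key of the Flywheel gear spec. In practice the toolkit's context object is the usual way to read them; the standard-library sketch below only shows the shape of the data, applying the defaults listed above and mapping `debug` to a log level. The helper names `read_options` and `log_level` are hypothetical.

```python
import json
import logging
from typing import Any, Dict

# Defaults taken from the option list above.
DEFAULTS: Dict[str, Any] = {"debug": False, "tag": ""}


def read_options(config_json: str) -> Dict[str, Any]:
    """Merge the known keys of a config.json 'config' section over the defaults."""
    config = json.loads(config_json).get("config", {})
    return {**DEFAULTS, **{k: v for k, v in config.items() if k in DEFAULTS}}


def log_level(options: Dict[str, Any]) -> int:
    """Map the debug option to a logging level."""
    return logging.DEBUG if options["debug"] else logging.INFO


if __name__ == "__main__":
    raw = '{"config": {"debug": true}, "inputs": {}}'
    opts = read_options(raw)
    print(opts)  # {'debug': True, 'tag': ''}
    logging.basicConfig(level=log_level(opts))
```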
The FileCurator class is provided by the flywheel_gear_toolkit package (in flywheel_gear_toolkit.utils.curator).
Extend this class to define a custom curation script.
Example curate.py script that could be passed as the curator input. This
example script trivially sets the 'Measurement' key of the file classification to 'T1' for DICOM files:
import json
import logging
from pathlib import Path
from typing import Any, Dict

import pydicom
from flywheel_gear_toolkit import GearToolkitContext
from flywheel_gear_toolkit.utils.curator import FileCurator
from flywheel_gear_toolkit.utils.reporters import AggregatedReporter

log = logging.getLogger(__name__)


class Curator(FileCurator):
    def __init__(self, **kwargs):
        # Set the gear context and a read-only Flywheel client in the parent constructor
        super().__init__(**kwargs)

    # Define curate_file. The input file will be passed into this method
    def curate_file(self, file_: Dict[str, Any]):
        """Sets file measurement to T1.

        file_ format defined here: https://gitlab.com/flywheel-io/public/gears/-/tree/master/spec#the-input-configuration

        file_ : {
            'base': 'file',
            'location': {
                'path': '<path>',
                'name': '<file_name>'
            },
            'hierarchy': {
                'type': '<container_type>',
                'name': '<file_name>'
            },
            "object" : {
                "info" : {},
                "mimetype" : "application/octet-stream",
                "tags" : [],
                "measurements" : [],
                "type" : "<file_type>",
                "modality" : None,
                "size" : <size>
            }
        }
        """
        container_type = file_.get('hierarchy').get('type')
        # Set up output metadata
        file_metadata = {}
        file_path = file_.get('location').get('path')
        label = file_.get('location').get('name')
        # Update classification (DICOM files only)
        if file_.get('object').get('type') == 'dicom':
            file_metadata['classification'] = {
                'Measurement': ['T1']
            }
        # Specify which file to update by passing in file name
        file_metadata['name'] = label
        # Output metadata format: https://gitlab.com/flywheel-io/public/gears/-/tree/master/spec#output-metadata
        metadata = {
            container_type: {
                'files': [file_metadata]
            }
        }
        out_file = Path(self.context.output_dir) / '.metadata.json'
        with open(out_file, 'w') as out:
            json.dump(metadata, out)
        log.info('Wrote metadata')
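Because the metadata written by curate_file depends only on the file_ dictionary, that logic can be factored into plain functions and checked without a gear context. The sketch below mirrors the example's logic; `build_metadata` and `write_metadata` are hypothetical helper names, and the sample file_ dictionary is made up for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def build_metadata(file_: Dict[str, Any]) -> Dict[str, Any]:
    """Build .metadata.json content that sets Measurement to T1 for DICOM files."""
    container_type = file_["hierarchy"]["type"]
    # The file name tells Flywheel which file on the container to update.
    file_metadata: Dict[str, Any] = {"name": file_["location"]["name"]}
    if file_["object"]["type"] == "dicom":
        file_metadata["classification"] = {"Measurement": ["T1"]}
    return {container_type: {"files": [file_metadata]}}


def write_metadata(metadata: Dict[str, Any], output_dir: Path) -> Path:
    """Write the metadata to <output_dir>/.metadata.json and return the path."""
    out_file = Path(output_dir) / ".metadata.json"
    with open(out_file, "w") as out:
        json.dump(metadata, out)
    return out_file


if __name__ == "__main__":
    sample = {
        "base": "file",
        "location": {"path": "input/scan.dcm", "name": "scan.dcm"},
        "hierarchy": {"type": "acquisition", "name": "scan.dcm"},
        "object": {"type": "dicom", "info": {}, "tags": []},
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = write_metadata(build_metadata(sample), Path(tmp))
        print(json.loads(path.read_text()))
```

Inside the curator, curate_file would then reduce to calling these two helpers with `self.context.output_dir`.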
The file-curator gear comes with the following Python packages installed:
- lxml
- pandas
- nibabel
- Pillow
- piexif
- pydicom
- pypng
- flywheel-gear-toolkit

Note: See ./pyproject.toml for package versions.
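The versions listed in pyproject.toml can also be confirmed at runtime, for example from inside a curator while debugging. A small sketch using the standard library's importlib.metadata (Python 3.8+); the helper name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional

# Distribution names as listed above.
PACKAGES: List[str] = ["lxml", "pandas", "nibabel", "Pillow", "piexif",
                       "pydicom", "pypng", "flywheel-gear-toolkit"]


def installed_versions(names: List[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Return the installed version of each distribution, or None if it is missing."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```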
Dependencies that aren't installed by default can be added in one of two ways.
The recommended way is to pass them in the extra_packages argument when instantiating the FileCurator class:
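from flywheel_gear_toolkit import GearToolkitContext
from flywheel_gear_toolkit.utils.curator import FileCurator


class Curator(FileCurator):
    def __init__(self, **kwargs):
        # Request extra packages (here polars) to be installed
        super().__init__(context=GearToolkitContext(), extra_packages=["polars"], **kwargs)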
Alternatively, if you have a requirements.txt file you want to use, pass it as one of the additional inputs and install it in the Curator.__init__ method:
from flywheel_gear_toolkit.utils import install_requirements
...

class Curator(FileCurator):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Install from the requirements file provided as additional-input-one
        install_requirements(self.additional_input_one)
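For reference, installing a requirements file comes down to running pip against it. To reproduce such an install outside the gear, a sketch that calls pip's command-line interface through the current interpreter is shown below. This is not the toolkit's install_requirements implementation; it assumes pip is available in the environment, and `pip_install_command` and `install_from_requirements` are hypothetical names.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Union


def pip_install_command(requirements: Union[str, Path]) -> List[str]:
    """Build a pip command that installs from a requirements file
    into the environment of the running interpreter."""
    return [sys.executable, "-m", "pip", "install", "-r", str(requirements)]


def install_from_requirements(requirements: Union[str, Path]) -> None:
    """Run pip; raises CalledProcessError if the install fails."""
    path = Path(requirements)
    if not path.is_file():
        raise FileNotFoundError(f"requirements file not found: {path}")
    subprocess.check_call(pip_install_command(path))


if __name__ == "__main__":
    # Hypothetical path; in a curator this would be the additional input's file path.
    print(pip_install_command("requirements.txt"))
```

Using `sys.executable -m pip` installs into the same environment the curator runs in, rather than whichever `pip` happens to be first on the PATH.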