Handle numerical data generated by PlantCV pipeline #31
The image analysis scripts do output quite a bit of text info that looks like numerical data. Is that what you are talking about? I can piggyback that info as part of the metadata. Should I do that before or after David's demo? I don't want to interrupt his demo while I debug.
It is not sufficiently urgent to risk breaking the demo. Having a working Jupyter analog would be more valuable.
Here is an example of vis_tv output:
Yep, that's the sort of output I mean. This is the raw output from an individual analysis script, which I think is what is integrated into the demo. Alternatively, we have a script that parallelizes the processing of individual images and aggregates/reformats the results, but that might not work in the computing-pipeline framework?
If the script you mentioned works in a parallel computing environment, we will take it and deploy it on our supercomputers. I believe this is part of the pipeline framework. The current extractor does not work on the cluster yet, but that work is ongoing. Could you share a pointer to that script?
It's currently here: plantcv/scripts/dev/image_analysis.py. We will be working on cleaning up the PlantCV repository soon to make everything clearer and less cluttered. Right now this script only uses multiprocessing, but I plan on adding support for batching systems in the near future. We can add support for whatever methods are useful for the system at UIUC if that's helpful.
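For context, a multiprocessing-based batch runner can be sketched like this (a minimal illustration, not the actual image_analysis.py code; `analyze_image` is a stand-in for a real PlantCV analysis call):

```python
from multiprocessing import Pool

def analyze_image(path):
    """Stand-in for a per-image PlantCV analysis call; here it just
    returns a (path, result) tuple so the aggregation step is visible."""
    return (path, len(path))  # pretend "result" is derived from the image

def run_batch(paths, workers=4):
    # Process images in parallel, then aggregate the per-image results.
    with Pool(workers) as pool:
        results = pool.map(analyze_image, paths)
    return dict(results)

if __name__ == "__main__":
    print(run_batch(["img_001.png", "img_002.png"]))
```

Swapping `Pool` for a batch-queue submission layer is where the planned cluster support would slot in.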
Ah, I see. For batch processing images, we have quite a few options to achieve automation and maximum performance on cluster nodes; using your script is one option too. The decision depends on the data pipeline design, e.g., the granularity of image batches, how often to process them, and data movement between the cluster and data storage. We can have a discussion about these design decisions as a next step.
The normalization/calibration step will be integrated into the PlantCV pipeline by danforthcenter/plantcv#19
@dlebauer @gsrohde @robkooper @max-zilla I have a script set up now that can be run by the Clowder extractor (in theory). It takes an RGB and an NIR image as inputs and also needs the "perspective" metadata value from Clowder (side-view or top-view). It currently spits out table-like raw output, but it would be easy for me to reformat it. What is the best way to output data for both Clowder and BETYdb? I was thinking JSON to put the results in Clowder so that @gsrohde could grab the data from there, but we had talked about a table last time we spoke. Would the table be a file added to the dataset, or something else?
I think we decided to put the data in tabular (CSV?) format. @gsrohde is writing the data importer that will convert the table to the XML format required by BETYdb. BETYdb will then be able to export as JSON, XML, or a table. Let's run this on the same Clowder instance as the field scanner (terraref.ncsa.illinois.edu/clowder). The next question is whether we should put the data on Roger at
If the script outputs a CSV file, would it need to be explicitly added to Clowder? Or, in other words, I'm not certain how Clowder captures the output or whether I need to post it.
@nfahlgren, could you give some details about the new script? The git branch/version, script location, sample run commands, sample data, etc.? That way, @caicai89- can pick it up and start working on integrating it into the PlantCV extractor.
@yanliu-chn the new script relies on the latest development version of PlantCV: danforthcenter/plantcv@ead12f2. I just pushed the draft version of the script to the computing-pipeline repository here: https://github.com/terraref/computing-pipeline/blob/master/scripts/plantcv/PlantcvClowderIndoorAnalysis.py. In addition to needing the PlantCV library, it relies on a hard-coded image mask file from PlantCV (mask_brass_tv_z1_L1.png). We probably need to reference it differently, but I may be able to bypass the need for the image altogether.
@robkooper can you address this? I think that the CSV file should be explicitly added to Clowder, and that the code that inserts into BETYdb will need to find the CSV file and the metadata file.
@nfahlgren @dlebauer how do I know if an NIR or VIS image is a top or side view? @caicai89- Yaping, on ROGER, please check out Noah's PlantCV version and the script mentioned above. We have a few sample images in /gpfs/largeblockFS/scratch/arpae/plantcv-input/samples/. To set up the OpenCV and PlantCV environment, please refer to /gpfs/largeblockFS/projects/arpae/sw/extractors-plantcv/bin/extract.sh and the following bash setup:

```bash
module purge
module load git mpich gdal2-stack anaconda parallel
export PLANTCV_HOME=/projects/arpae/sw/plantcv
export PYTHONPATH=/projects/arpae/sw/opencv/lib/python2.7/site-packages:/projects/arpae/sw/plantcv/lib:$PYTHONPATH
export PLANTCV_VENV=/projects/arpae/sw/pyenv.plantiv
export PLANTCV_EX=/projects/arpae/sw/extractors-plantcv
```
In image metadata:
Rule to pair two images as input to Noah's script: camera_type must differ; rotation_angle must be the same; perspective must be the same. Each snapshot currently contains 10 images, so the pairing process should create 5 pairs and call Noah's script 5 times. E.g.: http://141.142.209.122/clowder/files/56d48457e4b0c7e3b16ea709
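The pairing rule above can be sketched in Python (an illustrative sketch; each image is assumed to be a dict carrying the three metadata fields named in the comment):

```python
def pair_images(images):
    """Pair images whose rotation_angle and perspective match but whose
    camera_type differs (one NIR, one VIS), per the rule above."""
    pairs = []
    for i, a in enumerate(images):
        for b in images[i + 1:]:
            if (a["camera_type"] != b["camera_type"]
                    and a["rotation_angle"] == b["rotation_angle"]
                    and a["perspective"] == b["perspective"]):
                pairs.append((a, b))
    return pairs

# A 10-image snapshot (5 angles x 2 cameras) yields 5 pairs:
snapshot = [{"camera_type": c, "perspective": "side", "rotation_angle": r}
            for r in (0, 72, 144, 216, 288) for c in ("VIS", "NIR")]
print(len(pair_images(snapshot)))  # 5
```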
Summary of the discussion with Noah and Luigi: inside a file extractor, we can access dataset metadata. We can leverage this feature to implement a dataset-level PlantCV extractor. We define a field in the dataset metadata, say plantcv_file_info, as an array. Each element in the array has the following info:

```
{
  "fileid": "$fileId",
  "camera_type": "NIR|VIS",
  "perspective": "side|top-view",
  "rotation_angle": number (0-360)
}
```

We also have a dataset metadata field called plantcv_processed to record files that have been processed at the dataset level. Updates to these two fields are add-only, with no overrides. Then the logic of a file-based extractor would be:
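One speculative sketch of that file-based extractor logic, using the two add-only fields defined above (not the thread's actual pseudocode; `run_script` stands in for invoking Noah's analysis):

```python
def process_file(file_id, dataset_metadata, run_script):
    """On each file event: look up this file's entry in plantcv_file_info,
    find an unprocessed partner per the pairing rule, run the analysis
    once per pair, and record both ids in plantcv_processed."""
    info = dataset_metadata["plantcv_file_info"]
    done = dataset_metadata["plantcv_processed"]
    if file_id in done:
        return False
    me = next(f for f in info if f["fileid"] == file_id)
    for other in info:
        if other["fileid"] == file_id or other["fileid"] in done:
            continue
        if (other["camera_type"] != me["camera_type"]
                and other["perspective"] == me["perspective"]
                and other["rotation_angle"] == me["rotation_angle"]):
            run_script(me, other)
            done.extend([me["fileid"], other["fileid"]])  # add, never override
            return True
    return False

meta = {
    "plantcv_file_info": [
        {"fileid": "a", "camera_type": "VIS", "perspective": "side", "rotation_angle": 0},
        {"fileid": "b", "camera_type": "NIR", "perspective": "side", "rotation_angle": 0},
    ],
    "plantcv_processed": [],
}
process_file("a", meta, run_script=lambda vis, nir: None)
print(meta["plantcv_processed"])  # ['a', 'b']
```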
@robkooper please check if it works. |
@dlebauer @yanliu-chn I just thought of one more thing: do we want to populate BETYdb with data for a plant or for the individual images? If we need to aggregate the data from a snapshot (all one plant) then we will need to process the whole Clowder dataset and not just an image pair. We would then post only one result file back to Clowder.
Either way, I guess we have to append the tabular output of each pair to the dataset metadata, since there is no dataset-level extractor.
@nfahlgren the general idea is that BETYdb will hold the 'trait vectors' that could be used in GWAS. It can store data at the subsample, replicate plant / plot, or site level. I think it would make sense to aggregate to the plant level for each time point.
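Aggregating per-image traits up to the plant level for each time point could be as simple as averaging across the snapshot's images (an illustrative sketch; the trait names are hypothetical):

```python
from statistics import mean

def aggregate_plant(per_image_traits):
    """Collapse a list of per-image trait dicts (one snapshot = one
    plant at one time point) into a single plant-level record."""
    traits = per_image_traits[0].keys()
    return {t: mean(rec[t] for rec in per_image_traits) for t in traits}

snapshot = [{"area": 100.0, "height": 20.0},
            {"area": 110.0, "height": 22.0}]
print(aggregate_plant(snapshot))  # {'area': 105.0, 'height': 21.0}
```

A mean is only one choice; some traits (e.g. counts) might instead call for a max or sum at aggregation time.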
@dlebauer makes sense. I wonder if the extractor could just provide the script with the dataset key? Then I can query out the files with the API. The general format of the output would look similar to this (but with more traits):
That is possible. @nfahlgren, is this output generated by looking at all 10 images or just 2?
@yanliu-chn that's right, the output would be the result of looking at all 10 images instead of only two. Internally the image pairs would be analyzed together, but the overall results would be aggregated.
@nfahlgren So does it mean you will have a separate script to look at all 10 images? The current script takes one NIR and one VIS image as input.
@yanliu-chn I will modify the current script to read all image files in a given dataset.
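One way the modified script could enumerate a dataset's files is via the Clowder API; the sketch below only builds the request URL to keep it self-contained (the `/api/datasets/:id/files` path and `key` parameter are assumptions based on the Clowder API of that era, so verify against the instance's docs):

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode        # Python 2, the era of this thread

def dataset_files_url(host, dataset_id, key):
    """Build the Clowder call that lists a dataset's files; the script
    can GET this URL and then download each returned file id."""
    return "%s/api/datasets/%s/files?%s" % (
        host.rstrip("/"), dataset_id, urlencode({"key": key}))

url = dataset_files_url("http://141.142.209.122/clowder",
                        "56d48457e4b0c7e3b16ea709", "secret")
print(url)
```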
@robkooper does this pika IncompatibleProtocolError look familiar? I've not encountered this before.

```
2016-05-11 10:50:12,015 ERROR : pika.adapters.base_connection - Read empty data, calling disconnect
```
I think this might be related to an older version of pika, try and see if you can upgrade it. |
Two other pieces that @caicai89- is waiting for:
@robkooper I tried and it did not work.
You will need to specify the virtual host (vhost) in the RabbitMQ URL; in our case that should be clowder-dev for now. So set your environment variable:

```
RABBITMQ_URI="amqp://xxx:xxxxxxx@rabbitmq.ncsa.illinois.edu:15672/clowder-dev"
```
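Since the vhost rides in the URI's path component, a quick stdlib check (a small sketch, not pika-specific) confirms which vhost a given URI will connect to:

```python
from urllib.parse import urlparse, unquote

def rabbitmq_vhost(uri):
    """Extract the vhost from an AMQP URI: the path component after the
    host is the vhost, and an empty path means the default vhost '/'."""
    path = urlparse(uri).path
    return unquote(path[1:]) if len(path) > 1 else "/"

# The URI from the comment above (credentials redacted there as well):
uri = "amqp://xxx:xxxxxxx@rabbitmq.ncsa.illinois.edu:15672/clowder-dev"
print(rabbitmq_vhost(uri))  # clowder-dev
```

A URI with no path falls back to the default `/` vhost, which is one way to end up talking to the wrong exchange.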
@robkooper same error... |
@robkooper currently the pika version is 0.10.0.
@max-zilla @robkooper I'm finishing up the PlantCV extractor code and am running into an issue when I try to post results back to the files in Clowder. I might just be using the wrong API call. I have the PlantCV results encoded in a JSON string and am using a call like:
I get an error 500 when I do this. |
@nfahlgren can you share the metadata object you're sending? I'll look into it. I know that the metadata we had for a while had a field name that was something like "other metadata here..." and the three periods in the name were causing errors with the JSON parsing in Clowder - I had a little "clean_json" method to change those periods to underscores. |
@max-zilla they are a bit different, depending on the image, but it definitely dies on the first one, which looks like this:
@nfahlgren : is the new extractor code somewhere online? |
@nfahlgren can you try:
If posting to /metadata works we're in pretty good shape - we'll need to identify some context links that can point to metadata definitions (if you don't have something like that already). |
@max-zilla posting to /metadata instead of /metadata.jsonld did the trick. The in_bounds thing was converted to lowercase by the json.dumps function. It's working now, though!
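The json.dumps behavior mentioned here is easy to demonstrate: Python booleans serialize as JSON `true`/`false`, so the lowercased `in_bounds` value was expected serialization, not corruption:

```python
import json

# Python's True becomes JSON true (lowercase) in the posted payload:
payload = json.dumps({"in_bounds": True})
print(payload)  # {"in_bounds": true}
```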
@nfahlgren and @max-zilla what is the status of this? |
@dlebauer @nfahlgren Noah, I'll look at your pull request before our meeting tomorrow. |
@caicai89- @robkooper is "terra" the correct exchange to use for this extractor config, as opposed to "clowder"? @caicai89- we can sit down and try to figure this out tomorrow if you have time. Something that would be worth doing is installing a local test instance of Clowder and RabbitMQ to test this in a local environment: https://opensource.ncsa.illinois.edu/confluence/display/CATS/Installing+Clowder
Yes, for PlantCV "terra" is the right one to use; "clowder" is good for generic extractors.