SMC Data Challenge

Code for our submission to the SMC Data Challenge #1, machine-learning approaches to high-throughput phenotyping. View the solution paper here.


If the repository is newly cloned, run $ . scripts/ (see more information about the scripts in the scripts/ folder here). Note: the scripts run only on Unix-based systems, not on Windows.

Getting the Dataset

If you have already run the script above, you can skip this section; the dataset does not need to be downloaded manually.

Visit the link here and download the folder. Then, unzip the download, move the folder to this repository, and rename it to 'dataset'.
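
For anyone who prefers to script the manual steps above, here is a minimal sketch in Python. It is not part of the repository: the function name install_dataset, the staging folder name, and the example zip file name are all hypothetical, and it assumes the download is an ordinary zip archive.

```python
import shutil
import zipfile
from pathlib import Path


def install_dataset(zip_path, repo_root, target_name="dataset"):
    """Unzip zip_path, move the result into repo_root, and name it target_name.

    Hypothetical helper: mirrors the manual steps (unzip, move, rename).
    """
    zip_path = Path(zip_path)
    repo_root = Path(repo_root)
    target = repo_root / target_name
    if target.exists():
        raise FileExistsError(f"{target} already exists")

    # Extract into a temporary staging folder inside the repository.
    staging = repo_root / "_dataset_unzip_tmp"  # assumed scratch name
    if staging.exists():
        shutil.rmtree(staging)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(staging)

    entries = list(staging.iterdir())
    if len(entries) == 1 and entries[0].is_dir():
        # The archive wraps everything in one folder: that folder becomes 'dataset'.
        shutil.move(str(entries[0]), str(target))
        shutil.rmtree(staging)
    else:
        # The archive holds loose files: the extracted contents become 'dataset'.
        staging.rename(target)
    return target


if __name__ == "__main__":
    # "download.zip" is a placeholder for whatever the downloaded file is called.
    print(install_dataset("download.zip", "."))
```

The helper refuses to overwrite an existing 'dataset' folder, so an earlier download is never silently replaced.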


This project works with Python 3.9 and 3.10; it has not been tested with Python versions lower than 3.9. To install dependencies, run $ . scripts/
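
To check an interpreter against the versions listed above before installing anything, a small sketch like the following may help. The function name python_status and the status strings are illustrative and not part of the project; any version outside 3.9 and 3.10 is simply reported as untested.

```python
import sys

# Versions the README reports as working.
SUPPORTED = {(3, 9), (3, 10)}


def python_status(version=None):
    """Return 'supported' for Python 3.9/3.10 and 'untested' for anything else."""
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return "supported" if major_minor in SUPPORTED else "untested"


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    print(f"Python {current}: {python_status()}")
```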

File Tree

This tree contains relevant information about the files in the project.

├─ LICENSE                  
├─         // format all the sheets for each step to look nicer
├─ paper                    
│  ├─ images                    // contains images used in the paper
│  │  └─ ... 
│  ├─ main.bib                  // bibliography for paper
│  ├─ main.pdf                  // pdf of paper
│  └─ main.tex                  // paper in latex
├─ requirements.txt         // information about dependencies
├─ sample_dataset           // a sample of the images in the main dataset
│  └─ ...
├─ scripts                  // Unix shell scripts, helpful in project
│  ├─                
│  ├─            // compiles the paper with pdflatex
│  ├─             // gets all large image folders needed for this code
│  ├─                // general startup script - does everything needed
│  └─                   // creates a virtual environment and installs dependencies
├─ step1                    
│  ├─ archive                   // contains old, unused code
│  │  ├─           // was used to generate synthetic data for training OCR
│  │  └─ assets                     
│  │     └─ image_font.ttf              // the font used in the white label
│  ├─ data.xlsx                 // product of step 1
│  ├─ main.ipynb                // all code for solving the first step
│  ├─ ocr_test                  // testing possible OCR models
│  │  ├─ read_texts.csv             // stores texts read by each model
│  │  ├─ test_models.ipynb          // main notebook for testing and displaying the graphs
│  │  └─ timer.txt                  // stores times taken by each model
│  └─ pipeline                  // contains images showing the steps of the image augmentation pipeline
│     └─ ...
├─ step2
│  ├─ archive                   // contains old, unused code
│  │  ├─ model_from_contour.ipynb   // used to predict if a leaf was good or not from the contour itself 
│  │  └─ seg_from_contour           // files for the above model
│  │     ├─ data.csv                    // training data
│  │     ├─ model.pkl                   // model itself
│  │     └─ scaler.pkl                  // StandardScaler
│  ├─ data.xlsx                 // main spreadsheet after step 2 was finished
│  ├─ leaves                    // example cropped images of leaves generated later on in step 2
│  │  └─ ...
│  ├─        // loads the onnx model into the folder
│  ├─ main.ipynb                // main notebook for solving
│  ├─ model_from_masks.ipynb    // code for the model that filters segmented leaves based on leaf-ness
│  ├─ model_morph_class.ipynb   // code for the model that classifies leaf morphologies
│  ├─ morph_model               // stores files for morph classification model
│  │  ├─ data.csv                   // training data
│  │  ├─ encoders.pkl               // LabelEncoders for y-values
│  │  ├─ model.pkl                  // model itself
│  │  └─ scaler.pkl                 // StandardScaler
│  ├─ pipeline                  // contains images showing the steps of the image processing pipeline
│  │  └─ ...
│  └─ seg_from_SAM              // stores files for the segmentation filter model
│     ├─ data.csv                   // training data
│     ├─ images                     // example images of masks generated
│     │  └─ ...
│     ├─ model.pkl                  // model itself
│     └─ scaler.pkl                 // StandardScaler
├─ step3
│  ├─ data.xlsx                 // main spreadsheet after step 3 is done
│  ├─ main.ipynb                // all code required to run 
│  └─ model.pkl                 // model stored in a pickle file
└─ step4
   └─ main.ipynb                // all code required to run

© generated by Project Tree Generator.
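
Several folders in the tree store a model.pkl next to a scaler.pkl (and, for morph_model, an encoders.pkl). The notebooks are the authoritative place to see how these files are used; the sketch below only shows one way to load such a folder, assuming the files were written with Python's standard pickle module. The helper name load_artifacts is hypothetical.

```python
import pickle
from pathlib import Path


def load_artifacts(folder, names=("model", "scaler")):
    """Load <name>.pkl for each name in `names` from `folder` into a dict.

    Hypothetical helper; assumes the files were saved with the pickle module.
    """
    folder = Path(folder)
    artifacts = {}
    for name in names:
        with open(folder / f"{name}.pkl", "rb") as handle:
            artifacts[name] = pickle.load(handle)
    return artifacts


if __name__ == "__main__":
    # Folder and file names are taken from the tree above.
    loaded = load_artifacts("step2/morph_model", names=("model", "scaler", "encoders"))
    print(sorted(loaded))
```

Unpickling needs the library that defined the stored objects to be installed (StandardScaler and LabelEncoder are scikit-learn classes), and pickle files should only be loaded from sources you trust.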

Helpful Links


This software is released under the MIT License.


