GSoC 2022 Project Ideas

henry senyondo edited this page Feb 25, 2022 · 6 revisions

Please ask questions through issues on the respective project's repo.

Tags available: @henrykironde, @juniperlsimonis, @MarconiS, @bw4sz, @ethanwhite

  • Preferred names (Henry, Juniper, Sergio, Ben, Ethan)
  • Preferred_greeting (Hi|Hello|Dear|Thanks|Thank you [First_name])

Join the chat at https://gitter.im/weecology/retriever

The code of conduct should be your first read.

Multi-class training and prediction in DeepForest

Approach

DeepForest is an open source Python package for detecting trees (and other organisms) in remote sensing (RGB) imagery from airplanes and drones. The underlying model structure supports classification as well as detection, so DeepForest can be used to identify trees to species or to distinguish between alive and dead trees, but support for the multi-class aspects of the package needs further development. This project would combine software engineering to improve the UI for working with multi-class models with the development of pre-trained models that provide features useful for transfer learning for species classification and alive/dead classification.
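As a rough illustration of the bookkeeping a multi-class UI has to handle, the sketch below builds a mapping between class names and integer ids from an annotation table. It is not DeepForest's API: `build_label_dict`, the sample rows, and the column names are all assumptions made for this example.

```python
import csv
import io


def build_label_dict(annotation_rows):
    """Map each distinct class label to an integer id, in sorted order."""
    labels = sorted({row["label"] for row in annotation_rows})
    return {label: index for index, label in enumerate(labels)}


# Tiny in-memory annotation table. The column names are an assumption
# modeled on a typical bounding-box CSV with one box per row.
ANNOTATIONS_CSV = """image_path,xmin,ymin,xmax,ymax,label
plot_1.png,10,10,50,60,Alive
plot_1.png,70,20,120,90,Dead
plot_2.png,5,15,40,55,Alive
"""

rows = list(csv.DictReader(io.StringIO(ANNOTATIONS_CSV)))
label_dict = build_label_dict(rows)

# Inverse mapping, used to turn predicted class ids back into names.
id_to_label = {index: label for label, index in label_dict.items()}
```

Sorting the labels before numbering them keeps the ids stable across runs, which matters when a trained model is later reloaded for prediction.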

Source Code: https://github.com/weecology/DeepForest

Degree of difficulty

  • Intermediate, long (350 hours)

Skills:

  • Python
  • Deep learning using PyTorch
  • git/GitHub

Expected outcomes

  • An improved UI for working with multi-class models, and pre-trained models that provide features useful for transfer learning for species classification and alive/dead classification.

Mentors

  • @bw4sz
  • @henrysenyondo
  • @ethanwhite

High-performance parallel computing for model fitting and prediction in Portalcasting

Approach

Portalcasting is an open source R package that supports ecological forecasting of biodiversity for a long-term ecological research program that has been studying desert biodiversity for 45 years. The package provides automated data integration and modular models to produce forecasts for a range of ecological outcomes. Although the forecasting system makes large numbers of forecasts, it currently runs them sequentially rather than in parallel. This project would involve parallelizing the code base so that it can run on multiple cores, both on individual machines and on HPC systems.
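The pattern involved can be sketched briefly. Portalcasting itself is written in R, so the Python below only illustrates the general idea of mapping independent model/dataset combinations across workers; `fit_and_forecast`, `run_forecasts`, and the model and dataset names are hypothetical placeholders, and the "forecast" is a dummy mean rather than a real model fit.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from itertools import product


def fit_and_forecast(task):
    """Hypothetical stand-in for fitting one model to one dataset.

    Returns the task key and a dummy 'forecast' (the mean of the series).
    """
    model_name, dataset_name, series = task
    forecast = sum(series) / len(series)
    return (model_name, dataset_name), forecast


def run_forecasts(models, datasets, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Run every model/dataset combination independently and collect results."""
    tasks = [
        (model, name, series)
        for model, (name, series) in product(models, datasets.items())
    ]
    with executor_cls(max_workers=max_workers) as pool:
        # Tasks share no state, so they can be mapped across workers directly.
        return dict(pool.map(fit_and_forecast, tasks))
```

With the default `ProcessPoolExecutor`, call `run_forecasts` from under an `if __name__ == "__main__":` guard so worker processes can import the module safely. Passing `executor_cls=ThreadPoolExecutor` runs the same tasks in threads, which is convenient for quick checks but does not use multiple cores for CPU-bound work.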

Source Code: https://github.com/weecology/portalcasting

Degree of difficulty

  • Intermediate, long (350 hours)

Skills:

  • R
  • Parallel programming for embarrassingly parallel problems (i.e., the simple end of parallel programming)
  • git/GitHub

Expected outcomes

  • A parallelized code base that reduces the runtime of model fitting and prediction.

Mentors

  • @juniperlsimonis
  • @henrysenyondo
  • @ethanwhite

The NeonVegWrangleR: The NEON vegetation structure (vst) and Airborne Observation Platform (AOP) data manager

Rationale

The National Ecological Observatory Network (NEON) collects and provides long-term, open-access ecological data, which is accessible through the NEON Data API. Users need to query this data efficiently. The NeonVegWrangleR helps retrieve a targeted sample of this data, clean it, and provide it to researchers in a format ready for ecological analyses.

Approach

The NeonVegWrangleR is an R and Python package that handles the integration of NEON vegetation structure (vst) and airborne observation platform (AOP) data. The package communicates with the NEON API to obtain and restructure data based on predefined protocols. This project would involve converting R functions to Python and refactoring the software to meet Python packaging release standards. It would also involve improving the existing software by adding tests, setting up a CI/CD platform, and writing documentation.
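To give a sense of what a small, testable Python function in such a port might look like, here is a minimal sketch of building a query URL and extracting file links from a decoded JSON response. The base URL, path layout, and response shape are assumptions to be checked against the NEON Data API documentation, and `build_data_url` and `list_file_urls` are hypothetical names, not functions from the existing package.

```python
from urllib.parse import quote

# Assumed base URL and path layout; confirm against the NEON Data API docs.
NEON_API_BASE = "https://data.neonscience.org/api/v0"


def build_data_url(product_code, site_code, year_month):
    """Return a query URL for one data product, site, and month (YYYY-MM)."""
    parts = [quote(part, safe="") for part in (product_code, site_code, year_month)]
    return f"{NEON_API_BASE}/data/" + "/".join(parts)


def list_file_urls(payload, suffix=".csv"):
    """Pull download URLs out of a decoded JSON response.

    Assumes a payload shaped like
    {"data": {"files": [{"name": ..., "url": ...}]}}.
    """
    files = payload.get("data", {}).get("files", [])
    return [entry["url"] for entry in files if entry.get("name", "").endswith(suffix)]
```

Keeping URL construction and response parsing as pure functions means both can be unit tested in CI without network access, with the actual HTTP request isolated in a thin wrapper.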

Source Code: https://github.com/weecology/neonVegWrangleR

Degree of difficulty

  • Intermediate, long (350 hours)

Skills:

  • Python and Python package deployment
  • Some familiarity with the R programming language
  • git/GitHub
  • Experience with CI/CD
  • Software testing

Expected outcomes

  • An improved, tested, and documented Python package that retrieves NEON data

Mentors

  • @MarconiS
  • @henrysenyondo
  • @ethanwhite