This is the main code used for my MSc dissertation project.

tropicbird/msc-dissertation-code

Application of Machine Learning to Species Distribution Models with Citizen Science Data (MATH5872M Dissertation in Data Science and Analytics)

This code is part of my MSc dissertation project at the University of Leeds. If you have any questions about the project or the code, please let me know :). Thank you!

Abstract

Species Distribution Models (SDMs) predict the distribution of a target species and are often used in decision-making, mainly for environmental conservation. SDMs are built from species observation data and environmental data, using algorithms (often machine learning algorithms) chosen to suit the characteristics of the dataset. Traditionally, species observation data have been collected by experts separately for each project. Citizen science, on the other hand, is an alternative approach that collects data through public participation. eBird, managed by the Cornell Lab of Ornithology, is one of the most successful citizen science projects. It contains more than seven million bird observations submitted by over half a million people around the globe. A distinctive feature of eBird data is that it contains both presence and absence records, which are preferable for building SDMs. It is therefore worthwhile to explore the potential of eBird data for SDMs, so that more accurate SDMs can be used in practice.

In this study, we evaluated both the discriminatory ability and the calibration ability of SDMs; the latter has often been overlooked in previous studies. Discriminatory ability was assessed with the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity, and calibration ability was assessed with calibration plots.
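As a rough illustration of these metrics, the sketch below computes AUC, sensitivity, specificity, and the binned values behind a calibration plot in plain Python. It is not the code used in the dissertation: the function names, the 0.5 threshold, the equal-width binning, and the toy labels and scores are all assumptions made for this example.

```python
def auc(y_true, scores):
    """AUC as the fraction of (presence, absence) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(y_true, scores, threshold=0.5):
    """Sensitivity (true positive rate) and specificity (true negative rate) at a threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def calibration_bins(y_true, scores, n_bins=5):
    """Per non-empty equal-width bin: (mean predicted probability, observed frequency, count)."""
    bins = [[] for _ in range(n_bins)]
    for y, s in zip(y_true, scores):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 falls in the last bin
        bins[idx].append((s, y))
    result = []
    for members in bins:
        if members:
            mean_pred = sum(s for s, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            result.append((mean_pred, observed, len(members)))
    return result


# Toy data (hypothetical): 1 = presence, 0 = absence, scores = predicted probabilities.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

print("AUC:", auc(y_true, scores))
print("Sensitivity, specificity:", sensitivity_specificity(y_true, scores))
for mean_pred, observed, count in calibration_bins(y_true, scores, n_bins=2):
    print(f"mean predicted={mean_pred:.3f} observed={observed:.3f} n={count}")
```

Plotting the mean predicted probability against the observed frequency for each bin gives a calibration plot; points close to the diagonal indicate good calibration.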

The study had two aims. The first was to explore the performance of four algorithms as SDMs with eBird data for three target species. XGBoost performed best, followed by the multi-layer perceptron (MLP), random forest, and logistic regression. More importantly, discriminatory ability and calibration ability behaved very differently across both the target species and the algorithms. Although some degree of trade-off between discrimination and calibration is known, our results strongly emphasise the importance of evaluating both abilities when applying SDMs.
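To see why both abilities need separate evaluation, consider the toy sketch below. It is an illustration only, not a result from the dissertation: the data are synthetic, and `max_calibration_gap` is a hypothetical helper introduced here as a crude stand-in for reading a calibration plot. Cubing the predicted probabilities keeps their ranking, so AUC is unchanged, but the predictions no longer match the observed frequencies.

```python
def auc(y_true, scores):
    """AUC as the fraction of (presence, absence) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def max_calibration_gap(y_true, scores):
    """Largest |predicted probability - observed frequency| over groups of identical scores."""
    groups = {}
    for y, s in zip(y_true, scores):
        groups.setdefault(s, []).append(y)
    return max(abs(s - sum(ys) / len(ys)) for s, ys in groups.items())


# Synthetic, perfectly calibrated predictions (assumption for this example):
# of 10 sites predicted at 0.2, exactly 2 are presences, and so on.
y_true = []
scores = []
for prob, n_present in [(0.2, 2), (0.5, 5), (0.8, 8)]:
    y_true += [1] * n_present + [0] * (10 - n_present)
    scores += [prob] * 10

# A strictly increasing distortion preserves the ranking but not the probabilities.
distorted = [s ** 3 for s in scores]

print("AUC original: ", auc(y_true, scores))
print("AUC distorted:", auc(y_true, distorted))
print("Max calibration gap original: ", max_calibration_gap(y_true, scores))
print("Max calibration gap distorted:", max_calibration_gap(y_true, distorted))
```

The two models are indistinguishable by AUC alone, yet only the first produces probabilities that can be read at face value.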

The second aim was to compare the performance of SDMs built with short-term climatic data (i.e. less than five years preceding the observation dates) against SDMs built with long-term climatic data (i.e. 1970-2000 data). We considered this aim because existing studies suggest that short-term climatic data may be important for SDMs under climate change. For our three target species, however, the results showed that species occurrence was more responsive to the long-term data than to the short-term data. Because our approach is simple and easy to interpret, we expect that it could be widely used to understand how species respond to climatic data in general, and especially to identify species that are more responsive to climate change.

These findings appear useful for applying SDMs, but one limitation of this study is the small number of species used. Future studies are encouraged to apply our approach to a larger number of species in order to generalise our findings.

Because this study used public data, the same approach could be applied to any other bird species in eBird. This strongly suggests the considerable potential of citizen science data for SDMs. The R code and Python code used in this study are available on the author’s GitHub account (https://github.com/tropicbird/msc-dissertation-code).
