This project models observed daily rainfall over New South Wales (NSW), Australia. Specifically, it builds and deploys ensemble machine learning models in the cloud to forecast future daily rainfall amounts in Australia.
The data consists of daily rainfall records from 1889 to 2014 in New South Wales, Australia (approximately 12 GB). The dataset is hosted on figshare, and the figshare API can be used to download it locally.
The features of the dataset are the outputs of different climate models, and the response is the observed rainfall amount.
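The figshare download mentioned above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: it assumes the public figshare v2 endpoint `https://api.figshare.com/v2/articles/{article_id}` and its `files` list with `name` and `download_url` fields. The helper names (`list_files`, `fetch_article`, `download_file`, `download_all`) are hypothetical, and the article ID is not stated in this README, so it must be supplied by the caller.

```python
import json
import os
import urllib.request

# Assumed public figshare v2 endpoint for article metadata.
FIGSHARE_API = "https://api.figshare.com/v2/articles/{article_id}"


def list_files(article_metadata):
    """Return (name, download_url) pairs from a figshare article record."""
    return [
        (f["name"], f["download_url"])
        for f in article_metadata.get("files", [])
    ]


def fetch_article(article_id, timeout=30):
    """Fetch the JSON metadata record for one figshare article."""
    url = FIGSHARE_API.format(article_id=article_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def download_file(url, dest_path, chunk_size=1 << 20):
    """Stream one file to disk in chunks so a large archive never sits in memory."""
    with urllib.request.urlopen(url) as resp, open(dest_path, "wb") as out:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)


def download_all(article_id, dest_dir):
    """Download every file attached to the article into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    for name, url in list_files(fetch_article(article_id)):
        download_file(url, os.path.join(dest_dir, name))
```

Calling `download_all(article_id, "data")` with the dataset's figshare article ID would place the files under `data/`. Streaming in chunks is a deliberate choice here, given that the dataset is roughly 12 GB.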
There are four objectives for this project:
- Acquire the data from the website through the API and convert it into an analysis-friendly, machine-learning-efficient format.
- Transfer the formatted data to the cloud and set up the infrastructure for the machine learning model.
- Build distributed infrastructure in the cloud, for example EMR with Spark, to perform machine learning there.
- Deploy the machine learning model in the cloud so that customers can run the model in the cloud.
TBA
Team members and their GitHub usernames are listed below:
Name | Github username |
---|---|
Deepak Sidhu | @deepaksidhu |
Zhenrui Yu | @yzr1996 |
Bruhat Musunuru | @BruhatM |
Jiacheng Wang | @wangjc640 |
The dataset was retrieved through the figshare API; it was created by Tomas Beuzen.
The preprocessing, including downloading and processing the data, follows this example by Dr. Gittu George.
- All material in this GitHub repository for predicting daily rainfall in New South Wales is licensed under the MIT License.
- The NSW rainfall dataset is licensed under the Creative Commons Attribution 4.0 International license (CC BY 4.0).