This project trains a machine learning model to predict the probability of rainfall on the next day, using data collected from various weather stations in Australia.
The station data contains many missing values, as well as categorical features that must be converted to numeric values. The data is divided into training and test sets so that the trained model can be evaluated on unseen data. After preprocessing, the training set is used to fit a random forest model, which is then evaluated on the test set. Model performance is improved by using grid search to find the best hyperparameters for prediction.
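
The workflow described above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the project's actual code: the data is synthetic, the column names (`Humidity`, `Pressure`, `WindDir`, `RainTomorrow`) are hypothetical, and the choices of median and most-frequent imputation, one-hot encoding, and the small parameter grid are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the weather-station data (hypothetical columns).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "Humidity": rng.uniform(20, 100, n),
    "Pressure": rng.normal(1015, 7, n),
    "WindDir": rng.choice(["N", "E", "S", "W"], n),
})
# Toy target: rain tomorrow is more likely when humidity is high.
df["RainTomorrow"] = (df["Humidity"] + rng.normal(0, 10, n) > 70).astype(int)
# Blank out roughly 10% of two columns to mimic missing readings.
df.loc[rng.random(n) < 0.1, "Humidity"] = np.nan
df.loc[rng.random(n) < 0.1, "WindDir"] = np.nan

numeric = ["Humidity", "Pressure"]
categorical = ["WindDir"]

# Impute missing values and convert categorical columns to numeric ones.
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), numeric),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("forest", RandomForestClassifier(random_state=0)),
])

# Hold out a test set so the model is evaluated on unseen data.
X = df.drop(columns="RainTomorrow")
y = df["RainTomorrow"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Grid search over a small, assumed set of random forest hyperparameters.
param_grid = {
    "forest__n_estimators": [25, 50],
    "forest__max_depth": [3, None],
}
search = GridSearchCV(model, param_grid, cv=3, scoring="roc_auc")
search.fit(X_train, y_train)

# Predicted probability of rain for each test row, and ROC AUC on the test set.
rain_proba = search.predict_proba(X_test)[:, 1]
test_auc = search.score(X_test, y_test)
print("Best parameters:", search.best_params_)
print("Test ROC AUC:", round(test_auc, 3))
```

Wrapping the preprocessing and the model in a single pipeline means the imputation and encoding are fitted only on the training folds during grid search, which avoids leaking information from held-out data into the model.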
All the steps and their results are documented in the report file, and the output of each step can be seen in the Python notebook file.