Rain-Prediction

Weather forecasting has existed since 1835, but modern forecasting methods that rely on sensors and physical techniques demand a lot of resources. In 2009, the US spent approximately $5.1 billion on weather forecasting. Our aim is to cut this cost by training a machine learning model that uses the previous day’s data to predict whether it will rain tomorrow with appreciable accuracy.

Data used

Observations were drawn from numerous weather stations. The daily observations are available from the following data sources:
http://www.bom.gov.au/climate/dwo/
http://www.bom.gov.au/climate/data

weatherAUS.csv

This file contains 24 features and 1 output column. For the different models, we derive 2 more datasets from it.

dataset_maker.py

After some preprocessing, this script uses a variable numdays to find every run of numdays continuous days of data and clubs each run into a single row, so that one row holds the data for numdays consecutive days. This is repeated for all such runs of continuous dates, and the script finally produces a CSV file containing all of them.
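The windowing step described above can be sketched roughly as follows. This is a minimal illustration, not the contents of dataset_maker.py: the function name make_windows, the column names Date and Location, and the _dayN suffix scheme are all assumptions made for the example, and dates are assumed to be unique within each station.

```python
import pandas as pd

def make_windows(df, numdays, date_col="Date", group_col="Location"):
    """Club every run of `numdays` consecutive calendar days into one wide row.

    Hypothetical helper: column names and the `_dayN` suffixes are assumptions.
    """
    rows = []
    for _, station in df.groupby(group_col):
        station = station.sort_values(date_col).reset_index(drop=True)
        dates = pd.to_datetime(station[date_col])
        for start in range(len(station) - numdays + 1):
            # With unique, sorted dates, numdays rows spanning numdays - 1
            # calendar days means there is no gap inside the window.
            span = (dates.iloc[start + numdays - 1] - dates.iloc[start]).days
            if span != numdays - 1:
                continue
            window = station.iloc[start:start + numdays]
            row = {}
            for offset in range(numdays):
                for col, value in window.iloc[offset].items():
                    row[f"{col}_day{offset + 1}"] = value
            rows.append(row)
    return pd.DataFrame(rows)
```

Calling make_windows(df, 7) would then give one row per 7-day run, which could be written out with DataFrame.to_csv.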

balanced_dataset.py

After some preprocessing, this script first separates the data into two parts: rows with RainTomorrow : 1 and rows with RainTomorrow : 0. Because the data is skewed, it keeps all the RainTomorrow : 1 rows and randomly samples an equal number of RainTomorrow : 0 rows, then concatenates them into a single dataframe and returns it. The resulting dataframe has no skew with respect to RainTomorrow.
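A minimal sketch of this kind of undersampling is shown below. It is not the code in balanced_dataset.py: the function name, the seed parameter, and the final shuffle are assumptions added for illustration, and it assumes the RainTomorrow : 1 rows are the smaller group.

```python
import pandas as pd

def balance_by_undersampling(df, target="RainTomorrow", seed=0):
    """Keep every positive row and an equally sized random sample of negatives."""
    rainy = df[df[target] == 1]
    dry = df[df[target] == 0]
    # Assumes there are at least as many dry rows as rainy rows.
    dry_sample = dry.sample(n=len(rainy), random_state=seed)
    balanced = pd.concat([rainy, dry_sample])
    # Shuffling is an addition of this sketch, so the two classes are interleaved.
    return balanced.sample(frac=1, random_state=seed).reset_index(drop=True)
```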

EDA and basic modeling

In this notebook we performed EDA on the data and then applied some simple regression and decision-based classifiers to get a feel for the data.

One Week Predictions

We use the data described above to predict rain a week in advance. We have done this in 2 ways:

Using data from a single day to predict rain for the coming week. Here we modify the dataset using dataset_maker with numdays = 7, taking 1 day of data as the feature vector and the rain values of the remaining 6 days as the targets, and then apply a RandomForest classifier. This gives us 6 models, one for each day.
Using data from a week to predict rain for the coming week. Here we modify the dataset using dataset_maker with numdays = 14, taking 7 days of data as the feature vector and the rain values of the remaining 7 days as the targets, and then apply a RandomForest classifier. This gives us 7 models, one for each day.

We can see that using more data gives better results, although both approaches lack precision, which brings us to the next part.
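The one-model-per-day setup can be sketched as below. This is only an illustration of the idea, not the notebook code: the helper names fit_per_day_models and predict_week, the choice of n_estimators=50, and the layout of Y (one column per future day) are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fit_per_day_models(X, Y, seed=0):
    """Fit one RandomForestClassifier per column of Y (one column per future day)."""
    models = []
    for day in range(Y.shape[1]):
        clf = RandomForestClassifier(n_estimators=50, random_state=seed)
        clf.fit(X, Y[:, day])
        models.append(clf)
    return models

def predict_week(models, X):
    """Stack each model's predictions into an (n_samples, n_days) array."""
    return np.column_stack([m.predict(X) for m in models])
```

With 1 day of features and 6 target days, Y would have 6 columns; with 7 days of features and 7 target days, it would have 7.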

Boosting Trees (Improving Precision)

To improve precision, we use a modified training dataset, which can be generated using balanced_dataset.py. This returns a dataset with no skew with respect to RainTomorrow.

Now we split the data into training and test sets and apply sklearn.ensemble.GradientBoostingClassifier, which finally gives us reasonable precision for our model.
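The split-then-boost step might look roughly like this. The sketch runs on synthetic data from make_classification as a stand-in for the balanced dataset; the 25% test size, default hyperparameters, and random seeds are assumptions, not the settings used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split

# Synthetic, roughly balanced stand-in for the balanced rain dataset.
X, y = make_classification(n_samples=400, n_features=8,
                           weights=[0.5, 0.5], random_state=0)

# Hold out a quarter of the rows, keeping the class ratio the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)

# Precision: of the rows predicted as class 1, the fraction that really are class 1.
precision = precision_score(y_test, model.predict(X_test))
print(f"precision on held-out data: {precision:.3f}")
```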

How to Use

To use any model, load it with the following code:

 import pickle
 # Replace "Model-Name" with the path to the saved model file.
 with open("Model-Name", "rb") as f:
     model = pickle.load(f)

To predict, use

 predictions = model.predict("Your Values")

If you need the probability distribution for the target classes, use

 predictions = model.predict_proba("Your Values")

This returns an n x 2 matrix, where n is the number of input rows, containing the probability of each of the two classes.
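To make the shape of that output concrete, here is a small self-contained sketch. It trains a throwaway GradientBoostingClassifier on random data as a stand-in for one of the saved models and round-trips it through pickle in memory; the data and variable names are assumptions for illustration only.

```python
import pickle

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Random stand-in data: 60 rows, 4 features, binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = (X[:, 0] > 0).astype(int)

stand_in = GradientBoostingClassifier(random_state=0).fit(X, y)
# In-memory equivalent of saving the model to a file and loading it back.
model = pickle.loads(pickle.dumps(stand_in))

values = X[:5]                       # 2-D array, one row per sample
proba = model.predict_proba(values)  # one row per sample, one column per class
print(proba.shape)                   # (5, 2)
print(model.classes_)                # class label for each column of proba
```

Each row of proba sums to 1, and the columns follow the order given by model.classes_.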

If you want to make any modifications to the model itself, you can download its respective notebook and make the changes accordingly.

Contact

Anshul Raj (anshul18020@iiitd.ac.in)
Siddhant Yadav (siddhant18196@iiitd.ac.in)
Yash Vats (yash18204@iiitd.ac.in)

Thank You : )
