Skip to content
Algorithms used for polldentify
Branch: master
Clone or download
Pull request Compare This branch is even with brendabrandy:master.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
algorithm
data_collection
data_finalize
README.md

README.md

Polldentify algorithms

Polldentify is a web app that tracks and traces the sources of pollution within a US city base on its geographical location and wind direction at a moment in history. It compiles 15 years of data, run all the data through an algorithm inspired by the Gaussian Dispersion Model of Pollution, and showcased the info through http://polldentify.mybluemix.net/#/ .This github open sources all the programs that we use to collect, compile and analyze the data

##Data_collection

airdata.py: crawls concentration of pollutants (ozone, sulphur dioxide, nitrogen dioxide, carbon monoxide, PM10) from the EPA and save it as a dataframe (can be converted to csv through pd.to_csv).

singledata.py: in case your computer doesn't have enough memory for all the pollutant files downloaded through airdata.py, this allows you to download each pollutant one at a time.

openfile.m: MATLAB program to download nc files on wind data from NOAA and save it as a csv file.

wind_extract.py / wind_extract_final.py: extract wind data from file generated from openfile.m

groupbylocation.py: downloads altitude information at each sampling longitude and latitude point through GoogleAPI and save it as csv file.

combine.py: combine wind data, altitude, longitude, latitude and air pollutant data into one final dataframe, indexed by date and save it as a csv file.

##Data_finalize

fillinna.py: in case of missing data, we either interpolate the data or use a randomize function to put a best possible value for the data.

frame_finalize.py / new_finalize.py: finalize the dataframe so that it could be directly delivered into an algorithm.

##Algorithms algorithm.py: This file contains the general algorithm for manipulating the final dataframe. For a location (x,y,z), the location would have pollution concentration C and the rate of source emission is Q

new_linearreg.py: linear regression algorithm to predict pollutant concentration in the future. Open for improvements since linear regression is a very basic machine learning algorithm.

You can’t perform that action at this time.