wangzhiwubigdata/FlinkExperiments

 
 


FlinkExperiments

Project

This is a sample project for Apache Flink. The application parses the Quality Controlled Local Climatological Data (QCLCD) for March 2015, calculates the maximum daily temperature from the stream with Apache Flink, and writes the results to Elasticsearch and a PostgreSQL database.
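To make the aggregation concrete, here is a minimal sketch of the "maximum daily temperature" step in plain Java. It deliberately does not use the Flink API and is not the project's actual code. The `Measurement` and `StationDay` types, the grouping per station, and the Celsius unit are assumptions made for illustration.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DailyMaxTemperature {

    // Hypothetical record type: one hourly measurement of one station.
    public record Measurement(String station, LocalDateTime time, double temperatureCelsius) {}

    // Hypothetical grouping key: one station on one calendar day.
    public record StationDay(String station, LocalDate day) implements Comparable<StationDay> {
        @Override
        public int compareTo(StationDay other) {
            int byStation = station.compareTo(other.station);
            return byStation != 0 ? byStation : day.compareTo(other.day);
        }
    }

    // Keeps the highest temperature seen for each (station, day) pair.
    // Assumption: the maximum is computed per station, not across all stations.
    public static Map<StationDay, Double> maxPerStationAndDay(List<Measurement> measurements) {
        Map<StationDay, Double> result = new TreeMap<>();
        for (Measurement m : measurements) {
            StationDay key = new StationDay(m.station(), m.time().toLocalDate());
            // merge() stores the value if the key is new, otherwise keeps the larger one.
            result.merge(key, m.temperatureCelsius(), Math::max);
        }
        return result;
    }
}
```

In a streaming job the same idea would typically be expressed as a keyed stream with a one-day window and a max aggregation, so the result for a day can be emitted as soon as that day is complete instead of after the whole input has been read.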

Dataset

The data is the Quality Controlled Local Climatological Data (QCLCD):

Quality Controlled Local Climatological Data (QCLCD) consist of hourly, daily, and monthly summaries for approximately 1,600 U.S. locations. Daily Summary forms are not available for all stations. Data are available beginning January 1, 2005 and continue to the present. Please note, there may be a 48-hour lag in the availability of the most recent data.

The data is available at:

Running the Examples

The records in the Quality Controlled Local Climatological Data (QCLCD) dataset are not sorted by timestamp. The dataset has to be prepared first, so that all records are sorted in ascending order by time of measurement.

I have written a small application that sorts the original CSV data by measurement time:

The result is a sorted CSV file, which can be used to run the examples.
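For illustration, the preparation step could look roughly like the following sketch. It is not the application mentioned above. The class name `CsvTimeSorter` is hypothetical, and the column layout (a header line, the date as `yyyyMMdd` in the second column, the time as `HHmm` in the third) is an assumption about the CSV file, so adjust the indices to the actual data.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CsvTimeSorter {

    // Assumed time format: four digits, e.g. "0015" for 00:15.
    private static final DateTimeFormatter TIME_FORMAT = DateTimeFormatter.ofPattern("HHmm");

    // Extracts the measurement time from one CSV line.
    // Assumption: column index 1 is the date (yyyyMMdd), column index 2 the time (HHmm).
    static LocalDateTime measurementTime(String line) {
        String[] columns = line.split(",", -1);
        LocalDate date = LocalDate.parse(columns[1].trim(), DateTimeFormatter.BASIC_ISO_DATE);
        LocalTime time = LocalTime.parse(columns[2].trim(), TIME_FORMAT);
        return LocalDateTime.of(date, time);
    }

    // Returns the lines with the header first and all records in ascending time order.
    public static List<String> sortByMeasurementTime(List<String> lines) {
        List<String> result = new ArrayList<>();
        if (lines.isEmpty()) {
            return result;
        }
        // Keep the header line in place and sort only the data records.
        List<String> records = new ArrayList<>(lines.subList(1, lines.size()));
        // List.sort is stable, so records with equal timestamps keep their original order.
        records.sort(Comparator.comparing(CsvTimeSorter::measurementTime));
        result.add(lines.get(0));
        result.addAll(records);
        return result;
    }
}
```

The lines can be read with `Files.readAllLines` and written back with `Files.write`. This sketch holds the whole file in memory, which is acceptable for a single month of data but would need an external sort for much larger inputs.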

Further Reading

I have written several blog posts on Apache Flink:
