The final project for the Cloud Computing Specialization in Coursera: Analysis of Airline On-Time Performance Data in Hadoop and Spark.

wuyichen24/CloudCapstone

Cloud Capstone

The final project for the Cloud Computing Specialization in Coursera: Analysis of Airline On-Time Performance Data in Hadoop and Spark.

  • Launched and configured multiple AWS EC2 instances with appropriate security settings (IAM role, security group, private-key access).
  • Installed and deployed Hadoop and Spark on a cluster of AWS EC2 instances using Ambari.
  • Designed and implemented solutions to several practical problems, in both Hadoop and Spark, for analyzing the Airline On-Time Performance Data (all non-canceled flights between 1988 and 2008) from the BTS (US Bureau of Transportation Statistics).
  • Installed and deployed a Cassandra database on multiple nodes and stored the results in that cluster.
  • Applied system-level optimizations by creating instances with a higher ratio of vCPUs to memory, and application-level optimizations by adjusting the spark.locality.* properties in SparkConf to increase data locality.
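
The data-locality tuning in the last bullet can be sketched as a small set of Spark properties. Spark's scheduler waits `spark.locality.wait` (3s by default) for a slot at the best locality level before falling back to a less local one, and the `.node` and `.rack` variants override that wait per level. The values below and the `to_submit_flags` helper are illustrative assumptions only; the README does not state which settings the project used.

```python
# Hypothetical wait values: the README does not say which settings were chosen.
# The property names themselves are standard Spark scheduling options.
LOCALITY_WAITS = {
    "spark.locality.wait": "10s",       # base wait before dropping a locality level
    "spark.locality.wait.node": "10s",  # wait for a node-local slot
    "spark.locality.wait.rack": "5s",   # wait for a rack-local slot
}


def to_submit_flags(props):
    """Render a property mapping as spark-submit `--conf key=value` arguments."""
    flags = []
    for key, value in sorted(props.items()):
        flags.extend(["--conf", f"{key}={value}"])
    return flags


if __name__ == "__main__":
    # Print the flags so they can be pasted after `spark-submit`.
    print(" ".join(to_submit_flags(LOCALITY_WAITS)))
```

The same key/value pairs could instead be applied in application code by calling `SparkConf.set(key, value)` for each entry. Longer waits favor local reads at the cost of idle executors, so suitable values depend on task duration and cluster load.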
