Welcome to our Carnegie Mellon 15619-Cloud Computing final project repository!
Our team is named "ThreeCobblers",
after the ancient Chinese proverb "Three cobblers with their wits combined equal Zhuge Liang (三个臭皮匠顶一个诸葛亮)", roughly "Two heads are better than one" in English.
It captures our belief that, by collaborating, the three of us can deliver a project of excellence.
🌟Team members:
In this project, development is divided into three incremental phases. The ultimate goal is to develop and deploy a fully managed microservice web service on AWS with very low latency, high throughput, and fault tolerance.
🌟Phase1🌟: develop three individual services, host them on a chosen web framework, and test each service's performance individually.
Microservice 1 provides QR-code encode/decode functionality: the user sends a request containing the byte representation of a QR code, and the web service encodes or decodes it accordingly and sends back the response.
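A minimal sketch of the payload handling this implies, with a Base64 round trip standing in for the actual QR encoding/decoding step (the real service's wire format and QR library are not shown in this repository summary, so treat the names here as hypothetical):

```java
import java.util.Base64;

// Hypothetical payload helpers for Microservice 1: raw QR-code bytes are
// turned into a wire-safe string for the HTTP response, and decoded back
// from the request. The QR encode/decode itself is elided in this sketch.
public class QrPayload {
    // Encode raw bytes into a text payload suitable for an HTTP response body.
    public static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Decode the text payload from an HTTP request back into raw bytes.
    public static byte[] decode(String payload) {
        return Base64.getDecoder().decode(payload);
    }
}
```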
Microservice 2 is computationally heavy: the web service must validate an existing blockchain and perform the required work to append a newly verified transaction to it.
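The "work" step can be sketched as a proof-of-work search, assuming the common scheme of finding a nonce whose hash meets a difficulty target (the course's exact hash layout and difficulty rule may differ):

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Proof-of-work sketch for Microservice 2: brute-force a nonce so that
// SHA-256(blockData + nonce) starts with a given number of hex zeros.
public class ProofOfWork {
    // Hex-encoded SHA-256 of a string, zero-padded to 64 characters.
    public static String sha256Hex(String s) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-256")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            return String.format("%064x", new BigInteger(1, d));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available
        }
    }

    // Try nonces 0, 1, 2, ... until the hash meets the difficulty target.
    public static long mine(String blockData, int zeros) {
        String prefix = "0".repeat(zeros);
        long nonce = 0;
        while (!sha256Hex(blockData + nonce).startsWith(prefix)) {
            nonce++;
        }
        return nonce;
    }
}
```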
Microservice 3 involves both the web service and a backend storage tier. A raw dataset of ~1 TB of Twitter data is first cleaned and preprocessed with PySpark on an Azure HDInsight cluster, then loaded into a MySQL database. The web service sets up connections to the backend database, queries it dynamically, and recommends friends with high interaction scores back to the user.
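The final recommendation step might look like the following sketch, assuming the web tier has already queried MySQL into a map of friend id to interaction score (the project's real schema and scoring formula are not detailed here):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Recommendation sketch for Microservice 3: given interaction scores
// fetched from the database, return the highest-scored friends first.
public class FriendRecommender {
    // Return the top-k friend ids ordered by descending interaction score.
    public static List<String> topK(Map<String, Double> scores, int k) {
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```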
🌟Phase2🌟: deploy all three microservices on self-managed Kubernetes. Ends with a live test at a budget of $0.70/hour.
The database schema is split across three tables in Phase 1. However, when the web service handles a large volume of requests, this layout consumes disk I/O burst credits quickly and performance drops accordingly, so we composed a unified, optimized schema. In addition, to increase throughput and lower latency, a network load balancer (NLB) replaces the default elastic load balancer (ELB).
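On Kubernetes with the AWS cloud provider, the ELB-to-NLB switch is typically a one-line Service annotation; the manifest below is a hypothetical fragment (service name, selector, and ports are placeholders, not this project's actual values):

```yaml
# Hypothetical Service fragment: the annotation asks AWS to provision an
# NLB instead of the default Classic ELB for this LoadBalancer Service.
apiVersion: v1
kind: Service
metadata:
  name: microservice-web   # placeholder name
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    app: microservice-web
  ports:
    - port: 80
      targetPort: 8080
```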
🌟Phase3🌟: migrate the self-managed cluster to fully managed AWS services. Ends with a live test at a budget of $1.28/hour.
In Phase 2 we deployed the microservice architecture as a self-managed Kubernetes cluster: both the web-tier service and the storage-tier database ran on EC2 instances, and we had to decide how to place them so as to fully utilize compute power and network transfer efficiency. In Phase 3, we migrated to a fully managed architecture on AWS, using Elastic Kubernetes Service (EKS) and Relational Database Service (RDS), which eases the management burden and improves fault tolerance.
Programming Language: Java
Framework: Vert.x
ETL: Spark
Database: MySQL (InnoDB, MyISAM), Aurora
Orchestration: Docker, Kubernetes, kOps, Helm
Cloud: AWS (EC2, RDS, ECS, EKS, Managed Node Group), Azure (HDInsight), Load Balancer (ALB, NLB)
/etl-spark # the ETL Spark script
/k8s # the self-managed kOps and Helm scripts used in phase2
/phase3-deployment # EKS cluster configuration, RDS and ALB Terraform scripts, and the microservice Helm chart for phase3
/pic # auxiliary pictures
/script # additional scripts to set up the environment and the test suite
/web # the code for the three microservices
/phase1-checkpoint-report
/phase1-final-report
/phase2-final-report
/phase3-final-report
The diagram above shows our final microservice architecture. The whole process of handling a user request is as follows:
- We set up the EKS cluster with managed node group, backend RDS loaded with pre-ETLed data.
- The user sends an HTTP request to our service endpoint.
- The request arrives at the network load balancer (NLB).
- The NLB dispatches the request to one of the service pods in the cluster.
- The selected pod handles the request by performing computations or querying the backend RDS.
- The pod sends back the response to the client if the request is valid.
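The pod-side steps above can be sketched as a single dispatch function; the route names below are illustrative only (the real services run on Vert.x and expose their own API paths):

```java
// Hypothetical dispatch a service pod performs once the NLB forwards a
// request: route to the matching microservice, or reject invalid paths.
public class PodHandler {
    public static String handle(String path) {
        switch (path) {
            case "/qrcode":     return "200 qr-encoded-or-decoded";
            case "/blockchain": return "200 chain-extended";
            case "/twitter":    return "200 recommended-friends";
            default:            return "400 invalid-request";
        }
    }
}
```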
The project includes two live tests, one in Phase 2 and one in Phase 3, with a total of 70 teams of 3 CMU students each participating.
In Phase 2, we ranked 2nd.
In Phase 3, we ranked exceptionally well.