MSCI436 Term Project
Updated Jan 5, 2023 - Java
Complete data platform that performs sentiment analysis on tweets. Built using Cassandra, Kafka, Spark, Node, and React.
TweetPipe Kinesis Producer Java application
Workflow code for the CLIN project
This repository covers the design, implementation, and analysis of a near-real-time data warehouse prototype for an electronics business chain. It uses a multi-threaded Extract, Transform, Load (ETL) pipeline built on the HYBRIDJOIN algorithm, implemented in Java and MySQL, over customer sales data.
A demo data pipeline using Flink for batch processing.
A pipelined application providing reusable ETL (Extract, Transform, Load) functions for processing large volumes of records.
Spring Boot 2 service that forms the analysis tier of the TweetPipe streaming data pipeline. This application consumes tweets from a Kafka topic, analyses them, and will persist the result to a target database.
A data warehouse project for finance data.
ETL Console is the Canary Islands Government's corporate solution for registering and executing ETL files developed using Pentaho Data Integration technology
Read the step-by-step tutorial here: https://frazynondo.medium.com/etl-with-gcp-part-i-apache-beam-eclipse-gcs-and-bigquery-dc9529ee7f19
Created a dimensional star model, conducted data profiling and preparation, carried out ETL processes for staging and data integration, and developed BI reporting to analyze trends and generate valuable insights.
Spring Boot 2 service that forms the collection tier of the TweetPipe streaming data pipeline. This application consumes tweets matching specific keywords from Twitter and publishes them to a Kafka topic.
TweetPipe Apache Flink AWS Kinesis Consumer. A Flink-based consumer that reads from an AWS Kinesis source and maps the input stream elements to a domain model. Future iterations will output the transformed data to a sink.
Sample code for Apache Beam to perform ETL from a stream-processing service (Pub/Sub) to BigQuery using Dataflow as the runner
Project with ETL pipeline logic, made to demonstrate the power of the Spring Framework.
Extract, transform, and load of the Iowa liquor sales dataset, comprising 24M rows.