Streaming data pipeline to continuously load data from an Amazon MSK or MSK Serverless cluster to Amazon S3 using Amazon Kinesis Data Firehose.
Updated Feb 12, 2024 (Python)
This is a collection of AWS CDK projects that show how to ingest streaming data directly from Amazon Managed Streaming for Apache Kafka (MSK) and MSK Serverless into Apache Iceberg tables in Amazon S3 with AWS Glue Streaming.
Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming using Amazon MSK Serverless and MSK Connect (Debezium)
Demo event analytics platform based on Apache Kafka (Confluent).
A cloud-based Reddit stock sentiment analyzer that computes the overall sentiment for each stock from a configurable selection of stock subreddits. The architecture uses AWS MSK (Kafka), AWS EMR (PySpark), and AWS Lambda (Python 3) for scalability, and the OpenAI API for sentiment analysis through prompt engineering.
My AWS Playground
Pinterest's experiment analytics data pipeline which runs thousands of experiments per day and crunches billions of datapoints to provide valuable insights to improve the product.
Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming and MSK Connect (Debezium)
Pinterest data pipeline: designing an end-to-end pipeline utilising AWS cloud technologies and Databricks for analysing real-time and historical Pinterest-emulated data.
🌳 A sustainable Terraform Package which creates resources for Messaging Services (EventBridge, MSK, SNS, SQS) on AWS
Terraform module to create Kafka resources on AWS. AWS MSK (Managed Streaming for Apache Kafka) is a fully managed service that simplifies the deployment, management, and operation of Apache Kafka clusters. Apache Kafka is an open-source distributed streaming platform used for building real-time streaming data pipelines.
Spring Boot demo app. Polls an external service and produces messages to Kafka topics on a Kafka cluster (AWS MSK). The same demo app later consumes these messages and returns them when the REST API endpoint is called.
Oracle audit files to Apache Kafka/Amazon MSK/Amazon Kinesis transfer
Step-by-step guidance on how to set up MirrorMaker 2 on AWS to replicate data between two AWS MSK clusters
Terraform module for deploying an AWS MSK cluster
Oracle Database Automatic Diagnostic Repository messages (alert.log & listener.log) to Apache Kafka or Amazon Kinesis transfer