Course covers big data fundamentals, processes, technologies, platform ecosystem, and management for practical application development.
Updated Apr 7, 2024 - Jupyter Notebook
This code creates a Kinesis Firehose in AWS to send CloudWatch log data to S3.
This repository contains an Apache Flink application for real-time sales analytics. It uses Docker Compose to orchestrate the necessary infrastructure components, including Apache Flink, Elasticsearch, and Postgres.
Eskimo is a state-of-the-art Big Data Infrastructure and Management Web Console to build, manage and operate Big Data 2.0 Analytics clusters on Kubernetes. This is the git repository of Eskimo Community Edition.
A lightweight helper utility that lets developers do interactive pipeline development with a single code base for both DLT runs and non-DLT interactive notebook runs.
Flink SQL in practice: a Chinese-language blog column
A pipeline that consumes Twitter data to extract meaningful insights about a variety of topics using the following technologies: Twitter API, Kafka, MongoDB, and Tableau.
SUTD 2021 50.043 Database and Big Data Systems Code Dump
🛠️ Python library to import OCR data in various formats into the canonical JSON format defined by the Impresso project.
Study of French hospital production. (2021)
Data modeling with Cassandra, building a data warehouse with Redshift, and creating a data lake with Spark and Airflow.
A list of awesome big data testing frameworks, resources and other awesomeness.
Reservoir sampling for group-by queries on the Flink platform. Effectively answers single-aggregate queries.
Introduction to Spark Batch processing.
This Git repo showcases my analysis of the Sparkify dataset using PySpark on Apache Spark in cluster mode, with JupyterLab on Docker. The goal was to identify at-risk customers and develop retention strategies. The analysis tested multiple machine learning models and uncovered insights into customer behavior and churn patterns.
Yet Another SPark Framework
Characterizing an index of the uneven global distribution of telecom resources
Crack detection model using YOLOv7
A movie recommender written in Go that suggests movies considering various factors within a particular dataset, encompassing users, movies, and movie ratings.
A cloud model that analyzes a real-world dataset with BigQuery and other big-data analysis tools. I implemented a Docker image so the app can run across platforms.