big-data-processing

Here are 74 public repositories matching this topic...

drshahizan / BDM

Course covers big data fundamentals, processes, technologies, platform ecosystem, and management for practical application development.

big-data big-data-analytics big-data-processing big-data-architecture

Updated Apr 7, 2024
Jupyter Notebook

felipefrizzo / terraform-aws-kinesis-firehose

This code creates a Kinesis Firehose in AWS to send CloudWatch log data to S3.

big-data analytics terraform kinesis-firehose cloudwatch-logs parquet terraform-provider etl-job terraform-aws big-data-processing

Updated Aug 4, 2021
HCL

airscholar / FlinkCommerce

This repository contains an Apache Flink application for real-time sales analytics. It uses Docker Compose to orchestrate the required infrastructure components, including Apache Flink, Elasticsearch, and Postgres.

python big-data apache-flink big-data-processing realtime-streaming

Updated Dec 4, 2023
Java

eskimo-sh / eskimo

Eskimo is a state-of-the-art Big Data Infrastructure and Management Web Console to build, manage, and operate Big Data 2.0 Analytics clusters on Kubernetes. This is the git repository of Eskimo Community Edition.

Updated Sep 14, 2023
Java

souvik-databricks / dlt-with-debug

A lightweight helper utility that lets developers build pipelines interactively by keeping a single source for both DLT runs and non-DLT interactive notebook runs.

big-data spark etl python3 databricks dlt etl-pipeline big-data-processing delta-live-tables

Updated Dec 7, 2022
Python

StarPlatinumStudio / Flink-SQL-Practice

Flink SQL in Practice: a Chinese-language blog series

sql stream-processing apache-flink big-data-processing

Updated Jun 17, 2022
Java

chandnii7 / Big-Data-Processing-Pipeline

A pipeline that consumes Twitter data to extract insights about a variety of topics, using the Twitter API, Kafka, MongoDB, and Tableau.

kafka big-data mongodb twitter-api data-visualization zookeeper data-analytics kafka-consumer kafka-producer tableau nosql-database kafka-streaming big-data-processing data-processing-pipelines

Updated Aug 2, 2021
Python

jamestiotio / dbsys

SUTD 2021 50.043 Database and Big Data Systems Code Dump

Updated May 17, 2022
Java

impresso / impresso-text-acquisition

🛠️ Python library to import OCR data in various formats into the canonical JSON format defined by the Impresso project.

big-data-processing historical-newspapers impresso-project

Updated Oct 1, 2024
Jupyter Notebook

VincianeDesbois / Hopitaux_Production

Study of French hospital production. (2021)

python econometrics big-data-processing

Updated Sep 19, 2023
Jupyter Notebook

bdnf / BigData-Engineering-Projects

Data modeling with Cassandra, building a data warehouse with Redshift, and creating a data lake with Spark and Airflow.

airflow spark cassandra data-warehouse data-lake redshift big-data-analytics big-data-processing

Updated Feb 28, 2020
Jupyter Notebook

isandratskiy / awesome-bigdata-testing

A list of awesome big data testing frameworks, resources and other awesomeness.

testing data automation awesome database big-data awesome-list automation-testing big-data-processing big-data-testing big-data-automation

Updated Feb 9, 2022

vvittis / FlinkSampling

Reservoir sampling for group-by queries on the Flink platform, aimed at answering single-aggregate queries effectively.

java topic stratum apache-flink sampling reservoir-sampling streaming-data big-data-analytics group-by big-data-processing streaming-tuples

Updated Aug 12, 2023
Java

mtumilowicz / big-data-scala-spark-batch-workshop

Introduction to Spark Batch processing.

big-data workshop spark workshop-materials batch-processing spark-sql big-data-processing

Updated May 27, 2024
Scala

alessiococchieri / BDA-project-sparkify

This Git repo showcases my analysis of the Sparkify dataset with PySpark on Apache Spark in cluster mode and JupyterLab on Docker. The goal was to identify at-risk customers and develop retention strategies. The analysis tested multiple machine learning models and uncovered insights into customer behavior and churn patterns.

machine-learning big-data spark apache-spark pyspark churn-prediction big-data-analytics big-data-processing churn-analysis sparkify

Updated Feb 15, 2023
Jupyter Notebook

giucris / yasp

Yet Another SPark Framework

framework scala big-data spark etl sparksql elt etl-framework etl-pipeline big-data-processing

Updated Feb 5, 2023
Scala

chuanting / imbalance_index

Characterizing an imbalance index for the global distribution of telecommunications resources

6g big-data-processing digital-divide connect-the-unconnected

Updated Jun 2, 2021
HTML

zaid-24 / Crack-Detection-using-CNN

Crack detection model using YOLOv7

python cnn pytorch big-data-processing yolov7

Updated Jul 2, 2023
Jupyter Notebook

john-fotis / Movie-Recommender

A movie recommender written in Go that suggests movies considering various factors within a particular dataset, encompassing users, movies, and movie ratings.

go golang big-data web-application recommender-system cosine-similarity cli-application jaccard-similarity movie-recommendation-system pearson-correlation dice-coefficient corellation big-data-processing

Updated Apr 21, 2024
Go

khanovico / energy-data-analysis

A cloud-based model that analyzes a real-world dataset with BigQuery and other big-data analysis tools. I implemented a Docker image so the app can run across platforms.

python docker bigquery scikit-learn jupyter-notebook seaborn xgboost google-app-engine mlflow big-data-processing

Updated Aug 4, 2024
Jupyter Notebook
