A lightweight helper utility that lets developers build pipelines interactively by sharing a single source file between DLT runs and non-DLT interactive notebook runs.
Updated Dec 7, 2022 - Python
A pipeline that consumes Twitter data to extract meaningful insights about a variety of topics, using the following technologies: Twitter API, Kafka, MongoDB, and Tableau.
BigQuery data pipeline with dbt, Spark, Docker, Airflow, Terraform, GCP
Setting up a Spark cluster in a Docker environment for improved repeatability and reliability. This project includes a simple transformation on a dataset containing approximately 31 million rows.
Building Data Lake and ETL pipelines using Amazon EMR, S3, and Apache Spark
Solutions to tasks from the master's degree courses in the specialty "Algorithms and Systems for Big Data Processing".
Provides tools for parallel pipeline processing of large data structures.
Software based on artificial intelligence methods for automating big data analysis.
Rock-solid pillars for enterprise-grade solutions.
Implementation of algorithms for big data using Python, NumPy, and pandas.
A project using Python, Hive, and MapReduce to compare techniques for finding the top K words in a very large file, as a case study in processing big data.
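The top-K words task described above can be illustrated with a minimal single-machine sketch. This is not the repository's Hive or MapReduce approach; it is a plain-Python stand-in that streams over lines, counts words, and selects the K most frequent. The function name `top_k_words` and the whitespace-based tokenization are assumptions made for illustration.

```python
import heapq
from collections import Counter
from typing import Iterable, List, Tuple


def top_k_words(lines: Iterable[str], k: int) -> List[Tuple[str, int]]:
    """Return the k most frequent words (hypothetical helper, not from the project)."""
    counts: Counter = Counter()
    # Stream line by line so the whole file never has to be held in memory;
    # only the word -> count table grows.
    for line in lines:
        counts.update(line.lower().split())
    # Select k items by highest count; ties are broken alphabetically
    # so the result is deterministic.
    return heapq.nsmallest(k, counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Because the function accepts any iterable of strings, an open file handle can be passed directly, for example `top_k_words(open(path, encoding="utf-8"), 10)`, where `path` is a placeholder for a real file.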
Batch or individual format conversion between Excel, Markdown, CSV, and SQL data sources.
A big data processing and machine learning platform that works just like using SQL.
A simple CSV parser for large volumes of data that uses the pandas library for Python to extract specific columns from a CSV file and quickly write them to one or more files (one file per column, or all columns in a single output file).
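The column-extraction idea described above might look roughly like the following sketch. It assumes pandas is installed and uses `pandas.read_csv` with `usecols` and `chunksize` so that only the requested columns are parsed and the file is processed in pieces. The function name `extract_columns`, the one-file-per-column layout, and the default chunk size are illustrative assumptions, not the project's actual interface.

```python
from pathlib import Path
from typing import List

import pandas as pd


def extract_columns(src: str, columns: List[str], out_dir: str,
                    chunksize: int = 100_000) -> List[Path]:
    """Write each requested column to its own CSV file (hypothetical helper)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = [out / f"{col}.csv" for col in columns]
    first = True
    # usecols limits parsing to the requested columns; chunksize makes
    # read_csv yield DataFrames of at most `chunksize` rows at a time.
    for chunk in pd.read_csv(src, usecols=columns, chunksize=chunksize):
        for col, path in zip(columns, paths):
            # The first chunk creates the file with a header row;
            # later chunks append rows without repeating the header.
            chunk[[col]].to_csv(path, mode="w" if first else "a",
                                header=first, index=False)
        first = False
    return paths
```

Processing in chunks keeps memory use bounded by the chunk size rather than the file size, which is the main design consideration for very large inputs.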
Sentiment-Analysis-API
Exploring and Implementing Scalable Data Processing Techniques