A lightweight helper utility that lets developers build pipelines interactively by keeping a single source code base for both DLT runs and non-DLT interactive notebook runs.
Updated Dec 7, 2022 - Python
A big data processing and machine learning platform that works just like using SQL.
Implementation of algorithms for big data using Python, NumPy, and pandas.
A pipeline that consumes Twitter data to extract meaningful insights about a variety of topics using the following technologies: Twitter API, Kafka, MongoDB, and Tableau.
rock-solid pillars for enterprise-grade solutions
Batch or individual format conversion between Excel, Markdown, CSV, and SQL data sources.
Sentiment-Analysis-API
A simple CSV parser for huge volumes of data. It uses the Pandas library for Python to extract specific columns from a CSV file and quickly write them to one or more files (each column in a separate file, or all of them in the same output).
BigQuery data pipeline with dbt, Spark, Docker, Airflow, Terraform, GCP
Setting up a Spark cluster in a Docker environment for improved repeatability and reliability. This project includes a simple transformation on a dataset containing approximately 31 million rows.
Solved tasks of the master's degree courses of speciality "Algorithms and Systems for Big Data Processing".
Provides tools for parallel pipeline processing of large data structures.
Software based on artificial intelligence methods for automating big data analysis.
Building Data Lake and ETL pipelines using Amazon EMR, S3, and Apache Spark
A project using Python, Hive, and MapReduce to compare techniques for finding the top K words in a very large file, i.e. different ways of processing big data.
Exploring and Implementing Scalable Data Processing Techniques