Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.
Updated Aug 6, 2024 - Scala
Lambda Architecture Framework for Big Data, Spark, Versioned Data, NoSQL and SQL-Stores.
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
Analysis of socio-economic and climatic data in the USA (1975-2020) using Apache Spark and ELK Stack 🌍
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Byzer (formerly MLSQL): A low-code open-source programming language for data pipelines, analytics and AI.
This project features Scala code powered by Apache Spark for analyzing minimum wage data, aiming to uncover trends and variations in minimum wage rates across states and over time. It encompasses data transformations, mean wage computation, inflation analysis, and comparisons with Department of Labor (DOL) reported wages.
Average Temperature - Hadoop - Mapper - Reducer
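A comprehensive walkthrough of the basic algorithms in Spark MLlib, the machine learning library of the Spark big data framework, with a complete set of test files included.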
This project demonstrates how to use Spark SQL to execute SQL queries on structured data in Spark, and display the results in a tabular format using the show() method.
Write ETL using your favorite SQL dialects
A free, open-source, web-based self-service BI tool tailor-made for ClickHouse, Google BigQuery, MySQL, PostgreSQL, and Vertica.