MapReduce, Spark, Java, and Scala for Data Algorithms Book
Updated Oct 14, 2024 (Java)
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
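hadoop-cos (the CosN file system) provides integration support for big data computing frameworks such as Apache Hadoop, Spark, and Tez, so they can read and write data stored on Tencent Cloud COS just as they would access HDFS. It can also serve as Deep Storage for query and analytics engines such as Druid.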
Export Hadoop YARN (ResourceManager) metrics in Prometheus format
Containerized Apache Hive Metastore for horizontally scalable Hive Metastore deployments
A fast, scalable and distributed community detection algorithm based on CEIL scoring function.
A Spark application to merge small files on Hadoop
An implementation of Apache Spark (combined with PySpark and Jupyter Notebook) on top of a Hadoop cluster using Docker
Implement a Hive data warehouse to store meaningful data, and apply machine learning techniques such as clustering or regression to business problems
Some simple, kinda introductory projects based on Apache Hadoop to be used as guides in order to make the MapReduce model look less weird or boring.
Simplified Hadoop Setup and Configuration Automation
An email spam filter using Apache Spark’s ML library
Apache Hadoop. Apache Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Originally designed for co…
This project implemented a lambda architecture for analyzing domestic flight data in the US from 2009 to 2020. It used Apache Spark for batch processing, Spark Streaming for real-time analysis, and SVM models to predict flight cancellations and delays, with Docker for cluster management and Grafana for real-time visualization.
A Bash script to set up Apache Hadoop and Apache Hive with a Derby database on Debian GNU/Linux
The source code developed and used for my thesis of the same title, under the guidance of my supervisor, Professor Vasilis Mamalis, for the Department of Informatics and Computer Engineering of the University of West Attica.
Samples related to data engineering, e.g. spark, embulk, airflow, etc.
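Several of the projects above build on the MapReduce programming model mentioned in the Apache Hadoop description. As a rough illustration only, the model can be sketched in plain Python with the classic word count. This is not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and everything runs in a single process rather than across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    In a real framework this grouping happens between the map and
    reduce stages; here it is an in-memory dictionary (an assumption
    made to keep the sketch self-contained).
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data", "big clusters process big data"])
print(counts)
```

In an actual Hadoop or Spark job the map and reduce functions would run in parallel on different nodes, and the shuffle would move data over the network; the sketch only shows the data flow between the three stages.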