apache-hadoop

Apache Hadoop. Apache Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Originally designed for co…
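The MapReduce programming model mentioned above can be illustrated with the classic word-count job. The following is a minimal, in-process Python sketch and does not use Hadoop's actual API. The function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical. They only mimic the map, shuffle/sort, and reduce stages that Hadoop distributes across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper (hypothetical name): emit a (word, 1) pair for every word."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort (hypothetical name): group emitted values by key and
    order the keys, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer (hypothetical name): sum each key's list of values."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three stages into a single job."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    docs = ["big data big compute", "data on many computers"]
    print(word_count(docs))
    # → {'big': 2, 'compute': 1, 'computers': 1, 'data': 2, 'many': 1, 'on': 1}
```

In a real Hadoop job, the mapper and reducer run as separate tasks on different nodes, and the framework performs the shuffle over the network. The sketch only shows how the data flows between the stages.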

library libraries java-library apache-hadoop apache-hadoop-framework bayudwiyansatria apache-hadoop-library

Updated Oct 7, 2021
Java

Lucass97 / FlightAnalysis

This project implemented a lambda architecture for analyzing domestic flight data in the US from 2009 to 2020. It used Apache Spark for batch processing, Spark Streaming for real-time analysis, and SVM models to predict flight cancellations and delays, with Docker for cluster management and Grafana for real-time visualization.

Updated Jul 28, 2023
Jupyter Notebook

tspannhw / links

Links

scala apache-spark sbt apache-hadoop

Updated Mar 12, 2018
Scala

aquib-sh / setup-hadoop

A Bash script to set up Apache Hadoop and Apache Hive with a Derby database on Debian GNU/Linux

linux bash hive hadoop debian ubuntu shell-script hadoop-cluster bash-script derby setup-script hadoop-hdfs apache-hadoop

Updated Dec 7, 2022
Shell

Coursal / Text-Sentiment-Analysis-In-Hadoop-And-Spark

The source code developed and used for my thesis of the same title, supervised by Professor Vasilis Mamalis, for the Department of Informatics and Computer Engineering of the University of West Attica.

text-mining spark apache-spark hadoop sentiment-analysis text-classification naive-bayes naive-bayes-classifier mapreduce opinion-mining support-vector-machines hadoop-mapreduce naive-bayes-classification sentiment-classification spark-mllib apache-hadoop support-vector-machine-svm

Updated Mar 13, 2021
Java

esakik / data-engineering-essentials

Samples related to data engineering, e.g. Spark, Embulk, and Airflow.

apache-spark protocol-buffers amazon-emr data-engineering digdag fluentd apache-beam embulk apache-avro mrjob apache-airflow cloud-dataflow apache-hadoop cloud-dataproc

Updated Dec 8, 2022
Python

Here are 80 public repositories matching this topic...

mahmoudparsian / data-algorithms-book

mahmoudparsian / big-data-mapreduce-course

s911415 / apache-hadoop-3.1.0-winutils

tencentyun / hadoop-cos

PBWebMedia / yarn-prometheus-exporter

realtimedatalake / hive-metastore-docker

RBC-DSAI-IITM / DCEIL

Guru107 / hadoop-small-files-merger

nghoanglongde / spark-cluster-with-docker

Narius2030 / Sakila-Business-Analysis

Coursal / Hadoop-Examples

bdoepf / aws-emr-prometheus

whoami-anoint / EasyHadoop

felidsche / mail-spam-filter

bayudwiyansatria / library-java-apache-hadoop
