Distributed processing of Wikipedia history files using Hadoop and Spark
Updated Jan 6, 2019 - Scala
A collection of useful Scala scripts for working with Hadoop
In this project, we build a bicycle-sharing demand prediction service using Apache Spark and Scala. Two Spark applications are provided: one for model generation and another for demand prediction.
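The two-application split can be sketched as follows. This is a hedged, pure-Scala illustration (no Spark dependency): a hypothetical `ModelGeneration` app fits a simple linear model of demand against temperature, and a `DemandPrediction` app applies the saved coefficients; the real project would use Spark's MLlib and its own feature set.

```scala
// Illustrative sketch, not the project's actual code: app 1 fits a
// least-squares line y = a + b * x, app 2 scores new inputs with it.
object ModelGeneration {
  // Ordinary least squares for y = a + b * x over (x, y) samples.
  def fit(samples: Seq[(Double, Double)]): (Double, Double) = {
    val n = samples.size.toDouble
    val sx = samples.map(_._1).sum
    val sy = samples.map(_._2).sum
    val sxx = samples.map { case (x, _) => x * x }.sum
    val sxy = samples.map { case (x, y) => x * y }.sum
    val b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    val a = (sy - b * sx) / n
    (a, b)
  }
}

object DemandPrediction {
  // The second application would load (a, b) from storage and score new data.
  def predict(model: (Double, Double), temperature: Double): Double =
    model._1 + model._2 * temperature
}
```

Keeping training and scoring in separate applications lets the (cheap, frequent) prediction job run without pulling in the (expensive, occasional) training workload.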
An automated ETL data pipeline that extracts complex JSON data from a web API (GBFS Bixi data) and converts it to CSV for loading into an HDFS data warehouse. Hive then processes the data further via external and managed tables. The same procedure is also applied with AWS S3 and Athena.
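The JSON-to-CSV flattening step might look like the sketch below. This is an assumption-laden illustration: a parsed GBFS station record is modelled as a plain `Map` (a real pipeline would use a JSON library), and the column names are taken from the public GBFS `station_status` feed, not from this project.

```scala
// Sketch of the flatten-to-CSV step; field names are illustrative GBFS ones.
object JsonToCsv {
  val columns = Seq("station_id", "num_bikes_available", "num_docks_available")

  // Render one parsed record as a CSV line; missing fields become empty cells.
  def toCsvRow(record: Map[String, Any]): String =
    columns.map(c => record.get(c).map(_.toString).getOrElse("")).mkString(",")
}
```

A fixed column order like this is what lets the downstream Hive external table declare a stable schema over the CSV files.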
Batch ETL data pipeline built on HDP 3.0 that processes daily sales and business data to produce Power BI reports. The pipelines are automated with Airflow.
DFS-Lib is a Scala-flavoured API for the Hadoop Java filesystem API
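The idea of a Scala-flavoured wrapper over a Java filesystem API can be sketched as below. Since DFS-Lib's actual API and the Hadoop `org.apache.hadoop.fs.FileSystem` dependency aren't reproduced here, the sketch wraps `java.nio.file` instead: the pattern, exception-throwing Java calls turned into `Try`-returning Scala methods, is the same.

```scala
import java.nio.file.{Files, Paths}
import scala.util.Try

// Hypothetical wrapper object, illustrating the Scala-over-Java-API pattern.
object FsLib {
  // Read a whole file as a String, returning Try instead of throwing IOException.
  def readString(path: String): Try[String] =
    Try(new String(Files.readAllBytes(Paths.get(path)), "UTF-8"))

  // Existence check that never throws.
  def exists(path: String): Boolean = Files.exists(Paths.get(path))
}
```

Returning `Try` pushes error handling to the call site in an idiomatic, composable way (`map`, `recover`, pattern matching) rather than forcing try/catch blocks.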
A distributed computational problem-solving project that performs large-scale graph matching using cloud computing technologies. Users can import two directed graphs and analyze the differences between them.
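At its core, comparing two directed graphs reduces to set algebra over their edges. The sketch below shows that local core under an assumed edge representation of `(src, dst)` pairs; the project's distributed version would apply the same logic at scale, and its actual data model may differ.

```scala
// Hedged sketch: each directed graph is a set of (src, dst) edges.
object GraphDiff {
  type Edge = (String, String)

  // Returns (edges only in g1, edges only in g2).
  def diff(g1: Set[Edge], g2: Set[Edge]): (Set[Edge], Set[Edge]) =
    (g1.diff(g2), g2.diff(g1))
}
```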
Average Temperature - Hadoop - Mapper - Reducer
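The mapper/reducer pair for an average-temperature job can be simulated locally as below. This is an illustration with an assumed `station,temperature` input format, using plain Scala collections in place of Hadoop's MapReduce runtime, which would run the same map and reduce logic across input splits.

```scala
// Local simulation of the MapReduce job: map lines to (station, temp)
// pairs, group by station, and reduce each group to its average.
object AverageTemperature {
  // Mapper: parse one "station,temperature" line into a key/value pair.
  def map(line: String): (String, Double) = {
    val Array(station, temp) = line.split(",")
    (station, temp.toDouble)
  }

  // Reducer: average all temperatures grouped under one station key.
  def reduce(values: Iterable[Double]): Double = values.sum / values.size

  def run(lines: Seq[String]): Map[String, Double] =
    lines.map(map).groupBy(_._1).map { case (k, vs) => k -> reduce(vs.map(_._2)) }
}
```

Because averaging is not associative, a real Hadoop job would typically have the combiner emit (sum, count) pairs rather than partial averages.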