Distributed processing of Wikipedia history files using Hadoop and Spark
Updated Jan 6, 2019 - Scala
A collection of useful Scala scripts for working with Hadoop
In this project, we build a bicycle-sharing demand prediction service using Apache Spark and Scala. Two Spark applications are provided: one for model generation and another for demand prediction.
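The two-application split can be sketched as follows. This is a hedged, pure-Scala illustration (no Spark dependency): a hypothetical `ModelGeneration` app fits a simple linear model of demand against temperature, and a `DemandPrediction` app applies the saved coefficients; the real project would use Spark's MLlib and its own feature set.

```scala
// Illustrative sketch, not the project's actual code: app 1 fits a
// least-squares line y = a + b * x, app 2 scores new inputs with it.
object ModelGeneration {
  // Ordinary least squares for y = a + b * x over (x, y) samples.
  def fit(samples: Seq[(Double, Double)]): (Double, Double) = {
    val n = samples.size.toDouble
    val sx = samples.map(_._1).sum
    val sy = samples.map(_._2).sum
    val sxx = samples.map { case (x, _) => x * x }.sum
    val sxy = samples.map { case (x, y) => x * y }.sum
    val b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    val a = (sy - b * sx) / n
    (a, b)
  }
}

object DemandPrediction {
  // The second application would load (a, b) from storage and score new data.
  def predict(model: (Double, Double), temperature: Double): Double =
    model._1 + model._2 * temperature
}
```

Keeping training and scoring in separate applications lets the (cheap, frequent) prediction job run without pulling in the (expensive, occasional) training workload.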
An automated ETL data pipeline that extracts complex JSON data from a web API (GBFS Bixi data) and converts it to CSV for loading into an HDFS data warehouse. Hive then processes the data further via external and managed tables. The same procedure is also applied with AWS S3 and Athena.
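The JSON-to-CSV flattening step might look like the sketch below. This is an assumption-laden illustration: a parsed GBFS station record is modelled as a plain `Map` (a real pipeline would use a JSON library), and the column names are taken from the public GBFS `station_status` feed, not from this project.

```scala
// Sketch of the flatten-to-CSV step; field names are illustrative GBFS ones.
object JsonToCsv {
  val columns = Seq("station_id", "num_bikes_available", "num_docks_available")

  // Render one parsed record as a CSV line; missing fields become empty cells.
  def toCsvRow(record: Map[String, Any]): String =
    columns.map(c => record.get(c).map(_.toString).getOrElse("")).mkString(",")
}
```

A fixed column order like this is what lets the downstream Hive external table declare a stable schema over the CSV files.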
Batch ETL data pipeline built on HDP 3.0 that processes daily sales and business data to produce Power BI reports. The pipelines are automated with Airflow.
DFS-Lib is a Scala-flavoured API for the Hadoop Java filesystem API
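The idea of a Scala-flavoured wrapper over a Java filesystem API can be sketched as below. Since DFS-Lib's actual API and the Hadoop `org.apache.hadoop.fs.FileSystem` dependency aren't reproduced here, the sketch wraps `java.nio.file` instead: the pattern, exception-throwing Java calls turned into `Try`-returning Scala methods, is the same.

```scala
import java.nio.file.{Files, Paths}
import scala.util.Try

// Hypothetical wrapper object, illustrating the Scala-over-Java-API pattern.
object FsLib {
  // Read a whole file as a String, returning Try instead of throwing IOException.
  def readString(path: String): Try[String] =
    Try(new String(Files.readAllBytes(Paths.get(path)), "UTF-8"))

  // Existence check that never throws.
  def exists(path: String): Boolean = Files.exists(Paths.get(path))
}
```

Returning `Try` pushes error handling to the call site in an idiomatic, composable way (`map`, `recover`, pattern matching) rather than forcing try/catch blocks.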
A distributed computational problem-solving project that performs large-scale graph matching using cloud computing technologies. Users can import two directed graphs and analyze the differences between them.
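At its core, comparing two directed graphs reduces to set algebra over their edges. The sketch below shows that local core under an assumed edge representation of `(src, dst)` pairs; the project's distributed version would apply the same logic at scale, and its actual data model may differ.

```scala
// Hedged sketch: each directed graph is a set of (src, dst) edges.
object GraphDiff {
  type Edge = (String, String)

  // Returns (edges only in g1, edges only in g2).
  def diff(g1: Set[Edge], g2: Set[Edge]): (Set[Edge], Set[Edge]) =
    (g1.diff(g2), g2.diff(g1))
}
```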
Average Temperature - Hadoop - Mapper - Reducer
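The mapper/reducer pair for an average-temperature job can be simulated locally as below. This is an illustration with an assumed `station,temperature` input format, using plain Scala collections in place of Hadoop's MapReduce runtime, which would run the same map and reduce logic across input splits.

```scala
// Local simulation of the MapReduce job: map lines to (station, temp)
// pairs, group by station, and reduce each group to its average.
object AverageTemperature {
  // Mapper: parse one "station,temperature" line into a key/value pair.
  def map(line: String): (String, Double) = {
    val Array(station, temp) = line.split(",")
    (station, temp.toDouble)
  }

  // Reducer: average all temperatures grouped under one station key.
  def reduce(values: Iterable[Double]): Double = values.sum / values.size

  def run(lines: Seq[String]): Map[String, Double] =
    lines.map(map).groupBy(_._1).map { case (k, vs) => k -> reduce(vs.map(_._2)) }
}
```

Because averaging is not associative, a real Hadoop job would typically have the combiner emit (sum, count) pairs rather than partial averages.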