The application converts any CSV file into Parquet or Avro format, and it reads from and writes to any file system, including HDFS and S3.
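A minimal sketch of such a conversion with the Spark DataFrame API, assuming Spark is on the classpath (and the spark-avro package for the Avro case). The object name, paths, and the `fmt` argument are illustrative, not taken from the project:

```scala
import org.apache.spark.sql.SparkSession

object CsvConverter {

  // Pure helper: derive the output path from the input path and format.
  def outputPath(input: String, fmt: String): String =
    input.stripSuffix(".csv") + "." + fmt

  def convert(spark: SparkSession, input: String, fmt: String): Unit = {
    val df = spark.read
      .option("header", "true")      // first line holds column names
      .option("inferSchema", "true") // sample the data to type the columns
      .csv(input)                    // accepts file://, hdfs://, and s3a:// URIs alike

    fmt match {
      case "parquet" => df.write.mode("overwrite").parquet(outputPath(input, fmt))
      case "avro"    => df.write.mode("overwrite").format("avro").save(outputPath(input, fmt))
      case other     => sys.error(s"unsupported format: $other")
    }
  }
}
```

Because Spark resolves the scheme of the URI, the same code path covers local files, HDFS, and S3 without changes.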
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
A project that manipulates unstructured CSV data with Hadoop's HDFS and Hive, and additionally performs queries using Spark SQL.
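A hedged sketch of the Spark SQL querying step, assuming Spark is available; the view name, file path, and column are illustrative placeholders, not the project's own:

```scala
import org.apache.spark.sql.SparkSession

object SparkSqlQuery {

  // Pure helper: build a top-N aggregation query over a registered view.
  def topNQuery(view: String, groupCol: String, n: Int): String =
    s"SELECT $groupCol, COUNT(*) AS cnt FROM $view GROUP BY $groupCol ORDER BY cnt DESC LIMIT $n"

  def run(spark: SparkSession, path: String): Unit = {
    val df = spark.read.option("header", "true").csv(path)
    df.createOrReplaceTempView("records")              // expose the DataFrame to SQL
    spark.sql(topNQuery("records", "category", 10)).show()
  }
}
```

Registering a temporary view is what lets plain SQL run against CSV data loaded from HDFS; the same view could equally be backed by a Hive table.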
Apache Spark programs to perform data analysis on MovieLens data.
An SVM classifier to determine whether two questions on Quora are duplicates or not.
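One classic feature for this kind of question-pair classifier is the Jaccard similarity of the two word sets. The sketch below is a self-contained, plain-Scala version of that feature-engineering step; the tokenizer is an assumption, and in the project the resulting scores would be one of the features fed to the SVM:

```scala
object DuplicateFeatures {

  // Lowercase and split on non-word characters to get the word set.
  def tokens(q: String): Set[String] =
    q.toLowerCase.split("\\W+").filter(_.nonEmpty).toSet

  // Jaccard similarity |A ∩ B| / |A ∪ B|, in [0, 1];
  // 1.0 means the two questions use identical word sets.
  def jaccard(q1: String, q2: String): Double = {
    val (a, b) = (tokens(q1), tokens(q2))
    if (a.isEmpty && b.isEmpty) 1.0
    else (a intersect b).size.toDouble / (a union b).size
  }
}
```

A high Jaccard score alone does not prove duplication (word order and negation matter), which is why it is combined with other features before training the classifier.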
Geo-spatial hotspot analysis of large-scale datasets using Apache Spark and Scala.
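The core of a grid-based hotspot analysis is bucketing points into fixed-size latitude/longitude cells and counting points per cell. The plain-Scala sketch below shows that step; the cell size and threshold are assumptions, and a real hotspot statistic (e.g. Getis-Ord Gi*) is more involved. On Spark the per-cell count would be a `map` to cell keys followed by `reduceByKey`:

```scala
object Hotspots {

  type Cell = (Int, Int)

  // Map a coordinate to its grid cell, e.g. 0.01-degree squares.
  def cellOf(lat: Double, lon: Double, step: Double = 0.01): Cell =
    ((lat / step).floor.toInt, (lon / step).floor.toInt)

  // Count points per cell (locally; on Spark this is map + reduceByKey).
  def counts(points: Seq[(Double, Double)], step: Double = 0.01): Map[Cell, Int] =
    points.groupBy { case (la, lo) => cellOf(la, lo, step) }
          .map { case (c, ps) => (c, ps.size) }

  // Cells whose point count reaches the threshold are flagged as hotspots.
  def hotspots(points: Seq[(Double, Double)], minCount: Int): Set[Cell] =
    counts(points).collect { case (c, n) if n >= minCount => c }.toSet
}
```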
A sentiment analyser for Twitter's tweet stream using Apache Spark: Spark Streaming, Spark SQL, and Stanford NLP (Natural Language Processing).
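The project scores tweets with Stanford NLP inside a Spark Streaming job; as a self-contained stand-in, here is a simple lexicon-based scorer of the kind often used as a baseline. The word lists are illustrative assumptions, and in the streaming job a function like `label` would be mapped over each micro-batch of tweets:

```scala
object Sentiment {

  private val positive = Set("good", "great", "love", "awesome", "happy")
  private val negative = Set("bad", "terrible", "hate", "awful", "sad")

  // Score = (#positive words) - (#negative words).
  def score(tweet: String): Int = {
    val words = tweet.toLowerCase.split("\\W+")
    words.count(positive) - words.count(negative)
  }

  def label(tweet: String): String =
    score(tweet) match {
      case s if s > 0 => "positive"
      case s if s < 0 => "negative"
      case _          => "neutral"
    }
}
```

A proper NLP pipeline handles negation and context ("not great") that a bag-of-words lexicon misses, which is why the project reaches for Stanford NLP instead.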
Live Twitter streaming from the twitter4j API and spam detection using Apache Spark for stream data processing, with Elasticsearch to index the data and Kibana for live interactive dashboards.
Apache Livy - Apache NiFi - Example Scala Spark Job
Created by Matei Zaharia
Released May 26, 2014