A simple Spark example that generates table stats for use in data quality checks
Fast scalable time series database
A simple example of using HBase, Solr, and Kudu for Entity 360 with NY taxi data
An example of how to do a merge sort
This tool is designed to look through your HDFS folders and either identify files with no data in them or delete them.
This project is a collection of Spark unit test examples that give new Spark users good examples of how to unit test their code for Spark Core, Spark SQL, and Spark Streaming
Examples for training
A tool to figure out when to grow or shrink a cluster
This is a demo/training application that shows how easy it is to do operations like ingestion, aggregation, and change data capture using tools like Kafka, Spark Streaming, Flume, Kudu, Solr, HBase, and HDFS.
Based on the design of SparkOnHBase, this repo will support Spark, Spark Streaming, and Spark SQL integration with Kudu.
HBase.MCC (HBase Multi Cluster Client). The goal is to support always-up solutions with HBase through multiple clusters
Just for fun; do not use in the real world. :)
This is an example of how to do window analysis with Spark
NRT Sessionization with Spark Streaming landing on HDFS and putting live stats in HBase
The ability to rebalance clusters that run HBase by selecting which folders to rebalance
Support for writing Seq Files with Spark Streaming, with functionality similar to the Flume HDFS Sink with Seq Files
Mirror of Apache Spark
Uses JRecord to build a mapred and mapreduce InputFormat for HDFS, MapReduce, Pig, Hive, Spark, ...
This is a FixedLengthInputFormat for Hadoop MapReduce.
This is an example of how to make unique sequences in a distributed way with Spark (no dups, no skips)
Just some examples of using GraphX
A simple example of using Giraph to root nodes in a tree
A simple program to put files from a directory into HDFS, with the added functionality of defining how that action will happen
This will do a merge join of absolutely sorted data across any number of files on either side.
This will contain implementations that copy records from a table with fewer regions than the final table.