Skip to content
master
Go to file
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
src
 
 
 
 
 
 
 
 
 
 

README.md

spark-streaming-wikiedits

A Spark Streaming custom receiver for Wikipedia edits.

Available in The Central Repository as com.wjoel:spark-streaming-wikiedits:0.1.3

Usage

Start a spark-shell

./bin/spark-shell --master local[4] \
  --packages "org.clojure:clojure:1.8.0,\
org.schwering:irclib:1.10,\
com.wjoel:clj-bean:0.2.0,\
com.wjoel:spark-streaming-wikiedits:0.1.3"

... and run the following (also available in the examples directory).

import org.apache.spark.streaming._
import org.apache.spark.sql.functions._
import com.wjoel.spark.streaming.wikiedits._

implicit val encoder = org.apache.spark.sql.Encoders.bean(classOf[WikipediaEditEvent])
val ssc = new org.apache.spark.streaming.StreamingContext(spark.sparkContext, Seconds(5))

ssc.receiverStream(new WikipediaEditReceiver()).
  window(Seconds(20)).
  filter { editEvent =>
    !editEvent.getTitle.contains(":")
  } foreachRDD { rdd =>
    spark.createDataset(rdd).
      groupBy($"title").
      agg(sum($"byteDiff") as "sumByteDiff").
      orderBy(abs($"sumByteDiff").desc).
      limit(10).
      show()
  }
ssc.start()

License

Copyright © 2017 Joel Wilsson

Distributed under the MIT License.

About

Spark Streaming receiver for Wikipedia edits

Topics

Resources

License

Releases

No releases published

Packages

No packages published
You can’t perform that action at this time.