Some analysis on Citibike's published data using Spark on Amazon EC2.
- Follow the instructions here to set up a Spark cluster on Amazon EC2.
- The `submit.sh` script simply `scp`s the built jar file to the cluster and runs it.
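Below is a minimal Python sketch of the single-jar flow that `submit.sh` automates. It is an illustration, not the script itself. The host, main class, master URL and remote directory are hypothetical placeholders, and the sketch assumes `spark-submit` is on the master's `PATH`. It only builds the `scp` and `ssh` command lines; you could then run each one with `subprocess.run(cmd, check=True)`.

```python
import posixpath
import shlex


def submit_commands(jar_path, host, main_class, master_url, remote_dir="/root"):
    """Return (scp_cmd, ssh_cmd) for copying one jar to the master and submitting it.

    All arguments are hypothetical placeholders; nothing here is read from submit.sh.
    """
    jar_name = posixpath.basename(jar_path)
    remote_jar = posixpath.join(remote_dir, jar_name)
    # Step 1: copy the built jar to the cluster's master node.
    scp_cmd = ["scp", jar_path, f"{host}:{remote_jar}"]
    # Step 2: run spark-submit on the master; ssh receives it as one quoted shell string.
    spark_cmd = ["spark-submit", "--class", main_class, "--master", master_url, remote_jar]
    ssh_cmd = ["ssh", host, shlex.join(spark_cmd)]
    return scp_cmd, ssh_cmd


# Example with made-up values:
# scp_cmd, ssh_cmd = submit_commands("build/libs/citibike.jar", "root@<master-host>",
#                                    "CitibikeAnalysis", "spark://<master-host>:7077")
```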

I quickly found the `submit.sh` script annoying to maintain, so it no longer works if you have multiple jars; use it if you like. I've set up `build.gradle` to copy the required libraries into `build/libs`, so you should be able to `scp` everything in that folder to the server and then pass the jars to `spark-submit` as an argument. I've just been doing that manually.
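For the multi-jar workflow, `spark-submit` accepts extra dependency jars as a comma-separated list through its `--jars` option. This Python sketch assumes every jar in `build/libs` has already been copied to a single remote directory, for example with `scp build/libs/*.jar <host>:<dir>/`. The application jar name, main class and master URL are hypothetical placeholders.

```python
import posixpath
from pathlib import Path


def list_jars(libs_dir="build/libs"):
    """Names of all jars in libs_dir (where build.gradle copies them), sorted."""
    return sorted(p.name for p in Path(libs_dir).glob("*.jar"))


def spark_submit_args(jar_names, app_jar_name, main_class, master_url, remote_dir):
    """Build a spark-submit argument list: the app jar last, every other jar via --jars.

    app_jar_name, main_class, master_url and remote_dir are hypothetical placeholders.
    """
    if app_jar_name not in jar_names:
        raise ValueError(f"{app_jar_name!r} not found in {jar_names}")
    deps = sorted(name for name in jar_names if name != app_jar_name)
    args = ["spark-submit", "--class", main_class, "--master", master_url]
    if deps:
        # --jars takes a single comma-separated list of jar paths.
        args += ["--jars", ",".join(posixpath.join(remote_dir, n) for n in deps)]
    args.append(posixpath.join(remote_dir, app_jar_name))
    return args
```

The application jar goes last because `spark-submit` treats anything after it as arguments to your main class.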