GitHub - vaquarkhan/credit-card-fraud-detection-spark: A real time streaming implementation of markov chain based fraud detection

An example implementation of a Markov chain based credit card fraud detection

More information on our blog: http://

To run the example, install the components used in the steps below: Spark, HBase, Kafka, Hadoop (HDFS), and sbt to build the project.

Building the examples:

$ sbt assembly

This creates [installation_dir]/marseille/target/scala-2.10/spark-kafka-fraud-detection-assembly-1.0.jar

Now start all the components.

Start spark in standalone mode

$ [spark_installation_dir]/sbin/start-all.sh

At this point you should be able to bring up the Spark monitoring page at http://localhost:8080. The page should show one worker node (the default setting).

Start hbase

$ [hbase_installation_dir]/bin/start-hbase.sh

This also starts the attached ZooKeeper instance.

Start Kafka. It reuses the ZooKeeper instance started with HBase, so do not start a second instance.

$ [kafka_installation_dir]/bin/kafka-server-start.sh config/server.properties

Start hdfs

$ [hadoop_installation_dir]/sbin/start-all.sh

Use jps to check that the Spark, HBase, Kafka and HDFS processes are running

$ jps

You should see lines like

10461 Kafka

10322 HRegionServer

14175 Jps

13669 Main

97904

10222 HMaster

13893 Master

13546 SecondaryNameNode

13452 DataNode

13377 NameNode
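The check above can also be done programmatically. The following is a minimal sketch, assuming the process names from the sample listing are the ones you expect; the helper names `running_daemons` and `missing_daemons` are made up for this illustration and are not part of the repository.

```python
def running_daemons(jps_output):
    """Return the set of process names found in `jps` output."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # jps prints "<pid> <name>"; some JVMs show up with a pid only.
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def missing_daemons(jps_output, expected):
    """Return the expected process names that are absent, sorted."""
    return sorted(set(expected) - running_daemons(jps_output))


# Names copied from the sample listing above (assumption: adjust for your setup).
EXPECTED = ["Kafka", "HRegionServer", "HMaster", "Master",
            "SecondaryNameNode", "DataNode", "NameNode"]

SAMPLE = """10461 Kafka
10322 HRegionServer
14175 Jps
13669 Main
97904
10222 HMaster
13893 Master
13546 SecondaryNameNode
13452 DataNode
13377 NameNode"""

# An empty list means every expected process was found.
print(missing_daemons(SAMPLE, EXPECTED))
```

To run it against a live machine, pass the text captured from the `jps` command instead of `SAMPLE`.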

Now start all the monitoring tools to keep an eye on each stage of execution

Create 3 Kafka topics

$ [kafka_installation_dir]/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic real-time-cc

$ [kafka_installation_dir]/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic user-trans-history

$ [kafka_installation_dir]/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic candidate-fraud-trans

Start 3 listeners for these 3 topics using Kafka's console consumer

$ [kafka_installation_dir]/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic real-time-cc

$ [kafka_installation_dir]/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic user-trans-history

$ [kafka_installation_dir]/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic candidate-fraud-trans

The first task to run is creating a training data set.

Run the TrainerApp to create the training set. The number of customers is set to 500 by default in the app_config.prop file.

$ /opt/spark-1.1.0-bin-hadoop2.4/bin/spark-submit --class com.jcalc.app.TrainerApp --master spark://yourhostname:7077 spark-kafka-fraud-detection-assembly-1.0.jar

This creates three HDFS files:

/data/streaming_analysis/markovmodel.txt - the model file, which contains the transition probability for all state pairs

/data/streaming_analysis/training_sequence - the raw transaction sequence

/data/streaming_analysis/training_transaction - the transaction sequences grouped per customer
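To illustrate what "transition probability for all state pairs" means, here is a minimal sketch of a first-order Markov estimate that counts adjacent state pairs. It is not the TrainerApp code, and the file format of markovmodel.txt is not shown in this README; the function name `transition_probabilities` is an assumption made for the example. The input sequences are two records from the HBase dump shown later in this README.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate P(next | current) by counting adjacent state pairs."""
    pair_counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            pair_counts[current][nxt] += 1
    model = {}
    for current, counts in pair_counts.items():
        total = sum(counts.values())
        # Normalise the counts of each row so they sum to 1.
        model[current] = {nxt: n / total for nxt, n in counts.items()}
    return model


# Two per-customer state sequences copied from the HBase dump in this README.
sequences = [
    "LNS,NNL,NNS,LHL,NNN,LHL,NNL,LNN,NNN,NNS,LNN".split(","),
    "NNN,LNN,NNL,LHS,LNN,NNL,NNN,LHN,LNN".split(","),
]

model = transition_probabilities(sequences)
for nxt, p in sorted(model["NNL"].items()):
    print("NNL ->", nxt, round(p, 3))
```

This plain counting estimate only has entries for pairs that were actually observed. Covering every state pair, as the model file is described to do, would need some form of smoothing or a default value for unseen pairs.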

Use spark-submit to start MarkovPredictor, which listens on the real-time-cc topic and publishes to the user-trans-history and candidate-fraud-trans topics.

$ /opt/spark-1.1.0-bin-hadoop2.4/bin/spark-submit --class com.jcalc.feed.MarkovPredictor --master spark://yourhostname:7077 spark-kafka-fraud-detection-assembly-1.0.jar
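This README does not describe how MarkovPredictor decides that a transaction is a fraud candidate. As a rough illustration of the general idea only, the sketch below scores a window of states by the average negative log-probability of its transitions and flags windows above a threshold. The toy model, the `floor` value for unseen transitions, the `THRESHOLD` and the function name `average_surprise` are all hypothetical and are not taken from the repository.

```python
import math


def average_surprise(states, model, floor=1e-6):
    """Mean negative log-probability of the transitions in `states`.

    Transitions missing from `model` get the small probability `floor`
    (a hypothetical choice) so that they score as very unlikely.
    """
    pairs = list(zip(states, states[1:]))
    if not pairs:
        return 0.0
    total = 0.0
    for current, nxt in pairs:
        p = model.get(current, {}).get(nxt, floor)
        total += -math.log(p)
    return total / len(pairs)


# Hypothetical toy model: P(next | current) for two states only.
model = {
    "NNN": {"NNN": 0.7, "NNL": 0.3},
    "NNL": {"NNN": 0.8, "NNL": 0.2},
}

THRESHOLD = 3.0  # hypothetical cut-off, would need tuning on real data

usual = ["NNN", "NNN", "NNL", "NNN"]
unusual = ["NNN", "HHN", "HHN", "NNN"]

for window in (usual, unusual):
    score = average_surprise(window, model)
    label = "candidate" if score > THRESHOLD else "ok"
    print(",".join(window), round(score, 2), label)
```

A window made only of transitions the model has seen often gets a low score, while a window with transitions the model has never seen gets a high one.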

Finally, feed simulated credit card transaction data into the real-time-cc topic for MarkovPredictor to consume. 50 is a typical number of customers to use.

$ java -cp spark-kafka-fraud-detection-assembly-1.0.jar com.jcalc.feed.CCStreamingFeed 50

When you feed this simulated data, you should see data being printed out on all three topics. In addition, you can check that the transaction history is recorded in the HBase 'cchistory' table. Use the HBase shell to see the records.

$ [hbase_installation_dir]/bin/hbase shell

hbase(main)> scan 'cchistory'

The HBase record dump looks like the following:

 UHRMHKBO1D                               column=transactions:records, timestamp=1418713010152, value=LNS,NNL,NNS,LHL,NNN,LHL,NNL,LNN,NNN,NNS,LNN                 
 UM2RPYPHRX                               column=transactions:records, timestamp=1418713010327, value=NNL,LHN,NNN,NNL,NNN,NNL,NHL,NNN,NNS,NNL,NNN,LHN,LNL         
 W19EW2550B                               column=transactions:records, timestamp=1418713010198, value=NNN,HHN,NNN,NHL,NNS,NNL,NHN,NNL,HNL,LNS,LNL                 
 X30RNTW3KG                               column=transactions:records, timestamp=1418713010215, value=NNN,LNN,NNL,LHS,LNN,NNL,NNN,LHN,LNN                         
 XM7YLRIDL8                               column=transactions:records, timestamp=1418713010173, value=LNN,LNL,NNL,NNL,LNL,NNL,HNN,NNN,NNN,NNN,LNL                 
 Y3LNSKDK02                               column=transactions:records, timestamp=1418713010207, value=NNS,NNN,NNN,NNN,HNN,LNN,HNL,NNL,NNS,NNL,NNL,LNN,LNN,LHL,NNN 
50 row(s) in 0.0520 seconds
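Each line of the dump holds a row key, the column `transactions:records`, a timestamp, and a comma-separated list of state tokens. A minimal sketch for turning one such line into a structured record is shown below; the pattern is written against the sample lines above only, and `parse_scan_line` is a name invented for this example.

```python
import re

# Matches lines shaped like the sample dump above:
# <row key>  column=<family:qualifier>, timestamp=<digits>, value=<tokens>
ROW_PATTERN = re.compile(
    r"^\s*(?P<row>\S+)\s+column=(?P<column>\S+),\s+"
    r"timestamp=(?P<timestamp>\d+),\s+value=(?P<value>\S*)\s*$"
)


def parse_scan_line(line):
    """Return a dict for a record line, or None for any other line."""
    m = ROW_PATTERN.match(line)
    if m is None:
        return None
    return {
        "row_key": m.group("row"),
        "column": m.group("column"),
        "timestamp": int(m.group("timestamp")),
        "states": m.group("value").split(","),
    }


line = (" X30RNTW3KG                               "
        "column=transactions:records, timestamp=1418713010215, "
        "value=NNN,LNN,NNL,LHS,LNN,NNL,NNN,LHN,LNN                         ")
record = parse_scan_line(line)
print(record["row_key"], len(record["states"]), record["states"][:3])
```

Lines that are not records, such as the trailing row count, return `None` and can be skipped.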

Repeat the simulated data feed as often as you like.
