Anomaly Detection in Bitcoin

Bitcoin is an emerging digital currency, the most distinctive characteristic is that all it's transactions are recorded in a publicaly avaiable blockchain.
Therefore the transaction data in Bitcoin serve as an alternative to financial data which might not always be accessible and poses privacy concerns.
This project shows how to detect abnormal transactions in the Bitcoin data.

Data

The data used in this project can be downloaded from this link.
http://compbio.cs.uic.edu/data/bitcoin/

The data was obtained from parsing the blockchain using a Python script.
The file we use contains Bitcoin transactions from January 2009 to April 2013, with over 15M transactions and 6.3M users.

Each transaction has:

A transaction key
User ID sending Bitcoin
User ID receiving Bitcoin
Date and time when the transaction occur
Number of Bitcoins transferred

txn_key	from_user	to_user	date	amount
1	2	2	20130410142250	24.375000
1	2	782477	20130410142250	0.770900
2	620423	4571210	20111227114312	614.174951
2	620423	3	20111227114312	128.040520
3	3	782479	20130410142250	47.140520

For demonstration purpose, we will use data from January 2009 to December 2010, with 218K transactions and 110K users.

Graph Theory

Transactions in a financial system forms a graph where each user is a node and each transaction is an edge.
As a result, concepts in graph theory will be very useful when analysing transactions and finding anomalies.

Degree - Number of nodes connecting this node
Clustering coefficient - Measures how neighboring nodes are connected to each other
Strongly connected components - Isolated communities

Isolation Forest

A robust algorithm for anomaly detection.

Does not require assumptions on the data, unlike Elliptic Envelope which requires data to be Guassian
Able to capture more complex structure than One-Class SVM

How it works:

Build a tree by randomly split data into different partitions until every data point is isolated.
The idea is data point located at leafs with a shorter path from the root are likely to be outliers.
Build many of those trees to obtain an average path length for each data point.
Identify the proportion with the shortest average path length as outliers.

(source: https://www.slideshare.net/mlvlc/l14-anomaly-detection)

(source: https://www.slideshare.net/turi-inc/anomaly-detection-using-isolation-forests)

An isolation forest of 100 trees was built from the transactions, and the most extreme 1% transactions were identified as anomalies.

Results

After running the isolation forest model, 2782 transactions were identified as abnormal.
The model determined transactions with an anomaly score (log average path length) -0.0948 or lower were anomalies.
The plot shows the distribution of anomaly scores of every transaction, all transactions with score lower than the threshold (left of the dotted red line) are classified as anomalies.

Let's visualize the transactions to verify that the anomalies found were different to normal transactions.
t-SNE is a dimensionality reduction technique to visualize high dimension data.
It tries to approximate the high dimension distribution with a low dimension distribution.
It is a machine learning algorithm itself!

The visualization shows 5000 of the transactions, with blue dots indicate normal transactions and red dots indicate abnormal transactions.
As shown in the plot, the anomalies identified lies in isolated regions and are away from structures formed by normal transactions.

Name		Name	Last commit message	Last commit date
Latest commit History 35 Commits
d3		d3
images		images
notebooks		notebooks
.gitignore		.gitignore
Bitcoin Anomaly Analysis.ipynb		Bitcoin Anomaly Analysis.ipynb
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

d3

d3

images

images

notebooks

notebooks

.gitignore

.gitignore

Bitcoin Anomaly Analysis.ipynb

Bitcoin Anomaly Analysis.ipynb

LICENSE

LICENSE

README.md

README.md

Repository files navigation

Anomaly Detection in Bitcoin

Data

Graph Theory

Isolation Forest

Results

About

Releases

Packages

Languages

License

wobiskai/anomaly-detection-in-Bitcoin

Folders and files

Latest commit

History

Repository files navigation

Anomaly Detection in Bitcoin

Data

Graph Theory

Isolation Forest

Results

About

Resources

License

Stars

Watchers

Forks

Languages