<a href="https://colab.research.google.com/github/wolego2uni/projects/blob/main/001140837_Fagbohun_COMP1831_Blockchain_transaction_anonymity.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Technologies for Anti-Money Laundering and Financial Crime**
### *COMP-1831-M01-2020-21*

## **SVM Classification**

In [None]:
!gdown --id 13nw-uRXPY8XIZQxKRNZ3yYlho-CYm_Qt

Downloading...
From: https://drive.google.com/uc?id=13nw-uRXPY8XIZQxKRNZ3yYlho-CYm_Qt
To: /content/bill_authentication.csv
100% 46.4k/46.4k [00:00<00:00, 18.0MB/s]


# **Import Libraries**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
bankdata = pd.read_csv("/content/bill_authentication.csv")

print("Dataset Shape: ", bankdata.shape)

bankdata.head()

Dataset Shape:  (1372, 5)


| | Variance | Skewness | Curtosis | Entropy | Class |
|---|---|---|---|---|---|
| 0 | 3.6216 | 8.6661 | -2.8073 | -0.44699 | 0 |
| 1 | 4.5459 | 8.1674 | -2.4586 | -1.4621 | 0 |
| 2 | 3.866 | -2.6383 | 1.9242 | 0.10645 | 0 |
| 3 | 3.4566 | 9.5228 | -4.0112 | -3.5944 | 0 |
| 4 | 0.32924 | -4.4552 | 4.5718 | -0.9888 | 0 |


In [None]:
X = bankdata.drop('Class', axis=1)
y = bankdata['Class']

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)

In [None]:
from sklearn.svm import SVC
svclassifier = SVC(kernel='linear')
svclassifier.fit(X_train, y_train)

SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

In [None]:
from sklearn.metrics import classification_report, confusion_matrix

y_pred = svclassifier.predict(X_test)

print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           0       0.99      0.99      0.99       150
           1       0.98      0.98      0.98       125

    accuracy                           0.99       275
   macro avg       0.99      0.99      0.99       275
weighted avg       0.99      0.99      0.99       275
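
The precision, recall and f1-score columns in the report above are all derived from confusion-matrix counts. The following is a minimal sketch of that arithmetic in plain Python; the helper name `binary_report` and the toy labels are made up for illustration and are not taken from the banknote dataset or from scikit-learn.

```python
from collections import Counter

def binary_report(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the `positive` class."""
    counts = Counter(zip(y_true, y_pred))
    # True positives, false positives and false negatives for the chosen class
    tp = sum(v for (t, p), v in counts.items() if t == positive and p == positive)
    fp = sum(v for (t, p), v in counts.items() if t != positive and p == positive)
    fn = sum(v for (t, p), v in counts.items() if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up labels, only to exercise the function
toy_true = [0, 0, 0, 1, 1, 1, 1, 0]
toy_pred = [0, 0, 1, 1, 1, 0, 1, 0]
precision, recall, f1 = binary_report(toy_true, toy_pred)
print(precision, recall, f1)
```

With these toy labels there are 3 true positives, 1 false positive and 1 false negative, so all three scores come out at 0.75.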



### **Blockchain transaction anonymity and intelligence.**






## Python Dependencies

In [None]:
!pip install -U git+https://github.com/apogiatzis/etherscan_py
!pip install neo4j

Collecting git+https://github.com/apogiatzis/etherscan_py
  Cloning https://github.com/apogiatzis/etherscan_py to /tmp/pip-req-build-jauy5vrt
  Running command git clone -q https://github.com/apogiatzis/etherscan_py /tmp/pip-req-build-jauy5vrt
Building wheels for collected packages: etherscan-py
  Building wheel for etherscan-py (setup.py) ... done
  Created wheel for etherscan-py: filename=etherscan_py-0.2.4-py2.py3-none-any.whl size=6578 sha256=183f04f0aa5d3903f851114f36db459d5251c9d816ecfafda539388e0631e703
  Stored in directory: /tmp/pip-ephem-wheel-cache-qpgbc1o8/wheels/35/56/cf/ac7e9d2250b53dec727905714a99f4d5c7b2f613139324aa42
Successfully built etherscan-py
Installing collected packages: etherscan-py
Successfully installed etherscan-py-0.2.4
Collecting neo4j
  Downloading https://files.pythonhosted.org/packages/36/f7/3c0b20ad7cdeac89d44e5380b0b4507995b1aec843692e3e76dd6cd1c638/neo4j-4.2.1.tar.gz (69kB)

## Neo4J Instance

For the purpose of visualizing some of the transactions in this lab, a Neo4j database will be used. Luckily, Neo4j provides a free sandbox instance that persists the data for 3 days before resetting. To follow along with the lab, go ahead and create an account [here](https://sandbox.neo4j.com/) to acquire your free instance.

**Use the credentials and URL provided for your instance in the code below to connect to it.**

In [None]:
from neo4j import GraphDatabase

# change these to your credentials and url
BOLT_URL = "bolt://18.233.0.156:7687"
USER = "neo4j"
PWD = "assignments-destination-guess"

class Neo4jConnection:
    
    def __init__(self, uri, user, pwd):
        self.__uri = uri
        self.__user = user
        self.__pwd = pwd
        self.__driver = None
        try:
            self.__driver = GraphDatabase.driver(self.__uri, auth=(self.__user, self.__pwd))
        except Exception as e:
            print("Failed to create the driver:", e)
        
    def close(self):
        if self.__driver is not None:
            self.__driver.close()
        
    def query(self, query, parameters=None, db=None):
        assert self.__driver is not None, "Driver not initialized!"
        session = None
        response = None
        try: 
            session = self.__driver.session(database=db) if db is not None else self.__driver.session() 
            response = list(session.run(query, parameters))
        except Exception as e:
            print("Query failed:", e)
        finally: 
            if session is not None:
                session.close()
        return response


conn = Neo4jConnection(uri=BOLT_URL, user=USER, pwd=PWD)

In [None]:
conn.query('CREATE CONSTRAINT transactions IF NOT EXISTS ON (t:Transaction) ASSERT t.txhash IS UNIQUE')
conn.query('CREATE CONSTRAINT addresses IF NOT EXISTS ON (a:Address) ASSERT a.public_key IS UNIQUE')

[]

### Cypher Queries:

Show all nodes and relationships:
```
MATCH (n) OPTIONAL MATCH (n)-[r]-() RETURN n,r
```

Delete all nodes and relationships:
```
MATCH (n) DETACH DELETE n
```

# Helper functions for Neo4J

In [None]:
import time

def insert_data(query, rows, batch_size = 10000):
    # Sends `rows` to the Neo4j database in batches of `batch_size`.
    # `rows` is expected to be a pandas DataFrame, since `.to_dict('records')`
    # is called on each slice.
    
    total = 0
    batch = 0
    start = time.time()
    result = None
    
    while batch * batch_size < len(rows):

        res = conn.query(query, 
                         parameters= {
                         'rows': rows[batch*batch_size:(batch+1)*batch_size].to_dict('records')})
        total += res[0]['total']
        batch += 1
        result = {"total":total, 
                  "batches":batch, 
                  "time":time.time()-start}
        print(result)

    # Return only after every batch has been sent
    return result
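
The batching idea behind `insert_data` can be tried without a database. The sketch below is an illustration only: `insert_in_batches` and `fake_query` are hypothetical names, plain lists stand in for the DataFrame, and the stand-in query simply reports how many rows it received.

```python
def insert_in_batches(run_query, rows, batch_size=10000):
    """Send `rows` to `run_query` in slices of at most `batch_size` items."""
    total = 0
    batches = 0
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        total += run_query(chunk)  # run_query returns the number of rows handled
        batches += 1
    return {"total": total, "batches": batches}

# Stand-in for the database call (assumption: the real query returns a row count)
def fake_query(chunk):
    return len(chunk)

toy_rows = [{"txhash": f"0x{i:02x}"} for i in range(25)]
summary = insert_in_batches(fake_query, toy_rows, batch_size=10)
print(summary)  # {'total': 25, 'batches': 3}
```

Twenty-five rows with a batch size of 10 produce three calls: two full batches and one partial batch of five.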

In [None]:
def add_transactions(address, transactions, direction="forward"):
    # Adds transactions nodes to the Neo4j graph.
    query = '''
      UNWIND $rows AS row
      MERGE (t:Transaction {txhash: row.txhash})
      ON CREATE SET
        t.gas_used = row.gas_used,
        t.value = row.value
      RETURN count(*) as total
    '''

    if direction == "forward":
      rel_query = """
        UNWIND $rows AS row
        MATCH
          (t:Transaction),
          (from_a:Address),
          (to_a:Address)
        WHERE t.txhash = row.txhash AND from_a.public_key = $address AND to_a.public_key = row.to_address
        CREATE (from_a)-[r_created:CREATED]->(t)
        CREATE (t)-[r_to:TO]->(to_a)
        RETURN  count(*) as total
      """
    elif direction == "backward":
      rel_query = """
        UNWIND $rows AS row
        MATCH
          (t:Transaction),
          (from_a:Address),
          (to_a:Address)
        WHERE t.txhash = row.txhash AND from_a.public_key = row.from_address AND to_a.public_key = $address
        CREATE (from_a)-[r_created:CREATED]->(t)
        CREATE (t)-[r_to:TO]->(to_a)
        RETURN  count(*) as total
      """
    else:
      raise ValueError("direction must be 'forward' or 'backward'")

    transactions_added =  conn.query(query, parameters = {'rows':transactions})
    address_added =  conn.query(rel_query, parameters = {'rows':transactions, "address": address})

    return transactions_added

def add_addresses(addresses):
    # Adds address nodes to the Neo4j graph.
    query = '''
        UNWIND $rows AS row
        MERGE (a:Address {public_key: row})
        RETURN count(*) as total
    '''
    return conn.query(query, parameters = {'rows':addresses})

def save_to_neo4j(data, direction="forward"):
  # Create addresses
  add_addresses(list(data.keys()))

  # Create transactions
  for addr, transactions in data.items():
    add_transactions(addr, transactions, direction=direction)

# Etherscan Setup



In [None]:
from etherscan_py import etherscan_py

# Replace this with your API key
ETHERSCAN_API_KEY="96BZQVDCQTUII3WX8Y3IMRV6BR12THD1IP"

goerli_client = etherscan_py.Client(ETHERSCAN_API_KEY,network="goerli")
mainnet_client = etherscan_py.Client(ETHERSCAN_API_KEY)

# Blockchain analysis functions



In [None]:
import time

addresses = {}

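def forward_address_intel(client, address, depth=1, checked_addresses=None):
  if depth == 0: return
  # Avoid sharing one mutable default set across separate top-level calls
  if checked_addresses is None: checked_addresses = set()

  # Avoid rate limiting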
  time.sleep(1.5)

  address = address.lower()
  print("Forward tracking from: ", address)

  # Get all transactions sent by that address
  transactions = client.get_all_transactions(from_address=address, status=2)

  outgoing_transactions = [t for t in transactions if t.from_address == address]

  addresses[address] = addresses.get(address, set())
  addresses[address] |= set(outgoing_transactions)
  checked_addresses.add(address)
  
  unchecked_addresses = set([t.to_address for t in outgoing_transactions]) - checked_addresses
    
  # Do the same thing on each distinct recipient address of the transactions
  for addr in unchecked_addresses:
    forward_address_intel(client=client, address=addr, depth=depth-1, checked_addresses=checked_addresses)

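def backtrack_address_intel(client, address, depth=1, checked_addresses=None):
  if depth == 0: return
  # Avoid sharing one mutable default set across separate top-level calls
  if checked_addresses is None: checked_addresses = set()

  # Avoid rate limiting
  time.sleep(1.5)

  address = address.lower()
  print("Backtracking from: ", address)

  # Get the transactions for that address; only the incoming ones are kept below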
  transactions = client.get_all_transactions(from_address=address, status=2)

  incoming_transactions = [t for t in transactions if t.to_address == address]
  # print(incoming_transactions)
  addresses[address] = addresses.get(address, set())
  addresses[address] |= set(incoming_transactions)
  checked_addresses.add(address)
  
  unchecked_addresses = set([t.from_address for t in incoming_transactions]) - checked_addresses
    
  # Do the same thing on each distinct sender address of the transactions
  for addr in unchecked_addresses:
    backtrack_address_intel(client=client, address=addr, depth=depth-1, checked_addresses=checked_addresses)
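
The recursion used by the two functions above can be exercised offline. The following sketch runs the same depth-limited backtracking over a small in-memory ledger instead of the Etherscan API; the `Tx` tuple, the `TOY_LEDGER` entries and the `backtrack` helper are all hypothetical and exist only to show how `depth` and the visited set interact.

```python
from collections import namedtuple

Tx = namedtuple("Tx", ["txhash", "from_address", "to_address"])

# Hypothetical ledger: funds move a -> b -> c -> d, plus one cycle d -> c
TOY_LEDGER = [
    Tx("0x01", "a", "b"),
    Tx("0x02", "b", "c"),
    Tx("0x03", "c", "d"),
    Tx("0x04", "d", "c"),
]

def backtrack(ledger, address, depth, visited=None, found=None):
    """Collect incoming transactions per address, going up to `depth` levels back."""
    visited = set() if visited is None else visited
    found = {} if found is None else found
    if depth == 0 or address in visited:
        return found
    visited.add(address)
    incoming = [t for t in ledger if t.to_address == address]
    found[address] = incoming
    # Recurse into every sender that has not been visited yet
    for sender in {t.from_address for t in incoming} - visited:
        backtrack(ledger, sender, depth - 1, visited, found)
    return found

trail = backtrack(TOY_LEDGER, "d", depth=3)
print(sorted(trail))  # ['b', 'c', 'd']
```

With `depth=3` three addresses are expanded (`d`, `c`, `b`). The address `a` is never expanded itself, but it still shows up as the sender of the transaction stored under `b`, and the cycle `d -> c` does not cause `d` to be revisited.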

# Normal transaction backtracking

Can you verify that the origin (3 levels deep) of the funds in this Goerli Testnet Ethereum address 0xA885fCA76Bd27198Dd8E498D85809DEA4d0cbf26 was 0x5d0ca2Bb3c0ba222128a21b7e66bC5ffF1D22d0A? Was there any anonymity technique used on those transactions? If yes, which one (Coin mixing/CoinJoin)? Show a network graph of the transaction trail if applicable.

In [None]:
addresses = {}
backtrack_address_intel(client=goerli_client, address="0xA885fCA76Bd27198Dd8E498D85809DEA4d0cbf26", depth=3, checked_addresses=set())

Backtracking from:  0xa885fca76bd27198dd8e498d85809dea4d0cbf26


IndexError: ignored

In [None]:
# Convert transactions to dicts (Just for exporting to Neo4J)
backtrack_data_dict = {addr: [t.__dict__ for t in trans] for addr, trans in addresses.items()}

save_to_neo4j(backtrack_data_dict, direction="backward")

# Normal transaction backtracking

Can you verify that the origin (3 levels deep) of the funds in this Goerli Testnet Ethereum address 0x4A69805B898E6f05cC3b01a8E37e51A81d46C754 was 0x8c673E60b2d30D59F3CE7598CF4134d0EF9e773d? Was there any anonymity technique used on those transactions? If yes, which one (Coin mixing/CoinJoin)? Show a network graph of the transaction trail if applicable.

In [None]:
addresses = {}
backtrack_address_intel(client=goerli_client, address="0x4A69805B898E6f05cC3b01a8E37e51A81d46C754", depth=3, checked_addresses=set())

Backtracking from:  0x4a69805b898e6f05cc3b01a8e37e51a81d46c754
Backtracking from:  0xc44b86e59bd8357de08f33d8c83251f31b57a85c
Backtracking from:  0x8c673e60b2d30d59f3ce7598cf4134d0ef9e773d
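
Rather than reading the log by eye, the collected trail can be checked for the suspected origin programmatically. This is a minimal sketch under stated assumptions: `origin_reached` is a hypothetical helper, and it expects the same shape as `backtrack_data_dict` (a dict mapping each visited address to a list of transaction dicts with a `from_address` key). The toy data below is made up.

```python
def origin_reached(collected, origin):
    """True if `origin` was visited or is the sender of any collected transaction."""
    origin = origin.lower()
    if origin in collected:
        return True
    return any(
        tx["from_address"].lower() == origin
        for txs in collected.values()
        for tx in txs
    )

# Made-up trail: 0xccc -> 0xbbb -> 0xaaa
toy_collected = {
    "0xaaa": [{"from_address": "0xbbb", "to_address": "0xaaa"}],
    "0xbbb": [{"from_address": "0xCCC", "to_address": "0xbbb"}],
}
print(origin_reached(toy_collected, "0xCCC"))  # True
print(origin_reached(toy_collected, "0xddd"))  # False
```

The comparison is lowercased on both sides because the tracking functions lowercase every address before storing it, while the addresses in the task statement are given in mixed case.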


In [None]:
# Convert transactions to dicts (Just for exporting to Neo4J)
backtrack_data_dict = {addr: [t.__dict__ for t in trans] for addr, trans in addresses.items()}

save_to_neo4j(backtrack_data_dict, direction="backward")

![](https://i.ibb.co/BB6SgkW/graph.png)