KleptoSyn

Synthetic data generation for investigative graphs based on patterns of bad-actor tradecraft.

Get Started

build a local environment

This project uses Poetry for dependency management, virtual environments, builds, and packaging.

The source code currently requires Python 3.11 or later.

To set up an environment locally:

git clone https://github.com/DerwenAI/kleptosyn.git
cd kleptosyn

poetry install --no-root --extras=demo

Default datasets are already loaded; see the notes in data/README.md for details.

run the demo script and notebooks

The kleptosyn library has three modules:

net.py: load a graph based on Senzing entity resolution run on the OpenSanctions and Open Ownership data
sim.py: simulate fraud rings by sampling from subgraphs
syn.py: generate synthetic transactions, based on parameters from OCCRP analysis

Jupyter notebooks are used to analyze transactions from known fraud cases and then to develop parameters for the simulation:

occrp.ipynb: network science and queueing theory applied to analyze OCCRP data
visualize.ipynb: interactive visualization of the structure of the UBO graph
classify.ipynb: training a classifier model to predict the roles of shell companies
load_kuzu.ipynb: load the UBO graph datasets plus entity resolution into KùzuDB

To run these notebooks:

poetry run jupyter-lab

To run the demo.py script, which generates synthetic data:

poetry run python3 demo.py

make use of the results

By default, results are serialized to the following files:

data/graph.json: the network representation
data/transact.csv: transactions generated by the simulation
data/entities.csv: entities generated by the simulation
data/occrp.json: annotated network of the OCCRP money transfer data
rf_nodes.joblib: serialized model for the shell company classifier
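
As a minimal sketch of how these outputs might be inspected, the following Python snippet reads a CSV file and a JSON file using only the standard library. The column names and JSON layout are not documented here, so the code makes no assumptions about them and only reports what it finds. The helper names `summarize_csv` and `summarize_json` are hypothetical and are not part of the kleptosyn library.

```python
import csv
import json
from pathlib import Path


def summarize_csv(path):
    """Return (column names, row count) for a CSV file with a header row."""
    with Path(path).open(newline="", encoding="utf-8") as fp:
        reader = csv.DictReader(fp)
        rows = list(reader)
        return list(reader.fieldnames or []), len(rows)


def summarize_json(path):
    """Return sorted top-level keys for a JSON object, or the length of a JSON array."""
    with Path(path).open(encoding="utf-8") as fp:
        data = json.load(fp)
    if isinstance(data, dict):
        return sorted(data.keys())
    if isinstance(data, list):
        return len(data)
    return None


if __name__ == "__main__":
    # Paths follow the default output locations listed above;
    # files that do not exist yet are skipped.
    for csv_path in ("data/transact.csv", "data/entities.csv"):
        if Path(csv_path).exists():
            columns, count = summarize_csv(csv_path)
            print(csv_path, columns, count)
    if Path("data/graph.json").exists():
        print("data/graph.json", summarize_json("data/graph.json"))
```

Run this from the repository root after demo.py has finished, so that the relative paths resolve.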

development

First, set up the dev environment:

poetry install --extras=dev

To run pre-commit explicitly:

poetry run pre-commit

Sources

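Default input data sources:

OpenSanctions (risk data)
Open Ownership (link data)
OCCRP "Azerbaijani Laundromat" leaked dataset (event data)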

Ontologies used:

https://followthemoney.tech/

Methods

The simulation uses the following process:

1. Construct a Network that represents bad-actor subgraphs
   - Use OpenSanctions (risk data) and Open Ownership (link data) for real-world UBO topologies
   - Run Senzing entity resolution to generate a "backbone" for organizing the graph
   - Partition into subgraphs and run centrality measures to identify ultimate beneficial owners (UBOs)
2. Configure a Simulation for generating patterns of bad-actor tradecraft
   - Analyze the transactions of the OCCRP "Azerbaijani Laundromat" leaked dataset (event data)
   - Sample probability distributions for shell topologies, transfer amounts, and transfer timing
   - Generate a much larger volume of "legit" transfers (49:1 ratio of legit to bad-actor transfers)
3. Generate the SynData (synthetic data) by applying the simulation to the network
   - Track the generated bad-actor transactions
   - Serialize the transactions and the people/companies involved
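
The sampling and mixing steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in sim.py or syn.py: the function name `simulate_transfers`, the log-normal amount model, the exponential timing model, and all parameter values are hypothetical placeholders, whereas the project derives its parameters from the OCCRP analysis. Only the 49:1 ratio of legit to bad-actor transfers comes from the description above.

```python
import random
from datetime import datetime, timedelta


def simulate_transfers(n_bad, legit_ratio=49, seed=0):
    """Generate n_bad bad-actor transfers plus legit_ratio legit transfers for each."""
    rng = random.Random(seed)
    start = datetime(2024, 1, 1)  # arbitrary start date, for illustration only
    transfers = []

    # Each stream gets its own clock; legit transfers arrive legit_ratio
    # times as often, so both streams cover a similar time span.
    for kind, count, rate in (
        ("bad", n_bad, 1.0),
        ("legit", n_bad * legit_ratio, float(legit_ratio)),
    ):
        clock = start
        for _ in range(count):
            # Assumed timing model: exponentially distributed gaps (in days).
            clock += timedelta(days=rng.expovariate(rate))
            # Assumed amount model: log-normal, with a heavy right tail.
            amount = round(rng.lognormvariate(8.0, 1.5), 2)
            transfers.append({"kind": kind, "when": clock, "amount": amount})

    # Interleave both streams in time order, keeping the "kind" label
    # so the bad-actor transactions remain tracked.
    transfers.sort(key=lambda t: t["when"])
    return transfers


if __name__ == "__main__":
    sample = simulate_transfers(n_bad=20)
    n_bad = sum(1 for t in sample if t["kind"] == "bad")
    print(len(sample), n_bad)
```

With `n_bad=20` the sample holds 1,000 transfers, 20 of which carry the "bad" label. Keeping that label on each record is what later allows a detection model to be scored against ground truth.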

Note that much of the heavy lifting here is entity resolution performed by Senzing and network analytics performed by NetworkX.
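
To make the network analytics step more concrete, here is a small NetworkX sketch that partitions a graph into connected components and picks the most central node in each one. It is an assumption-laden illustration: the function name `top_node_per_component`, the use of an undirected graph, the choice of betweenness centrality, and the node labels are all hypothetical, and the measures used in net.py may differ.

```python
import networkx as nx


def top_node_per_component(graph):
    """Map the most central node of each connected component to that component's nodes."""
    result = {}
    for nodes in nx.connected_components(graph):
        subgraph = graph.subgraph(nodes)
        centrality = nx.betweenness_centrality(subgraph)
        # Sort first so that ties are broken deterministically by node label
        # (this requires labels that can be compared, such as strings).
        best = max(sorted(centrality), key=centrality.get)
        result[best] = set(nodes)
    return result


if __name__ == "__main__":
    g = nx.Graph()
    # Hypothetical ownership links: one star-shaped cluster and one short chain.
    g.add_edges_from([("owner_a", "shell_1"), ("owner_a", "shell_2"), ("owner_a", "shell_3")])
    g.add_edges_from([("shell_4", "owner_b"), ("owner_b", "shell_5")])
    for node, members in top_node_per_component(g).items():
        print(node, len(members))
```

In this toy graph the hub of the star and the middle of the chain come out as the most central nodes, which is the intuition behind using centrality to locate controlling parties.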

As simulations scale, both the data generation and the fraud pattern detection would benefit from using the cuGraph high-performance backend for NetworkX.

We also show an integration with KùzuDB, an embeddable, scalable, extremely fast graph database.
