Commit 5321a30

typos
1 parent 337837d commit 5321a30

1 file changed

Lines changed: 25 additions & 25 deletions

content/posts/paysim-part3.md

@@ -3,7 +3,7 @@ title = "Analyzing First Party Fraud with Neo4j 👺 (PaySim pt.3)"
 author = ["Dave Voutila"]
 description = "How can we leverage Graph Theory to detect 1st Party Fraud in our PaySim network?"
 date = 2020-03-20
-lastmod = 2020-03-23T09:48:29-04:00
+lastmod = 2020-05-15T09:32:56-04:00
 tags = ["neo4j", "fraud", "java", "paysim", "data-science"]
 draft = false
 +++
@@ -76,7 +76,7 @@ The above Cypher will:
 - Run a sub-query using APOC to get label counts
 - Analyze the label counts against the global label counts
 
-<a id="orgd06d2fd"></a>
+<a id="orge51390d"></a>
 
 {{< figure src="/img/paysim-node_freq.png" caption="Figure 1: Relative Frequency of Labels in our PaySim Graph" >}}
 
@@ -109,7 +109,7 @@ The above Cypher performs a pretty basic aggregation of the number of
 transactions by type, the total monetary value, and the average value
 of each transaction.
 
-<a id="orgd88f7b4"></a>
+<a id="orgccea178"></a>
 
 {{< figure src="/img/paysim-transaction_freq.png" caption="Figure 2: Aggregate Transaction statistical profile" >}}
 
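The context lines of this hunk say the post's Cypher aggregates, per transaction type, the number of transactions, their total monetary value, and their average value. That query is not part of this diff, so what follows is only a minimal Python sketch of the same aggregation over in-memory `(type, amount)` pairs; the function name `transaction_profile` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def transaction_profile(txns: Iterable[tuple[str, float]]) -> dict[str, dict[str, float]]:
    """Aggregate (type, amount) pairs into count, total and average per type."""
    count: dict[str, int] = defaultdict(int)
    total: dict[str, float] = defaultdict(float)
    for txn_type, amount in txns:
        count[txn_type] += 1
        total[txn_type] += amount
    # Average is total / count; every type seen has count >= 1.
    return {
        t: {"count": count[t], "total": total[t], "average": total[t] / count[t]}
        for t in count
    }


if __name__ == "__main__":
    # Hypothetical sample data, not taken from PaySim.
    sample = [("Transfer", 100.0), ("Transfer", 300.0), ("CashOut", 50.0)]
    for txn_type, stats in sorted(transaction_profile(sample).items()):
        print(txn_type, stats)
```

In the graph version the grouping key is the Transaction node's type label rather than a tuple field, but the per-group arithmetic is the same.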
@@ -142,7 +142,7 @@ generate pools of identifiers like **Emails**, **SSNs**, and **Phone
 Numbers** that they remix into different (ideally unique) combinations
 when creating a client in our network. Then at some time in the
 future, they drain those accounts via an intermediary (a **mule**) and
-conduct a `CashOut` to exflitrate the money from our network.
+conduct a `CashOut` to exfiltrate the money from our network.
 
 Our methodology for finding these fraudulent accounts will be as
 follows:
@@ -173,7 +173,7 @@ directionality of relationships.
 > algorithm. They're great for understanding the structure of a
 > graph.
 
-<a id="orga2d73a2"></a>
+<a id="orgf6c38d9"></a>
 
 {{< figure src="/img/3rdparty/Pseudoforest.svg" caption="Figure 3: \"A graph with three components\" by David Eppstein (Public Domain, Wikipedia, 2007)" >}}
 
@@ -194,7 +194,7 @@ load it into memory.[^fn:2]
 
 Recall our data model we built out in [part 1]({{< relref "paysim" >}}):
 
-<a id="org6aa35d9"></a>
+<a id="org74926fe"></a>
 
 {{< figure src="/img/paysim-2.1.0.png" caption="Figure 4: The PaySim 2.1 Data Model" >}}
 
@@ -211,7 +211,7 @@ labels: **HAS\_SSN, HAS\_EMAIL, HAS\_PHONE**.
 
 So let's target the following subgraph:
 
-<a id="org07ad343"></a>
+<a id="orga9bc01f"></a>
 
 {{< figure src="/img/simple-identity-model.png" caption="Figure 5: Just our Identifiers in PaySim 2.1" >}}
 
@@ -226,7 +226,7 @@ CALL gds.graph.create.estimate(
 ['HAS_SSN', 'HAS_EMAIL', 'HAS_PHONE'])
 ```
 
-<a id="org37d75b6"></a>
+<a id="orgff52b37"></a>
 
 {{< figure src="/img/paysim-part3-wcc-estimate.png" caption="Figure 6: Our estimate for our Graph Projection" >}}
 
@@ -251,7 +251,7 @@ You should see some metadata output telling you some details about the
 type and size of the graph projection. It'll detail how many
 relationships and nodes were processed plus some other facts.
 
-<a id="orgae4738b"></a>
+<a id="org51f7b3b"></a>
 
 {{< figure src="/img/paysim-part3-load-wcc.png" caption="Figure 7: Our \"wccGroups\" graph projection output" >}}
 
@@ -286,7 +286,7 @@ Scanning the results, we have a few large clusters and a lot of small
 clusters. Those large clusters will probably be of interest and we'll
 come back to that shortly.
 
-<a id="org0b195e0"></a>
+<a id="orge7a3538"></a>
 
 {{< figure src="/img/paysim-part3-wcc-stream.png" caption="Figure 8: Our largest graph Components per WCC" >}}
 
@@ -341,7 +341,7 @@ ORDER BY groupSize DESC
 
 What's the data look like?
 
-<a id="org54c7ee0"></a>
+<a id="orgbb7e774"></a>
 
 {{< figure src="/img/paysim-part3-wcc-analysis.png" caption="Figure 9: Histogram of Group Size" >}}
 
@@ -362,7 +362,7 @@ MATCH p=(c:Client {fraud_group:groupId})-[:HAS_SSN|HAS_EMAIL|HAS_PHONE]->()
 RETURN p
 ```
 
-<a id="org82ce1ab"></a>
+<a id="org5aeb3c4"></a>
 
 {{< figure src="/img/paysim-part3-wcc-large-groups.svg" caption="Figure 10: Our Fraud Groups (of size > 8)" >}}
 
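This hunk's context draws the Clients tagged with a `fraud_group` whose group holds more than 8 members (Figure 10). Per the surrounding hunks, those groups come from running Weakly Connected Components (WCC) over the Client and identifier subgraph (`HAS_SSN`, `HAS_EMAIL`, `HAS_PHONE`) projected with Neo4j's Graph Data Science library. The code here is not that library; it is a minimal union-find sketch of the same idea, assuming edges arrive as `(client, identifier)` string pairs whose IDs are unique across Clients, SSNs, Emails and Phones, and assuming group size counts Clients only. `UnionFind`, `client_groups`, and `larger_than` are hypothetical names.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}
        self.size: dict[str, int] = {}

    def find(self, x: str) -> str:
        # Lazily register unseen nodes as singleton sets.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller tree under the larger
        self.size[ra] += self.size[rb]


def client_groups(edges: list[tuple[str, str]], larger_than: int = 8) -> list[set[str]]:
    """Group Clients into weakly connected components via shared identifiers.

    `edges` holds (client, identifier) pairs; direction is ignored.
    Returns the Client sets of components with more than `larger_than`
    Clients, largest first.
    """
    uf = UnionFind()
    clients: set[str] = set()
    for client, identifier in edges:
        clients.add(client)
        uf.union(client, identifier)
    groups: dict[str, set[str]] = defaultdict(set)
    for client in clients:
        groups[uf.find(client)].add(client)
    big = [g for g in groups.values() if len(g) > larger_than]
    return sorted(big, key=len, reverse=True)
```

Because edge direction never matters to a union, two Clients that share any SSN, Email, or Phone end up in the same group, which is exactly the "weakly" in WCC.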
@@ -398,7 +398,7 @@ WHERE c.fraud_group IS NULL
 RETURN p
 ```
 
-<a id="org1b2cc26"></a>
+<a id="org0c5b81a"></a>
 
 {{< figure src="/img/paysim-part3-external-transactions.svg" caption="Figure 11: External Transactions with our Large Fraud Groups" >}}
 
@@ -426,7 +426,7 @@ UNWIND labels(txn) AS txnType
 RETURN distinct(txnType), count(txnType)
 ```
 
-<a id="org3ae8d10"></a>
+<a id="orgbc83ea1"></a>
 
 {{< figure src="/img/paysim-part3-external-transactions-analysis.png" caption="Figure 12: An Analysis of Transactions between our Fraud Groups and Others" >}}
 
@@ -439,9 +439,9 @@ groups are **all Transfers.** Kinda fishy!
 We've now identified four potential fraud rings. Let's tag them and
 relate them to one another to make further analysis easier.
 
-We'll simplify how our suspect Clients relat to one another connecting
-them via direct `TRANSACTED_WITH` relationships if they've performed a
-Transaction with one another:
+We'll simplify how our suspect Clients relate to one another
+connecting them via direct `TRANSACTED_WITH` relationships if they've
+performed a Transaction with one another:
 
 ```cypher
 // Recall our tagged Clients and group them by group size
@@ -468,7 +468,7 @@ RETURN count(r)
 
 Now how do our simplified 2nd-level groups look?
 
-<a id="org824163b"></a>
+<a id="org6f82f1e"></a>
 
 {{< figure src="/img/paysim-part3-second-level.svg" caption="Figure 13: Our 2nd-Level Fraud Groups" >}}
 
@@ -521,7 +521,7 @@ RETURN secondGroupId, size(members) AS groupSize
 ORDER BY groupSize DESC
 ```
 
-<a id="org25a543a"></a>
+<a id="org916e20b"></a>
 
 {{< figure src="/img/paysim-part3-second-level-sizes.png" caption="Figure 14: How large are our 2nd Level Fraud Groups?" >}}
 
@@ -533,7 +533,7 @@ to the others! Probably a high-value fraud ring we can try breaking up.
 
 First thing we can do is use our eyeballs and our intuition. Graphs
 make it easy for humans to start asking questions because we're
-glorified pattern-recognition biocomputers doing it since birth using
+glorified pattern recognition biocomputers doing it since birth using
 any of our senses as input.
 
 But how can we do this algorithmically?
@@ -545,11 +545,11 @@ Let's say we want to tackle that massive 140 Client potential fraud
 ring. Looking at the graph visually, there appear to be 3 Client
 accounts that tie the whole thing together:
 
-<a id="org3583f32"></a>
+<a id="org1432015"></a>
 
 {{< figure src="/img/paysim-part3-second-level-targets.png" caption="Figure 15: Our potential Targets" >}}
 
-How can we programatically target `Thomas Gomez`, `Samuel Petty`, and
+How can we programmatically target `Thomas Gomez`, `Samuel Petty`, and
 `Luke Oneal`?
 
 
@@ -582,7 +582,7 @@ RETURN c.name AS name, centrality ORDER BY centrality DESC
 
 Let's take a look at the highest scores:
 
-<a id="org6464ab4"></a>
+<a id="org18a4ef0"></a>
 
 {{< figure src="/img/paysim-part3-centrality-v1.png" caption="Figure 16: Clients of 2nd Level Fraud Group 1 sorted by Centrality" >}}
 
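The context here ranks the Clients of 2nd-level Fraud Group 1 by centrality (Figure 16), and the final hunk credits _Betweenness Centrality_ with finding the linchpins. To show what that score measures, here is a minimal Python sketch of Brandes' algorithm for plain betweenness on an unweighted, undirected graph. It assumes the group is given as a symmetric adjacency dict (for example built from `TRANSACTED_WITH` pairs treated as undirected, which is an assumption), and it reproduces neither the GDS implementation nor the post's bespoke rescoring from Figure 17. `betweenness` is a hypothetical name.

```python
from collections import deque


def betweenness(adj: dict[str, set[str]]) -> dict[str, float]:
    """Brandes' betweenness centrality for an unweighted, undirected graph.

    `adj` maps every node to its neighbours and must be symmetric.
    Scores are halved because each undirected pair is visited from both ends.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        stack: list[str] = []
        preds: dict[str, list[str]] = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes from s first.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: c / 2.0 for v, c in score.items()}
```

A Client scores high when many shortest paths between other Clients run through it, so accounts bridging otherwise separate clusters rise to the top of the ranking.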
@@ -629,7 +629,7 @@ RETURN name, newScore, original ORDER BY newScore DESC
 
 Bingo! Our targets are now in the Top 3.
 
-<a id="orgdbc6ecd"></a>
+<a id="org867f567"></a>
 
 {{< figure src="/img/paysim-part3-centrality-v2.png" caption="Figure 17: Our bespoke Betweenness Scoring" >}}
 
@@ -646,7 +646,7 @@ critical steps in our analysis of our financial transaction data:
 the groups we identified looked very different than they first
 appeared!
 4. We re-ran WCC and retagged our suspects.
-5. We algorithmically found a way to identify lynchpins in our largest
+5. We algorithmically found a way to identify linchpins in our largest
 potential fraud network using a combination of _Betweenness
 Centrality_ and some old fashioned intuition!
 