# Demo: iBench DBLPToAmalgam1

In the following, we will connect to a [Neo4j Community Edition](https://neo4j.com/product/neo4j-graph-database/) instance running inside a Docker container.
To start a local Neo4j Community Edition instance, run:
```
sudo docker run --name neo4jDemo -p 7687:7687 -p 7474:7474 -v ~/research/DTGraph/output-ibench-data:/var/lib/neo4j/import --env=NEO4J_AUTH=none neo4j:5.16.0-community
```
Docker pulls the `neo4j:5.16.0-community` image if it is not already present, so [Docker](https://docs.docker.com/engine/install/) must be installed on your system. Once the container is running, the [Neo4j Browser](http://localhost:7474/browser/) should be reachable on your computer.

Replace `~/research/DTGraph` with the path where DTGraph is installed on your computer. The `output-ibench-data` directory must be mounted as a volume at `/var/lib/neo4j/import` in the container so that the import scripts can read the data.
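
Before connecting, it can help to confirm that the two published ports are reachable. The following is a minimal sketch using only the Python standard library; the helper `port_is_open` is hypothetical and not part of DTGraph, and the check only tells you that something is listening on the port, not that Neo4j has finished starting.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # 7474 is the HTTP port (Neo4j Browser); 7687 is the Bolt port.
    for port in (7474, 7687):
        status = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"localhost:{port} is {status}")
```

If either port is reported as not reachable, check that the container is running and that the `-p` mappings in the Docker command were not changed.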

*Note:* We have specifically tested the compatibility of this framework with Neo4j Community Edition 5.16.0, which was the latest version at the time of writing this guide.

In [1]:
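from dtgraph import Neo4jGraph, Rule, Transformation

# The container was started with NEO4J_AUTH=none, so the credentials stay empty.
hostname = "localhost"
username = ""
password = ""
uri = f"bolt://{hostname}:7687"  # 7687 is the Bolt port published by the Docker command
graph = Neo4jGraph(uri, database="neo4j", username=username, password=password)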

For this tutorial, we will use the [DBLPToAmalgam1](https://github.com/yannramusat/TPG/tree/main/input-ibench-config/dta1) data integration scenario from [iBench](https://github.com/RJMillerLab/ibench), which can be loaded into the database using the following command.

In [2]:
from dtgraph.scenarios.ibench_dta1 import iBenchDBLPToAmalgam1
iBenchDBLPToAmalgam1.load(graph, size=1_000)

Flushed database: Deleted 11402 nodes, deleted 1168 relationships, completed after 81 ms.
CSV:    Added 1000 labels, created 1000 nodes, set 8000 properties, created 0 relationships, completed after 305 ms.
CSV:    Added 1000 labels, created 1000 nodes, set 10000 properties, created 0 relationships, completed after 306 ms.
CSV:    Added 1000 labels, created 1000 nodes, set 2000 properties, created 0 relationships, completed after 209 ms.
CSV:    Added 1000 labels, created 1000 nodes, set 9000 properties, created 0 relationships, completed after 265 ms.
CSV:    Added 1000 labels, created 1000 nodes, set 4000 properties, created 0 relationships, completed after 222 ms.
CSV:    Added 1000 labels, created 1000 nodes, set 9000 properties, created 0 relationships, completed after 341 ms.
CSV:    Added 1000 labels, created 1000 nodes, set 4000 properties, created 0 relationships, completed after 292 ms.


In [3]:
rule1 = Rule('''
MATCH (dip:DInProceedings)
GENERATE
(x = (dip):InProceedings {
    pid = "SK1(" + dip.pid + ")",
    title = dip.title,
    bktitle = dip.booktitle,
    year = dip.year,
    month = dip.month,
    pages = dip.pages,
    vol = "SK2(" + dip.booktitle + "," + dip.year + ")",
    num = "SK3(" + dip.booktitle + "," + dip.year + "," + dip.month + ")", 
    loc = "SK4(" + dip.booktitle + "," + dip.year + "," + dip.month + ")", 
    class = "SK6(" + dip.pid + ")",
    note = "SK7(" + dip.pid + ")",
    annote = "SK8(" + dip.pid + ")"
})
''')

rule2 = Rule('''
MATCH (dip:DInProceedings)
MATCH (pa:PubAuthors)
WHERE pa.pid = dip.pid
GENERATE
(x = (dip):InProceedings {
    pid = "SK1(" + dip.pid + ")",
    title = dip.title,
    bktitle = dip.booktitle,
    year = dip.year,
    month = dip.month,
    pages = dip.pages,
    vol = "SK2(" + dip.booktitle + "," + dip.year + ")",
    num = "SK3(" + dip.booktitle + "," + dip.year + "," + dip.month + ")", 
    loc = "SK4(" + dip.booktitle + "," + dip.year + "," + dip.month + ")", 
    class = "SK6(" + dip.pid + ")",
    note = "SK7(" + dip.pid + ")",
    annote = "SK8(" + dip.pid + ")"
})-[():IN_PROC_PUBLISHED]->(au = (pa.author):Author {
    name = pa.author
})
''')

rule3 = Rule('''
MATCH (w:WWW)
GENERATE
(m = (w):Misc {
    miscid = "SK11(" + w.pid + ")",
    howpub = "SK12(" + w.pid + ")",
    confloc = "SK13(" + w.pid + ")",
    year = w.year,
    month = "SK14(" + w.pid + ")",
    pages = "SK15(" + w.pid + ")",
    vol = "SK16(" + w.pid + ")",
    num = "SK17(" + w.pid + ")",
    loc = "SK18(" + w.pid + ")",
    class = "SK19(" + w.pid + ")",
    note = "SK20(" + w.pid + ")",
    annote = "SK21(" + w.pid + ")"
})
''')

rule4 = Rule('''
MATCH (w:WWW)
MATCH (pa:PubAuthors)
WHERE pa.pid = w.pid
GENERATE
(m = (w):Misc {
    miscid = "SK11(" + w.pid + ")",
    howpub = "SK12(" + w.pid + ")",
    confloc = "SK13(" + w.pid + ")",
    year = w.year,
    month = "SK14(" + w.pid + ")",
    pages = "SK15(" + w.pid + ")",
    vol = "SK16(" + w.pid + ")",
    num = "SK17(" + w.pid + ")",
    loc = "SK18(" + w.pid + ")",
    class = "SK19(" + w.pid + ")",
    note = "SK20(" + w.pid + ")",
    annote = "SK21(" + w.pid + ")"
})-[():MISC_PUBLISHED]->(au = (pa.author):Author {
    name = pa.author
})
''')

rule5 = Rule('''
MATCH (da:DArticle)
GENERATE
(a = (da):Article {
    articleid = "SK22(" + da.pid + ")",
    title = da.title,
    journal = da.journal,
    year = da.year,
    month = da.month,
    pages = da.pages,
    vol = da.volume,
    num = da.number, 
    loc = "SK23(" + da.pid + ")", 
    class = "SK24(" + da.pid + ")",
    note = "SK25(" + da.pid + ")",
    annote = "SK26(" + da.pid + ")"
})
''')

rule6 = Rule('''
MATCH (da:DArticle)
MATCH (pa:PubAuthors)
WHERE pa.pid = da.pid
GENERATE
(a = (da):Article {
    articleid = "SK22(" + da.pid + ")",
    title = da.title,
    journal = da.journal,
    year = da.year,
    month = da.month,
    pages = da.pages,
    vol = da.volume,
    num = da.number, 
    loc = "SK23(" + da.pid + ")", 
    class = "SK24(" + da.pid + ")",
    note = "SK25(" + da.pid + ")",
    annote = "SK26(" + da.pid + ")"
})-[():ARTICLE_PUBLISHED]->(au = (pa.author):Author {
    name = pa.author
})
''')

rule7 = Rule('''
MATCH (db:DBook)
GENERATE
(b = (db):Book {
    bookID = "SK27(" + db.pid + ")",
    title = db.title,
    publisher = db.publisher,
    year = db.year,
    month = "SK28(" + db.pid + ")",
    pages = "SK29(" + db.pid + ")",
    vol = "SK30(" + db.pid + ")",
    num = "SK31(" + db.pid + ")", 
    loc = "SK32(" + db.pid + ")", 
    class = "SK33(" + db.pid + ")",
    note = "SK34(" + db.pid + ")",
    annote = "SK35(" + db.pid + ")"
})
''')

rule8 = Rule('''
MATCH (db:DBook)
MATCH (pa:PubAuthors)
WHERE pa.pid = db.pid
GENERATE
(b = (db):Book {
    bookID = "SK27(" + db.pid + ")",
    title = db.title,
    publisher = db.publisher,
    year = db.year,
    month = "SK28(" + db.pid + ")",
    pages = "SK29(" + db.pid + ")",
    vol = "SK30(" + db.pid + ")",
    num = "SK31(" + db.pid + ")", 
    loc = "SK32(" + db.pid + ")", 
    class = "SK33(" + db.pid + ")",
    note = "SK34(" + db.pid + ")",
    annote = "SK35(" + db.pid + ")"
})-[():BOOK_PUBLISHED]->(au = (pa.author):Author {
    name = pa.author
})
''')

rule9 = Rule("""
MATCH (t:PhDThesis)
GENERATE
(m = (t):Misc {
    miscid = "SK36(" + t.author + "," + t.title + ")",
    title = t.title,
    howpub = "SK37(" + t.author + "," + t.title + ")",
    confloc = "SK38(" + t.author + "," + t.title + ")",
    year = t.year,
    month = t.month,
    pages = "SK39(" + t.author + "," + t.title + ")",
    vol = "SK40(" + t.author + "," + t.title + ")",
    num = t.number,
    loc = "SK41(" + t.author + "," + t.title + ")",
    class = "SK42(" + t.author + "," + t.title + ")",
    note = "SK43(" + t.author + "," + t.title + ")",
    annote = t.school
})-[():MISC_PUBLISHED]->(au = (t.author):Author {
    name = t.author
})
""")

rule10 = Rule("""
MATCH (t:MasterThesis)
GENERATE
(m = (t):Misc {
    miscid = "SK44(" + t.author + "," + t.title + ")",
    title = t.title,
    howpub = "SK45(" + t.author + "," + t.title + ")",
    confloc = "SK46(" + t.author + "," + t.title + ")",
    year = t.year,
    month = "SK47(" + t.author + "," + t.title + ")",
    pages = "SK48(" + t.author + "," + t.title + ")",
    vol = "SK49(" + t.author + "," + t.title + ")",
    num = "SK50(" + t.author + "," + t.title + ")",
    loc = "SK51(" + t.author + "," + t.title + ")",
    class = "SK52(" + t.author + "," + t.title + ")",
    note = "SK53(" + t.author + "," + t.title + ")",
    annote = t.school
})-[():MISC_PUBLISHED]->(au = (t.author):Author {
    name = t.author
})
""")
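
The rules above build values such as `"SK2(" + dip.booktitle + "," + dip.year + ")"` by string concatenation, presumably standing in for the Skolem terms of the iBench scenario. The plain-Python sketch below is not part of DTGraph: the helper `skolem` and the sample record are hypothetical. It only illustrates that equal arguments always yield the same string, so two rules that refer to the same source values produce the same placeholder.

```python
def skolem(name: str, *args: object) -> str:
    """Build a string such as 'SK2(VLDB,2003)' from a function name and its arguments."""
    return f"{name}(" + ",".join(str(a) for a in args) + ")"


# Hypothetical source record; the field values are made up for illustration.
dip = {"pid": "p1", "booktitle": "VLDB", "year": "2003", "month": "Sep"}

vol = skolem("SK2", dip["booktitle"], dip["year"])
num = skolem("SK3", dip["booktitle"], dip["year"], dip["month"])

print(vol)  # SK2(VLDB,2003)
print(num)  # SK3(VLDB,2003,Sep)
```

Because the function name is part of the string, `SK1(p1)` and `SK6(p1)` remain distinct even though both are derived from the same `pid`.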

In [4]:
rules = [rule1, rule2, rule3, rule4, rule5, rule6, rule7, rule8, rule9, rule10]
dta1_transform = Transformation(rules, with_diagnose=False)
dta1_transform.apply_on(graph)

Index: Added 0 index, completed after 0 ms.
Rule: Added 2000 labels, created 1000 nodes, set 13000 properties, created 0 relationships, completed after 174 ms.
Rule: Added 470 labels, created 235 nodes, set 4309 properties, created 291 relationships, completed after 197 ms.
Rule: Added 2000 labels, created 1000 nodes, set 13000 properties, created 0 relationships, completed after 152 ms.
Rule: Added 156 labels, created 78 nodes, set 4208 properties, created 295 relationships, completed after 16 ms.
Rule: Added 2000 labels, created 1000 nodes, set 13000 properties, created 0 relationships, completed after 24 ms.
Rule: Added 90 labels, created 45 nodes, set 3937 properties, created 278 relationships, completed after 22 ms.
Rule: Added 2000 labels, created 1000 nodes, set 13000 properties, created 0 relationships, completed after 31 ms.
Rule: Added 88 labels, created 44 nodes, set 4300 properties, created 304 relationships, completed after 41 ms.


657
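
The final value `657` equals the sum of the per-step timings printed above, which suggests (an assumption, not confirmed by the source) that `apply_on` returns the total elapsed time in milliseconds. The sketch below re-derives that total, along with the number of created nodes and relationships, from the log text with regular expressions; the variable names are illustrative.

```python
import re

# Output of the transformation cell, copied verbatim from above.
log = """\
Index: Added 0 index, completed after 0 ms.
Rule: Added 2000 labels, created 1000 nodes, set 13000 properties, created 0 relationships, completed after 174 ms.
Rule: Added 470 labels, created 235 nodes, set 4309 properties, created 291 relationships, completed after 197 ms.
Rule: Added 2000 labels, created 1000 nodes, set 13000 properties, created 0 relationships, completed after 152 ms.
Rule: Added 156 labels, created 78 nodes, set 4208 properties, created 295 relationships, completed after 16 ms.
Rule: Added 2000 labels, created 1000 nodes, set 13000 properties, created 0 relationships, completed after 24 ms.
Rule: Added 90 labels, created 45 nodes, set 3937 properties, created 278 relationships, completed after 22 ms.
Rule: Added 2000 labels, created 1000 nodes, set 13000 properties, created 0 relationships, completed after 31 ms.
Rule: Added 88 labels, created 44 nodes, set 4300 properties, created 304 relationships, completed after 41 ms.
"""


def total(pattern: str, text: str) -> int:
    """Sum every integer captured by the first group of pattern in text."""
    return sum(int(value) for value in re.findall(pattern, text))


total_ms = total(r"completed after (\d+) ms", log)
total_nodes = total(r"created (\d+) nodes", log)
total_rels = total(r"created (\d+) relationships", log)

print(total_ms)     # 657
print(total_nodes)  # 4402
print(total_rels)   # 1168
```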