## Prepare your environment: connect to the TigerGraph database, create the schema, and map and load data

This section of GSQL 102 walks you through creating the graph and loading graph data. We use the LDBC Social Network Benchmark (LDBC SNB) as the example. This data set models a typical online forum where users post messages and discuss topics. It comes with a data generator that lets you generate data at different scale factors. Scale factor 1 generates roughly 1GB of raw data, scale factor 10 generates roughly 10GB of raw data, and so on. Further GSQL 102 documentation is available at https://docs.tigergraph.com/gsql-ref/current/tutorials/pattern-matching/.

### Create connection

In [1]:
import json
import pandas as pd
from pyTigerGraph import TigerGraphConnection

# Read in DB configs
with open('../config.json', "r") as config_file:
    config = json.load(config_file)

conn = TigerGraphConnection(
    host=config["host"],
    username=config["username"],
    password=config["password"],
)
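
The cell above assumes that `../config.json` contains at least `host`, `username`, and `password`, and a later cell also reads `getToken`. A minimal sketch of a pre-flight check is shown here; the helper name `missing_config_keys` and the list of required keys are illustrative assumptions drawn from the keys this notebook reads, not part of pyTigerGraph.

```python
# Keys this notebook reads from config.json (assumed list, taken from the cells).
REQUIRED_KEYS = ("host", "username", "password", "getToken")


def missing_config_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent from the config dict."""
    return [key for key in required if key not in config]


# Hypothetical, incomplete config used only to demonstrate the check.
example_config = {"host": "https://example.invalid", "username": "tigergraph"}
print(missing_config_keys(example_config))  # ['password', 'getToken']
```

Calling the check right after `json.load` and raising an error when the list is non-empty gives a clearer message than a `KeyError` deep inside a later cell.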

### Download ldbc_snb dataset

In [2]:
from pyTigerGraph.datasets import Datasets
dataset_ldbc = Datasets("ldbc_snb")

Downloading:   0%|          | 0/286678171 [00:00<?, ?it/s]

### Ingest data

In [3]:
conn.ingestDataset(dataset_ldbc, getToken=config["getToken"])

---- Checking database ----
---- Creating graph ----
The graph ldbc_snb is created.
---- Creating schema ----
Using graph 'ldbc_snb'
Successfully created schema change jobs: [ldbc_snb_schema].
Kick off schema change job ldbc_snb_schema
Doing schema change on graph 'ldbc_snb' (current version: 0)
Trying to add local vertex 'Comment' to the graph 'ldbc_snb'.
Trying to add local vertex 'Post' to the graph 'ldbc_snb'.
Trying to add local vertex 'Company' to the graph 'ldbc_snb'.
Trying to add local vertex 'University' to the graph 'ldbc_snb'.
Trying to add local vertex 'City' to the graph 'ldbc_snb'.
Trying to add local vertex 'Country' to the graph 'ldbc_snb'.
Trying to add local vertex 'Continent' to the graph 'ldbc_snb'.
Trying to add local vertex 'Forum' to the graph 'ldbc_snb'.
Trying to add local vertex 'Person' to the graph 'ldbc_snb'.
Trying to add local vertex 'Tag' to the graph 'ldbc_snb'.
Trying to add local vertex 'Tag_Class' to the graph 'ldbc_snb'.
Trying to add local edge 'C

---- Cleaning ----
---- Finished ingestion ----


### Visualize schema

In [4]:
from pyTigerGraph.visualization import drawSchema
drawSchema(conn.getSchema(force=True))

CytoscapeWidget(cytoscape_layout={'name': 'circle', 'animate': True, 'padding': 1}, cytoscape_style=[{'selecto…

### Print graph stats

In [5]:
vertices = conn.getVertexTypes()
total_count = 0
for vertex in vertices:
    vertex_cnt = conn.getVertexCount(vertex)
    total_count += vertex_cnt
    print("Node count: ({} : {}) ".format(vertex, vertex_cnt))
print("Total node count: ", total_count)

Node count: (Comment : 2052169) 
Node count: (Post : 1003605) 
Node count: (Company : 1575) 
Node count: (University : 6380) 
Node count: (City : 1343) 
Node count: (Country : 111) 
Node count: (Continent : 6) 
Node count: (Forum : 90492) 
Node count: (Person : 9892) 
Node count: (Tag : 16080) 
Node count: (Tag_Class : 71) 
Total node count:  3181724


In [6]:
import pprint
edge_count = conn.getEdgeCount()
print("Edges count: total ", sum(edge_count.values()))
pprint.pprint(edge_count) 

Edges count: total  33127294
{'Container_Of': 1003605,
 'Container_Of_Reverse': 1003605,
 'Has_Creator': 3055774,
 'Has_Creator_Reverse': 3055774,
 'Has_Interest': 229166,
 'Has_Interest_Reverse': 229166,
 'Has_Member': 1611869,
 'Has_Member_Reverse': 1611869,
 'Has_Moderator': 90492,
 'Has_Moderator_Reverse': 90492,
 'Has_Tag': 3721417,
 'Has_Tag_Reverse': 3721417,
 'Has_Type': 0,
 'Has_Type_Reverse': 9055,
 'Is_Located_In': 3073621,
 'Is_Located_In_Reverse': 3073621,
 'Is_Part_Of': 0,
 'Is_Part_Of_Reverse': 111,
 'Is_Subclass_Of': 49,
 'Is_Subclass_Of_Reverse': 62,
 'Knows': 180623,
 'Likes': 1621994,
 'Likes_Reverse': 1633457,
 'Reply_Of': 2052169,
 'Reply_Of_Reverse': 2052169,
 'Study_At': 1523,
 'Study_At_Reverse': 0,
 'Work_At': 4194,
 'Work_At_Reverse': 0}
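
Several forward and reverse edge types in the output above report different counts (for example `Has_Type` versus `Has_Type_Reverse`). A small sketch for listing such pairs from the dictionary returned by `conn.getEdgeCount()` follows. The helper name is hypothetical, and whether a mismatch signals an actual loading problem is an assumption you would need to confirm against your own instance.

```python
def mismatched_reverse_pairs(edge_count):
    """Map each edge type to (forward, reverse) counts when the two differ."""
    mismatches = {}
    for name, count in edge_count.items():
        reverse_name = name + "_Reverse"
        if reverse_name in edge_count and edge_count[reverse_name] != count:
            mismatches[name] = (count, edge_count[reverse_name])
    return mismatches


# A subset of the counts printed above.
sample_counts = {
    "Has_Type": 0,
    "Has_Type_Reverse": 9055,
    "Knows": 180623,
    "Likes": 1621994,
    "Likes_Reverse": 1633457,
    "Reply_Of": 2052169,
    "Reply_Of_Reverse": 2052169,
}
print(mismatched_reverse_pairs(sample_counts))
# {'Has_Type': (0, 9055), 'Likes': (1621994, 1633457)}
```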


## One-hop patterns

After the loading job finishes running, your tutorial setup is complete, and you are ready to start learning about One-hop patterns.

Pattern matching by nature is declarative. It enables users to focus on specifying what they want from a query without worrying about the underlying query processing.

The pattern specifies sets of vertex types and how they are connected by edge types.

A pattern usually appears in the FROM clause, the most fundamental part of the query structure. A pattern can be refined further with conditions in the WHERE clause.

In this tutorial, we’ll start with simple one-hop path patterns, then extend them to multi-hop patterns, and finally to multiple-path patterns.

The easiest way to understand patterns is to start with a simple 1-Hop pattern. Even a single hop has several options. After we’ve tackled single hops, then we’ll see how to add repetition to make variable length patterns and how to connect single hops to form bigger patterns.

In GSQL syntax V2, we use the punctuation -( )- to denote a 1-hop pattern.

The edge types are enclosed in the parentheses (), and the hyphens - symbolize connection. The directionality of the connection is explicitly stated for each edge type. Arrowheads are used to indicate direction: > or <.

* For an undirected edge E, leave as is: E

* For a directed edge E from left to right, use a right-pointing arrowhead: E>

* For a directed edge E from right to left, use a left-pointing arrowhead: <E

In this notation, E is a placeholder for any edge in a graph. "Left" and "right" refer to the order of the actual written pattern itself.
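
The three notations above can be captured in a small string-building helper. This is only an illustrative sketch: `one_hop` is a hypothetical function, not part of pyTigerGraph or GSQL, and it simply assembles the pattern text that you would place in a FROM clause.

```python
def one_hop(source, edge, target, direction="undirected", alias=None):
    """Build the text of a SYNTAX v2 one-hop pattern.

    direction is "undirected" (E), "right" (E>), or "left" (<E).
    """
    if direction == "right":
        edge_part = f"{edge}>"
    elif direction == "left":
        edge_part = f"<{edge}"
    elif direction == "undirected":
        edge_part = edge
    else:
        raise ValueError(f"unknown direction: {direction!r}")
    if alias is not None:
        # The edge alias follows the arrowhead, as in (Likes>:e1).
        edge_part = f"{edge_part}:{alias}"
    return f"{source} -({edge_part})- {target}"


print(one_hop("Person:s", "Knows", "Person:p", alias="e"))
# Person:s -(Knows:e)- Person:p
print(one_hop("Person:s", "Likes", ":tgt", direction="right"))
# Person:s -(Likes>)- :tgt
print(one_hop("Person:tgt", "Likes_Reverse", "(Comment|Post):src", direction="left"))
# Person:tgt -(<Likes_Reverse)- (Comment|Post):src
```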

## Examples of 1-Hop Fixed Length Query

### Example 1. Undirected Edge Pattern

Find persons who know the person named "Viktor Akhiezer" and return the top 3 oldest such persons.

In [7]:
query = """
INTERPRET QUERY() FOR GRAPH ldbc_snb SYNTAX v2 {
    friends = SELECT p
              FROM Person:s -(Knows:e)- Person:p
              WHERE s.first_name == "Viktor" AND s.last_name == "Akhiezer"
              ORDER BY p.birthday ASC
              LIMIT 3;
              
    PRINT friends[friends.first_name, friends.last_name, friends.birthday];
}
"""

## run interpret query
results = conn.runInterpretedQuery(query)

## print results
print(json.dumps(results, indent=1))

[
 {
  "friends": [
   {
    "v_id": "4398046517846",
    "v_type": "Person",
    "attributes": {
     "friends.first_name": "Abdul-Malik",
     "friends.last_name": "Glosca",
     "friends.birthday": "1980-04-24 00:00:00"
    }
   },
   {
    "v_id": "10995116279461",
    "v_type": "Person",
    "attributes": {
     "friends.first_name": "Gregorio",
     "friends.last_name": "Cajes",
     "friends.birthday": "1980-05-13 00:00:00"
    }
   },
   {
    "v_id": "6597069776731",
    "v_type": "Person",
    "attributes": {
     "friends.first_name": "Sven",
     "friends.last_name": "Carlsson",
     "friends.birthday": "1981-02-25 00:00:00"
    }
   }
  ]
 }
]
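
Every interpreted query in this tutorial returns the same nested shape: a list of result blocks, each mapping a printed name to a list of vertices with `v_id`, `v_type`, and `attributes`. A minimal sketch for flattening one printed vertex set into plain rows is shown below; `vertex_rows` is a hypothetical helper, and the sample data is copied from the output above.

```python
def vertex_rows(results, key):
    """Flatten the vertices printed under `key` into a list of flat dicts."""
    rows = []
    for block in results:
        for vertex in block.get(key, []):
            row = {"v_id": vertex["v_id"], "v_type": vertex["v_type"]}
            row.update(vertex["attributes"])
            rows.append(row)
    return rows


# Two of the vertices from the output above.
sample_results = [
    {
        "friends": [
            {
                "v_id": "4398046517846",
                "v_type": "Person",
                "attributes": {
                    "friends.first_name": "Abdul-Malik",
                    "friends.last_name": "Glosca",
                    "friends.birthday": "1980-04-24 00:00:00",
                },
            },
            {
                "v_id": "10995116279461",
                "v_type": "Person",
                "attributes": {
                    "friends.first_name": "Gregorio",
                    "friends.last_name": "Cajes",
                    "friends.birthday": "1980-05-13 00:00:00",
                },
            },
        ]
    }
]

for row in vertex_rows(sample_results, "friends"):
    print(row["friends.first_name"], row["friends.birthday"])
```

Because each row is a flat dictionary, the list can be passed to `pd.DataFrame(rows)` if a tabular view is preferred.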


### Example 2. Right-directed Edge Pattern

Find the total number of comments and total number of posts liked by Viktor. A Person can reach Comments or Posts via the directed edge Likes.

In [8]:
query = """
INTERPRET QUERY() FOR GRAPH ldbc_snb SYNTAX v2 {
    SumAccum<int> @comment_cnt= 0;
    SumAccum<int> @post_cnt= 0;

    // 1-hop pattern.
    Result = SELECT s
             FROM Person:s -(Likes>)- :tgt
             WHERE s.first_name == "Viktor" AND s.last_name == "Akhiezer"
             ACCUM CASE WHEN tgt.type == "Comment" THEN
                             s.@comment_cnt += 1
                        WHEN tgt.type == "Post" THEN
                             s.@post_cnt += 1
             END;

    PRINT  Result[Result.@comment_cnt, Result.@post_cnt];
}
"""

## run interpret query
results = conn.runInterpretedQuery(query)

## print results
print(json.dumps(results, indent=1))

[
 {
  "Result": [
   {
    "v_id": "28587302323577",
    "v_type": "Person",
    "attributes": {
     "Result.@comment_cnt": 108,
     "Result.@post_cnt": 51
    }
   }
  ]
 }
]


### Example 3. Left-directed Edge Pattern of Example 2

Solve the same problem as in Example 2, but use a left-directed edge pattern.

Note in the FROM clause below that the source vertex types are now Comment and Post, and the target is Person.

In [9]:
query = """
INTERPRET QUERY() FOR GRAPH ldbc_snb SYNTAX v2{
    SumAccum<int> @comment_cnt= 0;
    SumAccum<int> @post_cnt= 0;

    Result = SELECT tgt
             FROM Person:tgt -(<Likes_Reverse)- (Comment|Post):src
             WHERE tgt.first_name == "Viktor" AND tgt.last_name == "Akhiezer"
             ACCUM CASE WHEN src.type == "Comment" THEN
                             tgt.@comment_cnt += 1
                        WHEN src.type == "Post" THEN
                             tgt.@post_cnt += 1
             END;

    PRINT Result[Result.@comment_cnt, Result.@post_cnt];
}
"""

## run interpret query
results = conn.runInterpretedQuery(query)

## print results
print(json.dumps(results, indent=1))

[
 {
  "Result": [
   {
    "v_id": "28587302323577",
    "v_type": "Person",
    "attributes": {
     "Result.@comment_cnt": 108,
     "Result.@post_cnt": 51
    }
   }
  ]
 }
]


### Example 4. Disjunctive 1-hop edge pattern.

Find Viktor Akhiezer’s total number of related comments and total number of related posts. A comment or post is related if it is either created by Viktor or liked by Viktor. Note that the Has_Creator edge type starts from Comment|Post, and the Likes edge type starts from Person.

In [10]:
query = """
INTERPRET QUERY() FOR GRAPH ldbc_snb SYNTAX v2 {
    SumAccum<int> @comment_cnt= 0;
    SumAccum<int> @post_cnt= 0;

    Result = SELECT tgt
             FROM Person:tgt -(<Has_Creator|Likes>)- (Comment|Post):src
             WHERE tgt.first_name == "Viktor" AND tgt.last_name == "Akhiezer"
             ACCUM CASE WHEN src.type == "Comment" THEN
                             tgt.@comment_cnt += 1
                        WHEN src.type == "Post" THEN
                             tgt.@post_cnt += 1
             END;

    PRINT Result[Result.@comment_cnt, Result.@post_cnt];
}
"""

## run interpret query
results = conn.runInterpretedQuery(query)

## print results
print(json.dumps(results, indent=1))

[
 {
  "Result": [
   {
    "v_id": "28587302323577",
    "v_type": "Person",
    "attributes": {
     "Result.@comment_cnt": 152,
     "Result.@post_cnt": 96
    }
   }
  ]
 }
]


### Example 5. Disjunctive 1-hop edge pattern.

Find the total number of comments or posts related to "Viktor Akhiezer". This time we count them together, and we use the wildcard _ to represent the two edge types Has_Creator and Likes_Reverse, both of which point from Comment|Post to Person.

In [11]:
query = """
INTERPRET QUERY() FOR GRAPH ldbc_snb SYNTAX v2 {
    SumAccum<int> @@cnt= 0;

    Result = SELECT tgt
             FROM Person:tgt -(<_)- (Comment|Post):src
             WHERE tgt.first_name == "Viktor" AND tgt.last_name == "Akhiezer"
             ACCUM  @@cnt += 1;

    PRINT @@cnt;
}
"""

## run interpret query
results = conn.runInterpretedQuery(query)

## print results
print(json.dumps(results, indent=1))

[
 {
  "@@cnt": 248
 }
]


## Repeating a 1-Hop Pattern

A common pattern is the two-step "Friend of a Friend".

This is related to the question "Do I know any famous people?" Even if you aren’t friends with any famous people, at least one of your friends' friends might be famous. That’s a one-hop pattern, repeated twice.

In terms of data throughput on a network, you can also ask "If everyone who receives a message passes it along to everyone else they know, how many entities will receive it?"

GSQL pattern matching makes it easy to express such variable-length patterns which repeat a single hop. Everything else stays the same as introduced in the previous section, except we append an asterisk (or Kleene star) and an optional min..max range to an edge pattern.

(E*) means edge type E repeats any number of times (including zero)

(E*1..3) means edge type E occurs one to three times.
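
To make the repetition bounds concrete, here is a plain-Python sketch that walks a tiny directed graph hop by hop and collects the vertices reached within a given min..max range. The hierarchy is the `Is_Subclass_Of` chain that appears in the example outputs of this section; the function `reachable` is hypothetical and only illustrates the bounds, not how the GSQL engine evaluates the pattern.

```python
# Is_Subclass_Of chain taken from the example outputs in this section.
SUPERCLASS = {
    "TennisPlayer": ["Athlete"],
    "Athlete": ["Person"],
    "Person": ["Agent"],
    "Agent": ["Thing"],
}


def reachable(edges, start, min_hops=0, max_hops=None):
    """Return vertices reached from start in min_hops..max_hops directed hops."""
    if max_hops is None:
        # Unbounded: cap at the vertex count so the loop always terminates.
        vertices = set(edges) | {t for targets in edges.values() for t in targets}
        max_hops = len(vertices)
    frontier = {start}
    found = {start} if min_hops == 0 else set()
    for hop in range(1, max_hops + 1):
        frontier = {t for v in frontier for t in edges.get(v, [])}
        if not frontier:
            break
        if hop >= min_hops:
            found |= frontier
    return found


print(sorted(reachable(SUPERCLASS, "TennisPlayer", 1, 2)))  # (E*1..2)
# ['Athlete', 'Person']
print(sorted(reachable(SUPERCLASS, "TennisPlayer", 0, 2)))  # (E*..2)
# ['Athlete', 'Person', 'TennisPlayer']
```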

### Example 1. Directed Edge Pattern Unconstrained Repetition

Find the direct or indirect superclass (including the self class) of the tag_class whose name is "TennisPlayer".

In [12]:
query = """
INTERPRET QUERY() FOR GRAPH ldbc_snb SYNTAX v2 {
    tag_class1 = SELECT t
                 FROM Tag_Class:s - (Is_Subclass_Of>*) - Tag_Class:t
                 WHERE s.name == "TennisPlayer";

    PRINT  tag_class1;
}
"""

## run interpret query
results = conn.runInterpretedQuery(query)

## print results
print(json.dumps(results, indent=1))

[
 {
  "tag_class1": [
   {
    "v_id": "211",
    "v_type": "Tag_Class",
    "attributes": {
     "id": 211,
     "name": "Person",
     "url": "http://dbpedia.org/ontology/Person"
    }
   },
   {
    "v_id": "239",
    "v_type": "Tag_Class",
    "attributes": {
     "id": 239,
     "name": "Agent",
     "url": "http://dbpedia.org/ontology/Agent"
    }
   },
   {
    "v_id": "149",
    "v_type": "Tag_Class",
    "attributes": {
     "id": 149,
     "name": "Athlete",
     "url": "http://dbpedia.org/ontology/Athlete"
    }
   },
   {
    "v_id": "59",
    "v_type": "Tag_Class",
    "attributes": {
     "id": 59,
     "name": "TennisPlayer",
     "url": "http://dbpedia.org/ontology/TennisPlayer"
    }
   },
   {
    "v_id": "0",
    "v_type": "Tag_Class",
    "attributes": {
     "id": 0,
     "name": "Thing",
     "url": "http://www.w3.org/2002/07/owl#Thing"
    }
   }
  ]
 }
]


### Example 2. Exactly 1 Repetition of A Directed Edge

Find the immediate superclass of the tag_class whose name is "TennisPlayer". (This is equivalent to a 1-hop non-repeating pattern.)


In [13]:
query = """
INTERPRET QUERY() FOR GRAPH ldbc_snb SYNTAX v2 {

    tag_class1 =  SELECT t
        FROM Tag_Class:s - (Is_Subclass_Of>*1) - Tag_Class:t
        WHERE s.name == "TennisPlayer";

    PRINT tag_class1;
}
"""

## run interpret query
results = conn.runInterpretedQuery(query)

## print results
print(json.dumps(results, indent=1))

[
 {
  "tag_class1": [
   {
    "v_id": "149",
    "v_type": "Tag_Class",
    "attributes": {
     "id": 149,
     "name": "Athlete",
     "url": "http://dbpedia.org/ontology/Athlete"
    }
   }
  ]
 }
]


### Example 3. 1 to 2 Repetition Of A Directed Edge.

Find the direct and indirect superclasses within 1 to 2 hops of the tag_class whose name is "TennisPlayer".

In [14]:
query = """
INTERPRET QUERY() FOR GRAPH ldbc_snb SYNTAX v2 {

    tag_class1 =  SELECT t
                  FROM Tag_Class:s - (Is_Subclass_Of>*1..2) - Tag_Class:t
                  WHERE s.name == "TennisPlayer";

    PRINT tag_class1;
}
"""

## run interpret query
results = conn.runInterpretedQuery(query)

## print results
print(json.dumps(results, indent=1))

[
 {
  "tag_class1": [
   {
    "v_id": "211",
    "v_type": "Tag_Class",
    "attributes": {
     "id": 211,
     "name": "Person",
     "url": "http://dbpedia.org/ontology/Person"
    }
   },
   {
    "v_id": "149",
    "v_type": "Tag_Class",
    "attributes": {
     "id": 149,
     "name": "Athlete",
     "url": "http://dbpedia.org/ontology/Athlete"
    }
   }
  ]
 }
]


### Example 4. Up to 2 Repetitions Of A Directed Edge.

Find the superclasses within 2 hops of the tag_class whose name is "TennisPlayer". Because the lower bound defaults to zero, the class itself is included.

In [15]:
query = """
INTERPRET QUERY() FOR GRAPH ldbc_snb SYNTAX v2 {
    tag_class1 = SELECT t
                 FROM Tag_Class:s - (Is_Subclass_Of>*..2) - Tag_Class:t
                 WHERE s.name == "TennisPlayer";

    PRINT  tag_class1;
}
"""

## run interpret query
results = conn.runInterpretedQuery(query)

## print results
print(json.dumps(results, indent=1))

[
 {
  "tag_class1": [
   {
    "v_id": "211",
    "v_type": "Tag_Class",
    "attributes": {
     "id": 211,
     "name": "Person",
     "url": "http://dbpedia.org/ontology/Person"
    }
   },
   {
    "v_id": "149",
    "v_type": "Tag_Class",
    "attributes": {
     "id": 149,
     "name": "Athlete",
     "url": "http://dbpedia.org/ontology/Athlete"
    }
   },
   {
    "v_id": "59",
    "v_type": "Tag_Class",
    "attributes": {
     "id": 59,
     "name": "TennisPlayer",
     "url": "http://dbpedia.org/ontology/TennisPlayer"
    }
   }
  ]
 }
]


### Example 5. At Least 1 Repetition Of A Directed Edge.

Find the superclasses at least one hop from the tag_class whose name is "TennisPlayer".

In [16]:
query = """
INTERPRET QUERY() FOR GRAPH ldbc_snb SYNTAX v2 {

    tag_class1 = SELECT t
                 FROM Tag_Class:s - (Is_Subclass_Of>*1..) - Tag_Class:t
                 WHERE s.name == "TennisPlayer";

    PRINT tag_class1;
}
"""

## run interpret query
results = conn.runInterpretedQuery(query)

## print results
print(json.dumps(results, indent=1))

[
 {
  "tag_class1": [
   {
    "v_id": "211",
    "v_type": "Tag_Class",
    "attributes": {
     "id": 211,
     "name": "Person",
     "url": "http://dbpedia.org/ontology/Person"
    }
   },
   {
    "v_id": "149",
    "v_type": "Tag_Class",
    "attributes": {
     "id": 149,
     "name": "Athlete",
     "url": "http://dbpedia.org/ontology/Athlete"
    }
   },
   {
    "v_id": "239",
    "v_type": "Tag_Class",
    "attributes": {
     "id": 239,
     "name": "Agent",
     "url": "http://dbpedia.org/ontology/Agent"
    }
   },
   {
    "v_id": "0",
    "v_type": "Tag_Class",
    "attributes": {
     "id": 0,
     "name": "Thing",
     "url": "http://www.w3.org/2002/07/owl#Thing"
    }
   }
  ]
 }
]


### Example 6. Disjunctive 1-Repetition Directed Edge.

Find the 3 most recent comments that are liked or created by Viktor Akhiezer and the total number of comments liked or created by the same person.

In [17]:
query = """
INTERPRET QUERY() FOR GRAPH ldbc_snb SYNTAX v2 {
    SumAccum<INT> @@comment_cnt = 0;

    // find the 3 latest comments that are liked or created by Viktor Akhiezer
    // and the total number of comments related to Viktor Akhiezer
    top_3_comments = SELECT p
                     FROM Person:s - ((<Has_Creator|Likes>)*1) - Comment:p
                     WHERE s.first_name == "Viktor" AND s.last_name == "Akhiezer"
                     ACCUM @@comment_cnt += 1
                     ORDER BY p.creation_date DESC
                     LIMIT 3;

    PRINT top_3_comments;
    // total number of comments related to Viktor Akhiezer
    PRINT @@comment_cnt;
}
"""

## run interpret query
results = conn.runInterpretedQuery(query)

## print results
print(json.dumps(results, indent=1))

[
 {
  "top_3_comments": [
   {
    "v_id": "2061584720640",
    "v_type": "Comment",
    "attributes": {
     "id": 2061584720640,
     "creation_date": "2012-09-06 06:46:31",
     "location_ip": "194.62.64.117",
     "browser_used": "Chrome",
     "content": "fine",
     "length": 4
    }
   },
   {
    "v_id": "2061590804929",
    "v_type": "Comment",
    "attributes": {
     "id": 2061590804929,
     "creation_date": "2012-09-04 16:16:56",
     "location_ip": "194.62.64.117",
     "browser_used": "Chrome",
     "content": "About Muttiah Muralitharan, mit by nine degrees, five degrees being thAbout Steve M",
     "length": 83
    }
   },
   {
    "v_id": "2061586872389",
    "v_type": "Comment",
    "attributes": {
     "id": 2061586872389,
     "creation_date": "2012-08-28 14:54:46",
     "location_ip": "31.216.177.175",
     "browser_used": "Chrome",
     "content": "About Hector Berlioz, his compositions Symphonie fantastique and GraAbout Who Knew, the gu",
     "length": 90
    

## Multiple Hop Patterns and Accumulation

Repeating the same hop is sometimes useful, but the real power of pattern matching comes from expressing multi-hop patterns with specific characteristics for each hop. A 2-hop pattern is a simple concatenation and merging of two 1-hop patterns that share a common endpoint. Similarly, a 3-hop pattern concatenates three 1-hop patterns in sequence, with each pair of adjacent hops sharing one endpoint. A multi-hop pattern has two endpoint vertex sets and one or more intermediate vertex sets. If the query does not need to express any conditions on an intermediate vertex set, that vertex set can be omitted and the two surrounding edge sets can be joined with a simple period (.).

### POST-ACCUM example 

At the end of the ACCUM clause, all the requested accumulation (+=) operators are processed in bulk, and the updated values are now visible. You can now use POST-ACCUM clauses to perform a second, different round of computation on the results of your pattern matching.

The ACCUM clause executes for each full path that matches the pattern in the FROM clause. In contrast, the POST-ACCUM clause executes for each vertex in one vertex set (e.g. one vertex column in the matching table); its statements can access the aggregated accumulator result computed in the ACCUM clause. If you want to perform per-vertex updates for more than one vertex alias, you should use a separate POST-ACCUM clause for each vertex alias. The multiple POST-ACCUM clauses are processed in parallel; it doesn’t matter in what order you write them. (For each binding, the statements within a clause are executed in order.)

For example, below we have two POST-ACCUM clauses. The first one iterates through s, and for each s, we do s.@cnt2 += s.@cnt1. The second POST-ACCUM iterates through t.

In [18]:
query = """
INTERPRET QUERY() FOR GRAPH ldbc_snb SYNTAX v2 {

    SumAccum<int> @cnt1;
    SumAccum<int> @cnt2;

    R = SELECT s
        FROM Person:s-(Likes>) -:msg - (Has_Creator>)-Person:t
        WHERE s.first_name == "Viktor" AND 
              s.last_name == "Akhiezer" AND 
              t.last_name LIKE "S%" AND 
              year(msg.creation_date) == 2012
        ACCUM s.@cnt1 +=1 //execute this per match of the FROM pattern.
        POST-ACCUM s.@cnt2 += s.@cnt1 //execute once per s.
        POST-ACCUM t.@cnt2 +=1;//execute once per t

    PRINT R [R.first_name, R.last_name, R.@cnt1, R.@cnt2];
}
"""

## run interpret query
results = conn.runInterpretedQuery(query)

## print results
print(json.dumps(results, indent=1))

[
 {
  "R": [
   {
    "v_id": "28587302323577",
    "v_type": "Person",
    "attributes": {
     "R.first_name": "Viktor",
     "R.last_name": "Akhiezer",
     "R.@cnt1": 3,
     "R.@cnt2": 3
    }
   }
  ]
 }
]
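
The division of labor between ACCUM and POST-ACCUM in the query above can be mimicked in plain Python. The sketch below is only an analogy under stated assumptions: the match table is hypothetical (three liked messages by three different authors), and `run_accum` is not a TigerGraph API. It shows why `@cnt1` and `@cnt2` both end up as 3 for the source vertex.

```python
from collections import defaultdict


def run_accum(matches):
    """Mimic ACCUM / POST-ACCUM over a table of (s, msg, t) matches."""
    cnt1 = defaultdict(int)
    cnt2 = defaultdict(int)

    # ACCUM: runs once per matched path (one row of the match table).
    for s, _msg, _t in matches:
        cnt1[s] += 1

    # First POST-ACCUM: runs once per distinct s and sees the final cnt1.
    for s in {row[0] for row in matches}:
        cnt2[s] += cnt1[s]

    # Second POST-ACCUM: runs once per distinct t.
    for t in {row[2] for row in matches}:
        cnt2[t] += 1

    return dict(cnt1), dict(cnt2)


# Hypothetical match table, not data read from the graph.
matches = [
    ("Viktor", "msg_a", "Santos"),
    ("Viktor", "msg_b", "Seppala"),
    ("Viktor", "msg_c", "Singh"),
]
cnt1, cnt2 = run_accum(matches)
print(cnt1["Viktor"], cnt2["Viktor"])  # 3 3
```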


### Example 1. Succinct Representation Of Multiple-hop Pattern

Find the 3rd superclass of the Tag class whose name is "TennisPlayer".

In [19]:
query = """
INTERPRET QUERY() FOR GRAPH ldbc_snb SYNTAX v2 {

    Tag_Class1 =
        SELECT t
        FROM Tag_Class:s-(Is_Subclass_Of>.Is_Subclass_Of>.Is_Subclass_Of>)-Tag_Class:t
        WHERE s.name == "TennisPlayer";

    PRINT Tag_Class1;
}
"""

## run interpret query
results = conn.runInterpretedQuery(query)

## print results
print(json.dumps(results, indent=1))

[
 {
  "Tag_Class1": [
   {
    "v_id": "239",
    "v_type": "Tag_Class",
    "attributes": {
     "id": 239,
     "name": "Agent",
     "url": "http://dbpedia.org/ontology/Agent"
    }
   }
  ]
 }
]


### Example 2. Disjunction in a Succinct Representation of a Multiple-hop Pattern

Find the continents in which the 3 most recent messages of January 2011 were created.

In [20]:
query = """
INTERPRET QUERY() FOR GRAPH ldbc_snb SYNTAX v2 {

    SumAccum<String> @continent_name;

    acc_msg_continent =
        SELECT s
        FROM (Comment|Post):s-(Is_Located_In>.Is_Part_Of>)-Continent:t
        WHERE year(s.creation_date) == 2011 AND month(s.creation_date) == 1
        ACCUM s.@continent_name = t.name
        ORDER BY s.creation_date DESC
        LIMIT 3;

    PRINT acc_msg_continent;
}
"""

## run interpret query
results = conn.runInterpretedQuery(query)

## print results
print(json.dumps(results, indent=1))

[
 {
  "acc_msg_continent": [
   {
    "v_id": "824634837528",
    "v_type": "Post",
    "attributes": {
     "id": 824634837528,
     "image_file": "",
     "creation_date": "2011-01-31 23:58:03",
     "location_ip": "87.251.6.121",
     "browser_used": "Internet Explorer",
     "lang": "tk",
     "content": "About Adolf Hitler, iews. His writings and methods were often adapted to need and circumstance, although there were",
     "length": 115,
     "@continent_name": "Asia"
    }
   },
   {
    "v_id": "824636727408",
    "v_type": "Comment",
    "attributes": {
     "id": 824636727408,
     "creation_date": "2011-01-31 23:57:46",
     "location_ip": "31.2.225.17",
     "browser_used": "Firefox",
     "content": "thx",
     "length": 3,
     "@continent_name": "Europe"
    }
   },
   {
    "v_id": "824640012997",
    "v_type": "Comment",
    "attributes": {
     "id": 824640012997,
     "creation_date": "2011-01-31 23:54:28",
     "location_ip": "27.112.21.246",
     "browser_used": 

### Example 3. Multiple-hop Pattern With Accumulator Applied To All Matched Paths

Find Viktor Akhiezer’s favorite authors of 2012 whose last names begin with the letter 'S'.
Also find how many likes Viktor has given to each author’s posts or comments.

In [21]:
query = """
INTERPRET QUERY() FOR GRAPH ldbc_snb SYNTAX v2 {
    SumAccum<int> @likes_cnt;

    favorite_authors =
        SELECT t
        FROM Person:s-(Likes>) -:msg - (Has_Creator>)-Person:t
        WHERE s.first_name == "Viktor" AND 
              s.last_name == "Akhiezer" AND 
              t.last_name LIKE "S%" AND 
              year(msg.creation_date) == 2012
        ACCUM t.@likes_cnt +=1;

    PRINT favorite_authors[favorite_authors.first_name, favorite_authors.last_name, favorite_authors.@likes_cnt];
}
"""

## run interpret query
results = conn.runInterpretedQuery(query)

## print results
print(json.dumps(results, indent=1))

[
 {
  "favorite_authors": [
   {
    "v_id": "15393162796846",
    "v_type": "Person",
    "attributes": {
     "favorite_authors.first_name": "Mario",
     "favorite_authors.last_name": "Santos",
     "favorite_authors.@likes_cnt": 1
    }
   },
   {
    "v_id": "2199023260091",
    "v_type": "Person",
    "attributes": {
     "favorite_authors.first_name": "Janne",
     "favorite_authors.last_name": "Seppala",
     "favorite_authors.@likes_cnt": 1
    }
   },
   {
    "v_id": "8796093025410",
    "v_type": "Person",
    "attributes": {
     "favorite_authors.first_name": "Priyanka",
     "favorite_authors.last_name": "Singh",
     "favorite_authors.@likes_cnt": 1
    }
   }
  ]
 }
]


## Multi-Block Queries
###  Example 1: Find Viktor Akhiezer’s liked messages whose authors' last names begin with S. Find these authors' alumni count.

In [22]:
query = """
INTERPRET QUERY() FOR GRAPH ldbc_snb SYNTAX v2 {
    SumAccum<int> @@cnt;
    // a computed vertex set F is used to constrain the second pattern.
    F = SELECT t
        FROM  :s -(Likes>)- :msg -(Has_Creator>)- :t
        WHERE s.first_name == "Viktor" AND 
              s.last_name == "Akhiezer" AND 
              t.last_name LIKE "S%";
              
    Alumni = SELECT p
             FROM Person:p -(Study_At>) -:u - (<Study_At)- F:s
             WHERE s != p
             Per (p)
             POST-ACCUM @@cnt+=1;

    PRINT @@cnt;
}
"""

## run interpret query
results = conn.runInterpretedQuery(query)


## print results
print(json.dumps(results, indent=1))

[
 {
  "@@cnt": 223
 }
]
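
The two-block structure of the query above, where the computed set F seeds the second pattern, can be traced with ordinary Python sets. This is a sketch on hypothetical data: `count_alumni` and the `study_at` mapping are illustrative, and it assumes that PER (p) makes the counter increase once per distinct p.

```python
def count_alumni(study_at, seed):
    """Count distinct persons sharing a university with some other seed person.

    study_at maps each person to the set of universities attended;
    seed plays the role of the computed vertex set F.
    """
    alumni = set()
    for s in seed:
        for p, universities in study_at.items():
            # WHERE s != p, and both must reach a common university u.
            if p != s and universities & study_at.get(s, set()):
                alumni.add(p)
    return len(alumni)


# Hypothetical data, not taken from the LDBC graph.
study_at = {
    "a": {"u1"},
    "b": {"u1"},
    "c": {"u2"},
    "d": {"u1", "u2"},
}
print(count_alumni(study_at, {"a"}))       # 2
print(count_alumni(study_at, {"a", "c"}))  # 2
```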


###  Example 2: Find Viktor Akhiezer’s liked posts' authors A, and his liked comments' authors B. Count the universities that members from groups A and B studied at.


In [23]:
query = """
INTERPRET QUERY() FOR GRAPH ldbc_snb SYNTAX v2 {
    SumAccum<int> @@cnt;
// A and B are used to constrain the third pattern.
    A = SELECT t
        FROM :s -(Likes>:e1)- Post:msg -(Has_Creator>)- :t
        WHERE s.first_name == "Viktor" AND s.last_name == "Akhiezer" ;


    B = SELECT t
        FROM :s -(Likes>:e1)- Comment:msg -(Has_Creator>)- :t
        WHERE s.first_name == "Viktor" AND s.last_name == "Akhiezer" ;

    Univ = SELECT u
           FROM A:p -(Study_At>) -:u - (<Study_At)- B:s
           WHERE s != p
           Per (u)
           POST-ACCUM @@cnt+=1;

    PRINT @@cnt;
}
"""

## run interpret query
results = conn.runInterpretedQuery(query)

## print results
print(json.dumps(results, indent=1))

[
 {
  "@@cnt": 4
 }
]


###  Example 3. Find Viktor Akhiezer’s liked posts' authors A. See how many pairs of persons exist in A such that one person likes a message authored by another person.

In [24]:
query = """
INTERPRET QUERY() FOR GRAPH ldbc_snb SYNTAX v2 {
// a computed vertex set A is used twice in the second pattern.
    SumAccum<int> @@cnt;

    A = SELECT t
        FROM :s -(Likes>:e1)- Post:msg -(Has_Creator>)- :t
        WHERE s.first_name == "Viktor" AND s.last_name == "Akhiezer" ;

    A = SELECT p
        FROM A:p -(Likes>) -:msg - (Has_Creator>) - A:p2
        WHERE p2 != p
        Per (p, p2)
        ACCUM @@cnt +=1;

  PRINT @@cnt;
} 
"""

## run interpret query
results = conn.runInterpretedQuery(query)

## print results
print(json.dumps(results, indent=1))

[
 {
  "@@cnt": 8341
 }
]


###  Example 4. Find how many messages are created and liked by the same person whose first name begins with the letter T.

In [25]:
query = """
INTERPRET QUERY() FOR GRAPH ldbc_snb SYNTAX v2 {
// the same alias is used twice in a pattern
    SumAccum<int> @@cnt;

    A = SELECT msg
        FROM :s -(Likes>:e1)- :msg -(Has_Creator>)- :s
        WHERE s.first_name LIKE "T%"
        PER (msg)
        ACCUM @@cnt +=1;

  PRINT @@cnt;
}
"""

## run interpret query
results = conn.runInterpretedQuery(query)

## print results
print(json.dumps(results, indent=1))

[
 {
  "@@cnt": 207
 }
]


To verify this further, we pick one message from the query result above and check whether its creator also likes it.

In [26]:
query = """
INTERPRET QUERY() FOR GRAPH ldbc_snb SYNTAX v2 {
    R = SELECT s
        FROM :msg -(Has_Creator>)- :s
        WHERE msg.id == 1374390714042;

    T = SELECT s
        FROM R:s -(Likes>)- :msg
        WHERE msg.id == 1374390714042;

    PRINT R;
    PRINT T;
}
"""

## run interpret query
results = conn.runInterpretedQuery(query)

## print results
print(json.dumps(results, indent=1))

[
 {
  "R": [
   {
    "v_id": "13194139533433",
    "v_type": "Person",
    "attributes": {
     "id": 13194139533433,
     "first_name": "Taras",
     "last_name": "Kofler",
     "gender": "female",
     "birthday": "1985-11-26 00:00:00",
     "creation_date": "2011-01-29 01:14:27",
     "location_ip": "31.131.28.133",
     "browser_used": "Internet Explorer",
     "speaks": [
      "uk",
      "ro",
      "en"
     ],
     "email": [
      "Taras13194139533433@gmail.com",
      "Taras13194139533433@yahoo.com"
     ]
    }
   }
  ]
 },
 {
  "T": [
   {
    "v_id": "13194139533433",
    "v_type": "Person",
    "attributes": {
     "id": 13194139533433,
     "first_name": "Taras",
     "last_name": "Kofler",
     "gender": "female",
     "birthday": "1985-11-26 00:00:00",
     "creation_date": "2011-01-29 01:14:27",
     "location_ip": "31.131.28.133",
     "browser_used": "Internet Explorer",
     "speaks": [
      "uk",
      "ro",
      "en"
     ],
     "email": [
      "Taras13194

## Example - A recommender

We have demonstrated the basic pattern match syntax. You should fully understand the basics by this point. In this section, we show two end-to-end solutions using the pattern match syntax.
In this example, we want to recommend some messages (comments or posts) to the person Viktor Akhiezer.

How do we do this?

One way is to find others who like the same messages Viktor likes, then recommend the messages that Others like but Viktor has not seen. The pattern can be sketched out as follows:

Viktor - (Likes>) - Message - (<Likes) - Others

Others - (Likes>) - NewMessage

Recommend NewMessage to Viktor

However, this is too granular. We are overfitting the message-level data with a collaborative filtering algorithm.

Intuitively, two persons are similar to each other when their "liked" messages fall into the same category - here represented by the set of tags attached to each message.

As a result, one way to avoid overfitting is to go one level up. Instead of looking at common messages, we look at their tags. We consider Person A and Person B similar if they like messages that carry the same tag. This scheme mitigates the overfitting problem. In pattern match vocabulary, we have:

Viktor - (Likes>) - Message - (Has>) - Tag - (<Has) - Message - (<Likes) - Others

Others - (Likes>) - NewMessage

Recommend NewMessage to Viktor
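
To make the tag-level scheme concrete, here is a minimal plain-Python sketch of the same two steps on a tiny made-up data set. The names likes, tags, and recommend are hypothetical and are not part of the LDBC data or pyTigerGraph. The scoring (one count per matched path, then log(1 + count)) is an assumption modeled on the GSQL query that follows, not a replacement for it.

```python
import math
from collections import defaultdict

# Hypothetical toy data: who likes which message, and the tags on each message.
likes = {
    "Viktor": {"m1", "m2"},
    "Ann": {"m3", "m4"},
    "Bob": {"m4", "m5"},
}
tags = {
    "m1": {"history"},
    "m2": {"music"},
    "m3": {"history", "music"},
    "m4": {"travel"},
    "m5": {"music"},
}


def recommend(person, likes, tags, limit=2):
    """Rank messages liked by persons who share tags with `person`."""
    liked = likes[person]

    # Step 1: count one match per path
    #   person -(Likes)-> msg -(Has)-> tag <-(Has)- msg2 <-(Likes)- other
    tag_in_common = defaultdict(int)
    for msg in liked:
        for tag in tags[msg]:
            for other, other_likes in likes.items():
                for msg2 in other_likes:
                    if tag in tags[msg2]:
                        tag_in_common[other] += 1

    # Dampen large counts with a log, as the query's POST-ACCUM does.
    similarity = {p: math.log(1 + n) for p, n in tag_in_common.items()}

    # Step 2: every message liked by a similar person, and not already liked
    # by `person`, accumulates that person's similarity score.
    rank = defaultdict(float)
    for other, score in similarity.items():
        for msg in likes[other]:
            if msg not in liked:
                rank[msg] += score

    # Highest rank first; ties are broken by message id for determinism.
    return sorted(rank.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


print(recommend("Viktor", likes, tags))
```

Note that the person being matched also matches the pattern himself. This is harmless, because his own messages are all filtered out in step 2.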

This time, we create a named, parameterized query first and then run it by calling the query name with parameters.
Installing the query with INSTALL QUERY queryName compiles it, which improves performance. The cell below creates and installs recommend_message in one step.

In [27]:
query = """
USE GRAPH ldbc_snb

CREATE QUERY recommend_message (STRING fn, STRING ln) SYNTAX v2{

  SumAccum<int> @tag_in_common;
  SumAccum<float> @similarity_score;
  SumAccum<float> @rank;
  OrAccum @Liked = false;

   // 1. mark messages liked by Viktor
   // 2. calculate a log similarity score for each person who shares the same
   //   interests at the Tag level.
    Others =
       SELECT p
       FROM Person:s-(Likes>)-:msg - (Has_Tag>.<Has_Tag.<Likes)- :p
       WHERE s.first_name == fn AND s.last_name == ln
       ACCUM msg.@Liked = true, p.@tag_in_common +=1
       POST-ACCUM p.@similarity_score = log (1 + p.@tag_in_common);

    // recommend new messages to Viktor that have not been liked by him.
    recommended_message =
             SELECT msg
             FROM Others:o-(Likes>) - :msg
             WHERE  msg.@Liked == false
             ACCUM msg.@rank +=o.@similarity_score
             ORDER BY msg.@rank DESC
             LIMIT 2;

  PRINT recommended_message[recommended_message.content, recommended_message.@rank];
}

// install query
INSTALL QUERY recommend_message

"""

## create and install query
createAndInstall = conn.gsql(query)

In [28]:
params1 = {
    "fn": "Viktor",
    "ln": "Akhiezer"
}

## run installed query
results = conn.runInstalledQuery("recommend_message", params1)

## print results
print(json.dumps(results, indent=1))

[
 {
  "recommended_message": [
   {
    "v_id": "549760294602",
    "v_type": "Post",
    "attributes": {
     "recommended_message.content": "About Indira Gandhi, Gandhi established closer relatAbout Mick Jagger, eer of the band. In 1989, he waAbout Ho Chi Minh, ce Unit and ECA International, About Ottoman Empire,  After t",
     "recommended_message.@rank": 4855.49561
    }
   },
   {
    "v_id": "549760292109",
    "v_type": "Post",
    "attributes": {
     "recommended_message.content": "About Ho Chi Minh, nam, as an anti-communist state, fought against the communisAbout Shiny Happy People, sale in the U.",
     "recommended_message.@rank": 4828.72607
    }
   }
  ]
 }
]
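
The returned JSON nests each vertex's printed attributes under keys prefixed with the vertex set name. As a minimal sketch, the hypothetical helper flatten_vertex_set below turns that structure into a flat list of rows. The sample literal reuses the vertex ids and ranks from the output above, with the content strings abbreviated.

```python
def flatten_vertex_set(results, key="recommended_message"):
    """Flatten a printed vertex set into a list of plain dict rows."""
    rows = []
    for block in results:
        for vertex in block.get(key, []):
            row = {"v_id": vertex["v_id"], "v_type": vertex["v_type"]}
            for name, value in vertex["attributes"].items():
                # "recommended_message.@rank" -> "@rank"
                row[name.split(".", 1)[-1]] = value
            rows.append(row)
    return rows


# Abbreviated copy of the output above (content strings shortened).
sample = [
    {
        "recommended_message": [
            {
                "v_id": "549760294602",
                "v_type": "Post",
                "attributes": {
                    "recommended_message.content": "About Indira Gandhi, ...",
                    "recommended_message.@rank": 4855.49561,
                },
            },
            {
                "v_id": "549760292109",
                "v_type": "Post",
                "attributes": {
                    "recommended_message.content": "About Ho Chi Minh, ...",
                    "recommended_message.@rank": 4828.72607,
                },
            },
        ]
    }
]

for row in flatten_vertex_set(sample):
    print(row["v_id"], row["@rank"])
```

Since pandas is already imported in this notebook, the rows can be passed to pd.DataFrame(rows) for tabular display.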


In [29]:
params2 = {
    "fn": "Adriaan",
    "ln": "Jong"
}

## run installed query
results = conn.runInstalledQuery("recommend_message", params2)

## print results
print(json.dumps(results, indent=1))

[
 {
  "recommended_message": [
   {
    "v_id": "549760294602",
    "v_type": "Post",
    "attributes": {
     "recommended_message.content": "About Indira Gandhi, Gandhi established closer relatAbout Mick Jagger, eer of the band. In 1989, he waAbout Ho Chi Minh, ce Unit and ECA International, About Ottoman Empire,  After t",
     "recommended_message.@rank": 6213.12891
    }
   },
   {
    "v_id": "549760292109",
    "v_type": "Post",
    "attributes": {
     "recommended_message.content": "About Ho Chi Minh, nam, as an anti-communist state, fought against the communisAbout Shiny Happy People, sale in the U.",
     "recommended_message.@rank": 6179.06299
    }
   }
  ]
 }
]


### NOTE

Increase the query timeout threshold if a query fails to finish. A query that runs longer than the default threshold (60 seconds) is aborted with a message like this one:

"The query won't finish because it exceeded the default query timeout threshold (60 seconds). Please check GSE log for license expiration and RESTPP/GPE log with request id (16842754.RESTPP_1_1.1662360937266.N) for details. Try increase RESTPP.Factory.DefaultQueryTimeoutSec or add header GSQL-TIMEOUT to override default system timeout."
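
One way to raise the limit for a single request from the notebook is sketched below. It assumes that runInstalledQuery in your pyTigerGraph version accepts a timeout argument in milliseconds, which the library uses to override the default for that call; check the documentation for your installed version. The wrapper name run_with_timeout and the 120-second default are hypothetical.

```python
def run_with_timeout(conn, query_name, params, timeout_sec=120):
    """Run an installed query with a per-request timeout given in seconds."""
    # Assumption: pyTigerGraph expects the timeout in milliseconds.
    timeout_ms = int(timeout_sec * 1000)
    return conn.runInstalledQuery(query_name, params=params, timeout=timeout_ms)


# Usage with the connection and parameters defined earlier in this notebook:
# results = run_with_timeout(conn, "recommend_message", params1, timeout_sec=300)
```

Changing RESTPP.Factory.DefaultQueryTimeoutSec instead raises the default for every request on the server, so the per-call override is the less invasive option while experimenting.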