### Write data files to disk

In [1]:
%%writefile person.csv
name,gender,age,state
Tom,male,40,ca
Dan,male,34,ny
Jenny,female,25,tx
Kevin,male,28,az
Amily,female,22,ca
Nancy,female,20,ky
Jack,male,26,fl

Overwriting person.csv


In [2]:
%%writefile friendship.csv
person1,person2,date
Tom,Dan,2017-06-03
Tom,Jenny,2015-01-01
Dan,Jenny,2016-08-03
Jenny,Amily,2015-06-08
Dan,Nancy,2016-01-03
Nancy,Jack,2017-03-02
Dan,Kevin,2015-12-30

Overwriting friendship.csv
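`%%writefile` is an IPython cell magic, so it only works inside Jupyter. Here is a minimal plain-Python sketch that writes the same two files from a script. The function name `write_data_files` is made up for illustration.

```python
from pathlib import Path

# Same rows as the %%writefile cells above
PERSON_CSV = """\
name,gender,age,state
Tom,male,40,ca
Dan,male,34,ny
Jenny,female,25,tx
Kevin,male,28,az
Amily,female,22,ca
Nancy,female,20,ky
Jack,male,26,fl
"""

FRIENDSHIP_CSV = """\
person1,person2,date
Tom,Dan,2017-06-03
Tom,Jenny,2015-01-01
Dan,Jenny,2016-08-03
Jenny,Amily,2015-06-08
Dan,Nancy,2016-01-03
Nancy,Jack,2017-03-02
Dan,Kevin,2015-12-30
"""

def write_data_files(directory="."):
    """Write person.csv and friendship.csv into `directory`, overwriting them."""
    out = Path(directory)
    person_path = out / "person.csv"
    friendship_path = out / "friendship.csv"
    person_path.write_text(PERSON_CSV)
    friendship_path.write_text(FRIENDSHIP_CSV)
    return person_path, friendship_path
```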


### Define a function to send and process the output of GSQL queries

In [3]:
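import json
import shlex
import subprocess

def gsql(query, options='-g social'):
    """Run a GSQL command; return parsed JSON, or print the plain-text output."""
    # shlex.split turns '-g social' into ['-g', 'social'] and '' into []
    cmd = ['gsql', *shlex.split(options), query]
    comp = subprocess.run(cmd, text=True, capture_output=True)

    try:
        # select and run query commands print JSON on stdout
        return json.loads(comp.stdout)
    except json.JSONDecodeError:
        # DDL and loading commands print text; expand literal backslash-n sequences
        print(comp.stdout.replace('\\n', '\n'))
        return None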
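The parsing logic in `gsql` can be checked without a TigerGraph install by splitting it into two pure functions. This is a sketch: the helper names `build_gsql_argv` and `parse_gsql_stdout` are hypothetical. Like `gsql`, it assumes query results arrive as JSON on stdout while other commands print plain text. It also passes each option as its own argv entry via `shlex.split`.

```python
import json
import shlex

def build_gsql_argv(query, options='-g social'):
    """Argument list for the gsql client; each option becomes its own argv entry."""
    return ['gsql', *shlex.split(options), query]

def parse_gsql_stdout(stdout):
    """Return the parsed JSON if stdout is valid JSON, otherwise None."""
    try:
        return json.loads(stdout)
    except json.JSONDecodeError:
        return None
```

Keeping the subprocess call out of these helpers makes them easy to unit test with captured output strings.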

### Clear everything with `drop all`

In [4]:
gsql('drop all', options='')

Dropping all, about 1 minute ...
Abort all active loading jobs
Try to abort all loading jobs on graph social, it may take a while ...
[ABORT_SUCCESS] No active Loading Job to abort.

Shutdown restpp gse gpe ...
Graph store /home/tigergraph/tigergraph/gstore/0/ has been cleared!
Everything is dropped.



### Create the graph schema

In [5]:
gsql('''
create vertex person (primary_id name string, name string, age int, 
                      gender string, state string)
                      
create undirected edge friendship (from person, to person, 
                                   connect_day datetime)

create graph social (person, friendship)
''', options='')

The vertex type person is created.
The edge type friendship is created.

Restarting gse gpe restpp ...

Finish restarting services in 25.633 seconds!
The graph social is created.
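Declaring `friendship` as `undirected` means it can be traversed from either endpoint. The sketch below is only a rough in-memory analogy, not how TigerGraph stores edges. It reads the friendship rows into a Python adjacency map and records each row in both directions. `build_adjacency` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict

FRIENDSHIP_CSV = """\
person1,person2,date
Tom,Dan,2017-06-03
Tom,Jenny,2015-01-01
Dan,Jenny,2016-08-03
Jenny,Amily,2015-06-08
Dan,Nancy,2016-01-03
Nancy,Jack,2017-03-02
Dan,Kevin,2015-12-30
"""

def build_adjacency(csv_text):
    """Map each person to the set of friends, treating every row as undirected."""
    adj = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        a, b = row["person1"], row["person2"]
        adj[a].add(b)  # the edge can be followed from person1 to person2 ...
        adj[b].add(a)  # ... and back, as with an undirected edge type
    return adj
```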



### Create a data loading job and run it

In [6]:
gsql('''
create loading job load_social for graph social {
    define filename file1="person.csv";
    define filename file2="friendship.csv";
    
    load file1 to vertex person values ($"name", $"name", $"age", $"gender", $"state") 
       using header="true", separator=",";
    
    load file2 to edge friendship values ($"person1", $"person2", $"date") 
       using header="true", separator=",";   
}

run loading job load_social
''')

The job load_social is created.
[Tip: Use "CTRL + C" to stop displaying the loading status update, then use "SHOW LOADING STATUS jobid" to track the loading progress again]
[Tip: Manage loading jobs with "ABORT/RESUME LOADING JOB jobid"]
Starting the following job, i.e.
  JobName: load_social, jobid: social.load_social.file.m1.1592354058496
  Loading log: '/home/tigergraph/tigergraph/logs/restpp/restpp_loader_logs/social/social.load_social.file.m1.1592354058496.log'

Job "social.load_social.file.m1.1592354058496" loading status
[RUNNING] m1 ( Finished: 0 / Total: 2 )
[2A[2KJob "social.load_social.file.m1.1592354058496" loading status
[2K[RUNNING] m1 ( Finished: 2 / Total: 2 )
  [LOADED]
  +---------------------------------------------------------------------------+
  |                       FILENAME |   LOADED LINES |   AVG SPEED |   DURATION|
  |    /home/tigergraph/person.csv |              8 |       7 l/s |     1.00 s|
  |/home/tigergraph/friendship.csv |              8 |       7
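The `values (...)` list in each `load` statement is positional. Its entries line up with the primary ID followed by the attributes in `create vertex` order, which is why `$"name"` appears twice: once for `primary_id name` and once for the `name` attribute. With `header="true"`, `$"column"` refers to a column by its header name.

The Python sketch below mimics that mapping for a few person rows with `csv.DictReader`. It keeps every value as a string, whereas the loader converts values to the declared attribute types (for example `age int`). `person_tuples` and `PERSON_VALUES` are illustrative names.

```python
import csv
import io

PERSON_CSV = """\
name,gender,age,state
Tom,male,40,ca
Dan,male,34,ny
Jenny,female,25,tx
"""

# Column references from the loading job, in vertex-definition order:
# (primary_id name, name, age, gender, state)
PERSON_VALUES = ["name", "name", "age", "gender", "state"]

def person_tuples(csv_text):
    """Yield one value tuple per data row; the header row only names the columns."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield tuple(row[col] for col in PERSON_VALUES)
```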

### Select queries return a native Python structure

In [7]:
q = gsql('select count(*) from person')
q

[{'count': 7, 'v_type': 'person'}]

In [8]:
q[0]['count']

7

### The `from` clause can be a pattern

In [9]:
gsql('select count() from person-(friendship)-person')

[{'count': 7, 'e_type': 'friendship'}]
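The pattern count of 7 equals the number of rows in friendship.csv. That suggests each undirected edge is counted once rather than once per direction. Here is a quick client-side check that treats each pair as unordered with `frozenset`. `count_undirected_edges` is a hypothetical helper.

```python
import csv
import io

FRIENDSHIP_CSV = """\
person1,person2,date
Tom,Dan,2017-06-03
Tom,Jenny,2015-01-01
Dan,Jenny,2016-08-03
Jenny,Amily,2015-06-08
Dan,Nancy,2016-01-03
Nancy,Jack,2017-03-02
Dan,Kevin,2015-12-30
"""

def count_undirected_edges(csv_text):
    """Count distinct unordered (person1, person2) pairs."""
    pairs = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # frozenset ignores order, so (Tom, Dan) and (Dan, Tom) are one edge
        pairs.add(frozenset((row["person1"], row["person2"])))
    return len(pairs)
```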

### The `where` clause is a filter on the `from` pattern

In [10]:
gsql('select * from person where primary_id=="Tom"')

[{'v_id': 'Tom',
  'attributes': {'gender': 'male', 'name': 'Tom', 'state': 'ca', 'age': 40},
  'v_type': 'person'}]

In [11]:
q = gsql('select * from person where gender=="female"')
q 

[{'v_id': 'Nancy',
  'attributes': {'gender': 'female',
   'name': 'Nancy',
   'state': 'ky',
   'age': 20},
  'v_type': 'person'},
 {'v_id': 'Jenny',
  'attributes': {'gender': 'female',
   'name': 'Jenny',
   'state': 'tx',
   'age': 25},
  'v_type': 'person'},
 {'v_id': 'Amily',
  'attributes': {'gender': 'female',
   'name': 'Amily',
   'state': 'ca',
   'age': 22},
  'v_type': 'person'}]
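The same kind of filter can also be applied client side to results already in Python, which can help when experimenting. Pushing the `where` to the server avoids transferring vertices you do not need. `where_attrs` is a hypothetical helper, and the sample list copies the shape of the output above.

```python
people = [
    {"v_id": "Tom", "attributes": {"gender": "male", "name": "Tom", "state": "ca", "age": 40}, "v_type": "person"},
    {"v_id": "Jenny", "attributes": {"gender": "female", "name": "Jenny", "state": "tx", "age": 25}, "v_type": "person"},
    {"v_id": "Amily", "attributes": {"gender": "female", "name": "Amily", "state": "ca", "age": 22}, "v_type": "person"},
]

def where_attrs(vertices, **conditions):
    """Keep vertices whose attributes equal every keyword condition."""
    return [
        v for v in vertices
        if all(v["attributes"].get(key) == value for key, value in conditions.items())
    ]
```

For example, `where_attrs(people, gender="female", state="ca")` keeps only Amily.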

### Use a Python list comprehension to access results

In [12]:
[v['attributes']['age'] for v in q]

[20, 25, 22]
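A plain list of ages loses track of which vertex each value came from. A dict comprehension keyed on `v_id` keeps the vertex ID next to each value. The sample `q` below copies the shape of the query output above.

```python
q = [
    {"v_id": "Nancy", "attributes": {"gender": "female", "name": "Nancy", "state": "ky", "age": 20}, "v_type": "person"},
    {"v_id": "Jenny", "attributes": {"gender": "female", "name": "Jenny", "state": "tx", "age": 25}, "v_type": "person"},
    {"v_id": "Amily", "attributes": {"gender": "female", "name": "Amily", "state": "ca", "age": 22}, "v_type": "person"},
]

# vertex ID -> age, in the order the query returned the vertices
age_by_id = {v["v_id"]: v["attributes"]["age"] for v in q}
```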

### Transform into a pandas DataFrame

In [13]:
import pandas as pd

pd.DataFrame([v['attributes'] for v in q])

   gender   name  state  age
0  female  Nancy     ky   20
1  female  Jenny     tx   25
2  female  Amily     ca   22
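The table above drops `v_id`. A small variation keeps the vertex ID as the row index. This is one option among several: `pd.json_normalize(q)` would instead flatten the nested `attributes` dict into dotted column names such as `attributes.age`.

```python
import pandas as pd

q = [
    {"v_id": "Nancy", "attributes": {"gender": "female", "name": "Nancy", "state": "ky", "age": 20}, "v_type": "person"},
    {"v_id": "Jenny", "attributes": {"gender": "female", "name": "Jenny", "state": "tx", "age": 25}, "v_type": "person"},
    {"v_id": "Amily", "attributes": {"gender": "female", "name": "Amily", "state": "ca", "age": 22}, "v_type": "person"},
]

# Merge v_id into each attribute dict, then use it as the row index
df = pd.DataFrame([{"v_id": v["v_id"], **v["attributes"]} for v in q]).set_index("v_id")
```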


### Queries can be saved and compiled

In [14]:
gsql('''
create query hello(vertex<person> p) for graph social{
    
    start = {p};
    tgt = select t from start:s-(friendship:e)-person:t ;
    print tgt;
}

install query hello
''')

The query hello has been added!
Start installing queries, about 1 minute ...
hello query: curl -X GET 'http://127.0.0.1:9000/query/social/hello?p=VALUE'. Add -H "Authorization: Bearer TOKEN" if authentication is enabled.




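The install output shows that each installed query is also exposed as a REST endpoint. The sketch below calls it from Python using only the standard library. It assumes the host, port, and URL layout printed above (`http://127.0.0.1:9000/query/social/hello?p=VALUE`), and it sends a bearer token only when one is given, for servers with authentication enabled. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

def installed_query_url(query_name, graph="social", host="http://127.0.0.1:9000", **params):
    """Build the GET URL for an installed query, following the layout printed by install."""
    return f"{host}/query/{graph}/{query_name}?{urlencode(params)}"

def run_installed_query(query_name, token=None, **params):
    """GET the endpoint and decode its JSON body; needs a reachable TigerGraph server."""
    req = Request(installed_query_url(query_name, **params))
    if token:  # only needed when authentication is enabled
        req.add_header("Authorization", f"Bearer {token}")
    with urlopen(req) as resp:
        return json.load(resp)
```

When the server is reachable, `run_installed_query("hello", p="Tom")` would be expected to return the same kind of JSON as `run query hello("Tom")`.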

In [15]:
q = gsql('run query hello("Tom")')
q

{'error': False,
 'message': '',
 'version': {'schema': 0, 'edition': 'developer', 'api': 'v2'},
 'results': [{'tgt': [{'v_id': 'Dan',
     'attributes': {'gender': 'male', 'name': 'Dan', 'state': 'ny', 'age': 34},
     'v_type': 'person'},
    {'v_id': 'Jenny',
     'attributes': {'gender': 'female',
      'name': 'Jenny',
      'state': 'tx',
      'age': 25},
     'v_type': 'person'}]}]}

In [16]:
[v['attributes']['age'] for v in q['results'][0]['tgt']]

[34, 25]
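For intuition, the single hop in `hello` selects every `person` joined to the start vertex by a `friendship` edge. Below is the same neighbor lookup over the edge list from friendship.csv, written in plain Python as an illustration only. `one_hop` is a hypothetical name.

```python
EDGES = [
    ("Tom", "Dan"), ("Tom", "Jenny"), ("Dan", "Jenny"), ("Jenny", "Amily"),
    ("Dan", "Nancy"), ("Nancy", "Jack"), ("Dan", "Kevin"),
]

def one_hop(edges, start):
    """Return the vertices joined to `start` by an edge, in either direction."""
    result = set()
    for a, b in edges:
        if a == start:
            result.add(b)
        elif b == start:
            result.add(a)
    return result
```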

### Accumulators store information while traversing the graph

In [17]:
gsql('''
create query hello2(vertex<person> p) for graph social {
    
    OrAccum @visited = false;
    AvgAccum @@aveAge;
    
    start = {p};
    
    firstHop = select t from start:s-(friendship:e)-person:t
               accum t.@visited += true, s.@visited += true;
    
    secondHop = select t from firstHop:s-(friendship:e)-person:t
                where t.@visited == false
                post_accum @@aveAge += t.age;
    
    print secondHop;
    print @@aveAge;
            
}

install query hello2
''')

The query hello2 has been added!
Start installing queries, about 1 minute ...
hello2 query: curl -X GET 'http://127.0.0.1:9000/query/social/hello2?p=VALUE'. Add -H "Authorization: Bearer TOKEN" if authentication is enabled.




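The logic of `hello2` can be traced in plain Python:

1. Mark the start vertex and its friends as visited.
2. Keep the friends of friends that were not visited.
3. Average their ages.

This sketch illustrates the traversal on the data from the CSV files. It is not how TigerGraph executes accumulators. The helper names are hypothetical, and returning `None` for an empty set is an arbitrary choice made here.

```python
EDGES = [
    ("Tom", "Dan"), ("Tom", "Jenny"), ("Dan", "Jenny"), ("Jenny", "Amily"),
    ("Dan", "Nancy"), ("Nancy", "Jack"), ("Dan", "Kevin"),
]
AGES = {"Tom": 40, "Dan": 34, "Jenny": 25, "Kevin": 28, "Amily": 22, "Nancy": 20, "Jack": 26}

def neighbors(edges, v):
    """Vertices sharing an undirected edge with v."""
    return {b if a == v else a for a, b in edges if v in (a, b)}

def second_hop(edges, start):
    """Friends of friends that are neither `start` nor a direct friend."""
    first = neighbors(edges, start)
    visited = first | {start}  # plays the role of @visited set on s and t
    second = set()
    for s in first:
        second |= {t for t in neighbors(edges, s) if t not in visited}
    return second

def average_age(people, ages):
    """Mean age of `people`, or None if the set is empty."""
    if not people:
        return None
    return sum(ages[p] for p in people) / len(people)
```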

In [18]:
q = gsql('run query hello2("Tom")')
q

{'error': False,
 'message': '',
 'version': {'schema': 0, 'edition': 'developer', 'api': 'v2'},
 'results': [{'secondHop': [{'v_id': 'Amily',
     'attributes': {'gender': 'female',
      '@visited': False,
      'name': 'Amily',
      'state': 'ca',
      'age': 22},
     'v_type': 'person'},
    {'v_id': 'Kevin',
     'attributes': {'gender': 'male',
      '@visited': False,
      'name': 'Kevin',
      'state': 'az',
      'age': 28},
     'v_type': 'person'},
    {'v_id': 'Nancy',
     'attributes': {'gender': 'female',
      '@visited': False,
      'name': 'Nancy',
      'state': 'ky',
      'age': 20},
     'v_type': 'person'}]},
  {'@@aveAge': 23.33333}]}

In [19]:
q['results'][1]['@@aveAge']

23.33333
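Each `print` statement in `hello2` appears as its own element of `results`, which is why the average sits at index 1. Looking outputs up by name is less brittle if the print statements are reordered. `printed` is a hypothetical helper, and the sample response is abbreviated from the output above.

```python
def printed(response, name):
    """Return the value a query printed under `name`, searching every results element."""
    for block in response["results"]:
        if name in block:
            return block[name]
    raise KeyError(name)

sample = {
    "error": False,
    "results": [
        {"secondHop": [{"v_id": "Amily", "v_type": "person"}]},
        {"@@aveAge": 23.33333},
    ],
}
```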

In [20]:
ages = [v['attributes']['age'] for v in q['results'][0]['secondHop']]
ages

[22, 28, 20]

In [21]:
sum(ages)/len(ages)

23.333333333333332
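The two averages agree except for precision. The value returned by TigerGraph appears to be printed with five decimal places, while Python shows the full double. Comparing with a tolerance avoids spurious mismatches, as in this short sketch:

```python
import math

tigergraph_avg = 23.33333          # value printed by hello2 above
python_avg = sum([22, 28, 20]) / 3  # full-precision mean of the same ages

# Exact equality fails, but the values match to within the printed precision
same_within_tolerance = math.isclose(tigergraph_avg, python_avg, abs_tol=1e-5)
```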