# Simple Gremlin Queries
In this section, you'll start with some basic Gremlin queries.

## Setup

Before you start, ensure you have run notebook _01-Setup_ to create the dataset with which we'll be working.

In [None]:
%load_ext ipython_unittest
%run '../util/neptune.py'

In [None]:
g = neptune.graphTraversal()

## Graph Model

Here's the application graph data model:

<img src="https://s3.amazonaws.com/aws-neptune-customer-samples/neptune-sagemaker/images/imdb-data-model.jpg"/>



## Gremlin-Python

Throughout these exercises you'll be using [Gremlin-Python](http://tinkerpop.apache.org/docs/current/reference/#gremlin-python), which requires a few modifications to Gremlin:

 - In Python, `as`, `in`, `and`, `or`, `is`, `not`, `from`, and `global` are reserved words. In Gremlin-Python, simply add a `_` postfix to these words. For example, the `as()` Gremlin step is written `as_()`.
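You can confirm that each of these step names clashes with Python syntax using the standard-library `keyword` module. This minimal sketch only checks Python's keyword list and applies the `_` suffix rule above; it does not import Gremlin-Python, and `python_step_name` is a hypothetical helper.

```python
import keyword

# Gremlin step names listed above that collide with Python reserved words.
GREMLIN_STEPS = ["as", "in", "and", "or", "is", "not", "from", "global"]

def python_step_name(step: str) -> str:
    """Hypothetical helper: reserved words get a '_' suffix, other names are unchanged."""
    return step + "_" if keyword.iskeyword(step) else step

for step in GREMLIN_STEPS:
    print(f"{step}() -> {python_step_name(step)}()")

# A step such as out() is not a reserved word, so it keeps its Gremlin spelling.
print(python_step_name("out"))  # out
```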

## Select a Vertex and its Properties

When selecting graph elements, bear the following in mind:

 - Neptune allows you to supply custom vertex and edge IDs. If possible, use these to look up a vertex or edge by ID. See the _User Supplied IDs_ section in the [documentation](https://docs.aws.amazon.com/neptune/latest/userguide/access-graph-gremlin-differences.html).
 - If you can't use an ID to look up an element, use the _minimum number_ of label and property predicates necessary to uniquely identify the element or set of elements you want to find.
 - When returning results, select only the properties necessary to satisfy your query, rather than returning vertices and edges in their entirety.

### 02.01

Find the vertex with ID 'tt0120338' and return its "title", "rating" and "year" properties as a map.

Consult the following documentation:
 - [`valueMap()`](http://tinkerpop.apache.org/docs/current/reference/#valuemap-step)
 - [`values()`](http://tinkerpop.apache.org/docs/current/reference/#_values_step)

In [None]:
results_02_01 = (g.
    #begin
    V('tt0120338').
    valueMap('title', 'rating', 'year').
    #end
    toList())

print(results_02_01)
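The printed result holds each property as a list, because `valueMap()` allows for properties with more than one value. The plain-Python model below (not Gremlin; the vertex ID `m1` and its property values are made up) mimics `V(id).valueMap(*keys).toList()` to show that shape, and shows how naming only the keys you need keeps unrequested properties out of the result.

```python
# Hypothetical in-memory stand-in for the graph: vertex ID -> {property: [values]}.
# The ID and property values are made up for illustration.
graph = {
    "m1": {"title": ["Movie A"], "rating": [8.1], "year": [2000], "genre": ["Drama"]},
}

def value_map(vertex_id, *keys):
    """Model of V(vertex_id).valueMap(*keys).toList().

    Returns a list with one dict per matching vertex (at most one here),
    keeping only the requested keys, each mapped to a list of values.
    """
    if vertex_id not in graph:
        return []  # like toList() on a traversal that found nothing
    props = graph[vertex_id]
    return [{k: list(v) for k, v in props.items() if not keys or k in keys}]

print(value_map("m1", "title", "rating", "year"))
# [{'title': ['Movie A'], 'rating': [8.1], 'year': [2000]}]
```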


### 02.02

Find the movies that were released in the year 2000 and have a rating greater than 7.5, and return their titles.

Consult the following documentation:
 - [`has()`](http://tinkerpop.apache.org/docs/current/reference/#has-step)
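Each `has()` step acts as a filter, and chaining them combines the filters with a logical AND. The sketch below is a plain-Python model of `V().hasLabel('Movie').has('year', 2000).has('rating', gt(7.5)).values('title')` over a few made-up vertices. It also shows that `gt()` is strict, so a rating of exactly 7.5 is excluded.

```python
# Hypothetical vertices as (label, properties); all data is made up.
vertices = [
    ("Movie", {"title": "Movie A", "year": 2000, "rating": 8.1}),
    ("Movie", {"title": "Movie B", "year": 2000, "rating": 7.5}),
    ("Movie", {"title": "Movie C", "year": 1999, "rating": 9.0}),
    ("Artist", {"name": "Someone", "year": 2000, "rating": 9.9}),
]

def gt(bound):
    """Model of the gt() predicate: strictly greater than bound."""
    return lambda value: value > bound

def matches(props, key, expected):
    """Model of has(key, value) or has(key, predicate)."""
    if key not in props:
        return False
    value = props[key]
    return expected(value) if callable(expected) else value == expected

titles = [
    props["title"]
    for label, props in vertices
    if label == "Movie"                        # hasLabel('Movie')
    and matches(props, "year", 2000)           # has('year', 2000)
    and matches(props, "rating", gt(7.5))      # has('rating', gt(7.5))
]
print(titles)  # ['Movie A']
```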

In [None]:
results_02_02 = (g.
    #begin
    V().hasLabel('Movie').has('year', 2000).has('rating', gt(7.5)).
    values('title').
    #end
    toList())

for result in results_02_02:
    print(result)



## Following Edges

### 02.03

Find the movies directed by Christopher Nolan.

Consult the following documentation:

 - [Some simple graph traversal examples](http://kelvinlawrence.net/book/Gremlin-Graph-Guide.html#_some_simple_graph_traversal_examples) describes how you can use `out()`, `in()` and `both()` to traverse to neighbouring vertices.
 - [Examining the edge between two vertices](http://kelvinlawrence.net/book/Gremlin-Graph-Guide.html#exedge) describes how you can use `outE()`, `inE()` and `bothE()` to examine the edges themselves.

Note that `in` is a reserved word in Python – `in()` must therefore be written `in_()`.
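To see why the query starts at the artist and walks `in_('director')`, it helps to picture edges as (from, label, to) triples. The plain-Python model below assumes, as the solution query implies, that `director` edges point from a Movie to an Artist; the vertex IDs are made up.

```python
# Hypothetical edges as (out_vertex, label, in_vertex) triples.
# Assumption: 'director' edges run Movie -> Artist, as the solution query implies.
edges = [
    ("movie1", "director", "artist1"),
    ("movie2", "director", "artist1"),
    ("movie3", "director", "artist2"),
]

def out(vertex, label):
    """Model of out(label): follow edges that leave `vertex`."""
    return [dst for src, lbl, dst in edges if src == vertex and lbl == label]

def in_(vertex, label):
    """Model of in_(label): follow edges that arrive at `vertex` back to their source."""
    return [src for src, lbl, dst in edges if dst == vertex and lbl == label]

def both(vertex, label):
    """Model of both(label): neighbours in either direction."""
    return out(vertex, label) + in_(vertex, label)

print(in_("artist1", "director"))  # ['movie1', 'movie2']
print(out("movie3", "director"))   # ['artist2']
```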

In [None]:
results_02_03 = (g.
    #begin
    V().hasLabel('Artist').has('name','Christopher Nolan').
    in_('director').values('title').
    #end
    toList())

for result in results_02_03:
    print(result)



### 02.04

Find the person with ID 'person378' and count the number of artists they follow.

In [None]:
%%time
results_02_04 = (g.
    #begin
    V('person378').
    out('follows').count().
    #end
    next())

print(results_02_04)


### 02.05

Find the Person with ID 'person378' and count the number of unique people (themselves _included_) who follow the same artists.

Consult the following documentation:
 - [`dedup()`](http://tinkerpop.apache.org/docs/current/reference/#dedup-step)

Remember, `in()` must be written `in_()`.

In [None]:
%%time
results_02_05 = (g.
    #begin
    V('person378').
    out('follows').in_('follows').
    dedup().count().
    #end
    next())

print(results_02_05)
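Walking `out('follows')` and then `in_('follows')` reaches a person once for every artist they share with the starting person, and it reaches the starting person too. That is why `dedup()` is needed before `count()`. A plain-Python model with made-up follows edges:

```python
# Hypothetical 'follows' edges: person -> set of artists they follow.
follows = {
    "person378": {"artistA", "artistB"},
    "personX": {"artistA", "artistB"},
    "personY": {"artistB"},
    "personZ": {"artistC"},
}

def followers_of(artist):
    """Model of in_('follows') from an artist vertex."""
    return [p for p, artists in follows.items() if artist in artists]

# out('follows').in_('follows') without dedup(): one traverser per path.
paths = [p for artist in sorted(follows["person378"]) for p in followers_of(artist)]
print(len(paths))       # 5
print(len(set(paths)))  # 3, i.e. dedup().count(), with the start person included
```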


## Accessing Previous Steps

In these exercises you'll use `as()` to label a step that you can refer back to in a subsequent step, and `aggregate()` to fill a collection that you can use in a future computation.

You'll also use predicates to test whether two objects are equal, or whether an object is included in a collection.

See the following documentation:

 - [`as()`](http://tinkerpop.apache.org/docs/current/reference/#as-step) and [`select()`](http://tinkerpop.apache.org/docs/current/reference/#select-step)
 - [`aggregate()`](http://tinkerpop.apache.org/docs/current/reference/#aggregate-step)
 - [Predicates](http://tinkerpop.apache.org/docs/current/reference/#a-note-on-predicates)
 
 Note that `as` is a reserved word in Python, so `as()` must be written `as_()`.

### 02.06

Find the Person with ID 'person378' and count the number of unique people (themselves _excluded_) who follow the same artists.

In [None]:
%%time
results_02_06 = (g.
    #begin
    V('person378').as_('person').
    out('follows').in_('follows').
    where(neq('person')). 
    dedup().count().
    #end
    next())

print(results_02_06)
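`as_('person')` keeps a reference to the starting vertex, and `where(neq('person'))` then drops any traverser that has come back to it. Modelled in plain Python on made-up data:

```python
# Hypothetical 'follows' edges: person -> artists they follow.
follows = {
    "person378": {"artistA", "artistB"},
    "personX": {"artistA"},
    "personY": {"artistB"},
}

start = "person378"  # the vertex labelled with as_('person')

# out('follows').in_('follows'): everyone following any of start's artists.
reached = [
    p
    for artist in sorted(follows[start])
    for p, artists in follows.items()
    if artist in artists
]

# where(neq('person')) removes the start vertex; dedup().count() counts distinct people.
others = {p for p in reached if p != start}
print(len(others))  # 2
```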


### 02.07


Find the Person with ID 'person378' and count the number of unique people this person knows who also live in the same area.

You can use the `as()` step to label the starting vertex, and the `select()` step later in the query to return to that labelled step to begin another portion of the traversal.

In [None]:
%%time
results_02_07 = (g.
    #begin
    V('person378').as_('person').
    out('knows').aggregate('friends').
    select('person').
    out('isLocatedIn').in_('isLocatedIn').
    where(neq('person')).
    where(within('friends')).
    dedup().count().
    #end
    next())

print(results_02_07)
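`aggregate('friends')` collects every vertex reached by `out('knows')` into a side-effect collection, and `where(within('friends'))` later keeps only traversers whose current vertex is in that collection. In effect the query computes an intersection, as this plain-Python model with made-up data shows:

```python
# Hypothetical data: who knows whom, and where each person is located.
knows = {"person378": {"p1", "p2", "p3"}}
located_in = {"person378": "cityA", "p1": "cityA", "p2": "cityB", "p3": "cityA", "p4": "cityA"}

start = "person378"

# out('knows').aggregate('friends')
friends = set(knows[start])

# select('person').out('isLocatedIn').in_('isLocatedIn'): everyone in start's city.
# (In Gremlin this runs once per friend traverser; the later dedup() removes repeats.)
city = located_in[start]
neighbours = {p for p, c in located_in.items() if c == city}

# where(neq('person')).where(within('friends')).dedup().count()
result = {p for p in neighbours if p != start and p in friends}
print(sorted(result), len(result))  # ['p1', 'p3'] 2
```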


## Create a Vertex with Properties

Neptune allows you to supply your own String IDs when creating vertices and edges. You should always try to supply your own IDs – we'll see why this is important in a highly-concurrent write scenario a little later on.

Moreover, you should try wherever possible to use _predictable_ IDs – that is, IDs that you can later on predict when wanting to query for a specific vertex or edge. If you don't supply your own ID, Neptune will create a String-based UUID for you – and you'll be hard pressed to predict the value of this ID when you later want to query for the element.
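One way to make IDs predictable is to derive them from a key you already know, such as an external identifier. The helper below is a hypothetical sketch: its name and format are assumptions (the format merely mirrors IDs such as 'person378' in this dataset), not a Neptune convention. It contrasts a derived ID with a random UUID.

```python
import uuid

def predictable_id(label: str, natural_key: str) -> str:
    """Hypothetical helper: derive a stable vertex ID from a label and a natural key."""
    return f"{label.lower()}{natural_key}"

# The same inputs always give the same ID, so a later query can go straight
# to the vertex, e.g. g.V(predictable_id('Person', '378')).
print(predictable_id("Person", "378"))  # person378

# A random UUID differs on every call, so it cannot be reconstructed later.
print(uuid.uuid4() == uuid.uuid4())  # False
```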

### 02.08

Create a new Person vertex representing yourself, with the ID 'person0' (replace the placeholder names in the solution with your own).

Note: `valueMap(True)` in the test below ensures we retrieve the value of the ID and label attached to the vertex.

In [None]:
# clear any existing vertex
g.V('person0').drop().toList()

# your code
(g.
    #begin
    addV('Person').
    property(id, 'person0').
    property('firstName', 'Lorem').
    property('lastName', 'Ipsum').
    #end
    toList())

# assert results
results_02_08 = (g.
    V('person0').valueMap(True).
    toList())

for result in results_02_08:
    print(result)



### 02.09

Create a new Artist vertex representing Justin Bieber, with the ID 'artist0' and a birth year of 1899.

In [None]:
# clear any existing vertex
g.V().has('name', 'Justin Bieber').drop().toList()

# your code
(g.
    #begin
    addV('Artist').
    property('name', 'Justin Bieber').
    property('birthyear', 1899).
    property(id, 'artist0').
    #end
    toList())

# assert results
results_02_09 = (g.
    V().has('name', 'Justin Bieber').valueMap().
    toList())

print(results_02_09)


### 02.10

Actually, we made a mistake: Justin Bieber was born in 1994, not 1899, so we need to correct his birth year.

At this point, you'll need to know about the cardinality of properties. See the _Cardinality of Vertex Properties_ and _Updating a Vertex Property_ sections in the [Neptune Gremlin Implementation Differences](https://docs.aws.amazon.com/neptune/latest/userguide/access-graph-gremlin-differences.html) documentation.
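Neptune uses `set` cardinality for vertex properties unless you specify otherwise, so `property('birthyear', 1994)` on its own would add 1994 alongside the existing 1899 rather than replace it; `property(single, 'birthyear', 1994)` replaces all existing values. The plain-Python model below illustrates the difference; it is a model of the behaviour, not Neptune code.

```python
# Model of a vertex's properties: name -> list of values.
vertex = {"name": ["Justin Bieber"], "birthyear": [1899]}

def add_property(props, key, value, cardinality="set"):
    """Model of property([cardinality,] key, value) under set and single cardinality."""
    if cardinality == "single":
        props[key] = [value]                   # replace every existing value
    elif value not in props.setdefault(key, []):
        props[key].append(value)               # 'set': add the value only if it is new

add_property(vertex, "birthyear", 1994)            # default (set) cardinality
print(vertex["birthyear"])  # [1899, 1994]

add_property(vertex, "birthyear", 1994, "single")  # like property(single, ...)
print(vertex["birthyear"])  # [1994]
```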

In [None]:
# your code
(g.
    #begin
    V().has('name', 'Justin Bieber').
    property(single, 'birthyear', 1994).
    #end
    toList())

# assert results
results_02_10 = (g.
    V().has('name', 'Justin Bieber').valueMap('birthyear').
    toList())

print(results_02_10)


## Create an Edge Between Two Vertices

To add an edge connecting two existing vertices, use the `as()` step to label the _target_ vertex, then find the _from_ vertex, and use `addE()` to add the edge:

```
(g.
    V(... find 'to' vertex ...).as_('a').  # label the 'to' vertex
    V(... find 'from' vertex ...).         # find the 'from' vertex
    addE('EDGE_LABEL').to('a'))            # add the edge, referring to the 'to' vertex by its label
```

See the very last example in [Adding an airport (vertex) and a route (edge)](http://kelvinlawrence.net/book/Gremlin-Graph-Guide.html#_adding_an_airport_vertex_and_a_route_edge).

### 02.11

Create a new follows edge from person0 to Justin Bieber (artist0).

In [None]:
# Clear any existing edges
g.V('person0').outE('follows').as_('e').otherV().has(id, 'artist0').select('e').drop().toList()
assert (g.V('person0').out('follows').has(id, 'artist0').count().next()) == 0

(g.
    #begin
    V('artist0').as_('artist').
    V('person0').addE('follows').to('artist').
    #end
    next())

results_02_11 = (g.
    V('person0').out('follows').has(id, 'artist0').count().
    next())

print(results_02_11)


## Conditional Writes

Sometimes, you'll want upsert-like functionality, whereby you create a new vertex (or edge) only if it doesn't already exist: if it already exists, you use the existing instance. 

In Gremlin, you achieve this using a `fold()-coalesce()-unfold()` idiom. For a detailed explanation of this approach, see [Using _coalesce_ to only add a vertex if it does not exist](http://kelvinlawrence.net/book/Gremlin-Graph-Guide.html#coaladdv).
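Conceptually, `V(id).fold().coalesce(unfold(), addV(...))` gathers whatever `V(id)` found into a list; if the list is non-empty, `unfold()` returns the existing vertex, otherwise the `addV()` branch creates one. A plain-Python model of that get-or-create logic, where the `store` dict is a stand-in for the graph:

```python
store = {}  # stand-in for the graph: vertex ID -> properties

def get_or_create(vertex_id, **props):
    """Model of V(id).fold().coalesce(unfold(), addV(...).property(id, id)...)."""
    found = [store[vertex_id]] if vertex_id in store else []  # V(id).fold()
    if found:                                                  # coalesce: first branch that yields wins
        return found[0]                                        # unfold(): the existing vertex
    store[vertex_id] = dict(props)                             # addV(...): create it
    return store[vertex_id]

first = get_or_create("person123456789", firstName="John", lastName="Doe")
second = get_or_create("person123456789", firstName="John", lastName="Doe")
print(first is second, len(store))  # True 1
```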

The `coalesce()` pattern allows you to determine whether an element already exists using any combination of ID, label and property predicates. However, _Neptune does not enforce any label and/or property uniqueness constraints_ – and this can have serious consequences in concurrent write scenarios.

If you are using the `coalesce()` pattern in concurrent write scenarios, but not using _predictable user-supplied IDs_, you can experience situations in which two writer processes (clients, threads, etc.) attempt to create-or-return the same element at the same time. If both writers determine that the element does not exist and both attempt to create it, you may end up with two elements with the same labels and properties but different IDs.

Neptune will, however, assert uniqueness for vertex IDs and edge IDs. No two vertices can have the same ID; no two edges can have the same ID (a vertex and an edge, however, can have the same ID).

The general guidance, therefore, is: 

When using `coalesce()`, always identify elements by their ID, rather than a property or label predicate. This is important in highly concurrent write scenarios: using a predictable ID for potentially new elements will ensure that only one writer wins should several writers attempt to create the same element simultaneously.
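The race can be reproduced deterministically by interleaving two writers by hand: both check for the element, both see nothing, then both try to create it. The sketch below is a simplified model, not Neptune's actual transaction handling; the `Store` class is hypothetical and assumes only what the text states, namely that duplicate IDs are rejected.

```python
class Store:
    """Hypothetical toy graph store that refuses a second element with the same ID."""

    def __init__(self):
        self.vertices = {}
        self._next = 0

    def add(self, props, vertex_id=None):
        if vertex_id is None:              # no ID supplied: generate a fresh one
            self._next += 1
            vertex_id = f"generated-{self._next}"
        if vertex_id in self.vertices:
            raise ValueError(f"duplicate ID {vertex_id}")
        self.vertices[vertex_id] = props
        return vertex_id

def exists_by_property(store, name):
    return any(p.get("name") == name for p in store.vertices.values())

# Writers identify the element by a property: both checks pass, both inserts succeed.
s1 = Store()
checks = [exists_by_property(s1, "John Doe") for _ in range(2)]  # both writers check first
for seen in checks:
    if not seen:
        s1.add({"name": "John Doe"})
print(len(s1.vertices))  # 2: a duplicate was created

# Writers identify the element by a predictable ID: the second insert is rejected.
s2 = Store()
checks = ["person1" in s2.vertices for _ in range(2)]
for seen in checks:
    if not seen:
        try:
            s2.add({"name": "John Doe"}, vertex_id="person1")
        except ValueError:
            pass  # this writer lost the race; it can re-read the existing vertex
print(len(s2.vertices))  # 1
```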

### 02.12

Create a vertex representing John Doe, with the ID 'person123456789'. Note that the cell below calls your query twice: if it is written correctly, only one vertex is created.

In [None]:
# clear any existing vertex
g.V().has('firstName', 'John').has('lastName', 'Doe').drop().toList()

def addJohnDoe():
    (g.
         #begin
         V('person123456789').fold().
         coalesce(unfold(), addV('Person').
              property(id, 'person123456789').
              property('firstName','John').
              property('lastName', 'Doe')).
         #end
         next())
    
addJohnDoe()
addJohnDoe()
    
results_02_12 = (g.V().has('firstName', 'John').has('lastName', 'Doe').valueMap(True).toList())

for result in results_02_12:
    print(result)

