The idea of this tutorial is to show some of the structure of the graph used to
describe a Zymergen module and how a schedule is represented on the graph.

It aims to do this through natural exploration and assumes as little prior
knowledge as possible. Images are deliberately omitted from this tutorial: even
though they would help in understanding this particular graph, we would like to
show that the built-in tools of ZefDB alone are sufficient to understand a graph
structure.

# 1. Start ZefDB and view a graph

In [None]:
from zefdb import *
from zefdb.zefops import *

These are the standard imports of any zef session. To find all graphs with zymergen in their name:

In [None]:
zearch("zymergen")

We import one of these graphs and look at its basic information:

In [None]:
g = Graph("zymergen-scenario1")
g | info

> ### Note:
> The syntax `g | info` could also be written as `info(g)`. A function of this kind is known as a "zefop": its first argument can be piped into it, which allows operators to be chained. Zefops can also carry curried arguments, e.g. `list | filter[...]`.

The output of the `info` command can be long, but for now we are only interested in the top sections. A general summary of the graph is followed by a list of atomic entities (AETs), entities (ETs) and relations (RTs). In traditional graph terminology, graphs are made up of nodes and edges; in a Zef graph these are entities and relations. In addition, an atomic entity is an entity that can have a value. Let's focus on the entities alone, which should look something like this:

```
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  Entities ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[3 total, 3 alive]             ET.AnyOf
[6 total, 6 alive]             ET.Submodule
[5 total, 5 alive]             ET.Handoff
[11 total, 11 alive]           ET.InternalLocation
[17 total, 17 alive]           ET.Recipe
[5 total, 5 alive]             ET.Payload
[5 total, 5 alive]             ET.ToBeImplemented
[5 total, 5 alive]             ET.ZymergenProtocol
[13 total, 13 alive]           ET.ZymergenStep
[4 total, 4 alive]             ET.SubLocation
```

Each string beginning with `ET` represents an entity type. These are first-class citizens in ZefDB. An entity type can be accessed via the `ET` object itself. We can assign these to variables, although it is not typical to do this:

In [None]:
sometype = ET.Payload
print(sometype)

Behind the scenes, an `ET.Payload` type is represented as a hidden integer (that is, it is "tokenized"), which is shared with ZefHub and is the same across all graphs. If you request a new type that doesn't exist, it will be silently registered on ZefHub. Two other tokenized types of this kind exist: `RT` for relations and `EN` for enums:

In [None]:
print(RT.HasType)
print(EN.MachineStatus.Ready)

Note that an `EN` enum has an "enum type" (in this case `MachineStatus`) and an "enum value" (in this case `Ready`). More on these later.
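
To make the idea of "tokenized" types more concrete, here is a minimal sketch in plain Python. It is not how ZefDB or ZefHub implement this; the names `Token`, `TokenNamespace` and `ET_demo` are hypothetical and exist only for this illustration. It shows a namespace that hands out one integer per type name and silently registers unknown names on first use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical tokenized type: a readable name backed by an integer."""
    prefix: str
    name: str
    value: int

    def __str__(self):
        return f"{self.prefix}.{self.name}"


class TokenNamespace:
    """Hypothetical stand-in for a namespace such as ET or RT."""

    def __init__(self, prefix):
        self._prefix = prefix
        self._tokens = {}  # type name -> integer token

    def __getattr__(self, name):
        # Only reached for attributes that are not found normally.
        if name.startswith("_"):
            raise AttributeError(name)
        # Unknown names are registered silently with the next free integer.
        value = self._tokens.setdefault(name, len(self._tokens))
        return Token(self._prefix, name, value)


ET_demo = TokenNamespace("ET")
payload = ET_demo.Payload        # first request registers the name
payload_again = ET_demo.Payload  # second request reuses the same integer
submodule = ET_demo.Submodule    # a different name gets a different integer
print(payload, payload.value, submodule, submodule.value)
```
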

Let's get all of the `ET.Submodule`s in the
graph, and view more detail on the first one:

In [None]:
sms = g | instances[now][ET.Submodule]
sms | first | info

Two more zefops have been used here, `instances` and `first`, as well as a flag `now`.
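
If the piping and square-bracket syntax looks unusual, the following plain-Python sketch may help build intuition. It is only an analogy under our own assumptions, not ZefDB's implementation: `PipeOp`, `of_kind` and `first_op` are hypothetical names, and the records are made-up dictionaries rather than graph entities. Square brackets collect curried arguments, and `|` finally calls the function with the piped value as its first argument.

```python
class PipeOp:
    """Hypothetical pipeable operation: `x | op[a]` calls `func(x, a)`."""

    def __init__(self, func, curried=()):
        self.func = func
        self.curried = tuple(curried)

    def __getitem__(self, arg):
        # op[a][b] accumulates curried arguments without calling func yet.
        return PipeOp(self.func, self.curried + (arg,))

    def __ror__(self, left):
        # `left | op`: Python falls back to __ror__ when `left` cannot handle `|`.
        return self.func(left, *self.curried)

    def __call__(self, left):
        # Plain function-call form, equivalent to `left | op`.
        return self.func(left, *self.curried)


# Two example operations built from ordinary functions.
of_kind = PipeOp(lambda items, kind: [r for r in items if r["kind"] == kind])
first_op = PipeOp(lambda items: items[0])

# Made-up records standing in for things stored on a graph.
records = [
    {"kind": "Submodule", "name": "atc-sm-1"},
    {"kind": "Handoff", "name": "h-1"},
    {"kind": "Submodule", "name": "atc-sm-2"},
]

submodules = records | of_kind["Submodule"]
first_sm = submodules | first_op
print(first_sm["name"])
```
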

> ### Note:
> In a different universe where ZefDB was written in a traditional programming style, the line `sms = g | instances[now][ET.Submodule]` might have been a single function call, `sms = g.get_all_instances(type=ET.Submodule, time=now)`. We mention this only to aid understanding of the intent of zefops; in our universe, `get_all_instances` doesn't exist in ZefDB.

There are again several sections shown in the `info` output above. The first "section" is actually the title "Historical View" - the reason for the word "historical" will be made clear later. The next section is a general summary, followed by a list of all relations that are connected to this entity, and followed by a timeline of the changes that have occurred to the entity.

For now, we are interested in only the relations. Each relation states what it is connected to (the current entity is always on the left) and what the type of the relation is. For example, the first entry states that:
* There is a single RT.Name relation, which connects this entity (of type `ET.Submodule`) to an atomic entity of type `AET.String`.
* The `ET.Submodule` is the source of the relation and the `AET.String` is the target (observing the `--->` arrow direction)
* The relation UID is `0066b7bfbfaa1276a1b4835e9595967b`.
* The atomic entity UID is `90ff9867ce5d5ba4897cbdb6dd7e46b3`.
* The current value of the atomic entity is "atc-sm-1".

We can see that this entity is connected to one atomic entity string, which is its name, with another outward relation to an `ET.SubmoduleType`. The other relations are incoming `RT.Within` relations.

> ### Note:
> An atomic entity and an entity serve a similar purpose. The only differences are:
> 1. an atomic entity can have a value, which may change over time, and
> 2. an atomic entity's type is restricted to a set of primitive values, e.g. `AET.Int`. It cannot be `AET.MyCustomType`, for example.

> ### Note:
> UIDs are helpful especially when errors occur. You can access a reference to an entity/relation by its UID using this syntax: `g["0066b7bfbfaa1276a1b4835e9595967b"]`.

# 2. Diving into particular entities

Let's get the detailed information for each submodule:

In [None]:
details = []
for sm in sms:
    details.append({"z": sm,
                    "name": sm >> RT.Name | value.String,
                    "type": sm >> RT.HasType >> RT.Name | value.String})
details

We used `>>` to traverse a relation to its target and `| value` to get the value of
the atomic entity we landed on. The output shows that there are 6 submodules,
each uniquely named, 4 of which are of type `"atc"`. The item `'z'` is
the `ZefRef` that corresponds to that entity. `ZefRef`s are the standard type for
referring to all graph entities and relations; a `ZefRef` encodes both what you
looked at and when you looked at it.

> ### Note:
> `value.String` does not strictly require the `.String` part here, but it allows compiled languages (C++, Julia) to infer the type, so it is good practice to include it when possible.

Although we show the type of each submodule as a string, each type is actually
a single entity on the graph (the string is the value of its name), shared by
every submodule of that type. Comparing the `ET.SubmoduleType` entities, and
their UIDs, confirms this:

In [None]:
sm_dict = {item['name']: item for item in details}

In [None]:
(sm_dict['atc-sm-1']['z'] >> RT.HasType) == (sm_dict['atc-sm-2']['z'] >> RT.HasType)

In [None]:
print(sm_dict['atc-sm-1']['z'] >> RT.HasType | uid)
print(sm_dict['atc-sm-2']['z'] >> RT.HasType | uid)
print(sm_dict['atc-sm-3']['z'] >> RT.HasType | uid)
print(sm_dict['atc-sm-4']['z'] >> RT.HasType | uid)

Using the zefop `outs` to get all outgoing relations, we can see that all of the submodule entities have a similar structure, although the magnemotion submodule has an extra relation, capacity, in addition to its
name and type:

In [None]:
mm_out_edges = sm_dict['magnemotion-sm-1']['z'] | outs
[RT(out) for out in mm_out_edges]

In [None]:
sm_dict['magnemotion-sm-1']['z'] >> RT.Capacity | value

Whether a relation is always present or only optional can indicate the intent of
the graph's author. Here there is an implicit assumption that every submodule
has a capacity of one unless an explicit `RT.Capacity` relation says otherwise.
In this graph the extra information is redundant, as the capacity of the
magnemotion submodule is explicitly set to 1, but in other scenarios it can be higher.
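
The "implicit default unless stated otherwise" convention can be sketched in plain Python as follows. This is only an illustration with made-up dictionaries, not a ZefDB query: `capacity_of` is a hypothetical helper, and the third record (with a capacity of 3) is invented to show a non-default case.

```python
def capacity_of(submodule):
    """Return the explicit capacity if one is recorded, else the implicit default of 1."""
    return submodule.get("capacity", 1)


submodules = [
    {"name": "atc-sm-1"},                          # no explicit capacity recorded
    {"name": "magnemotion-sm-1", "capacity": 1},   # explicit, but equal to the default
    {"name": "hypothetical-sm", "capacity": 3},    # invented example of a larger capacity
]

capacities = {sm["name"]: capacity_of(sm) for sm in submodules}
print(capacities)
```
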

Try loading the graph "zymergen-scenario3" instead and running some of the commands above.

In [None]:
# g = Graph("zymergen-scenario3")

When done, make sure you reload "zymergen-scenario1" again, and rerun all cells above this point, before continuing.

# 3. Exploring sideways

Previously, the properties of name and capacity could be considered to
"belong to" the submodule. This is commonly the case for atomic entities connected to an entity.

> ### Note:
> In a traditional database, these fields may be wrapped up together into one object, where each object has a definition of what fields it contains. In contrast, ZefDB takes the "flattening" approach where entities have no internal fields, and instead represent these with atomic entities. Although it is beyond this tutorial, we can still enforce the expected or required relations that an entity must have. This is done through the "delegate schema graph" and through "Zef hooks" which are scripts embedded into the graph data.

Let's now venture outwards to other objects in the
graph. The original `info` output for the first submodule showed several
relations *incoming* to the submodule of type `RT.Within`, as well as an outgoing relation of type `RT.HasType`:

In [None]:
sms | first | info

```
    1x:     (z:ET.Submodule) -------------------------(RT.HasType)------------------------> (ET.SubmoduleType)
                (z) ----(5bea664061a964568946007a5629f309)---> (5d97d69c363c448bfc1346aa4af1fcd4)
                
    1x:     (z:ET.Submodule) <-------------------------(RT.Within)------------------------- (ET.Handoff)
                (z) <---(3372a9e40c5b39fe31042a20eedf0a6e)---- (755e6906f0b6acce3becaa8d852eebff)
    
    2x:     (z:ET.Submodule) <-------------------------(RT.Within)------------------------- (ET.InternalLocation)
                (z) <---(1dc0b78de4bc95dd11763c902dcbe298)---- (6447b19658d62c42b604be497b806ac7)
                (z) <---(2fcb19290641f2586744318ba53787c9)---- (da47ccc385741be560b7b78d9ae4ddf7)
```

The `RT.Within` relation looks like it describes locations, divided further into
two categories of `ET.Handoff` and `ET.InternalLocation`. (Tip: read the relation as
"A is Within this B", e.g. "ET.Handoff is RT.Within this ET.Submodule"). This is
referring to the Zymergen transport graph which we will get to later.

For now, let's look down the `RT.HasType` relation:

In [None]:
# The (...) are required for operator precedence. This restriction will be
# lifted in the future
smtype = (sms | first) >> RT.HasType
smtype | info

There are two types of incoming relations to this `ET.SubmoduleType` node. We have just traversed one of the `RT.HasType` relations, but there are several of these. Let's check the names of all of the submodules connected by this relation:

In [None]:
for sm in smtype << L[RT.HasType]:
    print(sm >> RT.Name | value)

Here the notation `L[...]` is used to obtain a list of all nodes at the
ends of the `RT.HasType` relations (the same command without `L[...]` would raise an
exception, as it is not clear which edge to traverse). Note also that we use `<<` instead of `>>` to traverse an *incoming* edge instead of
an *outgoing* edge.
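
The difference between traversing to exactly one node and traversing to a list of nodes can be modelled with a toy edge list in plain Python. This is a sketch under our own assumptions, not ZefDB's API: `out_targets`, `in_sources` and `exactly_one` are hypothetical helpers, and the node name `"atc-type"` is invented for the example.

```python
# Toy edge list: (source, relation type, target).
RELATIONS = [
    ("atc-sm-1", "HasType", "atc-type"),
    ("atc-sm-2", "HasType", "atc-type"),
    ("atc-sm-1", "Name", "name-of-atc-sm-1"),
]


def out_targets(node, rel_type, relations=RELATIONS):
    """All targets of outgoing relations of the given type (list traversal)."""
    return [t for (s, r, t) in relations if s == node and r == rel_type]


def in_sources(node, rel_type, relations=RELATIONS):
    """All sources of incoming relations of the given type (list traversal)."""
    return [s for (s, r, t) in relations if t == node and r == rel_type]


def exactly_one(items):
    """Single traversal: refuse to guess when the result is ambiguous."""
    if len(items) != 1:
        raise ValueError(f"expected exactly one match, found {len(items)}")
    return items[0]


# Outgoing, unambiguous: each submodule has a single type.
sm_type = exactly_one(out_targets("atc-sm-1", "HasType"))

# Incoming, ambiguous: several submodules share the type, so ask for a list.
typed_submodules = in_sources(sm_type, "HasType")
print(sm_type, typed_submodules)
```

Asking for a single result where several exist raises an error in this sketch, which mirrors the reason the list form is needed above.
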

The `L[...]` traversal returns a special kind of list, a `ZefRefs`. Where it makes sense, many zefops can act on both a `ZefRef` and a `ZefRefs`. For example, we can obtain all of the names at once:

In [None]:
names = smtype << L[RT.HasType] >> RT.Name
print([z | value for z in names])

# The following should work, but we need to fix up the "lifting" of the value zefop
#smtype << L[RT.HasType] >> RT.Name | value

The other interesting part of the `ET.SubmoduleType` entity is its `RT.Alias` relations to `ET.ZymergenProtocol`s, which we will now explore.

# 4. Pincering the protocols

We can consider going straight to the `ET.ZymergenProtocol`s directly from the `ET.SubmoduleType` entity, but first let's see how many protocols there are in the graph:

In [None]:
protocols = g | instances[now][ET.ZymergenProtocol]
len(protocols)

Notice that there were also 5 `ET.ZymergenProtocol`s connected to the `ET.SubmoduleType`
node above. This suggests that these are the same sets. In fact, we can check
this more directly:

In [None]:
protocols | without[smtype << L[RT.Alias]]

The `without` zefop does a set difference, and returns a `ZefRefs`. The example above is actually an empty list (observe the `... of len=0 ...` part of the output) which shows that the `smtype` is connected to all protocols on the graph.
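
For readers who think in plain Python, the same check can be sketched with ordinary lists. This is an analogy only: `without_demo` is a hypothetical helper and the protocol names are invented placeholders, not UIDs from the graph.

```python
def without_demo(items, excluded):
    """Order-preserving set difference: items that do not appear in excluded."""
    excluded_set = set(excluded)
    return [x for x in items if x not in excluded_set]


all_protocols = ["p1", "p2", "p3", "p4", "p5"]   # every protocol on the graph
aliased = ["p3", "p1", "p5", "p2", "p4"]         # protocols reached from the type node

leftover = without_demo(all_protocols, aliased)
print(len(leftover))  # an empty result means the two sets coincide
```
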

Let's explore the last protocol from the whole set on the graph:

In [None]:
protocols | last | info

The output shows that this protocol has 1 alias and 2 steps. These steps are:

In [None]:
firststep, laststep = (protocols | last) << L[RT.Protocol]
firststep | info

Note: although `firststep` and `laststep` turn out to be appropriate names here,
the ordering of the list returned from the traversal does not have to be this way. In this case, we can
prove the naming is correct by checking the dependency given in the `RT.After` relation:

In [None]:
(laststep >> RT.After) == firststep
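
When a protocol has more than two steps, checking pairs by hand becomes tedious. As a hedged sketch in plain Python (not a ZefDB feature), "comes after" dependencies can be turned into an ordering with the standard library's `graphlib`. The step names and the `after` mapping below are invented for illustration.

```python
from graphlib import TopologicalSorter

# Steps as they happened to come back from a traversal (arbitrary order).
steps = ["step-b", "step-a"]

# Hypothetical dependency data: each key must come after the listed steps.
after = {"step-b": ["step-a"]}

# TopologicalSorter expects a mapping of node -> predecessors.
graph = {step: after.get(step, []) for step in steps}
ordered = list(TopologicalSorter(graph).static_order())
print(ordered)
```
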

There is plenty to explore here. Try exploring, on your own, the `ET.Payload` at
the end of the `RT.Payload` relation using similar commands as above.

In [None]:
# Do some graph traversal here yourself

# Try to determine
# a) the id of the payload
# b) the current location of the payload
# c) the number of steps involving this payload.

After that, let's explore the `ET.Recipe` node.

In [None]:
firststep >> RT.Recipe | info

This shows the recipe is a thermocycle... but shouldn't there be more
information attached? There is, but it is information which is not specific to
the recipe alone, but rather applies to the combination of both recipe and step.
Hence, this information is stored on the relation which joins the recipe and step: 

In [None]:
(firststep > RT.Recipe) | info

Here `>` instead of `>>` was used, to land on the outgoing *relation* rather
than following it to its target. Note that the relations shown above have their
source as the `RT.Recipe` relation itself. Relations are not restricted
to only connect entities together.
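
A relation that is itself the source of another relation can be modelled in plain Python by giving every relation its own identifier. The sketch below is an illustration under our own assumptions, not ZefDB's data model: the `Relation` class, the helper functions, the UIDs and the `"Duration"` relation type are all invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    uid: str
    source: str    # uid of an entity OR of another relation
    rel_type: str
    target: str


RELATIONS = [
    Relation("r1", "step-1", "Recipe", "recipe-thermocycle"),
    Relation("r2", "r1", "Duration", "duration-value"),  # source is relation r1
]


def out_relation(source_uid, rel_type):
    """Land on the outgoing relation itself (analogous to `>`)."""
    matches = [r for r in RELATIONS if r.source == source_uid and r.rel_type == rel_type]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one relation, found {len(matches)}")
    return matches[0]


def out_target(source_uid, rel_type):
    """Follow the outgoing relation to its target (analogous to `>>`)."""
    return out_relation(source_uid, rel_type).target


recipe_rel = out_relation("step-1", "Recipe")
recipe = out_target("step-1", "Recipe")

# Information attached to the relation: relations whose source is recipe_rel.
attached = [r.rel_type for r in RELATIONS if r.source == recipe_rel.uid]
print(recipe, attached)
```
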

A similarly overloaded operator, `<`, exists to land on an *incoming* edge.

Another similar case of information residing on an edge is the required
location for a payload in a particular step. Check this out yourself by
following an `ET.ZymergenStep` to an `ET.Payload` but stopping on the edge.

In [None]:
# Traverse the step to its relation RT.Payload and then find the required location.
# You will discover that this location is not a simple entity, but a representation of two parts:
# a) a "Within" to describe the submodule (which is actually an alias)
# b) an "InternalType" to address a location within that submodule.

# 5. Transport Graph

Let's look at all locations that are within submodules:

In [None]:
all_within = (g | instances[now][ET.Submodule]) << L[RT.Within]
print(all_within)

Notice that this returns a `ZefRefss`. A `ZefRefs` is a list of `ZefRef` and a
`ZefRefss` is a list of `ZefRefs` (i.e. a nested list of lists of ZefRef types).

A `ZefRefss` was returned here, because `instances` gave back a list, and the `<< L[RT.Within]` traversal was performed on every element of that list.
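
The same shape appears in plain Python whenever a list-returning lookup is applied to every element of a list. The sketch below uses invented names (`within`, `sm-1`, `loc-a`, ...) and ordinary lists rather than `ZefRefs`; it shows the nested result and one standard way to concatenate it.

```python
from itertools import chain

# Hypothetical lookup: submodule -> locations within it.
within = {
    "sm-1": ["loc-a", "loc-b"],
    "sm-2": ["loc-c"],
    "sm-3": [],
}
submodules = ["sm-1", "sm-2", "sm-3"]

# One inner list per submodule: a list of lists.
nested = [within[sm] for sm in submodules]

# Concatenate every inner list into one flat list.
flat = list(chain.from_iterable(nested))
print(nested)
print(flat)
```
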

In the current case, we are not interested in the grouping of these locations,
so we can `flatten` them out (i.e. concatenate every list inside the `ZefRefss` into one big `ZefRefs`): 

In [None]:
locations = all_within | flatten
print(length(locations))
set(ET(z) for z in locations)

Locations are divided into two ET types. Let's also divide these into locations that
belong to one submodule only or multiple submodules:

In [None]:
one_sm_locs = locations | filter[lambda z: len(z >> L[RT.Within]) == 1]
set(ET(z) for z in one_sm_locs)

In [None]:
multi_sm_locs = locations | filter[lambda z: len(z >> L[RT.Within]) >= 2]
set(ET(z) for z in multi_sm_locs)

As could have been guessed, `ET.InternalLocation`s are within only one submodule
whereas `ET.Handoff`s are at the boundary of 2 submodules.
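
The split we just made can be pictured with a small plain-Python model. The location and submodule names are invented, and the dictionary is only a stand-in for the `RT.Within` relations on the graph.

```python
# Hypothetical data: location -> submodules it is within.
within = {
    "internal-1": ["sm-1"],
    "internal-2": ["sm-2"],
    "handoff-1": ["sm-1", "sm-2"],   # sits on the boundary of two submodules
}

one_sm = [loc for loc, sms in within.items() if len(sms) == 1]
multi_sm = [loc for loc, sms in within.items() if len(sms) >= 2]
print(one_sm, multi_sm)
```
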

Looking at a particular location:

In [None]:
loc = one_sm_locs[0]
loc | info

there is some local information attached to the location (IsBuffer, IsHandoff,
NodeLabel) and a bunch of connections to other locations, given by
`RT.CanMoveTo` relations. Following one of these (to the edge, not the target): 

In [None]:
canmoveto = (loc > L[RT.CanMoveTo]) | first
canmoveto | info

we find an attached recipe. Describing this:

In [None]:
canmoveto >> RT.Recipe | info
(canmoveto > RT.Recipe) | info

> ### Note:
> Unfortunately brackets are often required (currently) to order zef operations, e.g. `canmoveto > RT.Recipe | info` would cause info to operate on RT.Recipe first. In the future this will be handled using lazily evaluated zefops.
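
The bracket requirement comes from Python's own operator precedence: `>>` binds more tightly than `|`, and `|` binds more tightly than `>`. The following sketch makes the grouping visible with a small tracing class; `Trace` is a hypothetical helper written for this illustration and has nothing to do with ZefDB.

```python
class Trace:
    """Records how Python groups an expression built from >, >> and |."""

    def __init__(self, label):
        self.label = label

    def __gt__(self, other):
        return Trace(f"({self.label} > {other.label})")

    def __rshift__(self, other):
        return Trace(f"({self.label} >> {other.label})")

    def __or__(self, other):
        return Trace(f"({self.label} | {other.label})")


a, b, c = Trace("a"), Trace("b"), Trace("c")

gt_then_pipe = a > b | c        # `|` is applied before `>`
bracketed = (a > b) | c         # brackets force `>` first
pipe_then_shift = a | b >> c    # `>>` is applied before `|`

print(gt_then_pipe.label)     # (a > (b | c))
print(bracketed.label)        # ((a > b) | c)
print(pipe_then_shift.label)  # (a | (b >> c))
```
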

# 6. Further exploration

In this tutorial, we have been limited to only viewing the latest timeslice of the graph. There are several different places to go next:
* Appending to the graph ("mutating" the graph) - this is described in the `Zymergen write.ipynb` tutorial
* Viewing historical information - this is mentioned in the `Zymergen schedules` HOWTO short guides.
* Higher level operations (filter, sort, ...)
* Adding scripts to the graph that are triggered on particular actions.
* Subscribing to events on the graph.

# 7. List of ZefOps

Here is a list of zefops/functions/operators that have been used in the tutorial:
#### General
* zearch
* Graph
* info
* instances
* now
* value
* RT

#### List related
* first
* last
* without
* flatten
* filter

#### Traversal
* \>>
* \>
* <<
* <
* L[...]

#### Further zefops to be aware of/whet your appetite:
* O[...] - optional (0 or 1) traversal
* only - the only item in a list
* sort[...] - list sorting
* ET - obtain ET type of entity
* AET - obtain AET type of entity
* EN - enums (e.g. `EN.MachineStatus.Ready`)
* subscribe - listen to events on the graph
* <= - can be used for value assignment and subtype checking too (e.g. `zefref <= ET.Submodule`)
* add_right - user rights management
* sync - store graph on zefhub
* tag - tag your graph on zefhub

# TODO
### External graph tools

* How to get an adjacency matrix to describe the transport graph.

### Challenge questions

* Store answers on the graph as "hidden" answers.