# Zymergen Append Tutorial

This tutorial assumes you have already run through the "Zymergen explore" tutorial.

In this tutorial, we'll add information (by hand) to a sample graph.

Note that we say we "append" to the graph rather than "change" it. This is because we never destroy old information on the graph. We only add information about new entities/relations/values, which can be viewed in the latest timeslice.

# 1. Start ZefDB and load a graph

In [None]:
from zefdb import *
from zefdb.zefops import *

In [None]:
g_orig = Graph("zymergen-scenario1")
g = clone(g_orig)

Above, we have cloned the graph. This is only for the purposes of this tutorial, so we can mess it up without changing the original.

If you would like to save your changes and return to them, sync your graph to zefhub and tag it with a name. The following commands will only succeed if no existing graph already has that tag.

In [None]:
# Come up with your own tag here, e.g. "my-secret-tag"
tag_name = ...  # placeholder: replace with your own tag string
sync(g)
tag(g, tag_name)
# In the future, you can then load this graph with Graph("...") - by default only you will have access to view/discover/append to this graph.

# 2. Making basic entities

Graph modifications can be divided into three operations:
1. Instantiation of an entity/relation/atomic entity.
2. Assignment of a value to an atomic entity.
3. Termination of an entity/relation/atomic entity.

There are no other ways to affect the graph data.

## Instantiation
A simple example:

In [None]:
# Entities are given with a single ET
payload = instantiate(ET.Payload, g)

# Atomic entities are the same, but with an AET
id = instantiate(AET.String, g)

# Relations require two ZefRefs (the source/target), sandwiched with an RT
rel = instantiate(payload, RT.ID, id, g)

In [None]:
print(payload)
print(id)
print(rel)

We have instantiated a simple entity of type `ET.Payload` and connected it to a string atomic entity (which currently contains no value) via a relation `RT.ID`. Each instantiation returned a `ZefRef`, which is a reference to the entity/relation and its frame of reference.

The first thing you might like to do is check the info of the payload:

In [None]:
payload | info

There seems to be a problem! This payload should be connected to a string, but it currently looks like it is connected to nothing. The issue is that we are viewing the payload in an old frame of reference. The timeslice of the `ZefRef` is `2` (which can be found from the `Seen from: 2` text, or the `ts=2` in the short summary).

Let's instead look at the payload from the latest frame of reference:

In [None]:
payload | now | info

Now things look better, because we are viewing the `ZefRef` in timeslice 4. This is a very common mistake to make while viewing/testing Zef at the REPL. In functions and other code, however, it is avoided by using a common frame of reference.

> #### Note:
> Because ZefDB requires the reference frame to be advanced explicitly (e.g. using `now`), it is always "safe" to explore a graph from a particular `ZefRef`. `payload | info` will never change, even if someone else adds to this graph, because `payload` is fixed to a reference frame. This allows functions acting on `ZefRef`s to be pure, i.e. they will always produce the same output for the same input. On the other hand, `payload | now` is "unsafe", as calling it twice at different points in time can give different results.
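
The idea in the note above can be illustrated with a small toy model. This is not the ZefDB API: `TinyGraph`, `PinnedRef`, `add_edge`, `outs` and `now` are hypothetical stand-ins written in plain Python, and the timeslice numbers are only illustrative. The sketch assumes that each write opens a new timeslice and that a reference only sees what existed at its own timeslice.

```python
# Toy model only: NOT the ZefDB API. Every name here is hypothetical.
from dataclasses import dataclass


class TinyGraph:
    def __init__(self):
        self.latest = 0
        self.edges = []  # (created_at_timeslice, source, relation, target)

    def add_edge(self, source, relation, target):
        self.latest += 1  # in this toy, each write opens a new timeslice
        self.edges.append((self.latest, source, relation, target))


@dataclass(frozen=True)
class PinnedRef:
    graph: TinyGraph
    name: str
    timeslice: int

    def outs(self):
        """Relations leaving this node, as seen from the pinned timeslice."""
        return [rel for ts, src, rel, _ in self.graph.edges
                if src == self.name and ts <= self.timeslice]

    def now(self):
        """The same node, re-pinned to the graph's latest timeslice."""
        return PinnedRef(self.graph, self.name, self.graph.latest)


g = TinyGraph()
g.add_edge("root", "Contains", "payload")      # timeslice 1
payload = PinnedRef(g, "payload", g.latest)    # pinned to timeslice 1
g.add_edge("payload", "ID", "some-string")     # timeslice 2

before = payload.outs()
g.add_edge("payload", "Location", "loc")       # timeslice 3: a later append
after = payload.outs()
latest = payload.now().outs()
print(before, after, latest)  # [] [] ['ID', 'Location']
```

The pinned reference gives the same answer before and after the extra append, while the result of `now()` depends on when it is called.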

Really, we shouldn't have created these objects in separate timeslices in the first place. Let's create a new payload with its ID in one go:

In [None]:
I = lambda *args: instantiate(*args, g)

with Transaction(g):
    payload2 = I(ET.Payload)
    I(payload2, RT.ID, I(AET.String))

Here we have also defined a shorthand `I()` to more compactly represent the process. In the above, only one timeslice (or rather, one transaction) is created. Hence the info statement will produce the information we expect straight away:

In [None]:
payload2 | info

Transactions are important. They are always required, and will be created automatically if you do not specify them explicitly, but this will typically result in more transactions than you want.

The `with Transaction` context manager can be nested. Inner `with Transaction` blocks are swallowed by the outermost `with Transaction`, so only one transaction is created. When in doubt, wrap everything in a `Transaction`.
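
One way such "swallowing" can work is with a depth counter, sketched below. This is a toy illustration and makes no claim about how ZefDB's `Transaction` is implemented: `TxGraph`, `transaction` and `write` are hypothetical names. Since roll-back is described as not yet implemented, the sketch commits even when the block exits with an error.

```python
# Toy model only: NOT ZefDB's Transaction. All names are hypothetical.
from contextlib import contextmanager


class TxGraph:
    def __init__(self):
        self.depth = 0       # how many transaction blocks are currently open
        self.committed = 0   # number of transactions actually created
        self.pending = []    # writes collected by the open transaction

    def write(self, item):
        # A write outside any block gets its own implicit transaction.
        with transaction(self):
            self.pending.append(item)


@contextmanager
def transaction(graph):
    graph.depth += 1
    try:
        yield graph
    finally:
        graph.depth -= 1
        if graph.depth == 0:  # only the outermost block commits
            graph.committed += 1
            graph.pending.clear()


g = TxGraph()
g.write("a")
g.write("b")
separate = g.committed        # two implicit transactions

with transaction(g):
    g.write("c")
    with transaction(g):      # nested block: swallowed by the outer one
        g.write("d")
grouped = g.committed - separate
print(separate, grouped)  # 2 1
```

Two bare writes create two transactions, whereas two writes inside nested blocks create only one.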

> #### Note:
> In the future, transactions will also allow for "roll-back" if an unexpected error is encountered, before the data is added to the graph. At the moment this hasn't been implemented.

## Value assignment

Entities and relations are useful to express relationships but typically we need some primitive data too. This can be given in a simple value assignment by using the `<=` operator:

In [None]:
id = payload2 >> RT.ID
id <= "second payload"
payload2_after_first_assignment = payload2 | now

In [None]:
payload2_after_first_assignment | info

A value assignment also creates a transaction. Hence, we could update the id and view it within both timeslices:

In [None]:
id <= "second payload updated"
payload2_after_second_assignment = payload2 | now

In [None]:
payload2_after_first_assignment >> RT.ID | value

In [None]:
payload2_after_second_assignment >> RT.ID | value

We can even look at the atomic entity's value before it was assigned. In this case it will be `None`:

In [None]:
payload2 >> RT.ID | value is None
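
The behaviour above (a value per timeslice, and `None` before the first assignment) can be modelled with a sorted history and a binary search. The sketch below is a plain-Python toy, not the ZefDB API: `ToyAtomicEntity`, `assign` and `value_at` are hypothetical names, and the timeslice numbers 5, 6 and 7 are made up for illustration.

```python
# Toy model only: NOT the ZefDB API. All names are hypothetical.
from bisect import bisect_right


class ToyAtomicEntity:
    def __init__(self):
        self._timeslices = []  # timeslices of each assignment, ascending
        self._values = []      # value assigned at the matching timeslice

    def assign(self, timeslice, value):
        # Appends only: earlier assignments are never overwritten.
        self._timeslices.append(timeslice)
        self._values.append(value)

    def value_at(self, timeslice):
        """Latest value assigned at or before `timeslice`, else None."""
        i = bisect_right(self._timeslices, timeslice)
        return self._values[i - 1] if i else None


ident = ToyAtomicEntity()
ident.assign(6, "second payload")
ident.assign(7, "second payload updated")

print(ident.value_at(5))  # None
print(ident.value_at(6))  # second payload
print(ident.value_at(7))  # second payload updated
```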

> #### Note:
> Traversing a graph from a particular frame of reference will keep you in that frame of reference. That is, `payload >> RT.ID` will return a `ZefRef` of the same timeslice as `payload`. It is purposely difficult to "escape" a timeslice without explicit commands, although `instantiate` is one key counterexample.

Even though these two `ZefRef`s to the payload refer to the same entity, they are within different timeslices. Hence they cannot be compared to one another. If you attempt to do so, ZefDB will throw an error warning you about this: 

In [None]:
payload2_after_first_assignment == payload2_after_second_assignment

If you wish to compare the "identity" of an entity, you can do this via a `UZefRef` (more on this later) or via the index/uid of an entity:

In [None]:
print(payload2_after_first_assignment | to_uzefref == payload2_after_second_assignment | to_uzefref)
print(index(payload2_after_first_assignment), index(payload2_after_second_assignment))
print(uid(payload2_after_first_assignment), uid(payload2_after_second_assignment))
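
To make the distinction between "same entity" and "same reference" concrete, here is a toy class whose `==` refuses to compare references from different timeslices, while identity is compared through a uid. It is only an illustration in plain Python: `ToyRef` and its fields are hypothetical, and the real error type raised by ZefDB is not specified here.

```python
# Toy model only: NOT the ZefDB API. All names are hypothetical.
class ToyRef:
    def __init__(self, uid, timeslice):
        self.uid = uid
        self.timeslice = timeslice

    def __eq__(self, other):
        if not isinstance(other, ToyRef):
            return NotImplemented
        if self.timeslice != other.timeslice:
            # Refuse to answer rather than silently return True or False.
            raise ValueError("cannot compare references from different timeslices")
        return self.uid == other.uid

    def __hash__(self):
        return hash((self.uid, self.timeslice))


first = ToyRef("abc123", 6)
second = ToyRef("abc123", 7)

try:
    first == second
    raised = False
except ValueError:
    raised = True

same_identity = first.uid == second.uid
print(raised, same_identity)  # True True
```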

> #### Note:
> It doesn't matter from which frame of reference you update an atomic entity's value: the new value always becomes the latest value. So `ae <= 3 ; ae <= 4` is equivalent to `ae | now <= 3 ; ae | now <= 4`.

## Convenience functions

Now that we have seen how to instantiate and value assign, there is little more needed to generate an entire graph filled with data. However, there are one or two convenience functions to help with this.

`attach` is a zefop that attaches "fields" to an entity/relation:

In [None]:
with Transaction(g):
    payload3 = instantiate(ET.Payload, g) | attach[RT.ID, "third payload"]

In [None]:
payload3 | info

This is roughly equivalent to the code:

In [None]:
with Transaction(g):
    payload3 = instantiate(ET.Payload, g)
    temp = instantiate(AET.String, g)
    temp <= "third payload"
    instantiate(payload3, RT.ID, temp, g)

`attach` returns the original entity itself, allowing for chaining, and can be used to quickly build up tree-like structures. Graph structures with loops still require several separate statements.
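
The chaining behaviour comes down to a method that returns the object it was called on. The toy below shows this in plain Python. It is not the `attach` zefop: `ToyEntity` and its `attach` method are hypothetical, and fields are simply stored in a list.

```python
# Toy model only: NOT the attach zefop. All names are hypothetical.
class ToyEntity:
    def __init__(self, kind):
        self.kind = kind
        self.fields = []  # list of (relation_name, target)

    def attach(self, relation, target):
        self.fields.append((relation, target))
        return self  # returning the entity itself is what enables chaining


# Tree-like structures can be built in a single expression...
sm = ToyEntity("Submodule").attach("Name", "atc-1").attach("Capacity", 1)
loc = ToyEntity("Location").attach("Within", sm).attach("Name", "atc-1-nest")

# ...but a loop needs a separate statement, because `loc` must exist first.
sm.attach("HasLocation", loc)

print([relation for relation, _ in sm.fields])
```

Because `attach` returns the entity rather than the newly attached target, the result of a chain is always the entity at the start of the chain.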

You can either call `attach` several times consecutively, or provide a list of tuples to a single `attach` call. Both are demonstrated below:

In [None]:
with Transaction(g):
    sm = I(ET.Submodule) | attach[[
        (RT.Name, "atc-1"),
        (RT.HasType, I(ET.SubmoduleType) | attach[[
            (RT.Name, "atc")
        ]]),
        (RT.Capacity, 1)
    ]]
    
    loc = (I(ET.Location)
           | attach[RT.Within, sm]
           | attach[RT.Name, "atc-1-nest"])
    
    step = I(ET.ZymergenStep) | attach[[
        (RT.ZymergenUUID, "a-b-c-d-e"),
        (RT.Submodule, sm),
        (RT.Payload, I(ET.Payload) | attach[[
            (RT.ID, "123"),
            (RT.Location, loc)
        ]])
    ]]

In [None]:
loc == (loc >> RT.Within << RT.Submodule >> RT.Payload >> RT.Location)

Note: the above is more of an illustration, and is not how a graph would typically be built up.

## Termination

The final way to modify the graph is to terminate a relation or entity. Termination is permanent: a terminated entity cannot be reinstantiated, although a new entity with a new uid could be instantiated in its place:

In [None]:
terminate(sm)

Any entity termination also terminates any relations connected to it. Hence, the following code should throw an error:

In [None]:
(loc | now) >> RT.Within

Note that the terminated entity is still accessible in historical timeslices, and the variable `loc` (without `|now`) still points at an older timeslice where the submodule has not been terminated:

In [None]:
loc >> RT.Within

It is possible to ask if an entity is currently alive, without resorting to exception checking:

In [None]:
# The commented line should work in the future:
#sm | exists_at[now]
sm | to_uzefref | exists_at[g|now]

Similarly, we can ask if an entity existed in the past, by using the `tx` zefop to obtain the timeslice of a `ZefRef`:

In [None]:
sm | to_uzefref | exists_at[payload|tx]

In [None]:
sm | to_uzefref | exists_at[loc|tx]
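
A rough mental model for these existence checks is a lifetime interval: an entity exists from the timeslice in which it was instantiated up to, but not including, the timeslice in which it was terminated. The sketch below encodes that assumption in plain Python. `ToyLifetime` is hypothetical, the timeslice numbers are made up, and this is not how `exists_at` is implemented in ZefDB.

```python
# Toy model only: NOT the ZefDB API. All names are hypothetical.
class ToyLifetime:
    def __init__(self, instantiated_at):
        self.instantiated_at = instantiated_at
        self.terminated_at = None

    def terminate(self, timeslice):
        if self.terminated_at is not None:
            raise ValueError("already terminated; termination is final")
        self.terminated_at = timeslice

    def exists_at(self, timeslice):
        if timeslice < self.instantiated_at:
            return False  # not yet instantiated
        # Assumption: the entity is already gone in the terminating timeslice.
        return self.terminated_at is None or timeslice < self.terminated_at


sm_life = ToyLifetime(instantiated_at=8)
sm_life.terminate(9)

print(sm_life.exists_at(2))  # False: before the entity was instantiated
print(sm_life.exists_at(8))  # True: alive in a historical timeslice
print(sm_life.exists_at(9))  # False: terminated
```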

> #### Note:
> A `ZefRef` is actually just a pair of `UZefRef`s: one pointing to the raw entity/relation blob, and one pointing to a transaction blob (that is, the timeslice). While most zefops should happily accept either a `ZefRef` or a `UZefRef`, there are several functions (for example `exists_at`) which have not yet been extended to allow a `ZefRef` argument. It is highly recommended to work with `ZefRef`s as much as possible, unless the context (or the missing implementation of an overloaded function) demands otherwise.
>
> For example, `sm | to_uzefref | outs` will show all out relations from all timeslices attached to the `sm` entity, and some of these relations may not exist simultaneously in any timeslice. This is almost certainly not what is intended in user code. It will also show "low-level graph edges" which represent the supporting data to the "high-level graph" which is the user-facing Zef graph. Unless absolutely needed, avoid `UZefRef`.
>
> If you are curious about the low-level graph, try these commands: `list(sm | to_uzefref | ins)` and `(sm | to_uzefref) << BT.REL_ENT_INSTANCE`

# 3. Importing configs

To put some of the above to the test, I have included an example of loading a `config.json` file from the toy problems.

This code is written in Julia, but the Zef parts of it are nearly identical to the Python versions. I have included comments to explain a few choices.

In [None]:
function LoadEverything(path, g=Graph())
    # Lookup is a dictionary to keep track of names/Zymergen uids with ZefRefs
    lookup = Dict()
    cd(path) do
        Transaction(g) do ctx
            LoadConfig(g, lookup)
            LoadTransport(g, lookup)
            LoadInventory(g, lookup)
            LoadProtocols(g, lookup)

            # This is unusual and needs to be handled in a better way
            for loc in g | instances[now][ET.InternalLocation] | filter[z -> contains(z >> RT.NodeLabel | value.String, "storage")]
                loc | attach[RT.Capacity => 100]
            end
        end
    end

    g,lookup
end

using JSON.Parser: parsefile

function LoadConfig(g, lookup)
    # This is a convenience equivalent to I = lambda *args: instantiate(*args, g) in python
    I = instantiate(g)
    data = parsefile("config.json")
    @assert keys(data) ⊆ ["submodules", "storage_smts"]

    for (key,item) in data["submodules"]
        typ = item["type"]
        # If this is a new type of submodule we haven't seen before, add it in as an entity.
        if typ ∉ keys(lookup)
            lookup[typ] = I(ET.SubmoduleType) | attach[RT.Name => typ]
        end

        z = I(ET.Submodule) | attach[RT.Name => key,
                                     RT.HasType => lookup[typ]]
        
        # Sometimes capacity is included in the keys.
        if "capacity" ∈ keys(item)
            z | attach[RT.Capacity => item["capacity"]]
        end
        lookup[key] = z
    end
end

# Other LoadX functions below here...
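
For readers more comfortable with Python, the lookup logic of `LoadConfig` can be sketched without any Zef calls. In the sketch below plain dictionaries stand in for entities, so it only illustrates the control flow (create each submodule type once, attach the capacity only when present). The function name `load_config_sketch` and the sample JSON are hypothetical and are not taken from the toy problems.

```python
# Plain-Python sketch of the lookup pattern; dicts stand in for Zef entities.
import json


def load_config_sketch(text):
    data = json.loads(text)
    assert set(data) <= {"submodules", "storage_smts"}

    lookup = {}
    for key, item in data["submodules"].items():
        typ = item["type"]
        # A submodule type seen for the first time gets its own stand-in entity.
        if typ not in lookup:
            lookup[typ] = {"kind": "SubmoduleType", "Name": typ}

        z = {"kind": "Submodule", "Name": key, "HasType": lookup[typ]}

        # Capacity is optional in the config.
        if "capacity" in item:
            z["Capacity"] = item["capacity"]
        lookup[key] = z
    return lookup


# Hypothetical sample input, only for illustration.
sample = '{"submodules": {"atc-1": {"type": "atc", "capacity": 1}, "atc-2": {"type": "atc"}}}'
lookup = load_config_sketch(sample)
print(sorted(lookup))  # ['atc', 'atc-1', 'atc-2']
```

As in the Julia version, type names and submodule names share one dictionary, so the pattern assumes that the two never collide.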