# indexd Demo

## About indexd

Indexd, in a nutshell, is a microservice which maintains URLs as pointers to stored data files. Indexd adds a layer of abstraction over stored data files: the data can move between or live in multiple locations, while the unique identifier for each file, kept in indexd, allows us to obtain the URLs (and some miscellaneous metadata) for the same stored data. Additionally, indexd tracks revisions of the same data file.

## Python Demo

In [223]:
import json
from urllib.parse import urljoin

import requests

### Setup

To start, run indexd on `localhost:8080`. Probably the easiest way is with a docker container:
```bash
# Start from indexd directory
# Build the docker image if you don't have it yet
docker build -t indexd .
# Now run the image, forwarding host port 8080 to port 80 in the container.
docker run -d --name indexd -p 8080:80 indexd
```

To use endpoints that require admin authorization, set up a username and password in the running indexd container:
```bash
docker exec indexd python /indexd/bin/index_admin.py create --username test --password test
```

(Here we set up a bit of code just to make the API calls more concise and readable.)

In [224]:
base = 'http://localhost:8080'

# NOTE
# Fill in the auth with whatever username/password you set before.
request_auth = requests.auth.HTTPBasicAuth('test', 'test')

def indexd(path):
    """Build a full indexd URL from a path such as '/index/'."""
    return urljoin(base, path)

def print_response(response):
    print(response)
    try:
        print(json.dumps(response.json(), indent=4))
    except ValueError:
        print(response.text)

So that this demo can be re-run against the same indexd instance, we'll first clear out all the records from indexd. (This won't make sense until later in the tutorial, so ignore it for now and move along!)

In [225]:
def wipe_indexd():
    """
    Delete all records from indexd.
    """
    records = requests.get(indexd('/index/')).json()['records']
    for record in records:
        path = indexd('/index/{}'.format(record['did']))
        params = {'rev': record['rev']}
        response = requests.delete(path, auth=request_auth, params=params)

In [226]:
# WARNING: don't do this if you want to keep your existing records!
wipe_indexd()
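
The listing output further down in this demo reports `"limit": 100`, which suggests that a single `GET /index/` may return only part of a larger index. If so, one pass of `wipe_indexd` could leave records behind. Here is a minimal sketch of one way around that, assuming nothing about indexd's paging parameters: keep listing and deleting until a listing comes back empty. The names `wipe_all`, `list_records`, and `delete_record` are hypothetical, and the demonstration runs against an in-memory stand-in rather than a live indexd.

```python
def wipe_all(list_records, delete_record, max_rounds=100):
    """Delete records round by round until a listing comes back empty.

    list_records:  callable returning the current first page of records
                   (a list of dicts with 'did' and 'rev' keys).
    delete_record: callable taking (did, rev) and deleting that record.
    Returns the number of records deleted.
    """
    deleted = 0
    for _ in range(max_rounds):
        records = list_records()
        if not records:
            return deleted
        for record in records:
            delete_record(record['did'], record['rev'])
            deleted += 1
    raise RuntimeError('records still present after {} rounds'.format(max_rounds))


# Offline demonstration: an in-memory stand-in for indexd (hypothetical data).
fake_store = {'did-{}'.format(i): 'rev-{}'.format(i) for i in range(250)}
PAGE_SIZE = 100  # mirrors the "limit": 100 seen in the listing output


def fake_list():
    dids = sorted(fake_store)[:PAGE_SIZE]
    return [{'did': did, 'rev': fake_store[did]} for did in dids]


def fake_delete(did, rev):
    # Refuse to delete unless the caller supplies the matching rev.
    assert fake_store[did] == rev
    del fake_store[did]


deleted_count = wipe_all(fake_list, fake_delete)
print(deleted_count, len(fake_store))  # 250 0
```

Against a real instance, `list_records` could wrap `requests.get(indexd('/index/')).json()['records']`, and `delete_record` could wrap the `requests.delete` call used in `wipe_indexd`.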

Let's check that indexd is alive, using the status endpoint.

In [227]:
print_response(requests.get(indexd('/_status')))

<Response [200]>
Healthy
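
If the container was started only a moment ago, the status request may fail before indexd is ready. The following is a rough sketch of a retry loop, not something indexd provides. `wait_until_healthy` and its parameters are hypothetical names, and the demonstration uses a fake check so it runs without a server.

```python
import time


def wait_until_healthy(check, attempts=10, delay=1.0, sleep=time.sleep):
    """Call check() until it returns True; return the attempt number that succeeded.

    OSError is swallowed because connection failures (including the ones
    raised by `requests`) derive from it.
    """
    for attempt in range(1, attempts + 1):
        try:
            if check():
                return attempt
        except OSError:
            pass  # e.g. connection refused while the service is still starting
        if attempt < attempts:
            sleep(delay)
    raise TimeoutError('service not healthy after {} attempts'.format(attempts))


def make_fake_check():
    """Fake check: raises once, reports unhealthy once, then reports healthy."""
    calls = {'n': 0}

    def check():
        calls['n'] += 1
        if calls['n'] == 1:
            raise ConnectionError('not up yet')
        return calls['n'] >= 3

    return check


attempts_needed = wait_until_healthy(make_fake_check(), delay=0, sleep=lambda s: None)
print(attempts_needed)  # 3
```

With the helpers from this demo, the check could be `lambda: requests.get(indexd('/_status')).status_code == 200`.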


So far so good. Let's get the list of records stored in indexd right now, by sending a `GET` to `/index/`.

In [228]:
print_response(requests.get(indexd('/index/')))

<Response [200]>
{
    "version": null,
    "metadata": {},
    "urls": [],
    "start": null,
    "size": null,
    "limit": 100,
    "records": [],
    "ids": null,
    "acl": [],
    "hashes": null,
    "file_name": null
}


There are no records registered yet, so let's create one.

### Creating a Record

Just below is some example data for a record. We `POST` this to the `/index/` endpoint on indexd to register the record.

The minimum information to supply to indexd is the file size, the hash (in any of several common formats), a list of URLs pointing to where the data file is stored (which can be left empty), and the `form` (here, `'object'`). For this example we'll also give our imaginary file a name, and add `'*'` to the ACL list.

In [229]:
data = {
    'size': 8,
    'hashes': {'md5': 'e561f9248d7563d15dd93457b02ebbb6'},
    'urls': [],
    'form': 'object',
    'file_name': 'example_file',
    'acl': ['*'],
}
response = requests.post(indexd('/index/'), json=data, auth=request_auth)
print_response(response)
# Save this stuff, we'll need to use it later.
v_0_did = response.json()['did']
v_0_baseid = response.json()['baseid']
v_0_rev = response.json()['rev']

<Response [200]>
{
    "rev": "be8c395f",
    "did": "testprefix:760c371d-1efa-44e0-8a0e-83b797e738dc",
    "baseid": "cef3e517-a7e9-4381-9687-0ba11fc177b1"
}


Success!
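
The size and hash in the example data were made up for an imaginary file. For a real file, those two values could be computed locally before registering it. Here is a minimal sketch using only the standard library. `describe_file` is a hypothetical helper, and the temporary file exists only to make the example runnable.

```python
import hashlib
import os
import tempfile


def describe_file(path, chunk_size=1024 * 1024):
    """Build a record payload (size, md5, empty URL list) for a local file."""
    md5 = hashlib.md5()
    size = 0
    with open(path, 'rb') as handle:
        # Read in chunks so large files are never loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            md5.update(chunk)
            size += len(chunk)
    return {
        'size': size,
        'hashes': {'md5': md5.hexdigest()},
        'urls': [],
        'form': 'object',
        'file_name': os.path.basename(path),
    }


with tempfile.TemporaryDirectory() as tmp:
    example_path = os.path.join(tmp, 'example_file')
    with open(example_path, 'wb') as handle:
        handle.write(b'contents')  # 8 bytes
    payload = describe_file(example_path)

print(payload['size'], payload['file_name'])  # 8 example_file
```

The resulting dictionary has the same shape as the `data` payload posted above, minus the `acl` entry, which would still need to be added by hand.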

### Retrieving Records

Now the list of records returned from indexd should have our new entry—let's check, again using a `GET` to the `/index/` endpoint.

In [230]:
print_response(requests.get(indexd('/index/')))

<Response [200]>
{
    "version": null,
    "metadata": {},
    "urls": [],
    "start": null,
    "size": null,
    "limit": 100,
    "records": [
        {
            "version": null,
            "did": "testprefix:760c371d-1efa-44e0-8a0e-83b797e738dc",
            "urls_metadata": {},
            "urls": [],
            "baseid": "cef3e517-a7e9-4381-9687-0ba11fc177b1",
            "created_date": "2018-08-07T22:19:03.068052",
            "size": 8,
            "acl": [
                "*"
            ],
            "metadata": {},
            "hashes": {
                "md5": "e561f9248d7563d15dd93457b02ebbb6"
            },
            "rev": "be8c395f",
            "form": "object",
            "updated_date": "2018-08-07T22:19:03.068062",
            "file_name": "example_file"
        }
    ],
    "ids": null,
    "acl": [],
    "hashes": null,
    "file_name": null
}


We can also look up this specific record using `GET` `/index/{UUID}`, where the UUID is the DID that indexd returned when we created the record.

In [231]:
path = indexd('/index/{}'.format(v_0_did))
print_response(requests.get(path))

<Response [200]>
{
    "version": null,
    "did": "testprefix:760c371d-1efa-44e0-8a0e-83b797e738dc",
    "urls_metadata": {},
    "urls": [],
    "baseid": "cef3e517-a7e9-4381-9687-0ba11fc177b1",
    "created_date": "2018-08-07T22:19:03.068052",
    "size": 8,
    "acl": [
        "*"
    ],
    "metadata": {},
    "hashes": {
        "md5": "e561f9248d7563d15dd93457b02ebbb6"
    },
    "rev": "be8c395f",
    "form": "object",
    "updated_date": "2018-08-07T22:19:03.068062",
    "file_name": "example_file"
}


Great, so we made a new record! But what does any of that stuff mean? Let's break this down.

### About Records in indexd

A single record in indexd contains several fields; let's go through each field and explain what these are for.

#### `did` ("digital identifier")

A unique identifier (UUID4) for the file; indexd will make these for new records automatically. Notice that the one that indexd generated for us looks like this:
```
<prefix>:<UUID>
```
TODO
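
As an illustration of the `<prefix>:<UUID>` shape, here is a small sketch that separates the two parts and checks the UUID with the standard `uuid` module. It assumes the prefix is everything before the last colon, which is inferred only from the one prefixed DID in this demo. `split_did` is a hypothetical helper.

```python
import uuid


def split_did(did):
    """Split a DID into (prefix or None, uuid.UUID).

    Assumption: an optional prefix is separated from the UUID by the last ':'.
    Raises ValueError if the UUID part is malformed.
    """
    prefix, _, bare = did.rpartition(':')
    return (prefix or None), uuid.UUID(bare)


# Both DIDs are copied from the responses shown in this demo.
prefix, parsed = split_did('testprefix:760c371d-1efa-44e0-8a0e-83b797e738dc')
print(prefix, parsed.version)  # testprefix 4

no_prefix, parsed_2 = split_did('88bca605-42f9-40b1-a0e3-41e632276125')
print(no_prefix, parsed_2.version)  # None 4
```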

#### `baseid`

The `baseid` is a common identifier for all versions of one file, across revisions.

#### `rev`

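The `rev` field identifies a particular revision of a record; each version of a file has its own `rev`. It is also the value passed as the `rev` parameter when deleting a record, as `wipe_indexd` did above.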

#### `form`

#### `size`

This is just the file size that we originally gave indexd for this file.

#### `file_name`

Optional field recording the filename of the indexed file.

#### `metadata`

#### `urls_metadata`

#### `version`

#### `urls`

As mentioned above, this is the list of URLs which point to the real location of the stored data.
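
Since one record can list several locations for the same data, a client has to decide which URL to use. Here is a minimal sketch of one possible policy, choosing by URL scheme in order of preference. This is not indexd behavior: `pick_url`, the preference order, and the example URLs are all hypothetical.

```python
from urllib.parse import urlparse


def pick_url(urls, preferred_schemes=('https', 's3', 'gs')):
    """Return the first URL whose scheme ranks highest in preferred_schemes, or None."""
    by_scheme = {}
    for url in urls:
        # Keep only the first URL seen for each scheme.
        by_scheme.setdefault(urlparse(url).scheme, url)
    for scheme in preferred_schemes:
        if scheme in by_scheme:
            return by_scheme[scheme]
    return None


# Hypothetical locations for the same file.
example_urls = [
    's3://example-bucket/example_file',
    'https://example.org/data/example_file',
]
print(pick_url(example_urls))                             # https://example.org/data/example_file
print(pick_url(example_urls, preferred_schemes=('s3',)))  # s3://example-bucket/example_file
print(pick_url([]))                                       # None
```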

#### `acl`

#### `hashes`

`hashes` is an object storing one or more hashes for the file itself. These can be any of:
- MD5
- SHA1
- SHA256
- SHA512
- CRC
- ETag
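
Before sending a `hashes` object, it can help to check that each digest at least has the right length for its algorithm. The sketch below covers only the four digest types with a fixed, well-known hex length, and skips CRC and ETag. Apart from `md5`, which appears in this demo, the key names are assumptions. `check_hashes` is a hypothetical helper, not part of indexd.

```python
import re

# Hex digest lengths: MD5 is 128 bits, SHA-1 160, SHA-256 256, SHA-512 512.
HEX_DIGEST_LENGTHS = {'md5': 32, 'sha1': 40, 'sha256': 64, 'sha512': 128}


def check_hashes(hashes):
    """Return a list of problems found; an empty list means nothing looked wrong."""
    problems = []
    for name, value in hashes.items():
        expected = HEX_DIGEST_LENGTHS.get(name)
        if expected is None:
            continue  # formats such as CRC and ETag are not checked here
        if not re.fullmatch(r'[0-9a-fA-F]{%d}' % expected, value):
            problems.append('{}: expected {} hex characters'.format(name, expected))
    return problems


print(check_hashes({'md5': 'e561f9248d7563d15dd93457b02ebbb6'}))  # []
print(check_hashes({'md5': 'e561f9'}))  # ['md5: expected 32 hex characters']
```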

### Record Versions

Now that we've created a record, let's look at how to update it with a new version. Suppose the contents of our imaginary file change, and with them the size and the hash. To register the new information as a new version, we `POST` to `/index/{UUID}`, where the UUID is the DID of the existing record.

In [232]:
# Here's the new data for the "file".
data['size'] = 10
data['hashes'] = {'md5': 'f7952a9483fae0af6d41370d9333020b'}

# We saved the DID for this file before.
path = indexd('/index/{}'.format(v_0_did))
response = requests.post(path, json=data, auth=request_auth)
v_1_baseid = response.json()['baseid']
v_1_did = response.json()['did']
v_1_rev = response.json()['rev']
print_response(response)

<Response [200]>
{
    "rev": "2d09fa8d",
    "did": "88bca605-42f9-40b1-a0e3-41e632276125",
    "baseid": "cef3e517-a7e9-4381-9687-0ba11fc177b1"
}


Now, if we compare this `baseid` to the `baseid` that indexd returned when we created the record for the original file, we see that this `baseid` remains the same.

In [233]:
print(v_0_baseid == v_1_baseid)

True


However, this record has a different `rev` and a different `did` than the original.

In [234]:
print(v_0_did == v_1_did)
print(v_0_rev == v_1_rev)

False
False


Having created the new version for this file, let's make another `GET` `/index/{UUID}` request, this time passing the shared `baseid` as the UUID.

In [235]:
path = indexd('/index/{}'.format(v_0_baseid))
response = requests.get(path)
print_response(response)

<Response [200]>
{
    "version": "2.0",
    "did": "88bca605-42f9-40b1-a0e3-41e632276125",
    "urls_metadata": {},
    "urls": [],
    "baseid": "cef3e517-a7e9-4381-9687-0ba11fc177b1",
    "created_date": "2018-08-07T22:19:03.147919",
    "size": 10,
    "acl": [
        "*"
    ],
    "metadata": {},
    "hashes": {
        "md5": "f7952a9483fae0af6d41370d9333020b"
    },
    "rev": "2d09fa8d",
    "form": "object",
    "updated_date": "2018-08-07T22:19:03.147927",
    "file_name": "example_file"
}


The response is the record for the newest version, so it reflects the changes to the file.

In [236]:
print(response.json()['did'] == v_1_did)

True


However, the original information still exists. We can make a request again using the DID of the original file, and see that this revision hasn't changed.

In [237]:
path = indexd('/index/{}'.format(v_0_did))
print_response(requests.get(path))

<Response [200]>
{
    "version": null,
    "did": "testprefix:760c371d-1efa-44e0-8a0e-83b797e738dc",
    "urls_metadata": {},
    "urls": [],
    "baseid": "cef3e517-a7e9-4381-9687-0ba11fc177b1",
    "created_date": "2018-08-07T22:19:03.068052",
    "size": 8,
    "acl": [
        "*"
    ],
    "metadata": {},
    "hashes": {
        "md5": "e561f9248d7563d15dd93457b02ebbb6"
    },
    "rev": "be8c395f",
    "form": "object",
    "updated_date": "2018-08-07T22:19:03.068062",
    "file_name": "example_file"
}
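
To see at a glance what differs between the two versions, the two records can be compared field by field. This is a minimal sketch: `changed_fields` is a hypothetical helper, and the two dictionaries are trimmed copies of the responses shown in this demo.

```python
def changed_fields(old, new):
    """Map each differing field name to its (old value, new value) pair."""
    keys = set(old) | set(new)
    return {
        key: (old.get(key), new.get(key))
        for key in keys
        if old.get(key) != new.get(key)
    }


record_v0 = {
    'version': None,
    'did': 'testprefix:760c371d-1efa-44e0-8a0e-83b797e738dc',
    'baseid': 'cef3e517-a7e9-4381-9687-0ba11fc177b1',
    'rev': 'be8c395f',
    'size': 8,
    'hashes': {'md5': 'e561f9248d7563d15dd93457b02ebbb6'},
    'file_name': 'example_file',
}
record_v1 = {
    'version': '2.0',
    'did': '88bca605-42f9-40b1-a0e3-41e632276125',
    'baseid': 'cef3e517-a7e9-4381-9687-0ba11fc177b1',
    'rev': '2d09fa8d',
    'size': 10,
    'hashes': {'md5': 'f7952a9483fae0af6d41370d9333020b'},
    'file_name': 'example_file',
}

diff = changed_fields(record_v0, record_v1)
print(sorted(diff))  # ['did', 'hashes', 'rev', 'size', 'version']
```

The `baseid` and `file_name` fields do not appear in the result, which matches the comparison made earlier: versions of one file share a `baseid` but differ in `did` and `rev`.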


Finally, we can look at the whole list of versions for a single file, with `GET` `/index/{UUID}/versions`. The object in the response will contain the records for every version of this file as key-value pairs, where the keys are just numeric indexes (in string form) and the values are the records.
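
Because the response is an object keyed by index strings rather than a list, client code may want to put the versions in a definite order. The sketch below sorts by `created_date`, on the assumption that these timestamps all share the ISO format seen in this demo and therefore sort correctly as plain strings. `versions_oldest_first` is a hypothetical helper, and the sample object is a trimmed stand-in built from values shown earlier.

```python
def versions_oldest_first(versions):
    """Return the version records as a list sorted by creation time, oldest first."""
    return sorted(versions.values(), key=lambda record: record['created_date'])


# Trimmed stand-in for a /versions response, using values from this demo.
versions = {
    '1': {
        'did': '88bca605-42f9-40b1-a0e3-41e632276125',
        'created_date': '2018-08-07T22:19:03.147919',
        'size': 10,
    },
    '0': {
        'did': 'testprefix:760c371d-1efa-44e0-8a0e-83b797e738dc',
        'created_date': '2018-08-07T22:19:03.068052',
        'size': 8,
    },
}

ordered = versions_oldest_first(versions)
latest = ordered[-1]
print([record['size'] for record in ordered])  # [8, 10]
print(latest['did'])  # 88bca605-42f9-40b1-a0e3-41e632276125
```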

In [266]:
path = indexd('/index/{}/versions'.format(v_0_baseid))
print_response(requests.get(path))

<Response [200]>
{
    "1": {
        "version": "2.0",
        "did": "88bca605-42f9-40b1-a0e3-41e632276125",
        "urls_metadata": {},
        "urls": [],
        "baseid": "cef3e517-a7e9-4381-9687-0ba11fc177b1",
        "created_date": "2018-08-07T22:19:03.147919",
        "size": 10,
        "acl": [
            "*"
        ],
        "metadata": {},
        "hashes": {
            "md5": "f7952a9483fae0af6d41370d9333020b"
        },
        "rev": "2d09fa8d",
        "form": "object",
        "updated_date": "2018-08-07T22:19:03.147927",
        "file_name": "example_file"
    },
    "0": {
        "version": null,
        "did": "testprefix:760c371d-1efa-44e0-8a0e-83b797e738dc",
        "urls_metadata": {},
        "urls": [],
        "baseid": "cef3e517-a7e9-4381-9687-0ba11fc177b1",
        "created_date": "2018-08-07T22:19:03.068052",
        "size": 8,
        "acl": [
            "*"
        ],
        "metadata": {},
        "hashes": {
            "md5": "e561f9248d7563d

### Record Aliases