# Quick Start Guide - Adding Data

This notebook guides you through inserting data into the Platform from
[General System](https://www.generalsystem.com).

OpenAPI specification documentation is available at
<https://api.dataflowindex.io/docs/api>.

Please refer to <https://github.com/thegeneralsystem/dfi-client-examples> for
the most up-to-date companion documentation.

Additional resources and help are available at <https://support.generalsystem.com>.

## Get ready

In [None]:
# Install Python modules if they are not already present.
!python3 -m pip install pandas requests tabulate pydeck

In [None]:
# Import required modules.
import random

import pandas as pd
import requests
from tabulate import tabulate

In [None]:
# First set your API token to access the DFI API.
#
# Access to the DFI demonstration servers requires an API token, which may be
# obtained free of charge by enrolling at <https://eap.generalsystem.com>. Once
# enrolled, your API token may be redeemed from <https://tokens.dataflowindex.io/>.

from getpass import getpass

api_token = getpass("Enter your API token: ")

# Set authorisation headers:
headers = {
    "Authorization": f"Bearer {api_token}",
    "accept": "application/json",
    "content-type": "application/json",
}
base_url = "https://api.dataflowindex.io"
query_timeout = 60

In [None]:
# Get list of instances associated with your API token.
r = requests.get(f"{base_url}/instances", headers=headers, timeout=query_timeout)
print(r.json())

In [None]:
# Next select which DFI instance you will be accessing.
# Insert the name of your DFI instance here.
# Contact support via https://support.generalsystem.com/ if you need help
# finding your DFI instance name.
instance_name = "YOUR_DFI_INSTANCE_NAME_HERE"
params = {"instance": instance_name}

## Adding data

Note that you cannot add data to the Demo instance, as it is shared among multiple evaluation users.

If you have access to your own Trial instance, or have purchased an instance, then you can use the methods below to add data.

In [None]:
# Add a new point at coordinates [0, 0] for the entity with ID 1234.
# An ID can be either an int64 or a UUIDv4.
payload = [
    {
        "coordinate": [0, 0],  # [longitude, latitude]
        "time": "2022-09-01T17:32:28.250Z",
        "id": 1234,
        "payload": "Application specific data",
    }
]
r = requests.post(f"{base_url}/insert", params=params, json=payload, headers=headers, timeout=query_timeout)

print(f"Status code: {r.status_code}")
print(f"Response:\n{r.text}")
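
The record above has four fields, and a typo in any of them only shows up once the server rejects the request. The sketch below checks a record locally before it is sent. `make_point` is a hypothetical helper, not part of the DFI API. The coordinate ranges, the signed 64-bit bounds, and sending a UUID as a string are assumptions rather than documented rules.

```python
from datetime import datetime, timezone
from uuid import UUID

# Assumed bounds for an int64 ID (signed 64-bit range).
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def make_point(longitude, latitude, when, entity_id, payload=""):
    """Build one insert record in the shape used by the cell above."""
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f"longitude out of range: {longitude}")
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f"latitude out of range: {latitude}")
    if when.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")

    if isinstance(entity_id, UUID):
        # Assumption: a UUID is transmitted in its string form.
        entity_id = str(entity_id)
    elif isinstance(entity_id, bool) or not isinstance(entity_id, int):
        raise ValueError("entity_id must be an int or a UUID")
    elif not INT64_MIN <= entity_id <= INT64_MAX:
        raise ValueError("entity_id does not fit in an int64")

    # Format as UTC with millisecond precision and a trailing "Z",
    # matching the timestamp used in the example record.
    stamp = when.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    stamp = stamp.replace("+00:00", "Z")

    return {
        "coordinate": [longitude, latitude],  # [longitude, latitude]
        "time": stamp,
        "id": entity_id,
        "payload": payload,
    }
```

A list of records built this way can be passed as `json=` to the same `requests.post` call used above.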

Now we create a set of random points in the bounding box of a given polygon.

In [None]:
# First we create a helper function.
def create_points_in_polygon(query_polygon: str, entity_id: int, n: int) -> None:
    r = requests.get(base_url + "/polygons/" + query_polygon, headers=headers, timeout=query_timeout)
    if r.status_code != 200:
        print(f"Status code: {r.status_code}")
        print(f"Reason: {r.reason}")
        print(f"Polygon does not exist: {query_polygon}")
        return None

    # Parse the response once, then derive the bounding box from the vertices.
    vertices = r.json()["vertices"]
    longs = [v[0] for v in vertices]
    lats = [v[1] for v in vertices]
    min_long, max_long = min(longs), max(longs)
    min_lat, max_lat = min(lats), max(lats)
    print(f"Bounding box: {min_long},{max_long} : {min_lat},{max_lat}")

    for _ in range(n):
        payload = [
            {
                "coordinate": [random.uniform(min_long, max_long), random.uniform(min_lat, max_lat)],  # [ Long, Lat ]
                "time": "2022-01-01T" + str(random.randint(10, 22)) + ":32:28.250Z",
                "id": entity_id,
                "payload": "Pts in polygon",
            }
        ]
        r = requests.post(f"{base_url}/insert", params=params, json=payload, headers=headers, timeout=query_timeout)
        print(f"Status code: {r.status_code}")
        print(f"Response:\n{r.text}")
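
The helper above sends one request per point, so 1000 points mean 1000 round trips. Because the insert payload is a list, it may be possible to send several records per request. That is an assumption here, not confirmed behaviour, and any limit on batch size is unknown. The sketch below separates point generation from sending so that records can be grouped. `random_points_in_bbox` and `chunked` are hypothetical helper names.

```python
import random


def random_points_in_bbox(min_long, max_long, min_lat, max_lat, entity_id, n, seed=None):
    """Generate n records with random coordinates inside a bounding box."""
    rng = random.Random(seed)  # a seed makes the output reproducible
    return [
        {
            "coordinate": [rng.uniform(min_long, max_long), rng.uniform(min_lat, max_lat)],
            "time": f"2022-01-01T{rng.randint(10, 22)}:32:28.250Z",
            "id": entity_id,
            "payload": "Pts in polygon",
        }
        for _ in range(n)
    ]


def chunked(records, size):
    """Yield consecutive slices of at most `size` records."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield records[start:start + size]


# Usage sketch (not executed here; assumes the notebook's base_url, params,
# headers and query_timeout, and that /insert accepts multi-record lists):
#
# for batch in chunked(points, 100):
#     r = requests.post(f"{base_url}/insert", params=params, json=batch,
#                       headers=headers, timeout=query_timeout)
#     r.raise_for_status()
```

The batch size of 100 in the usage sketch is arbitrary. Start small and check the response before increasing it.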

In [None]:
# We have created a set of interesting polygons that you can use.
# These include the London Borough areas,
# the Congestion Charging Zone area and more.
# The code below lists the polygons available:
r = requests.get(f"{base_url}/polygons", headers=headers, timeout=query_timeout)
if r.status_code != 200:
    print(f"Status code: {r.status_code}")
    print(f"Response:\n{r.text}")
    r.raise_for_status()

data = [[polygon["name"], polygon["count"]] for polygon in r.json()["polygons"]]
print(tabulate(data, ["name", "vertices"], tablefmt="pretty"))

In [None]:
# Let's create some data:
create_points_in_polygon("uk_southwark", 1, 1000)
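
Points drawn from a bounding box can fall outside the polygon itself whenever the polygon is not a rectangle. If that matters for your data, candidate points can be filtered before insertion. The sketch below uses the standard ray-casting test. It assumes vertices are `[longitude, latitude]` pairs, as in the helper above, and treats them as planar coordinates, which is a simplification that is only reasonable for small areas. `point_in_polygon` is a hypothetical helper name.

```python
def point_in_polygon(longitude, latitude, vertices):
    """Return True if the point lies inside the polygon (ray-casting test).

    A ray is cast east from the point; an odd number of edge crossings
    means the point is inside.
    """
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i][0], vertices[i][1]
        x2, y2 = vertices[(i + 1) % n][0], vertices[(i + 1) % n][1]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > latitude) != (y2 > latitude):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (latitude - y1) * (x2 - x1) / (y2 - y1)
            if longitude < x_cross:
                inside = not inside
    return inside
```

Points exactly on an edge may be classified either way, which is usually acceptable for randomly generated test data.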

## Ingest file

Ingest data from a file stored in an AWS S3 bucket (or any other file accessible from the Internet).

In [None]:
# Before loading new data, DELETE all data in the instance.
# Replace "my_instance" with the name of the instance you want to truncate.
truncate_headers = {"Authorization": f"Bearer {api_token}", "accept": "*/*"}
instance_to_truncate = "my_instance"
r = requests.post(
    f"{base_url}/instances/{instance_to_truncate}/truncate", headers=truncate_headers, timeout=query_timeout
)

print(f"Status code: {r.status_code}")
print(f"Response:\n{r.text}")

In [None]:
# Check that the instance is empty.
r = requests.get(f"{base_url}/count", params=params, headers=headers, timeout=query_timeout)
if r.status_code != 200:
    print(f"Status code: {r.status_code}")
    print(f"Response:\n{r.text}")
    r.raise_for_status()

total_histories = r.json()
print(f"Records found: {total_histories}")

In [None]:
# Submit a new ingestion job.
# "urls" takes a list, so several files can be submitted at once.
file = "https://domain/file.ext"
payload = {
    "source": {"urls": [file]},
    # Map each field to the index of the file column that holds it:
    "format": {"columns": {"entityId": 0, "timestamp": 1, "longitude": 2, "latitude": 3}},
}
r = requests.put(f"{base_url}/import/s3", json=payload, params=params, headers=headers, timeout=query_timeout)
print(f"Status code: {r.status_code}")
r.raise_for_status()  # stop here if the job was not accepted
result = r.json()
ingestion_id = result["id"]
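
When files have a different column order, writing the index mapping by hand is easy to get wrong. The sketch below derives it from a list of column names in file order. `build_import_payload` is a hypothetical helper. Treating the four fields shown above as required, and ignoring any other column names, are assumptions made for illustration.

```python
# The four fields used in the ingestion payload above.
REQUIRED_COLUMNS = ("entityId", "timestamp", "longitude", "latitude")


def build_import_payload(urls, column_order):
    """Build an ingestion payload from file URLs and column names in file order."""
    if not urls:
        raise ValueError("at least one URL is needed")
    if len(set(column_order)) != len(column_order):
        raise ValueError("column names must be unique")
    missing = [name for name in REQUIRED_COLUMNS if name not in column_order]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    # Position of each field in the file, counting from 0.
    columns = {name: column_order.index(name) for name in REQUIRED_COLUMNS}
    return {"source": {"urls": list(urls)}, "format": {"columns": columns}}
```

The returned dictionary has the same shape as the `payload` in the cell above and can be passed as `json=` to the same `requests.put` call.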

In [None]:
# Check the status of the ingestion job.
# The possible statuses are: created, started, finished.
r = requests.get(f"{base_url}/import/s3/{ingestion_id}/status", headers=headers, timeout=query_timeout)
print(f"Status code: {r.status_code}")
result = r.json()
print("Result status: ", result["status"])
print("Insert count: ", result["insertCount"])
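
Ingestion runs asynchronously, so the status cell usually has to be re-run until the job reports `finished`. The sketch below wraps that in a polling loop with an upper bound on waiting time. `wait_for_ingestion` is a hypothetical helper, and the polling interval and time limit are arbitrary choices. Only the three statuses listed above are known here, so the loop simply waits for `finished` and does not attempt to detect a failure status.

```python
import time


def wait_for_ingestion(fetch_status, poll_seconds=5.0, max_wait_seconds=600.0, sleep=time.sleep):
    """Poll until the job status is "finished" and return the final result.

    fetch_status: zero-argument callable returning the parsed status JSON.
    sleep: injectable so the loop can be exercised without real delays.
    """
    waited = 0.0
    while True:
        result = fetch_status()
        if result["status"] == "finished":
            return result
        if waited >= max_wait_seconds:
            raise TimeoutError(f"ingestion still '{result['status']}' after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Usage sketch (not executed here; assumes the notebook's variables):
#
# def fetch_status():
#     r = requests.get(f"{base_url}/import/s3/{ingestion_id}/status",
#                      headers=headers, timeout=query_timeout)
#     r.raise_for_status()
#     return r.json()
#
# final = wait_for_ingestion(fetch_status)
# print("Insert count: ", final["insertCount"])
```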