# Intro to Data Flow Index Python API

**Note: DFI Queries Will Not Work**

**The Data Flow Index server used for this workshop is no longer running.  The workshop materials are left up _as is_ but queries will not run.  If you would like to trial the Data Flow Index please reach out to General System at [https://www.generalsystem.com/contact-us](https://www.generalsystem.com/contact-us).**


```python
import json
import shutil
from datetime import datetime
from getpass import getpass
from pathlib import Path

import geopandas as gpd
import pandas as pd
import pydeck as pdk
import urllib3
from dfi import Client
from shapely.geometry import Point
```

## I. Workshop Location

### A. Load the OSM dataset

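```python
def load_location_data(filename: str, url: str) -> gpd.GeoDataFrame:
    """Download the GeoParquet file at `url`, save it to `filename`, and load it.

    e.g. url = "https://d3ftlhu7xfb8rb.cloudfront.net/blank_street_coffees.geoparquet"
    """
    # Create the destination directory if it does not exist yet
    Path(filename).parent.mkdir(parents=True, exist_ok=True)
    http = urllib3.PoolManager()
    # Stream the response body to disk instead of loading it all into memory
    r = http.request("GET", url, preload_content=False)
    try:
        if r.status != 200:
            raise RuntimeError(f"Download failed with HTTP status {r.status}: {url}")
        with open(filename, "wb") as out:
            shutil.copyfileobj(r, out)
    finally:
        # Return the connection to the pool
        r.release_conn()

    return gpd.read_parquet(filename)
```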

```python
gdf = load_location_data("./temp-data", "https://d3ftlhu7xfb8rb.cloudfront.net/london_nyc_osm.geoparquet")
```
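Re-running the cell above downloads the file again every time. A minimal sketch of a cache-aware alternative follows. It uses only the standard library (`urllib.request` in place of `urllib3`), and the helper name `download_once` is hypothetical, not part of the workshop code. It skips the download when a non-empty file already exists, streams the body to disk with `shutil.copyfileobj` as the original function does, and writes to a temporary `.part` file first so an interrupted download is never mistaken for a cached one.

```python
import shutil
import urllib.request
from pathlib import Path


def download_once(filename: str, url: str) -> Path:
    """Hypothetical helper: download `url` to `filename` unless it is already there."""
    path = Path(filename)
    if path.exists() and path.stat().st_size > 0:
        # Cached copy found: reuse it instead of downloading again
        return path
    path.parent.mkdir(parents=True, exist_ok=True)
    # Download to a temporary name so a partial file never looks cached
    tmp = path.with_name(path.name + ".part")
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        # Stream the body to disk in chunks rather than reading it all into memory
        shutil.copyfileobj(response, out)
    # Atomically move the finished download into place
    tmp.replace(path)
    return path
```

With this helper the load becomes `gpd.read_parquet(download_once("./temp-data", "https://d3ftlhu7xfb8rb.cloudfront.net/london_nyc_osm.geoparquet"))`. Note that the cache check is by file name only; delete the file to force a fresh download.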

### B. Find the conference building in the OSM dataset

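```python
# A point inside the main conference building (longitude, latitude)
coord = Point(-73.963945, 40.806802)
# Keep the OSM features whose geometry contains or touches that point
building = gdf[gdf.intersects(coord)]
# Outer ring of the first matching polygon as a list of (lon, lat) tuples
vertices = list(building.geometry.iloc[0].exterior.coords)
```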
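For a point and a polygon, `gdf.intersects(coord)` amounts to a point-in-polygon test. Shapely delegates this to the GEOS library, so the following is only an illustration of the idea, not what runs under the hood: a minimal ray-casting sketch that casts a horizontal ray from the point and counts how many polygon edges it crosses (odd means inside). The function name `point_in_ring` is hypothetical, and points lying exactly on the boundary may be classified either way, unlike `intersects`, which counts the boundary as a hit.

```python
from typing import Sequence, Tuple


def point_in_ring(x: float, y: float, ring: Sequence[Tuple[float, float]]) -> bool:
    """Ray casting: inside if a ray towards +x crosses the ring an odd number of times.

    `ring` is a sequence of (x, y) vertices. A repeated closing vertex, as in
    shapely's `exterior.coords`, only adds a zero-length edge and is harmless.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```

Applied to the workshop data, `point_in_ring(-73.963945, 40.806802, vertices)` should agree with the `intersects` result for a point strictly inside the building.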

### C. Map the building polygon

```python
ORANGE = [255, 80, 8]

layer = pdk.Layer(
    "GeoJsonLayer",
    building,
    opacity=0.5,
    stroked=True,
    filled=True,
    get_fill_color=ORANGE,
    get_line_color=ORANGE,
)

building_centroid = list(zip(building.centroid.x, building.centroid.y))
view = pdk.data_utils.compute_view(building_centroid)
view.zoom = 15

pdk.Deck(
    layers=[layer],
    initial_view_state=view,
)
```
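The map view is centred on `building.centroid`, the area-weighted centre of the polygon. For a simple polygon that centroid follows from the shoelace formula; a stdlib sketch is below, with the helper name `polygon_centroid` being hypothetical. It works directly in lon/lat degrees, which is also why, if `gdf` is in a geographic CRS such as EPSG:4326, GeoPandas warns that centroids are likely incorrect; at the scale of one building the distortion does not matter for centring a map.

```python
from typing import Sequence, Tuple


def polygon_centroid(ring: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Area-weighted centroid of a simple (non-self-intersecting) polygon."""
    # Drop a repeated closing vertex so each edge is counted exactly once
    pts = list(ring[:-1]) if len(ring) > 1 and ring[0] == ring[-1] else list(ring)
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(pts)
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        area2 += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # C = sum((p_i + p_{i+1}) * cross_i) / (6A), and 6A = 3 * area2
    return cx / (3 * area2), cy / (3 * area2)
```

Because the signed area and the weighted sums flip sign together, the result is the same for clockwise and counter-clockwise rings; `polygon_centroid(vertices)` should closely match `building.centroid` for the conference building.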

## II. Querying with Data Flow Index

See [dfipy documentation](https://dfipy.docs.generalsystem.com/).

There are three main entry points for querying the DFI:

- `dfi.get.records()` - queries for records within the filter bounds
- `dfi.get.entities()` - queries for the unique entities within the filter bounds
- `dfi.get.records_count()` - queries for the count of records within the filter bounds

All three methods accept spatial filters (a bounding box or a `polygon`) and a `time_interval` filter. `dfi.get.records()` and `dfi.get.records_count()` also accept an `entities` filter.

| Method                    | BBox | Polygon | Entities | Time Interval |
|---------------------------|------|---------|----------|---------------|
| `dfi.get.records_count()` | ✔︎    | ✔︎       | ✔︎        | ✔︎             |
| `dfi.get.entities()`      | ✔︎    | ✔︎       | ✘        | ✔︎             |
| `dfi.get.records()`       | ✔︎    | ✔︎       | ✔︎        | ✔︎             |
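To make the difference between the three entry points concrete, here is a local, in-memory sketch of their filter semantics over a toy list of records. It is not the dfipy implementation: the `Record` fields, the `get_*` helper names, the bounding-box tuple order and the half-open time interval `[start, end)` are all assumptions made for illustration. As in the table, the entities query takes no `entities` filter.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Sequence, Set, Tuple

# Hypothetical filter types: (min_lon, min_lat, max_lon, max_lat) and (start, end)
BBox = Tuple[float, float, float, float]
Interval = Tuple[datetime, datetime]


@dataclass(frozen=True)
class Record:
    """Hypothetical record: one observation of one entity at a place and time."""

    entity_id: str
    lon: float
    lat: float
    timestamp: datetime


def _matches(r: Record, bbox: Optional[BBox], interval: Optional[Interval],
             entities: Optional[Set[str]]) -> bool:
    """True if the record passes every supplied filter (None means 'no filter')."""
    if bbox is not None:
        min_lon, min_lat, max_lon, max_lat = bbox
        if not (min_lon <= r.lon <= max_lon and min_lat <= r.lat <= max_lat):
            return False
    if interval is not None:
        start, end = interval
        if not (start <= r.timestamp < end):  # assumed half-open [start, end)
            return False
    return entities is None or r.entity_id in entities


def get_records(data: Sequence[Record], bbox: Optional[BBox] = None,
                time_interval: Optional[Interval] = None,
                entities: Optional[Sequence[str]] = None) -> List[Record]:
    """Analogue of records(): every record inside all of the filter bounds."""
    wanted = set(entities) if entities is not None else None
    return [r for r in data if _matches(r, bbox, time_interval, wanted)]


def get_records_count(data: Sequence[Record], bbox: Optional[BBox] = None,
                      time_interval: Optional[Interval] = None,
                      entities: Optional[Sequence[str]] = None) -> int:
    """Analogue of records_count(): only the number of matching records."""
    return len(get_records(data, bbox, time_interval, entities))


def get_entities(data: Sequence[Record], bbox: Optional[BBox] = None,
                 time_interval: Optional[Interval] = None) -> Set[str]:
    """Analogue of entities(): the unique IDs seen inside the bounds."""
    return {r.entity_id for r in get_records(data, bbox, time_interval)}
```

The key point the sketch shows is that the three queries share one filter step and differ only in what they return: the matching rows, their count, or the set of distinct IDs among them.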

### A. Initialization

```python
token = getpass("Enter your API access token: ")
instance = "sdsc-2-2088"  # other workshop instance: "sdsc-1-5148"
namespace = "gs"
url = "https://api.prod.generalsystem.com"

dfi = Client(token, instance, namespace, url, progress_bar=True)
```
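`getpass` keeps the token out of the notebook but prompts on every run. A common alternative is to read the token from an environment variable and fall back to the prompt only when it is missing; a minimal sketch follows. The variable name `DFI_TOKEN` and the helper `get_token` are assumptions for this sketch; dfipy does not read that variable by itself.

```python
import os
from getpass import getpass


def get_token(env_var: str = "DFI_TOKEN") -> str:
    """Return the API token from `env_var`, prompting only if it is unset or blank.

    DFI_TOKEN is a hypothetical variable name chosen for this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if token:
        return token
    # Fall back to an interactive prompt that does not echo the token
    return getpass("Enter your API access token: ")
```

The client would then be created with `Client(get_token(), instance, namespace, url, progress_bar=True)`.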

### B. Count of Records within a Polygon

```python
dfi.get.records_count(polygon=vertices)
```

### C. Unique IDs within a Polygon

```python
entities = dfi.get.entities(polygon=vertices)
len(entities)
```

### D. Records within a Polygon & Time Range

```python
start_time = datetime(2022, 8, 1, 0, 0, 0)
end_time = datetime(2022, 9, 1, 1, 0, 0)
df = dfi.get.records(polygon=vertices, time_interval=(start_time, end_time))

df.info()
```
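When querying whole calendar months, it helps to build the `(start, end)` tuple programmatically instead of typing both datetimes by hand. A minimal sketch follows; `month_interval` and `consecutive_months` are hypothetical helpers, and whether dfipy treats the end bound as inclusive or exclusive is not stated here, so check the dfipy documentation before relying on the boundary.

```python
from datetime import datetime
from typing import List, Tuple


def month_interval(year: int, month: int) -> Tuple[datetime, datetime]:
    """(start, end) for one calendar month: midnight on the 1st to midnight on the next 1st."""
    start = datetime(year, month, 1)
    # After December, roll over into January of the following year
    end = datetime(year + 1, 1, 1) if month == 12 else datetime(year, month + 1, 1)
    return start, end


def consecutive_months(year: int, month: int, n: int) -> List[Tuple[datetime, datetime]]:
    """n back-to-back month intervals starting at (year, month); each end is the next start."""
    intervals = []
    for _ in range(n):
        start, end = month_interval(year, month)
        intervals.append((start, end))
        year, month = end.year, end.month
    return intervals
```

For example, `dfi.get.records(polygon=vertices, time_interval=month_interval(2022, 8))` covers August 2022, and looping over `consecutive_months(2022, 6, 3)` with `dfi.get.records_count` would give one count per month.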

### E. Records for an Entity

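```python
# Fetch every record for one entity, with each payload serialized as a JSON string
df = dfi.get.records(entities=["ba64395a-1268-4f90-9197-b9de3aebbc80"], add_payload_as_json=True).assign(
    payload=lambda df: df.payload.map(json.loads)
)
# Expand each payload dict into its own columns, aligned on the original index
df = df.join(pd.DataFrame(df.pop("payload").tolist(), index=df.index))

df.head()
```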
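The last cell parses each JSON payload and spreads its keys into columns. The same transformation in plain Python, without pandas, is sketched below to show what happens row by row. The helper name `expand_payloads` and the payload keys in the example (`speed`, `heading`) are made up for illustration and are not the dataset's actual schema. Keys missing from a payload become `None`, mirroring the `NaN` that pandas fills in when payloads have different keys.

```python
import json
from typing import Any, Dict, List


def expand_payloads(rows: List[Dict[str, Any]], column: str = "payload") -> List[Dict[str, Any]]:
    """Return new rows where the JSON string in `column` is replaced by its top-level keys."""
    parsed = [json.loads(row[column]) for row in rows]
    # Union of all payload keys in first-seen order, so every output row has the same fields
    keys = list(dict.fromkeys(key for payload in parsed for key in payload))
    expanded = []
    for row, payload in zip(rows, parsed):
        # Copy every original field except the raw JSON string (input rows are not mutated)
        out = {k: v for k, v in row.items() if k != column}
        for key in keys:
            out[key] = payload.get(key)  # None where this payload lacks the key
        expanded.append(out)
    return expanded
```

One design difference to be aware of: here a payload key with the same name as an existing field silently overwrites it, whereas `DataFrame.join` raises an error on overlapping column names unless suffixes are given.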