## Notebook to demo IDRIS on NYC Taxi Data.

This IDRIS instance is based on [LiteLLM](https://docs.litellm.ai/) and [DuckDB](https://duckdb.org/).

The original data can be downloaded from https://www.kaggle.com/c/nyc-taxi-trip-duration/data.
A post-processed version of the data can be downloaded from [here](https://esriis-my.sharepoint.com/:u:/g/personal/mraad_esri_com/EbLCLK2xevxJmmGofsNiPZsBle6vGdJiKjJkeo1whrdtww?e=KZR3gR).

Install the following modules:
```shell
uv pip install -U duckdb folium matplotlib mapclassify xyzservices geopandas orjson
```

Pull the following Ollama models, including [duckdb-nsql](https://ollama.com/library/duckdb-nsql):
```shell
ollama pull llama3.2:latest
ollama pull duckdb-nsql:7b-q8_0
ollama pull mxbai-embed-large:latest
```

In [1]:
import os

import gait as G
import geopandas as gpd
import nest_asyncio
import xyzservices.providers as xyz
from shapely import wkb

In [2]:
nest_asyncio.apply()  # so we can run in notebook :-)

### Adjust the path to match where you downloaded the data.

In [3]:
base = os.path.expanduser(os.path.join("~", "data", "nyc-taxi-trip-duration"))
os.path.exists(base)

True

### Create IDRIS instance using LiteLLM and DuckDB.

In [24]:
rdb = G.IdrisDuckDB(database=os.path.join(base, "trips.db"))
llm = G.IdrisLiteLLM(
    # model_name="ollama/llama3.2:latest",
    model_name="ollama_chat/duckdb-nsql:7b-q8_0",
    # model_name="azure/gpt-4o-mini",
    # api_base=os.environ["AZURE_API_URL"] + "/gpt-4o-mini",
    # api_version="2024-06-01",
)
emb = G.IdrisLiteEmb(
    model_name="ollama/mxbai-embed-large:latest",
    # model_name="azure/text-embedding-ada-002",
    # api_base=os.environ["AZURE_API_URL"] + "/text-embedding-ada-002",
    # api_version="2024-06-01",
)
idris = G.Idris(rdb, emb, llm)

### Add the trips table description.

In [25]:
idris.add_describe_table("trips")

### Load additional context.

In [26]:
idris.load_context_json(os.path.join(base, "context.json"))

### Load Question/SQL samples.

In [27]:
idris.load_question_sql_json(os.path.join(base, "question_sql.json"))
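
The notebook does not show how IDRIS uses the embedding model together with the loaded context and question/SQL samples. One common pattern, assumed here purely for illustration, is to embed the incoming question and select the most similar stored samples by cosine similarity. The sketch below uses made-up three-dimensional vectors in place of real embeddings; `cosine`, `top_k`, the second and third samples, and all vector values are hypothetical and not part of `gait`.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, samples, k=2):
    """Return the k (question, sql) pairs whose vectors are closest to query_vec.

    samples is a list of (question, sql, vector) tuples.
    """
    ranked = sorted(samples, key=lambda s: cosine(query_vec, s[2]), reverse=True)
    return [(question, sql) for question, sql, _ in ranked[:k]]


# Toy vectors stand in for real embeddings (assumption for illustration only).
samples = [
    ("list the pickup boroughs", "SELECT DISTINCT pickup_boro FROM trips", [0.9, 0.1, 0.0]),
    ("count all trips", "SELECT COUNT(*) FROM trips", [0.1, 0.9, 0.0]),
    ("list the dropoff boroughs", "SELECT DISTINCT dropoff_boro FROM trips", [0.0, 0.1, 0.9]),
]

# A query vector close to the first sample ranks that sample first.
best = top_k([0.8, 0.2, 0.0], samples, k=1)
```

The selected pairs could then be placed in the prompt as few-shot examples, which is one reason loading good question/SQL samples tends to improve the generated SQL.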

### Start asking questions :-)

In [28]:
sql = idris.generate_sql("list the pickup boroughs")
sql

'SELECT DISTINCT pickup_boro FROM trips WHERE pickup_boro IS NOT NULL ORDER BY 1'

### Check if the SQL is valid and execute it.

In [29]:
G.is_sql_valid(sql)

True

In [30]:
idris.execute_sql(sql)

| | pickup_boro |
|---|---|
| 0 | Bronx |
| 1 | Brooklyn |
| 2 | Manhattan |
| 3 | Queens |
| 4 | Staten Island |
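
The generate, validate, and execute steps above can be chained in a small helper. The following is a minimal sketch, not part of `gait`: `ask` is a hypothetical name, and it takes the three steps as callables so it makes no assumptions about their internals beyond the signatures used in the cells above.

```python
from typing import Any, Callable, Optional, Tuple


def ask(
    question: str,
    generate_sql: Callable[[str], str],
    is_sql_valid: Callable[[str], bool],
    execute_sql: Callable[[str], Any],
) -> Tuple[str, Optional[Any]]:
    """Generate SQL for a question and run it only if it validates.

    Hypothetical helper: returns (sql, result), where result is None
    when the generated SQL fails validation.
    """
    sql = generate_sql(question)
    if not is_sql_valid(sql):
        return sql, None
    return sql, execute_sql(sql)


# In this notebook the three callables would be the ones used above:
# sql, pdf = ask(
#     "list the pickup boroughs",
#     idris.generate_sql,
#     G.is_sql_valid,
#     idris.execute_sql,
# )
```

Returning the SQL alongside the result keeps the generated statement available for inspection even when validation fails.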


### Generate geometry output and let's map it.

Note that here we call `idris` directly; it returns a pandas DataFrame if the generated SQL is valid.

In [31]:
pdf = idris("Show heatmap of trips from Manhattan at 2AM")

### Create geometry column from WKB content and explore it.

In [32]:
pdf.geometry = pdf.geometry.apply(bytes)
pdf.geometry = pdf.geometry.apply(wkb.loads)
gdf = gpd.GeoDataFrame(pdf, crs="EPSG:3857")
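
To make the WKB step less opaque, here is a minimal standard-library sketch of what decoding involves for the simplest geometry, a 2D point. It is an illustration only: `encode_wkb_point` and `decode_wkb_point` are hypothetical helpers, the coordinates are made-up values, and the geometries returned by the query above may well be of another type (for example polygons), which `wkb.loads` handles and this sketch does not.

```python
import struct


def encode_wkb_point(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (byte order 1, geometry type 1)."""
    return struct.pack("<BIdd", 1, 1, x, y)


def decode_wkb_point(data: bytes) -> tuple:
    """Decode a 2D WKB point; handles both byte orders, rejects other types."""
    # First byte is the byte-order flag: 1 = little endian, 0 = big endian.
    order = "<" if data[0] == 1 else ">"
    (geom_type,) = struct.unpack(order + "I", data[1:5])
    if geom_type != 1:
        raise ValueError(f"expected WKB Point (type 1), got type {geom_type}")
    return struct.unpack(order + "dd", data[5:21])


# A binary column may arrive as a bytearray, so it is converted to bytes first,
# mirroring the pdf.geometry.apply(bytes) step above.
blob = bytearray(encode_wkb_point(-8238310.0, 4970072.0))  # illustrative values
x, y = decode_wkb_point(bytes(blob))
```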

In [33]:
gdf.explore(
    "z_score",
    cmap="coolwarm",  # https://matplotlib.org/stable/tutorials/colors/colormaps.html
    vmin=-3.0,
    vmax=3.0,
    tiles=xyz.Esri.WorldGrayCanvas,
)
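
The map colors each geometry by its `z_score`, clipped to the range -3 to 3. The notebook does not show how the generated SQL computes that column; the sketch below assumes it is the standard score of a per-cell trip count, `(count - mean) / standard deviation`, and uses made-up counts. `z_scores` and `clamp` are hypothetical helpers for illustration.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standard scores using the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: no cell stands out.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def clamp(z, lo=-3.0, hi=3.0):
    """Clip a score to the color range used by vmin/vmax in the map above."""
    return max(lo, min(hi, z))


counts = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up trip counts per cell
scores = [clamp(z) for z in z_scores(counts)]
```

With a diverging colormap such as `coolwarm`, cells near the mean score close to 0 and sit in the middle of the color ramp, while unusually busy or quiet cells fall toward the two ends.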