## Notebook to demo IDRIS on NYC Taxi Data.

This IDRIS demo is built on [Ollama](https://ollama.com/) and [DuckDB](https://duckdb.org/).

The original data can be downloaded from https://www.kaggle.com/c/nyc-taxi-trip-duration/data.
A post-processed version of the data can be downloaded from [here](https://esriis-my.sharepoint.com/:u:/g/personal/mraad_esri_com/EbLCLK2xevxJmmGofsNiPZsBle6vGdJiKjJkeo1whrdtww?e=KZR3gR).

Install the following modules:
```shell
uv pip install -U duckdb folium matplotlib mapclassify xyzservices geopandas
```

Pull the following Ollama [model](https://ollama.com/library/duckdb-nsql):
```shell
ollama pull duckdb-nsql
```

In [1]:
import os

import gait as G
import geopandas as gpd
import xyzservices.providers as xyz
from shapely import wkb

### Adjust the path below to where you downloaded the data.

In [2]:
base = os.path.expanduser(os.path.join("~", "data", "nyc-taxi-trip-duration"))
os.path.exists(base)

True

### Create an IDRIS instance using Ollama and DuckDB.

In [3]:
rdb = G.IdrisDuckDB(database=os.path.join(base, "trips.db"))

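# Embedding model. The active configuration uses Azure OpenAI and
# requires the AZURE_API_URL environment variable to be set.
emb = G.IdrisLiteEmb(
    model_name="azure/text-embedding-ada-002",
    api_base=os.environ["AZURE_API_URL"] + "/text-embedding-ada-002",
    api_version="2024-06-01",
    # Alternative: a local embedding model served by Ollama.
    # model_name="openai/mxbai-embed-large:latest",
    # api_base="http://localhost:11434/v1",
    # api_key="ollama",
)

# LLM that generates the SQL. The active configuration is a local
# Mistral model served by Ollama.
llm = G.IdrisLiteLLM(
    model_name="ollama_chat/mistral:7b-instruct-v0.3-q8_0",
    #
    # Alternative: gpt-4o-mini on Azure OpenAI.
    # model_name="azure/gpt-4o-mini",
    # api_base=os.environ["AZURE_API_URL"] + "/gpt-4o-mini",
    # api_version="2024-06-01",
    #
    # Alternative: duckdb-nsql served by Ollama through its OpenAI-compatible endpoint.
    # model_name="openai/duckdb-nsql:7b-q8_0",
    # api_base="http://localhost:11434/v1",
    # api_key="ollama",
)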

idris = G.Idris(rdb, emb, llm)

### Add the trips table description.

In [4]:
idris.add_describe_table("trips")

### Load additional context.

In [5]:
idris.load_context_json(os.path.join(base, "context.json"))

### Load Question/SQL samples.

In [6]:
idris.load_question_sql_json(os.path.join(base, "question_sql.json"))

### Start asking questions :-)

In [7]:
sql = idris.generate_sql("list the pickup boroughs")
sql

'SELECT DISTINCT pickup_boro FROM trips WHERE pickup_boro IS NOT NULL'

### Check if the SQL is valid and execute it.

In [8]:
G.is_sql_valid(sql)

True

In [9]:
idris.execute_sql(sql)

| | pickup_boro |
|---|---|
| 0 | Staten Island |
| 1 | Brooklyn |
| 2 | Bronx |
| 3 | Queens |
| 4 | Manhattan |

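The validate-then-execute pattern used above can be sketched in isolation. How `G.is_sql_valid` works internally is not shown in this notebook, so the example below is only an illustration of the idea: it uses Python's built-in `sqlite3` as a stand-in for DuckDB and asks the engine to plan the statement with `EXPLAIN` before running it. The helper name `is_sql_valid_sketch` and the miniature `trips` table are hypothetical.

```python
import sqlite3


def is_sql_valid_sketch(conn: sqlite3.Connection, sql: str) -> bool:
    """Return True if `sql` compiles against the schema behind `conn`.

    EXPLAIN makes the engine compile the statement without running it, so
    syntax errors and unknown tables or columns surface as exceptions.
    """
    try:
        conn.execute(f"EXPLAIN {sql}")
        return True
    except sqlite3.Error:
        return False


# Hypothetical miniature of the trips table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (pickup_boro TEXT)")
conn.executemany(
    "INSERT INTO trips VALUES (?)",
    [("Manhattan",), ("Queens",), (None,)],
)

good = "SELECT DISTINCT pickup_boro FROM trips WHERE pickup_boro IS NOT NULL"
bad = "SELECT DISTINCT pickup_borough FROM trips"  # column does not exist

# Execute only when the statement passes validation.
rows = conn.execute(good).fetchall() if is_sql_valid_sketch(conn, good) else []
```

Guarding execution this way matters with generated SQL, because a model can produce a query that reads well but references a column or table that is not in the schema.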

### Generate geometry output and map it.

Note that here we call `idris` directly; it returns a pandas DataFrame if the generated SQL is valid.

In [10]:
pdf = idris("Show heatmap of trips from Manhattan at 2AM")

### Create geometry column from WKB content and explore it.

In [11]:
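# Convert each WKB value to bytes, then parse it into a shapely geometry.
pdf.geometry = pdf.geometry.apply(bytes)
pdf.geometry = pdf.geometry.apply(wkb.loads)
# Wrap the result in a GeoDataFrame; the coordinates are in Web Mercator.
gdf = gpd.GeoDataFrame(pdf, crs="EPSG:3857")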

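To make the WKB step less opaque, here is a minimal sketch of what Well-Known Binary looks like for the simplest geometry, a 2D point: one byte-order flag, a 4-byte geometry type, and two 8-byte doubles. It uses only the standard library and is not a replacement for `shapely.wkb.loads`, which handles every geometry type. The helper names and the sample coordinates are hypothetical, and the geometries returned by the query above may well be of another type.

```python
import struct


def encode_wkb_point(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (21 bytes)."""
    # "<" = little-endian, B = byte-order flag (1), I = geometry type (1 = Point),
    # followed by the x and y coordinates as doubles.
    return struct.pack("<BIdd", 1, 1, x, y)


def decode_wkb_point(data: bytes) -> tuple[float, float]:
    """Decode a 2D WKB point, honouring its byte-order flag."""
    order = "<" if data[0] == 1 else ">"
    geom_type, x, y = struct.unpack(order + "Idd", data[1:21])
    if geom_type != 1:
        raise ValueError(f"not a 2D point: geometry type {geom_type}")
    return x, y


# Illustrative Web Mercator (EPSG:3857) coordinates, not taken from the data.
raw = bytearray(encode_wkb_point(-8238000.0, 4970000.0))

# A database driver may hand back a bytearray or similar buffer type;
# converting to bytes first mirrors the `apply(bytes)` step above.
x, y = decode_wkb_point(bytes(raw))
```
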
In [None]:
gdf.explore(
    "z_score",
    cmap="coolwarm",  # https://matplotlib.org/stable/tutorials/colors/colormaps.html
    vmin=-3.0,
    vmax=3.0,
    tiles=xyz.Esri.WorldGrayCanvas,
)
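
The map colors each feature by `z_score` and clips the color scale to the range -3 to 3. How the generated query computes that column is not shown in this notebook. Assuming it is the usual standard score of a per-cell trip count, a minimal sketch of the calculation looks like this; the function name and the counts are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(counts):
    """Return (count - mean) / standard deviation for each count."""
    mu = mean(counts)
    sigma = pstdev(counts)  # population standard deviation
    if sigma == 0:
        # All counts are identical, so nothing deviates from the mean.
        return [0.0 for _ in counts]
    return [(c - mu) / sigma for c in counts]


# Hypothetical trip counts per heatmap cell (mean 5, standard deviation 2).
counts = [2, 4, 4, 4, 5, 5, 7, 9]
scores = z_scores(counts)

# Clip to [-3, 3], mirroring the vmin/vmax passed to gdf.explore.
clipped = [max(-3.0, min(3.0, z)) for z in scores]
```

With a diverging colormap such as `coolwarm`, a symmetric range around zero keeps average cells neutral and pushes unusually quiet or busy cells toward the two ends of the scale.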