## Disambiguation

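In this notebook, we ask an OpenAI model with web search enabled to link an annotated entity mention to its Wikidata ID, using the surrounding sentence as context.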


In [1]:
import sys
sys.path.append("..")

In [42]:
from dotenv import load_dotenv
import os
from openai import OpenAI
from pydantic import BaseModel
import json

In [3]:
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

In [4]:
client = OpenAI(api_key=OPENAI_API_KEY)

In [60]:
MODEL = "gpt-4o-mini"
TEXT = "They marched from [Alexandria](LOCATION) through [Memphis](LOCATION) via the [Nile](LOCATION) to [Thebes](LOCATION)."
ENTITY_TO_IDENTIFY = "Memphis"
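
The annotated mentions in `TEXT` follow a `[surface text](LABEL)` pattern, so the candidate entities could be pulled out programmatically rather than typed by hand. The following is a minimal sketch under that assumption; `ANNOTATION_RE`, `extract_annotations`, and `strip_annotations` are hypothetical helper names and are not part of the original notebook.

```python
import re

# Assumption: annotations look like [surface text](LABEL), with LABEL in upper case.
ANNOTATION_RE = re.compile(r"\[([^\]]+)\]\(([A-Z_]+)\)")


def extract_annotations(text):
    """Return (surface_text, label) pairs in order of appearance."""
    return ANNOTATION_RE.findall(text)


def strip_annotations(text):
    """Replace each [surface](LABEL) span with just the surface text."""
    return ANNOTATION_RE.sub(r"\1", text)


sample = (
    "They marched from [Alexandria](LOCATION) through [Memphis](LOCATION) "
    "via the [Nile](LOCATION) to [Thebes](LOCATION)."
)
print(extract_annotations(sample))
print(strip_annotations(sample))
```

Looping over the extracted pairs would allow one prompt per mention, and the stripped sentence could be passed as plainer context if the markup turns out to distract the model.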

In [111]:
prompt = """
Query the web to identify this entity in Wikidata.

{entity}

It is within the context of the following text:

{text}

Only return the JSON output, nothing else. Do so with the following schema:

class Entity(BaseModel):
    entity_text: str
    label: str
    wikidata_id: str
    sources: list[str]
"""

In [112]:
formatted_prompt = prompt.format(entity=ENTITY_TO_IDENTIFY, text=TEXT)

In [113]:
print(formatted_prompt)


Query the web to identify this entity in Wikidata.

Memphis

It is within the context of the following text:

They marched from [Alexandria](LOCATION) through [Memphis](LOCATION) via the [Nile](LOCATION) to [Thebes](LOCATION).

Only return the JSON output, nothing else. Do so with the following schema:

class Entity(BaseModel):
    entity_text: str
    label: str
    wikidata_id: str
    sources: list[str]



In [114]:

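# Note: this call hard-codes "gpt-4o"; the MODEL constant defined above is not used here.
response = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "web_search"}],  # let the model query the web
    input=formatted_prompt,
)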

output_text = response.output_text

In [115]:
print(output_text)

```json
{
  "entity_text": "Memphis",
  "label": "Memphis",
  "wikidata_id": "Q5715",
  "sources": [
    "Wikidata entry for Memphis (ancient capital of Inebu-hedj, Egypt) ([wikidata.org](https://www.wikidata.org/wiki/Q5715?utm_source=openai), [m.wikidata.org](https://m.wikidata.org/wiki/Q5715?utm_source=openai))",
    "Wikipedia article 'Memphis, Egypt' confirming its identity as the ancient capital of Inebu-hedj ([en.wikipedia.org](https://en.wikipedia.org/wiki/Memphis%2C_Egypt?utm_source=openai))"
  ]
}
```


In [116]:
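def parse_json_with_sources(text):
    # Keep everything after the opening ```json fence.
    json_data = text.split("```json", 1)[1]
    # Split once at the closing fence; `sources` is any trailing text after it.
    json_data, sources = json_data.split("```", 1)
    json_data = json.loads(json_data)
    return json_data, sources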

json_output, sources = parse_json_with_sources(output_text)
print(json_output)

{'entity_text': 'Memphis', 'label': 'Memphis', 'wikidata_id': 'Q5715', 'sources': ['Wikidata entry for Memphis (ancient capital of Inebu-hedj, Egypt) ([wikidata.org](https://www.wikidata.org/wiki/Q5715?utm_source=openai), [m.wikidata.org](https://m.wikidata.org/wiki/Q5715?utm_source=openai))', "Wikipedia article 'Memphis, Egypt' confirming its identity as the ancient capital of Inebu-hedj ([en.wikipedia.org](https://en.wikipedia.org/wiki/Memphis%2C_Egypt?utm_source=openai))"]}
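
Because the model's answer is free text, it may be worth sanity-checking the parsed fields before using them. The sketch below checks that `wikidata_id` has the usual `Q` plus digits shape and pulls the URLs out of the markdown-style `sources` strings, dropping `utm_*` tracking parameters. All helper names here (`is_valid_qid`, `clean_url`, `extract_source_urls`, `wikidata_url`) are hypothetical, and the check only covers the format of the ID, not whether the item actually exists or matches the entity.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: item IDs are "Q" followed by a positive integer.
QID_RE = re.compile(r"Q[1-9]\d*")
# Assumption: sources embed links as markdown, i.e. [label](https://...).
URL_RE = re.compile(r"\((https?://[^\s)]+)\)")


def is_valid_qid(value):
    """True if the value looks like a Wikidata item ID (format check only)."""
    return QID_RE.fullmatch(value) is not None


def wikidata_url(qid):
    """Build the item page URL for a QID."""
    return f"https://www.wikidata.org/wiki/{qid}"


def clean_url(url):
    """Drop utm_* query parameters and keep the rest of the URL intact."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith("utm_")
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


def extract_source_urls(sources):
    """Collect unique, cleaned URLs from a list of source strings."""
    urls = []
    for source in sources:
        for url in URL_RE.findall(source):
            cleaned = clean_url(url)
            if cleaned not in urls:
                urls.append(cleaned)
    return urls
```

With the output above, this could be used as `is_valid_qid(json_output["wikidata_id"])` and `extract_source_urls(json_output["sources"])`.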
