## LLM-based Geocoding
In this example we'll walk through how to build and iterate on a location search service that uses an LLM to generate structured filter queries, which can then be passed to a vector store.



## Imports and data prep
In this example we use ChatOpenAI for the model and ElasticsearchStore for the vector store, but these can be swapped for any LLM/ChatModel and any VectorStore that supports self-querying.

Download data from: https://www.kaggle.com/datasets/keshavramaiah/hotel-recommendation


In [1]:

# !pip install langchain langchain-openai lark openai elasticsearch pandas


In [3]:
!env | grep -i api

OPENAI_API_KEY=sk-...(redacted)
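
Echoing an API key into cell output makes it easy to leak when the notebook is shared. A minimal sketch of a safer check, using only the standard library; the helper name `mask_secret` is hypothetical and not part of any library:

```python
import os


def mask_secret(value: str, visible: int = 4) -> str:
    """Return a shortened form of a secret that is safe to display."""
    # Very short values are hidden entirely so nothing useful leaks.
    if len(value) <= 2 * visible:
        return "*" * len(value)
    return f"{value[:visible]}...{value[-visible:]}"


# Report whether the key is set without echoing its full value.
api_key = os.environ.get("OPENAI_API_KEY")
print(mask_secret(api_key) if api_key else "OPENAI_API_KEY is not set")
```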


## Describe data attributes
We'll use a self-query retriever, which requires us to describe the metadata we can filter on.


In [4]:
attribute_info = [
    {'name': 'placetype',
     'description': 'The type of the place, for example: museum hospital mall private house',
     'type': 'string'},
    {'name': 'state',
     'description': 'The state where the place is in',
     'type': 'string'},
    {'name': 'city',
     'description': 'The city where the place is in',
     'type': 'string'}
]


In [5]:
attribute_info

[{'name': 'placetype',
  'description': 'The type of the place, for example: museum hospital mall private house',
  'type': 'string'},
 {'name': 'state',
  'description': 'The state where the place is in',
  'type': 'string'},
 {'name': 'city',
  'description': 'The city where the place is in',
  'type': 'string'}]
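
Every entry above carries a `name`, a `description`, and a `type`. A missing key or a duplicated name is easy to overlook in a longer list, so a quick sanity check can help. This is only a sketch: the helper `validate_attribute_info` is hypothetical and not part of LangChain, and the descriptions are shortened for brevity.

```python
REQUIRED_KEYS = {"name", "description", "type"}


def validate_attribute_info(attributes: list[dict]) -> list[str]:
    """Return a list of human-readable problems; empty means the list looks fine."""
    problems = []
    seen_names = set()
    for index, attribute in enumerate(attributes):
        # Report any of the three expected keys that are absent.
        missing = REQUIRED_KEYS - attribute.keys()
        if missing:
            problems.append(f"entry {index}: missing keys {sorted(missing)}")
        # Attribute names should be unique so filters are unambiguous.
        name = attribute.get("name")
        if name is not None and name in seen_names:
            problems.append(f"entry {index}: duplicate name {name!r}")
        seen_names.add(name)
    return problems


attribute_info = [
    {"name": "placetype", "description": "The type of the place", "type": "string"},
    {"name": "state", "description": "The state where the place is in", "type": "string"},
    {"name": "city", "description": "The city where the place is in", "type": "string"},
]
print(validate_attribute_info(attribute_info))  # prints []
```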

## Creating a query constructor chain
Let's take a look at the chain that will convert natural language requests into structured queries.

To start, we can load the prompt and see what it looks like.

In [6]:
from langchain.chains.query_constructor.base import (
    get_query_constructor_prompt,
    load_query_constructor_runnable,
)

In [7]:
doc_contents = "name and description of a place in the world"
prompt = get_query_constructor_prompt(doc_contents, attribute_info)
print(prompt.format(query="{query}"))

Your goal is to structure the user's query to match the request schema provided below.

<< Structured Request Schema >>
When responding use a markdown code snippet with a JSON object formatted in the following schema:

```json
{
    "query": string \ text string to compare to document contents
    "filter": string \ logical condition statement for filtering documents
}
```

The query string should contain only text that is expected to match the contents of documents. Any conditions in the filter should not be mentioned in the query as well.

A logical condition statement is composed of one or more comparison and logical operation statements.

A comparison statement takes the form: `comp(attr, val)`:
- `comp` (eq | ne | gt | gte | lt | lte | contain | like | in | nin): comparator
- `attr` (string):  name of attribute to apply the comparison to
- `val` (string): is the comparison value

A logical operation statement takes the form `op(statement1, statement2, ...)`:
- `op` (and | or | not
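
The comparison and logical-operation grammar described in the prompt can be illustrated with a small evaluator. This is a sketch under stated assumptions, not how LangChain or any vector store evaluates filters: statements are written as nested tuples instead of being parsed from text, the function `matches` is hypothetical, and the two metadata records are illustrative samples rather than rows from the dataset.

```python
import operator
from typing import Any

# Comparators from the prompt that map directly onto Python operators.
# `contain` and `like` are left out because their semantics depend on the store.
COMPARATORS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "in": lambda attr_value, options: attr_value in options,
    "nin": lambda attr_value, options: attr_value not in options,
}


def matches(statement: tuple, metadata: dict[str, Any]) -> bool:
    """Evaluate a nested (name, *args) statement against one document's metadata."""
    name, *args = statement
    if name == "and":
        return all(matches(arg, metadata) for arg in args)
    if name == "or":
        return any(matches(arg, metadata) for arg in args)
    if name == "not":
        return not matches(args[0], metadata)
    # Anything else is treated as a comparison: comp(attr, val).
    attr, val = args
    if attr not in metadata:
        # Simplification: a document without the attribute never matches.
        return False
    return COMPARATORS[name](metadata[attr], val)


# and(eq("city", "Amsterdam"), eq("placetype", "museum")) written as tuples.
filter_statement = ("and", ("eq", "city", "Amsterdam"), ("eq", "placetype", "museum"))

# Illustrative sample metadata records.
rijksmuseum = {"placetype": "museum", "city": "Amsterdam", "state": "North Holland"}
louvre = {"placetype": "museum", "city": "Paris", "state": "Ile-de-France"}

print(matches(filter_statement, rijksmuseum))  # True
print(matches(filter_statement, louvre))       # False
```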

In [10]:
from langchain_openai import ChatOpenAI

chain = load_query_constructor_runnable(
    ChatOpenAI(model="gpt-3.5-turbo", temperature=0), doc_contents, attribute_info
)

In [11]:
chain.invoke({"query": "An art museum in Amsterdam Netherland"})

StructuredQuery(query='art museum', filter=Operation(operator=<Operator.AND: 'and'>, arguments=[Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='city', value='Amsterdam'), Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='state', value='Netherland'), Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='placetype', value='museum')]), limit=None)
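
The structured query still has to be translated into the vector store's own filter syntax before it can be executed. The sketch below shows the idea with plain dataclasses standing in for LangChain's `Comparison` and `Operation`, and emits an Elasticsearch-style `bool` filter. The function `to_es_filter` and the `metadata.<attr>.keyword` field naming are assumptions for illustration; LangChain's self-query retriever relies on a store-specific translator for this step.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Comparison:  # simplified stand-in, not the LangChain class
    comparator: str
    attribute: str
    value: str


@dataclass
class Operation:  # simplified stand-in, not the LangChain class
    operator: str
    arguments: list


# Elasticsearch bool-query clause used for each logical operator.
BOOL_CLAUSE = {"and": "must", "or": "should", "not": "must_not"}


def to_es_filter(node: Union[Comparison, Operation]) -> dict:
    """Translate a filter tree into an Elasticsearch-style query dict."""
    if isinstance(node, Operation):
        clauses = [to_es_filter(arg) for arg in node.arguments]
        return {"bool": {BOOL_CLAUSE[node.operator]: clauses}}
    if node.comparator != "eq":
        raise NotImplementedError(
            f"comparator {node.comparator!r} is not covered by this sketch"
        )
    # Assumed mapping: exact match on a keyword sub-field of the metadata.
    return {"term": {f"metadata.{node.attribute}.keyword": node.value}}


# Mirrors the filter returned by the chain above.
structured_filter = Operation(
    operator="and",
    arguments=[
        Comparison("eq", "city", "Amsterdam"),
        Comparison("eq", "state", "Netherland"),
        Comparison("eq", "placetype", "museum"),
    ],
)
print(to_es_filter(structured_filter))
```

Only `eq` is handled here; range comparators such as `gt` or `lte` would need their own mapping, which is left out to keep the sketch short.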