<a href="https://colab.research.google.com/github/vrajeshtrichy/HuggingFace-GPT-RAG-MedicalQA/blob/master/HuggingFace-GPT-RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Building a RAG System with OpenAI GPT-4o, Elasticsearch and Hugging Face Models

Authored By: [Rajesh Kanna](https://www.linkedin.com/in/vrajeshtrichy/)

This notebook walks you through building a Retrieval-Augmented Generation (RAG) system powered by Elasticsearch (ES) and Hugging Face models. You can toggle between ES-vectorising (your ES cluster vectorises the data for you at ingest and query time) and self-vectorising (you vectorise all your data before sending it to ES).

## Step 1: Installing Libraries

In [1]:
!pip install elasticsearch sentence_transformers transformers eland==8.12.1
!pip install datasets==2.19.2
!pip install openai



## Step 2: Set up

### Hugging Face
Authenticate with Hugging Face to download models and datasets.

In [2]:
from huggingface_hub import notebook_login

notebook_login()


### Elasticsearch deployment

`CLOUD_ID` and `ELASTIC_API_KEY` must be saved as Colab secrets before you run the next cell.

In [3]:
from google.colab import userdata

CLOUD_ID = userdata.get("CLOUD_ID")

ELASTIC_API_KEY = userdata.get("ELASTIC_API_KEY")

In [4]:
from elasticsearch import Elasticsearch, helpers

# Create the client instance
client = Elasticsearch(cloud_id=CLOUD_ID, api_key=ELASTIC_API_KEY)

# Successful response!
client.info()

ObjectApiResponse({'name': 'instance-0000000001', 'cluster_name': 'a7730461ad0f483fbaff9fec75caa26e', 'cluster_uuid': '4i2TdQI3QJacLfbeUTQDtg', 'version': {'number': '8.15.0', 'build_flavor': 'default', 'build_type': 'docker', 'build_hash': '1a77947f34deddb41af25e6f0ddb8e830159c179', 'build_date': '2024-08-05T10:05:34.233336849Z', 'build_snapshot': False, 'lucene_version': '9.11.1', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0'}, 'tagline': 'You Know, for Search'})

## Step 3: Data sourcing and preparation

The data comes from the Hugging Face Hub, specifically the
[MongoDB/whatscooking.restaurants dataset](https://huggingface.co/datasets/MongoDB/whatscooking.restaurants).

In [5]:
# Load Dataset
from datasets import load_dataset, Features, Value

restaurant_dataset = load_dataset("MongoDB/whatscooking.restaurants")

print(restaurant_dataset)

DatasetDict({
    train: Dataset({
        features: ['restaurant_id', 'location', '_id', 'review_count', 'DogsAllowed', 'embedding', 'PriceRange', 'menu', 'HappyHour', 'TakeOut', 'address', 'sponsored', 'attributes', 'borough', 'OutdoorSeating', 'name', 'cuisine', 'stars'],
        num_rows: 25361
    })
})


In [6]:
# Data Preparation

restaurant_dataset = restaurant_dataset.remove_columns([
    "embedding",
    "restaurant_id",
    "location",
    "_id",
    "DogsAllowed",
    "HappyHour",
    "TakeOut",
    "address",
    "sponsored",
    "OutdoorSeating",
])

# Remove data points where any of the remaining columns is missing
required_columns = [
    "name", "cuisine", "borough", "menu",
    "review_count", "PriceRange", "attributes", "stars",
]

restaurant_dataset = restaurant_dataset.filter(
    lambda x: all(x[column] is not None for column in required_columns)
)

# Convert lists and dicts to space-separated strings
def list_to_string(example):
    for k, v in example.items():
        if isinstance(v, list):
            # Join the list items, not the characters of str(v)
            example[k] = ' '.join(str(item) for item in v)
        elif isinstance(v, dict):
            example[k] = ' '.join(str(val) for val in v.values())
    return example

restaurant_dataset = restaurant_dataset.map(list_to_string)


restaurant_dataset = restaurant_dataset.cast(Features({
    "name": Value("string"),
    "cuisine": Value("string"),
    "borough": Value("string"),
    "menu": Value("string"),
    "review_count": Value("string"),
    "PriceRange": Value("string"),
    "attributes": Value("string"),
    "stars": Value("string")
    }))

restaurant_dataset["train"]

Dataset({
    features: ['name', 'cuisine', 'borough', 'menu', 'review_count', 'PriceRange', 'attributes', 'stars'],
    num_rows: 12775
})
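
To see what the flattening step is meant to produce, here is a minimal standalone sketch on a made-up record. The `toy_record` values and the `flatten_value` helper are illustrative assumptions, not part of the dataset or the notebook. It also shows a pitfall worth checking for: calling `' '.join(...)` on `str(some_list)` iterates over characters, not list items.

```python
def flatten_value(value):
    """Turn a list or dict into one space-separated string; leave other values unchanged."""
    if isinstance(value, list):
        return " ".join(str(item) for item in value)
    if isinstance(value, dict):
        return " ".join(str(item) for item in value.values())
    return value


# Hypothetical record shaped like a row of the restaurant dataset.
toy_record = {
    "name": "Example Diner",
    "menu": ["Mac & cheese", "Onion rings"],
    "attributes": {"casual": True, "parking": None},
}

flattened = {key: flatten_value(value) for key, value in toy_record.items()}
print(flattened["menu"])        # Mac & cheese Onion rings
print(flattened["attributes"])  # True None

# Pitfall: joining str(list) walks over its characters and spaces out every one.
spaced_out = " ".join(str(["ab", "c"]))
print(spaced_out)  # [ ' a b ' ,   ' c ' ]
```

If a flattened field comes out with a space between every character, this pitfall is the likely cause.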

In [7]:
restaurant_dataset["train"][0]
# len(str(restaurant_dataset["train"][0]))

{'name': "Buddy'S Wonder Bar",
 'cuisine': 'American',
 'borough': 'Staten Island',
 'menu': 'Grilled cheese sandwich Baked potato Lasagna Mozzarella sticks Mac & cheese Chicken fingers Mashed potatoes Chicken pot pie Green salad Meatloaf Tomato soup Onion rings',
 'review_count': '62',
 'PriceRange': '2',
 'attributes': "'beer_and_wine' {'romantic': False, 'intimate': False, 'classy': False, 'hipster': False, 'divey': False, 'touristy': False, 'trendy': False, 'upscale': False, 'casual': True} None None True None None {'garage': False, 'street': True, 'validated': False, 'lot': True, 'valet': False} None None None True {'dessert': False, 'latenight': False, 'lunch': True, 'dinner': True, 'brunch': False, 'breakfast': False} True None u'average' 'ca

## Step 4: Load Elasticsearch with vectorised data

### Choose data and query vectorisation options

- Setting `USE_ELASTICSEARCH_VECTORISATION` to `True` will make the rest of this notebook set up and use ES-hosted-vectorisation for the data and querying, but **this requires your ES deployment to have at least 1 ML node**.

- If `USE_ELASTICSEARCH_VECTORISATION` is `False`, this notebook will set up and use the provided model "locally" for data and query vectorisation.

This notebook uses the [thenlper/gte-small](https://huggingface.co/thenlper/gte-small) model to create embeddings.

In [8]:
USE_ELASTICSEARCH_VECTORISATION = True

EMBEDDING_MODEL_ID = "thenlper/gte-small"
# The model page at https://huggingface.co/thenlper/gte-small shows its dimensions.
# If you use the `gte-base` or `gte-large` embedding models, EMBEDDING_DIMENSIONS
# (the `dims` value in the index mapping) must be set to 768 or 1024, respectively.
EMBEDDING_DIMENSIONS = 384
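
If you switch models, the dimension constant has to change with it. A small lookup can keep the two in sync. The sketch below uses the values mentioned in the comment above (384 for `gte-small`, 768 for `gte-base`, 1024 for `gte-large`); `KNOWN_DIMENSIONS` and `dimensions_for` are hypothetical names introduced for illustration.

```python
# Embedding sizes for the gte model family, as noted in the configuration comment.
KNOWN_DIMENSIONS = {
    "thenlper/gte-small": 384,
    "thenlper/gte-base": 768,
    "thenlper/gte-large": 1024,
}


def dimensions_for(model_id: str) -> int:
    """Return the embedding size for a known model, or fail loudly for an unknown one."""
    try:
        return KNOWN_DIMENSIONS[model_id]
    except KeyError:
        raise ValueError(
            f"Unknown embedding model {model_id!r}; add its dimension to KNOWN_DIMENSIONS"
        ) from None


print(dimensions_for("thenlper/gte-small"))  # 384
```

Failing on an unknown model is a deliberate choice here: a wrong `dims` value would only surface later, when vectors are indexed.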

### Load Hugging Face model into Elasticsearch if needed

This step loads and deploys the Hugging Face model into Elasticsearch using [Eland](https://eland.readthedocs.io/en/v8.12.1/) if `USE_ELASTICSEARCH_VECTORISATION` is `True`. This allows Elasticsearch to vectorise both queries and data in later steps.

In [9]:
import locale
locale.getpreferredencoding = lambda: "UTF-8"
!(if [ "True" == $USE_ELASTICSEARCH_VECTORISATION ]; then \
  eland_import_hub_model --cloud-id $CLOUD_ID --hub-model-id $EMBEDDING_MODEL_ID --task-type text_embedding --es-api-key $ELASTIC_API_KEY --start --clear-previous; \
fi)

2024-08-27 16:15:40,400 INFO : Establishing connectio

This step adds functions that create text embeddings locally and enriches the dataset with them, so that the data can be ingested into Elasticsearch as vectors. The enrichment is skipped if `USE_ELASTICSEARCH_VECTORISATION` is `True`.

In [10]:
from sentence_transformers import SentenceTransformer

if not USE_ELASTICSEARCH_VECTORISATION:
    embedding_model = SentenceTransformer(EMBEDDING_MODEL_ID)


def get_embedding(text: str) -> list[float]:
    if USE_ELASTICSEARCH_VECTORISATION:
        raise Exception(
            f"Disabled when USE_ELASTICSEARCH_VECTORISATION is [{USE_ELASTICSEARCH_VECTORISATION}]"
        )
    else:
        if not text.strip():
            print("Attempted to get embedding for empty text.")
            return []

        embedding = embedding_model.encode(text)
        return embedding.tolist()


def add_full_details_embedding(batch):
    if USE_ELASTICSEARCH_VECTORISATION:
        raise Exception(
            f"Disabled when USE_ELASTICSEARCH_VECTORISATION is [{USE_ELASTICSEARCH_VECTORISATION}]"
        )
    else:
        # With batched=True each column is a list of values, so zip the columns
        # to build one space-separated text per row before embedding it.
        columns = ["name", "borough", "cuisine", "stars", "review_count", "PriceRange", "menu", "attributes"]
        full_details = [" ".join(row) for row in zip(*(batch[column] for column in columns))]
        return {"embedding": [get_embedding(full_detail) for full_detail in full_details]}


if not USE_ELASTICSEARCH_VECTORISATION:
    restaurant_dataset = restaurant_dataset.map(add_full_details_embedding, batched=True)

In [11]:
restaurant_dataset["train"]

Dataset({
    features: ['name', 'cuisine', 'borough', 'menu', 'review_count', 'PriceRange', 'attributes', 'stars'],
    num_rows: 12775
})
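
When `map` is called with `batched=True`, the mapped function receives each column as a list of values rather than a single row, so per-row text has to be assembled by zipping the columns together. The sketch below illustrates this with a plain dictionary and a stand-in `fake_embed` function. Both are hypothetical and no model is loaded, so treat it as an illustration of the batching behaviour only.

```python
TEXT_COLUMNS = ["name", "borough", "cuisine"]


def fake_embed(text: str) -> list:
    """Stand-in for a real embedding model: returns a one-dimensional 'vector'."""
    return [float(len(text))]


def add_embedding(batch: dict) -> dict:
    """Build one text per row by zipping the columns, then embed each text."""
    rows = zip(*(batch[column] for column in TEXT_COLUMNS))
    texts = [" ".join(row) for row in rows]
    return {"embedding": [fake_embed(text) for text in texts]}


# Hypothetical batch of two rows, laid out column by column as `map` would pass it.
toy_batch = {
    "name": ["Example Diner", "Sample Pizza"],
    "borough": ["Queens", "Bronx"],
    "cuisine": ["American", "Pizza"],
}

result = add_embedding(toy_batch)
print(len(result["embedding"]))  # 2

# Adding two columns with `+` concatenates the lists instead of joining row by row.
print(len(toy_batch["name"] + toy_batch["borough"]))  # 4
```

The returned `embedding` list must have one entry per row in the batch, which is why the columns are zipped rather than added.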

In [12]:
# restaurant_dataset["train"][0]

## Step 5: Create a Search Index with vector search mappings.

Create an index in Elasticsearch with the right index mappings to handle vector searches.

In [13]:
# Needs to match the id returned from Eland
# in general for Hugging Face models, you just replace the forward slash with
# double underscore
print(EMBEDDING_MODEL_ID)
model_id = EMBEDDING_MODEL_ID.replace("/", "__")

index_name = "restaurants"

index_mapping = {
    "properties": {
        "name": {"type": "text"},
        "borough": {"type": "text"},
        "cuisine": {"type": "text"},
        "stars": {"type": "text"},
        "review_count": {"type": "text"},
        # Documents are indexed with the key "price_range", so map that name
        "price_range": {"type": "text"},
        "menu": {"type": "text"},
        "attributes": {"type": "text"},
    }
}
# define index mapping
if USE_ELASTICSEARCH_VECTORISATION:
    index_mapping["properties"]["embedding"] = {
        "properties": {
            "is_truncated": {"type": "boolean"},
            "model_id": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            },
            "predicted_value": {
                "type": "dense_vector",
                "dims": EMBEDDING_DIMENSIONS,
                "index": True,
                "similarity": "cosine",
            },
        }
    }
else:
    index_mapping["properties"]["embedding"] = {
        "type": "dense_vector",
        "dims": EMBEDDING_DIMENSIONS,
        "index": True,
        "similarity": "cosine",
    }

# flag to check if index has to be deleted before creating
should_delete_index = True

# delete the index first if requested
if should_delete_index:
    if client.indices.exists(index=index_name):
        print("Deleting existing %s" % index_name)
        # `ignore=` is deprecated on API methods; use client.options(ignore_status=...)
        client.options(ignore_status=[400, 404]).indices.delete(index=index_name)

print("Creating index %s" % index_name)


# ingest pipeline definition
if USE_ELASTICSEARCH_VECTORISATION:
    pipeline_id = "vectorize_restaurants"

    client.ingest.put_pipeline(
        id=pipeline_id,
        processors=[
            {
                "inference": {
                    "model_id": model_id,
                    "target_field": "embedding",
                    "field_map": {
                        "borough": "text_field",
                        "cuisine": "text_field",
                        "stars": "text_field",
                        "review_count": "text_field",
                        # matches the "price_range" key used when indexing documents
                        "price_range": "text_field",
                        "menu": "text_field",
                        "attributes": "text_field",
                        },
                }
            }
        ],
    )

    index_settings = {
        "index": {
            "default_pipeline": pipeline_id,
        }
    }
else:
    index_settings = {}

client.options(ignore_status=[400, 404]).indices.create(
    index=index_name, mappings=index_mapping, settings=index_settings
)

thenlper/gte-small
Deleting existing restaurants


Creating index restaurants


ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'restaurants'})
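
The mapping sets `similarity` to `cosine`. As a rough illustration of what that measures, the sketch below computes cosine similarity in plain Python and converts it to the score that Elasticsearch's documentation gives for cosine kNN hits, `(1 + cosine) / 2`. The helper names and toy vectors are assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_score_from_cosine(cosine):
    """Map a cosine value in [-1, 1] onto a score in [0, 1]."""
    return (1 + cosine) / 2


same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])

print(knn_score_from_cosine(same_direction))  # 1.0
print(knn_score_from_cosine(orthogonal))      # 0.5
```

Vector length does not affect the result, only direction does, which is why the first pair scores 1.0 despite the different magnitudes.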

Ingesting data into Elasticsearch is best done in batches. The client's `helpers` module offers an easy way to do this.

In [14]:
from elasticsearch.helpers import BulkIndexError

def batch_to_bulk_actions(batch):
    for record in batch:
        action = {
            "_index": "restaurants",
            "_source": {
                "name": record["name"],
                "borough": record["borough"],
                "cuisine": record["cuisine"],
                "stars": record["stars"],
                "review_count": record["review_count"],
                "price_range": record["PriceRange"],
                "menu": record["menu"],
                "attributes": record["attributes"],
            },
        }
        if not USE_ELASTICSEARCH_VECTORISATION:
            action["_source"]["embedding"] = record["embedding"]
        yield action


def bulk_index(ds):
    start = 0
    end = len(ds)
    batch_size = 100
    if USE_ELASTICSEARCH_VECTORISATION:
        # If using auto-embedding, bulk requests can take a lot longer,
        # so pass a longer request_timeout here (defaults to 10s), otherwise
        # we could get Connection timeouts
        batch_client = client.options(request_timeout=600)
    else:
        batch_client = client
    for batch_start in range(start, end, batch_size):
        batch_end = min(batch_start + batch_size, end)
        print(f"batch: start [{batch_start}], end [{batch_end}]")
        batch = ds.select(range(batch_start, batch_end))
        actions = batch_to_bulk_actions(batch)
        helpers.bulk(batch_client, actions)


try:
    bulk_index(restaurant_dataset["train"])
except BulkIndexError as e:
    print(f"{e.errors}")
else:
    # Only report success if no bulk error was raised
    print("Data ingestion into Elasticsearch complete!")

batch: start [0], end [100]
batch: start [100], end [200]
batch: start [200], end [300]
...
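
The batching logic in `bulk_index` boils down to walking the dataset in fixed-size windows. Here is a minimal sketch of just that part, using a hypothetical `batch_ranges` helper and no Elasticsearch calls:

```python
def batch_ranges(total: int, batch_size: int):
    """Yield (start, end) index pairs that cover range(total) in chunks of batch_size."""
    for start in range(0, total, batch_size):
        # The last window is clipped so it never runs past the end of the data.
        yield start, min(start + batch_size, total)


ranges = list(batch_ranges(total=250, batch_size=100))
print(ranges)  # [(0, 100), (100, 200), (200, 250)]
```

Each pair corresponds to one `batch: start [...], end [...]` line in the output above.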

## Step 6: Perform Vector Search on User Queries

The following step implements a function that returns a vector search result.

If `USE_ELASTICSEARCH_VECTORISATION` is `True`, the text query is sent directly to ES, where the uploaded model vectorises it before the vector search runs. If it is `False`, we vectorise the query locally and send the resulting vector to ES.

In [15]:
def vector_search(restaurant_query):
    if USE_ELASTICSEARCH_VECTORISATION:
        knn = {
            "field": "embedding.predicted_value",
            "k": 10,
            "query_vector_builder": {
                "text_embedding": {
                    "model_id": model_id,
                    "model_text": restaurant_query,
                }
            },
            "num_candidates": 150,
        }
    else:
        question_embedding = get_embedding(restaurant_query)
        knn = {
            "field": "embedding",
            "query_vector": question_embedding,
            "k": 10,
            "num_candidates": 150,
        }

    response = client.search(index="restaurants", knn=knn, size=10)
    results = []
    for hit in response["hits"]["hits"]:
        id = hit["_id"]
        score = hit["_score"]
        name = hit["_source"]["name"]
        borough = hit["_source"]["borough"]
        cuisine = hit["_source"]["cuisine"]
        stars = hit["_source"]["stars"]
        review_count = hit["_source"]["review_count"]
        price_range = hit["_source"]["price_range"]
        menu = hit["_source"]["menu"]
        attributes = hit["_source"]["attributes"]
        result = {
            "id": id,
            "_score": score,
            "name": name,
            "borough": borough,
            "cuisine": cuisine,
            "stars": stars,
            "review_count": review_count,
            "price_range": price_range,
            "menu": menu,
            "attributes": attributes,
        }
        results.append(result)
    return results

def pretty_search(query):

    get_knowledge = vector_search(query)

    search_result = ""
    for result in get_knowledge:
        search_result += f"Name: {result.get('name', 'N/A')}, Location: {result.get('borough', 'N/A')}, Cuisine: {result.get('cuisine', 'N/A')}, Stars: {result.get('stars', 'N/A')}, Review_count: {result.get('review_count', 'N/A')}, PriceRange: {result.get('price_range', 'N/A')}, Menu: {result.get('menu', 'N/A')}, Attributes: {result.get('attributes', 'N/A')}\n"

    return search_result
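
The hit-parsing loop in `vector_search` can also be written more compactly. The sketch below works on a hand-written dictionary shaped like a search response and uses a hypothetical `hits_to_results` helper. The `fake_response` values are made up, and this is one possible refactoring rather than the notebook's own code.

```python
SOURCE_FIELDS = ["name", "borough", "cuisine", "stars"]


def hits_to_results(response, fields=SOURCE_FIELDS):
    """Flatten search hits into dictionaries, filling missing fields with 'N/A'."""
    results = []
    for hit in response["hits"]["hits"]:
        row = {"id": hit["_id"], "_score": hit["_score"]}
        row.update({field: hit["_source"].get(field, "N/A") for field in fields})
        results.append(row)
    return results


# Made-up response with one hit; "stars" is deliberately missing from _source.
fake_response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "_score": 0.93,
                "_source": {"name": "Example Diner", "borough": "Queens", "cuisine": "American"},
            }
        ]
    }
}

results = hits_to_results(fake_response)
print(results[0]["name"], results[0]["stars"])  # Example Diner N/A
```

Using `.get` with a default means a document that lacks a field yields `'N/A'` instead of raising `KeyError`.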

## Step 7: Handling user queries and loading OpenAI GPT-4o mini


In [16]:
# Run the query with retrieval of sources, combining the results into a prompt
# that we can feed to GPT
def combined_query(query):
    source_information = pretty_search(query)
    return f"Query: {query}\nContinue to answer the query by using these Search Results:\n{source_information}."


query = "What is the best restaurant for Kids in Manhattan?"
combined_results = combined_query(query)

print(combined_results)

Query: What is the best restaurant for Kids in Manhattan?
Continue to answer the query by using these Search Results:
Name: Keats Restaurant, Location: Manhattan, Cuisine: American, Stars: 4, Review_count: 149, PriceRange: 2, Menu: French fries Chicken pot pie Mac & cheese Chicken parmesan Lasagna Classic burger Chicken fingers Fried chicken Breadsticks Cheeseburger Mozzarella sticks Caesar salad, Attributes: None {'touristy': False, 'hipster': False, 'romantic': False, 'divey': False, 'intimate': False, 'trendy': False, 'upscale': False, 'classy': False, 'casual': False} None None None None True {'garage': None, 'street': None, 'validated': None, 'lot': True, 'valet': False} None None None True {'dessert': None, 'latenight': None, 'lunch': True, 'di

Set up the client for our LLM (here we use OpenAI's [gpt-4o-mini](https://platform.openai.com/docs/models/gpt-4o-mini)).

In [17]:
from openai import OpenAI

openai_client = OpenAI(
  api_key=userdata.get("OPENAI_API_KEY")
)

In [29]:
messages = [
    {"role": "system", "content": "You are a highly intelligent restaurant recommending assistant, designed to assist users with restaurant search related queries. QUESTIONS AND ANSWERS SHOULD ONLY BE RELATED TO RESTAURANTS SEARCHING"},
    {"role": "user", "content": """Query: What is the best restaurant for Kids in Manhattan?
        Continue to answer the query by using these Search Results:
        Name: Keats Restaurant, Location: Manhattan, Cuisine: American, Stars: 4, Review_count: 149, PriceRange: 2, Menu: French fries Chicken pot pie Mac & cheese Chicken parmesan Lasagna Classic burger Chicken fingers Fried chicken Breadsticks Cheeseburger Mozzarella sticks Caesar salad, Attributes: None {'touristy': False, 'hipster': False, 'romantic': False, 'divey': False, 'intimate': False, 'trendy': False, 'upscale': False, 'classy': False, 'casual': False} None None None None True {'garage': None, 'street': None, 'validated': None, 'lot': True, 'valet': False} None None None True {'dessert': None, 'latenight': None, 'lunch': True, 'dinner': None, 'brunch': None, 'breakfast': None} True None None u'casual' True True True None True u'free'
        Name: Olive'S, Location: Manhattan, Cuisine: Bakery, Stars: 5, Review_count: 7, PriceRange: 1, Menu: doughnuts chocolate chip cookies chocolate pecan tart key lime pie, Attributes: None None None None None None True None None None None True None False None None 'casual' False True False False None None
        Name: Palm Restaurant, Location: Manhattan, Cuisine: American, Stars: 3, Review_count: 8, PriceRange: 1, Menu: Chicken fingers Fried chicken Pigs in a blanket Cheddar Biscuits Mac & cheese Spaghetti with meatballs Chicken pot pie Baked potato Chicken parmesan Mushroom swiss burger Chicken soup Spinach cheese dip with chips, Attributes: u'none' {'touristy': False, 'hipster': False, 'romantic': False, 'divey': False, 'intimate': False, 'trendy': False, 'upscale': False, 'classy': False, 'casual': False} None None None None True {'garage': False, 'street': False, 'validated': False, 'lot': False, 'valet': False} False None None True {'dessert': False, 'latenight': False, 'lunch': True, 'dinner': False, 'brunch': False, 'breakfast': True} False None None u'casual' False True False None None u'no'
        Name: Tap Room, Location: Manhattan, Cuisine: American, Stars: 4, Review_count: 21, PriceRange: 1, Menu: Mozzarella sticks Chicken parmesan Cheddar Biscuits Mashed potatoes Onion rings Green salad Pigs in a blanket Spaghetti with meatballs Chicken fingers Meatloaf Tomato soup Fried chicken, Attributes: None None None None True None True {'garage': False, 'street': None, 'validated': False, 'lot': None, 'valet': False} None None None None None None None None None None None None None None None
        Name: Como Pizza, Location: Manhattan, Cuisine: Pizza, Stars: 4, Review_count: 60, PriceRange: 2, Menu: Pepperoni Pizza Diavola Hawaiian pizza brownies Greek salad Chef's Special Deluxe Pizza Garlic bread Desano cookies Margherita Pizza cheesy bread, Attributes: 'beer_and_wine' {'romantic': False, 'intimate': False, 'classy': False, 'hipster': False, 'divey': False, 'touristy': False, 'trendy': False, 'upscale': False, 'casual': True} None None True None None {'garage': False, 'street': True, 'validated': False, 'lot': False, 'valet': False} False None None True {'dessert': False, 'latenight': False, 'lunch': True, 'dinner': True, 'brunch': False, 'breakfast': False} True None u'average' u'casual' False True True None None u'no'
        Name: V & T Restaurant, Location: Manhattan, Cuisine: Italian, Stars: 1.5, Review_count: 16, PriceRange: 2, Menu: Minestrone soup Fried Mozzarella Pepperoni Pizza Manicotti White Pizza Vegetarian Broccoli Pizza All Meat Pizza Salmon chicken Buca Trio Platter caprese salad Alfredo Pizza, Attributes: None None None None False None True {'garage': False, 'street': False, 'validated': False, 'lot': False, 'valet': False} False None None None None None None None None None None None None None 'free'
        Name: Flame Restaurant Coffee House, Location: Manhattan, Cuisine: American, Stars: 4, Review_count: 366, PriceRange: 2, Menu: Mushroom swiss burger Spinach cheese dip with chips Classic burger Lasagna Grilled cheese sandwich Onion rings French fries Pigs in a blanket Meatloaf Spaghetti with meatballs Caesar salad Cheddar Biscuits, Attributes: 'none' {'touristy': False, 'hipster': None, 'romantic': False, 'divey': False, 'intimate': False, 'trendy': None, 'upscale': False, 'classy': False, 'casual': True} None None True False True {'garage': False, 'street': False, 'validated': False, 'lot': True, 'valet': False} True None None True {'dessert': None, 'latenight': None, 'lunch': True, 'dinner': True, 'brunch': None, 'breakfast': False} False None u'average' 'casual' True True False False True u'free'
        Name: Jackson Hole, Location: Manhattan, Cuisine: American, Stars: 3, Review_count: 10, PriceRange: 1, Menu: Chicken soup Onion rings Meatloaf Caesar salad Mushroom swiss burger Classic burger Green salad Grilled cheese sandwich Cheeseburger Mashed potatoes Breadsticks Spaghetti with meatballs, Attributes: None None None None True None True {'garage': False, 'street': False, 'validated': False, 'lot': False, 'valet': False} None None None None None None None None None None None None None None 'no'
        Name: John'S Restaurant, Location: Manhattan, Cuisine: Italian, Stars: 3, Review_count: 23, PriceRange: 2, Menu: Fried Mozzarella Neapolitan Pizza caprese salad Buca Trio Platter Alfredo with Spinach Pizza Alfredo Pizza Spaghetti Spinach Ravioli Cheese Ravioli chicken Vegetarian Baked Ziti White Pizza, Attributes: None None None None True False True {'garage': False, 'street': False, 'validated': False, 'lot': True, 'valet': False} False None None None None None None None None None None None None None None
        Name: Corner Bistro, Location: Manhattan, Cuisine: American, Stars: 4, Review_count: 49, PriceRange: 1, Menu: Spinach cheese dip with chips Chicken soup Classic burger Cheddar Biscuits Lasagna Mozzarella sticks Mashed potatoes Meatloaf Baked potato Green salad Spaghetti with meatballs Fried chicken, Attributes: u'none' {'touristy': False, 'hipster': False, 'romantic': False, 'divey': False, 'intimate': False, 'trendy': False, 'upscale': False, 'classy': False, 'casual': False} None None True None True {'garage': None, 'street': True, 'validated': False, 'lot': None, 'valet': False} False None None None None None None None None False None False False True u'free'"""},
    {"role": "assistant", "content": """Based on the provided search results, Keats Restaurant appears to be an excellent choice for kids in Manhattan. Here’s why:

        1. Kid-Friendly Menu: Keats Restaurant offers a variety of dishes that are popular with kids, such as French fries, mac & cheese, chicken fingers, cheeseburgers, and mozzarella sticks. The menu is designed to appeal to young palates, making it a great spot for families.

        2. Casual Atmosphere: The restaurant has a casual vibe, which is ideal for families with kids. Parents don’t have to worry about a formal or overly quiet environment, making it a comfortable setting for children.

        3. Parking Availability: Keats Restaurant has a lot for parking, which is convenient for families who may be driving in. This can be especially important in Manhattan, where parking can often be a challenge.

        4. Lunch Availability: Keats is open for lunch, which is a great option for families looking for a midday meal. The timing works well for families with young kids who may prefer to eat earlier in the day.

        With its kid-friendly menu, casual setting, and convenient parking, Keats Restaurant stands out as a top option for a family-friendly dining experience in Manhattan."""},
]

In [30]:
def rag_query(query):

    combined_information = combined_query(query)

    messages.append({"role": "user", "content": combined_information})

    completion = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        response_format={
            "type": "text"
        }
        )

    messages.append({"role": "assistant", "content": completion.choices[0].message.content})

    return completion.choices[0].message.content

print(rag_query("What is the best restaurant for Kids in Brooklyn?"))

For families looking for the best restaurant for kids in Brooklyn, here are some excellent options based on the provided results:

1. **Crown Fried Chicken**
   - **Cuisine:** American
   - **Stars:** 4.5
   - **Review Count:** 33
   - **Price Range:** 1
   - **Menu:** Includes kid-friendly items such as fried chicken, chicken fingers, mac & cheese, and pigs in a blanket.
   - **Attributes:** Very casual and welcoming for families, and it has street parking available. 

2. **Chen Won Dim Sum & Bakery**
   - **Cuisine:** Chinese
   - **Stars:** 4.5
   - **Review Count:** 7
   - **Price Range:** 1
   - **Menu:** Offers a variety of dishes like egg rolls, steamed buns, and sesame chicken, which are generally favored by kids.
   - **Attributes:** Casual dining experience, great for families.

3. **Juventino**
   - **Cuisine:** American
   - **Stars:** 3.5
   - **Review Count:** 5
   - **Price Range:** 2
   - **Menu:** Features meals such as spaghetti with meatballs, grilled cheese sandwich

In [31]:
search_query = input("Enter your search query: ")

print("\n\n########################################################## \n\n")

print("USER: ",search_query)

bot_response = rag_query(search_query)

print("XBOT: ",bot_response)

Enter your search query: Suggest 5 non alcoholic Chinese restaurant in Queens


########################################################## 


USER:  Suggest 5 non alcoholic Chinese restaurant in Queens
XBOT:  Here are five non-alcoholic Chinese restaurants in Queens:

1. **A Taste Of Shanghai Restaurant**
   - **Stars:** 3.5
   - **Review Count:** 14
   - **Price Range:** 2
   - **Menu Highlights:** Wonton soup, sweet and sour chicken with lemon, dumplings, and sesame chicken.
   - **Parking:** Available.

2. **New Dragon City Kitchen**
   - **Stars:** 3.5
   - **Review Count:** 124
   - **Price Range:** 2
   - **Menu Highlights:** Chicken chow mein, potstickers, shrimp fried rice, and sweet rice.
   - **Parking:** Available.

3. **Yummy Dim Sum**
   - **Stars:** 4
   - **Review Count:** 7
   - **Price Range:** 1
   - **Menu Highlights:** Hot and sour soup, wonton wrappers, steamed buns, and crab rangoons.
   - **Parking:** Available.

4. **Ru Yi Restaurant**
   - **Stars:** 3.5
   - *
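
Note that `rag_query` appends to the shared `messages` list on every call, so the prompt grows with each question, and each user turn carries its own block of search results. One way to keep it bounded is to retain the fixed prefix (the system prompt plus the worked example) and only the most recent turns. The helper below is a sketch of that idea. `trim_history`, `prefix_len` and `max_recent` are hypothetical names, and suitable limits depend on the model's context window.

```python
def trim_history(messages, prefix_len=3, max_recent=4):
    """Keep the first prefix_len messages plus at most max_recent of the latest ones."""
    prefix = messages[:prefix_len]
    rest = messages[prefix_len:]
    # rest[-0:] would return everything, so handle max_recent <= 0 explicitly.
    recent = rest[-max_recent:] if max_recent > 0 else []
    return prefix + recent


# Hypothetical conversation: 3 fixed messages followed by 3 question/answer pairs.
history = [
    {"role": "system", "content": "system prompt"},
    {"role": "user", "content": "example question"},
    {"role": "assistant", "content": "example answer"},
]
for i in range(3):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_history(history, prefix_len=3, max_recent=4)
print(len(trimmed))           # 7
print(trimmed[3]["content"])  # question 1
```

With something like this in place, the API call could be given `trim_history(messages)` instead of the full `messages` list.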

## Credits

This notebook was adapted from:
* [MongoDB's RAG cookbook](https://huggingface.co/learn/cookbook/rag_with_hugging_face_gemma_mongodb)
* OpenAI's [ES RAG cookbook](https://github.com/openai/openai-cookbook/blob/main/examples/vector_databases/elasticsearch/elasticsearch-retrieval-augmented-generation.ipynb)
* Elasticsearch-labs' [loading-model-from-hugging-face cookbook](https://github.com/elastic/elasticsearch-labs/blob/main/notebooks/integrations/hugging-face/loading-model-from-hugging-face.ipynb)