# Neo4j Generative AI Workshop

In this workshop, you will learn how to use Neo4j Knowledge Graphs to make Large Language Models (LLMs) useful for more real-world use cases.

We walk through an example that uses real-world customer and product data from a fashion, style, and beauty retailer. We show how you can use a knowledge graph to ground an LLM, enabling it to build tailored marketing content personalized to each customer based on their interests and shared purchase histories. We use a pattern called Retrieval-Augmented Generation (RAG) to accomplish this, specifically a variant that leverages not only vector search but also graph pattern matching and graph machine learning to provide more relevant, personalized results to customers.

This notebook walks through the end-to-end process, including:
- Building the knowledge graph
- Vector search & text embedding
- Using graph patterns in Cypher to improve semantic search with context
- Further augmenting semantic search with knowledge graph inference & ML
- Building the LLM chain and demo app for generating content

## Setup

In [None]:
# the necessary package installs
# not needed in the Summit environment but you will need them if you run this yourself

# %pip install sentence_transformers langchain langchain-openai openai tiktoken python-dotenv gradio graphdatascience altair neo4j_tools scikit-learn
# %pip install "vegafusion[embed]"

In [None]:
# the necessary imports
import pandas as pd
import numpy as np
from dotenv import load_dotenv
from getpass import getpass
import os
from graphdatascience import GraphDataScience
from neo4j_tools import gds_db_load, gds_utils
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.vectorstores.neo4j_vector import Neo4jVector
from langchain.graphs import Neo4jGraph
from langchain.prompts import SystemMessagePromptTemplate, HumanMessagePromptTemplate, ChatPromptTemplate
from langchain.schema import StrOutputParser
from langchain.schema.runnable import RunnableLambda
import gradio as gr
from IPython.display import display, HTML

def pretty_print(df, num_rows=5):
    # render the newlines inside text cells as HTML line breaks
    return display(HTML(df.head(num_rows).to_html().replace("\n", "<br>")))

pd.set_option('display.max_rows', 10)
pd.set_option('display.max_colwidth', 300)
pd.set_option('display.width', 0)

print("Imports done")

In [None]:
# note that in the Summit environment we provide
# - a Neo4j database that includes the Graph Data Science library
# - an OpenAI key (which will expire right after the session ;-)
# you will need to provide your own if you want to try this later

# setting the environment
if os.path.exists('ws.env'):
    load_dotenv('ws.env', override=True)

    # Neo4j
    HOST = os.getenv('HOST')
    USERNAME = os.getenv('USERNAME')
    PASSWORD = os.getenv('PASSWORD')
    DATABASE = os.getenv('DATABASE')

    # AI
    LLM = os.getenv('LLM')
    OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
    os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY
    print("Environment loaded")
else:
    print("ws.env not found: set the environment manually below")

# setting the environment manually

# Neo4j
# HOST = input("Enter the connection string (default 'neo4j://localhost:7687'):") or "neo4j://localhost:7687"
# USERNAME = input("Enter the username (default 'neo4j'):") or "neo4j"
# PASSWORD = getpass("Enter the password:") or "notthepassword"
# DATABASE = input("Enter the database (default 'neo4j'):") or "neo4j"

# AI
# LLM = 'gpt-4'

# OpenAI - Required when using OpenAI models
# os.environ['OPENAI_API_KEY'] = 'your key here'

## Knowledge Graph Building

<img src="https://github.com/neo4j-product-examples/genai-workshop/blob/main/img/hm-banner.png?raw=true" alt="HM Banner" width="2000"/>

We begin by building our knowledge graph. This workshop will leverage the [H&M Personalized Fashion Recommendations Dataset](https://www.kaggle.com/competitions/h-and-m-personalized-fashion-recommendations/data), a sample of real customer purchase data that includes rich information about products, including names, types, descriptions, department sections, etc.

Below is the graph data model we will use:

<img src="https://metis.graphdatabase.ninja/images/GraphSummit2024GenAIModelHM.png" alt="Data Model"/>


### Connect to Neo4j

We will use the [Graph Data Science Python Client](https://neo4j.com/docs/graph-data-science-client/current/) to connect to Neo4j. This client makes it convenient to display results, as we will see later.  Perhaps more importantly, it allows us to easily run [Graph Data Science](https://neo4j.com/docs/graph-data-science/current/introduction/) algorithms from Python.

This client will only work if your Neo4j instance has Graph Data Science installed. If not, you can still use the [Neo4j Python Driver](https://neo4j.com/docs/python-manual/current/) or LangChain's `Neo4jGraph` object, which we will see later on.

In [None]:
from graphdatascience import GraphDataScience

# GDS connection
gds = GraphDataScience(HOST, auth=(USERNAME, PASSWORD), database=DATABASE)


Test your connection by running the cell below. It should output your system information.

In [None]:
gds.debug.sysInfo()

### Exploring the database
The dataset is preloaded for you (you will also receive the scripts and files to load it yourself).


#### Some statistics

In [None]:
# total node counts
gds.run_cypher('''
    CALL apoc.meta.stats()
    YIELD labels
    UNWIND keys(labels) AS nodeLabel
    RETURN nodeLabel, labels[nodeLabel] AS nodeCount
''')

In [None]:
# total relationship counts
gds.run_cypher('''
    CALL apoc.meta.stats()
    YIELD relTypesCount
    UNWIND keys(relTypesCount) AS relationshipType
    RETURN relationshipType, relTypesCount[relationshipType] AS relationshipCount
''')

#### Move to the Neo4j Browser
Let's have a look at the data in the <a href="https://browser.neo4j.io/index.html" target="_blank">Neo4j Browser</a>!

Click the link above; a new tab should open in your web browser. The username and password are the same ones you used to log in to this environment. Once logged in, you should find yourself in your own database. Next, you'll be asked to run a browser guide. To do so, copy and paste the following command (your trainer will explain where):

`:play https://metis.graphdatabase.ninja/bredags/exploration.html`

## Vector Search

In this section, we will build text embeddings out of product descriptions and demonstrate how to leverage the Neo4j vector index for vector search. We will also introduce the use of [LangChain](https://www.langchain.com/).


### Creating Text Embeddings

To start we need to make embeddings for our product nodes.

First, we will instantiate our embedding model. This notebook has only been tested with OpenAI, but embedding models are pluggable: you could choose models from other providers, including cloud providers like AWS Bedrock and Google Cloud Vertex AI.

In [None]:
from langchain_openai import OpenAIEmbeddings

embedding_model = OpenAIEmbeddings()
embedding_dimension = 1536

Now let's create a dataframe with a text column to embed. In this case, we combine multiple text columns, such as product name, type, and description, which gives the embedding model more context. A small minority of products are missing a description; for our purposes, we leave them out. In a more in-depth workflow, you would likely want to impute the missing values.

In [None]:
# get product information from the database into a dataframe
product_emb_df = gds.run_cypher('''
    MATCH (p:Product)
    WHERE p.description IS NOT NULL
    RETURN p.id as id, p.name as productname, p.type as producttype, p.group as group, p.garment as garment, p.description as description
''')

# transformation function 
def create_doc(row):
    return f'''##Product
Name: {row.productname}
Type: {row.producttype}
Group: {row.group}
Garment Type: {row.garment}
Description: {row.description}
'''

# apply the transformation
product_emb_df['text'] = product_emb_df.apply(create_doc, axis = 1)

# remove the information we no longer need
product_emb_df = product_emb_df.drop(columns=['productname', 'producttype', 'group', 'garment', 'description'])

# show the result
pretty_print(product_emb_df, 10)

Now let's embed the text with OpenAI.
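Calling the embedding API once per document would be slow, so the next cell sends documents in batches using `gds_db_load.chunks` from `neo4j_tools`. As a rough sketch of what such a helper does, assuming it simply yields consecutive slices of a given size (the `chunks` function below is a hypothetical stand-in, not the library's code):

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

def chunks(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items (hypothetical stand-in)."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]

# 8,044 documents in batches of 500: 16 full batches plus one of 44
n_docs = 8044
batches = list(chunks(range(n_docs), 500))
print(len(batches), len(batches[-1]))  # 17 44
```

Larger batches mean fewer round trips, but each request must still stay within the provider's per-request limits.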

In [None]:
%%time

embeddings = []
count = 0

# embedding one document at a time would mean 8044 calls to OpenAI,
# so we batch the input instead (here chunking combines documents rather than splitting them)
for chunk in gds_db_load.chunks(product_emb_df.text,500):
    embeddings.extend(embedding_model.embed_documents(chunk))
    count += len(chunk)
    print(f'Embedded {count} of {product_emb_df.shape[0]}')

# merge the generated embeddings into the dataframe
product_emb_df['embedding'] = embeddings

In [None]:
# show the (first five rows of the) result
pretty_print(product_emb_df, 5)

#### Create Vector Properties and Index

Now, we will load the embeddings into Neo4j by MATCHing on id and then calling the `db.create.setNodeVectorProperty` procedure to set the embedding property. This procedure stores the vector as floats rather than doubles, which would need twice the space. This matters because embedding vectors tend to be long, and their size adds up quickly.
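To see why floats versus doubles matters, here is a back-of-the-envelope calculation with Python's standard `array` module, assuming 4-byte floats and 8-byte doubles (true on common platforms). It counts only the raw vector payload, not Neo4j's storage overhead.

```python
from array import array

dim = 1536  # embedding_dimension used in this notebook

vec_f32 = array("f", [0.0] * dim)  # 'f' = C float, typically 4 bytes
vec_f64 = array("d", [0.0] * dim)  # 'd' = C double, typically 8 bytes

bytes_f32 = vec_f32.itemsize * len(vec_f32)
bytes_f64 = vec_f64.itemsize * len(vec_f64)
print(bytes_f32, bytes_f64)  # 6144 12288 on platforms with 4-byte floats

# Scaled to the ~8,044 embedded products
n_products = 8044
print(f"{n_products * bytes_f32 / 1e6:.1f} MB vs {n_products * bytes_f64 / 1e6:.1f} MB")
```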

After bulk loading, we will create the vector index. The [Neo4j Vector Index](https://neo4j.com/docs/cypher-manual/current/indexes-for-vector-search/) enables efficient Approximate Nearest Neighbor (ANN) search with vectors. It uses the Hierarchical Navigable Small World (HNSW) algorithm.
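Conceptually, the index answers "which stored vectors are closest to this query vector?" The sketch below answers that exactly, with a brute-force cosine-similarity scan over a few made-up 3-dimensional vectors; HNSW exists to avoid this full scan on large datasets, trading a little accuracy for speed. Neo4j may report similarity scores on a normalized scale, so absolute values can differ from the raw cosine computed here; the ranking is what the sketch illustrates.

```python
import math
from typing import Dict, List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def exact_top_k(query: List[float], vectors: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the k best (what ANN approximates)."""
    scored = [(node_id, cosine(query, vec)) for node_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Made-up 3-dimensional "embeddings" (the real ones here have 1536 dimensions)
products = {
    "p1": [0.9, 0.1, 0.0],
    "p2": [0.1, 0.9, 0.0],
    "p3": [0.7, 0.3, 0.1],
}
top = exact_top_k([1.0, 0.0, 0.0], products, k=2)
print([node_id for node_id, _ in top])  # ['p1', 'p3']
```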

In [None]:
# load vector properties
records = product_emb_df[['id', 'text', 'embedding']].to_dict('records') # creates a list of objects
print(f'======  loading Product text embeddings ======')
total = len(records)
print(f'staging {total:,} records')
cumulative_count = 0

# we use the chunks function again to create batches of results
# that we can process in one transaction (rather than doing them one by one)
for recs in gds_db_load.chunks(records, 400):
    res = gds.run_cypher('''
    UNWIND $recs AS rec
    MATCH(n:Product {id: rec.id})
    SET n.text = rec.text
    WITH n, rec
    CALL db.create.setNodeVectorProperty(n, "embedding", rec.embedding)
    RETURN count(n) AS propertySetCount
    ''', params={'recs': recs})
    cumulative_count += res.iloc[0, 0]
    print(f'Set {cumulative_count:,} of {total:,} text embeddings')

# create index
gds.run_cypher('''
CREATE VECTOR INDEX product_embedding IF NOT EXISTS FOR (n:Product) ON (n.embedding)
OPTIONS {indexConfig: {
 `vector.dimensions`: toInteger($dim),
 `vector.similarity_function`: 'cosine'
}}''', params={'dim': embedding_dimension})

gds.run_cypher('CALL db.awaitIndex("product_embedding", 300)')
print("Vector index created")

### Vector Search Using Cypher

To do vector search, we need to:
1. Take the search prompt and convert it to an embedding query vector
2. Use similarity search with that new vector to pull semantically similar documents

Below is an example of converting a search prompt into a query vector. We use the same embedding model to do this.

In [None]:
#search_prompt = 'denim jeans, loose fit, high-waist'
search_prompt = 'Oversized Sweaters'

query_vector = embedding_model.embed_query(search_prompt)
print(f'query vector length: {len(query_vector)}')
print(f'query vector sample: {query_vector[:10]}')

Now we can take that and use it in a Cypher query with the vector index to retrieve semantically similar documents.

In [None]:
result=gds.run_cypher('''
CALL db.index.vector.queryNodes("product_embedding", 10, $queryVector)
YIELD node AS product, score
RETURN product.id AS id,
    product.text AS text,
    score
''', params={'queryVector': query_vector})
pretty_print(result,5)

### Vector Search Using LangChain

We can also do this vector search with LangChain. As we'll see later, this gives us more flexibility. To do this, we use the `Neo4jVector` class and call its `from_existing_index` method to connect to the existing index in the graph.

In [None]:
from langchain.vectorstores.neo4j_vector import Neo4jVector

kg_vector_search = Neo4jVector.from_existing_index(
    embedding=embedding_model,
    url=HOST,
    username=USERNAME,
    password=PASSWORD,
    database=DATABASE,
    index_name='product_embedding')

LangChain can handle embedding the query and retrieving from Neo4j behind the scenes, making our lives easier. LangChain uses a query similar to the one above and retrieves the `text` property we set for each Product node.

In [None]:
res = kg_vector_search.similarity_search_with_score(search_prompt, k=10)
# Visualize as a dataframe
pretty_print(pd.DataFrame([{'document': d.page_content, 'score': score} for d, score in res]),5)

### Try It Yourself

Experiment with your own prompts!

In [None]:
res = kg_vector_search.similarity_search_with_score('turtle neck', k=10)
# Visualize as a dataframe
pretty_print(pd.DataFrame([{'document': d.page_content, 'score': score} for d, score in res]),5)

## Semantic Search with Context (Graph Patterns)
__Using Graph Patterns to Improve Context in Search & Retrieval__

Above, we saw how you can use the vector index to find semantically similar products in user searches. This is an extremely powerful tool; however, it is not a complete solution on its own. It doesn't consider much of the customer data and isn't very personalized. Furthermore, some search prompts, like "Oversized Sweater," are very general and can match a large number of products, many of which won't be relevant to the specific user conducting the search.

We have a rich knowledge graph full of customer information; let's see how to leverage it to improve the search experience.

## Further graph exploration

#### Move to the Neo4j Browser
Let's have another look at the data in the <a href="https://browser.neo4j.io/index.html" target="_blank">Neo4j Browser</a>!

The browser guide this time is:

`:play https://metis.graphdatabase.ninja/bredags/customer.html`

### Personalizing Results Based on Customer Behavior in the Graph

As we saw in the Browser, an important piece of information expressed in this graph, but not directly in the product documents and text embeddings, is customer purchasing behavior. We saw that we can use graph patterns in Cypher to extract insights from it. Now that we know how this pattern works, we can apply it to our semantic search to make results more personalized.

## Semantic Search Using Cypher

To do this, we append an `OPTIONAL MATCH` clause to our initial vector search query. Once the product documents are returned, we re-score them using the co-purchase pattern we explored in the Browser and use that score to re-rank the search results.
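The re-ranking logic in the query below can be mirrored in plain Python to make the ordering explicit. This sketch uses made-up product codes and scores: `purchaseScore` is the number of matched co-purchase paths (0 when the `OPTIONAL MATCH` finds nothing), the combined `score` is `(1 + purchaseScore) * searchScore`, and, as in the Cypher `ORDER BY`, results are sorted by `purchaseScore` first and `searchScore` second.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Hit:
    product_code: str
    search_score: float  # similarity score from the vector index
    purchase_score: int  # count(a) from the OPTIONAL MATCH (0 when nothing matches)

def rerank(hits: List[Hit], limit: int = 15) -> List[dict]:
    rows = [
        {
            "productCode": h.product_code,
            "score": (1 + h.purchase_score) * h.search_score,
            "purchaseScore": h.purchase_score,
            "searchScore": h.search_score,
        }
        for h in hits
    ]
    # ORDER BY purchaseScore DESC, searchScore DESC LIMIT 15
    rows.sort(key=lambda r: (r["purchaseScore"], r["searchScore"]), reverse=True)
    return rows[:limit]

ranked = rerank([Hit("A", 0.95, 0), Hit("B", 0.90, 3), Hit("C", 0.92, 3)])
print([r["productCode"] for r in ranked])  # ['C', 'B', 'A']
```

Product A has the highest vector similarity but lands last, because purchase overlap always takes priority in this ordering; sorting by the combined `score` instead would blend the two signals.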

In [None]:
CUSTOMER_ID = "daae10780ecd14990ea190a1e9917da33fe96cd8cfa5e80b67b4600171aa77e0"

result = gds.run_cypher('''
CALL db.index.vector.queryNodes("product_embedding", 250, $queryVector)
YIELD node AS product, score as searchScore
OPTIONAL MATCH (product)<-[:VARIANT_OF]-(:Article)<-[:PURCHASED]-(:Customer)-[:PURCHASED]->(a:Article)<-[:PURCHASED]-(:Customer {id: $CUSTOMER_ID})
WITH count(a) AS purchaseScore, product.text AS text, searchScore, product.id AS productCode
RETURN text,
 (1+purchaseScore)*searchScore AS score, {productCode: productCode, purchaseScore:purchaseScore, searchScore:searchScore} AS metadata
 ORDER BY purchaseScore DESC, searchScore DESC LIMIT 15
''', params={'queryVector': query_vector, 'CUSTOMER_ID': CUSTOMER_ID})

pretty_print(result)

## Semantic Search Using LangChain

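LangChain's `Neo4jVector` accepts a `retrieval_query` argument to do the same. The query text is appended to LangChain's own vector index call, which is why it starts from `node` and `score`, and it must return `text`, `score`, and `metadata` columns.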
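In the next cell, the retrieval query is built with a Python f-string, so literal Cypher braces (property maps and map literals) must be doubled as `{{` and `}}`, while `{CUSTOMER_ID}` is substituted. A minimal check of how such a string renders, using a made-up customer id:

```python
customer_id = "abc123"  # hypothetical id, for illustration only

retrieval_query = f"""
WITH node AS product, score AS searchScore
OPTIONAL MATCH (product)<-[:VARIANT_OF]-(:Article)<-[:PURCHASED]-(:Customer)
-[:PURCHASED]->(a:Article)<-[:PURCHASED]-(:Customer {{id: '{customer_id}'}})
WITH count(a) AS purchaseScore, product.text AS text, searchScore
RETURN text, (1 + purchaseScore) * searchScore AS score,
    {{purchaseScore: purchaseScore, searchScore: searchScore}} AS metadata
"""

# Doubled braces render as single literal braces; {customer_id} is substituted
print(retrieval_query)
```

Interpolating the id directly works here because customer ids are fixed hex strings from the dataset; free-form user input would need to be escaped or validated before being inserted into query text.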

In [None]:
CUSTOMER_ID = "daae10780ecd14990ea190a1e9917da33fe96cd8cfa5e80b67b4600171aa77e0"

kg_personalized_search = Neo4jVector.from_existing_index(
    embedding=embedding_model,
    url=HOST,
    username=USERNAME,
    password=PASSWORD,
    database=DATABASE,
    index_name='product_embedding',
    retrieval_query=f"""
    WITH node AS product, score AS searchScore

    OPTIONAL MATCH(product)<-[:VARIANT_OF]-(:Article)<-[:PURCHASED]-(:Customer)
    -[:PURCHASED]->(a:Article)<-[:PURCHASED]-(:Customer {{id: '{CUSTOMER_ID}'}})

    WITH count(a) AS purchaseScore, product.text AS text, searchScore, product.id AS productCode
    WITH purchaseScore, (1+purchaseScore)*searchScore AS score, searchScore, text, productCode
    RETURN text,
        score,
        {{productCode: productCode, purchaseScore:purchaseScore, searchScore:searchScore, score:score}} AS metadata
    ORDER BY purchaseScore DESC, searchScore DESC LIMIT 15
    """)

And run it!

In [None]:
res = kg_personalized_search.similarity_search(search_prompt, k=100)

# Visualize as a dataframe
pretty_print(pd.DataFrame([{'productCode': d.metadata['productCode'],
               'document': d.page_content,
               'score': d.metadata['score'],
               'searchScore': d.metadata['searchScore'],
               'purchaseScore': d.metadata['purchaseScore']} for d in res]),5)

## Augmenting Semantic Search with Knowledge Graph Inference & ML

We saw above how to use graph pattern matching to personalize semantic search and make it more contextually relevant.

In addition to this, we also have [Graph Data Science algorithms and machine learning](https://neo4j.com/docs/graph-data-science/current/introduction/), which allow you to enrich your knowledge graph with additional properties, relationships, and graph metrics. These can, in turn, be leveraged in search and retrieval to improve and augment results.

We will walk through an example of this below, where we use Graph Data Science to augment retrieval with additional product recommendations.


Below, we use graph machine learning to create relationships that can help us make personalized recommendations based on purchase history.


We do this by leveraging node embeddings with K-Nearest Neighbors (KNN). Specifically, we:
1. Use node embeddings (FastRP) to encode the similarity between articles based on shared customer purchase history.
2. Use those node embeddings as input to KNN, an unsupervised learning technique, to link the most similar articles with a `CUSTOMERS_ALSO_LIKE` relationship.
3. Use Cypher patterns at query time to retrieve recommended items based on a customer's recent purchase history.

This process helps scale memory-based recommendation techniques to larger datasets.
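To build intuition for steps 2 and 3, here is a small, self-contained sketch of item-to-item KNN on toy data. It deliberately swaps FastRP for a much simpler representation: each article is a 0/1 vector over the customers who bought it, compared with cosine similarity. The `top_k` and `cutoff` values are illustrative (the 0.75 cutoff mirrors `similarityCutoff` in the GDS call below); this is not the GDS implementation.

```python
import math
from collections import defaultdict

# Toy purchase history (hypothetical): customer -> purchased articles
purchases = {
    "c1": {"a1", "a2"},
    "c2": {"a1", "a2", "a3"},
    "c3": {"a1", "a2"},
    "c4": {"a3", "a4"},
    "c5": {"a3", "a4"},
}

customers = sorted(purchases)
buyers = defaultdict(set)  # article -> customers who bought it
for customer, items in purchases.items():
    for article in items:
        buyers[article].add(customer)

# Simplified stand-in for FastRP: a 0/1 vector over customers per article
vectors = {article: [1.0 if c in buyers[article] else 0.0 for c in customers]
           for article in sorted(buyers)}

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def knn_relationships(vectors, top_k=2, cutoff=0.75):
    """Link each article to its top_k most similar articles above the cutoff,
    returning (source, target, score) triples like CUSTOMERS_ALSO_LIKE rels."""
    rels = []
    for src, u in vectors.items():
        scored = [(dst, cosine(u, v)) for dst, v in vectors.items() if dst != src]
        kept = sorted((p for p in scored if p[1] >= cutoff), key=lambda p: p[1], reverse=True)
        rels.extend((src, dst, round(score, 3)) for dst, score in kept[:top_k])
    return rels

also_like = knn_relationships(vectors)
for rel in also_like:
    print(rel)
# ('a1', 'a2', 1.0)
# ('a2', 'a1', 1.0)
# ('a3', 'a4', 0.816)
# ('a4', 'a3', 0.816)
```

Articles a1 and a2 share all their buyers, so they link to each other; pairs below the cutoff, such as a1 and a3, get no relationship at all, which keeps the written graph sparse.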

In [None]:
%%time
from neo4j_tools import gds_utils

#clear past GDS analysis in the case of re-running
gds_utils.clear_all_gds_graphs(gds)
gds_utils.delete_relationships('CUSTOMERS_ALSO_LIKE', gds, src_node_label='Article')


# graph projection - project co-purchase graph into analytics workspace
gds.run_cypher('''
   MATCH (a1:Article)<-[:PURCHASED]-(:Customer)-[:PURCHASED]->(a2:Article)
   WITH gds.graph.project("proj", a1, a2,
       {sourceNodeLabels: labels(a1),
       targetNodeLabels: labels(a2),
       relationshipType: "COPURCHASE"}) AS g
   RETURN g.graphName
   ''')
g = gds.graph.get("proj")

# create FastRP node embeddings
gds.fastRP.mutate(g, mutateProperty='embedding', embeddingDimension=128, randomSeed=7474, concurrency=4, iterationWeights=[0.0, 1.0, 1.0])

# run KNN on the embeddings and write CUSTOMERS_ALSO_LIKE relationships
knn_stats = gds.knn.write(g, nodeProperties=['embedding'], nodeLabels=['Article'],
                  writeRelationshipType='CUSTOMERS_ALSO_LIKE', writeProperty='score',
                  sampleRate=1.0, initialSampler='randomWalk', concurrency=1, similarityCutoff=0.75, randomSeed=7474)

# write embeddings back to database to introspect later
gds.graph.writeNodeProperties(g, ['embedding'], ['Article'])

# clear graph projection once done
g.drop()

# output knn stats
knn_stats

### Visualize Node Embeddings
To better understand what the node embeddings are doing, let's pull some back and visualize them!

__NOTE__: The visualization below should be pre-rendered to cut down on runtime. Running the TSNE cell can take ~5 minutes.

In [None]:
graph_emb_df = gds.run_cypher('''
MATCH (p:Product)<-[:VARIANT_OF]-(a:Article)-[:LOCATED_IN]->(d)
RETURN a.id AS articleId,
    p.name AS productName,
    p.type AS productTypeName,
    d.name AS departmentName,
    p.description AS detailDesc,
    a.embedding AS embedding
''')

#view a sample
pretty_print(graph_emb_df.loc[:5, ['articleId', 'embedding']])

In [None]:
import numpy as np
from sklearn.manifold import TSNE
#
df = graph_emb_df.copy()
filtered_node_df = df[df.embedding.apply(lambda x: np.count_nonzero(x) > 0)].reset_index(drop=True)
# instantiate the TSNE model
tsne = TSNE(n_components=2, random_state=7474, init='random', learning_rate="auto")
# Use the TSNE model to fit and output a 2-d representation
E = tsne.fit_transform(np.stack(filtered_node_df['embedding'], axis=0))
coord_df = pd.concat([filtered_node_df, pd.DataFrame(E, columns=['x', 'y'])], axis=1)

pretty_print(coord_df)

In [None]:
import altair as alt

alt.data_transformers.disable_max_rows()
# sample at most 5000 points (sample() raises if n exceeds the row count)
chart = alt.Chart(coord_df.sample(n=min(5000, len(coord_df)), random_state=7474)).mark_circle(size=60).encode(
    x='x',
    y='y',
    tooltip=['productName', 'productTypeName', 'departmentName', 'detailDesc']
).properties(title="Article Embedding (2D Representation)", width=750, height=700)
# configure_* methods return a new chart, so chain them and keep the result
chart = chart.configure_axis(titleFontSize=20).configure_legend(labelFontSize=20)
chart

### Personalized Recommendations

Now, let's construct a KG store to retrieve recommendations for a user.  This need not be based on a user prompt or vector search. Instead, we will make it purely based on purchase history.
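The query in the next cell walks from the customer's purchased articles along `CUSTOMERS_ALSO_LIKE`, maps each recommended article to its product via `VARIANT_OF`, and sums the relationship scores per product. The same aggregation on hypothetical in-memory data, as a sketch:

```python
from collections import defaultdict

# Hypothetical toy data mirroring the Cypher pattern
purchased = ["a1", "a3"]  # (:Customer)-[:PURCHASED]->(:Article)
also_like = {             # (:Article)-[r:CUSTOMERS_ALSO_LIKE {score}]->(:Article)
    "a1": [("a2", 0.9), ("a5", 0.8)],
    "a3": [("a4", 0.85), ("a6", 0.8)],
}
variant_of = {"a2": "P1", "a4": "P1", "a5": "P2", "a6": "P3"}  # article -> product

def recommend(purchased, also_like, variant_of, k=15):
    scores = defaultdict(float)
    for article in purchased:
        for other, score in also_like.get(article, []):
            scores[variant_of[other]] += score  # sum(r.score), grouped per product
    # ORDER BY recommenderScore DESC LIMIT $k
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

recs = recommend(purchased, also_like, variant_of)
print(recs)
```

Because scores are summed, a product whose variants are recommended from several purchases (P1 here) rises to the top.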

In [None]:
from langchain.graphs import Neo4jGraph

kg = Neo4jGraph(url=HOST, username=USERNAME, password=PASSWORD, database=DATABASE)

res = kg.query('''
    MATCH(:Customer {id:$customerId})-[:PURCHASED]->(:Article)
    -[r:CUSTOMERS_ALSO_LIKE]->(:Article)-[:VARIANT_OF]->(product)
    RETURN product.id AS productCode,
        product.name AS prodName,
        product.type AS productType,
        product.text AS document,
        sum(r.score) AS recommenderScore
    ORDER BY recommenderScore DESC LIMIT $k
    ''', params={'customerId': CUSTOMER_ID, 'k':15})

#visualize as dataframe. result is list of dict
pretty_print(pd.DataFrame(res))

## LLM For Generating Grounded Content

Let's use an LLM to automatically generate content for targeted marketing campaigns grounded with our knowledge graph using the above tools.
Here is a quick example for generating promotional messages, but you can create all sorts of content with this!

For our first message, let's consider a scenario where a user recently searched for products, but perhaps didn't commit to a purchase yet. We now want to send a message to promote relevant products.

In [None]:
# Import relevant libraries
from langchain.prompts import SystemMessagePromptTemplate, HumanMessagePromptTemplate, ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain.schema import StrOutputParser

#Instantiate LLM
llm = ChatOpenAI(temperature=0, model_name=LLM, streaming=True)

### Create Knowledge Graph Stores for Retrieval

To ground our content generation, we need to define retrievers to pull information from our knowledge graph.  Let's make three stores:
1. Personalized Search Retriever (`kg_personalized_search`): Based on recent customer searches and purchase history, pull relevant products.
2. Recommendations retriever (`kg_recommendations_app`): Based on our Graph ML, what else can we recommend to them to pair with the relevant products?
3. Customer information (`kg_customer_info`): Information we have on the customer; in this case, only their name.


In [None]:
# This will be a function so we can change per customer id
# We will use a mock URL for our sources in the metadata
def kg_personalized_search_gen(customer_id):
    return Neo4jVector.from_existing_index(
        embedding=embedding_model,
        url=HOST,
        username=USERNAME,
        password=PASSWORD,
        database=DATABASE,
        index_name='product_embedding',
        retrieval_query=f"""
        WITH node AS product, score AS searchScore

        OPTIONAL MATCH(product)<-[:VARIANT_OF]-(:Article)<-[:PURCHASED]-(:Customer)
        -[:PURCHASED]->(a:Article)<-[:PURCHASED]-(:Customer {{id: '{customer_id}'}})
        WITH count(a) AS purchaseScore, product, searchScore
        RETURN product.text + '\nurl: ' + 'https://representative-domain/product/' + product.id  AS text,
            (1.0+purchaseScore)*searchScore AS score,
            {{source: 'https://representative-domain/product/' + product.id}} AS metadata
        ORDER BY purchaseScore DESC, searchScore DESC LIMIT 5

    """
    )

# Use the same personalized recommendations as above
def kg_recommendations_app(customer_id, k=30):
    res = kg.query("""
    MATCH(:Customer {id:$customerId})-[:PURCHASED]->(:Article)
    -[r:CUSTOMERS_ALSO_LIKE]->(:Article)-[:VARIANT_OF]->(product)
    RETURN product.text + '\nurl: ' + 'https://representative-domain/product/' + product.id  AS text,
        sum(r.score) AS recommenderScore
    ORDER BY recommenderScore DESC LIMIT $k
    """, params={'customerId': customer_id, 'k':k})

    return "\n\n".join([d['text'] for d in res])

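# get the customer's name as a plain string for the prompt
def kg_customer_info(customer_id):
    res = kg.query("""
    MATCH(c:Customer {id:$customerId})
    RETURN c.name as customername
    """, params={'customerId': customer_id})
    # kg.query returns a list of dicts; unwrap the name (empty if the customer is unknown)
    return res[0]['customername'] if res else ''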
    

### Prompt Engineering

Now, let's define our prompts. We will combine two:
1. A system prompt which, in this case, tells the LLM how to generate the message
2. A human prompt that just wraps the customer search(es)/interest(s)

This will allow us to pass the customer interest(s) to the retriever but then also to the LLM for additional context when drafting the message.
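LangChain's `from_template` treats `{name}` as a template variable by default (f-string style), so every placeholder in the system and human templates has to be supplied when the chain runs. A quick way to list them uses Python's `string.Formatter` on abbreviated copies of the templates (the shortened wording is ours; the placeholder names match the cell below):

```python
from string import Formatter

def template_variables(template: str) -> set:
    """Return the names of {placeholders} in an f-string style template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}

# Abbreviated versions of the templates; only the placeholders matter here
system_template = (
    "Write an email to {customerName} for the season: {timeOfYear}.\n"
    "# Relevant Products:\n{searchProds}\n"
    "# Customer May Also Be Interested In:\n{recProds}\n"
)
user_template = "{searchPrompt}"

required = template_variables(system_template) | template_variables(user_template)
print(sorted(required))
# ['customerName', 'recProds', 'searchProds', 'searchPrompt', 'timeOfYear']
```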


In [None]:
general_system_template = '''
You are a personal assistant named Sam for a fashion, home, and beauty company called HRM.
write an email to {customerName}, one of your customers, to promote and summarize products relevant for them given the current season / time of year: {timeOfYear}.
Please only mention the products listed below. Do not come up with or add any new products to the list.
Each product comes with an https `url` field. Make sure to provide that https url with descriptive name text in markdown for each product.

---
# Relevant Products:
{searchProds}

# Customer May Also Be Interested In the following
 (pick items from here that pair with the above products well for the current season / time of year: {timeOfYear}.
 prioritize those higher in the list if possible):
{recProds}
---
'''
general_user_template = "{searchPrompt}"
messages = [
    SystemMessagePromptTemplate.from_template(general_system_template),
    HumanMessagePromptTemplate.from_template(general_user_template),
]
prompt = ChatPromptTemplate.from_messages(messages)

### Create a Chain

Now let's put a chain together that will leverage the retrievers, prompts, and LLM model. This is where Langchain shines, putting RAG together in a simple way.

In addition to the personalized search and recommendations context, we will allow for the `timeOfYear` so the LLM can tailor the language accordingly.
You can potentially add other creative parameters here to help the LLM write relevant messages.
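In LCEL, the dictionary at the start of the chain becomes a parallel step: each key is computed from the same input and the results are passed to the prompt. The sketch below mimics that data flow in plain Python, with hypothetical stub functions in place of the Neo4j retrievers; it runs the steps sequentially, whereas LangChain may run them concurrently.

```python
from typing import Callable, Dict

# Hypothetical stubs standing in for the Neo4j-backed retrievers
def search_products(search_prompt: str) -> str:
    return f"[products matching '{search_prompt}']"

def recommend_products(customer_id: str) -> str:
    return f"[recommendations for {customer_id}]"

def customer_name(customer_id: str) -> str:
    return "Alex"  # made-up name

def build_inputs(customer_id: str, x: Dict[str, str]) -> Dict[str, str]:
    """Compute every prompt variable from the same chain input `x`."""
    mapping: Dict[str, Callable[[Dict[str, str]], str]] = {
        "searchProds": lambda inp: search_products(inp["searchPrompt"]),
        "recProds": lambda inp: recommend_products(customer_id),
        "customerName": lambda inp: customer_name(customer_id),
        "timeOfYear": lambda inp: inp["timeOfYear"],
        "searchPrompt": lambda inp: inp["searchPrompt"],
    }
    return {key: fn(x) for key, fn in mapping.items()}

inputs = build_inputs("c1", {"searchPrompt": "Oversized Sweaters", "timeOfYear": "Feb, 2024"})
template = "To {customerName} ({timeOfYear}), re '{searchPrompt}': {searchProds} + {recProds}"
print(template.format(**inputs))
```

As in `chain_gen`, the customer id is captured when the chain is built, while the search prompt and time of year arrive with each `invoke` call.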


In [None]:
from langchain.schema.runnable import RunnableLambda

# helper function
def format_docs(docs):
    return "\n\n".join([d.page_content for d in docs])

# LLM chain
def chain_gen(customer_id):
    return ({'searchProds': (lambda x:x['searchPrompt']) | kg_personalized_search_gen(customer_id).as_retriever(search_kwargs={"k": 100}) | format_docs,
             'recProds': (lambda x:customer_id) |  RunnableLambda(kg_recommendations_app),
             'customerName': (lambda x:customer_id) |  RunnableLambda(kg_customer_info),
             'timeOfYear': lambda x:x['timeOfYear'],
             "searchPrompt":  lambda x:x['searchPrompt']}
            | prompt
            | llm
            | StrOutputParser())

### Example Runs

In [None]:
chain = chain_gen(CUSTOMER_ID)

In [None]:
print(chain.invoke({'searchPrompt':search_prompt,'timeOfYear':'Feb, 2024'}))

#### Inspecting the Prompt Sent to the LLM
In the above run, the LLM should only be using results from our Neo4j database to populate recommendations. Run the cell below to see the final prompt that was sent to the LLM.

In [None]:
def format_final_prompt(x):
   return f'''=== Prompt to send to LLM ===
   {x.to_string()}
   === End Prompt ===
   '''

def chain_print_prompt(customer_id):
    return ({'searchProds': (lambda x:x['searchPrompt']) | kg_personalized_search_gen(customer_id).as_retriever(search_kwargs={"k": 100}) | format_docs,
             'recProds': (lambda x:customer_id) |  RunnableLambda(kg_recommendations_app),
             'customerName': (lambda x:customer_id) |  RunnableLambda(kg_customer_info),
             'timeOfYear': lambda x:x['timeOfYear'],
             "searchPrompt":  lambda x:x['searchPrompt']}
            | prompt
            | format_final_prompt
            | StrOutputParser())

print( chain_print_prompt(CUSTOMER_ID)\
      .invoke({'searchPrompt':search_prompt,'timeOfYear':'Feb, 2024'}))

Feel free to experiment and try more!

In [None]:
print(chain.invoke({'searchPrompt':"western boots", 'timeOfYear':'May, 2024'}))

### Demo App
Now let's use the above tools to create a demo app with Gradio. We need a couple more functions, but otherwise it is easy to fire up from a notebook!
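The next cell caches one chain per customer id in a plain dictionary. An alternative, sketched here with a hypothetical `build_chain` stand-in for `chain_gen`, is `functools.lru_cache`, which gives the same reuse but also caps how many chains are kept in memory:

```python
from functools import lru_cache

build_count = 0

def build_chain(customer_id: str) -> str:
    """Hypothetical stand-in for chain_gen; counts how often a chain is built."""
    global build_count
    build_count += 1
    return f"chain-for-{customer_id}"

@lru_cache(maxsize=128)  # keep at most 128 chains, evicting the least recently used
def get_chain_cached(customer_id: str) -> str:
    return build_chain(customer_id)

get_chain_cached("c1")
get_chain_cached("c1")  # served from the cache, no rebuild
get_chain_cached("c2")
print(build_count)  # 2
```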

In [None]:
# Create a means to generate and cache chains...so we can quickly try different customer ids
personalized_search_chain_cache = dict()
def get_chain(customer_id):
    if customer_id in personalized_search_chain_cache:
        return personalized_search_chain_cache[customer_id]
    chain = chain_gen(customer_id)
    personalized_search_chain_cache[customer_id] = chain
    return chain

# create multiple demo examples to try
examples = [
    [
        CUSTOMER_ID,
        'Feb, 2024',
        'Oversized Sweaters'
    ],
    [
        '819f4eab1fd76b932fd403ae9f427de8eb9c5b64411d763bb26b5c8c3c30f16f',
        'May, 2024',
        'Oversized Sweaters'
    ],
    [
        '44b0898ecce6cc1268dfdb0f91e053db014b973f67e34ed8ae28211410910693',
        'Nov, 2024',
        'Cowboy boots'
    ],
    [
        '819f4eab1fd76b932fd403ae9f427de8eb9c5b64411d763bb26b5c8c3c30f16f',
        'Feb, 2024',
        'denim jeans'
    ],
]

In [None]:
import gradio as gr

def message_generator(*x):
    chain = get_chain(x[0])
    return chain.invoke({'searchPrompt':x[2], 'timeOfYear': x[1]})

customer_id = gr.Textbox(value=CUSTOMER_ID, label="Customer ID")
time_of_year = gr.Textbox(value="Feb, 2024", label="Time Of Year")
search_prompt_txt = gr.Textbox(value='Oversized Sweaters', label="Customer Interests(s)")
message_result = gr.Markdown( label="Message")

demo = gr.Interface(fn=message_generator,
                    inputs=[customer_id, time_of_year, search_prompt_txt],
                    outputs=message_result,
                    examples=examples,
                    title="🪄 Message Generator 🥳")
demo.launch(share=True, debug=True)

## That's a Wrap!