# StackOverflow Q&A

Based on the genAI stack [application example](https://github.com/docker/genai-stack/blob/main/bot.py).

In [68]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate

## LLM-only chain

In [69]:
llm = ChatOpenAI(model_name='gpt-4', temperature=0)

In [70]:
template = """
    You are a helpful assistant that helps a support agent with answering programming questions.
    If you don't know the answer, just say that you don't know, you must not make up an answer.
 """

In [71]:
system_message_prompt = SystemMessagePromptTemplate.from_template(template=template)

In [72]:
human_template = '{question}'

In [73]:
human_message_prompt = HumanMessagePromptTemplate.from_template(template=human_template)

In [74]:
chat_prompt = ChatPromptTemplate.from_messages(
    [system_message_prompt, human_message_prompt]
)

In [75]:
question = "How to get property names with count of non-null value for a given node label in Neo4j?"

In [76]:
final_prompt = chat_prompt.format_prompt(
    question=question
).to_messages()

completion = llm(final_prompt)
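
For intuition, the formatted prompt is just a system message followed by a human message with `{question}` filled in. The sketch below reproduces that shape without LangChain; plain `(role, content)` tuples stand in for the message objects, and `format_chat_prompt` is a hypothetical helper, not a LangChain function.

```python
# Stand-ins for the notebook's templates; tuples replace LangChain message objects.
SYSTEM_TEMPLATE = """
    You are a helpful assistant that helps a support agent with answering programming questions.
    If you don't know the answer, just say that you don't know, you must not make up an answer.
"""
HUMAN_TEMPLATE = "{question}"


def format_chat_prompt(question):
    """Hypothetical helper: return (role, content) pairs in the order the model sees them."""
    return [
        ("system", SYSTEM_TEMPLATE),
        ("human", HUMAN_TEMPLATE.format(question=question)),
    ]


messages = format_chat_prompt("How do I count non-null properties per label in Neo4j?")
for role, content in messages:
    print(f"[{role}] {content.strip()}")
```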


In [77]:
print(completion.content)

To get property names with count of non-null values for a given node label in Neo4j, you can use the APOC procedure `apoc.meta.nodeTypeProperties()`. This procedure returns the metadata of the node type properties including the count of non-null values.

Here is an example:

```cypher
CALL apoc.meta.nodeTypeProperties({labels:['YourLabel']})
YIELD propertyName, propertyTypes, mandatory, count
RETURN propertyName, count
```

In this query, replace `'YourLabel'` with the label of your node. This will return the property names and the count of non-null values for each property of nodes with the given label.

Please note that you need to have the APOC plugin installed in your Neo4j instance to use this procedure.


## RAG chain

In [78]:
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from langchain.chains.qa_with_sources.retrieval import RetrievalQAWithSourcesChain
from langchain.vectorstores.neo4j_vector import Neo4jVector


In [79]:
general_system_template = """ 
    Use the following pieces of context to answer the question at the end.
    The context contains question-answer pairs and their links from Stackoverflow.
    You should prefer information from accepted or more upvoted answers.
    Make sure to rely on information from the answers and not on questions to provide accurate responses.
    When you find a particular answer in the context useful, make sure to cite it in the answer using the link.
    If you don't know the answer, just say that you don't know, don't try to make up an answer.
    ----
    {summaries}
    ----
    Each answer you generate should contain a section at the end of links to 
    Stackoverflow questions and answers you found useful, which are described under Source value.
    You can only use links to StackOverflow questions that are present in the context and always
    add links to the end of the answer in the style of citations.
    Generate concise answers with a references section of links to 
    relevant StackOverflow questions only at the end of the answer.
    """

In [80]:
general_user_template = "Question:```{question}```"

In [81]:
messages = [
        SystemMessagePromptTemplate.from_template(general_system_template),
        HumanMessagePromptTemplate.from_template(general_user_template),
    ]

In [82]:
qa_prompt = ChatPromptTemplate.from_messages(messages)

In [83]:
qa_chain = load_qa_with_sources_chain(llm, chain_type='stuff', prompt=qa_prompt)
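
The `stuff` chain type puts every retrieved document into the single `{summaries}` slot of the system prompt. The sketch below shows roughly what that string could look like. The `Content: ... / Source: ...` layout and the blank-line separator are assumptions about LangChain's default document prompt, and `Doc` and `stuff_documents` are hypothetical stand-ins, not library objects. The example links are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def stuff_documents(docs, separator="\n\n"):
    """Join all documents into one string (assumed layout, see the note above)."""
    return separator.join(
        f"Content: {doc.page_content}\nSource: {doc.metadata['source']}"
        for doc in docs
    )


docs = [
    Doc("##Question: First title\nFirst body", {"source": "https://example.com/q/1"}),
    Doc("##Question: Second title\nSecond body", {"source": "https://example.com/q/2"}),
]
summaries = stuff_documents(docs)
print(summaries)
```

This is also why the system prompt refers to links "described under Source value": the model only sees the link as part of the stuffed text.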

Use an existing Neo4j database

In [84]:
neo4j_uri = 'neo4j://127.0.0.1:7687'
username = 'neo4j'
password = 'password'
database = 'neo4j'
index_name = 'stackoverflow'
text_node_property = 'body'

In [85]:
retrieval_query = """
    WITH node AS question, score AS similarity
    CALL  { with question
        MATCH (question)<-[:ANSWERS]-(answer)
        WITH answer
        ORDER BY answer.is_accepted DESC, answer.score DESC
        WITH collect(answer)[..2] as answers
        RETURN reduce(str='', answer IN answers | str + 
                '\n### Answer (Accepted: '+ answer.is_accepted +
                ' Score: ' + answer.score+ '): '+  answer.body + '\n') as answerTexts
    } 
    RETURN '##Question: ' + question.title + '\n' + question.body + '\n' 
        + answerTexts AS text, similarity as score, {source: question.link} AS metadata
    ORDER BY similarity ASC // so that best answers are the last
"""

Use a Sentence Transformers embedding model

In [86]:
embeddings = HuggingFaceEmbeddings(
    model_name='all-MiniLM-L6-v2',
    cache_folder='./embedding_model/'
)
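
Conceptually, the vector index ranks stored question embeddings by their similarity to the query embedding and returns the best `k`. The toy sketch below assumes cosine similarity (the metric actually in use depends on how the `stackoverflow` index was created) and uses made-up 3-dimensional vectors in place of real embeddings; `cosine_similarity` and `top_k` are hypothetical helpers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, indexed, k=2):
    """Return the k (title, score) pairs most similar to the query, best first."""
    scored = [(title, cosine_similarity(query, vec)) for title, vec in indexed.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up vectors; real embeddings have far more dimensions.
toy_index = {
    "count non-null properties": [0.9, 0.1, 0.0],
    "list node labels": [0.6, 0.6, 0.1],
    "install APOC plugin": [0.0, 0.2, 0.9],
}
query_vector = [1.0, 0.0, 0.0]
hits = top_k(query_vector, toy_index, k=2)
for title, score in hits:
    print(f"{score:.3f}  {title}")
```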

Vector + knowledge graph response

In [87]:
kg = Neo4jVector.from_existing_index(
    embedding=embeddings,
    url=neo4j_uri,
    username=username,
    password=password,
    database=database,
    index_name=index_name,
    text_node_property=text_node_property,
    retrieval_query=retrieval_query
)
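
The `retrieval_query` passed here runs for each question the vector search returns: it sorts the question's answers (accepted first, then by score), keeps the top two, and concatenates everything into one text. A plain-Python sketch of the same logic follows, with dictionaries standing in for graph nodes. `build_context` and the sample data are hypothetical, and the boolean formatting only approximates what Cypher would produce.

```python
def build_context(question, answers):
    """Mimic the Cypher retrieval query for one question node."""
    # ORDER BY answer.is_accepted DESC, answer.score DESC
    ranked = sorted(answers, key=lambda a: (a["is_accepted"], a["score"]), reverse=True)
    # collect(answer)[..2] keeps at most two answers
    top_two = ranked[:2]
    answer_texts = "".join(
        "\n### Answer (Accepted: " + str(a["is_accepted"]).lower()
        + " Score: " + str(a["score"]) + "): " + a["body"] + "\n"
        for a in top_two
    )
    return "##Question: " + question["title"] + "\n" + question["body"] + "\n" + answer_texts


sample_question = {"title": "Count non-null properties", "body": "How can I do this?"}
sample_answers = [
    {"is_accepted": False, "score": 10, "body": "popular answer"},
    {"is_accepted": True, "score": 3, "body": "accepted answer"},
    {"is_accepted": False, "score": 1, "body": "low-score answer"},
]
context = build_context(sample_question, sample_answers)
print(context)
```

The accepted answer comes first even though another answer has a higher score, and the third answer is dropped.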

In [88]:
kg_qa = RetrievalQAWithSourcesChain(
    combine_documents_chain=qa_chain,
    retriever=kg.as_retriever(search_kwargs={"k": 2}),
    reduce_k_below_max_tokens=False,
    max_tokens_limit=3375,
)
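
With `reduce_k_below_max_tokens=False`, `max_tokens_limit` presumably has no effect. When the flag is enabled, the chain is expected to drop documents from the end of the retrieved list until the total token count fits the limit. The sketch below illustrates that trimming under these assumptions; it uses a whitespace word count as a stand-in for the LLM tokenizer, and `reduce_below_limit` is a hypothetical helper rather than the library's method.

```python
def count_tokens(text):
    """Whitespace word count; a crude stand-in for the model's tokenizer."""
    return len(text.split())


def reduce_below_limit(docs, max_tokens_limit, enabled):
    """Drop trailing documents until the total token count fits the limit."""
    if not enabled:
        return docs
    num_docs = len(docs)
    tokens = [count_tokens(doc) for doc in docs]
    total = sum(tokens)
    while total > max_tokens_limit and num_docs > 0:
        num_docs -= 1
        total -= tokens[num_docs]
    return docs[:num_docs]


toy_docs = ["a b c", "d e f g", "h i"]  # 3 + 4 + 2 = 9 tokens
trimmed = reduce_below_limit(toy_docs, max_tokens_limit=7, enabled=True)
untouched = reduce_below_limit(toy_docs, max_tokens_limit=7, enabled=False)
print(len(trimmed), len(untouched))
```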

In [89]:
result = kg_qa(
    {
        'question': question,  # same question as in the LLM-only chain
        'chat_history': []
    }
)

In [90]:
print(result['answer'])

You can get the count of non-empty properties for a given node label in Neo4j by using the `UNWIND` function to iterate over the keys of the node properties. Then, you can use a `CASE` statement to check if the property value is not empty. If it's not empty, assign a value of 1, otherwise assign a value of 0. Finally, you can sum up these values to get the count of non-empty properties. Here is an example query:

```sql
MATCH (p:Person)
UNWIND keys(p) AS key
WITH key, CASE WHEN p[key] <> "" THEN 1 ELSE 0 END AS isNonEmpty
RETURN key, sum(isNonEmpty) AS cnt
ORDER BY key
```

This query will return the count of non-empty properties for each key in the `Person` node. Note that Neo4j treats null properties as non-existent, so if none of the nodes returned by the `MATCH` statement has the property, those properties can't be inferred[^1^].

[^1^]: (


TODO: the response does not include the Stack Overflow link; the footnote is cut off after `[^1^]: (`.
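
One possible explanation, offered as an assumption rather than a confirmed diagnosis: some LangChain versions post-process the model output by splitting it at the first `SOURCES:` or `Source:` marker and moving the remainder into `result['sources']`. That would match the answer ending abruptly at `[^1^]: (`. The sketch below mimics such a split with a hypothetical `split_sources` helper and a made-up link.

```python
import re


def split_sources(raw_answer):
    """Split model output into (answer, sources) at the first 'SOURCE(S):' marker."""
    if re.search(r"SOURCES?:", raw_answer, re.IGNORECASE):
        answer, sources = re.split(
            r"SOURCES?:|QUESTION:\s", raw_answer, flags=re.IGNORECASE
        )[:2]
        # Only the first line after the marker is kept as the sources string.
        sources = sources.split("\n")[0].strip()
    else:
        answer, sources = raw_answer, ""
    return answer, sources


# Made-up model output; the URL is a placeholder, not a real question link.
raw = "Use UNWIND keys(p)[^1^].\n\n[^1^]: (Source: https://example.com/questions/1)"
answer, sources = split_sources(raw)
print(repr(answer))
print(repr(sources))
```

If this is what happens, printing `result['sources']` in the notebook should show the missing link.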