# Using LLMs and RAG with HRA KG

This notebook shows how to set up a basic vector database populated from the Human Reference Atlas Knowledge Graph (HRA KG) and use it to augment prompts sent to an LLM.

## Variables to tweak

Below are some variables you can tweak to customize this notebook.

In [1]:
HRA_KEYWORDS="kidney renal"
LLM_MODEL="llama3.2:3b" # "llama3.2:3b" "phi4:14b"
SENTENCE_TRANSFORMER="all-mpnet-base-v2"
SPARQL_ENDPOINT="https://lod.humanatlas.io/sparql"
SHOW_DEBUG_INFO=True
DEFAULT_SYSTEM_PROMPT="Answer in one sentence like a good friend."

## Install prerequisites

This notebook requires `ollama` to be installed (<https://ollama.com/download>) and running locally, plus a few Python packages. It could be reconfigured to work with other LLM services.
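Before running the install cell, it can help to confirm that Ollama is actually available. The following is a minimal sketch, not part of the original notebook: `ollama_status` is a hypothetical helper, and it assumes the Ollama server listens on its default local address `http://localhost:11434`.

```python
import shutil
import urllib.request

# Assumption: Ollama's default local address; change this if your setup differs
OLLAMA_URL = "http://localhost:11434"

def ollama_status(url=OLLAMA_URL, timeout=2):
    """Return (cli_found, server_reachable) without raising."""
    # Is the `ollama` executable on the PATH?
    cli_found = shutil.which("ollama") is not None
    try:
        # Any successful HTTP response means a server is answering at this address
        with urllib.request.urlopen(url, timeout=timeout) as response:
            server_reachable = response.status == 200
    except OSError:  # covers connection refused, timeouts, and urllib's URLError
        server_reachable = False
    return cli_found, server_reachable

print(ollama_status())
```

If the second value is `False`, start Ollama before running the cells that pull or prompt a model.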

In [2]:
%pip install llm requests
!llm install llm-ollama llm-sentence-transformers
!llm sentence-transformers register {SENTENCE_TRANSFORMER}
!ollama pull {LLM_MODEL}

Note: you may need to restart the kernel to use updated packages.
Error: Model all-mpnet-base-v2 is already registered
pulling manifest 
pulling dde5aa3fc5ff... 100% ▕████████████████▏ 2.0 GB                         
pulling 966de95ca8a6... 100% ▕████████████████▏ 1.4 KB                         
pulling fcc5a6bec9da... 100% ▕████████████████▏ 7.7 KB                         
pulling a70ff7e570d9... 100% ▕████████████████▏ 6.0 KB                         
pulling 56bb8bd477a5... 100% ▕████████████████▏   96 B                         
pulling 34bb5ab01051... 100% ▕████████████████▏  561 B                         
verifying sha256 digest 
writing manifest 
success


## Populate the Vector DB from HRA KG

In [3]:
# Reusable functions

import requests
import csv
from io import StringIO

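def sparql_select(query, endpoint=SPARQL_ENDPOINT):
    """Run a SPARQL SELECT query and return the rows as a list of dicts."""
    # Request CSV so the result can be parsed with the standard csv module
    response = requests.post(endpoint, data={"query": query}, headers={"Accept": "text/csv"})
    response.raise_for_status()  # Fail early on HTTP errors instead of parsing an error page
    with StringIO(response.text) as csv_text:
        return list(csv.DictReader(csv_text))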
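To show what `sparql_select` returns without contacting the endpoint, here is a small sketch of the CSV parsing step on its own. The sample text is hand-written for illustration (it reuses two terms that appear in this notebook's output), and `parse_sparql_csv` is a hypothetical name. Each row becomes a dict keyed by the header line, and every value is a string.

```python
import csv
from io import StringIO

# Hand-written sample shaped like a CSV response (illustrative, not fetched from the endpoint)
sample_csv = (
    "term,name,rank\n"
    "http://purl.obolibrary.org/obo/UBERON_0002113,kidney,1\n"
    "http://purl.obolibrary.org/obo/UBERON_0018115,left renal pelvis,10\n"
)

def parse_sparql_csv(text):
    """Parse CSV text into a list of dicts, one per result row."""
    with StringIO(text) as csv_text:
        return list(csv.DictReader(csv_text))

rows = parse_sparql_csv(sample_csv)
print(rows[0]["name"], type(rows[0]["rank"]).__name__)
```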

In [4]:
# Run SPARQL query in hra-rag-content.rq to get the text content to use in the vector DB

with open('hra-rag-content.rq', encoding='utf8') as query_file:
    query = query_file.read()
query = query.replace('kidney renal', HRA_KEYWORDS) # Replace the keywords used in the default query
results = sparql_select(query)
print(len(results), "terms")
results[:3]

101 terms


[{'term': 'http://purl.obolibrary.org/obo/UBERON_0002113',
  'name': 'kidney',
  'aka': '',
  'description': 'A paired organ of the urinary tract that produces urine and maintains bodily fluid homeostasis, blood pressure, pH levels, red blood cell production and skeleton mineralization.',
  'relevance': '7.071067811865475E-1',
  'rank': '1'},
 {'term': 'http://purl.obolibrary.org/obo/UBERON_0018115',
  'name': 'left renal pelvis',
  'aka': 'pelvis of left ureter; renal pelvis of left kidney',
  'description': 'A renal pelvis that is part of a left ureter.',
  'relevance': '6.18718433538229E-1',
  'rank': '10'},
 {'term': 'http://purl.obolibrary.org/obo/UBERON_8410073',
  'name': 'medullary region of kidney',
  'aka': 'kidney medullary region',
  'description': 'A part of the kidney that comprises kidney pyramids and renal columns.',
  'relevance': '5.303300858899106E-1',
  'rank': '37'}]
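Because the results are parsed from CSV, `relevance` and `rank` arrive as strings, so sorting on them directly would compare text rather than numbers. A minimal sketch of converting them, using the three rows shown above trimmed to the relevant fields (`with_numeric_scores` is a hypothetical helper, not part of the notebook):

```python
# The three rows from the output above, trimmed to the fields used here
sample_results = [
    {"name": "kidney", "relevance": "7.071067811865475E-1", "rank": "1"},
    {"name": "left renal pelvis", "relevance": "6.18718433538229E-1", "rank": "10"},
    {"name": "medullary region of kidney", "relevance": "5.303300858899106E-1", "rank": "37"},
]

def with_numeric_scores(rows):
    """Return copies of the rows with relevance as float and rank as int."""
    return [
        {**row, "relevance": float(row["relevance"]), "rank": int(row["rank"])}
        for row in rows
    ]

# Sort numerically by rank so that 10 comes after 2, not before it
typed = sorted(with_numeric_scores(sample_results), key=lambda row: row["rank"])
for row in typed:
    print(row["rank"], round(row["relevance"], 3), row["name"])
```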

In [5]:
# Initialize collection to store HRA KG entries

import llm

embedding_model = llm.get_embedding_model(f"sentence-transformers/{SENTENCE_TRANSFORMER}")
collection = llm.Collection("entries", model=embedding_model)
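def entry_text(meta):
    # Text that gets embedded: IRI, quoted name, optional synonyms, then the description
    aka = f" also known as {meta['aka']}" if meta['aka'] else ''
    return f"{meta['term']} \"{meta['name']}\"{aka} is {meta['description']}"

collection.embed_multi_with_metadata(
    ((meta['term'], entry_text(meta), meta) for meta in results),
    store=True
)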

In [6]:
# Test similarity search

for entry in collection.similar("kidney", number=10):
    print(entry.id, entry.score, entry.content, entry.metadata)

http://purl.obolibrary.org/obo/UBERON_0002113 0.6173296915413253 http://purl.obolibrary.org/obo/UBERON_0002113 "kidney" is A paired organ of the urinary tract that produces urine and maintains bodily fluid homeostasis, blood pressure, pH levels, red blood cell production and skeleton mineralization. {'term': 'http://purl.obolibrary.org/obo/UBERON_0002113', 'name': 'kidney', 'aka': '', 'description': 'A paired organ of the urinary tract that produces urine and maintains bodily fluid homeostasis, blood pressure, pH levels, red blood cell production and skeleton mineralization.', 'relevance': '7.071067811865475E-1', 'rank': '1'}
http://purl.obolibrary.org/obo/UBERON_0004539 0.5577272825245365 http://purl.obolibrary.org/obo/UBERON_0004539 "right kidney" is A kidney that is part of a right side of organism [Automatically generated definition]. {'term': 'http://purl.obolibrary.org/obo/UBERON_0004539', 'name': 'right kidney', 'aka': '', 'description': 'A kidney that is part of a right side of
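To give some intuition for the scores printed above, here is a toy sketch of similarity search. It assumes the scores are cosine similarities between embedding vectors, which is my understanding of how `llm` collections rank entries. The three-dimensional vectors are made up for illustration (real sentence-transformer embeddings have hundreds of dimensions), and `top_k` is a hypothetical helper.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, entries, k=2):
    """Score every entry against the query and return the k best (id, score) pairs."""
    scored = [(entry_id, cosine_similarity(query_vec, vec)) for entry_id, vec in entries.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Made-up vectors standing in for real embeddings
toy_entries = {
    "kidney": [0.9, 0.1, 0.0],
    "right kidney": [0.8, 0.3, 0.1],
    "heart": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.0, 0.0]
print(top_k(query_vec, toy_entries))
```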

## Setup LLM

In [7]:
# Initialize LLM for prompting

import llm
model = llm.get_model(LLM_MODEL)

In [8]:
# Test prompt

response = model.prompt("What is the Human Reference Atlas (HRA)?", system="Answer in one sentence like a five year old.")
print(response.text())

The Human Reference Atlas is a special map that shows what a normal body looks like so doctors can compare to people who are sick or hurt!


## Setup RAG Prompt

In [9]:
from IPython.display import Markdown, display

def rag_prompt(prompt, system = DEFAULT_SYSTEM_PROMPT, debug = SHOW_DEBUG_INFO):
    terms = [ f"* {entry.content}\n" for entry in collection.similar(prompt, number=10) ]
    if len(terms) > 0:
        system = f"{system}\nContext:\n{''.join(terms)}"
    response = model.prompt(prompt, system=system, stream = False)
    if debug:
        print("Prompt:", prompt)
        print("System Prompt:", system)
        print("Usage:", response.usage())
        # print("\nResponse:\n")
        # print(response.text())
        display(Markdown("\n**Response:**\n\n" + response.text()))
    return response
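The core of `rag_prompt` is how the retrieved entries are appended to the system prompt. The sketch below isolates that step as a pure function so it can be inspected without a model or a collection. `build_system_prompt` is a hypothetical name, and the two context strings are copied from entries that appear in this notebook's output.

```python
def build_system_prompt(system, contents):
    """Append retrieved entry texts to the system prompt as a bulleted context list."""
    terms = [f"* {content}\n" for content in contents]
    if terms:
        return f"{system}\nContext:\n{''.join(terms)}"
    return system  # Nothing retrieved: leave the system prompt unchanged

retrieved = [
    'http://purl.obolibrary.org/obo/CL_0002681 "kidney cortical cell" is A cell that is part of a cortex of kidney.',
    'http://purl.obolibrary.org/obo/CL_1000505 "kidney pelvis cell" is A cell that is part of a renal pelvis.',
]
print(build_system_prompt("Answer in one sentence like a good friend.", retrieved))
```

Keeping the context in the system prompt, as the notebook does, leaves the user prompt untouched, so the same question is used for both the similarity search and the model call.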

## Prompts

### Prompt: How many calesces are in the kidney?

In [10]:
response = rag_prompt("How many calesces are in the kidney?")

Prompt: How many calesces are in the kidney?
System Prompt: Answer in one sentence like a good friend.
Context:
* http://purl.obolibrary.org/obo/UBERON_0006517 "kidney calyx" also known as calices renales; renal calix; renal calyx is Recesses of the kidney pelvis which divides into two wide, cup-shaped major renal calices, with each major calix subdivided into 7 to 14 minor calices. Urine empties into a minor calix from collecting tubules, then passes through the major calix, renal pelvis, and ureter to enter the urinary bladder. (From Moore, Clinically Oriented Anatomy, 3d ed, p211).
* http://purl.obolibrary.org/obo/CL_0002681 "kidney cortical cell" is A cell that is part of a cortex of kidney.
* http://purl.obolibrary.org/obo/UBERON_0001227 "minor calyx" also known as calices renales minores; minor calix is Portion of the urinary collecting system within the kidney that drains one renal papilla.
* http://purl.obolibrary.org/obo/CL_1000505 "kidney pelvis cell" is A cell that is part o


**Response:**

There's a bit of complexity here. There are two major calices that are part of the kidney, each subdivided into 7 to 14 minor calices (also known as renal calices minores), which ultimately drain urine from individual renal papillae. So, in total, there are approximately 8-28 minor calices within each kidney!

### Prompt: What is the UBERON term for 'renal calyx'?

In [11]:
response = rag_prompt("What is the UBERON term for 'renal calyx'?")

Prompt: What is the UBERON term for 'renal calyx'?
System Prompt: Answer in one sentence like a good friend.
Context:
* http://purl.obolibrary.org/obo/UBERON_0001227 "minor calyx" also known as calices renales minores; minor calix is Portion of the urinary collecting system within the kidney that drains one renal papilla.
* http://purl.obolibrary.org/obo/UBERON_0006517 "kidney calyx" also known as calices renales; renal calix; renal calyx is Recesses of the kidney pelvis which divides into two wide, cup-shaped major renal calices, with each major calix subdivided into 7 to 14 minor calices. Urine empties into a minor calix from collecting tubules, then passes through the major calix, renal pelvis, and ureter to enter the urinary bladder. (From Moore, Clinically Oriented Anatomy, 3d ed, p211).
* http://purl.obolibrary.org/obo/UBERON_0001228 "renal papilla" also known as kidney papilla is Tip of renal pyramid projecting into a minor calyx.
* http://purl.obolibrary.org/obo/UBERON_0001224 


**Response:**

The UBERON term for "renal calyx" (also known as calices renales) is UBERON_0006517.

### Prompt: What kind of cells occur in the 'renal calyx'?

In [12]:
system = "Include references to Cell Ontology (http://purl.obolibrary.org/obo/CL_xxx) and UBERON (http://purl.obolibrary.org/obo/UBERON_xxx) where possible."
response = rag_prompt("What kind of cells occur in the 'renal calyx'?", system)

Prompt: What kind of cells occur in the 'renal calyx'?
System Prompt: Include references to Cell Ontology (http://purl.obolibrary.org/obo/CL_xxx) and UBERON (http://purl.obolibrary.org/obo/UBERON_xxx) where possible.
Context:
* http://purl.obolibrary.org/obo/UBERON_0006517 "kidney calyx" also known as calices renales; renal calix; renal calyx is Recesses of the kidney pelvis which divides into two wide, cup-shaped major renal calices, with each major calix subdivided into 7 to 14 minor calices. Urine empties into a minor calix from collecting tubules, then passes through the major calix, renal pelvis, and ureter to enter the urinary bladder. (From Moore, Clinically Oriented Anatomy, 3d ed, p211).
* http://purl.obolibrary.org/obo/CL_1000505 "kidney pelvis cell" is A cell that is part of a renal pelvis.
* http://purl.obolibrary.org/obo/UBERON_0001227 "minor calyx" also known as calices renales minores; minor calix is Portion of the urinary collecting system within the kidney that drains 


**Response:**

The renal calyx, a recess of the kidney pelvis that divides into two wide, cup-shaped major renal calices, contains various types of cells. According to the Cell Ontology (CL), the following cell types can be found within the renal calyx:

* Kidney cortical cell (CL_0002681): This is a general term for a cell located in the cortex of the kidney, which includes the renal calyx.
* Minor calyx epithelial cell: Although not explicitly stated as an individual CL ID, minor calix are composed of epithelial cells that secrete and modify urine before it flows into the ureter. The Cell Ontology does not specifically mention a separate "minor calyx epithelial cell" but rather refers to it as part of the general term "renal cortical epithelial cell" (CL_0002584).

Please note that the specific cell types present in the renal calyx may vary depending on the source and the particular aspect being studied.

According to UBERON, a hierarchical ontology for the cellular and molecular biology of organisms, minor calix are composed of:

* Minor calyces (UBERON_0001227)
	+ Minor calyx epithelial cells

Please consult the relevant UBERON references for further details on the specific cell types found in the renal calyx.
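Since the model can cite ontology identifiers that were never in the retrieved context, one simple check is to compare the IDs mentioned in a response against the IDs in the context. The following is a sketch under stated assumptions: `cited_ids` and `unsupported_ids` are hypothetical helpers, the regular expression only covers UBERON and CL identifiers, and the sample strings are shortened excerpts rather than the full context and response.

```python
import re

# Matches identifiers such as UBERON_0006517 or CL_1000505, alone or inside an IRI
ID_PATTERN = re.compile(r"\b(?:UBERON|CL)_\d+\b")

def cited_ids(text):
    """Return the set of UBERON/CL identifiers mentioned in the text."""
    return set(ID_PATTERN.findall(text))

def unsupported_ids(response_text, context_text):
    """Identifiers that appear in the response but not in the retrieved context."""
    return sorted(cited_ids(response_text) - cited_ids(context_text))

# Shortened excerpts for illustration only
sample_context = (
    '* http://purl.obolibrary.org/obo/UBERON_0006517 "kidney calyx"\n'
    '* http://purl.obolibrary.org/obo/CL_1000505 "kidney pelvis cell"\n'
)
sample_response = "See kidney calyx (UBERON_0006517) and renal cortical epithelial cell (CL_0002584)."
print(unsupported_ids(sample_response, sample_context))
```

An ID flagged this way is not necessarily wrong, but it did not come from the retrieved entries and is worth verifying against the ontology.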