# Using LLMs and RAG with HRA KG

This notebook shows how to set up a basic vector database populated from the HRA KG, which is then used to augment prompts to an LLM.

## Variables to tweak

Below are some variables you can tweak to customize this notebook.

In [1]:
# Model to use for embedding in the vector db
SENTENCE_TRANSFORMER="all-mpnet-base-v2"

# LLM model variables
LLM_MODEL="phi4:14b" # Other option: "llama3.2:3b"
SHOW_DEBUG_INFO=True
DEFAULT_SYSTEM_PROMPT="Answer in one sentence in an informal tone."

# Variables for getting RAG content to index from HRA KG
SPARQL_ENDPOINT="https://lod.humanatlas.io/sparql"
CONTENT_QUERY_FILE='hra-rag-kidney-content.rq'
CONTENT_QUERY_REPLACEMENT_KEYWORD='http://purl.obolibrary.org/obo/UBERON_0002113'

# Update the value below to change which term/organ is queried
CONTENT_QUERY_REPLACEMENT_VALUE='http://purl.obolibrary.org/obo/UBERON_0002113' # Kidney

# Set to False to skip installing prerequisites
SKIP_INSTALL=False

## Install prerequisites

This notebook requires `ollama` to be installed (<https://ollama.com/download>) and running locally, plus a few Python packages. It could be reconfigured to work with other services.

In [2]:
if not SKIP_INSTALL:
    %pip install llm requests
    !llm install llm-ollama llm-sentence-transformers
    !llm sentence-transformers register {SENTENCE_TRANSFORMER}
    !ollama pull {LLM_MODEL}
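
Before running the install cell, it can help to confirm that the command-line tools it calls are available. The following is a minimal sketch, not part of the original notebook: `missing_tools` is a hypothetical helper, and it only checks that the executables are on `PATH`, not that the `ollama` server is actually running.

```python
import shutil

def missing_tools(tools=("ollama", "llm")):
    """Return the names of command-line tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All command-line prerequisites found.")
```

Note that `llm` only appears on `PATH` after the `%pip install` step, so on a fresh environment it is expected to be reported as missing.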

## Populate the Vector DB from HRA KG

In [3]:
# Reusable functions

import requests
import csv
from io import StringIO

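def sparql_select(query, endpoint=SPARQL_ENDPOINT):
    # POST the query and request CSV so each row parses into a dict keyed by variable name
    response = requests.post(endpoint, data={"query": query}, headers={"Accept": "text/csv"})
    response.raise_for_status() # Fail early instead of parsing an error page as CSV
    with StringIO(response.text) as csv_text:
        return list(csv.DictReader(csv_text))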
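
`sparql_select` requests CSV from the endpoint and turns each row into a dictionary keyed by the query's variable names. The parsing step can be tried offline with a hand-written response. This is a sketch only: `parse_sparql_csv` and `sample_csv` are hypothetical names introduced for illustration, and the sample row is made up from the kidney term used in this notebook.

```python
import csv
from io import StringIO

# Hand-written stand-in for a CSV response (header row, then one result row)
sample_csv = "term,name\r\nhttp://purl.obolibrary.org/obo/UBERON_0002113,kidney\r\n"

def parse_sparql_csv(text):
    """Parse CSV text into a list of dicts, one per result row."""
    with StringIO(text) as csv_text:
        return list(csv.DictReader(csv_text))

rows = parse_sparql_csv(sample_csv)
print(rows)
```

Because the keys come from the header row, renaming a variable in the SPARQL query changes the keys that later cells (such as `meta['name']`) need to use.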

In [4]:
# Run the SPARQL query in CONTENT_QUERY_FILE to get the text content to use in the vector DB

with open(CONTENT_QUERY_FILE, encoding='utf8') as query_file:
    query = query_file.read()
query = query.replace(CONTENT_QUERY_REPLACEMENT_KEYWORD, CONTENT_QUERY_REPLACEMENT_VALUE) # Replace the term used in the default query
results = sparql_select(query)
print(len(results), "terms")
results[:3]

143 terms


[{'term': 'http://purl.obolibrary.org/obo/UBERON_0009095',
  'name': 'tip of renal papilla',
  'aka': 'papillary tip; papillary tips',
  'description': 'tip of renal papilla'},
 {'term': 'http://purl.obolibrary.org/obo/UBERON_0012441',
  'name': 'endothelium of peritubular capillary',
  'aka': 'peritubular capillary endothelium',
  'description': 'An endothelium that is part of a peritubular capillary.'},
 {'term': 'http://purl.obolibrary.org/obo/UBERON_0002015',
  'name': 'kidney capsule',
  'aka': 'capsula fibrosa renis; capsule of kidney; fibrous capsule of kidney; renal capsule',
  'description': 'The tough fibrous layer surrounding the kidney which is covered in a thick layer of perirenal adipose tissue that functions to provide some protection from trauma and damage.'}]

In [5]:
# Initialize collection to store HRA KG entries

import llm

embedding_model = llm.get_embedding_model(f"sentence-transformers/{SENTENCE_TRANSFORMER}")
collection = llm.Collection("entries", model=embedding_model)
# Each entry is a tuple of (id, text to embed, metadata)
collection_entries = [
    (meta['term'], f"{meta['term']} \"{meta['name']}\"{(' also known as ' + meta['aka']) if meta['aka'] else ''} is {meta['description']}", meta)
    for meta in results
]
collection.embed_multi_with_metadata(collection_entries, store=True)
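
The text that gets embedded for each term is a single sentence built from the query result fields. The sketch below pulls that formatting into a standalone function so it can be inspected without the vector DB. `format_entry` is a hypothetical helper name, and the example row is copied from the query results shown earlier.

```python
def format_entry(meta):
    """Build the sentence that is embedded for one HRA KG term."""
    aka = f" also known as {meta['aka']}" if meta["aka"] else ""
    return f"{meta['term']} \"{meta['name']}\"{aka} is {meta['description']}"

example = {
    "term": "http://purl.obolibrary.org/obo/UBERON_0012441",
    "name": "endothelium of peritubular capillary",
    "aka": "peritubular capillary endothelium",
    "description": "An endothelium that is part of a peritubular capillary.",
}
print(format_entry(example))
```

Including the term IRI in the embedded text is what later lets the LLM answer questions such as which UBERON term corresponds to a given name.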

In [6]:
# Test similarity search

for entry in collection.similar("kidney", number=10):
    print(entry.id, entry.score, entry.content, entry.metadata)

http://purl.obolibrary.org/obo/UBERON_0002113 0.6173296314560971 http://purl.obolibrary.org/obo/UBERON_0002113 "kidney" is A paired organ of the urinary tract that produces urine and maintains bodily fluid homeostasis, blood pressure, pH levels, red blood cell production and skeleton mineralization. {'term': 'http://purl.obolibrary.org/obo/UBERON_0002113', 'name': 'kidney', 'aka': '', 'description': 'A paired organ of the urinary tract that produces urine and maintains bodily fluid homeostasis, blood pressure, pH levels, red blood cell production and skeleton mineralization.'}
http://purl.obolibrary.org/obo/UBERON_0001224 0.5528655626955084 http://purl.obolibrary.org/obo/UBERON_0001224 "renal pelvis" also known as kidney pelvis; pelvis of ureter is A funnel shaped proximal portion of the ureter that is formed by convergence of the major calices [MP]. {'term': 'http://purl.obolibrary.org/obo/UBERON_0001224', 'name': 'renal pelvis', 'aka': 'kidney pelvis; pelvis of ureter', 'description'
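
The second value printed for each entry is a similarity score between the query embedding and the stored entry embedding. Assuming the score is a cosine similarity (an assumption here; check the `llm` documentation for the exact metric), the ranking can be sketched in plain Python with toy two-dimensional vectors. The vectors and names below are invented for illustration and are not real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for a query embedding and two stored entry embeddings
query_vector = [1.0, 0.0]
stored = {"entry_a": [0.9, 0.1], "entry_b": [0.1, 0.9]}

# Rank stored entries from most to least similar to the query
ranking = sorted(stored, key=lambda key: cosine_similarity(query_vector, stored[key]), reverse=True)
print(ranking)
```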

## Set up LLM

In [7]:
# Initialize LLM for prompting

import llm
model = llm.get_model(LLM_MODEL)

In [8]:
# Test prompt

response = model.prompt("What is the Human Reference Atlas (HRA)?", system="Answer in one sentence like a five year old.")
print(response.text())

The Human Reference Atlas is like a giant map showing where all the body parts and special helpers inside us live, so doctors can understand better how our bodies work!


## Set up RAG Prompt

In [9]:
from IPython.display import Markdown, display

def rag_prompt(prompt, system=DEFAULT_SYSTEM_PROMPT, debug=SHOW_DEBUG_INFO):
    # Retrieve the 10 entries most similar to the prompt and format them as a bullet list
    terms = [ f"* {entry.content}\n" for entry in collection.similar(prompt, number=10) ]
    # Append the retrieved entries to the system prompt as context
    if len(terms) > 0:
        system = f"{system}\nContext:\n{''.join(terms)}"
    response = model.prompt(prompt, system=system, stream=False)
    if debug:
        print("Prompt:", prompt)
        print("System Prompt:", system)
        print("Usage:", response.usage())
        display(Markdown("\n**Response:**\n\n" + response.text()))
    return response
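
The augmentation step in `rag_prompt` is plain string assembly: the retrieved entries become a bullet list appended to the system prompt under a `Context:` label. The sketch below isolates that step so it can be tested without a model or vector DB. `build_system_prompt` is a hypothetical helper, and the entry strings are shortened placeholders.

```python
def build_system_prompt(system, contents):
    """Append retrieved entry texts to the system prompt as a bullet list."""
    terms = [f"* {content}\n" for content in contents]
    if len(terms) > 0:
        system = f"{system}\nContext:\n{''.join(terms)}"
    return system

augmented = build_system_prompt(
    "Answer in one sentence in an informal tone.",
    ['UBERON_0001226 "major calyx" is ...', 'UBERON_0001227 "minor calyx" is ...'],
)
print(augmented)
```

When nothing is retrieved, the system prompt is passed through unchanged, so the model falls back to answering without any HRA KG context.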


## Prompts

### Prompt: How many calesces are in the kidney?

In [10]:
response = rag_prompt("How many calesces are in the kidney?")

Prompt: How many calesces are in the kidney?
System Prompt: Answer in one sentence in an informal tone.
Context:
* http://purl.obolibrary.org/obo/UBERON_0006517 "kidney calyx" also known as calices renales; renal calix; renal calyx is Recesses of the kidney pelvis which divides into two wide, cup-shaped major renal calices, with each major calix subdivided into 7 to 14 minor calices. Urine empties into a minor calix from collecting tubules, then passes through the major calix, renal pelvis, and ureter to enter the urinary bladder. (From Moore, Clinically Oriented Anatomy, 3d ed, p211).
* http://purl.obolibrary.org/obo/UBERON_0001226 "major calyx" also known as calices renales majores; major calix is Portion of the urinary collecting system within the kidney that drains several minor calyces.
* http://purl.obolibrary.org/obo/UBERON_0001227 "minor calyx" also known as calices renales minores; minor calix is Portion of the urinary collecting system within the kidney that drains one renal 


**Response:**

The kidney has two major calyces, each divided into 7 to 14 minor calyces.

### Prompt: What is the UBERON term for 'renal calyx'?

In [11]:
response = rag_prompt("What is the UBERON term for 'renal calyx'?")

Prompt: What is the UBERON term for 'renal calyx'?
System Prompt: Answer in one sentence in an informal tone.
Context:
* http://purl.obolibrary.org/obo/UBERON_0001227 "minor calyx" also known as calices renales minores; minor calix is Portion of the urinary collecting system within the kidney that drains one renal papilla.
* http://purl.obolibrary.org/obo/UBERON_0001226 "major calyx" also known as calices renales majores; major calix is Portion of the urinary collecting system within the kidney that drains several minor calyces.
* http://purl.obolibrary.org/obo/UBERON_0006517 "kidney calyx" also known as calices renales; renal calix; renal calyx is Recesses of the kidney pelvis which divides into two wide, cup-shaped major renal calices, with each major calix subdivided into 7 to 14 minor calices. Urine empties into a minor calix from collecting tubules, then passes through the major calix, renal pelvis, and ureter to enter the urinary bladder. (From Moore, Clinically Oriented Anatomy,


**Response:**

The UBERON term for "renal calyx" is `UBERON_0006517`.

### Prompt: What kind of cells occur in the 'renal calyx'?

In [12]:
system = "Include references to Cell Ontology (http://purl.obolibrary.org/obo/CL_xxx) and UBERON (http://purl.obolibrary.org/obo/UBERON_xxx) where possible."
response = rag_prompt("What kind of cells occur in the 'renal calyx'?", system)

Prompt: What kind of cells occur in the 'renal calyx'?
System Prompt: Include references to Cell Ontology (http://purl.obolibrary.org/obo/CL_xxx) and UBERON (http://purl.obolibrary.org/obo/UBERON_xxx) where possible.
Context:
* http://purl.obolibrary.org/obo/UBERON_0006517 "kidney calyx" also known as calices renales; renal calix; renal calyx is Recesses of the kidney pelvis which divides into two wide, cup-shaped major renal calices, with each major calix subdivided into 7 to 14 minor calices. Urine empties into a minor calix from collecting tubules, then passes through the major calix, renal pelvis, and ureter to enter the urinary bladder. (From Moore, Clinically Oriented Anatomy, 3d ed, p211).
* http://purl.obolibrary.org/obo/UBERON_0001226 "major calyx" also known as calices renales majores; major calix is Portion of the urinary collecting system within the kidney that drains several minor calyces.
* http://purl.obolibrary.org/obo/UBERON_0001227 "minor calyx" also known as calices 


**Response:**

The term "renal calyx," which includes both major and minor calyces, refers to specific anatomical structures within the kidney pelvis (UBERON:0006517). The primary function of these structures is to channel urine from the collecting ducts into the renal pelvis, ureter, and eventually the urinary bladder. 

In terms of cellular composition:
- **Transitional epithelial cells** are predominant in the lining of the calyces. These cells form a type of stratified epithelium that can stretch and accommodate the passage of urine (UBERON:0000033, UBERON:0010076).

While specific cell types such as those listed in your context, like "brush border cell" (CL_0002307) or "kidney granular cells" (CL_0000648), are found within other parts of the kidney tubules and vasculature, they do not occur specifically within the renal calyces. The transitional epithelial cells provide a protective lining to prevent urine from damaging underlying tissues as it passes through the renal calyx.

If you have further questions about related cell types or structures in the kidney, feel free to ask!
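
Since the model is asked to include ontology references, one way to review a response is to compare the identifiers it cites with those present in the retrieved context. The following is a rough sketch under stated assumptions: `cited_ids` and `ids_not_in_context` are hypothetical helpers, the pattern assumes seven-digit UBERON and CL identifiers, and the example strings are short excerpts from the output above (the displayed context is truncated, so the result is only illustrative).

```python
import re

# Matches identifiers such as UBERON_0006517, UBERON:0006517 or CL_0002307
ID_PATTERN = re.compile(r"\b(UBERON|CL)[_:](\d{7})\b")

def cited_ids(text):
    """Return the set of identifiers found in text, normalized to PREFIX:NUMBER."""
    return {f"{prefix}:{number}" for prefix, number in ID_PATTERN.findall(text)}

def ids_not_in_context(response_text, context_text):
    """Return identifiers cited in the response that do not appear in the context."""
    return sorted(cited_ids(response_text) - cited_ids(context_text))

context_excerpt = (
    'http://purl.obolibrary.org/obo/UBERON_0006517 "kidney calyx" '
    'http://purl.obolibrary.org/obo/UBERON_0001226 "major calyx" '
    'http://purl.obolibrary.org/obo/UBERON_0001227 "minor calyx"'
)
response_excerpt = (
    "structures within the kidney pelvis (UBERON:0006517) ... "
    "accommodate the passage of urine (UBERON:0000033, UBERON:0010076)"
)
print(ids_not_in_context(response_excerpt, context_excerpt))
```

An identifier flagged this way is not necessarily wrong; it only means the model did not get it from the retrieved text, so it deserves a manual lookup.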