In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import pipeline
from tqdm import tqdm
import pandas as pd
import torch
import re
from pyhpo import Ontology
Ontology()

<pyhpo.ontology.OntologyClass at 0x7fafcc13bbe0>

In [2]:
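def Get_Definition(hpo_list):
    """Concatenate the quoted definition text of each HPO term in hpo_list."""
    definition_list = []
    for t in hpo_list:
        # .definition is expected to be the raw OBO string: "text" [xrefs]; guard against None
        definition = Ontology.get_hpo_object(t).definition or ""
        # keep only the quoted text, dropping the trailing cross-references
        match = re.search(r'"(.*?)"', definition)
        if match:
            definition_list.append(match.group(1))
    return ' '.join(definition_list)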
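The regex in `Get_Definition` keeps the first double-quoted span of each definition, assuming `.definition` holds the raw OBO `def:` value (a quoted string followed by bracketed cross-references). OBO escapes quotes inside that string with a backslash, so the non-greedy `"(.*?)"` would stop at the first escaped quote if a definition ever contained one. A minimal escape-aware sketch; the helper `extract_definition_text` and the sample string are hypothetical, made up for illustration:

```python
import re

# Escape-aware pattern for a quoted OBO string: any run of characters that are
# neither a quote nor a backslash, or a backslash followed by any character.
OBO_QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')


def extract_definition_text(raw):
    """Return the quoted text of an OBO-style definition, or "" if there is none."""
    if not raw:
        return ""
    match = OBO_QUOTED.search(raw)
    if not match:
        return ""
    # Undo the escaping of embedded double quotes; other escapes are left as-is.
    return match.group(1).replace('\\"', '"')


# Made-up definition containing escaped quotes (illustration only).
raw = '"Loss of the \\"red\\" color." [HPO:example]'
print(re.search(r'"(.*?)"', raw).group(1))  # non-greedy pattern stops at the first \"
print(extract_definition_text(raw))
```

For definitions without embedded quotes both patterns return the same text, so swapping this in would not change `Input_text` for ordinary terms.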

# HPO Terms Detected in a Patient

1. **HP:0000006** - Autosomal dominant inheritance  
2. **HP:0003593** - Infantile onset  
3. **HP:0025104** - Capillary malformation  
4. **HP:0001009** - Telangiectasia  
5. **HP:0003829** - Typified by incomplete penetrance  
6. **HP:0030713** - Vein of Galen aneurysmal malformation  


In [3]:
Patient_hps = ["HP:0000006", "HP:0003593", "HP:0025104", "HP:0001009", "HP:0003829", "HP:0030713"]
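These IDs are passed straight to `Ontology.get_hpo_object`, so a malformed entry is only caught when the lookup fails. A hedged pre-check that assumes nothing beyond the standard HPO ID shape (`HP:` followed by seven digits); it checks format only, not whether a term exists in the loaded ontology. The helper `check_hpo_ids` is hypothetical:

```python
import re

# HPO term IDs are "HP:" followed by exactly seven digits, e.g. HP:0000006.
HPO_ID_PATTERN = re.compile(r"HP:[0-9]{7}")


def check_hpo_ids(ids):
    """Return (valid IDs, deduplicated and in input order; malformed inputs)."""
    valid, malformed, seen = [], [], set()
    for raw in ids:
        term = raw.strip()  # tolerate stray surrounding whitespace
        if not HPO_ID_PATTERN.fullmatch(term):
            malformed.append(raw)
        elif term not in seen:
            seen.add(term)
            valid.append(term)
    return valid, malformed


# The six IDs listed above for this patient.
ids = ["HP:0000006", "HP:0003593", "HP:0025104", "HP:0001009", "HP:0003829", "HP:0030713"]
print(check_hpo_ids(ids))
```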

In [4]:
Input_text = Get_Definition(Patient_hps)

In [5]:
Model_Path = './Model_LoRA/checkpoint/'

device = "cuda:3"  # the GPU to load the model onto
model_name_or_path = Model_Path

model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    torch_dtype=torch.float16,
    device_map=device,  # reuse the device defined above instead of repeating the string
)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)



In [6]:

summaries = []

max_input_tokens = 2048  # the prompt is truncated to at most this many tokens

hpo_def = Input_text

prompt = f"""
I will provide you with the definitions of some HPO (Human Phenotype Ontology) terms exhibited by a patient. Based on these definitions, please generate a concise, clinically focused summary of the patient's symptoms in one paragraph, approximately 100-300 words in length. Ensure the summary is highly readable, with smooth transitions between ideas, logical coherence, and accurate representation of the clinical features. Emphasize clarity, fluency, and clinical relevance to create a realistic and precise description of the patient's presentation.\nText:\n{hpo_def}
"""

# Tokenize, truncate to max_input_tokens, and decode back to text.
# skip_special_tokens=True keeps special tokens (e.g. a BOS token) out of the
# decoded string, so the pipeline does not prepend a second one when it re-tokenizes.
tokenized_text = tokenizer(prompt, return_tensors="pt").input_ids[0]
truncated_tokenized_text = tokenized_text[:max_input_tokens]
truncated_text = tokenizer.decode(truncated_tokenized_text, skip_special_tokens=True) + '<think>:\n'

summarizer = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)
response = summarizer(
    truncated_text,
    max_new_tokens=max_input_tokens + 1024,  # up to 3072 new tokens, independent of prompt length
    top_p=0.95,
    top_k=50,
    do_sample=True,
)

# generated_text echoes the prompt (return_full_text defaults to True),
# so splitting on the marker gives [prompt, model output]
summary = response[0]['generated_text'].split('<think>:')
summaries.append(summary)
torch.cuda.empty_cache()


Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)
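With `do_sample=True`, the call above samples each token after filtering with `top_k=50` and `top_p=0.95`. In recent `transformers` versions these run as logits warpers, top-k first and then top-p over what survives. The sketch below mimics that order on explicit probabilities; it is a simplified pure-Python illustration with a made-up toy distribution, not the library's implementation (which works on logits and has extras such as `min_tokens_to_keep`):

```python
import random


def top_k_top_p_filter(probs, top_k=50, top_p=0.95):
    """Apply top-k, then nucleus (top-p) filtering to a token -> probability dict.

    Returns a new dict over the surviving tokens, renormalized to sum to 1.
    """
    # Top-k: keep the k most probable tokens and renormalize them.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    mass_k = sum(p for _, p in ranked)
    ranked = [(tok, p / mass_k) for tok, p in ranked]

    # Top-p: keep the smallest high-probability prefix whose mass reaches top_p
    # (the token that crosses the threshold is kept).
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    mass_p = sum(p for _, p in kept)
    return {tok: p / mass_p for tok, p in kept}


def sample_token(probs, rng, top_k=50, top_p=0.95):
    """Draw one token from the filtered distribution."""
    filtered = top_k_top_p_filter(probs, top_k, top_p)
    tokens = list(filtered)
    return rng.choices(tokens, weights=[filtered[t] for t in tokens], k=1)[0]


# Toy next-token distribution (made up for illustration).
toy = {"the": 0.5, "a": 0.3, "an": 0.15, "this": 0.05}
print(top_k_top_p_filter(toy, top_k=3, top_p=0.8))
print(sample_token(toy, random.Random(0), top_k=3, top_p=0.8))
```

With `top_k=3, top_p=0.8`, only "the" and "a" survive, renormalized to about 0.625 and 0.375, so every sampled token is one of those two.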


## Input prompt

In [7]:
print(summaries[0][0])


I will provide you with the definitions of some HPO (Human Phenotype Ontology) terms exhibited by a patient. Based on these definitions, please generate a concise, clinically focused summary of the patient's symptoms in one paragraph, approximately 100-300 words in length. Ensure the summary is highly readable, with smooth transitions between ideas, logical coherence, and accurate representation of the clinical features. Emphasize clarity, fluency, and clinical relevance to create a realistic and precise description of the patient's presentation.
Text:
A mode of inheritance that is observed for traits related to a gene encoded on one of the autosomes (i.e., the human chromosomes 1-22) in which a trait manifests in heterozygotes. In the context of medical genetics, an autosomal dominant disorder is caused when a single copy of the mutant allele is present. Males and females are affected equally, and can both transmit the disorder with a risk of 50% for each child of inheriting the muta

## Output
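`summaries[0][1]` is everything after the first `<think>:` marker: the model's reasoning followed by whatever summary it writes. Because `str.split` splits on every occurrence, a model that repeats the marker would push text into extra list elements. A minimal sketch that splits only once with `str.partition` and then, as an assumption, separates reasoning from the final answer at a DeepSeek-R1-style `</think>` closing tag (whether this checkpoint emits that tag is not shown here). The function name and sample text are hypothetical:

```python
def split_generation(generated_text, marker="<think>:", end_tag="</think>"):
    """Split pipeline output into (prompt, reasoning, answer).

    Only the first `marker` is used as the boundary. `end_tag` is an assumed
    closing tag; if it never appears, all text after the marker is returned
    as reasoning and the answer is "".
    """
    prompt, found, rest = generated_text.partition(marker)
    if not found:
        # No marker at all: the whole string is returned as the prompt part.
        return prompt, "", ""
    reasoning, _, answer = rest.partition(end_tag)
    return prompt, reasoning.strip(), answer.strip()


# Hypothetical pipeline output, made up for illustration.
text = "PROMPT<think>:\nReasoning here.\n</think>\nFinal summary. <think>: again"
print(split_generation(text))
print(text.split("<think>:"))  # str.split yields three parts here
```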

In [9]:
print(summaries[0][1])


Okay, let me start by understanding what the user wants. They provided several HPO terms and want a concise clinical summary. The key points here are the autosomal dominant inheritance pattern, early onset (infancy), presence of capillary malformations and telangiectasias, the Vein of Galen aneurysmal malformation, and variable penetrance.

First, I need to integrate the genetic aspect: autosomal dominant. That means family history might show similar cases, but not everyone with the mutation gets the disease. Then the onset at infancy—so symptoms started within first year. Capillary malformations and telangiectasias are noted in specific areas like lips, palms, etc. The Vein of Galen issue is a critical part since it's a congenital vascular anomaly leading to possible neurological issues if untreated.

I should structure this logically. Start with the likely diagnosis, maybe a hereditary syndrome. Mention the vascular anomalies first, then the specific malformation. Link the onset and