# GenerativeAI4DS-I
## Lab. Medical Assistant


##  What I hope you'll get out of this lab
* A sense of where to start when you need to consume OpenAI services.
* Familiarity with OpenAI's best practices for developing assistants.

In [1]:
!pip install openai

Collecting openai
  Downloading openai-1.30.3-py3-none-any.whl (320 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.27.0-py3-none-any.whl (75 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai)
  Downloading httpcore-1.0.5-py3-none-any.whl (77 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai)
  Downloading h11-0.14.0-py3-none-any.whl (58 kB)
Installing collected packages: h11, httpcore, httpx, openai
Successfully installed h11-0.14.0 httpcore-1.0.5 ht

In [2]:
from openai import OpenAI
import os
import json
from IPython.display import display, HTML

In [3]:
def show_json(obj):
    display(json.loads(obj.model_dump_json()))

In [4]:
# Clone the course repository so the dataset files are available in Google Colab
!git clone https://github.com/thousandoaks/GenerativeAI4DS-I.git

Cloning into 'GenerativeAI4DS-I'...
remote: Enumerating objects: 73, done.
remote: Counting objects: 100% (73/73), done.
remote: Compressing objects: 100% (67/67), done.
remote: Total 73 (delta 23), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (73/73), 982.13 KiB | 5.74 MiB/s, done.
Resolving deltas: 100% (23/23), done.


# 1. You have to get your [OpenAI API Key](https://platform.openai.com/account/api-keys)

In [5]:
# Used by the agent in this tutorial
os.environ["OPENAI_API_KEY"] = "YOU-NEED-YOUR-OWN-KEY"

In [6]:
client = OpenAI(
  api_key=os.environ['OPENAI_API_KEY'],  # this is also the default, it can be omitted
)
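
Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. The following is a minimal sketch, assuming the key is supplied through the `OPENAI_API_KEY` environment variable. `resolve_api_key` is a hypothetical helper name, not part of the OpenAI SDK; it fails early when the placeholder from the cell above is still in place.

```python
import os

PLACEHOLDER = "YOU-NEED-YOUR-OWN-KEY"  # the placeholder string used in this lab


def resolve_api_key(env=None, var_name="OPENAI_API_KEY"):
    """Return the API key found in `env` (defaults to os.environ) or raise."""
    env = os.environ if env is None else env
    key = (env.get(var_name) or "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(f"Set {var_name} to a real key before creating the client.")
    return key
```

With a real key in the environment, the client could then be created as `OpenAI(api_key=resolve_api_key())`.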

# 2. Medical Assistant
An Assistant represents an entity that can be configured to respond to a user's messages using several parameters like model, instructions, and tools.

This time we will create a Medical Assistant that can inspect healthcare records and summarize medical events.




### 2.1. We create a new assistant with file search enabled

In [7]:
assistant = client.beta.assistants.create(
  name="Medical Analyst Assistant",
  instructions="You are a data scientist with experience in healthcare. Use your knowledge base to answer questions about medical reports.",
  model="gpt-4o",
  tools=[{"type": "file_search"}],
)

### 2.2. We upload the medical records
To access your files, the file_search tool uses the Vector Store object. Upload your files and create a Vector Store to contain them. Once the Vector Store is created, you should poll its status until all files are out of the in_progress state to ensure that all content has finished processing. The SDK provides helpers for uploading and polling in one shot.

In [8]:
# Create a vector store called "Medical Records"
vector_store = client.beta.vector_stores.create(name="Medical Records")

# Ready the files for upload to OpenAI
file_paths = ["/content/GenerativeAI4DS-I/datasets/06b03cbb.txt","/content/GenerativeAI4DS-I/datasets/08f08ba5.txt"]
file_streams = [open(path, "rb") for path in file_paths]

# Use the upload and poll SDK helper to upload the files, add them to the vector store,
# and poll the status of the file batch for completion.
file_batch = client.beta.vector_stores.file_batches.upload_and_poll(
  vector_store_id=vector_store.id, files=file_streams
)

# You can print the status and the file counts of the batch to see the result of this operation.
print(file_batch.status)
print(file_batch.file_counts)

failed
FileCounts(cancelled=0, completed=0, failed=2, in_progress=0, total=2)
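
The output above reports `failed` for both files, so it is worth checking the batch before moving on. The following is a minimal sketch that relies only on the `status` and `file_counts` fields printed above. `batch_ok` is a hypothetical helper, and the stand-in object merely mimics the printed values so the check can run without calling the API.

```python
from types import SimpleNamespace


def batch_ok(file_batch):
    """Return (ok, summary) for an object exposing .status and .file_counts."""
    counts = file_batch.file_counts
    ok = (
        file_batch.status == "completed"
        and counts.failed == 0
        and counts.completed == counts.total
    )
    summary = (
        f"{file_batch.status}: {counts.completed}/{counts.total} completed, "
        f"{counts.failed} failed"
    )
    return ok, summary


# Stand-in reproducing the values printed in the cell output above.
example_batch = SimpleNamespace(
    status="failed",
    file_counts=SimpleNamespace(cancelled=0, completed=0, failed=2, in_progress=0, total=2),
)
ok, summary = batch_ok(example_batch)
print(ok, summary)
```

With the real `file_batch`, a `False` result suggests re-running the upload before querying the assistant.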


### 2.3 Update the assistant to use the new Vector Store
To make the files accessible to your assistant, update the assistant’s tool_resources with the new vector_store id.

In [9]:
assistant = client.beta.assistants.update(
  assistant_id=assistant.id,
  tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)

### 2.4. Create a thread

You can also attach files as Message attachments on your thread. Doing so will create another vector_store associated with the thread, or, if there is already a vector store attached to this thread, attach the new files to the existing thread vector store. When you create a Run on this thread, the file search tool will query both the vector_store from your assistant and the vector_store on the thread.



In [10]:
# Create a thread with an initial user message.
# No file is attached here, so the run will search only the assistant's vector store.
thread = client.beta.threads.create(
  messages=[
    {
      "role": "user",
      "content": "Summarize the medical event experienced by subscriber ID: 06b03cbb",
    }
  ]
)
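
The thread created above carries no attachment. The following is a minimal sketch of the message shape that would attach a file, assuming the file has already been uploaded and you hold its ID. `build_user_message` and the `file-abc123` ID are hypothetical placeholders, not values from this lab.

```python
def build_user_message(content, file_ids=()):
    """Build a user message dict, optionally attaching files for file_search."""
    message = {"role": "user", "content": content}
    if file_ids:
        message["attachments"] = [
            {"file_id": file_id, "tools": [{"type": "file_search"}]}
            for file_id in file_ids
        ]
    return message


message = build_user_message(
    "Summarize the medical event experienced by subscriber ID: 06b03cbb",
    file_ids=["file-abc123"],  # hypothetical file ID
)
```

The resulting dict can be placed in the `messages` list passed to `client.beta.threads.create(...)`.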



### 2.5 Create a Run

Now, create a Run and observe that the model uses the File Search tool to provide a response to the user’s question.

In [11]:
# Use the create and poll SDK helper to create a run and poll the status of
# the run until it's in a terminal state.

run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id, assistant_id=assistant.id
)

messages = list(client.beta.threads.messages.list(thread_id=thread.id, run_id=run.id))

message_content = messages[0].content[0].text
annotations = message_content.annotations
citations = []
for index, annotation in enumerate(annotations):
    message_content.value = message_content.value.replace(annotation.text, f"[{index}]")
    if file_citation := getattr(annotation, "file_citation", None):
        cited_file = client.files.retrieve(file_citation.file_id)
        citations.append(f"[{index}] {cited_file.filename}")

print(message_content.value)
print("\n".join(citations))

Subscriber ID: 06b03cbb experienced a significant medical event detailed in the records. On December 2, 2020, the subscriber had an X-ray imaging procedure due to concerns of pulmonary consolidation. The findings suggested the presence of multi-lobar pneumonia, and the overall impression was consistent with this diagnosis. The subscriber's medical record indicates that they were experiencing symptoms significant enough to warrant imaging diagnostics and medical intervention .
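
The citation handling in the cell above can be pulled into a function so it does not need to be repeated for every run. This is a minimal sketch of the same logic: `format_citations` is a hypothetical helper, and the stand-in annotation, marker string, and sample sentence are made up so the example runs without the API.

```python
from types import SimpleNamespace


def format_citations(text, annotations, filename_for):
    """Replace each annotation's text with [index] and list the cited files."""
    citations = []
    for index, annotation in enumerate(annotations):
        text = text.replace(annotation.text, f"[{index}]")
        file_citation = getattr(annotation, "file_citation", None)
        if file_citation is not None:
            citations.append(f"[{index}] {filename_for(file_citation.file_id)}")
    return text, citations


# Stand-in annotation; real ones come from message_content.annotations.
fake_annotation = SimpleNamespace(
    text="<cite:0>",  # hypothetical marker
    file_citation=SimpleNamespace(file_id="file-abc123"),  # hypothetical ID
)
text, citations = format_citations(
    "Findings suggest multi-lobar pneumonia<cite:0>.",
    [fake_annotation],
    filename_for=lambda file_id: "06b03cbb.txt",
)
print(text)
print("\n".join(citations))
```

With the real client, `filename_for` could be `lambda file_id: client.files.retrieve(file_id).filename`, which mirrors the lookup in the original cell.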



### 2.6 We add more messages to the same thread as needed

In [12]:
message2 = client.beta.threads.messages.create(
  thread_id=thread.id,
  role="user",
  content="Was it a serious medical event ?"
)

show_json(message2)

{'id': 'msg_ldRRVNRvook8cHk26pz6aYIr',
 'assistant_id': None,
 'attachments': [],
 'completed_at': None,
 'content': [{'text': {'annotations': [],
    'value': 'Was it a serious medical event ?'},
   'type': 'text'}],
 'created_at': 1716883054,
 'incomplete_at': None,
 'incomplete_details': None,
 'metadata': {},
 'object': 'thread.message',
 'role': 'user',
 'run_id': None,
 'status': None,
 'thread_id': 'thread_5WM345ZMGmN2ET3Ag75bT4s3'}

In [13]:
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id, assistant_id=assistant.id
)


In [14]:
if run.status == 'completed':

  messages = list(client.beta.threads.messages.list(thread_id=thread.id, run_id=run.id))

  message_content = messages[0].content[0].text
  annotations = message_content.annotations
  citations = []
  for index, annotation in enumerate(annotations):
      message_content.value = message_content.value.replace(annotation.text, f"[{index}]")
      if file_citation := getattr(annotation, "file_citation", None):
          cited_file = client.files.retrieve(file_citation.file_id)
          citations.append(f"[{index}] {cited_file.filename}")

  print(message_content.value)
  print("\n".join(citations))

else:
  print(run.status)

Yes, the medical event experienced by subscriber ID: 06b03cbb was serious. Multi-lobar pneumonia is considered a severe condition as it affects multiple lobes of the lungs, which can lead to significant respiratory distress and requires timely medical intervention to manage effectively.
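
The `else` branch above prints only the status. A small sketch of a more informative report follows, assuming the run object exposes a `last_error` field with `code` and `message` when it fails. `describe_run` is a hypothetical helper, and the stand-in runs are made up for illustration.

```python
from types import SimpleNamespace


def describe_run(run):
    """Return a short, human-readable description of a run's outcome."""
    last_error = getattr(run, "last_error", None)
    if run.status == "failed" and last_error is not None:
        return f"failed: {last_error.code}: {last_error.message}"
    return run.status


# Stand-in runs (hypothetical values) to exercise the helper offline.
failed_run = SimpleNamespace(
    status="failed",
    last_error=SimpleNamespace(code="rate_limit_exceeded", message="quota exceeded"),
)
expired_run = SimpleNamespace(status="expired", last_error=None)
print(describe_run(failed_run))
print(describe_run(expired_run))
```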



### 2.7 We ask about a second subscriber in the same thread

In [15]:
message3 = client.beta.threads.messages.create(
  thread_id=thread.id,
  role="user",
  content="What happened to subscriber ID 08f08ba5?"
)

show_json(message3)

{'id': 'msg_OGUQ7SBdWaYREjdx3kDzK6vg',
 'assistant_id': None,
 'attachments': [],
 'completed_at': None,
 'content': [{'text': {'annotations': [],
    'value': 'What happened to subscriber ID 08f08ba5?'},
   'type': 'text'}],
 'created_at': 1716883306,
 'incomplete_at': None,
 'incomplete_details': None,
 'metadata': {},
 'object': 'thread.message',
 'role': 'user',
 'run_id': None,
 'status': None,
 'thread_id': 'thread_5WM345ZMGmN2ET3Ag75bT4s3'}

In [17]:
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id, assistant_id=assistant.id
)


In [18]:
if run.status == 'completed':

  messages = list(client.beta.threads.messages.list(thread_id=thread.id, run_id=run.id))

  message_content = messages[0].content[0].text
  annotations = message_content.annotations
  citations = []
  for index, annotation in enumerate(annotations):
      message_content.value = message_content.value.replace(annotation.text, f"[{index}]")
      if file_citation := getattr(annotation, "file_citation", None):
          cited_file = client.files.retrieve(file_citation.file_id)
          citations.append(f"[{index}] {cited_file.filename}")

  print(message_content.value)
  print("\n".join(citations))

else:
  print(run.status)

Subscriber ID: 08f08ba5 underwent an ultrasound procedure on December 9, 2020, specifically to examine the kidneys, ureter, and bladder . This suggests that there might have been a concern related to these areas, prompting the need for imaging to aid in diagnosis or management.
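
Each follow-up question repeats the same three steps: add a message, create and poll a run, read the reply. A minimal sketch wrapping those steps follows. `ask` is a hypothetical helper built from the calls already used in this lab, and the stub client exists only so the sketch runs without network access.

```python
from types import SimpleNamespace


def ask(client, thread_id, assistant_id, question):
    """Post a question, run the assistant, and return (status, answer_or_None)."""
    client.beta.threads.messages.create(
        thread_id=thread_id, role="user", content=question
    )
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread_id, assistant_id=assistant_id
    )
    if run.status != "completed":
        return run.status, None
    messages = list(
        client.beta.threads.messages.list(thread_id=thread_id, run_id=run.id)
    )
    return run.status, messages[0].content[0].text.value


# Offline stand-in client (hypothetical); replace with the real `client`.
_reply = SimpleNamespace(
    content=[SimpleNamespace(text=SimpleNamespace(value="stub answer"))]
)
stub_client = SimpleNamespace(
    beta=SimpleNamespace(
        threads=SimpleNamespace(
            messages=SimpleNamespace(
                create=lambda **kwargs: None,
                list=lambda **kwargs: [_reply],
            ),
            runs=SimpleNamespace(
                create_and_poll=lambda **kwargs: SimpleNamespace(
                    status="completed", id="run_stub"
                )
            ),
        )
    )
)
status, answer = ask(
    stub_client, "thread_stub", "asst_stub", "What happened to subscriber ID 08f08ba5?"
)
```

In the notebook, the call would be `ask(client, thread.id, assistant.id, "...")`. This sketch returns the raw reply text and leaves citation formatting to a separate step.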



In [19]:
message4 = client.beta.threads.messages.create(
  thread_id=thread.id,
  role="user",
  content="Any other medical procedure you would have advised conducting for subscriber 08f08ba5?"
)

show_json(message4)

{'id': 'msg_11nb2fJfcJqgwjVdXf7uDxft',
 'assistant_id': None,
 'attachments': [],
 'completed_at': None,
 'content': [{'text': {'annotations': [],
    'value': 'Any other medical procedure you would have advised conducting for subscriber 08f08ba5?'},
   'type': 'text'}],
 'created_at': 1716883406,
 'incomplete_at': None,
 'incomplete_details': None,
 'metadata': {},
 'object': 'thread.message',
 'role': 'user',
 'run_id': None,
 'status': None,
 'thread_id': 'thread_5WM345ZMGmN2ET3Ag75bT4s3'}

In [20]:
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id, assistant_id=assistant.id
)


In [21]:
if run.status == 'completed':

  messages = list(client.beta.threads.messages.list(thread_id=thread.id, run_id=run.id))

  message_content = messages[0].content[0].text
  annotations = message_content.annotations
  citations = []
  for index, annotation in enumerate(annotations):
      message_content.value = message_content.value.replace(annotation.text, f"[{index}]")
      if file_citation := getattr(annotation, "file_citation", None):
          cited_file = client.files.retrieve(file_citation.file_id)
          citations.append(f"[{index}] {cited_file.filename}")

  print(message_content.value)
  print("\n".join(citations))

else:
  print(run.status)

Based on the information provided, subscriber ID: 08f08ba5 had an ultrasound procedure to examine the kidneys, ureter, and bladder. Depending on the results of the ultrasound and the clinical presentation of the patient, the following additional procedures might be advised:

1. **Urinalysis**: To check for infection, presence of blood, protein, and other abnormalities.
2. **Blood tests**: To evaluate kidney function (including serum creatinine and blood urea nitrogen levels) and overall metabolic function.
3. **CT scan or MRI**: If more detailed imaging is necessary to evaluate the structure and any abnormalities more precisely than what an ultrasound can provide.
4. **Cystoscopy**: For a direct visual inspection of the bladder and urethra if there are symptoms such as blood in the urine.
5. **Biopsy**: If there are any suspicious masses or lesions detected during the imaging that warrant further examination.
6. **Renal function tests**: To assess how well the kidneys are filtering was