# GenerativeAI4DS-I
## Lab. Medical Assistant


## What I hope you'll get out of this lab
* A sense of where to start when you need to consume OpenAI services.
* Familiarity with OpenAI's best practices for developing assistants.

In [27]:
!pip install openai



In [28]:
from openai import OpenAI
import os
import json
from IPython.display import display, HTML  # IPython.core.display is deprecated for these imports

In [29]:
def show_json(obj):
    display(json.loads(obj.model_dump_json()))

In [30]:
# We need this to load the files onto google colab
!git clone https://github.com/thousandoaks/GenerativeAI4DS-I.git

fatal: destination path 'GenerativeAI4DS-I' already exists and is not an empty directory.


# 1. You have to get your [OpenAI API Key](https://platform.openai.com/account/api-keys)

In [31]:
# Used by the agent in this tutorial
os.environ["OPENAI_API_KEY"] = "YOU-NEED-A-KEY"

In [32]:
client = OpenAI(
  api_key=os.environ['OPENAI_API_KEY'],  # this is also the default, it can be omitted
)
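Hardcoding the key in a notebook cell makes it easy to run the lab with the placeholder still in place. As a minimal sketch, the helper below (the name `resolve_api_key` and the `PLACEHOLDER` constant are illustrative, not part of the OpenAI SDK) reads the key from the environment and fails early if it is missing or still set to the placeholder used above.

```python
import os

PLACEHOLDER = "YOU-NEED-A-KEY"  # the placeholder string used earlier in this lab


def resolve_api_key(env=None):
    """Return OPENAI_API_KEY from env, or raise if it is unset or still the placeholder."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError("Set OPENAI_API_KEY to a real key before creating the client.")
    return key


# Demonstration with an explicit mapping instead of the real environment.
demo_key = resolve_api_key({"OPENAI_API_KEY": "sk-demo"})
print(demo_key)
```

With a real key in the environment, the client could then be created as `OpenAI(api_key=resolve_api_key())`.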

# 2. Medical Assistant
An Assistant represents an entity that can be configured to respond to a user's messages using several parameters like model, instructions, and tools.

In this lab we create a Medical Assistant that can inspect healthcare records and summarize medical events.




### 2.1. We create a new assistant with file search enabled

In [33]:
assistant = client.beta.assistants.create(
  name="Medical Analyst Assistant",
  instructions="You are a data scientist with experience in healthcare. Use your knowledge base to answer questions about medical reports.",
  model="gpt-4o",
  tools=[{"type": "file_search"}],
)

### 2.2. We upload healthcare records
To access your files, the file_search tool uses the Vector Store object. Upload your files and create a Vector Store to contain them. Once the Vector Store is created, poll its status until no file remains in the in_progress state, which ensures that all content has finished processing. The SDK provides helpers that upload and poll in one call.

In [34]:
# Create a vector store called "Healthcare Records"
vector_store = client.vector_stores.create(name="Healthcare Records")

# Ready the files for upload to OpenAI
file_paths = ["/content/GenerativeAI4DS-I/datasets/06b03cbb.txt","/content/GenerativeAI4DS-I/datasets/08f08ba5.txt"]
file_streams = [open(path, "rb") for path in file_paths]

# Use the upload and poll SDK helper to upload the files, add them to the vector store,
# and poll the status of the file batch for completion.
file_batch = client.vector_stores.file_batches.upload_and_poll(
  vector_store_id=vector_store.id, files=file_streams
)

# You can print the status and the file counts of the batch to see the result of this operation.
print(file_batch.status)
print(file_batch.file_counts)

completed
FileCounts(cancelled=0, completed=2, failed=0, in_progress=0, total=2)
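The upload-and-poll helper hides a polling loop. The sketch below illustrates the general idea only; it is not the SDK's implementation. The function `poll_until_done`, its parameters, and the set of terminal statuses are assumptions made for illustration, and a fake status source stands in for a real API call.

```python
import time

# Assumed terminal statuses for a file batch; anything else means "keep waiting".
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def poll_until_done(fetch_status, interval=1.0, timeout=60.0, sleep=time.sleep):
    """Call fetch_status() until it returns a terminal status or the timeout elapses."""
    waited = 0.0
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"still '{status}' after {timeout} seconds")
        sleep(interval)
        waited += interval


# Stand-in for a real status lookup: two pending checks, then completion.
_fake_statuses = iter(["in_progress", "in_progress", "completed"])
final_status = poll_until_done(lambda: next(_fake_statuses), interval=0.0)
print(final_status)
```

In real use, `fetch_status` would wrap a call that retrieves the batch and returns its `status` field; the timeout guards against a batch that never leaves `in_progress`.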


### 2.3 Update the assistant to use the new Vector Store
To make the files accessible to your assistant, update the assistant’s tool_resources with the new vector_store id.

In [35]:
assistant = client.beta.assistants.update(
  assistant_id=assistant.id,
  tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)

### 2.4. Create a thread

You can also attach files as Message attachments on your thread. Doing so will create another vector_store associated with the thread, or, if there is already a vector store attached to this thread, attach the new files to the existing thread vector store. When you create a Run on this thread, the file search tool will query both the vector_store from your assistant and the vector_store on the thread.
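A message-level attachment is expressed as an `attachments` list on the message. The sketch below builds such a message as a plain dictionary; the helper name `build_user_message` and the file id `file-abc123` are hypothetical, and this lab's own thread does not attach any file.

```python
def build_user_message(content, file_ids=()):
    """Build a user message dict, attaching each file id for the file_search tool."""
    message = {"role": "user", "content": content}
    if file_ids:
        message["attachments"] = [
            {"file_id": file_id, "tools": [{"type": "file_search"}]}
            for file_id in file_ids
        ]
    return message


# "file-abc123" is a made-up id; a real one comes from uploading a file first.
example_message = build_user_message(
    "Summarize the medical event in the attached record.",
    file_ids=["file-abc123"],
)
print(example_message)
```

A dictionary like this could be passed in the `messages` list when creating a thread, assuming the file was uploaded beforehand with the purpose set to `assistants`.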



In [36]:
# Create a thread with an initial user message (no file is attached here)
thread = client.beta.threads.create(
  messages=[
    {
      "role": "user",
      "content": "Summarize the medical event experienced by subscriber ID: 06b03cbb",

    }
  ]
)





### 2.5 Create a Run

Now, create a Run and observe that the model uses the File Search tool to provide a response to the user’s question.

In [37]:
# Use the create and poll SDK helper to create a run and poll the status of
# the run until it's in a terminal state.

run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id, assistant_id=assistant.id
)

messages = list(client.beta.threads.messages.list(thread_id=thread.id, run_id=run.id))

message_content = messages[0].content[0].text
annotations = message_content.annotations
citations = []
for index, annotation in enumerate(annotations):
    message_content.value = message_content.value.replace(annotation.text, f"[{index}]")
    if file_citation := getattr(annotation, "file_citation", None):
        cited_file = client.files.retrieve(file_citation.file_id)
        citations.append(f"[{index}] {cited_file.filename}")

print(message_content.value)
print("\n".join(citations))



The subscriber with ID 06b03cbb experienced a medical event related to Chronic Obstructive Pulmonary Disease (COPD). They received treatment at a hospital's emergency room on January 1st, 2019. The principal diagnosis during this visit was chronic obstructive pulmonary disease (coded as J44.1). The treatment involved various services including clinical diagnostics in bacteriology/microbiology, hematology, and chemistry; pulmonary function assessments; radiology diagnostics via chest X-ray; pharmacy services; respiratory services; and EKG/ECG. The event included multiple treatments and diagnostic services on the same day[0].
[0] 06b03cbb.txt
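The citation handling is repeated after every run in this lab, so it could be factored into one function. This is a minimal sketch under stated assumptions: `format_citations` and its `lookup_filename` parameter are illustrative names, the annotation objects are only assumed to expose `text` and an optional `file_citation.file_id`, and the marker string in the demo is made up. Unlike the cell above, it returns a new string instead of modifying the message in place.

```python
from types import SimpleNamespace


def format_citations(text, annotations, lookup_filename):
    """Replace each annotation marker with [index] and collect '[index] filename' lines."""
    citations = []
    for index, annotation in enumerate(annotations):
        text = text.replace(annotation.text, f"[{index}]")
        file_citation = getattr(annotation, "file_citation", None)
        if file_citation is not None:
            citations.append(f"[{index}] {lookup_filename(file_citation.file_id)}")
    return text, citations


# Stand-in annotation; the marker text and file id are invented for the demo.
demo_annotation = SimpleNamespace(
    text="<<cite-0>>",
    file_citation=SimpleNamespace(file_id="file-abc123"),
)
body, refs = format_citations(
    "Emergency room visit for COPD<<cite-0>>.",
    [demo_annotation],
    lambda file_id: "06b03cbb.txt",
)
print(body)
print("\n".join(refs))
```

In the notebook, `lookup_filename` could be something like `lambda file_id: client.files.retrieve(file_id).filename`, which keeps the API call out of the formatting logic.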


### 2.6 We add more messages to the same thread as needed

In [38]:
message2 = client.beta.threads.messages.create(
  thread_id=thread.id,
  role="user",
  content="Was it a serious medical event ?"
)

show_json(message2)



{'id': 'msg_2VPPYpYpJEk0jKWhGjpbv8gX',
 'assistant_id': None,
 'attachments': [],
 'completed_at': None,
 'content': [{'text': {'annotations': [],
    'value': 'Was it a serious medical event ?'},
   'type': 'text'}],
 'created_at': 1769585884,
 'incomplete_at': None,
 'incomplete_details': None,
 'metadata': {},
 'object': 'thread.message',
 'role': 'user',
 'run_id': None,
 'status': None,
 'thread_id': 'thread_vjvVCurCCIl94wTHK4K93dvy'}

In [39]:
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id, assistant_id=assistant.id
)

show_json(message2)



{'id': 'msg_2VPPYpYpJEk0jKWhGjpbv8gX',
 'assistant_id': None,
 'attachments': [],
 'completed_at': None,
 'content': [{'text': {'annotations': [],
    'value': 'Was it a serious medical event ?'},
   'type': 'text'}],
 'created_at': 1769585884,
 'incomplete_at': None,
 'incomplete_details': None,
 'metadata': {},
 'object': 'thread.message',
 'role': 'user',
 'run_id': None,
 'status': None,
 'thread_id': 'thread_vjvVCurCCIl94wTHK4K93dvy'}

In [40]:
if run.status == 'completed':

  messages = list(client.beta.threads.messages.list(thread_id=thread.id, run_id=run.id))

  message_content = messages[0].content[0].text
  annotations = message_content.annotations
  citations = []
  for index, annotation in enumerate(annotations):
      message_content.value = message_content.value.replace(annotation.text, f"[{index}]")
      if file_citation := getattr(annotation, "file_citation", None):
          cited_file = client.files.retrieve(file_citation.file_id)
          citations.append(f"[{index}] {cited_file.filename}")

  print(message_content.value)
  print("\n".join(citations))

else:
  print(run.status)



The medical event experienced by the subscriber with ID 06b03cbb involved a visit to the emergency room for treatment related to Chronic Obstructive Pulmonary Disease (COPD), which indicates that it was considered urgent or severe enough to require immediate medical attention. COPD exacerbations or acute episodes often require urgent care due to the potential for significant breathing difficulties. Thus, while the document does not explicitly label the event as "serious," the necessity of emergency room treatment for COPD generally suggests a serious medical event[0].
[0] 06b03cbb.txt


### 2.7 We add more messages to the same thread as needed

In [41]:
message3 = client.beta.threads.messages.create(
  thread_id=thread.id,
  role="user",
  content="What happened to subscriber ID 08f08ba5?"
)

show_json(message3)



{'id': 'msg_1grTQpFPVMgyzjckssbEA78l',
 'assistant_id': None,
 'attachments': [],
 'completed_at': None,
 'content': [{'text': {'annotations': [],
    'value': 'What happened to subscriber ID 08f08ba5?'},
   'type': 'text'}],
 'created_at': 1769585889,
 'incomplete_at': None,
 'incomplete_details': None,
 'metadata': {},
 'object': 'thread.message',
 'role': 'user',
 'run_id': None,
 'status': None,
 'thread_id': 'thread_vjvVCurCCIl94wTHK4K93dvy'}

In [42]:
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id, assistant_id=assistant.id
)

show_json(message3)



{'id': 'msg_1grTQpFPVMgyzjckssbEA78l',
 'assistant_id': None,
 'attachments': [],
 'completed_at': None,
 'content': [{'text': {'annotations': [],
    'value': 'What happened to subscriber ID 08f08ba5?'},
   'type': 'text'}],
 'created_at': 1769585889,
 'incomplete_at': None,
 'incomplete_details': None,
 'metadata': {},
 'object': 'thread.message',
 'role': 'user',
 'run_id': None,
 'status': None,
 'thread_id': 'thread_vjvVCurCCIl94wTHK4K93dvy'}

In [43]:
if run.status == 'completed':

  messages = list(client.beta.threads.messages.list(thread_id=thread.id, run_id=run.id))

  message_content = messages[0].content[0].text
  annotations = message_content.annotations
  citations = []
  for index, annotation in enumerate(annotations):
      message_content.value = message_content.value.replace(annotation.text, f"[{index}]")
      if file_citation := getattr(annotation, "file_citation", None):
          cited_file = client.files.retrieve(file_citation.file_id)
          citations.append(f"[{index}] {cited_file.filename}")

  print(message_content.value)
  print("\n".join(citations))

else:
  print(run.status)



The subscriber with ID 08f08ba5 experienced a medical event involving unspecified chest pain. This individual was treated at a hospital in the emergency room from January 1st to January 3rd, 2017. The principal diagnosis was recorded as chest pain unspecified (coded as R07.9). During this period, the patient received several services including EKG/ECG, various laboratory tests in hematology and chemistry, other therapeutic services, respiratory services, radiology diagnostic tests via chest X-ray, and treatments in an observation room. There were also specific pharmacological and medical/surgical supplies noted as part of the treatment[0].
[0] 08f08ba5.txt


In [44]:
message4 = client.beta.threads.messages.create(
  thread_id=thread.id,
  role="user",
  content="Any other medical procedure you would have advised conducting for subscriber 08f08ba5?"
)

show_json(message4)



{'id': 'msg_9n76CN1AEBeeMTIngcxsGKMb',
 'assistant_id': None,
 'attachments': [],
 'completed_at': None,
 'content': [{'text': {'annotations': [],
    'value': 'Any other medical procedure you would have advised conducting for subscriber 08f08ba5?'},
   'type': 'text'}],
 'created_at': 1769585900,
 'incomplete_at': None,
 'incomplete_details': None,
 'metadata': {},
 'object': 'thread.message',
 'role': 'user',
 'run_id': None,
 'status': None,
 'thread_id': 'thread_vjvVCurCCIl94wTHK4K93dvy'}

In [45]:
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id, assistant_id=assistant.id
)

show_json(message4)



{'id': 'msg_9n76CN1AEBeeMTIngcxsGKMb',
 'assistant_id': None,
 'attachments': [],
 'completed_at': None,
 'content': [{'text': {'annotations': [],
    'value': 'Any other medical procedure you would have advised conducting for subscriber 08f08ba5?'},
   'type': 'text'}],
 'created_at': 1769585900,
 'incomplete_at': None,
 'incomplete_details': None,
 'metadata': {},
 'object': 'thread.message',
 'role': 'user',
 'run_id': None,
 'status': None,
 'thread_id': 'thread_vjvVCurCCIl94wTHK4K93dvy'}

In [46]:
if run.status == 'completed':

  messages = list(client.beta.threads.messages.list(thread_id=thread.id, run_id=run.id))

  message_content = messages[0].content[0].text
  annotations = message_content.annotations
  citations = []
  for index, annotation in enumerate(annotations):
      message_content.value = message_content.value.replace(annotation.text, f"[{index}]")
      if file_citation := getattr(annotation, "file_citation", None):
          cited_file = client.files.retrieve(file_citation.file_id)
          citations.append(f"[{index}] {cited_file.filename}")

  print(message_content.value)
  print("\n".join(citations))

else:
  print(run.status)

In cases of unspecified chest pain, especially when treated in an emergency room, it is crucial to rule out life-threatening conditions such as myocardial infarction (heart attack) or pulmonary embolism. The procedures already mentioned—like the EKG/ECG and chest X-ray—are standard for initial assessment. However, additional procedures that could be considered for a comprehensive evaluation include:

1. **Cardiac Enzymes Test**: To check for markers of heart damage, such as troponins, which can indicate a heart attack.

2. **Stress Testing**: To evaluate how the heart performs under stress, if the initial cardiac workup is inconclusive and the patient's condition allows for it.

3. **CT Angiography**: To visualize the coronary arteries and assess for blockages, particularly if a pulmonary embolism is suspected.

4. **Echocardiogram**: To assess the function and structures of the heart in more detail.

5. **D-Dimer Test**: Especially if a pulmonary embolism is suspected, this test can h



### 2.8 We delete the vector store

In [47]:
target_name = "Healthcare Records"
deleted_count = 0

print(f"Searching for all vector stores named '{target_name}'...")

# 1. List all vector stores (this iterator automatically handles pagination)
vector_stores = client.vector_stores.list()

# 2. Iterate and delete matches
for store in vector_stores:
    if store.name == target_name:
        try:
            client.vector_stores.delete(vector_store_id=store.id)
            print(f"Deleted: {store.id}")
            deleted_count += 1
        except Exception as e:
            print(f"Failed to delete {store.id}: {e}")

if deleted_count == 0:
    print(f"No vector stores found with the name '{target_name}'.")
else:
    print("---")
    print(f"Total deleted: {deleted_count}")

Searching for all vector stores named 'Healthcare Records'...
Deleted: vs_6979bcca89b48191aa59e8264d1db55e
---
Total deleted: 1
