## Vladimir Maksimov
This project proposes developing an assistant based on generative AI models that helps patients understand the relevant information in a medicine leaflet more simply and directly. The business idea is to take a photo of a medicine leaflet and use generative AI models to offer a summary, the key points, or an answer to a specific user question, based on the information in the image.

Objectives:
- Develop and justify the solution with respect to modeling.
- Evaluate the model to analyze its impact and possible biases or hallucinations.
- Develop the components needed both for model development and for inference in production.

### Work Plan:
1. Data preparation.
- There will be three datasets: one of medicine leaflets, one of patients' questions and answers, and one of images of written text.
- The data will be pre-processed and cleaned.
2. Model training. The idea is to train a model to first process the images and extract their text, then summarize that text following a detailed prompt, so that the patient receives the important information from the drug leaflet.
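
The work plan does not spell out the pre-processing step. As a minimal sketch, assuming the leaflet text arrives as raw extracted text with hard line breaks, cleaning could look like the following. The function name `clean_leaflet_text` and the two rules it applies (rejoining words hyphenated across a line break, collapsing whitespace) are illustrative assumptions, not the project's actual pipeline.

```python
import re


def clean_leaflet_text(raw_text: str) -> str:
    """Normalize raw leaflet text (hypothetical helper, not part of the project code)."""
    # Rejoin words hyphenated across a line break: "tab-\nlet" -> "tablet"
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", raw_text)
    # Treat blank lines as paragraph breaks
    paragraphs = re.split(r"\n\s*\n", text)
    # Within each paragraph, collapse runs of whitespace into single spaces
    cleaned = [re.sub(r"\s+", " ", p).strip() for p in paragraphs]
    # Drop paragraphs that are empty after cleaning
    return "\n\n".join(p for p in cleaned if p)
```

Note that the hyphen rule would also merge genuinely hyphenated terms that happen to break across lines, so it may need a dictionary check on real leaflet data.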

#### Processing the medicine leaflet images

In [1]:
!pip install openai
# os, json and base64 are part of the standard library and need no installation;
# python-dotenv provides the dotenv module imported below
!pip install python-dotenv

In [2]:
from openai import OpenAI
import json
import os
import base64

In [6]:
from dotenv import load_dotenv

load_dotenv()

True

In [13]:
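# Read the key that load_dotenv() placed in the environment
api_key = os.getenv('OPENAI_API_KEY')

In [14]:
# Pass the key explicitly so the client does not depend on implicit lookup
client = OpenAI(api_key=api_key)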

In [15]:
def load_json_schema(schema_file: str) -> dict:
    """Load a JSON Schema from a file and return it as a dictionary."""
    with open(schema_file, 'r', encoding='utf-8') as file:
        return json.load(file)

In [16]:
image_path = '/Users/vladimirmaksimov/Desktop/Python/AI Bootcamp/TFB/leaflets/2024-07-17 11.46.59.jpg'

In [21]:
# Load the JSON schema
leaflet_schema = load_json_schema('leaflet_schema.json')
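
The contents of `leaflet_schema.json` are not shown in this notebook. Purely for illustration, a schema of the following shape would be compatible with the prompt used in the request, which refers to a `leafletText` part. Every field other than `leafletText`, the name `example_leaflet_schema`, and the helper `missing_required_fields` are assumptions.

```python
import json

# Hypothetical schema: only 'leafletText' is referenced in this notebook;
# 'medicineName' and 'activeIngredients' are assumed for illustration.
example_leaflet_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "medicineName": {"type": "string"},
        "activeIngredients": {"type": "array", "items": {"type": "string"}},
        "leafletText": {"type": "string"},
    },
    "required": ["leafletText"],
}


def missing_required_fields(document: dict, schema: dict) -> list:
    """Return the schema's top-level required keys that are absent from document."""
    return [key for key in schema.get("required", []) if key not in document]


# The schema is embedded in the prompt as a JSON string
schema_as_text = json.dumps(example_leaflet_schema)
```

A check like `missing_required_fields` only looks at top-level keys; full validation would need a JSON Schema validator.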

In [22]:
# Open the local image file in binary mode
with open(image_path, 'rb') as image_file:
    image_base64 = base64.b64encode(image_file.read()).decode('utf-8')
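
The request in this notebook hard-codes `image/jpeg` in the data URL, which matches the `.jpg` file used here. If leaflets were photographed in other formats, a small helper could derive the MIME type from the file name. This is a sketch; `image_bytes_to_data_url` is a hypothetical name, not something the notebook defines.

```python
import base64
import mimetypes


def image_bytes_to_data_url(image_bytes: bytes, filename: str) -> str:
    """Build a base64 data URL, guessing the MIME type from the file name."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {filename!r}")
    # Base64-encode the raw bytes and decode to str for embedding in JSON
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string could then be used as the `url` value of the `image_url` content part.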

In [25]:
response = client.chat.completions.create(
    model='gpt-4o-mini',
    response_format={"type": "json_object"},
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Provide a JSON object that represents this document. In the 'leafletText' field, provide the full text of the leaflet. Use this JSON Schema: " +
                    json.dumps(leaflet_schema)},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{image_base64}"
                    }
                }
            ]
        }
    ],
    max_tokens=15000,
)

print(response.choices[0].message.content)
json_data = json.loads(response.choices[0].message.content)
filename_without_extension = os.path.splitext(os.path.basename(image_path))[0]
json_filename = f"{filename_without_extension}.json"
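
Calling `json.loads` directly on the reply fails with an unhelpful error if the model stopped at `max_tokens` and the JSON was cut off. A minimal sketch of a more defensive parse follows, assuming the reply's `finish_reason` is checked alongside its content; `parse_json_reply` is a hypothetical helper name.

```python
import json


def parse_json_reply(content, finish_reason):
    """Parse the model's reply, failing with a clear message if it is unusable."""
    if finish_reason == "length":
        # The reply hit the token limit, so the JSON is most likely cut off
        raise ValueError("Reply was truncated; raise max_tokens or shorten the input")
    if not content:
        raise ValueError("Reply contained no content")
    try:
        return json.loads(content)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Reply is not valid JSON: {exc}") from exc
```

With the response above, this could be called as `parse_json_reply(response.choices[0].message.content, response.choices[0].finish_reason)`.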



In [26]:
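# Write UTF-8 and keep accented characters readable instead of \uXXXX escapes
with open(json_filename, 'w', encoding='utf-8') as file:
    json.dump(json_data, file, indent=4, ensure_ascii=False)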

In [27]:
print(f"JSON data saved to {json_filename}")

JSON data saved to 2024-07-17 11.46.59.json
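
The work plan's next stage is to summarize the extracted text, or answer a patient's question, following a detailed prompt. The notebook does not show that prompt, so the sketch below is only one possible shape: the function name `build_summary_messages` and the prompt wording are assumptions, and only the `leafletText` field comes from the extraction step above.

```python
def build_summary_messages(leaflet: dict, question=None) -> list:
    """Build chat messages for summarizing a leaflet or answering a question about it."""
    leaflet_text = leaflet.get("leafletText", "").strip()
    if not leaflet_text:
        raise ValueError("The leaflet JSON has no 'leafletText' content")
    # Assumed instructions: restrict the model to the leaflet to limit hallucinations
    system_prompt = (
        "You help patients understand medicine leaflets. "
        "Use only the leaflet text provided by the user. "
        "If the leaflet does not contain the answer, say so."
    )
    if question:
        task = f"Answer this question from the patient: {question}"
    else:
        task = "Summarize the key points a patient should know, in plain language."
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"{task}\n\nLeaflet text:\n{leaflet_text}"},
    ]
```

The resulting list could be passed as `messages` to `client.chat.completions.create`, for example with `build_summary_messages(json_data)` for a summary or `build_summary_messages(json_data, question="...")` for a specific question.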
