# virtual-lab Implementation -----
## Objective: 
Create a method and implementation for software that supports clinician-facing LLM interpretability in the context of automatic electronic phenotyping with an LLM.

## Steps (edit later)
1. Team selection: An individual meeting with the PI to define a set of scientist agents to work on the project.
2. Project specification: A team meeting to specify the project direction by deciding on key high-level details.
3. Tools selection: A team meeting to brainstorm machine learning and/or computational tools for LLM interpretability design.
4. Tools implementation: A series of individual meetings with different scientist agents to implement their components individually. 
5. Workflow design: An individual meeting with the PI to determine the workflow for applying the tool implementations.
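
Each phase in this notebook follows the same pattern: several independent discussions are run, then a single merge meeting combines their summaries. The sketch below illustrates that pattern with stand-in callables instead of real model calls. The names `run_phase`, `fake_discussion`, and the returned `failed` list are hypothetical and are not part of `virtual_lab`.

```python
from typing import Callable, List, Tuple


def run_phase(
    run_discussion: Callable[[str], str],
    merge: Callable[[List[str]], str],
    num_iterations: int,
) -> Tuple[str, List[str]]:
    """Run several independent discussions, then merge the ones that succeeded."""
    summaries: List[str] = []
    failed: List[str] = []
    for i in range(num_iterations):
        save_name = f"discussion_{i + 1}"
        try:
            summaries.append(run_discussion(save_name))
        except Exception as exc:  # keep going so one failure does not stop the phase
            print(f"Meeting {save_name} failed: {exc}")
            failed.append(save_name)
    if not summaries:
        raise RuntimeError("No discussion finished, so there is nothing to merge")
    return merge(summaries), failed


# Stand-in callables so the sketch runs without calling a model.
def fake_discussion(save_name: str) -> str:
    if save_name == "discussion_2":
        raise ValueError("simulated API error")
    return f"summary of {save_name}"


merged, failed = run_phase(fake_discussion, lambda s: " | ".join(s), num_iterations=3)
print(merged)
print(failed)
```

Returning the names of failed meetings makes it easy to notice when a merge was built from fewer summaries than intended.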

<img src="images/steps.png" style="display: block; margin: auto;" width="500">

## Imports -----

In [6]:
import json
from pathlib import Path

from virtual_lab.constants import CONSISTENT_TEMPERATURE, CREATIVE_TEMPERATURE
from virtual_lab.prompts import (
    CODING_RULES,
    REWRITE_PROMPT,
    create_merge_prompt,
)
from virtual_lab.run_meeting import run_meeting
from virtual_lab.utils import load_summaries

from interpretability_constants import (
    background_prompt,
    project_specific_prompt,
    num_iterations,
    num_rounds,
    discussions_phase_to_dir,
    principal_investigator,
    team_members,
)
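
The `interpretability_constants` module itself is not shown in this notebook. As a rough guide, the sketch below reconstructs part of it from what the cell outputs reveal. The values of `num_iterations` and `num_rounds`, the `discussions_dir` name, and the directory layout are assumptions inferred from the logs, and the agent objects (`principal_investigator`, `team_members`) and `project_specific_prompt` are left out because their definitions cannot be recovered from the output.

```python
from pathlib import Path

# Taken from the merge agenda echoed in this notebook's output.
background_prompt = (
    "You are working on a research project to use software techniques to develop "
    "a tool for large language model (LLM) interpretability in the context of "
    "creating electronic phenotype definitions, ideally, also incorporating a "
    "visual perspective to foster trust between the clinician and the model."
)

# Assumed values: the logs show five discussions per phase and a
# "Rounds (+ Final Round)" progress bar of 4 steps for team meetings.
num_iterations = 5
num_rounds = 3

# Assumed layout: one sub-directory per phase under discussions/.
discussions_dir = Path("discussions")
phases = ("team_selection", "project_specification", "tools_selection")
discussions_phase_to_dir = {phase: discussions_dir / phase for phase in phases}

print(discussions_phase_to_dir["project_specification"])
```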

## Team Selection -----

In [None]:
# Team selection - prompts
team_selection_agenda = f"""{background_prompt} You need to select a team of three scientists to help you with this project. Please select the team members that you would like to invite to a discussion to create the LLM interpretability approach. Please list the team members in the following format, using the team member below as an example. You should not include yourself (Principal Investigator) in the list.

Agent(
    title="Principal Investigator",
    expertise="applying artificial intelligence to biomedical research",
    goal="perform research in your area of expertise that maximizes the scientific impact of the work",
    role="lead a team of experts to solve an important problem in artificial intelligence for biomedicine, make key decisions about the project direction based on team member input, and manage the project timeline and resources",
)
"""

# Team selection - discussion
for iteration_num in range(num_iterations):
    save_name = f"discussion_{iteration_num + 1}"
    try:
        print(f"🟡 Starting meeting {save_name}")
        run_meeting(
            meeting_type="individual",
            team_member=principal_investigator,
            agenda=team_selection_agenda,
            save_dir=discussions_phase_to_dir["team_selection"],
            save_name=save_name,
            temperature=CREATIVE_TEMPERATURE,
        )
        print(f"✅ Finished meeting {save_name}")
    except Exception as e:
        print(f"❌ Meeting {save_name} failed with error: {e}")

# Team selection - merge
team_selection_summaries = load_summaries(
    discussion_paths=list(discussions_phase_to_dir["team_selection"].glob("discussion_*.json")))
print(f"Number of summaries: {len(team_selection_summaries)}")

team_selection_merge_prompt = create_merge_prompt(agenda=team_selection_agenda)

run_meeting(
    meeting_type="individual",
    team_member=principal_investigator,
    summaries=team_selection_summaries,
    agenda=team_selection_merge_prompt,
    save_dir=discussions_phase_to_dir["team_selection"],
    save_name="merged",
    temperature=CONSISTENT_TEMPERATURE,
)

# Show merged meeting output for team_selection
from IPython.display import Markdown, display

with open("discussions/team_selection/merged.md", "r") as f:
    content = f.read()

display(Markdown(content))
# Note: The merged team members were copied manually into interpretability_constants.py.
# Note: After that, re-run the import (or reload the module) so team_members reflects the chosen team.

🟡 Starting meeting discussion_1
DEBUGGING: Individual meeting members = [Principal Investigator, Scientific Critic]


Team:   0%|          | 0/2 [00:14<?, ?it/s]1 [00:00<?, ?it/s]
Rounds (+ Final Round): 100%|██████████| 1/1 [00:14<00:00, 14.13s/it]


Input token count: 245
Output token count: 271
Tool token count: 0
Max token length: 516
Cost: $0.00
Time: 0:15
✅ Finished meeting discussion_1
🟡 Starting meeting discussion_2
DEBUGGING: Individual meeting members = [Principal Investigator, Scientific Critic]


Team:   0%|          | 0/2 [00:08<?, ?it/s]1 [00:00<?, ?it/s]
Rounds (+ Final Round): 100%|██████████| 1/1 [00:08<00:00,  8.66s/it]


Input token count: 245
Output token count: 268
Tool token count: 0
Max token length: 513
Cost: $0.00
Time: 0:11
✅ Finished meeting discussion_2
🟡 Starting meeting discussion_3
DEBUGGING: Individual meeting members = [Principal Investigator, Scientific Critic]


Team:   0%|          | 0/2 [00:13<?, ?it/s]1 [00:00<?, ?it/s]
Rounds (+ Final Round): 100%|██████████| 1/1 [00:13<00:00, 13.59s/it]


Input token count: 245
Output token count: 234
Tool token count: 0
Max token length: 479
Cost: $0.00
Time: 0:14
✅ Finished meeting discussion_3
🟡 Starting meeting discussion_4
DEBUGGING: Individual meeting members = [Principal Investigator, Scientific Critic]


Team:   0%|          | 0/2 [00:08<?, ?it/s]1 [00:00<?, ?it/s]
Rounds (+ Final Round): 100%|██████████| 1/1 [00:08<00:00,  8.56s/it]


Input token count: 245
Output token count: 289
Tool token count: 0
Max token length: 534
Cost: $0.00
Time: 0:10
✅ Finished meeting discussion_4
🟡 Starting meeting discussion_5
DEBUGGING: Individual meeting members = [Principal Investigator, Scientific Critic]


Team:   0%|          | 0/2 [00:11<?, ?it/s]1 [00:00<?, ?it/s]
Rounds (+ Final Round): 100%|██████████| 1/1 [00:11<00:00, 11.02s/it]


Input token count: 245
Output token count: 288
Tool token count: 0
Max token length: 533
Cost: $0.00
Time: 0:12
✅ Finished meeting discussion_5
Number of summaries: 5
DEBUGGING: Individual meeting members = [Principal Investigator, Scientific Critic]


Team:   0%|          | 0/2 [00:09<?, ?it/s]1 [00:00<?, ?it/s]
Rounds (+ Final Round): 100%|██████████| 1/1 [00:09<00:00,  9.64s/it]

Input token count: 1,745
Output token count: 525
Tool token count: 0
Max token length: 2,270
Cost: $0.01
Time: 0:11





## User

This is the beginning of an individual meeting with Principal Investigator to discuss your research project.

Here are summaries of the previous meetings:

[begin summary 1]

To address the challenge of developing a tool for LLM interpretability in the context of creating electronic phenotype definitions, it is crucial to assemble a team with diverse expertise in AI, biomedical informatics, and visualization. Here are the proposed team members:

1. **Agent(
    title="Computational Linguist",
    expertise="natural language processing and large language models",
    goal="develop methodologies for interpreting LLM outputs and ensuring their relevance in biomedical contexts",
    role="design and refine algorithms for LLM interpretability specific to electronic phenotype definitions"
)**
   
2. **Agent(
    title="Biomedical Informatics Specialist",
    expertise="electronic health records and phenotype extraction",
    goal="align LLM outputs with clinical needs and validate their applicability and usefulness",
    role="bridge the gap between LLM interpretations and real-world clinical applications by defining relevant phenotype criteria"
)**
   
3. **Agent(
    title="Data Visualization Expert",
    expertise="visual analytics and human-computer interaction",
    goal="create visual tools that foster trust and understanding between clinicians and AI outputs",
    role="develop interactive visualizations to represent LLM decisions and rationale clearly to clinical users"
)**

This team composition ensures comprehensive coverage of the key areas necessary for the success of this project. Each member will bring a unique perspective and skill set to address the multifaceted challenges of LLM interpretability and its application in a clinical setting.

[end summary 1]

[begin summary 2]

To effectively develop a tool for LLM interpretability in the context of creating electronic phenotype definitions, it is crucial to assemble a team with diverse expertise that covers both technical and domain-specific knowledge. Here's the proposed team:

```python
Agent(
    title="Machine Learning Engineer",
    expertise="developing and optimizing large language models and interpretability techniques",
    goal="contribute expertise in building and refining the LLM interpretability framework",
    role="design algorithms and methods to enhance interpretability and ensure the models are transparent and reliable for clinical use",
)

Agent(
    title="Clinical Informatics Specialist",
    expertise="understanding electronic health records (EHRs) and phenotype definitions",
    goal="ensure that the interpretability tool aligns with clinical needs and accurately represents phenotype information",
    role="provide insights into clinical data structures and facilitate the integration of LLM outputs with EHR systems",
)

Agent(
    title="Data Visualization Expert",
    expertise="creating visual representations of complex data to enhance understanding",
    goal="design visual interfaces that enhance the interpretability tool's usability and foster trust between clinicians and the model",
    role="develop visualizations that clearly communicate the LLM's decision-making process and output",
)
```

These team members will collaborate to create a robust and reliable LLM interpretability tool that bridges the gap between AI models and clinical practice, ultimately fostering trust and aiding in the accurate definition of phenotypes.

[end summary 2]

[begin summary 3]

To effectively tackle the challenge of LLM interpretability in the context of electronic phenotype definitions, it is crucial to assemble a diverse team with expertise in relevant areas. Here is my proposed team:

Agent(
    title="Data Scientist",
    expertise="natural language processing and large language model development",
    goal="enhance model interpretability through advanced NLP techniques",
    role="develop algorithms and methods to interpret and visualize LLM decisions in clinical phenotyping"
)

Agent(
    title="Clinical Informatics Specialist",
    expertise="clinical data management and electronic health records",
    goal="ensure clinical relevance and accuracy in phenotype definitions",
    role="provide insights into clinical data and collaborate on aligning model outputs with clinical needs"
)

Agent(
    title="Visualization Expert",
    expertise="data visualization and user interface design",
    goal="foster trust with clinicians through intuitive visual representations",
    role="design and implement visual tools to help clinicians understand model decisions and phenotypes"
)

I believe that this team, with expertise spanning AI, clinical informatics, and visualization, is well-suited to develop a comprehensive tool that enhances LLM interpretability and trust in clinical applications.

[end summary 3]

[begin summary 4]

To effectively develop a tool for large language model (LLM) interpretability in the context of creating electronic phenotype definitions, incorporating a visual perspective, I propose assembling a diverse team with expertise in relevant areas. Here are the recommended team members:

```python
Agent(
    title="Computational Linguist",
    expertise="natural language processing, language model interpretability",
    goal="develop methods to enhance the interpretability of language models in biomedical applications",
    role="design and implement interpretability techniques for LLMs, focusing on understanding and explaining model outputs"
)

Agent(
    title="Bioinformatician",
    expertise="electronic health records, phenotype extraction",
    goal="ensure accurate extraction and definition of phenotypes from biomedical data",
    role="guide the integration of electronic health record data with LLMs, and validate phenotype definitions"
)

Agent(
    title="Data Visualization Specialist",
    expertise="data visualization, human-computer interaction",
    goal="create intuitive visualizations to foster trust and understanding between clinicians and AI models",
    role="develop visual tools to represent LLM decision-making processes and outputs in a clinician-friendly manner"
)
```

These team members will bring a balance of skills necessary to address the technical, biomedical, and usability aspects of the project, ensuring a comprehensive approach to developing the desired tool.

[end summary 4]

[begin summary 5]

To develop a tool for large language model (LLM) interpretability in the context of creating electronic phenotype definitions, we will need a team with diverse expertise in artificial intelligence, natural language processing, biomedical informatics, and visualization techniques. Here are the team members I would like to invite to the discussion:

```python
Agent(
    title="Data Scientist",
    expertise="natural language processing and machine learning",
    goal="develop and implement NLP algorithms to improve LLM interpretability in biomedical contexts",
    role="design LLM models and enhance their interpretability through innovative techniques",
)

Agent(
    title="Clinical Informatics Specialist",
    expertise="clinical data interpretation and electronic health records",
    goal="ensure the interpretability tool aligns with clinical needs and accurately represents phenotype definitions",
    role="provide insights into clinical requirements and validate the interpretability approach from a healthcare perspective",
)

Agent(
    title="Visualization Expert",
    expertise="data visualization and user interface design",
    goal="create visual tools that improve the interpretability and trust of LLM outputs for clinicians",
    role="design and implement visualization strategies that effectively communicate model decisions to end-users",
)
```

These experts will collaborate to create a comprehensive approach to LLM interpretability, incorporating both technical and clinical insights, and ensuring the tool is both functional and user-friendly for clinicians.

[end summary 5]

Here is the agenda for the meeting:

Please read the summaries of multiple separate meetings about the same agenda. Based on the summaries, provide a single answer that merges the best components of each individual answer. Please use the same format as the individual answers. Additionally, please explain what components of your answer came from each individual answer and why you chose to include them in your answer.

As a reference, here is the agenda from those meetings, which must be addressed here as well:

You are working on a research project to use software techniques to develop a tool for large language model (LLM) interpretability in the context of creating electronic phenotype definitions, ideally, also incorporating a visual perspective to foster trust between the clinician and the model. You need to select a team of three scientists to help you with this project. Please select the team members that you would like to invite to a discussion to create the LLM interpretability approach. Please list the team members in the following format, using the team member below as an example. You should not include yourself (Principal Investigator) in the list.

Agent(
    title="Principal Investigator",
    expertise="applying artificial intelligence to biomedical research",
    goal="perform research in your area of expertise that maximizes the scientific impact of the work",
    role="lead a team of experts to solve an important problem in artificial intelligence for biomedicine, make key decisions about the project direction based on team member input, and manage the project timeline and resources",
)




Principal Investigator, please provide your response to the agenda.

## Principal Investigator

Based on the summaries provided, I have synthesized the best components from each meeting to form a cohesive team for developing a tool for LLM interpretability in the context of electronic phenotype definitions. Here is the proposed team:

```python
Agent(
    title="Computational Linguist",
    expertise="natural language processing and large language models",
    goal="develop methodologies for interpreting LLM outputs and ensuring their relevance in biomedical contexts",
    role="design and refine algorithms for LLM interpretability specific to electronic phenotype definitions",
)

Agent(
    title="Clinical Informatics Specialist",
    expertise="electronic health records and phenotype extraction",
    goal="align LLM outputs with clinical needs and validate their applicability and usefulness",
    role="bridge the gap between LLM interpretations and real-world clinical applications by defining relevant phenotype criteria",
)

Agent(
    title="Data Visualization Expert",
    expertise="visual analytics and human-computer interaction",
    goal="create visual tools that foster trust and understanding between clinicians and AI outputs",
    role="develop interactive visualizations to represent LLM decisions and rationale clearly to clinical users",
)
```

### Explanation of Components:

1. **Computational Linguist**: 
   - **Source**: Summaries 1 and 4
   - **Reason**: The role of a Computational Linguist is crucial for developing methodologies to interpret LLM outputs, which is a core aspect of the project. This role was consistently highlighted in summaries 1 and 4, emphasizing the need for expertise in NLP and LLM interpretability.

2. **Clinical Informatics Specialist**:
   - **Source**: Summaries 2, 3, and 5
   - **Reason**: This role ensures that the interpretability tool aligns with clinical needs and accurately represents phenotype definitions. The consistent mention across summaries 2, 3, and 5 underscores the importance of integrating clinical insights to validate the tool's applicability in real-world settings.

3. **Data Visualization Expert**:
   - **Source**: All summaries
   - **Reason**: The inclusion of a Data Visualization Expert is essential for creating visual tools that enhance understanding and trust. This role was universally acknowledged across all summaries, highlighting the need for intuitive visual representations to communicate model decisions effectively to clinicians.

By combining these roles, the team will have a balanced mix of expertise in NLP, clinical informatics, and visualization, ensuring a comprehensive approach to developing the LLM interpretability tool. This team composition addresses the technical, biomedical, and usability aspects of the project, aligning with the goal of fostering trust between clinicians and AI models.
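
To cross-check the sources cited in the merged answer, the titles from the five summaries can be tallied. This is only a sketch: the titles were copied by hand from the summaries above, and `summaries_with` is a hypothetical helper that matches exact titles only.

```python
from collections import Counter

# Titles proposed in the five summaries above, copied by hand.
proposed_titles = {
    1: ["Computational Linguist", "Biomedical Informatics Specialist", "Data Visualization Expert"],
    2: ["Machine Learning Engineer", "Clinical Informatics Specialist", "Data Visualization Expert"],
    3: ["Data Scientist", "Clinical Informatics Specialist", "Visualization Expert"],
    4: ["Computational Linguist", "Bioinformatician", "Data Visualization Specialist"],
    5: ["Data Scientist", "Clinical Informatics Specialist", "Visualization Expert"],
}

# Count how often each exact title was proposed.
title_counts = Counter(title for titles in proposed_titles.values() for title in titles)
for title, count in title_counts.most_common():
    print(f"{count} x {title}")


def summaries_with(title: str) -> list:
    """Return the summary numbers that propose this exact title."""
    return [n for n, titles in proposed_titles.items() if title in titles]


print(summaries_with("Computational Linguist"))
print(summaries_with("Data Visualization Expert"))
```

Exact matching confirms the sources given for the first two roles. For the visualization role, the exact title appears only in summaries 1 and 2, while the other summaries use variants of it, so "All summaries" holds only when those variants are counted as the same role.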



In [13]:
import importlib
import interpretability_constants

# RELOAD the whole module
importlib.reload(interpretability_constants)

# THEN re-import what you need from it
from interpretability_constants import team_members

print("Reloaded team_members:", team_members)


Reloaded team_members: (Computational Linguist, Clinical Informatics Specialist, Data Visualization Expert, Scientific Critic)
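
Copying the chosen agents into `interpretability_constants.py` by hand is error-prone. One possible alternative, sketched below, reads the `Agent(...)` calls out of the fenced Python blocks in a merged markdown file using the standard `ast` module. The helper `extract_agent_specs` is hypothetical, it returns plain dictionaries rather than `Agent` objects, and it only handles answers that put the agents in a fenced `python` block (some of the summaries above do not).

```python
import ast
import re

FENCE = "`" * 3  # built this way so the example does not nest code fences


def extract_agent_specs(markdown_text: str) -> list:
    """Return the keyword arguments of each top-level Agent(...) call in fenced python blocks."""
    specs = []
    pattern = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)
    for block in pattern.findall(markdown_text):
        for stmt in ast.parse(block).body:
            call = getattr(stmt, "value", None)
            if (
                isinstance(stmt, ast.Expr)
                and isinstance(call, ast.Call)
                and isinstance(call.func, ast.Name)
                and call.func.id == "Agent"
            ):
                # literal_eval only accepts literals, so no code from the file is executed
                specs.append(
                    {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords if kw.arg}
                )
    return specs


# A shortened sample in the same shape as the merged answer.
sample = "\n".join(
    [
        "Here is the proposed team:",
        "",
        FENCE + "python",
        "Agent(",
        '    title="Computational Linguist",',
        '    expertise="natural language processing and large language models",',
        ")",
        FENCE,
    ]
)

specs = extract_agent_specs(sample)
print(specs[0]["title"])
```

The resulting dictionaries could then be reviewed and passed to whatever agent constructor the project uses, which keeps a human check in the loop while avoiding retyping.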


## Project Specification -----

In [None]:
import os
import glob

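def clear_discussion_files(save_dir: Path) -> None:
    """Delete discussion_*.json and discussion_*.md in save_dir so stale runs are not merged."""
    # Only per-discussion files match these patterns; merged.* files are left in place.
    json_files = glob.glob(str(save_dir / "discussion_*.json"))
    md_files = glob.glob(str(save_dir / "discussion_*.md"))
    for f in json_files + md_files:
        os.remove(f)
    print(f"Cleared {len(json_files) + len(md_files)} discussion files from {save_dir}")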

clear_discussion_files(discussions_phase_to_dir["project_specification"])

# Project specification - prompts
project_specification_agenda = f"{background_prompt} Please create a software design approach to solve this problem. Decide whether you will take a machine learning approach or not. For your choice, decide whether you will use open source interpretability libraries from GitHub or create your own completely from scratch. If modifying existing libraries, please specify which interpretability libraries to build upon to create an interpretability tool that conveys interpretability information visually so the clinician can trust it with ease. If designing algorithms from scratch, please describe how you propose new algorithms. Consider methods for eliminating LLM hallucinations using RAG or similar, increasing faithfulness and reasoning, and promoting valid chain of thought logic using the SNOMED Database, which we have access to."

project_specification_questions = (
    "Will you take a machine learning approach or not?",
    "Will you use open source interpretability libraries from GitHub or create your own completely from scratch (choose only one)?",
    "If modifying existing libraries, which interpretability libraries will you build upon (please list 3-4)?",
    "If designing algorithms from scratch, how exactly will you propose new algorithms?",
    "How will the interpretability tool use methods for eliminating LLM hallucinations, increasing faithfulness and reasoning, and promoting valid chain of thought logic using the SNOMED Database, which we have access to?",
) 

# Project specification - discussion
for iteration_num in range(num_iterations):
    save_name = f"discussion_{iteration_num + 1}"
    try:
        print(f"🟡 Starting meeting {save_name}")
        run_meeting(
            meeting_type="team",
            team_lead=principal_investigator,
            team_members=team_members,
            agenda=project_specification_agenda,
            agenda_questions=project_specification_questions,
            save_dir=discussions_phase_to_dir["project_specification"],
            save_name=save_name,
            temperature=CREATIVE_TEMPERATURE,
            num_rounds=num_rounds,
        )
        print(f"✅ Finished meeting {save_name}")
    except Exception as e:
        print(f"❌ Meeting {save_name} failed with error: {e}")


# Project specification - merge
project_specification_summaries = load_summaries(
    discussion_paths=list(discussions_phase_to_dir["project_specification"].glob("discussion_*.json")))
print(f"Number of summaries: {len(project_specification_summaries)}")

project_specification_merge_prompt = create_merge_prompt(
    agenda=project_specification_agenda,
    agenda_questions=project_specification_questions,
)

run_meeting(
    meeting_type="individual",
    team_member=principal_investigator,
    summaries=project_specification_summaries,
    agenda=project_specification_merge_prompt,
    save_dir=discussions_phase_to_dir["project_specification"],
    save_name="merged",
    temperature=CONSISTENT_TEMPERATURE,
    num_rounds=num_rounds,
)

# Show merged meeting output for project_specification
from IPython.display import Markdown, display

with open("discussions/project_specification/merged.md", "r") as f:
    content = f.read()

display(Markdown(content))

Cleared 10 discussion files from discussions/project_specification
🟡 Starting meeting discussion_1
DEBUGGING: Entering a team meeting...
the team lead is:
Principal Investigator
and the team members are:
(Computational Linguist, Clinical Informatics Specialist, Data Visualization Expert, Scientific Critic)

DEBUGGING: Team meeting members = [Principal Investigator, Computational Linguist, Clinical Informatics Specialist, Data Visualization Expert, Scientific Critic]


Team: 100%|██████████| 5/5 [00:57<00:00, 11.55s/it]<?, ?it/s]
Team: 100%|██████████| 5/5 [00:53<00:00, 10.67s/it]<02:53, 57.76s/it]
Team: 100%|██████████| 5/5 [00:52<00:00, 10.42s/it]<01:50, 55.17s/it]
Team:   0%|          | 0/5 [00:14<?, ?it/s]4 [02:43<00:53, 53.78s/it]
Rounds (+ Final Round): 100%|██████████| 4/4 [02:57<00:00, 44.45s/it]


Input token count: 58,604
Output token count: 6,087
Tool token count: 0
Max token length: 8,039
Cost: $0.21
Time: 3:00
✅ Finished meeting discussion_1
🟡 Starting meeting discussion_2
DEBUGGING: Entering a team meeting...
the team lead is:
Principal Investigator
and the team members are:
(Computational Linguist, Clinical Informatics Specialist, Data Visualization Expert, Scientific Critic)

DEBUGGING: Team meeting members = [Principal Investigator, Computational Linguist, Clinical Informatics Specialist, Data Visualization Expert, Scientific Critic]


Team: 100%|██████████| 5/5 [01:05<00:00, 13.10s/it]<?, ?it/s]
Team: 100%|██████████| 5/5 [00:56<00:00, 11.24s/it]<03:16, 65.52s/it]
Team: 100%|██████████| 5/5 [00:49<00:00,  9.84s/it]<02:00, 60.05s/it]
Team:   0%|          | 0/5 [00:14<?, ?it/s]4 [02:50<00:55, 55.10s/it]
Rounds (+ Final Round): 100%|██████████| 4/4 [03:05<00:00, 46.46s/it]


Input token count: 64,175
Output token count: 7,093
Tool token count: 0
Max token length: 9,045
Cost: $0.23
Time: 3:09
✅ Finished meeting discussion_2
🟡 Starting meeting discussion_3
DEBUGGING: Entering a team meeting...
the team lead is:
Principal Investigator
and the team members are:
(Computational Linguist, Clinical Informatics Specialist, Data Visualization Expert, Scientific Critic)

DEBUGGING: Team meeting members = [Principal Investigator, Computational Linguist, Clinical Informatics Specialist, Data Visualization Expert, Scientific Critic]


Team: 100%|██████████| 5/5 [01:03<00:00, 12.77s/it]<?, ?it/s]
Team: 100%|██████████| 5/5 [01:05<00:00, 13.10s/it]<03:11, 63.83s/it]
Team: 100%|██████████| 5/5 [01:00<00:00, 12.07s/it]<02:09, 64.80s/it]
Team:   0%|          | 0/5 [00:19<?, ?it/s]4 [03:09<01:02, 62.78s/it]
Rounds (+ Final Round): 100%|██████████| 4/4 [03:29<00:00, 52.40s/it]


Input token count: 65,143
Output token count: 7,019
Tool token count: 0
Max token length: 8,971
Cost: $0.23
Time: 3:33
✅ Finished meeting discussion_3
🟡 Starting meeting discussion_4
DEBUGGING: Entering a team meeting...
the team lead is:
Principal Investigator
and the team members are:
(Computational Linguist, Clinical Informatics Specialist, Data Visualization Expert, Scientific Critic)

DEBUGGING: Team meeting members = [Principal Investigator, Computational Linguist, Clinical Informatics Specialist, Data Visualization Expert, Scientific Critic]


Rounds (+ Final Round):   0%|          | 0/4 [00:00<?, ?it/s]
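
The per-meeting statistics printed above can be totalled to track spend across a phase. The sketch below parses that log format with a regular expression. The helper `parse_meeting_stats` is hypothetical, and the sample text is copied from the first two project specification meetings above.

```python
import re

# Matches the three statistics tracked here; "Cost" values carry a leading "$".
STAT_PATTERN = re.compile(
    r"^(Input token count|Output token count|Cost): \$?([\d,.]+)$", re.MULTILINE
)


def parse_meeting_stats(log_text: str) -> list:
    """Group the per-meeting statistics printed in the cell output."""
    meetings, current = [], {}
    for name, raw in STAT_PATTERN.findall(log_text):
        current[name] = float(raw.replace(",", ""))
        if name == "Cost":  # Cost is the last tracked statistic for each meeting
            meetings.append(current)
            current = {}
    return meetings


log_text = """Input token count: 58,604
Output token count: 6,087
Tool token count: 0
Max token length: 8,039
Cost: $0.21
Time: 3:00
Input token count: 64,175
Output token count: 7,093
Tool token count: 0
Max token length: 9,045
Cost: $0.23
Time: 3:09
"""

stats = parse_meeting_stats(log_text)
total_cost = sum(m["Cost"] for m in stats)
total_input = sum(m["Input token count"] for m in stats)
print(f"{len(stats)} meetings, {total_input:,.0f} input tokens, ${total_cost:.2f}")
```

Because the printed costs are rounded to the cent, totals computed this way are approximate.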

## Tools Selection -----

In [None]:
# Tools selection - prompts
tools_selection_agenda = f"{background_prompt} {project_specific_prompt} Now you need to select machine learning and/or computational and/or visualization and/or interpretability tools to implement this LLM interpretability tool approach. Please list several tools (5-10) that would be relevant to this LLM interpretability approach and how they could be used in the context of this project. If selecting machine learning tools, please prioritize pre-trained models (e.g., pre-trained interpretability libraries or models) for simplicity."

tools_selection_questions = (
    "What machine learning and/or computational and/or visualization and/or interpretability tools could be used for this LLM interpretability design approach (list 5-10)?",
    "For each tool, how could it be used for designing an LLM interpretability tool?",
)

tools_selection_prior_summaries = load_summaries(
    discussion_paths=[discussions_phase_to_dir["project_specification"] / "merged.json"])
print(f"Number of prior summaries: {len(tools_selection_prior_summaries)}")

# Tools selection - discussion
for iteration_num in range(num_iterations):
    save_name = f"discussion_{iteration_num + 1}"
    try:
        print(f"🟡 Starting meeting {save_name}")
        run_meeting(
            meeting_type="team",
            team_lead=principal_investigator,
            team_members=team_members,
            summaries=tools_selection_prior_summaries,
            agenda=tools_selection_agenda,
            agenda_questions=tools_selection_questions,
            save_dir=discussions_phase_to_dir["tools_selection"],
            save_name=save_name,
            temperature=CREATIVE_TEMPERATURE,
            num_rounds=num_rounds,
        )
        print(f"✅ Finished meeting {save_name}")
    except Exception as e:
        print(f"❌ Meeting {save_name} failed with error: {e}")

# Tools selection - merge
tools_selection_summaries = load_summaries(
    discussion_paths=list(discussions_phase_to_dir["tools_selection"].glob("discussion_*.json")))
print(f"Number of summaries: {len(tools_selection_summaries)}")

tools_selection_merge_prompt = create_merge_prompt(
    agenda=tools_selection_agenda,
    agenda_questions=tools_selection_questions,
)

run_meeting(
    meeting_type="individual",
    team_member=principal_investigator,
    summaries=tools_selection_summaries,
    agenda=tools_selection_merge_prompt,
    save_dir=discussions_phase_to_dir["tools_selection"],
    save_name="merged",
    temperature=CONSISTENT_TEMPERATURE,
    num_rounds=num_rounds,
)

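# Show merged meeting output for tools_selection
from IPython.display import Markdown, display

# Read from the same directory the merge meeting saved to, rather than a hard-coded path.
with open(discussions_phase_to_dir["tools_selection"] / "merged.md", "r") as f: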
    content = f.read()

display(Markdown(content))

Number of prior summaries: 1
🟡 Starting meeting discussion_1
DEBUGGING: Entering a team meeting...
the team lead is:
Principal Investigator
and the team members are:
(Computational Linguist, Biomedical Informatics Specialist, Data Visualization Expert, Scientific Critic)

DEBUGGING: Team meeting members = [Principal Investigator, Computational Linguist, Biomedical Informatics Specialist, Data Visualization Expert, Scientific Critic]


Team: 100%|██████████| 5/5 [01:02<00:00, 12.48s/it]<?, ?it/s]
Team: 100%|██████████| 5/5 [01:03<00:00, 12.66s/it]<03:07, 62.39s/it]
Team: 100%|██████████| 5/5 [01:02<00:00, 12.43s/it]<02:05, 62.91s/it]
Team:   0%|          | 0/5 [00:16<?, ?it/s]4 [03:07<01:02, 62.57s/it]
Rounds (+ Final Round): 100%|██████████| 4/4 [03:23<00:00, 50.98s/it]


Input token count: 86,524
Output token count: 8,070
Tool token count: 0
Max token length: 10,819
Cost: $0.30
Time: 3:26
✅ Finished meeting discussion_1
🟡 Starting meeting discussion_2
DEBUGGING: Entering a team meeting...
the team lead is:
Principal Investigator
and the team members are:
(Computational Linguist, Biomedical Informatics Specialist, Data Visualization Expert, Scientific Critic)

DEBUGGING: Team meeting members = [Principal Investigator, Computational Linguist, Biomedical Informatics Specialist, Data Visualization Expert, Scientific Critic]


Team: 100%|██████████| 5/5 [00:53<00:00, 10.79s/it]<?, ?it/s]
Team: 100%|██████████| 5/5 [00:51<00:00, 10.37s/it]<02:41, 53.93s/it]
Team: 100%|██████████| 5/5 [00:49<00:00,  9.84s/it]<01:45, 52.69s/it]
Team:   0%|          | 0/5 [00:17<?, ?it/s]4 [02:34<00:51, 51.09s/it]
Rounds (+ Final Round): 100%|██████████| 4/4 [02:52<00:00, 43.05s/it]


Input token count: 76,107
Output token count: 6,643
Tool token count: 0
Max token length: 9,392
Cost: $0.26
Time: 2:57
✅ Finished meeting discussion_2
🟡 Starting meeting discussion_3
DEBUGGING: Entering a team meeting...
the team lead is:
Principal Investigator
and the team members are:
(Computational Linguist, Biomedical Informatics Specialist, Data Visualization Expert, Scientific Critic)

DEBUGGING: Team meeting members = [Principal Investigator, Computational Linguist, Biomedical Informatics Specialist, Data Visualization Expert, Scientific Critic]


Team: 100%|██████████| 5/5 [00:51<00:00, 10.38s/it]<?, ?it/s]
Team: 100%|██████████| 5/5 [00:53<00:00, 10.80s/it]<02:35, 51.91s/it]
Team: 100%|██████████| 5/5 [00:50<00:00, 10.03s/it]<01:46, 53.13s/it]
Team:   0%|          | 0/5 [00:13<?, ?it/s]4 [02:36<00:51, 51.77s/it]
Rounds (+ Final Round): 100%|██████████| 4/4 [02:49<00:00, 42.48s/it]


Input token count: 77,802
Output token count: 6,778
Tool token count: 0
Max token length: 9,527
Cost: $0.26
Time: 2:52
✅ Finished meeting discussion_3
🟡 Starting meeting discussion_4
DEBUGGING: Entering a team meeting...
the team lead is:
Principal Investigator
and the team members are:
(Computational Linguist, Biomedical Informatics Specialist, Data Visualization Expert, Scientific Critic)

DEBUGGING: Team meeting members = [Principal Investigator, Computational Linguist, Biomedical Informatics Specialist, Data Visualization Expert, Scientific Critic]


Team: 100%|██████████| 5/5 [01:04<00:00, 12.90s/it]
Team: 100%|██████████| 5/5 [00:57<00:00, 11.57s/it]
Team: 100%|██████████| 5/5 [00:56<00:00, 11.26s/it]
Team:   0%|          | 0/5 [00:15<?, ?it/s]
Rounds (+ Final Round): 100%|██████████| 4/4 [03:14<00:00, 48.64s/it]


Input token count: 82,391
Output token count: 7,396
Tool token count: 0
Max token length: 10,145
Cost: $0.28
Time: 3:17
✅ Finished meeting discussion_4
🟡 Starting meeting discussion_5
DEBUGGING: Entering a team meeting...
the team lead is:
Principal Investigator
and the team members are:
(Computational Linguist, Biomedical Informatics Specialist, Data Visualization Expert, Scientific Critic)

DEBUGGING: Team meeting members = [Principal Investigator, Computational Linguist, Biomedical Informatics Specialist, Data Visualization Expert, Scientific Critic]


Team: 100%|██████████| 5/5 [00:56<00:00, 11.23s/it]
Team: 100%|██████████| 5/5 [00:56<00:00, 11.25s/it]
Team: 100%|██████████| 5/5 [00:51<00:00, 10.20s/it]
Team:   0%|          | 0/5 [00:19<?, ?it/s]
Rounds (+ Final Round): 100%|██████████| 4/4 [03:02<00:00, 45.69s/it]


Input token count: 79,822
Output token count: 6,870
Tool token count: 0
Max token length: 9,619
Cost: $0.27
Time: 3:05
✅ Finished meeting discussion_5
Number of summaries: 5
DEBUGGING: Individual meeting members = [Principal Investigator, Scientific Critic]


Team: 100%|██████████| 2/2 [00:30<00:00, 15.28s/it]
Team: 100%|██████████| 2/2 [00:39<00:00, 19.59s/it]
Team: 100%|██████████| 2/2 [00:29<00:00, 14.60s/it]
Team:   0%|          | 0/2 [00:13<?, ?it/s]
Rounds (+ Final Round): 100%|██████████| 4/4 [01:52<00:00, 28.12s/it]


Input token count: 52,913
Output token count: 5,332
Tool token count: 0
Max token length: 10,836
Cost: $0.19
Time: 1:53
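
The per-meeting summaries printed above (token counts and cost) can be totaled across a phase instead of being read off line by line. The following is a minimal sketch, assuming the log text has been captured as a string; the helper name `total_usage` and the regular expression are illustrative and are not part of `virtual_lab`. The sample values are copied from the `discussion_2` and `discussion_3` summaries above.

```python
import re

# Assumption: summary lines look exactly like the output above,
# e.g. "Input token count: 76,107" or "Cost: $0.26".
SUMMARY_LINE = re.compile(
    r"^(Input token count|Output token count|Cost): \$?([\d,.]+)$"
)


def total_usage(log_text: str) -> dict[str, float]:
    """Sum input tokens, output tokens, and cost over every meeting in a log."""
    totals = {"Input token count": 0.0, "Output token count": 0.0, "Cost": 0.0}
    for line in log_text.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match:
            label, value = match.groups()
            # Drop thousands separators before converting to a number.
            totals[label] += float(value.replace(",", ""))
    return totals


# Sample log: the discussion_2 and discussion_3 summaries from above.
sample_log = """\
Input token count: 76,107
Output token count: 6,643
Tool token count: 0
Max token length: 9,392
Cost: $0.26
Time: 2:57
Input token count: 77,802
Output token count: 6,778
Tool token count: 0
Max token length: 9,527
Cost: $0.26
Time: 2:52
"""

totals = total_usage(sample_log)
print(totals)
```

Lines that do not match the three tracked labels (progress bars, `DEBUGGING:` messages, timing) are ignored, so the full cell output can be passed in unchanged.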


## Implementation -----

In [None]:
# Implementation - prompts
implementation_agent_selection_agenda = f"{background_prompt} {project_specific_prompt} Your team needs to build four components of an LLM interpretability tool: BioBERT/ClinicalBERT, SNOMED CT APIs, Plotly/Dash (or D3.js), and SHAP or LIME. For each component, please select the team member who will implement the component. A team member may implement more than one component."

implementation_agent_selection_questions = (
    "Which team member will implement BioBERT/ClinicalBERT?",
    "Which team member will implement SNOMED CT APIs?",
    "Which team member will implement Plotly/Dash (or D3.js)?",
    "Which team member will implement SHAP or LIME?",
)

implementation_agent_selection_prior_summaries = load_summaries(
    discussion_paths=[discussions_phase_to_dir["team_selection"] / "merged.json",
                      discussions_phase_to_dir["project_specification"] / "merged.json",
                      discussions_phase_to_dir["tools_selection"] / "merged.json"])
print(f"Number of prior summaries: {len(implementation_agent_selection_prior_summaries)}")

# Implementation - discussion
for iteration_num in range(num_iterations):
    save_name = f"discussion_{iteration_num + 1}"
    try:
        print(f"🟡 Starting meeting {save_name}")
        run_meeting(
            meeting_type="individual",
            team_member=principal_investigator,
            summaries=implementation_agent_selection_prior_summaries,
            agenda=implementation_agent_selection_agenda,
            agenda_questions=implementation_agent_selection_questions,
            save_dir=discussions_phase_to_dir["implementation_agent_selection"],
            save_name=save_name,
            temperature=CREATIVE_TEMPERATURE,
        )
        print(f"✅ Finished meeting {save_name}")
    except Exception as e:
        print(f"❌ Meeting {save_name} failed with error: {e}")

# Implementation - merge
implementation_agent_selection_summaries = load_summaries(
    discussion_paths=list(discussions_phase_to_dir["implementation_agent_selection"].glob("discussion_*.json")))
print(f"Number of summaries: {len(implementation_agent_selection_summaries)}")

implementation_agent_selection_merge_prompt = create_merge_prompt(
    agenda=implementation_agent_selection_agenda,
    agenda_questions=implementation_agent_selection_questions
)

run_meeting(
    meeting_type="individual",
    team_member=principal_investigator,
    summaries=implementation_agent_selection_summaries,
    agenda=implementation_agent_selection_merge_prompt,
    save_dir=discussions_phase_to_dir["implementation_agent_selection"],
    save_name="merged",
    temperature=CONSISTENT_TEMPERATURE,
)

# Show merged meeting output for implementation_agent_selection
from IPython.display import Markdown, display

with open(discussions_phase_to_dir["implementation_agent_selection"] / "merged.md", "r", encoding="utf-8") as f:
    content = f.read()

display(Markdown(content))

Number of prior summaries: 3
🟡 Starting meeting discussion_1
DEBUGGING: Individual meeting members = [Principal Investigator, Scientific Critic]


Team:   0%|          | 0/2 [00:12<?, ?it/s]
Rounds (+ Final Round): 100%|██████████| 1/1 [00:12<00:00, 13.00s/it]


Input token count: 2,634
Output token count: 376
Tool token count: 0
Max token length: 3,010
Cost: $0.01
Time: 0:15
✅ Finished meeting discussion_1
🟡 Starting meeting discussion_2
DEBUGGING: Individual meeting members = [Principal Investigator, Scientific Critic]


Team:   0%|          | 0/2 [00:13<?, ?it/s]
Rounds (+ Final Round): 100%|██████████| 1/1 [00:13<00:00, 13.41s/it]


Input token count: 2,634
Output token count: 534
Tool token count: 0
Max token length: 3,168
Cost: $0.01
Time: 0:14
✅ Finished meeting discussion_2
🟡 Starting meeting discussion_3
DEBUGGING: Individual meeting members = [Principal Investigator, Scientific Critic]


Team:   0%|          | 0/2 [00:14<?, ?it/s]
Rounds (+ Final Round): 100%|██████████| 1/1 [00:14<00:00, 14.06s/it]


Input token count: 2,634
Output token count: 415
Tool token count: 0
Max token length: 3,049
Cost: $0.01
Time: 0:15
✅ Finished meeting discussion_3
🟡 Starting meeting discussion_4
DEBUGGING: Individual meeting members = [Principal Investigator, Scientific Critic]


Team:   0%|          | 0/2 [00:16<?, ?it/s]
Rounds (+ Final Round): 100%|██████████| 1/1 [00:16<00:00, 16.18s/it]


Input token count: 2,634
Output token count: 424
Tool token count: 0
Max token length: 3,058
Cost: $0.01
Time: 0:17
✅ Finished meeting discussion_4
🟡 Starting meeting discussion_5
DEBUGGING: Individual meeting members = [Principal Investigator, Scientific Critic]


Team:   0%|          | 0/2 [00:09<?, ?it/s]
Rounds (+ Final Round): 100%|██████████| 1/1 [00:09<00:00,  9.42s/it]


Input token count: 2,634
Output token count: 465
Tool token count: 0
Max token length: 3,099
Cost: $0.01
Time: 0:11
✅ Finished meeting discussion_5
Number of summaries: 5
DEBUGGING: Individual meeting members = [Principal Investigator, Scientific Critic]


Team:   0%|          | 0/2 [00:29<?, ?it/s]
Rounds (+ Final Round): 100%|██████████| 1/1 [00:29<00:00, 29.66s/it]

Input token count: 2,659
Output token count: 630
Tool token count: 0
Max token length: 3,289
Cost: $0.01
Time: 0:31
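
The implementation cell above names the same tools by hand in both the agenda sentence and the question tuple, so the stated count and the list can drift apart. One possible way to avoid that is to list the components once and derive both strings from the list. This is a sketch only: `build_selection_prompts` is a hypothetical helper, not part of `virtual_lab`, and the phrase "LLM interpretability tool" is assumed wording based on the project objective.

```python
# Tool names copied from the implementation prompt above.
components = (
    "BioBERT/ClinicalBERT",
    "SNOMED CT APIs",
    "Plotly/Dash (or D3.js)",
    "SHAP or LIME",
)

NUMBER_WORDS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five"}


def build_selection_prompts(components, prefix=""):
    """Return (agenda, questions) derived from a single component list.

    `prefix` stands in for the background and project-specific prompts.
    """
    # Spell out small counts; fall back to digits for larger lists.
    count = NUMBER_WORDS.get(len(components), str(len(components)))
    if len(components) > 1:
        listing = ", ".join(components[:-1]) + f", and {components[-1]}"
    else:
        listing = components[0]
    agenda = (
        f"{prefix} Your team needs to build {count} components of an "
        f"LLM interpretability tool: {listing}. For each component, please "
        "select the team member who will implement the component. "
        "A team member may implement more than one component."
    ).strip()
    # One question per component, in the same order as the agenda.
    questions = tuple(
        f"Which team member will implement {component}?"
        for component in components
    )
    return agenda, questions


agenda, questions = build_selection_prompts(components)
print(agenda)
for question in questions:
    print(question)
```

In the notebook, `prefix` would be `f"{background_prompt} {project_specific_prompt}"`, and the returned values would be passed to `run_meeting` as `agenda` and `agenda_questions`.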



