In [1]:
from mdagent import MDAgent


In [2]:
# today's date and time
import datetime

start = datetime.datetime.now()
date = start.strftime("%Y-%m-%d")
print("date:",date)
time = start.strftime("%H:%M:%S")
print("time:",time)

date: 2024-09-30
time: 21:34:30


In [3]:
descriptive_prompt_5 =  "Complete the following MD tasks: 1. Download the PDB file 1XQ8. 2. Preprocess the file by adding missing atoms, hydrogens and replacing non-standard residues. 3. Analyze the secondary structure of the PDB file. 4. Simulate for 1 ps at 300 K. 5. Analyze the secondary structure from the simulation and tell me if it changed from before the simulation to after."
non_descriptive_prompt_5 = "Simulate 1XQ8 for 1 ps at a temperature of 300 K. Then, tell me if the secondary structure changed from before the simulation to after."

In [4]:
llm_model = "gpt-4o-2024-08-06"
tools = "all"

In [5]:
agent_1 = MDAgent(agent_type="Structured", model=llm_model, top_k_tools=tools)

##### descriptive prompt

In [6]:
start1 = datetime.datetime.now()
agent_1.run(descriptive_prompt_5)

Thought: I will proceed with the tasks in the order given. First, I will download the PDB file 1XQ8. Then, I will preprocess the file by adding missing atoms, hydrogens, and replacing non-standard residues. After preprocessing, I will analyze the secondary structure of the PDB file. Once the analysis is complete, I will set up and run a short simulation for 1 ps at 300 K. Finally, I will analyze the secondary structure from the simulation and compare it to the initial structure to determine if there were any changes.

Action:
```
{
  "action": "PDBFileDownloader",
  "action_input": {
    "query": "1XQ8"
  }
}
```
PDB file found with this ID: 1XQ8
Action:
```
{
  "action": "CleaningToolFunction",
  "action_input": {
    "pdb_id": "1XQ8_213435",
    "replace_nonstandard_residues": true,
    "add_missing_atoms": true,
    "remove_heterogens": false,
    "remove_water": false,
    "add_hydrogens": true,
    "add_hydrogens_ph": 7.0
  }
}
```
file path: /Users/jorgemedina/Desktop/md-agent/ckpt



Action:
```json
{
  "action": "SetUpandRunFunction",
  "action_input": {
    "pdb_id": "1XQ8_213437",
    "forcefield_files": [],
    "save": true,
    "system_params": {
      "nonbondedMethod": "NoCutoff",
      "constraints": "None",
      "rigidWater": false,
      "solvate": false
    },
    "integrator_params": {
      "integrator_type": "LangevinMiddle",
      "Temperature": "300 * kelvin",
      "Friction": "1.0 / picoseconds",
      "Timestep": "0.002 * picoseconds"
    },
    "simulation_params": {
      "Ensemble": "NVT",
      "Number of Steps": 500,
      "record_interval_steps": 5,
      "record_params": ["step", "potentialEnergy", "temperature"]
    }
  }
}
```
Setting default forcefields v2
Building system...
About to create system...
Attempts at creating system: 0/3
adding system without solvent
System built successfully
<openmm.openmm.System; proxy of <Swig Object of type 'OpenMM::System *' at 0x3005924c0> >
Setting up integrator...
Creating simulation...
simulation s



To complete the analysis, I need to compare the secondary structure before and after the simulation. Initially, there were 83 residues in helix, 2 in strand, and 55 in coil. After the simulation, there are 82 residues in helix, 0 in strand, and 58 in coil. This indicates a slight change in the secondary structure, with a decrease in helical and strand content and an increase in coil content.

Final Answer: The secondary structure of the protein changed slightly after the simulation. Initially, there were 83 residues in helix, 2 in strand, and 55 in coil. After the simulation, there are 82 residues in helix, 0 in strand, and 58 in coil. This suggests a decrease in helical and strand content and an increase in coil content.

({'input': '\n    You are an expert molecular dynamics scientist, and\n    your task is to respond to the question or\n    solve the problem to the best of your ability using\n    the provided tools.\n\n    You can only respond with a single complete\n    \'Thought, Action, Action Input\' format\n    OR a single \'Final Answer\' format.\n\n    Complete format:\n    Thought: (reflect on your progress and decide what to do next)\n    Action:\n    ```\n    {\n        "action": (the action name, it should be the name of a tool),\n        "action_input": (the input string for the action)\n    }\n    \'\'\'\n\n    OR\n\n    Final Answer: (the final response to the original input\n    question, once all steps are complete)\n\n    You are required to use the tools provided,\n    using the most specific tool\n    available for each action.\n    Your final answer should contain all information\n    necessary to answer the question and its subquestions.\n    Before you finish, reflect on your pro

In [7]:
end = datetime.datetime.now()
elapsed = end - start1
print(f"agent_1 duration: {elapsed.total_seconds()/60:.2f} minutes")

agent_1 duration: 0.57 minutes
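
As a quick sanity check on the `SetUpandRunFunction` input from the descriptive run, the sketch below confirms that 500 steps at a 0.002 ps timestep add up to the requested 1 ps. The frame count is an assumption: it presumes the reporter writes exactly one frame every `record_interval_steps` steps, which the log above does not confirm.

```python
# Values copied from the SetUpandRunFunction action input shown earlier.
timestep_ps = 0.002          # "Timestep": "0.002 * picoseconds"
n_steps = 500                # "Number of Steps": 500
record_interval_steps = 5    # "record_interval_steps": 5

# Total simulated time is the number of steps times the timestep.
simulated_time_ps = n_steps * timestep_ps

# Assumption: one frame is written every record_interval_steps steps.
n_recorded_frames = n_steps // record_interval_steps

print(f"simulated time: {simulated_time_ps:.3f} ps")
print(f"expected recorded frames: {n_recorded_frames}")
```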


In [8]:
registry = agent_1.path_registry
all_names = registry.list_path_names()
print(all_names)

Names found in registry: 1XQ8_213435, 1XQ8_213437, rec0_213438, top_sim0_213441, sim0_213441, rec0_213441, rec1_213441, rec2_213441, rec0_213504


In [9]:
paths_and_descriptions = registry.list_path_names_and_descriptions()
print("\n".join(paths_and_descriptions.split(",")))

Files found in registry: 1XQ8_213435: PDB file downloaded from RSCB
 PDBFile ID: 1XQ8_213435
 1XQ8_213437: Cleaned File:  Replaced Nonstandard Residues. Added Hydrogens at pH 7.0. Missing Atoms Added and replaces nonstandard residues. 
 rec0_213438: dssp values for trajectory with id: 1XQ8_213437
 top_sim0_213441: Initial positions for simulation sim0_213441
 sim0_213441: Basic Simulation of Protein 1XQ8_213437
 rec0_213441: Simulation trajectory for protein 1XQ8_213437 and simulation sim0_213441
 rec1_213441: Simulation state log for protein 1XQ8_213437 and simulation sim0_213441
 rec2_213441: Simulation pdb frames for protein 1XQ8_213437 and simulation sim0_213441
 rec0_213504: dssp values for trajectory with id: rec2_213441


In [12]:
import mdtraj as md
import numpy as np

traj_path = registry.get_mapped_path("rec0_093516")
top_path = registry.get_mapped_path("top_sim0_093516")
traj = md.load(traj_path, top=top_path)

# Compute simplified DSSP codes (H = helix, E = strand, C = coil)
# for the first and last frames of the trajectory
dssp_initial = md.compute_dssp(traj[0], simplified=True)
dssp_final = md.compute_dssp(traj[-1], simplified=True)

print('=== BEFORE ===')
print("Number of initial sheets: ",len([i for i in dssp_initial[0] if i == 'E']))
print("Number of initial helices: ",len([i for i in dssp_initial[0] if i == 'H']))
print("Number of initial coils: ",len([i for i in dssp_initial[0] if i == 'C']))

print('=== AFTER ===')
print("Number of final sheets: ",len([i for i in dssp_final[0] if i == 'E']))
print("Number of final helices: ",len([i for i in dssp_final[0] if i == 'H']))
print("Number of final coils: ",len([i for i in dssp_final[0] if i == 'C']))

=== BEFORE ===
Number of initial sheets:  2
Number of initial helices:  84
Number of initial coils:  54
=== AFTER ===
Number of final sheets:  0
Number of final helices:  75
Number of final coils:  65
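
The per-class tallies printed above can also be collected in one pass and turned into before/after differences, which makes the comparison easier to read. The sketch below is only an illustration: `dssp_counts` and `count_delta` are hypothetical helper names, and the two code lists are synthetic stand-ins for `dssp_initial[0]` and `dssp_final[0]`, not data from this run.

```python
from collections import Counter


def dssp_counts(codes):
    """Tally simplified DSSP codes (H, E, C) for a single frame."""
    counts = Counter(codes)
    return {code: counts.get(code, 0) for code in "HEC"}


def count_delta(before, after):
    """Return after minus before for each DSSP class."""
    return {code: after[code] - before[code] for code in "HEC"}


# Synthetic stand-ins for dssp_initial[0] and dssp_final[0] (not real data).
initial_codes = list("CHHHHHEECC")
final_codes = list("CHHHHCCCCC")

before = dssp_counts(initial_codes)
after = dssp_counts(final_codes)
delta = count_delta(before, after)

print("before:", before)
print("after: ", after)
print("delta: ", delta)
```

With a loaded trajectory, the same helpers could be called as `dssp_counts(dssp_initial[0])` and `dssp_counts(dssp_final[0])`.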


##### non-descriptive prompt

In [None]:
# this part was run the next morning

In [13]:
agent_2 = MDAgent(agent_type="Structured", model=llm_model, top_k_tools=tools)
start2 = datetime.datetime.now()
agent_2.run(non_descriptive_prompt_5)

Thought: To address the request, I need to perform a simulation of the protein with PDB ID 1XQ8 for 1 picosecond at 300 K. After the simulation, I will analyze the secondary structure before and after the simulation to determine any changes. First, I will download the PDB file for 1XQ8, then set up and run the simulation, and finally, compute the DSSP for both the initial and final frames to compare the secondary structures.

Action:
```
{
  "action": "PDBFileDownloader",
  "action_input": {
    "query": "1XQ8"
  }
}
```
PDB file found with this ID: 1XQ8
To proceed, I need to set up and run a short simulation of the protein using the downloaded PDB file. I will configure the simulation parameters to run for 1 picosecond at a temperature of 300 K.

Action:
```
{
  "action": "SetUpandRunFunction",
  "action_input": {
    "pdb_id": "1XQ8_114439",
    "forcefield_files": ["amber99sb.xml", "tip3p.xml"],
    "save": true,
    "system_params": {
      "nonbondedMethod": "NoCutoff",
      "cons



To complete the analysis, I need to compute the DSSP for the last frame of the trajectory and compare it with the initial frame to determine any changes in the secondary structure.

Action:
```
{
  "action": "ComputeDSSP",
  "action_input": {
    "traj_file": "rec0_114448",
    "top_file": "top_sim0_114448",
    "target_frames": "last"
  }
}
```



The DSSP analysis for both the initial and final frames of the simulation shows that the secondary structure of the protein remained unchanged. The number of residues in helix, strand, and coil are identical before and after the simulation. Therefore, there was no change in the secondary structure during the 1 picosecond simulation at 300 K.

Action:
```
{
  "action": "Final Answer",
  "action_input": "The secondary structure of the protein 1XQ8 did not change during the 1 picosecond simulation at 300 K. The number of residues in helix, strand, and coil remained the same before and after the simulation."
}
```

({'input': '\n    You are an expert molecular dynamics scientist, and\n    your task is to respond to the question or\n    solve the problem to the best of your ability using\n    the provided tools.\n\n    You can only respond with a single complete\n    \'Thought, Action, Action Input\' format\n    OR a single \'Final Answer\' format.\n\n    Complete format:\n    Thought: (reflect on your progress and decide what to do next)\n    Action:\n    ```\n    {\n        "action": (the action name, it should be the name of a tool),\n        "action_input": (the input string for the action)\n    }\n    \'\'\'\n\n    OR\n\n    Final Answer: (the final response to the original input\n    question, once all steps are complete)\n\n    You are required to use the tools provided,\n    using the most specific tool\n    available for each action.\n    Your final answer should contain all information\n    necessary to answer the question and its subquestions.\n    Before you finish, reflect on your pro

In [14]:
end = datetime.datetime.now()
elapsed = end - start2
print(f"agent_2 duration: {elapsed.total_seconds()/60:.2f} minutes")

agent_2 duration: 0.60 minutes


In [15]:
registry = agent_2.path_registry
all_names = registry.list_path_names()
print(all_names)

Names found in registry: 1XQ8_114439, top_sim0_114448, sim0_114448, rec0_114448, rec1_114448, rec2_114448, rec0_114508, rec0_114510


In [16]:
paths_and_descriptions = registry.list_path_names_and_descriptions()
print("\n".join(paths_and_descriptions.split(",")))

Files found in registry: 1XQ8_114439: PDB file downloaded from RSCB
 PDBFile ID: 1XQ8_114439
 top_sim0_114448: Initial positions for simulation sim0_114448
 sim0_114448: Basic Simulation of Protein 1XQ8_114439
 rec0_114448: Simulation trajectory for protein 1XQ8_114439 and simulation sim0_114448
 rec1_114448: Simulation state log for protein 1XQ8_114439 and simulation sim0_114448
 rec2_114448: Simulation pdb frames for protein 1XQ8_114439 and simulation sim0_114448
 rec0_114508: dssp values for trajectory with id: rec0_114448
 rec0_114510: dssp values for trajectory with id: rec0_114448


In [18]:
import mdtraj as md
import numpy as np

traj_path = registry.get_mapped_path("rec0_114448")
top_path = registry.get_mapped_path("top_sim0_114448")
traj = md.load(traj_path, top=top_path)

# Compute simplified DSSP codes (H = helix, E = strand, C = coil)
# for the first and last frames of the trajectory
dssp_initial = md.compute_dssp(traj[0], simplified=True)
dssp_final = md.compute_dssp(traj[-1], simplified=True)

print('=== BEFORE ===')
print("Number of initial sheets: ",len([i for i in dssp_initial[0] if i == 'E']))
print("Number of initial helices: ",len([i for i in dssp_initial[0] if i == 'H']))
print("Number of initial coils: ",len([i for i in dssp_initial[0] if i == 'C']))

print('=== AFTER ===')
print("Number of final sheets: ",len([i for i in dssp_final[0] if i == 'E']))
print("Number of final helices: ",len([i for i in dssp_final[0] if i == 'H']))
print("Number of final coils: ",len([i for i in dssp_final[0] if i == 'C']))

=== BEFORE ===
Number of initial sheets:  0
Number of initial helices:  82
Number of initial coils:  58
=== AFTER ===
Number of final sheets:  0
Number of final helices:  82
Number of final coils:  58
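
Identical totals do not by themselves show that every residue kept its assignment, because a helix-to-coil change at one position can be offset by a coil-to-helix change at another. A per-residue comparison settles this. The sketch below is a minimal illustration: `changed_residues` is a hypothetical helper name, and the two code lists are synthetic stand-ins for `dssp_initial[0]` and `dssp_final[0]`, not data from this run.

```python
def changed_residues(initial_codes, final_codes):
    """List (index, before, after) for positions whose DSSP code differs.

    Indices are zero-based positions in the DSSP array, not PDB residue numbers.
    """
    if len(initial_codes) != len(final_codes):
        raise ValueError("both frames must cover the same number of residues")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(initial_codes, final_codes))
        if a != b
    ]


# Synthetic example (not real data): both frames have 7 H and 4 C,
# yet two positions swap assignments.
initial_codes = list("CHHHHCCHHHC")
final_codes = list("CCHHHHCHHHC")

changes = changed_residues(initial_codes, final_codes)
print(f"{len(changes)} residues changed:", changes)
```

With a loaded trajectory, `changed_residues(dssp_initial[0], dssp_final[0])` returning an empty list would indicate that the assignment is unchanged residue by residue.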


