In [9]:
from mdcrow import MDCrow
import os
import mdtraj as md

In [4]:
descriptive_prompt_4 = "Complete all of the following tasks: 1. Download the PDB file 1LYZ. 2. Tell me how many residues are in the chains and beta sheets. 3. Simulate for 1 ps at 300 K. 4. Compute the RMSD."
non_descriptive_prompt_4 = "Simulate 1LYZ for 1 ps at 300 K. Tell me how many residues are in the chains and beta sheets and compute the RMSD of the simulation."

Required Tasks:
- Download
- SS Analysis
- Simulation
- RMSD

In [5]:
llm_model = "claude-3-opus-20240229"
tools = "all"

In [6]:
agent_1 = MDCrow(agent_type="Structured", model=llm_model, top_k_tools=tools)
agent_1.run(descriptive_prompt_4)

Thought: To answer this question, I will need to download the PDB file for 1LYZ, analyze its structure, run a short simulation, and compute the RMSD. I will use the PDBFileDownloader tool to get the PDB file.

Action:
```
{
  "action": "PDBFileDownloader",
  "action_input": "1LYZ"
}
```

PDB file found with this ID: 1LYZ
Thought: Now that I have the PDB file, I can use the GetTurnsBetaSheetsHelices tool to get information about the number of residues in the chains and beta sheets.

Action:
```
{
  "action": "GetTurnsBetaSheetsHelices",
  "action_input": {
    "query": "1LYZ",
    "primary_accession": "P00720"
  }
}
```

The primary accession number provided does not match any entry in the data, using the first entry instead.
The primary accession number provided does not match any entry in the data, using the first entry instead.
The primary accession number provided does not match any entry in the data, using the first entry instead.
It looks like the GetTurnsBetaSheetsHelices tool di

({'input': '\n    You are an expert molecular dynamics scientist, and\n    your task is to respond to the question or\n    solve the problem to the best of your ability using\n    the provided tools.\n\n    You can only respond with a single complete\n    \'Thought, Action, Action Input\' format\n    OR a single \'Final Answer\' format.\n\n    Complete format:\n    Thought: (reflect on your progress and decide what to do next)\n    Action:\n    ```\n    {\n        "action": (the action name, it should be the name of a tool),\n        "action_input": (the input string for the action)\n    }\n    \'\'\'\n\n    OR\n\n    Final Answer: (the final response to the original input\n    question, once all steps are complete)\n\n    You are required to use the tools provided,\n    using the most specific tool\n    available for each action.\n    Your final answer should contain all information\n    necessary to answer the question and its subquestions.\n    Before you finish, reflect on your pro

In [7]:
registry = agent_1.path_registry
print(registry.list_path_names_and_descriptions().replace(",", "\n"))

Files found in registry: 1LYZ_205358: PDB file downloaded from RSCB
 PDBFile ID: 1LYZ_205358
 1LYZ_205443: Cleaned File:  Removed Heterogens
 and Water Removed.  Replaced Nonstandard Residues. Added Hydrogens at pH 7.0. Missing Atoms Added and replaces nonstandard residues. 
 top_sim0_205457: Initial positions for simulation sim0_205457
 sim0_205457: Basic Simulation of Protein 1LYZ_205443
 rec0_205457: Simulation trajectory for protein 1LYZ_205443 and simulation sim0_205457
 rec1_205457: Simulation state log for protein 1LYZ_205443 and simulation sim0_205457
 rec2_205457: Simulation pdb frames for protein 1LYZ_205443 and simulation sim0_205457


In [10]:
traj_path = registry.get_mapped_path("rec0_205457")
top_path = registry.get_mapped_path("top_sim0_205457")

assert os.path.exists(traj_path)
assert os.path.exists(top_path)

path = registry.get_mapped_path("1LYZ_205443")
traj = md.load(path)
# Simplified DSSP labels each residue as H (helix), E (strand), or C (coil)
number_of_chains = traj.n_chains
secondary_structure = md.compute_dssp(traj, simplified=True)
# The counts below are residues per DSSP class in the first frame,
# not the number of distinct sheets or helices
print("Number of chains: ", number_of_chains)
print("Number of sheets: ", len([i for i in secondary_structure[0] if i == 'E']))
print("Number of helices: ", len([i for i in secondary_structure[0] if i == 'H']))
print("Number of coils: ", len([i for i in secondary_structure[0] if i == 'C']))

Number of chains:  1
Number of sheets:  14
Number of helices:  50
Number of coils:  65
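
The values printed above count residues in each simplified DSSP class, so "Number of sheets: 14" means 14 residues labelled `E`, not 14 sheets. The sketch below shows the difference between counting residues and counting contiguous segments. It uses a made-up DSSP string rather than the 1LYZ assignment, and `summarize_dssp` is a hypothetical helper, not part of mdtraj or MDCrow.

```python
from collections import Counter
from itertools import groupby


def summarize_dssp(codes):
    """Return (residues per class, contiguous segments per class)."""
    residues = Counter(codes)
    # groupby collapses each run of identical codes into a single segment
    segments = Counter(code for code, _run in groupby(codes))
    return residues, segments


# Toy example (NOT the 1LYZ assignment): simplified codes H, E, C
toy = list("CCEEEECCHHHHHCCEEEC")
residues, segments = summarize_dssp(toy)
print("Residues in strands:", residues["E"])
print("Strand segments:", segments["E"])
```

With the notebook's variables, the same helper could presumably be called as `summarize_dssp(secondary_structure[0])`. A contiguous run of `E` residues is a single strand; a sheet is built from several strands, so segment counts are still only an upper bound on the number of sheets.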


In [11]:
agent_2 = MDCrow(agent_type="Structured", model=llm_model, top_k_tools=tools)
agent_2.run(non_descriptive_prompt_4)

Thought: To simulate 1LYZ for 1 ps at 300 K, I will first need to download the PDB file. Then I can use the SetUpandRunFunction tool to set up and run the simulation. After that, I can use the ComputeDSSP tool to get information about the secondary structure, including the number of residues in beta sheets. I can also use the ComputeRMSD tool to calculate the RMSD of the simulation.

Action:
```
{
  "action": "PDBFileDownloader",
  "action_input": "1LYZ"
}
```

PDB file found with this ID: 1LYZ
Here is my next step:

Thought: Now that I have downloaded the 1LYZ PDB file, I can set up and run a short 1 ps simulation at 300 K using the SetUpandRunFunction tool. I will use default parameters for the system, integrator, and simulation, except I will set the temperature to 300 K and the number of steps to 500 to get 1 ps (0.002 ps timestep * 500 steps = 1 ps).

Action:
```
{
  "action": "SetUpandRunFunction",
  "action_input": {
    "pdb_id": "1LYZ_205702",
    "forcefield_files": ["amber14

KeyError: 'record_interval_steps'
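
The agent's step count (0.002 ps timestep * 500 steps = 1 ps) is arithmetically correct. The `KeyError` suggests the run failed because a `record_interval_steps` entry was missing from the simulation parameters, though the truncated output does not show the full input. The sketch below shows one way to check both before launching a run. `n_steps` and `missing_keys` are hypothetical helpers, and apart from `record_interval_steps`, which comes from the traceback, the key names are assumptions for illustration, not MDCrow's confirmed schema.

```python
def n_steps(duration_ps, timestep_ps):
    """Number of integrator steps needed to cover duration_ps."""
    return round(duration_ps / timestep_ps)


def missing_keys(params, required):
    """Return the required keys that are absent from params, sorted."""
    return sorted(set(required) - set(params))


# Hypothetical parameter dict; only 'record_interval_steps' is taken from the
# traceback above, the other key name is an assumption for illustration.
simulation_params = {"number_of_steps": n_steps(1.0, 0.002)}
required = ["number_of_steps", "record_interval_steps"]

print(simulation_params["number_of_steps"])
print(missing_keys(simulation_params, required))
```

A pre-flight check like this would report an incomplete input before the simulation tool raises.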

In [12]:
registry = agent_2.path_registry
print(registry.list_path_names_and_descriptions().replace(",", "\n"))

Files found in registry: 1LYZ_205702: PDB file downloaded from RSCB
 PDBFile ID: 1LYZ_205702
 1LYZ_205756: Cleaned File:  Removed Heterogens
 and Water Removed.  Replaced Nonstandard Residues. Added Hydrogens at pH 7.0. Missing Atoms Added and replaces nonstandard residues. 
