In [1]:
from mdcrow import MDCrow
import os 
import mdtraj as md

In [2]:
descriptive_prompt_3 = "Complete all of the following tasks: 1. Download the PDB file 1LYZ. 2. Tell me how many residues are in the chains and beta sheets. 3. Simulate for 1 ps at 300 K"
non_descriptive_prompt_3 = "Simulate 1LYZ for 1 ps at 300 K. Tell me how many residues are in the chains and beta sheets."

Required Tasks:
- Download
- SS Analysis
- Simulation

In [3]:
llm_model = "claude-3-5-sonnet-20240620"
tools = "all"

In [4]:
agent_1 = MDCrow(agent_type="Structured", model=llm_model, top_k_tools=tools)
agent_1.run(descriptive_prompt_3)

Certainly! I'll complete the tasks step by step using the provided tools.

Thought: First, I need to download the PDB file 1LYZ using the PDBFileDownloader tool.

Action:
```
{
    "action": "PDBFileDownloader",
    "action_input": "1LYZ"
}
```
PDB file found with this ID: 1LYZ
Thought: Great, we have successfully downloaded the PDB file for 1LYZ. Now, let's analyze the structure to determine the number of residues in the chains and beta sheets using the ComputeDSSP tool.

Action:
```
{
    "action": "ComputeDSSP",
    "action_input": {
        "traj_file": "1LYZ_001947",
        "top_file": "1LYZ_001947",
        "target_frames": "first"
    }
}
```
Thought: We have successfully analyzed the structure of 1LYZ. Now we need to set up and run a short simulation for 1 ps at 300 K. Let's use the SetUpandRunFunction tool for this purpose.

Action:
```
{
    "action": "SetUpandRunFunction",
    "action_input": {
        "pdb_id": "1LYZ_001947",
        "forcefield_files": ["amber14-all.xml", "

({'input': '\n    You are an expert molecular dynamics scientist, and\n    your task is to respond to the question or\n    solve the problem to the best of your ability using\n    the provided tools.\n\n    You can only respond with a single complete\n    \'Thought, Action, Action Input\' format\n    OR a single \'Final Answer\' format.\n\n    Complete format:\n    Thought: (reflect on your progress and decide what to do next)\n    Action:\n    ```\n    {\n        "action": (the action name, it should be the name of a tool),\n        "action_input": (the input string for the action)\n    }\n    \'\'\'\n\n    OR\n\n    Final Answer: (the final response to the original input\n    question, once all steps are complete)\n\n    You are required to use the tools provided,\n    using the most specific tool\n    available for each action.\n    Your final answer should contain all information\n    necessary to answer the question and its subquestions.\n    Before you finish, reflect on your pro

In [5]:
registry = agent_1.path_registry
print(registry.list_path_names_and_descriptions().replace(",", "\n"))

Files found in registry: 1LYZ_001947: PDB file downloaded from RSCB
 PDBFile ID: 1LYZ_001947
 rec0_001950: dssp values for trajectory with id: 1LYZ_001947
 1LYZ_002004: Cleaned File:  Removed Heterogens
 and Water Removed.  Replaced Nonstandard Residues. Added Hydrogens at pH 7.0. Missing Atoms Added and replaces nonstandard residues. 
 top_sim0_002010: Initial positions for simulation sim0_002010
 sim0_002010: Basic Simulation of Protein 1LYZ_002004
 rec0_002011: Simulation trajectory for protein 1LYZ_002004 and simulation sim0_002010
 rec1_002011: Simulation state log for protein 1LYZ_002004 and simulation sim0_002010
 rec2_002011: Simulation pdb frames for protein 1LYZ_002004 and simulation sim0_002010


In [6]:
traj_path_1 = registry.get_mapped_path("rec0_002011")
top_path_1 = registry.get_mapped_path("top_sim0_002010")

assert os.path.exists(traj_path_1)
assert os.path.exists(top_path_1)
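
The bare `assert` calls above stop at the first missing file without saying which one it was. A minimal sketch of a more informative check follows. `missing_paths` is a hypothetical helper, not part of MDCrow, and it assumes only that the registry returns ordinary filesystem paths.

```python
import os


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk (hypothetical helper)."""
    return [p for p in paths if not os.path.exists(p)]
```

With the variables from the cell above, `missing_paths([traj_path_1, top_path_1])` would return an empty list when both files are present.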

In [7]:
path = registry.get_mapped_path("1LYZ_001947")
traj = md.load(path)
# Per-residue DSSP codes for the single frame: 'H' helix, 'E' strand, 'C' coil.
# Each count below is a number of residues, not a number of distinct sheets or helices.
number_of_chains = traj.n_chains
secondary_structure = md.compute_dssp(traj, simplified=True)
print("Number of chains: ", number_of_chains)
print("Number of sheets: ", len([i for i in secondary_structure[0] if i == 'E']))
print("Number of helices: ", len([i for i in secondary_structure[0] if i == 'H']))
print("Number of coils: ", len([i for i in secondary_structure[0] if i == 'C']))

Number of chains:  2
Number of sheets:  14
Number of helices:  50
Number of coils:  65
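
The counts printed above are residues per DSSP class, so "Number of sheets" is really the number of residues in strands. The sketch below separates the two ideas: residues per code, and contiguous runs per code (individual strands or helices, which is still not the same as whole sheets). The function names and the synthetic input are illustrative assumptions, not part of MDCrow or MDTraj.

```python
from collections import Counter
from itertools import groupby


def residues_per_code(codes):
    """Count residues assigned to each DSSP code in one frame."""
    return Counter(codes)


def segments_per_code(codes):
    """Count contiguous runs of each DSSP code (e.g. individual strands)."""
    return Counter(code for code, _run in groupby(codes))


# Synthetic single-frame assignment, for illustration only (not 1LYZ data).
example = list("CCEEEECCHHHHHCCEEEC")
print(residues_per_code(example))
print(segments_per_code(example))
```

In the notebook, the same functions could be given `secondary_structure[0]`. With `simplified=True`, MDTraj labels non-protein residues such as water as `'NA'`, so those would show up as a separate key rather than being counted as coil.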


In [8]:
agent_2 = MDCrow(agent_type="Structured", model=llm_model, top_k_tools=tools)
agent_2.run(non_descriptive_prompt_3)

Thought: To answer this question, I need to follow these steps:
1. Download the PDB file for 1LYZ
2. Set up and run a short simulation
3. Analyze the structure to get information about residues and beta sheets

Let's start by downloading the PDB file.

Action:
```
{
    "action": "PDBFileDownloader",
    "action_input": "1LYZ"
}
```
PDB file found with this ID: 1LYZ
Thought: Great, we have successfully downloaded the PDB file for 1LYZ. Now, let's set up and run a short simulation for 1 ps at 300 K.

Action:
```
{
    "action": "SetUpandRunFunction",
    "action_input": {
        "pdb_id": "1LYZ_002241",
        "forcefield_files": ["amber14-all.xml", "amber14/tip3pfb.xml"],
        "system_params": {
            "nonbondedMethod": "PME",
            "nonbondedCutoff": "1 * nanometers",
            "constraints": "HBonds",
            "rigidWater": true,
            "ewaldErrorTolerance": 0.0005,
            "constraintTolerance": 0.000001
        },
        "integrator_params": {
      


In [9]:
registry = agent_2.path_registry
print(registry.list_path_names_and_descriptions().replace(",", "\n"))

Files found in registry: 1LYZ_002241: PDB file downloaded from RSCB
 PDBFile ID: 1LYZ_002241
 1LYZ_002253: Cleaned File:  Removed Heterogens
 and Water Removed.  Replaced Nonstandard Residues. Added Hydrogens at pH 7.0. Missing Atoms Added and replaces nonstandard residues. 
 top_sim0_002300: Initial positions for simulation sim0_002300
 sim0_002300: Basic Simulation of Protein 1LYZ_002253
 rec0_002301: Simulation trajectory for protein 1LYZ_002253 and simulation sim0_002300
 rec1_002301: Simulation state log for protein 1LYZ_002253 and simulation sim0_002300
 rec2_002301: Simulation pdb frames for protein 1LYZ_002253 and simulation sim0_002300
 rec0_002449: dssp values for trajectory with id: rec2_002301


In [10]:
traj_path_2 = registry.get_mapped_path("rec0_002301")
top_path_2 = registry.get_mapped_path("top_sim0_002300")

assert os.path.exists(traj_path_2)
assert os.path.exists(top_path_2)

In [11]:
traj = md.load(traj_path_2, top=top_path_2)
# Keep only the final frame of the simulated trajectory.
traj = traj[-1]
# Per-residue DSSP codes for that frame: 'H' helix, 'E' strand, 'C' coil.
# Each count below is a number of residues, not a number of distinct sheets or helices.
number_of_chains = traj.n_chains
secondary_structure = md.compute_dssp(traj, simplified=True)
print("Number of chains: ", number_of_chains)
print("Number of sheets: ", len([i for i in secondary_structure[0] if i == 'E']))
print("Number of helices: ", len([i for i in secondary_structure[0] if i == 'H']))
print("Number of coils: ", len([i for i in secondary_structure[0] if i == 'C']))

Number of chains:  1
Number of sheets:  8
Number of helices:  56
Number of coils:  65
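
The two checks analyse different structures: the first uses the downloaded PDB file, the second the last frame of the `agent_2` trajectory, which was built from the cleaned file. The chain count probably drops from 2 to 1 because cleaning removed the water, though the notebook does not confirm this directly. A small sketch of a side-by-side comparison, using only the numbers printed above; the variable names are illustrative.

```python
# Residue counts copied from the two cell outputs above.
crystal = {"E": 14, "H": 50, "C": 65}      # downloaded PDB structure
final_frame = {"E": 8, "H": 56, "C": 65}   # last simulated frame

# Both tallies should cover the same number of protein residues.
assert sum(crystal.values()) == sum(final_frame.values())

# Positive values mean more residues in that class after the simulation.
change = {code: final_frame[code] - crystal[code] for code in crystal}
print(change)
```

Both tallies sum to 129 residues, so the difference is a reassignment of six residues from strand to helix rather than a change in the number of residues analysed.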
