In [7]:
from mdcrow import MDCrow
import os 
import mdtraj as md

In [2]:
descriptive_prompt_3 = "Complete all of the following tasks: 1. Download the PDB file 1LYZ. 2. Tell me how many residues are in the chains and beta sheets. 3. Simulate for 1 ps at 300 K"
non_descriptive_prompt_3 = "Simulate 1LYZ for 1 ps at 300 K. Tell me how many residues are in the chains and beta sheets."

Required Tasks:
- Download
- SS Analysis
- Simulation
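One way to score a run against this checklist is to look for a matching file description in the agent's path registry. The sketch below is a hypothetical heuristic. The `TASK_EVIDENCE` keywords are assumptions taken from the registry descriptions printed later in this notebook, not an MDCrow API. The heuristic only detects tasks that saved a file, so an analysis answered without writing one will be missed.

```python
# Hypothetical keyword heuristic: which registry descriptions count as
# evidence for each required task. The keywords are assumptions based on
# the descriptions MDCrow printed in this notebook, not an MDCrow API.
TASK_EVIDENCE = {
    "Download": "PDB file downloaded",
    "SS Analysis": "dssp values",
    "Simulation": "Simulation trajectory",
}


def completed_tasks(descriptions):
    """Return {task: bool} for each required task, given registry descriptions."""
    lowered = [d.lower() for d in descriptions]
    return {
        task: any(keyword.lower() in d for d in lowered)
        for task, keyword in TASK_EVIDENCE.items()
    }


# Descriptions copied from the registry listings in this notebook.
example = [
    "PDB file downloaded from RSCB",
    "Simulation trajectory for protein 1LYZ_205142 and simulation sim0_205159",
    "dssp values for trajectory with id: rec0_205202",
]
print(completed_tasks(example))
```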

In [3]:
llm_model = "claude-3-opus-20240229"
tools = "all"

In [4]:
agent_1 = MDCrow(agent_type="Structured", model=llm_model, top_k_tools=tools)
agent_1.run(descriptive_prompt_3)

Thought: To answer this question, I will need to download the PDB file for 1LYZ, analyze its structure, and run a short simulation. I will use the PDBFileDownloader tool to get the PDB file, the GetTurnsBetaSheetsHelices tool to get information about the residues and beta sheets, and the SetUpandRunFunction tool to run the simulation.

Action:
```
{
  "action": "PDBFileDownloader",
  "action_input": "1LYZ"
}
```

PDB file found with this ID: 1LYZ
Thought: Now that I have the PDB file, I can use the GetTurnsBetaSheetsHelices tool to get information about the number of residues and beta sheets in the protein.

Action:
```
{
  "action": "GetTurnsBetaSheetsHelices",
  "action_input": {
    "query": "1LYZ",
    "primary_accession": "P00720"
  }
}
```

The primary accession number provided does not match any entry in the data, using the first entry instead.
The primary accession number provided does not match any entry in the data, using the first entry instead.
The primary accession number 

({'input': '\n    You are an expert molecular dynamics scientist, and\n    your task is to respond to the question or\n    solve the problem to the best of your ability using\n    the provided tools.\n\n    You can only respond with a single complete\n    \'Thought, Action, Action Input\' format\n    OR a single \'Final Answer\' format.\n\n    Complete format:\n    Thought: (reflect on your progress and decide what to do next)\n    Action:\n    ```\n    {\n        "action": (the action name, it should be the name of a tool),\n        "action_input": (the input string for the action)\n    }\n    \'\'\'\n\n    OR\n\n    Final Answer: (the final response to the original input\n    question, once all steps are complete)\n\n    You are required to use the tools provided,\n    using the most specific tool\n    available for each action.\n    Your final answer should contain all information\n    necessary to answer the question and its subquestions.\n    Before you finish, reflect on your pro
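Each Structured-agent step above pairs a `Thought` with a JSON action blob containing `action` (the tool name) and `action_input`. The following minimal sketch pulls the tool call out of one logged step using the standard `json` module. It assumes the blob is well-formed JSON, as it is in these logs. `parse_action` is a hypothetical helper, not part of MDCrow.

```python
import json


def parse_action(blob):
    """Split one logged action blob into (tool_name, tool_input)."""
    data = json.loads(blob)
    return data["action"], data["action_input"]


# The second action logged above, copied verbatim.
step = """
{
  "action": "GetTurnsBetaSheetsHelices",
  "action_input": {
    "query": "1LYZ",
    "primary_accession": "P00720"
  }
}
"""
tool, tool_input = parse_action(step)
print(tool)
print(tool_input["primary_accession"])
```

In these logs, `action_input` is sometimes a plain string (as for `PDBFileDownloader`) and sometimes a nested object, so code that consumes it should not assume a single type.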

In [5]:
registry = agent_1.path_registry
print(registry.list_path_names_and_descriptions().replace(",", "\n"))

Files found in registry: 1LYZ_204757: PDB file downloaded from RSCB
 PDBFile ID: 1LYZ_204757
 1LYZ_204830: Cleaned File:  Removed Heterogens
 and Water Removed.  Replaced Nonstandard Residues. Added Hydrogens at pH 7.0. Missing Atoms Added and replaces nonstandard residues. 
 top_sim0_204841: Initial positions for simulation sim0_204841
 sim0_204841: Basic Simulation of Protein 1LYZ_204830
 rec0_204841: Simulation trajectory for protein 1LYZ_204830 and simulation sim0_204841
 rec1_204841: Simulation state log for protein 1LYZ_204830 and simulation sim0_204841
 rec2_204841: Simulation pdb frames for protein 1LYZ_204830 and simulation sim0_204841
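The next cell hard-codes registry IDs such as `rec0_204841`. Those IDs can instead be looked up by description from the printed listing. The following is a minimal sketch that operates on the string after the `.replace(",", "\n")` used above. It assumes each entry starts with a single `ID: ` token and that any other line continues the previous description. `parse_registry` and `find_id` are hypothetical helpers that depend on this printed layout, so treat them as fragile.

```python
import re

# Matches the start of an entry, e.g. " 1LYZ_204830: Cleaned File: ...".
# Continuation lines such as " PDBFile ID: ..." or " and Water Removed."
# do not match, because the text before the colon is not a single token.
ENTRY = re.compile(r"^\s*(\w+): (.*)$")


def parse_registry(listing):
    """Parse the printed registry listing into {file_id: description}."""
    entries = {}
    current = None
    for line in listing.splitlines():
        line = line.replace("Files found in registry: ", "", 1)
        match = ENTRY.match(line)
        if match:
            current = match.group(1)
            entries[current] = match.group(2).strip()
        elif current and line.strip():
            # Description text that was split onto its own line.
            entries[current] += " " + line.strip()
    return entries


def find_id(entries, phrase):
    """Return the first file ID whose description contains `phrase`, else None."""
    for file_id, description in entries.items():
        if phrase in description:
            return file_id
    return None


# Excerpt of the listing printed above.
listing = """Files found in registry: 1LYZ_204757: PDB file downloaded from RSCB
 PDBFile ID: 1LYZ_204757
 1LYZ_204830: Cleaned File:  Removed Heterogens
 and Water Removed.  Replaced Nonstandard Residues. Added Hydrogens at pH 7.0. Missing Atoms Added and replaces nonstandard residues. 
 top_sim0_204841: Initial positions for simulation sim0_204841
 sim0_204841: Basic Simulation of Protein 1LYZ_204830
 rec0_204841: Simulation trajectory for protein 1LYZ_204830 and simulation sim0_204841
"""
entries = parse_registry(listing)
print(find_id(entries, "Simulation trajectory"))
print(find_id(entries, "Initial positions"))
```

The returned ID could then be passed to `registry.get_mapped_path`, as the next cell does with a literal ID.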


In [8]:
traj_path_1 = registry.get_mapped_path("rec0_204841")
top_path_1 = registry.get_mapped_path("top_sim0_204841")

assert os.path.exists(traj_path_1)
assert os.path.exists(top_path_1)
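Because the prompt asks for 1 ps, a quick sanity check is the number of integrator steps that duration implies. The 2 fs timestep and the reporting interval below are assumptions, since the integrator parameters are not shown in full in the logs here. `expected_steps` and `expected_frames` are hypothetical helpers.

```python
def expected_steps(duration_ps, timestep_fs):
    """Integrator steps needed to cover duration_ps with a timestep given in fs."""
    return round(duration_ps * 1000.0 / timestep_fs)


def expected_frames(n_steps, report_interval):
    """Frames saved if a reporter writes every report_interval steps
    (ignoring any extra frame written at step 0)."""
    return n_steps // report_interval


steps = expected_steps(1.0, 2.0)  # assumed 2 fs timestep
print(steps)                       # 500
print(expected_frames(steps, 10))  # assumed reporting every 10 steps: 50
```

The result can be compared with `md.load(traj_path_1, top=top_path_1).n_frames` once the actual timestep and reporting interval are known.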

In [9]:
path = registry.get_mapped_path("1LYZ_204757")
traj = md.load(path)
# n_chains counts every chain in the topology, which can include
# non-protein chains such as crystallographic water
number_of_chains = traj.n_chains
# simplified DSSP codes: 'H' helix, 'E' strand, 'C' coil, 'NA' non-protein
secondary_structure = md.compute_dssp(traj, simplified=True)
codes = secondary_structure[0]  # the PDB file holds a single frame
print("Number of chains: ", number_of_chains)
# each count below is a number of residues assigned that code
print("Number of sheets: ", (codes == 'E').sum())
print("Number of helices: ", (codes == 'H').sum())
print("Number of coils: ", (codes == 'C').sum())

Number of chains:  2
Number of sheets:  14
Number of helices:  50
Number of coils:  65
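Each count printed above is a number of residues carrying a given simplified DSSP code. So 14 is the number of residues in beta strands, not the number of beta sheets. DSSP codes alone do not say which strands pair into a sheet, but counting contiguous runs of `'E'` gives the number of strand segments. The following minimal pure-Python sketch runs on a made-up code sequence (illustrative, not real 1LYZ output) and skips `'NA'`, which mdtraj uses for non-protein residues. `residue_counts` and `segment_counts` are hypothetical helpers.

```python
from collections import Counter
from itertools import groupby


def residue_counts(codes):
    """Count residues per simplified DSSP code, ignoring non-protein 'NA'."""
    return Counter(c for c in codes if c != "NA")


def segment_counts(codes):
    """Count contiguous runs of each code (e.g. separate strands or helices)."""
    runs = Counter()
    for code, _ in groupby(codes):
        if code != "NA":
            runs[code] += 1
    return runs


# Illustrative per-residue codes for one frame (not real 1LYZ output).
codes = ["C", "H", "H", "H", "C", "E", "E", "C", "E", "E", "E", "C", "NA", "NA"]
print(residue_counts(codes))  # residues per code
print(segment_counts(codes))  # contiguous runs per code
```

With mdtraj, `secondary_structure[0]` from the cell above can be passed in directly.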


In [10]:
agent_2 = MDCrow(agent_type="Structured", model=llm_model, top_k_tools=tools)
agent_2.run(non_descriptive_prompt_3)

Thought: To simulate 1LYZ and analyze its structure, I will first download the PDB file, then set up and run a short simulation. After that, I can use analysis tools to determine the number of residues in the chains and beta sheets.

Action:
```
{
  "action": "PDBFileDownloader",
  "action_input": "1LYZ"
}
```

PDB file found with this ID: 1LYZ
Thought: Now that I have the PDB file, I can set up and run a short 1 ps simulation at 300 K. I will use the SetUpandRunFunction tool to do this.

Action:
```
{
  "action": "SetUpandRunFunction",
  "action_input": {
    "pdb_id": "1LYZ_205117",
    "forcefield_files": ["amber14/protein.ff14SB.xml", "amber14/tip3p.xml"],
    "save": true,
    "system_params": {
      "nonbondedMethod": "PME", 
      "nonbondedCutoff": 1.0,
      "constraints": "HBonds",
      "rigidWater": true,
      "constraintTolerance": 0.000001,
      "ewaldErrorTolerance": 0.0005,
      "solvate": true
    },
    "integrator_params": {
      "integrator_type": "LangevinMidd

({'input': '\n    You are an expert molecular dynamics scientist, and\n    your task is to respond to the question or\n    solve the problem to the best of your ability using\n    the provided tools.\n\n    You can only respond with a single complete\n    \'Thought, Action, Action Input\' format\n    OR a single \'Final Answer\' format.\n\n    Complete format:\n    Thought: (reflect on your progress and decide what to do next)\n    Action:\n    ```\n    {\n        "action": (the action name, it should be the name of a tool),\n        "action_input": (the input string for the action)\n    }\n    \'\'\'\n\n    OR\n\n    Final Answer: (the final response to the original input\n    question, once all steps are complete)\n\n    You are required to use the tools provided,\n    using the most specific tool\n    available for each action.\n    Your final answer should contain all information\n    necessary to answer the question and its subquestions.\n    Before you finish, reflect on your pro

In [11]:
registry = agent_2.path_registry
print(registry.list_path_names_and_descriptions().replace(",", "\n"))

Files found in registry: 1LYZ_205117: PDB file downloaded from RSCB
 PDBFile ID: 1LYZ_205117
 1LYZ_205142: Cleaned File:  Removed Heterogens
 and Water Removed.  Replaced Nonstandard Residues. Added Hydrogens at pH 7.0. Missing Atoms Added and replaces nonstandard residues. 
 top_sim0_205159: Initial positions for simulation sim0_205159
 sim0_205159: Basic Simulation of Protein 1LYZ_205142
 rec0_205202: Simulation trajectory for protein 1LYZ_205142 and simulation sim0_205159
 rec1_205202: Simulation state log for protein 1LYZ_205142 and simulation sim0_205159
 rec2_205202: Simulation pdb frames for protein 1LYZ_205142 and simulation sim0_205159
 rec0_205645: dssp values for trajectory with id: rec0_205202


In [12]:
traj_path_2 = registry.get_mapped_path("rec0_205202")
top_path_2 = registry.get_mapped_path("top_sim0_205159")

assert os.path.exists(traj_path_2)
assert os.path.exists(top_path_2)

In [15]:
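traj = md.load(traj_path_2, top=top_path_2)
traj = traj[-1]  # keep only the final frame of the trajectory
# n_chains counts every chain in the topology; for a solvated system this
# can include solvent or ion chains in addition to the protein
number_of_chains = traj.n_chains
# simplified DSSP codes: 'H' helix, 'E' strand, 'C' coil, 'NA' non-protein
secondary_structure = md.compute_dssp(traj, simplified=True)
codes = secondary_structure[0]
print("Number of chains: ", number_of_chains)
# each count below is a number of residues assigned that code
print("Number of sheets: ", (codes == 'E').sum())
print("Number of helices: ", (codes == 'H').sum())
print("Number of coils: ", (codes == 'C').sum())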

Number of chains:  3
Number of sheets:  16
Number of helices:  52
Number of coils:  61
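The cell above keeps only the last frame (`traj[-1]`), so its counts describe a single snapshot. `md.compute_dssp` returns an `(n_frames, n_residues)` array, so per-frame counts for the whole trajectory come from one vectorized comparison. The following minimal sketch runs on a small synthetic array standing in for that output. `per_frame_counts` is a hypothetical helper.

```python
import numpy as np


def per_frame_counts(dssp, codes=("H", "E", "C")):
    """Return {code: array of residue counts, one entry per frame}."""
    dssp = np.asarray(dssp)
    return {code: (dssp == code).sum(axis=1) for code in codes}


# Synthetic stand-in for md.compute_dssp(traj, simplified=True):
# 3 frames x 6 residues, with the last residue non-protein ("NA").
dssp = np.array([
    ["C", "H", "H", "E", "E", "NA"],
    ["C", "H", "H", "H", "E", "NA"],
    ["C", "C", "H", "E", "E", "NA"],
])
counts = per_frame_counts(dssp)
print(counts["E"])         # strand residues in each frame
print(counts["E"].mean())  # average over the trajectory
```

Both single-frame tallies in this notebook sum to 129 residues (14 + 50 + 65 for the crystal structure, 16 + 52 + 61 for the final frame). The net difference between them is therefore four fewer coil residues, two more strand residues, and two more helix residues.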
