In [1]:
import datetime
from mdcrow import MDCrow

In [2]:
prompt3 = "Download the PDB file for protein 1GZX. Then, analyze the secondary structure of \
    the protein and provide information on how many helices, sheets, and other components are present. Get the gene names for this protein."
llm_var = "gpt-4o-2024-08-06"
tools = "all"
agent = MDCrow(agent_type="Structured", model=llm_var, top_k_tools=tools)
now = datetime.datetime.now()
date = now.strftime("%Y-%m-%d")
print("date and time:",date)
time = now.strftime("%H:%M:%S")
print("time:",time)
print("LLM: ",agent.llm.model_name,"\nTemperature: ",agent.llm.temperature)

date and time: 2024-10-17
time: 18:14:08
LLM:  gpt-4o-2024-08-06 
Temperature:  0.1


In [3]:
agent.run(prompt3)

Thought: To address the question, I need to first download the PDB file for the protein 1GZX. Then, I will analyze its secondary structure to determine the number of helices, sheets, and other components. Finally, I will retrieve the gene names associated with this protein. I will start by downloading the PDB file.

Action:
```json
{
  "action": "PDBFileDownloader",
  "action_input": "1GZX"
}
```
PDB file found with this ID: 1GZX
To proceed, I need to analyze the secondary structure of the downloaded PDB file for protein 1GZX to determine the number of helices, sheets, and other components. After that, I will retrieve the gene names associated with this protein.

Action:
```json
{
  "action": "ComputeDSSP",
  "action_input": {
    "traj_file": "1GZX_181414",
    "target_frames": "all"
  }
}
```
To complete the analysis, I need to summarize the secondary structure information and then retrieve the gene names associated with the protein 1GZX.

Action:
```json
{
  "action": "GetGeneNames",


({'input': '\n    You are an expert molecular dynamics scientist, and\n    your task is to respond to the question or\n    solve the problem to the best of your ability using\n    the provided tools.\n\n    You can only respond with a single complete\n    \'Thought, Action, Action Input\' format\n    OR a single \'Final Answer\' format.\n\n    Complete format:\n    Thought: (reflect on your progress and decide what to do next)\n    Action:\n    ```\n    {\n        "action": (the action name, it should be the name of a tool),\n        "action_input": (the input string for the action)\n    }\n    \'\'\'\n\n    OR\n\n    Final Answer: (the final response to the original input\n    question, once all steps are complete)\n\n    You are required to use the tools provided,\n    using the most specific tool\n    available for each action.\n    Your final answer should contain all information\n    necessary to answer the question and its subquestions.\n    Before you finish, reflect on your pro

In [4]:
registry = agent.path_registry
print(registry.list_path_names_and_descriptions())
print(registry.ckpt_dir)

Files found in registry: 1GZX_181414: PDB file downloaded from RSCB, PDBFile ID: 1GZX_181414, rec0_181417: dssp values for trajectory with id: 1GZX_181414
/gpfs/fs2/scratch/qcampbe2/repos/md-agent/ckpt/ckpt_5


In [7]:
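import mdtraj as md

# Load the downloaded structure via its registry ID
path = registry.get_mapped_path("1GZX_181414")
traj = md.load(path)
top = traj.topology
number_of_chains = top.n_chains

# Simplified DSSP assigns one code per residue: 'H' (helix), 'E' (strand), 'C' (coil)
secondary_structure = md.compute_dssp(traj, simplified=True)
frame = secondary_structure[0]  # first (only) frame of the PDB structure

# Note: the sheet/helix/coil values below are per-residue counts, not counts of distinct elements
print("Number of chains: ",number_of_chains)
print("Number of sheets: ",len([i for i in frame if i == 'E']))
print("Number of helices: ",len([i for i in frame if i == 'H']))
print("Number of coils: ",len([i for i in frame if i == 'C']))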


Number of chains:  12
Number of sheets:  0
Number of helices:  444
Number of coils:  130
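
The values above are per-residue counts: 444 residues carry the helix code and 130 the coil code. To estimate how many distinct helices or strands there are, one option is to count contiguous runs of the same DSSP code. The following is a minimal sketch, not part of the original notebook: `count_segments` and its `min_length` parameter are hypothetical names, the input sequence is a made-up example rather than 1GZX data, and runs are not split at chain boundaries. Strand segments are also not the same thing as sheets, since a sheet is built from several paired strands.

```python
from itertools import groupby


def count_segments(codes, min_length=1):
    """Count contiguous runs of each DSSP code in a per-residue sequence.

    codes: iterable of one-letter codes such as 'H', 'E', 'C'.
    min_length: runs shorter than this are ignored (hypothetical filter).
    """
    counts = {}
    for code, run in groupby(codes):
        run_length = sum(1 for _ in run)
        if run_length >= min_length:
            counts[code] = counts.get(code, 0) + 1
    return counts


# Illustrative input only; replace with one frame of the DSSP output.
example = list("CCHHHHHCCEEEECCHHHHC")
segments = count_segments(example)
print("Helix segments: ", segments.get("H", 0))
print("Strand segments:", segments.get("E", 0))
```

With the notebook's variables, the call would be `count_segments(secondary_structure[0])`. For a multi-chain structure it may be preferable to apply it chain by chain so that a run ending one chain is not merged with a run starting the next.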
