In [1]:
import datetime
from langchain.agents import AgentExecutor
from langchain.agents.structured_chat.base import StructuredChatAgent
from langchain.prompts import PromptTemplate
from langchain_core.tools import Tool
from langchain_experimental.utilities import PythonREPL
from mdcrow.utils import _make_llm

In [2]:
prompt = "Write me a code to run simulation of fibronectin."
model = "gpt-4o-2024-08-06"

In [3]:
# Parameters
prompt = "Analyze the RDF of the simulation of 1A3N solvated in water."


In [4]:
llm = _make_llm(model, temp=0.1, streaming=True)

python_repl = PythonREPL()
repl_tool = Tool(
    name="python_repl",
    description="A Python shell. Use this to execute python commands. Input should be a valid python command. If you want to see the output of a value, you should print it out with `print(...)`.",
    func=python_repl.run,
)
tools = [repl_tool]

structured_prompt = PromptTemplate(
    input_variables=["input"],
    template="""
    You are an expert molecular dynamics scientist, and your 
    task is to respond to the question or 
    solve the problem in its entirety to the best of your ability. 
    If any part of the task requires you to perform an action that 
    you are not capable of completing, please write a runnable 
    Python script for that step and run it. For literature papers, 
    use and process papers from the `paper_collection` folder. 
    For .pdb files, download them from the RCSB website using `requests`. 
    To preprocess PDB files, you will use PDBFixer. 
    To get information about proteins, retrieve data from the UniProt database. 
    For anything related to simulations, you will use OpenMM, 
    and for anything related to analyses, you will use MDTraj. 

    You can only respond with a single complete
    'Thought, Action, Action Input' format
    OR a single 'Final Answer' format.

    Complete format:
    Thought: (reflect on your progress and decide what to do next)
    Action:
    ```
    {{
        "action": (the action name, it should be the name of a tool),
        "action_input": (the input string for the action)
    }}
    ```

    OR

    Final Answer: (the final response to the original input
    question, once all steps are complete)

    You are required to use the tools provided,
    using the most specific tool
    available for each action.
    Your final answer should contain all information
    necessary to answer the question and its subquestions.
    Before you finish, reflect on your progress and make
    sure you have addressed the question in its entirety.

    Here is the input:
    Question: {input} """,
)

agent = StructuredChatAgent.from_llm_and_tools(
    llm=llm, tools=tools
)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
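
A note on the doubled braces in `structured_prompt`: assuming `PromptTemplate` is used with its default f-string-style formatting, which follows Python's `str.format` rules, `{{` and `}}` render as literal braces while `{input}` is substituted. The sketch below applies plain `str.format` to a shortened, hypothetical copy of the template to illustrate this; it does not call LangChain.

```python
# Shortened stand-in for the notebook's template (hypothetical, for illustration).
template = (
    "Action:\n"
    "{{\n"
    '    "action": (the action name),\n'
    '    "action_input": (the input string)\n'
    "}}\n"
    "Question: {input}"
)


def render(question: str) -> str:
    """Fill the single `input` placeholder; doubled braces become literal braces."""
    return template.format(input=question)


rendered = render("Analyze the RDF of the simulation of 1A3N solvated in water.")
print(rendered)
```

Single braces around the JSON example would instead be read as a placeholder and raise an error at format time, which is presumably why the template doubles them.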


In [5]:
now = datetime.datetime.now()
date = now.strftime("%Y-%m-%d")
print("date:",date)
time = now.strftime("%H:%M:%S")
print("time:",time)
print("LLM: ",llm.model_name,"\nTemperature: ",llm.temperature)

date: 2024-10-17
time: 23:55:03
LLM:  gpt-4o-2024-08-06 
Temperature:  0.1


In [6]:
messages = structured_prompt.format(input=prompt)
agent_executor.invoke({"input": messages})



> Entering new AgentExecutor chain...

Python REPL can execute arbitrary code. Use with caution.


Thought: To analyze the radial distribution function (RDF) of the simulation of the protein 1A3N solvated in water, I need to perform the following steps:

1. Download the PDB file for 1A3N from the RCSB website.
2. Preprocess the PDB file using PDBFixer to ensure it is suitable for simulation.
3. Set up and run a molecular dynamics simulation using OpenMM.
4. Analyze the trajectory using MDTraj to compute the RDF.

I will start by downloading the PDB file for 1A3N.

Action:
```json
{
  "action": "python_repl",
  "action_input": "import requests\n\n# Download the PDB file for 1A3N\nurl = 'https://files.rcsb.org/download/1A3N.pdb'\nresponse = requests.get(url)\n\n# Save the PDB file\nwith open('1A3N.pdb', 'wb') as file:\n    file.write(response.content)\n\n'1A3N.pdb downloaded successfully'"
}
```


Observation:
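
The empty observation is consistent with how a stdout-capturing REPL behaves: the snippet ends with a bare string expression rather than a `print(...)` call, so nothing reaches stdout (the tool description asks for `print`). The sketch below is a simplified stand-in for such a tool, not LangChain's `PythonREPL` implementation; `run_snippet` is a hypothetical helper.

```python
import contextlib
import io


def run_snippet(code: str) -> str:
    """Execute code and return only what it printed to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})  # fresh namespace; only use with trusted code
    return buffer.getvalue()


# A bare expression is evaluated and discarded, so nothing is captured.
silent = run_snippet("'1A3N.pdb downloaded successfully'")
# An explicit print makes the message visible to the caller.
visible = run_snippet("print('1A3N.pdb downloaded successfully')")

print(repr(silent))
print(repr(visible))
```

Under this assumption, an empty observation means only that nothing was printed; it does not by itself confirm that the download succeeded.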
Thought: To continue with the analysis of the RDF for the simulation of 1A3N solvated in water, I need to preprocess the downloaded PDB file using PDBFixer to ensure it is suitable for simulation.

Action:
```json
{
  "action": "python_repl",
  "action_input": "from pdbfixer import PDBFixer\nfrom openmm.app import PDBFile\n\n# Load the PDB file\nfixer = PDBFixer(filename='1A3N.pdb')\n\n# Find missing residues and add them\nfixer.findMissingResidues()\nfixer.findMissingAtoms()\nfixer.addMissingAtoms()\nfixer.addMissingHydrogens()\n\n# Save the fixed PDB file\nwith open('1A3N_fixed.pdb', 'w') as file:\n    PDBFile.writeFile(fixer.topology, fixer.positions, file)\n\n'1A3N_fixed.pdb created successfully'"
}
```


Observation:
Thought: To proceed with the analysis, I need to set up and run a molecular dynamics simulation using OpenMM with the preprocessed PDB file. This involves creating a system, adding a solvent, and running the simulation to generate a trajectory file for analysis.

Action:
```json
{
  "action": "python_repl",
  "action_input": "from openmm.app import *\nfrom openmm import *\nfrom openmm.unit import *\nfrom sys import stdout\n\n# Load the fixed PDB file\npdb = PDBFile('1A3N_fixed.pdb')\n\n# Create a force field\nforcefield = ForceField('amber14-all.xml', 'amber14/tip3pfb.xml')\n\n# Create a system and add a solvent\nsystem = forcefield.createSystem(pdb.topology, nonbondedMethod=PME, nonbondedCutoff=1.0*nanometer, constraints=HBonds)\nmodeller = Modeller(pdb.topology, pdb.positions)\nmodeller.addSolvent(forcefield, model='tip3p', padding=1.0*nanometer)\n\n# Set up the integrator and simulation\nintegrator = LangevinIntegrator(300*kelvin, 1/picosecond, 0.002*picoseconds)\nsimulation = Simulation(modeller.topology, system, integrator)\nsimulation.context.setPositions(modeller.positions)\n\n# Minimize energy\nsimulation.minimizeEnergy()\n\n# Run the simulation\nsimulation.reporters.append(PDBReporter('1A3N_simulation.pdb', 1000))\nsimulation.reporters.append(StateDataReporter(stdout, 1000, step=True, potentialEnergy=True, temperature=True))\nsimulation.step(10000)\n\n'1A3N_simulation.pdb created successfully'"
}
```


Observation: ValueError('No template found for residue 575 (HEM).  This might mean your input topology is missing some atoms or bonds, or possibly that you are using the wrong force field.  For more information, see https://github.com/openmm/openmm/wiki/Frequently-Asked-Questions#template')
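
The `HEM` template error can be anticipated before building the system by listing the heterogen residues in the PDB file. A minimal sketch, assuming standard fixed-column PDB records (residue name in columns 18-20, chain in column 22, residue number in columns 23-26); `list_heterogens`, `pdb_record`, and the sample records are hypothetical and were not part of the run.

```python
def pdb_record(record, serial, atom, res, chain, resseq, element):
    """Build one fixed-column PDB coordinate record (coordinates set to zero)."""
    return (
        f"{record:<6}{serial:>5} {atom:<4} {res:>3} {chain}{resseq:>4}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2}"
    )


def list_heterogens(pdb_text, skip=("HOH",)):
    """Map each non-water HETATM residue (name, chain, number) to its atom count."""
    counts = {}
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        res_name = line[17:20].strip()
        if res_name in skip:
            continue
        key = (res_name, line[21], int(line[22:26]))
        counts[key] = counts.get(key, 0) + 1
    return counts


# Illustrative records only; residue numbers are made up, not taken from 1A3N.
sample = "\n".join([
    pdb_record("ATOM", 1, "N", "VAL", "A", 1, "N"),
    pdb_record("HETATM", 2, "FE", "HEM", "A", 142, "FE"),
    pdb_record("HETATM", 3, "NA", "HEM", "A", 142, "N"),
    pdb_record("HETATM", 4, "O", "HOH", "A", 201, "O"),
])
heterogens = list_heterogens(sample)
print(heterogens)
```

Run against the real `1A3N.pdb` text, a check like this would flag the heme groups up front, so the force-field choice could be made with them in mind.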
Thought: The error indicates that the force field does not have a template for the HEM (heme) group in residue 575. To resolve this, I need to use a force field that includes parameters for the heme group or manually add parameters for it. I will attempt to use a different force field that might include the heme group.

Action:
```json
{
  "action": "python_repl",
  "action_input": "from openmm.app import *\nfrom openmm import *\nfrom openmm.unit import *\nfrom sys import stdout\n\n# Load the fixed PDB file\npdb = PDBFile('1A3N_fixed.pdb')\n\n# Try using a different force field that might include heme\nforcefield = ForceField('amber99sb.xml', 'amber99_obc.xml')\n\n# Create a system and add a solvent\nsystem = forcefield.createSystem(pdb.topology, nonbondedMethod=PME, nonbondedCutoff=1.0*nanometer, constraints=HBonds)\nmodeller = Modeller(pdb.topology, pdb.positions)\nmodeller.addSolvent(forcefield, model='tip3p', padding=1.0*nanometer)\n\n# Set up the integrator and simulation\nintegrator = LangevinIntegrator(300*kelvin, 1/picosecond, 0.002*picoseconds)\nsimulation = Simulation(modeller.topology, system, integrator)\nsimulation.context.setPositions(modeller.positions)\n\n# Minimize energy\nsimulation.minimizeEnergy()\n\n# Run the simulation\nsimulation.reporters.append(PDBReporter('1A3N_simulation.pdb', 1000))\nsimulation.reporters.append(StateDataReporter(stdout, 1000, step=True, potentialEnergy=True, temperature=True))\nsimulation.step(10000)\n\n'1A3N_simulation.pdb created successfully'"
}
```


Observation: ValueError('No template found for residue 575 (HEM).  This might mean your input topology is missing some atoms or bonds, or possibly that you are using the wrong force field.  For more information, see https://github.com/openmm/openmm/wiki/Frequently-Asked-Questions#template')
Thought: To resolve the issue with the heme group (HEM) not being recognized by the force field, I need to either find a force field that includes parameters for the heme group or manually add parameters for it. One common approach is to use the CHARMM force field, which often includes parameters for heme. I will attempt to use the CHARMM force field to see if it resolves the issue.

Action:
```json
{
  "action": "python_repl",
  "action_input": "from openmm.app import *\nfrom openmm import *\nfrom openmm.unit import *\nfrom sys import stdout\n\n# Load the fixed PDB file\npdb = PDBFile('1A3N_fixed.pdb')\n\n# Try using the CHARMM force field\nforcefield = ForceField('charmm36.xml', 'charmm36/water.xml')\n\n# Create a system and add a solvent\nsystem = forcefield.createSystem(pdb.topology, nonbondedMethod=PME, nonbondedCutoff=1.0*nanometer, constraints=HBonds)\nmodeller = Modeller(pdb.topology, pdb.positions)\nmodeller.addSolvent(forcefield, model='tip3p', padding=1.0*nanometer)\n\n# Set up the integrator and simulation\nintegrator = LangevinIntegrator(300*kelvin, 1/picosecond, 0.002*picoseconds)\nsimulation = Simulation(modeller.topology, system, integrator)\nsimulation.context.setPositions(modeller.positions)\n\n# Minimize energy\nsimulation.minimizeEnergy()\n\n# Run the simulation\nsimulation.reporters.append(PDBReporter('1A3N_simulation.pdb', 1000))\nsimulation.reporters.append(StateDataReporter(stdout, 1000, step=True, potentialEnergy=True, temperature=True))\nsimulation.step(10000)\n\n'1A3N_simulation.pdb created successfully'"
}
```


Observation: ValueError('No template found for residue 575 (HEM).  The set of atoms is similar to HEME, but it is missing 30 hydrogen atoms.  For more information, see https://github.com/openmm/openmm/wiki/Frequently-Asked-Questions#template')
Thought: The issue persists because the force field does not recognize the heme group due to missing hydrogen atoms. To resolve this, I need to manually add parameters for the heme group or use a specialized force field or parameter set that includes heme. One approach is to use the CGenFF (CHARMM General Force Field) or a similar parameter set that includes heme. However, this requires additional setup and parameter files.

Since this is a complex task that involves setting up custom parameters, I will provide a general outline of the steps needed to proceed:

1. Obtain the heme parameters from a reliable source, such as the CGenFF or another parameter set that includes heme.
2. Modify the PDB file to ensure all atoms, including hydrogens, are present for the heme group.
3. Use the appropriate force field files that include the heme parameters.
4. Set up and run the simulation again with these parameters.

Unfortunately, without access to specific parameter files or additional resources, I cannot complete this task directly here. However, I recommend consulting resources or databases that provide heme parameters compatible with OpenMM, such as the CHARMM force field with CGenFF.

Final Answer: The simulation setup for 1A3N with a heme group requires specialized parameters for the heme group, which are not included in the standard force fields. To proceed, obtain heme parameters from a source like CGenFF, ensure the PDB file includes all necessary atoms, and use these parameters in the simulation setup.

{'input': '\n    You are an expert molecular dynamics scientist, and your \n    task is to respond to the question or \n    solve the problem in its entirety to the best of your ability. \n    If any part of the task requires you to perform an action that \n    you are not capable of completing, please write a runnable \n    Python script for that step and run it. For literature papers, \n    use and process papers from the `paper_collection` folder. \n    For .pdb files, download them from the RSCB website using `requests`. \n    To preprocess PDB files, you will use PDBFixer. \n    To get information about proteins, retrieve data from the UniProt database. \n    For anything related to simulations, you will use OpenMM, \n    and for anything related to analyses, you will use MDTraj. \n\n    You can only respond with a single complete\n    \'Thought, Action, Action Input\' format\n    OR a single \'Final Answer\' format.\n\n    Complete format:\n    Thought: (reflect on your progress 
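
The run ended before any RDF was computed. For reference, a minimal pure-Python sketch of what a radial distribution function measures for one frame of points in a cubic periodic box, using the minimum-image convention. It illustrates the definition under stated assumptions (cubic box, single frame, one particle species) and is not MDTraj's implementation; `rdf_single_frame` and its parameters are hypothetical names, and the two sample points are made up.

```python
import math
from itertools import combinations


def rdf_single_frame(points, box, r_max, n_bins):
    """Return (bin_centers, g) for identical particles in a cubic periodic box."""
    if r_max > box / 2:
        raise ValueError("r_max must not exceed half the box length")
    n = len(points)
    width = r_max / n_bins
    hist = [0] * n_bins
    for a, b in combinations(points, 2):
        d2 = 0.0
        for ai, bi in zip(a, b):
            d = ai - bi
            d -= box * round(d / box)  # minimum-image convention
            d2 += d * d
        r = math.sqrt(d2)
        if r < r_max:
            hist[min(int(r / width), n_bins - 1)] += 1
    density = n / box**3
    centers, g = [], []
    for k, count in enumerate(hist):
        inner, outer = k * width, (k + 1) * width
        shell = 4.0 / 3.0 * math.pi * (outer**3 - inner**3)
        # Each unordered pair contributes to both particles, hence the factor 2.
        g.append(2.0 * count / (n * density * shell))
        centers.append(0.5 * (inner + outer))
    return centers, g


# Two illustrative points that are 0.45 apart only across the periodic boundary.
centers, g = rdf_single_frame(
    [(0.1, 0.0, 0.0), (9.65, 0.0, 0.0)], box=10.0, r_max=1.0, n_bins=10
)
print([round(c, 2) for c in centers])
print([round(v, 1) for v in g])
```

A trajectory-level RDF would average this histogram over frames and restrict the pairs to the atom selections of interest, for example protein atoms against water oxygens.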

In [7]:
now = datetime.datetime.now()
date = now.strftime("%Y-%m-%d")
print("date:",date)
time = now.strftime("%H:%M:%S")
print("time:",time)

date: 2024-10-17
time: 23:56:30
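
The two printed timestamps bracket the agent run. A small sketch of turning them into an approximate elapsed duration, using the values printed in this notebook; the figure includes any overhead between the cells, not just the agent's own time.

```python
from datetime import datetime

# Timestamps copied from the notebook output before and after the agent call.
fmt = "%Y-%m-%d %H:%M:%S"
start = datetime.strptime("2024-10-17 23:55:03", fmt)
end = datetime.strptime("2024-10-17 23:56:30", fmt)

elapsed = end - start
print(f"elapsed: {elapsed.total_seconds():.0f} s")
```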


In [1]:
!ls

1A3N_fixed.pdb	1A3N.pdb  exp_23.ipynb
