# Annotator - extracting RNA base interactions and secondary structure

In this notebook we demonstrate how the `annotator.py` script can be used to:
- detect base pairs (Leontis-Westhof + Saenger),
- detect stacking interactions,
- detect base-phosphate and base-ribose interactions,
- build a secondary structure representation (dot-bracket, BPSEQ),
- export results to CSV and JSON for downstream analysis.

Two example structures are used:
- `5g35.pdb`
- `R1107TS128_4.pdb`

Input files are stored locally in: `../data/pdb/`

### Example 1: annotate the first structure (5g35)

We run the annotator on a single PDB file and export:
- dot-bracket (saved to a text file),
- interactions table (CSV),
- Structure2D snapshot (JSON),
- BPSEQ,
- PyMOL script (PML) for stem visualization.

Outputs are stored in: `../outputs/annotator/5g35/`

In [2]:
!mkdir -p ../outputs/annotator/5g35

!python ../src/rnapolis/annotator.py \
  ../data/pdb/5g35.pdb \
  --csv ../outputs/annotator/5g35/interactions.csv \
  --json ../outputs/annotator/5g35/structure2d.json \
  --bpseq ../outputs/annotator/5g35/structure.bpseq \
  --pml ../outputs/annotator/5g35/stems.pml \
  > ../outputs/annotator/5g35/dotbracket.txt

In [23]:
!head -n 6 ../outputs/annotator/5g35/dotbracket.txt

>strand_C
GT
..
>strand_D
GCG
...


In [12]:
!head -n 5 ../outputs/annotator/5g35/interactions.csv

nt1,nt2,type,classification-1,classification-2
D.DG1,E.DG1,base pair,cSW,
D.DG1,E.DG1,base pair,cWS,
D.DG9,E.DT9,base pair,cHW,
D.DG9,F.DA7,base pair,cSS,


In [13]:
!head -n 5 ../outputs/annotator/5g35/structure.bpseq

1 G 0
2 T 0
3 G 0
4 C 0
5 G 0


### Example 2: annotate the second structure (R1107TS128_4)

In this section we apply the same `annotator.py` workflow to a second structure,
the RNA model `R1107TS128_4.pdb`.

The goal is to show that the annotator can be used in an identical way for
different input structures, producing the same types of outputs without any
additional configuration.
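
Because only the input path and the output directory change between the two examples, the command line can be assembled by a small helper. The sketch below is an illustration, not part of `annotator.py`: the function name `build_annotator_command` and the `output_root` default are hypothetical, while the script path and the `--csv`, `--json`, `--bpseq` and `--pml` flags are copied from the cells in this notebook.

```python
from pathlib import Path


def build_annotator_command(pdb_path, output_root="../outputs/annotator"):
    """Return (argv, dotbracket_path) for one annotator run.

    Hypothetical helper: it only mirrors the shell cells of this notebook
    and does not execute anything itself.
    """
    pdb_path = Path(pdb_path)
    # One output directory per structure, named after the PDB file stem.
    out_dir = Path(output_root) / pdb_path.stem
    argv = [
        "python", "../src/rnapolis/annotator.py", str(pdb_path),
        "--csv", str(out_dir / "interactions.csv"),
        "--json", str(out_dir / "structure2d.json"),
        "--bpseq", str(out_dir / "structure.bpseq"),
        "--pml", str(out_dir / "stems.pml"),
    ]
    # Standard output (the dot-bracket text) is redirected to this file.
    return argv, out_dir / "dotbracket.txt"


argv, dotbracket_path = build_annotator_command("../data/pdb/R1107TS128_4.pdb")
print(" ".join(argv))
```

Passing `argv` to `subprocess.run(argv, stdout=handle, check=True)`, with `handle` opened for writing on `dotbracket_path`, should reproduce the redirect used in the shell cells, provided the output directory has been created first.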

As before, the annotator is used to:
- detect RNA base interactions,
- reconstruct the secondary structure,
- export results in multiple standard formats.

Outputs for this structure are stored in:

`../outputs/annotator/R1107TS128_4/`

In [19]:
!mkdir -p ../outputs/annotator/R1107TS128_4

!python ../src/rnapolis/annotator.py \
  ../data/pdb/R1107TS128_4.pdb \
  --csv ../outputs/annotator/R1107TS128_4/interactions.csv \
  --json ../outputs/annotator/R1107TS128_4/structure2d.json \
  --bpseq ../outputs/annotator/R1107TS128_4/structure.bpseq \
  --pml ../outputs/annotator/R1107TS128_4/stems.pml \
  > ../outputs/annotator/R1107TS128_4/dotbracket.txt

In [20]:
!head -n 5 ../outputs/annotator/R1107TS128_4/dotbracket.txt

>strand_0
GGGGGCCACAGCAGAAGCGUUCACGUCGCAGCCCCUGUCAGCCAUUGCACUCCGGCUGCGAAUUCUGCU
[[[[[[...((((((((((.......))).]]]]]]........((..............)))))))))
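
The dot-bracket string above uses two bracket families, `()` and `[]`, so that crossing helices can be written unambiguously. As a minimal sketch of how such a string maps back to base pairs, the hypothetical function `dot_bracket_pairs` below keeps one stack per bracket family. It assumes only the `()`, `[]`, `{}` and `<>` families are present; letter-based pseudoknot levels are not handled.

```python
def dot_bracket_pairs(structure):
    """Return sorted 1-based (i, j) pairs encoded in a dot-bracket string."""
    closing_to_opening = {")": "(", "]": "[", "}": "{", ">": "<"}
    # One stack of open positions per bracket family.
    stacks = {opening: [] for opening in closing_to_opening.values()}
    pairs = []
    for position, symbol in enumerate(structure, start=1):
        if symbol in stacks:
            stacks[symbol].append(position)
        elif symbol in closing_to_opening:
            stack = stacks[closing_to_opening[symbol]]
            if not stack:
                raise ValueError(f"unmatched {symbol!r} at position {position}")
            pairs.append((stack.pop(), position))
        # Any other symbol (for example ".") marks an unpaired residue.
    unclosed = sorted(p for stack in stacks.values() for p in stack)
    if unclosed:
        raise ValueError(f"unclosed brackets at positions {unclosed}")
    return sorted(pairs)


# Structure line copied from the dotbracket.txt output shown above.
structure = "[[[[[[...((((((((((.......))).]]]]]]........((..............)))))))))"
pairs = dot_bracket_pairs(structure)
print(pairs[:3])  # [(1, 36), (2, 35), (3, 34)]
```

The first pairs recovered this way, 1-36 and 2-35, agree with the `0.G1`/`0.U36` and `0.G2`/`0.C35` rows of the interactions table for this structure.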


In [38]:
!head -n 5 ../outputs/annotator/R1107TS128_4/interactions.csv

nt1,nt2,type,classification-1,classification-2
0.G1,0.U36,base pair,cWW,XXVIII
0.G2,0.C35,base pair,cWW,XIX
0.G3,0.C34,base pair,cWW,XIX
0.G4,0.C33,base pair,cWW,XIX


In [39]:
!head -n 5 ../outputs/annotator/R1107TS128_4/stems.pml

select stem0, 0/1-6/ or 0/31-36/
pseudoatom stem0_centroid0, pos=[13.959, 4.366, 20.108]
pseudoatom stem0_centroid1, pos=[13.389, 1.392, 22.558]
pseudoatom stem0_centroid2, pos=[12.513, -2.742, 23.197]
pseudoatom stem0_centroid3, pos=[12.080, -6.270, 22.320]


## Inspecting stacking interactions

In addition to base pairs, the annotator detects base stacking interactions.
These interactions are reported in the CSV output with type `stacking`,
and the stacking topology is given in the `classification-1` column.

We can inspect stacking interactions directly in the CSV files exported earlier for the two example structures.


**5G35:**

In [25]:
!grep stacking ../outputs/annotator/5g35/interactions.csv | head -n 5

D.DG1,E.DG1,stacking,upward,
D.DG9,E.DC10,stacking,upward,
E.DC2,E.DT3,stacking,upward,
E.DT3,E.DC4,stacking,upward,
E.DC4,E.DT5,stacking,upward,


**R1107TS128_4:**

In [33]:
!grep stacking ../outputs/annotator/R1107TS128_4/interactions.csv | head -n 5

0.G1,0.G2,stacking,upward,
0.G2,0.G3,stacking,upward,
0.G3,0.G4,stacking,upward,
0.G4,0.G5,stacking,upward,
0.G5,0.C6,stacking,upward,


These outputs show stacking interactions between nucleotides together with
their relative topology (for example inward, outward, upward or downward),
which reflects the geometric arrangement of the bases.
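
To summarise the topologies rather than read them row by row, the CSV can be tallied in Python. The following is a minimal sketch using only the standard library; the helper name `count_classifications` is hypothetical, and the sample rows are copied from the `R1107TS128_4` listings in this notebook so that the block runs without the exported files.

```python
import csv
import io
from collections import Counter


def count_classifications(rows, interaction_type):
    """Count classification-1 values among rows of one interaction type."""
    return Counter(
        row["classification-1"]
        for row in rows
        if row["type"] == interaction_type
    )


# Header and rows copied from the interactions.csv excerpts shown above.
sample = io.StringIO(
    "nt1,nt2,type,classification-1,classification-2\n"
    "0.G1,0.U36,base pair,cWW,XXVIII\n"
    "0.G1,0.G2,stacking,upward,\n"
    "0.G2,0.G3,stacking,upward,\n"
)
counts = count_classifications(csv.DictReader(sample), "stacking")
print(counts)
```

To tally a full export, the `io.StringIO` object can be replaced by a file opened with `open(path, newline="")`, for example on `../outputs/annotator/5g35/interactions.csv`.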

## Inspecting base–phosphate and base–ribose interactions

The annotator also detects non-canonical interactions involving the RNA backbone:
- base–phosphate interactions,
- base–ribose interactions.

These interactions are important for stabilizing RNA tertiary structure
and are reported separately in the interactions table.


#### Base-phosphate interactions

**5G35:**

In [31]:
!grep "base-phosphate" ../outputs/annotator/5g35/interactions.csv | head -n 5

D.DT2,E.DC2,base-phosphate interaction,9BPh,
D.DA4,E.DC4,base-phosphate interaction,0BPh,
E.DA6,D.DG6,base-phosphate interaction,0BPh,
E.8PY8,D.DC8,base-phosphate interaction,1BPh,
E.DT9,D.DG9,base-phosphate interaction,0BPh,


**R1107TS128_4:**

In [34]:
!grep "base-phosphate" ../outputs/annotator/R1107TS128_4/interactions.csv | head -n 5

0.G37,0.C22,base-phosphate interaction,5BPh,



#### Base-ribose interactions

**5G35:**

In [35]:
!grep "base-ribose" ../outputs/annotator/5g35/interactions.csv | head -n 5

C.DC7,F.DA7,base-ribose interaction,0BR,
C.DT12,F.DG12,base-ribose interaction,0BR,
D.DG9,E.DT9,base-ribose interaction,0BR,
D.DG12,E.DT12,base-ribose interaction,0BR,
E.DG1,D.DG1,base-ribose interaction,0BR,


**R1107TS128_4:**

In [36]:
!grep "base-ribose" ../outputs/annotator/R1107TS128_4/interactions.csv | head -n 5

0.G1,0.U20,base-ribose interaction,0BR,
0.G2,0.U36,base-ribose interaction,3BR,
0.G3,0.G2,base-ribose interaction,0BR,
0.C6,0.C7,base-ribose interaction,0BR,
0.G11,0.A10,base-ribose interaction,0BR,


Each interaction is classified with an established code (for example `0BPh` or
`5BPh` for base-phosphate contacts and `0BR` or `3BR` for base-ribose contacts),
which allows direct comparison with literature descriptions of RNA
backbone-mediated contacts.
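
For downstream analysis it can be convenient to split such a code into its numeric index and its family. The sketch below assumes that every code has the shape seen in the listings above, one or more digits followed by `BPh` or `BR`; that pattern is inferred from this notebook's output rather than taken from the annotator's documentation, and the helper name `split_backbone_code` is hypothetical.

```python
import re

# Assumed shape of a backbone interaction code: digits, then "BPh" or "BR".
CODE_PATTERN = re.compile(r"^(\d+)(BPh|BR)$")


def split_backbone_code(code):
    """Split a code such as '5BPh' into (5, 'BPh')."""
    match = CODE_PATTERN.match(code)
    if match is None:
        raise ValueError(f"unrecognised backbone interaction code: {code!r}")
    return int(match.group(1)), match.group(2)


# Codes that appear in the base-phosphate and base-ribose listings above.
codes = ["9BPh", "0BPh", "1BPh", "5BPh", "0BR", "3BR"]
parsed = [split_backbone_code(code) for code in codes]
print(parsed)
```

Raising an error on anything that does not match keeps unexpected values, such as base-pair classes like `cWW`, from being silently misread as backbone codes.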