<a href="https://colab.research.google.com/github/tetsufmbio/remolog/blob/main/notebooks/remolog_pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Remolog: identifying remote homologs using structural alignment data

## Quick start

1. Press "Runtime" --> "Run all".
2. In the next cell (*Upload protein structure*), a "Choose files" button will appear. Click it and select one or more PDB files to be uploaded and analyzed by the pipeline*.
3. When the run finishes, a file named "remolog_final_result.tab" will be downloaded (the name is set by the `jobName` parameter). Its columns are described at the end of this notebook.
4. Running this pipeline in Colab takes about 10 min when foldseek is used for screening**.

\* If you only have the amino acid sequence, you can predict its structure with the [ColabFold (AlphaFold2) notebook](https://github.com/sokrypton/ColabFold).

\** If no remote homolog is found, try increasing the parameter "n" or changing the screening method.


In [None]:
%cd /content
! if [ -d input ]; then \
    rm -rf input; \
  fi
! mkdir input


%cd input

from google.colab import files

uploaded = files.upload()
#@markdown  When you run this cell, a "Choose files" button may appear. Click it and choose the PDB file(s) to be analyzed (you may upload multiple files).

%cd /content


/content
/content/input


Saving WP_000810942.1_532d7_unrelaxed_rank_001_alphafold2_ptm_model_3_seed_000.pdb to WP_000810942.1_532d7_unrelaxed_rank_001_alphafold2_ptm_model_3_seed_000.pdb
Saving WP_0004663221_495e4_unrelaxed_rank_001_alphafold2_ptm_model_5_seed_000.pdb to WP_0004663221_495e4_unrelaxed_rank_001_alphafold2_ptm_model_5_seed_000.pdb
Saving WP_0004788811_225e7_unrelaxed_rank_001_alphafold2_ptm_model_4_seed_000.pdb to WP_0004788811_225e7_unrelaxed_rank_001_alphafold2_ptm_model_4_seed_000.pdb
Saving WP_0005313521_670de_unrelaxed_rank_001_alphafold2_ptm_model_3_seed_000.pdb to WP_0005313521_670de_unrelaxed_rank_001_alphafold2_ptm_model_3_seed_000.pdb
Saving WP_0005345391_0cb0e_unrelaxed_rank_001_alphafold2_ptm_model_3_seed_000.pdb to WP_0005345391_0cb0e_unrelaxed_rank_001_alphafold2_ptm_model_3_seed_000.pdb
Saving WP_0006566641_7f8f5_unrelaxed_rank_001_alphafold2_ptm_model_3_seed_000.pdb to WP_0006566641_7f8f5_unrelaxed_rank_001_alphafold2_ptm_model_3_seed_000.pdb
Saving WP_0006721551_94bd2_unrelaxed_r

## Setting parameters

In [None]:
screening = "foldseek" #@param ["foldseek","tmalign", "fatcat"]
#@markdown  - software used to retrieve structurally similar proteins from the SCOPe database.

n = 200 #@param {type:"integer"}
#@markdown  - analyze the n most similar protein structures.

database = "scope40" #@param ["scope40", "scope95"]
#@markdown  - protein structure database to be used.

jobName = "remolog_final_result" #@param {type: "string"}
#@markdown  - prefix for the final output name.


## Installing dependencies and downloading database

In [None]:
# setup environment variables
import os
os.environ['FATCAT'] = '/content/programs/FATCAT-dist'
os.environ['PATH'] += ':/content/programs/FATCAT-dist/FATCATMain:/content/bin'
os.environ['HEADN'] = str(n)
os.environ['SCREEN'] = screening
os.environ['DATABASE'] = database
if (database == "scope40"):
  os.environ['ANNOT'] = "/content/programs/remolog/data/dir.cla.scope.2.08-stable_filtered40.txt"
elif (database == "scope95"):
  os.environ['ANNOT'] = "/content/programs/remolog/data/dir.cla.scope.2.08-stable_filtered95.txt"
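The cell above selects the SCOPe classification file that matches the chosen database. A minimal sketch of the same lookup as a dictionary, assuming the two paths above are the only supported options. `ANNOT_FILES` and `annotation_path` are hypothetical names, and raising an error on an unknown value is a design choice not present in the original cell, which would simply leave `ANNOT` unset.

```python
# Hypothetical lookup table mirroring the if/elif in the cell above.
ANNOT_FILES = {
    "scope40": "/content/programs/remolog/data/dir.cla.scope.2.08-stable_filtered40.txt",
    "scope95": "/content/programs/remolog/data/dir.cla.scope.2.08-stable_filtered95.txt",
}


def annotation_path(database: str) -> str:
    """Return the SCOPe annotation file for `database`, failing loudly on typos."""
    try:
        return ANNOT_FILES[database]
    except KeyError:
        raise ValueError(
            f"unknown database {database!r}; expected one of {sorted(ANNOT_FILES)}"
        ) from None
```

With this sketch, `os.environ['ANNOT'] = annotation_path(database)` would replace the `if/elif` pair.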


In [None]:
%cd /content/
! if [ ! -d bin ]; then mkdir bin; fi
! if [ ! -d programs ]; then mkdir programs; fi
! if [ ! -d view ]; then mkdir view; fi
! if [ ! -d foldseek_data ]; then mkdir foldseek_data; fi

/content


In [None]:
# download and install TMalign
%cd /content/bin
! if [ ! -e TMalign ]; then \
    wget "https://zhanggroup.org/TM-align/TMalign.cpp"; \
    g++ -static -O3 -ffast-math -lm -o TMalign TMalign.cpp; \
  fi


/content/bin
--2023-04-23 14:54:10--  https://zhanggroup.org/TM-align/TMalign.cpp
Resolving zhanggroup.org (zhanggroup.org)... 141.213.137.249
Connecting to zhanggroup.org (zhanggroup.org)|141.213.137.249|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 182097 (178K) [text/plain]
Saving to: ‘TMalign.cpp’


2023-04-23 14:54:10 (2.78 MB/s) - ‘TMalign.cpp’ saved [182097/182097]



In [None]:
# download and install FATCAT
%cd /content/programs
! if [ ! -d FATCAT-dist ]; then \
    git clone https://github.com/GodzikLab/FATCAT-dist.git; \
    cd FATCAT-dist/; ./Install; \
  fi


/content/programs
Cloning into 'FATCAT-dist'...
remote: Enumerating objects: 119, done.
remote: Counting objects: 100% (119/119), done.
remote: Compressing objects: 100% (104/104), done.
remote: Total 119 (delta 17), reused 109 (delta 13), pack-reused 0
Receiving objects: 100% (119/119), 2.37 MiB | 11.86 MiB/s, done.
Resolving deltas: 100% (17/17), done.
g++ -O2 -Wall -I./ -c SAlnOpt.C
g++ -O2 -Wall -I./ -c Prot.C
g++ -O2 -Wall -I./ -c AFPchain.C
  872 | //       j1 -|---------------------\
      | ^
AFPchain.C: In member function ‘void AFPCHAIN::MergeAfp(int, int*, int*)’:
  445 |  int i, j, k, f1, f2, i1, i2, j1, j2, frag;
      |                                   ^~
AFPchain.C: In member function ‘void AFPCHAIN::UpdateScore()’:
 1154 |  double t, conn, dvar;
      |         ^
AFPchain.C: In member func

In [None]:
# download and install lovoalign
%cd /content/bin
! if [ ! -e lovoalign ]; then \
    wget "https://github.com/m3g/lovoalign/archive/refs/tags/22.0.0.tar.gz"; \
    tar -xzf 22.0.0.tar.gz; cd lovoalign-22.0.0/src; make; cp ../bin/lovoalign /content/bin; \
  fi

/content/bin
--2023-04-23 14:54:24--  https://github.com/m3g/lovoalign/archive/refs/tags/22.0.0.tar.gz
Resolving github.com (github.com)... 140.82.112.3
Connecting to github.com (github.com)|140.82.112.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/m3g/lovoalign/tar.gz/refs/tags/22.0.0 [following]
--2023-04-23 14:54:24--  https://codeload.github.com/m3g/lovoalign/tar.gz/refs/tags/22.0.0
Resolving codeload.github.com (codeload.github.com)... 140.82.113.9
Connecting to codeload.github.com (codeload.github.com)|140.82.113.9|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘22.0.0.tar.gz’

22.0.0.tar.gz           [ <=>                ]  31.57K  --.-KB/s    in 0.002s  

2023-04-23 14:54:24 (14.6 MB/s) - ‘22.0.0.tar.gz’ saved [32325]

 ------------------------------------------------------ 
 Compiling LovoAlign with gfortran 
 Flags: -O3 -ffast-math -llapack  
 -

In [None]:
# download some scripts and model
%cd /content/programs
! if [ ! -d remolog ]; then \
    git clone https://github.com/tetsufmbio/remolog.git; \
  fi

/content/programs
Cloning into 'remolog'...
remote: Enumerating objects: 164, done.
remote: Counting objects: 100% (164/164), done.
remote: Compressing objects: 100% (130/130), done.
remote: Total 164 (delta 76), reused 104 (delta 34), pack-reused 0
Receiving objects: 100% (164/164), 16.72 MiB | 17.90 MiB/s, done.
Resolving deltas: 100% (76/76), done.


In [None]:
%cd /content/programs
! if [ ! -e /content/bin/foldseek ]; then \
    wget https://mmseqs.com/foldseek/foldseek-linux-sse2.tar.gz; tar xvzf foldseek-linux-sse2.tar.gz; \
    cp foldseek/bin/foldseek /content/bin; \
  fi

/content/programs
--2023-04-23 14:54:32--  https://mmseqs.com/foldseek/foldseek-linux-sse2.tar.gz
Resolving mmseqs.com (mmseqs.com)... 141.5.100.26
Connecting to mmseqs.com (mmseqs.com)|141.5.100.26|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 41166698 (39M) [application/octet-stream]
Saving to: ‘foldseek-linux-sse2.tar.gz’


2023-04-23 14:54:34 (20.7 MB/s) - ‘foldseek-linux-sse2.tar.gz’ saved [41166698/41166698]

foldseek/
foldseek/README.md
foldseek/bin/
foldseek/bin/foldseek


In [None]:
# download and format scope database
%cd /content
! if [ ! -d database ]; then \
    mkdir database; \
  fi;
%cd /content/database
! if [ ! -d $DATABASE ]; then \
      mkdir $DATABASE; \
  fi;
%cd /content/database/$database
! if [ ! -f ../list_$DATABASE.tab ]; then \
    if [ $DATABASE = "scope40" ]; then \
      wget https://scop.berkeley.edu/downloads/pdbstyle/pdbstyle-sel-gs-bib-40-2.08.tgz; \
      tar -zxf pdbstyle-sel-gs-bib-40-2.08.tgz; mv pdbstyle-2.08/*/*.ent . ; rm -rf pdbstyle*; \
    elif [ $DATABASE = "scope95" ]; then \
      wget https://scop.berkeley.edu/downloads/pdbstyle/pdbstyle-sel-gs-bib-95-2.08.tgz; \
      tar -zxf pdbstyle-sel-gs-bib-95-2.08.tgz; mv pdbstyle-2.08/*/*.ent . ; rm -rf pdbstyle*; \
    fi; \
    ls *.ent > ../list_$DATABASE.tab; for i in *.ent; do mv $i $i.pdb; done; \
  fi;

/content
/content/database
/content/database/scope40
--2023-04-23 14:54:37--  https://scop.berkeley.edu/downloads/pdbstyle/pdbstyle-sel-gs-bib-40-2.08.tgz
Resolving scop.berkeley.edu (scop.berkeley.edu)... 128.32.236.13
Connecting to scop.berkeley.edu (scop.berkeley.edu)|128.32.236.13|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1040288513 (992M) [application/x-gzip]
Saving to: ‘pdbstyle-sel-gs-bib-40-2.08.tgz’


2023-04-23 14:55:13 (27.8 MB/s) - ‘pdbstyle-sel-gs-bib-40-2.08.tgz’ saved [1040288513/1040288513]
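The database cell records every `*.ent` file name in `list_$DATABASE.tab` and then appends `.pdb` to each file, so later cells can address a subject as `${l}.pdb` (list entry) or `$l.ent.pdb` (bare SCOPe sid). A minimal Python sketch of that bookkeeping step; `index_and_rename` is a hypothetical helper, and Python's `sorted` may order names slightly differently from `ls` under a non-C locale.

```python
import os
from pathlib import Path


def index_and_rename(db_dir: str, list_file: str) -> list:
    """Write the *.ent names to list_file, then rename each file to *.ent.pdb.

    Mirrors: ls *.ent > list_file; for i in *.ent; do mv $i $i.pdb; done
    """
    names = sorted(p.name for p in Path(db_dir).glob("*.ent"))
    # The list keeps the '.ent' names; '.pdb' is appended only on disk.
    Path(list_file).write_text("".join(name + "\n" for name in names))
    for name in names:
        os.rename(os.path.join(db_dir, name), os.path.join(db_dir, name + ".pdb"))
    return names
```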



In [None]:
# creating foldseek database
%cd /content/foldseek_data/
! if [ ! -f fs_$database ]; then \
  foldseek createdb /content/database/$database fs_$database ; \
fi

/content/foldseek_data
createdb /content/database/scope40 fs_scope40 

MMseqs Version:        	b9831bc6f3c4ec0fadca5caaae8b01d0da761e70
Chain name mode        	0
Mask b-factor threshold	0
Coord store mode       	2
Write lookup file      	1
Tar Inclusion Regex    	.*
Tar Exclusion Regex    	^$
Threads                	2
Verbosity              	3

Output file: fs_scope40
Time for merging to fs_scope40_ss: 0h 0m 0s 16ms
Time for merging to fs_scope40_h: 0h 0m 0s 9ms
Time for merging to fs_scope40_ca: 0h 0m 0s 43ms
Time for merging to fs_scope40: 0h 0m 0s 16ms
Ignore 0 out of 39611.
Too short: 0, incorrect  0.
Time for processing: 0h 0m 41s 442ms


## Running protein structure alignment

In [None]:
%cd /content
! if [ -d result ]; then rm -rf result ; fi
! mkdir result;
! mkdir result/screening;

/content


In [None]:
# screening for similar proteins using foldseek
%cd /content/input

! if [ $SCREEN = "foldseek" ]; then \
    for f in *; do \
      foldseek easy-search $f /content/foldseek_data/fs_$DATABASE /content/result/screening/tmp.tab.fmt /content/tmpFolder --max-seqs $HEADN -e inf; \
      cut -f 1,2 /content/result/screening/tmp.tab.fmt | sort | uniq | perl -ne '@a = split(/\./, $_); print join(".", @a[0 .. $#a-2])."\n";' > /content/result/screening/$f.tab.fmt; \
      rm /content/result/screening/tmp.tab.fmt; \
    done; \
  fi

/content/input
Create directory /content/tmpFolder
easy-search WP_0004663221_495e4_unrelaxed_rank_001_alphafold2_ptm_model_5_seed_000.pdb /content/foldseek_data/fs_scope40 /content/result/screening/tmp.tab.fmt /content/tmpFolder --max-seqs 200 -e inf 

MMseqs Version:              	b9831bc6f3c4ec0fadca5caaae8b01d0da761e70
Seq. id. threshold           	0
Coverage threshold           	0
Coverage mode                	0
Max reject                   	2147483647
Max accept                   	2147483647
Add backtrace                	false
TMscore threshold            	0
TMalign hit order            	0
TMalign fast                 	1
Preload mode                 	0
Threads                      	2
Verbosity                    	3
LDDT threshold               	0
Sort by structure bit score  	1
Substitution matrix          	aa:3di.out,nucl:3di.out
Alignment mode               	3
Alignment mode               	0
E-value threshold            	inf
Min alignment length         	0
Seq. id. mode         
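The `cut | sort | uniq | perl` pipeline in the foldseek cell reduces each hit to a `query<TAB>subject` pair. The Perl step splits the whole line on `.` and drops the last two pieces, which removes the `.ent.pdb` suffix from the target name while leaving dots inside the query name (e.g. `WP_000810942.1_...`) intact. A Python rendering of that step, assuming foldseek reports targets by their file name (such as `d1guia_.ent.pdb`); `format_foldseek_hits` is a hypothetical name. Note that `sort | uniq` discards foldseek's ranking, which appears harmless here because every retained hit is re-aligned in later cells.

```python
def format_foldseek_hits(lines):
    """Turn foldseek tabular hits into unique 'query<TAB>subject_sid' pairs.

    Equivalent of: cut -f 1,2 | sort | uniq | perl -ne '... @a[0 .. $#a-2] ...'
    """
    # Keep the first two tab-separated columns and deduplicate them.
    pairs = sorted({"\t".join(line.rstrip("\n").split("\t")[:2]) for line in lines})
    # Drop the last two dot-separated pieces ('ent' and 'pdb') of each pair.
    return [".".join(pair.split(".")[:-2]) for pair in pairs]
```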

In [None]:
# screening for similar proteins using FATCAT

%cd /content/input
! if [ $SCREEN = "fatcat" ]; then \
    for f in *; do \
      FATCATSearch.pl $f /content/database/list_$DATABASE.tab -b -i1 /content/input -i2 /content/database/$DATABASE | \
      sort -k11nr | \
      head -n $HEADN | \
      perl /content/programs/remolog/scripts/format_result_FATCAT.pl - /content/programs/remolog/data/maxScore_fatcat.tab > /content/result/screening/$f.tab.fmt; \
    done; \
    cat /content/result/screening/*.fmt > /content/result/fatcat_formatted.tab; \
  fi

/content/input
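Both the FATCAT and TM-align screening cells rank candidates with `sort -kNnr | head -n $HEADN` (column 11 for FATCAT, column 3 for TM-align). A rough Python equivalent, assuming the ranking column is always numeric; `top_n_by_column` is a hypothetical name, and ties may come out in a different order than with GNU `sort`.

```python
def top_n_by_column(lines, column, n):
    """Keep the n lines with the largest value in a 1-based, whitespace-split column.

    Approximates `sort -k{column}nr | head -n {n}`.
    """
    def key(line):
        return float(line.split()[column - 1])

    return sorted(lines, key=key, reverse=True)[:n]
```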


In [None]:
# screening for similar proteins using tmalign

%cd /content/input
! if [ $SCREEN = "tmalign" ]; then \
  for f in *; do \
    if [ -f /content/result/screening/${f}.tab ]; then \
      rm /content/result/screening/${f}.tab; \
    fi; \
    for l in $(cat /content/database/list_$DATABASE.tab); \
      do TMalign /content/input/$f /content/database/$DATABASE/${l}.pdb | perl /content/programs/remolog/scripts/parser_TMalign.pl - >> /content/result/screening/${f}.tab ; \
      done; \
    sort -k3nr /content/result/screening/${f}.tab | grep ${f} | head -n $HEADN > /content/result/screening/${f}.tab.fmt; \
    done; \
  cat /content/result/screening/*.fmt > /content/result/tmalign_formatted.tab; \
fi

/content/input


In [None]:
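# align each query against its screening hits with FATCAT (skipped when FATCAT was already the screening method)
%cd /content/input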
! if [ $SCREEN != "fatcat" ]; then \
  if [ -f /content/result/fatcat_formatted.tab ]; then \
    rm /content/result/fatcat_formatted.tab; \
  fi; \
  for f in *; do \
    for l in $(cut -f2 /content/result/screening/$f.tab.fmt); do \
      FATCAT -p1 $f -p2 $l.ent.pdb -i1 /content/input -i2 /content/database/$DATABASE -b | \
      perl /content/programs/remolog/scripts/format_result_FATCAT.pl - /content/programs/remolog/data/maxScore_fatcat.tab >>  /content/result/fatcat_formatted.tab ; \
    done; \
  done; \
  fi

/content/input
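The FATCAT, TM-align and lovoalign cells all use the same nested loop: for each uploaded query `f`, read the subject sids from column 2 of `result/screening/f.tab.fmt` and align the query against `<sid>.ent.pdb`. A minimal sketch that only builds that job list; `alignment_jobs` is a hypothetical helper, and it assumes column 2 of every `.tab.fmt` file holds a bare SCOPe sid.

```python
import os


def alignment_jobs(screening_dir, input_dir, database_dir):
    """List (query_path, subject_path) pairs to be re-aligned after screening."""
    jobs = []
    for query in sorted(os.listdir(input_dir)):
        hits_file = os.path.join(screening_dir, query + ".tab.fmt")
        with open(hits_file) as handle:
            for line in handle:
                fields = line.rstrip("\n").split("\t")
                if len(fields) < 2:
                    continue  # skip blank or malformed lines
                subject = fields[1]  # same column as `cut -f2`
                jobs.append((os.path.join(input_dir, query),
                             os.path.join(database_dir, subject + ".ent.pdb")))
    return jobs
```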


In [None]:
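# align each query against its screening hits with TM-align (skipped when TM-align was already the screening method)
%cd /content/input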
! if [ $SCREEN != "tmalign" ]; then \
    if [ -f /content/result/tmalign_formatted.tab ]; then \
      rm /content/result/tmalign_formatted.tab; \
    fi; \
    for f in *; do \
      for l in $(cut -f2 /content/result/screening/$f.tab.fmt); do \
        TMalign /content/input/$f /content/database/$DATABASE/$l.ent.pdb | perl /content/programs/remolog/scripts/parser_TMalign.pl - >> /content/result/tmalign_formatted.tab ; \
      done; \
    done;\
  fi

/content/input
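`parser_TMalign.pl` turns raw TM-align output into the `tm_*` columns. Its exact logic is not shown here; the sketch below only illustrates how those values can be pulled from TM-align's summary lines, assuming the output format of recent TM-align releases (older builds print these lines differently). `parse_tmalign` and the `subject_len` key are hypothetical.

```python
import re

# Assumed shape of TM-align's summary lines (recent releases).
ALIGNED = re.compile(
    r"Aligned length=\s*(\d+),\s*RMSD=\s*([\d.]+),\s*Seq_ID=n_identical/n_aligned=\s*([\d.]+)"
)
TM_CHAIN2 = re.compile(
    r"TM-score=\s*([\d.]+)\s*\(if normalized by length of Chain_2, i\.e\., LN=(\d+), d0=([\d.]+)\)"
)


def parse_tmalign(text):
    """Extract alignment length, RMSD, identity, TM-score and d0 (chain 2 = subject)."""
    aligned = ALIGNED.search(text)
    tm2 = TM_CHAIN2.search(text)
    if aligned is None or tm2 is None:
        raise ValueError("TM-align summary lines not found")
    return {
        "tm_AliLen": int(aligned.group(1)),
        "tm_RMSD": float(aligned.group(2)),
        "tm_n_ident/n_aln": float(aligned.group(3)),
        "tm_TM-score (chain 2)": float(tm2.group(1)),
        "subject_len": int(tm2.group(2)),
        "tm_d0 (chain 2)": float(tm2.group(3)),
    }
```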


In [None]:
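# align each query against its screening hits with lovoalign (lovoalign is not a screening option, so this cell always runs)
%cd /content/input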
! if [ $SCREEN != "lovoalign" ]; then \
    if [ -f /content/result/lovoalign_formatted.tab ]; then \
      rm /content/result/lovoalign_formatted.tab; \
    fi; \
    for f in *; do \
      for l in $(cut -f2 /content/result/screening/$f.tab.fmt); do \
        lovoalign -p1 /content/input/$f -p2 /content/database/$DATABASE/$l.ent.pdb | perl /content/programs/remolog/scripts/parser_lovoalign.pl - >> /content/result/lovoalign_formatted.tab; \
      done; \
    done; \
  fi

/content/input


In [None]:
# merge the FATCAT, TM-align and lovoalign tables and add the SCOPe classification of each subject
%cd /content/result
! perl /content/programs/remolog/scripts/join_table.pl /content/result/fatcat_formatted.tab /content/result/tmalign_formatted.tab | \
  perl /content/programs/remolog/scripts/join_table.pl - /content/result/lovoalign_formatted.tab | \
  perl /content/programs/remolog/scripts/add_scope_class.pl - $ANNOT > result.tab

/content/result
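`join_table.pl` combines the per-program tables into one row per query/subject pair. Its exact rules are not shown in this notebook; the sketch below is one plausible reading, assuming the first two columns of every table are the query and the subject and that pairs missing from either table are dropped (an inner join). `join_on_pair` is a hypothetical name, and the real script may treat missing pairs differently.

```python
def join_on_pair(left, right):
    """Inner-join two lists of rows on their first two fields (query, subject).

    Each row is a list of strings; the output keeps the left row followed by
    the right row's non-key columns. If the right table repeats a pair, the
    first occurrence wins.
    """
    index = {}
    for row in right:
        index.setdefault((row[0], row[1]), row[2:])
    joined = []
    for row in left:
        extra = index.get((row[0], row[1]))
        if extra is not None:
            joined.append(row + extra)
    return joined
```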


## Making prediction and writing the results

In [None]:
# create model
%cd /content/programs/remolog/scripts
!if [ ! -f model.joblib ]; then python3 create_model.py ../data/data.csv; fi

/content/programs/remolog/scripts


In [None]:
# make prediction
%cd /content/programs/remolog/scripts
!python3 classify.py /content/result/result.tab model.joblib

/content/programs/remolog/scripts


In [None]:
# summarize result
%cd /content/programs/remolog/scripts
!perl add_desc.pl result.tab ../data/dir.des.scope.2.08-stable_sf.txt $ANNOT > result_summary.tab

/content/programs/remolog/scripts


In [None]:
import pandas as pd
import numpy as np

In [None]:
data = pd.read_csv("/content/programs/remolog/scripts/result_summary.tab", sep="\t")


In [None]:
data = data.sort_values(by=["query", "pred_proba"], ascending=[True, False])
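The `sort_values` call orders the table by query name (ascending) and, within each query, by `pred_proba` (descending), so the most likely remote homologs come first. The same two-key ordering in plain Python, as an illustration only; `rank_hits` is a hypothetical name and the rows are assumed to be dictionaries with numeric `pred_proba` values.

```python
def rank_hits(rows):
    """Sort hits by query name (A to Z), then by pred_proba (highest first)."""
    # Negating the probability turns the second key into a descending sort.
    return sorted(rows, key=lambda r: (r["query"], -r["pred_proba"]))
```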

In [None]:
# write and download the result file (tab-separated, matching the .tab extension)
from google.colab import files

data.to_csv("/content/result/"+jobName+".tab", sep="\t", index=False)
files.download("/content/result/"+jobName+".tab")


In [None]:
# function to display table
from google.colab import data_table
data_table.enable_dataframe_formatter()

def hyperlink(path):
    # Keep the text after the last "=" (the whole value if there is none),
    # e.g. a SCOPe sid such as "d1guia_".
    f_url = str(path).split("=")[-1]
    url = "https://scop.berkeley.edu/search/?ver=2.08&key=" + f_url

    # Convert it into a clickable link to the SCOPe 2.08 search page.
    return '<a target="_blank" href="{}">{}</a>'.format(url, f_url)


In [None]:
# functions to display 3d alignment
! pip install py3Dmol
import py3Dmol
import glob
import matplotlib.pyplot as plt 

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting py3Dmol
  Downloading py3Dmol-2.0.1.post1-py2.py3-none-any.whl (12 kB)
Installing collected packages: py3Dmol
Successfully installed py3Dmol-2.0.1.post1


In [None]:
import ipywidgets as widgets

def print_query(subject):
    print(subject)

def select_subject(query):
    subject_picker.options = list(data[data["query"] == query].subject)

query_list = data.loc[:,"query"].unique()
query_picker = widgets.Dropdown(options=query_list, value=query_list[0])
subject_list = list(data[data["query"] == query_list[0]].subject)
subject_picker = widgets.Dropdown(options=subject_list, value=subject_list[0])
j = widgets.interactive(print_query, subject=subject_picker)
i = widgets.interactive(select_subject, query=query_picker)

#button = widgets.Button(description="Submit")
#output = widgets.Output()
#display(button)

def display_str():
  # Row(s) of the result table for the currently selected query/subject pair
  selected_subject = data[(data["subject"] == subject_picker.value) & (data["query"] == query_picker.value)]

  # Re-run FATCAT with -t in /content/view to write the superposed structures (tmp.opt.twist.pdb)
  os.chdir("/content/view")
  os.system("rm *")
  os.system("FATCAT -p1 "+query_picker.value+" -p2 "+subject_picker.value+".ent.pdb -i1 /content/input -i2 /content/database/$DATABASE -t")

  with open("/content/view/tmp.opt.twist.pdb") as ifile:
      system = ifile.read()

  # Query (chain A) in blue, subject (chain B) in yellow
  view = py3Dmol.view(width=400, height=300)
  view.addModelsAsFrames(system)
  view.setStyle({'chain':'A'}, {'cartoon':{'color':'blue'}})
  view.setStyle({'chain':'B'}, {'cartoon':{'color':'yellow'}})
  view.zoomTo()
  view.show()

  display(selected_subject)


#button.on_click(on_button_clicked)



## Display table 

In [None]:
data2 = data.style.format({'subject': hyperlink, 'sf': hyperlink})
data2

| | query | subject | superfamily | pred | pred_proba |
|---|---|---|---|---|---|
| 0 | WP_0004663221_495e4_unrelaxed_rank_001_alphafold2_ptm_model_5_seed_000.pdb | d1guia_ | b.18.1 Galactose-binding domain-like | 1 | 0.615139 |
| 1 | WP_0004663221_495e4_unrelaxed_rank_001_alphafold2_ptm_model_5_seed_000.pdb | d1uwwa_ | b.18.1 Galactose-binding domain-like | 1 | 0.604152 |
| 2 | WP_0004663221_495e4_unrelaxed_rank_001_alphafold2_ptm_model_5_seed_000.pdb | d2oo2a1 | a.8.11 AF1782-like | 1 | 0.591303 |
| 3 | WP_0004663221_495e4_unrelaxed_rank_001_alphafold2_ptm_model_5_seed_000.pdb | d1wkya1 | b.18.1 Galactose-binding domain-like | 1 | 0.584613 |
| 4 | WP_0004663221_495e4_unrelaxed_rank_001_alphafold2_ptm_model_5_seed_000.pdb | d2d29a1 | a.29.3 Acyl-CoA dehydrogenase C-terminal domain-like | 1 | 0.576638 |
| 5 | WP_0004663221_495e4_unrelaxed_rank_001_alphafold2_ptm_model_5_seed_000.pdb | d1wcua1 | b.18.1 Galactose-binding domain-like | 1 | 0.554656 |
| 6 | WP_0004663221_495e4_unrelaxed_rank_001_alphafold2_ptm_model_5_seed_000.pdb | d4bj0a1 | b.18.1 Galactose-binding domain-like | 1 | 0.520274 |
| 7 | WP_0004663221_495e4_unrelaxed_rank_001_alphafold2_ptm_model_5_seed_000.pdb | d1cx1a1 | b.18.1 Galactose-binding domain-like | 1 | 0.518248 |
| 8 | WP_0004663221_495e4_unrelaxed_rank_001_alphafold2_ptm_model_5_seed_000.pdb | d1w0na_ | b.18.1 Galactose-binding domain-like | 1 | 0.51078 |
| 9 | WP_0004663221_495e4_unrelaxed_rank_001_alphafold2_ptm_model_5_seed_000.pdb | d1gu3a_ | b.18.1 Galactose-binding domain-like | 1 | 0.501922 |


In [None]:
data_table.disable_dataframe_formatter()

Description of the columns
- query: Query name
- subject: Subject name

Lovoalign
- lovo_finalScore: Final score;
- lovo_coverage: Alignment coverage;
- lovo_rmsd: RMSD;
- lovo_gaps: Number of gaps;
- lovo_relCov: Ratio of the coverage to the subject length;
- lovo_relGaps: Ratio of the number of gaps to the coverage;
- lovo_finalScoreNorm: Normalized score.

TM-align
- tm_AliLen: Alignment length;
- tm_RMSD: RMSD;
- tm_n_ident/n_aln: Ratio of the number of identical residues to the aligned length;
- tm_TM-score (chain 2): TM-score normalized by the subject length;
- tm_d0 (chain 2): Scale factor d0 used to calculate the TM-score;
- tm_cov: Alignment coverage of the subject.

FATCAT
- fatcat_subject-len: Subject length;
- fatcat_Twists: Number of twists;
- fatcat_ini-len: Initial alignment length;
- fatcat_ini-rmsd: Initial RMSD;
- fatcat_opt-equ: Number of equivalent positions in the alignment;
- fatcat_opt-rmsd: RMSD of aligned Cα atoms of the input structures with structural rearrangement;
- fatcat_chain-rmsd: RMSD of aligned Cα atoms of the input structures without structural rearrangement;
- fatcat_Score: Alignment score;
- fatcat_align-len: Alignment length;
- fatcat_Gaps: Number of gaps in the alignment;
- fatcat_rel_score: Ratio of the alignment score to the maximum score;
- fatcat_rel_align: Ratio of the number of aligned positions to the subject length.

Prediction
- pred: prediction (0: not remote homolog; 1: remote homolog)
- pred_proba: prediction probability

SCOPe annotation
- cl: subject SCOPe class
- cf: subject SCOPe fold
- sf: subject SCOPe superfamily
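Several of the columns above are ratios of raw alignment values. A minimal sketch of those ratios, written only from the column descriptions: it assumes `tm_cov` is the aligned length divided by the subject length and that the FATCAT maximum score is supplied separately (presumably from `maxScore_fatcat.tab`). `relative_metrics` and its parameter names are hypothetical, and all denominators must be positive.

```python
def relative_metrics(lovo_coverage, lovo_gaps, fatcat_score, fatcat_max_score,
                     fatcat_aligned, tm_aligned, subject_len):
    """Derived ratio columns, following the column descriptions (assumed formulas)."""
    return {
        "lovo_relCov": lovo_coverage / subject_len,          # coverage / subject length
        "lovo_relGaps": lovo_gaps / lovo_coverage,           # gaps / coverage
        "fatcat_rel_score": fatcat_score / fatcat_max_score, # score / maximum score
        "fatcat_rel_align": fatcat_aligned / subject_len,    # aligned positions / subject length
        "tm_cov": tm_aligned / subject_len,                  # assumed: aligned length / subject length
    }
```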

## Display protein structure alignment

In [None]:
print("Choose a query")
display(query_picker)
print("Choose the Subject")
display(subject_picker)

Choose a query


Dropdown(description='query', options=('WP_0004663221_495e4_unrelaxed_rank_001_alphafold2_ptm_model_5_seed_000…

Choose the Subject


Dropdown(description='subject', options=('d1guia_', 'd1uwwa_', 'd2oo2a1', 'd1wkya1', 'd2d29a1', 'd1wcua1', 'd4…

In [None]:
# Choose a query and a subject in the cell above and run this cell to display
# the structural alignment computed by FATCAT.
# chain in blue: Query
# chain in yellow: Subject
display_str()

| | query | subject | superfamily | pred | pred_proba |
|---|---|---|---|---|---|
| 0 | WP_0004663221_495e4_unrelaxed_rank_001_alphafo... | d1guia_ | b.18.1 Galactose-binding domain-like | 1 | 0.615139 |
