## TANGO Tutorial: Nuclear Pore Complexes

### This tutorial demonstrates how to use cryoCAT and its module tango.py to compute the affiliation of subunits (SUs) of the cytoplasmic ring (CR) of the nuclear pore complex (NPC).

The `autoreload` extension reloads modules automatically before executing code typed at the IPython prompt.

In [1]:
%load_ext autoreload
%autoreload 2

Besides the cryoCAT modules to handle motive lists (cryomotl) and the module dedicated to twist-aware neighborhoods for geometric organization (TANGO), several other common Python libraries are imported for this demonstration.

In [None]:
import numpy as np

from scipy.spatial import cKDTree

import matplotlib.pyplot as plt

from cryocat import cryomotl, tango

# for color palette
from monet_palette import monet_colors


### Input

The NPC motive list is loaded. The file behind npc_input was preprocessed by cleaning it with a mask and by shifting all CR SUs by the CR radius.

vis_motl is used to visualize the affiliation results. Its particles have not been shifted in the x-direction.

In [None]:
npc_input = "./inputs/cr_mask_cleaned_shifted.em"

npc_motl = cryomotl.EmMotl(npc_input)

visualization_input = "./inputs/cr_mask_cleaned.em"

vis_motl = cryomotl.EmMotl(visualization_input)

### Parameter Analysis

Nearest neighbors (NNs) and their distances are obtained using cKDTree.

The search radius for the initial TwistDescriptor is chosen based on NN statistics.

It should be large enough that most subunits (SUs) have a non-empty support, but no larger than necessary. A smaller spherical support is preferred here because the goal is to compute affiliations.

Large supports may contain many false positives, which are numerous and densely packed in this data, and processing them slows down the computation.

In [20]:
positions = npc_motl.get_coordinates()

tree = cKDTree(positions)

dd, _ = tree.query(positions, k=2)

print(f"The median NN distance is {np.median(dd[:,1])} voxels.")

The median NN distance is 10.411326530299032 voxels.
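
The role of `k=2` and of the column `dd[:,1]` can be checked on a toy example. When a tree is queried with its own points, each point's closest hit is the point itself at distance 0, so the second column holds the distance to the true nearest neighbor. The sketch below reproduces this with a brute-force search in plain Python; the helper `nearest_neighbor_distances` and the toy coordinates are made up for illustration and are not part of cryoCAT.

```python
import math
import statistics


def nearest_neighbor_distances(points):
    """Distance from each point to its closest *other* point (brute force)."""
    distances = []
    for i, p in enumerate(points):
        # Skip j == i, which would give the trivial self-distance 0.
        distances.append(
            min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        )
    return distances


# Hypothetical toy positions in voxels.
toy_positions = [(0, 0, 0), (3, 0, 0), (3, 4, 0), (10, 0, 0)]

nn = nearest_neighbor_distances(toy_positions)
median_nn = statistics.median(nn)
print(f"NN distances: {nn}, median: {median_nn}")
```

For the NPC data, the median NN distance of about 10.4 voxels means that the radius of 30 voxels used for the initial TwistDescriptor spans roughly three median NN distances.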


### Computation of Twist Features

In [21]:
npc_twist_desc = tango.TwistDescriptor(input_motl= npc_motl, nn_radius= 30)

display(npc_twist_desc.df)

Unnamed: 0,qp_id,nn_id,tomo_id,twist_so_x,twist_so_y,twist_so_z,twist_x,twist_y,twist_z,nn_inplane,geodesic_distance_rad,euclidean_distance,product_distance,rot_angle_x,rot_angle_y,rot_angle_z
0,1.0,2.0,2.0,1.797865e-01,2.090918e-01,2.299244,13.531632,-5.908503,16.211919,180.0,2.315721,21.928105,22.050042,122.380049,120.700979,0.944084
1,1.0,3.0,2.0,2.378535e-01,-2.291136e-02,0.718671,0.249111,0.351212,0.625374,90.0,0.757356,0.759275,1.072421,29.765285,42.080563,2.216453
2,1.0,4.0,2.0,2.293776e-01,1.218139e-16,1.460299,3.208845,-3.563783,3.615385,132.0,1.478204,6.005685,6.184928,71.552475,84.694846,1.025885
3,1.0,13.0,2.0,-5.458377e-02,1.421956e-01,-0.830710,5.651752,5.831943,7.228840,0.0,0.844558,10.872442,10.905194,45.262190,40.242403,0.793427
4,1.0,425.0,2.0,-5.762018e-17,-1.152404e-16,0.942478,-7.492079,1.401135,17.882562,102.0,0.942478,19.439148,19.461982,54.000000,54.000000,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
86563,93649.0,93191.0,295.0,-8.115863e-01,-1.008619e+00,-0.335238,1.741716,7.591980,-17.572752,42.0,1.337299,19.221690,19.268153,30.121118,18.831985,57.413882
86564,93649.0,93248.0,295.0,-1.391590e-01,-8.862864e-01,0.741861,-3.752608,-14.560707,2.569989,102.0,1.164142,15.254544,15.298900,58.727188,15.919942,24.194922
86565,93649.0,93508.0,295.0,1.154101e+00,3.219911e-01,2.057776,26.336748,-1.645933,-6.569484,162.0,2.381191,27.193593,27.297648,70.307046,117.983453,18.530313
86566,93654.0,93348.0,295.0,2.029154e+00,-1.100237e+00,-0.453392,20.985810,6.644445,-4.847355,276.0,2.352350,22.539958,22.662375,18.517781,71.740810,108.802254
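
The distance columns in the displayed rows are consistent with simple norms of the twist components: `geodesic_distance_rad` matches the length of (`twist_so_x`, `twist_so_y`, `twist_so_z`), `euclidean_distance` matches the length of (`twist_x`, `twist_y`, `twist_z`), and `product_distance` matches the square root of the sum of their squares. This is an observation on the printed values, not a statement about how cryoCAT computes them internally. A quick check in plain Python, with the helper `norm` introduced only for this illustration:

```python
import math


def norm(vector):
    """Euclidean length of a 3-vector."""
    return math.sqrt(sum(component * component for component in vector))


# First two rows of the table above (qp_id 1 with nn_id 2 and nn_id 3).
rows = [
    {"twist_so": (1.797865e-01, 2.090918e-01, 2.299244),
     "twist": (13.531632, -5.908503, 16.211919),
     "geodesic": 2.315721, "euclidean": 21.928105, "product": 22.050042},
    {"twist_so": (2.378535e-01, -2.291136e-02, 0.718671),
     "twist": (0.249111, 0.351212, 0.625374),
     "geodesic": 0.757356, "euclidean": 0.759275, "product": 1.072421},
]

checks = []
for row in rows:
    geodesic = norm(row["twist_so"])          # rotational part
    euclidean = norm(row["twist"])            # translational part
    product = math.hypot(geodesic, euclidean) # combined distance
    checks.append((geodesic, euclidean, product))
    print(f"geodesic {geodesic:.6f}  euclidean {euclidean:.6f}  product {product:.6f}")
```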


Among the support options offered by TANGO, the cylindrical support is well suited for cropping the initial spherical support.

In the context of a given CR, a subunit's intrinsic z-axis serves as a normal to the nuclear envelope.

Thus, a cylinder extending from an SU inwards in the opposite direction should ideally contain mostly SUs of the same CR.
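
The geometric idea behind cropping a spherical support to a cylinder can be sketched as a membership test: a neighbor is kept if its offset from the query particle is close enough to the axis and short enough along it. The function `in_cylinder` below is a hypothetical helper in plain Python, not cryoCAT's implementation. In particular, it assumes that offsets are already expressed in the query particle's own frame and that `height` is measured from the query particle to each cap, which may differ from the conventions of `tango.Cylinder`.

```python
import math


def in_cylinder(offset, axis, radius, height, symmetric=True):
    """Return True if `offset` (neighbor minus query position) lies in the cylinder.

    Assumption: with symmetric=True the cylinder extends by `height` on both
    sides of the query particle, otherwise only in the direction of `axis`.
    """
    axis_length = math.sqrt(sum(a * a for a in axis))
    unit = [a / axis_length for a in axis]
    # Signed coordinate of the neighbor along the cylinder axis.
    along = sum(o * u for o, u in zip(offset, unit))
    # Distance of the neighbor from the axis (clamped against rounding).
    radial = math.sqrt(max(sum(o * o for o in offset) - along * along, 0.0))
    if symmetric:
        axial_ok = abs(along) <= height
    else:
        axial_ok = 0.0 <= along <= height
    return radial <= radius and axial_ok


z_axis = (0.0, 0.0, 1.0)
print(in_cylinder((3.0, 4.0, 5.0), z_axis, radius=30, height=10))    # inside
print(in_cylinder((0.0, 0.0, 12.0), z_axis, radius=30, height=10))   # too far along z
print(in_cylinder((40.0, 0.0, 0.0), z_axis, radius=30, height=10))   # too far from the axis
```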

In [22]:
height = 10 # try cylindrical support with symmetric = True.

# cylinder statistics also require the choice for an axis of rotation; here, a query particle's intrinsic z-axis is chosen.
# This choice is inspired by the NPC subunits having z-normals pointing in approximately the same direction.
axis = np.array([0, 0, 1])

cylinder_supp = tango.Cylinder(npc_twist_desc, radius=30, height=height, axis = axis, symmetric= True)

display(cylinder_supp.support.df)

Unnamed: 0,qp_id,nn_id,tomo_id,twist_so_x,twist_so_y,twist_so_z,twist_x,twist_y,twist_z,nn_inplane,geodesic_distance_rad,euclidean_distance,product_distance,rot_angle_x,rot_angle_y,rot_angle_z
1,1.0,3.0,2.0,0.237854,-2.291136e-02,0.718671,0.249111,0.351212,0.625374,90.0,0.757356,0.759275,1.072421,29.765285,42.080563,2.216453
2,1.0,4.0,2.0,0.229378,1.218139e-16,1.460299,3.208845,-3.563783,3.615385,132.0,1.478204,6.005685,6.184928,71.552475,84.694846,1.025885
3,1.0,13.0,2.0,-0.054584,1.421956e-01,-0.830710,5.651752,5.831943,7.228840,0.0,0.844558,10.872442,10.905194,45.262190,40.242403,0.793427
12,3.0,4.0,2.0,0.038381,9.998632e-02,0.732356,-0.005849,-4.161344,3.963655,132.0,0.740146,5.746945,5.794411,40.208151,36.678434,0.446318
13,3.0,13.0,2.0,-0.246638,8.191195e-02,-1.562973,8.305401,1.820836,5.526083,0.0,1.584432,10.140647,10.263681,76.649941,86.088062,1.229507
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
86546,93621.0,93647.0,295.0,-1.724982,7.623500e-01,-0.251258,-13.350932,-1.017321,-0.464529,204.0,1.902596,13.397690,13.532109,10.176526,65.331278,94.614687
86550,93630.0,93156.0,295.0,1.043533,-5.202203e-01,0.182831,-7.275807,-2.297263,-8.843884,162.0,1.180261,11.680286,11.739765,7.833976,37.817569,57.148537
86552,93631.0,93299.0,295.0,1.737882,-7.952418e-01,1.233643,7.179711,25.748072,-3.490293,48.0,2.274757,26.957255,27.053061,30.760654,84.769963,59.651433
86565,93649.0,93508.0,295.0,1.154101,3.219911e-01,2.057776,26.336748,-1.645933,-6.569484,162.0,2.381191,27.193593,27.297648,70.307046,117.983453,18.530313


Using a filter, one can further zoom in on the cytoplasmic rings by reducing the data to those particles for which the rotation transporting their orientation to that of a neighboring one is close to being a rotation around the intrinsic z-axis.
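
A rotation can be written as a rotation vector whose direction is the rotation axis and whose length is the rotation angle. In the tables above, the columns `twist_so_x`, `twist_so_y`, `twist_so_z` appear to play this role, since their length matches `geodesic_distance_rad`. Under that reading, being close to a rotation around the z-axis means that the rotation vector is nearly parallel or antiparallel to z. The sketch below measures this deviation in degrees; `axis_deviation_deg` is a hypothetical helper, and the exact criterion used by `tango.AxisRot` may differ.

```python
import math


def axis_deviation_deg(rotation_vector, axis=(0.0, 0.0, 1.0)):
    """Angle in degrees between the rotation axis and the line spanned by `axis`."""
    length = math.sqrt(sum(c * c for c in rotation_vector))
    axis_length = math.sqrt(sum(c * c for c in axis))
    if length == 0.0:
        return 0.0  # identity rotation: no axis to deviate from
    # The absolute value treats a rotation about -z like one about +z,
    # since rotating by t about -z equals rotating by -t about +z.
    dot = sum(r * a for r, a in zip(rotation_vector, axis))
    cosine = abs(dot) / (length * axis_length)
    return math.degrees(math.acos(min(1.0, cosine)))


max_angle = math.degrees(0.5)  # 0.5 rad expressed in degrees, about 28.65

# Rotation vectors copied from three rows of the first table.
examples = {
    "qp 1, nn 3": (2.378535e-01, -2.291136e-02, 0.718671),
    "qp 1, nn 13": (-5.458377e-02, 1.421956e-01, -0.830710),
    "qp 1, nn 425": (-5.762018e-17, -1.152404e-16, 0.942478),
}
for label, vector in examples.items():
    deviation = axis_deviation_deg(vector)
    print(label, round(deviation, 2), deviation <= max_angle)
```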

Furthermore, a focus on the eight-fold symmetry of the NPC is implemented in the form of restrictions on the geodesic distance in radians. It is restricted to what is expected for the relative orientation between neighboring SUs i, i+1 (2pi/8), with some room for noise.
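
The bounds on the geodesic distance follow from the eight-fold symmetry: neighboring SUs i, i+1 should differ by a rotation of 2pi/8, about 0.785 rad, and SUs i, i+2 by 4pi/8, about 1.571 rad. The short check below confirms that these values fall inside the windows (0.7, 0.9) and (1.4, 1.7) used in the filter cell; the helper names are introduced only for this illustration.

```python
import math


def expected_relative_angle(step, symmetry=8):
    """Rotation angle in radians between SUs that are `step` positions apart on a ring."""
    return 2.0 * math.pi * step / symmetry


def in_window(angle, lower, upper):
    """True if `angle` lies strictly between `lower` and `upper`."""
    return lower < angle < upper


adjacent = expected_relative_angle(1)     # 2*pi/8
second_next = expected_relative_angle(2)  # 4*pi/8

print(f"i, i+1: {adjacent:.3f} rad, inside (0.7, 0.9): {in_window(adjacent, 0.7, 0.9)}")
print(f"i, i+2: {second_next:.3f} rad, inside (1.4, 1.7): {in_window(second_next, 1.4, 1.7)}")
```

The first window leaves about 0.085 rad below and 0.115 rad above the expected value as room for noise.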

In [None]:
max_angle = np.degrees(0.5) # tolerance: 0.5 rad converted to degrees (about 28.65)

z_axis_filtered = tango.AxisRot(twist_desc= cylinder_supp.support, max_angle= max_angle)

# focus on required eight-fold symmetry of the CR.

df = z_axis_filtered.filter.df.copy()

# Keep pairs whose relative orientation is close to 2pi/8, as expected for neighboring SUs i, i+1.
adjacent = (df['geodesic_distance_rad'] > 0.7) & (df['geodesic_distance_rad'] < 0.9)
# To also keep pairs close to 4pi/8 (expected for SUs i, i+2), use: adjacent | ((df['geodesic_distance_rad'] > 1.4) & (df['geodesic_distance_rad'] < 1.7))
df = df[adjacent]

# update the descriptor's data frame in order to use the built-in methods more easily.

z_axis_filtered.filter.df = df

Intersecting supports can be deduced from a data frame by treating subtomogram ids as nodes in a graph and connecting them whenever they form a 'qp_id'--'nn_id'--pair in a given row of that data frame.

The resulting graph decomposes into connected components which are computed from a twist descriptor using the proximity clustering method.

Here, it is applied to the most recent cleaning results. The parameter size_connected_components represents a lower bound for the number of particles (nodes) per connected component.

It is set to 3 here, meaning that SUs are grouped into a CR only if at least 3 of them lie in the same connected component.
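
The graph construction described above can be sketched without any dependencies: every `qp_id`, `nn_id` pair is an edge, and a union-find structure merges the ids into connected components. This is a plain-Python stand-in for illustration, not the implementation behind `proximity_clustering`. The function `connected_components`, its `min_size` argument, and the extra pair (20, 21) are made up for this example, while the other pairs are taken from the first rows of the cylinder support table.

```python
from collections import defaultdict


def connected_components(pairs, min_size=1):
    """Group ids linked by (qp_id, nn_id) pairs; drop groups smaller than min_size."""
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    # Sort by smallest id so the output order is deterministic.
    return sorted((g for g in groups.values() if len(g) >= min_size), key=min)


pairs = [(1, 3), (1, 4), (1, 13), (3, 4), (3, 13), (20, 21)]
components = connected_components(pairs, min_size=3)
print(components)
```

With a minimum size of 3, the isolated pair (20, 21) is discarded, while the four SUs 1, 3, 4 and 13 form one component.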

In [31]:
S = z_axis_filtered.filter.proximity_clustering(size_connected_components= 3)

Each connected component is a networkx Graph object whose nodes are subtomogram ids. These ids are used to extract the corresponding subset of the motive list used for visualization (vis_motl). Each subset is labeled with the index of its component, and all subsets are concatenated into the output motive list.

In [None]:
out_motl = cryomotl.Motl()

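for i, G in enumerate(S):

    # subtomogram ids of all SUs in the i-th connected component
    subtomo_indices = list(set(G.nodes()))

    # select these SUs from the unshifted motive list used for visualization
    sub_motl = vis_motl.get_motl_subset(subtomo_indices, feature_id= 'subtomo_id')

    # store the component index as the label in the 'geom1' column
    sub_motl.df['geom1'] = i * np.ones(sub_motl.df['geom1'].shape[0])

    out_motl = out_motl + sub_motl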

out_motl.write_out('cr_components_tutorial.em')