Network generation #105

saramoein372 · 2021-10-05T15:42:22Z

Hi Kelvin,

I have 2 questions about the way the network is generated:

1- How are connections between different cells generated? For some of the cells we don't have data, yet I still see connections between them. Is there any kind of replacement in the cells based on the clone id? If not, how are the connections made?

2- Also, is there any way to know which node was the start of the mutation? Is there any way to know about the roots?

Thanks,
Sara

zktuong · 2021-10-12T19:18:03Z

Hi Sara,

This is an excerpt of what I wrote for the methods section of https://www.nature.com/articles/s41591-021-01329-2#Sec8

B cell clone/clonotype network
Single-cell BCR networks were constructed using adjacency matrices computed from pairwise Levenshtein distance of the full amino acid sequence alignment for BCR(s) contained in every pair of cells. Construction of the Levenshtein distance matrices were performed separately for heavy-chain and light-chain contigs, and the sum of the total edit distance across all layers/matrices was used as the final adjacency matrix. To construct the BCR neighborhood graph, a minimum-spanning tree was constructed on the adjacency matrix for each clone/clonotype, creating a simple graph with edges indicating the shortest edit distance between a B cell and its nearest neighbor. Cells with identical BCRs, that is, cells with a total pairwise edit distance of zero, were then connected to the graph to recover edges trimmed off during the minimum-spanning-tree construction step. Fruchterman–Reingold graph layout was generated using a modified method to prevent singletons from flying out to infinity in ‘networkx’ (v2.5). Visualization of the resulting single-cell BCR network was achieved via transfer of the graph to relevant ‘anndata’ slots, allowing for access to plotting tools in scanpy.
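The steps in the excerpt can be sketched roughly as follows. This is a minimal, pure-Python illustration and not dandelion's actual implementation: the function names `levenshtein` and `clone_edges`, the cell ids, and the toy sequences are all hypothetical, and Kruskal's algorithm stands in for whichever minimum-spanning-tree routine the package uses.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (classic dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def clone_edges(cells: dict) -> set:
    """Build the edges for ONE clone.

    `cells` maps a (hypothetical) cell id to a (heavy_aa, light_aa) tuple.
    Returns undirected edges as (cell_u, cell_v) tuples.
    """
    ids = sorted(cells)
    # Total distance = heavy-chain distance + light-chain distance.
    dist = {
        (u, v): levenshtein(cells[u][0], cells[v][0])
        + levenshtein(cells[u][1], cells[v][1])
        for u, v in combinations(ids, 2)
    }

    # Minimum spanning tree via Kruskal's algorithm with union-find.
    parent = {c: c for c in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    edges = set()
    for (u, v), _ in sorted(dist.items(), key=lambda kv: kv[1]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            edges.add((u, v))

    # Recover edges between cells with identical BCRs (total distance zero)
    # that the spanning tree trimmed off.
    edges.update(pair for pair, d in dist.items() if d == 0)
    return edges


# Toy clone: three cells with identical BCRs, one cell a single substitution away.
clone = {
    "cell_a": ("CARDYW", "CQQSYF"),
    "cell_b": ("CARDYW", "CQQSYF"),
    "cell_c": ("CARDYW", "CQQSYF"),
    "cell_d": ("CARDFW", "CQQSYF"),
}
edges = clone_edges(clone)
```

In this toy clone the spanning tree alone needs only three edges to connect four cells; the fourth edge, between two of the identical cells, is the kind that gets added back in the zero-distance step.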

As this is reliant on cell_id in the airr.tsv (or dandelion.data.cell_id/dandelion.metadata.index) matching anndata.obs_names, there would be a connection if the contigs were present in the corresponding cells. I would suggest checking through your cell barcodes (cell ids, contig ids, sequence ids etc.) to ensure that they are named correctly. Perhaps try it for just one sample first, where minimal formatting of the barcodes is required, and see if it works.
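One way to run that barcode check is a plain set comparison. The sketch below is only an illustration: `barcode_overlap` is a hypothetical helper, and the barcodes are made up to show a sample prefix that is missing on one side.

```python
def barcode_overlap(airr_cell_ids, obs_names):
    """Compare cell ids from the AIRR table with the AnnData barcodes."""
    airr, obs = set(airr_cell_ids), set(obs_names)
    return {
        "shared": sorted(airr & obs),
        "only_in_airr": sorted(airr - obs),
        "only_in_anndata": sorted(obs - airr),
    }


# Hypothetical barcodes: the second one lost its sample prefix in AnnData.
airr_ids = ["s1_AAACCTGAGT-1", "s1_AAACGGGCAT-1", "s1_AAAGATGTCA-1"]
obs = ["s1_AAACCTGAGT-1", "AAACGGGCAT-1", "s1_TTTGTCATCC-1"]

report = barcode_overlap(airr_ids, obs)
for key, values in report.items():
    print(key, values)
```

In practice the two inputs would be the cell_id column of the airr.tsv and anndata.obs_names. A short "shared" list alongside long "only in" lists usually points to a prefix or suffix mismatch in the barcodes.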

There is no way to know which cell is closest to the germline from the dandelion visualization; you would have to run lineage trees separately (as per the Immcantation suite, or with some other method).

saramoein372 · 2021-10-13T20:51:47Z

Thanks Kelvin.
I have another question about the BCR clustering clone_id. There are four parts: {A}_{B}_{C}_{D}.

So I did the clustering and I want to know how each value is assigned to each part of the clone_id.
For example, I have four samples with the same {A} = 11:

11_10_4_47
11_10_4_47
11_10_4_47
11_10_4_47

My question is: how is 11 calculated? And the other parts of the clone_id? How are the numbers calculated?

Thanks,
Sara

zktuong · 2021-10-13T21:14:03Z

> My question is how 11 is calculated? And for other parts of the clone_id? How the numbers are calculated?

Hi Sara, this is already described in detail in the documentation/tutorial:

https://sc-dandelion.readthedocs.io/en/latest/notebooks/3_dandelion_findingclones-10x_data.html

saramoein372 · 2021-10-14T14:10:15Z

Hi Kelvin. I already read the whole tutorial, but it is still not clear to me: if a clone_id is 11_10_4_47,
then how is "11" CALCULATED? I know the meaning of each sub_id, but how is it calculated?

saramoein372 · 2021-10-14T14:12:09Z

In other words, how should I interpret the "11", or the other sub_ids?

zktuong · 2021-10-14T14:38:09Z

It’s just a random number; you don’t have to overinterpret it. Just know that if a cell/contig has 11, it shares the same sub-id as other contigs that have 11.
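To make that concrete, the sub-ids can be treated as opaque labels where only equality matters. The sketch below assumes the four-part {A}_{B}_{C}_{D} format from the question; `split_clone_id` and `group_by_part` are hypothetical helpers, not part of dandelion, and the cell names are invented.

```python
from collections import defaultdict


def split_clone_id(clone_id: str) -> dict:
    """Split a four-part clone_id such as '11_10_4_47' into its sub-ids."""
    a, b, c, d = clone_id.split("_")
    return {"A": a, "B": b, "C": c, "D": d}


def group_by_part(clone_ids: dict, part: str) -> dict:
    """Group cells that share the same label for one sub-id (e.g. 'A')."""
    groups = defaultdict(list)
    for cell, clone_id in clone_ids.items():
        groups[split_clone_id(clone_id)[part]].append(cell)
    return dict(groups)


# Hypothetical cells; the labels are kept as strings because their
# numeric value carries no meaning, only whether two labels are equal.
cells = {
    "cell_1": "11_10_4_47",
    "cell_2": "11_10_4_47",
    "cell_3": "11_2_1_9",
    "cell_4": "3_1_1_5",
}
by_a = group_by_part(cells, "A")
print(by_a)
```

Here cell_1, cell_2 and cell_3 fall into the same group for part A simply because they carry the same label; nothing is inferred from the value 11 itself.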

saramoein372 · 2021-10-14T14:55:22Z

Oh, okay. That is good to know. Thank you so much.

saramoein372 · 2021-10-14T15:35:03Z

Kelvin,

Thank you again.

I have two other questions after reading the tutorial, and other references you provided:
1- Is it correct to say that, in the process of generating the BCR network, ANY nodes whose CDR3 junction sequences have at least 85% similarity with other sequences will have an edge generated between them?
From my understanding, all "inter cluster edges" are generated first, and then "intra cluster edges" are generated IF ANY sequence in a cluster has more than 85% similarity with ANY of the sequences in other clusters.

Is this correct?

2- Also, in the visualization of the network, each node probably represents more than one cell. Is there any way to make the node size larger according to the number of cells it includes?

Thank you so much.

Sara

zktuong · 2021-10-14T16:14:54Z

Hi Sara,

The networks are constructed only within each clone/cluster, hence there are no inter-cluster edges.
This is controlled by the ‘clone_id’ column - so for example, a single network will be constructed between cells that are tagged as clone ‘1_1_1_1’ and a separate network is constructed for clone ‘1_2_1_2’.

The construction of the edges is as described above:
A) For a given clone, a minimum spanning tree is constructed and only these edges are kept.
B) If two BCRs have 100% identity, additional edges are added to the network between them. The 85% similarity threshold is only used for clone definition.

The only time you would see edges between ‘1_1_1_1’ and ‘1_2_1_2’ is if a cell contains more than one pair of contigs, i.e. the cell’s clone_id is ‘1_1_1_1 | 1_2_1_2’ because there are two possible combinations.

Hence, to partly answer your 2nd question, each node is a cell, and not a contig. There are no immediate plans to construct a version of the plot that you described, but it’s potentially possible through scirpy. However, there are a couple of things I will need to implement for it to work properly. See scverse/scirpy#286
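The per-clone grouping described above, including the multi-clone case, can be sketched like this. It is only an illustration under stated assumptions: `cells_per_clone` and `bridging_cells` are hypothetical helpers, the cell names are invented, and it assumes that a cell tagged ‘1_1_1_1 | 1_2_1_2’ is treated as a member of both clones.

```python
from collections import defaultdict


def cells_per_clone(clone_ids: dict) -> dict:
    """Map each individual clone_id to the set of cells tagged with it.

    A cell tagged with several clones (separated by '|') is added to each.
    """
    members = defaultdict(set)
    for cell, clone_id in clone_ids.items():
        for single in clone_id.split("|"):
            members[single.strip()].add(cell)
    return dict(members)


def bridging_cells(clone_ids: dict) -> list:
    """Cells tagged with more than one clone; these can link two networks."""
    return sorted(cell for cell, cid in clone_ids.items() if "|" in cid)


# Hypothetical cells: cell_4 carries two possible contig combinations.
cells = {
    "cell_1": "1_1_1_1",
    "cell_2": "1_1_1_1",
    "cell_3": "1_2_1_2",
    "cell_4": "1_1_1_1 | 1_2_1_2",
}
members = cells_per_clone(cells)
bridges = bridging_cells(cells)
print(members)
print(bridges)
```

Edges would then be built separately inside each entry of `members`, so the two networks only touch through cells listed in `bridges`.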

saramoein372 · 2021-10-14T17:45:15Z

Thank you Kelvin. Related to my first question: in the network generated by dandelion, I can see a part of the network where some of the nodes have no clone_id. How is this possible? I am actually confused; I don't know how the nodes and edges are connected when there is no clone_id for them.

Thanks,
Sara


zktuong · 2021-10-14T18:05:27Z

Can you show me what the plot looks like, and the dataframe? It’s difficult for me to imagine how that is possible unless they have the same clone id.

saramoein372 · 2021-10-14T19:05:03Z

Thanks Kelvin. Unfortunately, the data is confidential and I cannot share it. But I think I can ask my question in a different way. When I read the dandelion object "dandelion_results.h5", I get the following keys:

['/data', '/edges', '/metadata', '/metadata/meta/values_block_0/meta', '/graph/graph_0', '/graph/graph_1', '/distance/VDJ_1', '/distance/VDJ_2', '/distance/VJ_1', '/distance/VJ_2']

When I read:

f = pd.read_hdf('/Users/saramoein/Documents/BCR/dandelion_results.h5', key='/edges')

is it correct to say that f defines the network edges?

Thanks again Kelvin.


zktuong · 2021-10-14T20:46:57Z

Sure. You can also inspect it with:

import dandelion as ddl

vdj = ddl.read_h5('dandelion_results.h5')  # load the saved Dandelion object
vdj.edges  # edges of the network

As I'm unable to see what's wrong with your plot/data and you cannot provide the information I asked for, I will close this issue now.

zktuong closed this as completed Oct 14, 2021

zktuong reopened this Oct 14, 2021

zktuong closed this as completed Oct 14, 2021

Repository owner locked and limited conversation to collaborators Oct 28, 2021


This issue was moved to a discussion.
