This issue was moved to a discussion.

Network generation #105

Closed
saramoein372 opened this issue Oct 5, 2021 · 13 comments

@saramoein372

Hi Kelvin,

I have 2 questions about how the network is generated:

1- How are connections between different cells generated? For some of the cells we don't have data, but I still see connections between them. Is there any kind of replacement in the cells based on the clone id? If not, how are the connections made?

2- Also, is there any way to know which node was the start of the mutation? Is there any way to know about the roots?

Thanks,
Sara

@zktuong
Owner

zktuong commented Oct 12, 2021

Hi Sara,

  1. This is an excerpt of what I wrote for the Methods section of https://www.nature.com/articles/s41591-021-01329-2#Sec8

B cell clone/clonotype network
Single-cell BCR networks were constructed using adjacency matrices computed from pairwise Levenshtein distance of the full amino acid sequence alignment for BCR(s) contained in every pair of cells. Construction of the Levenshtein distance matrices were performed separately for heavy-chain and light-chain contigs, and the sum of the total edit distance across all layers/matrices was used as the final adjacency matrix. To construct the BCR neighborhood graph, a minimum-spanning tree was constructed on the adjacency matrix for each clone/clonotype, creating a simple graph with edges indicating the shortest edit distance between a B cell and its nearest neighbor. Cells with identical BCRs, that is, cells with a total pairwise edit distance of zero, were then connected to the graph to recover edges trimmed off during the minimum-spanning-tree construction step. Fruchterman–Reingold graph layout was generated using a modified method to prevent singletons from flying out to infinity in ‘networkx’ (v2.5). Visualization of the resulting single-cell BCR network was achieved via transfer of the graph to relevant ‘anndata’ slots, allowing for access to plotting tools in scanpy.

As this is reliant on cell_id in the airr.tsv (or dandelion.data.cell_id/dandelion.metadata.index) matching anndata.obs_names, there would be a connection if the contigs were present in the corresponding cells. I would suggest that you check through your cell barcodes (cell ids, contig ids, sequence ids etc.) and ensure that they are named correctly. Perhaps try it for just one sample first, where minimal formatting of the barcodes is required, and see if it works.

  2. There is no way to know which cell is closest to the germline from the dandelion visualization; you would have to run lineage trees separately (as per the Immcantation suite, or with some other method).
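The barcode check suggested above can be sketched in plain Python. This is only a minimal sketch, assuming the `cell_id` values from the AIRR table and the `obs_names` from the AnnData object have already been pulled into two lists; the helper name `compare_barcodes` and the example barcodes are hypothetical and not part of dandelion.

```python
def compare_barcodes(airr_cell_ids, obs_names):
    """Report how the cell ids in the AIRR table overlap with anndata.obs_names."""
    airr = set(airr_cell_ids)
    obs = set(obs_names)
    return {
        "shared": sorted(airr & obs),
        "only_in_airr": sorted(airr - obs),
        "only_in_anndata": sorted(obs - airr),
    }


# Hypothetical barcodes: the sample prefix differs for one cell.
airr_cell_ids = ["s1_AAACCTG-1", "s1_AAACGGG-1", "s2_TTTGTCA-1"]
obs_names = ["s1_AAACCTG-1", "s1_AAACGGG-1", "sample2_TTTGTCA-1"]

report = compare_barcodes(airr_cell_ids, obs_names)
for key, values in report.items():
    print(key, values)
```

Barcodes that land in `only_in_airr` or `only_in_anndata` are the ones whose naming is worth inspecting first, for example a prefix or suffix that was added on one side only.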

@saramoein372
Author

Thanks Kelvin.
I have another question about the BCR clustering clone_id. There are four parts: {A}_{B}_{C}_{D}.

I did the clustering and I want to know how each value is assigned to each part of the clone_id.
For example, I have four samples with the same {A} = 11:

11_10_4_47
11_10_4_47
11_10_4_47
11_10_4_47

My question is: how is 11 calculated? And the other parts of the clone_id? How are the numbers calculated?

Thanks,
Sara

@zktuong
Owner

zktuong commented Oct 13, 2021

My question is: how is 11 calculated? And the other parts of the clone_id? How are the numbers calculated?

Hi Sara, this is already described in detail in the documentation/tutorial:

https://sc-dandelion.readthedocs.io/en/latest/notebooks/3_dandelion_findingclones-10x_data.html

@saramoein372
Author

Hi Kelvin. I already read the whole tutorial, but it is not clear to me, if a clone_id is 11_10_4_47,
how "11" is CALCULATED. I know the meaning of each sub_id, but how is it calculated?

@saramoein372
Author

In other words, how should I interpret the "11", or the other sub_ids?

@zktuong
Owner

zktuong commented Oct 14, 2021

It’s just a random number, so you don’t have to overinterpret it. Just know that if a cell/contig has 11, it shares the same sub-id as other contigs that have 11.
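To illustrate treating the sub-ids purely as labels, here is a minimal sketch in plain Python. It assumes clone ids follow the four-part {A}_{B}_{C}_{D} pattern mentioned earlier in the thread; the helper names and the example cells are hypothetical and not part of dandelion.

```python
from collections import defaultdict


def split_clone_id(clone_id):
    """Split a clone_id such as '11_10_4_47' into its four sub-ids (A, B, C, D)."""
    a, b, c, d = clone_id.split("_")
    return {"A": a, "B": b, "C": c, "D": d}


def group_by_sub_id(clone_ids, part="A"):
    """Group cell names by one sub-id; the value itself is only a label."""
    groups = defaultdict(list)
    for cell, clone_id in clone_ids.items():
        groups[split_clone_id(clone_id)[part]].append(cell)
    return dict(groups)


# Hypothetical cells and clone ids.
clone_ids = {
    "cell1": "11_10_4_47",
    "cell2": "11_10_4_47",
    "cell3": "11_3_1_12",
    "cell4": "7_2_1_5",
}

groups = group_by_sub_id(clone_ids, part="A")
print(groups)
```

The only information used is whether two cells carry the same value; the number 11 has no meaning beyond marking membership in the same group.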

@saramoein372
Author

Oh, okay. That was good to know. Thank you so much.

@saramoein372
Author

saramoein372 commented Oct 14, 2021

Kelvin,

Thank you again.

I have two other questions after reading the tutorial, and other references you provided:
1- Is it correct to say that, in the process of generating the BCR network, ANY nodes whose CDR3 junction sequences have at least 85% similarity with other sequences will have an edge generated between them?
From my understanding, first all "inter cluster edges" are generated, and then "intra cluster edges" are generated IF ANY sequence in a cluster has more than 85% similarity with ANY of the sequences in other clusters.

Is this correct?

2- Also, in the visualization of the network we can see that each node probably represents more than one cell. Is there any way to make the node size larger according to the number of cells it includes?

Thank you so much.

Sara

@zktuong
Owner

zktuong commented Oct 14, 2021

Hi Sara,

The networks are constructed only within each clone/cluster, hence there are no inter-cluster edges.
This is controlled by the ‘clone_id’ column. For example, a single network will be constructed between cells that are tagged as clone ‘1_1_1_1’, and a separate network is constructed for clone ‘1_2_1_2’.

The construction of the edges is as described above:
A) For a given clone, a minimum spanning tree is constructed and only these edges are kept.
B) If two BCRs have 100% identity, additional edges are added to the network. The 85% similarity is only for clone definition.
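Steps A and B can be sketched for a single clone in plain Python. This is not dandelion's implementation: it is a minimal illustration that swaps in a hand-written Levenshtein distance and Prim's algorithm, and the cell names, the `heavy`/`light` keys and the short sequences are all hypothetical.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance between two strings (classic dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def cell_distance(x, y):
    """Sum of the heavy-chain and light-chain edit distances for two cells."""
    return levenshtein(x["heavy"], y["heavy"]) + levenshtein(x["light"], y["light"])


def clone_network(cells):
    """Edges for one clone: minimum spanning tree plus zero-distance edges."""
    names = sorted(cells)
    dist = {frozenset(pair): cell_distance(cells[pair[0]], cells[pair[1]])
            for pair in combinations(names, 2)}
    edges = set()
    in_tree = set(names[:1])
    # Step A: Prim's algorithm, always adding the cheapest edge leaving the tree.
    while len(in_tree) < len(names):
        candidates = ((u, v) for u in names if u in in_tree
                      for v in names if v not in in_tree)
        u, v = min(candidates, key=lambda e: dist[frozenset(e)])
        edges.add(frozenset((u, v)))
        in_tree.add(v)
    # Step B: reconnect cells whose BCRs are identical (total distance of zero).
    edges |= {pair for pair, d in dist.items() if d == 0}
    return edges


# Hypothetical clone: three identical cells and one cell with one substitution.
cells = {
    "c1": {"heavy": "CARDYW", "light": "CQQYF"},
    "c2": {"heavy": "CARDYW", "light": "CQQYF"},
    "c3": {"heavy": "CARDYW", "light": "CQQYF"},
    "c4": {"heavy": "CARDFW", "light": "CQQYF"},
}
edge_list = sorted(tuple(sorted(e)) for e in clone_network(cells))
print(edge_list)
```

In this toy clone the spanning tree alone keeps only two of the three edges between the identical cells c1, c2 and c3; step B restores the third, while c4 stays attached by a single edge.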

The only time you would see edges between ‘1_1_1_1’ and ‘1_2_1_2’ is if a cell contains more than one pair of contigs, i.e. the cell’s clone_id is ‘1_1_1_1 | 1_2_1_2’, because there are two possible combinations.

Hence, to partly answer your 2nd question, each node is a cell, and not a contig. There are no immediate plans to construct a version of the plot that you described, but it’s potentially possible through scirpy. However, there are a couple of things I will need to implement for it to work properly. See scverse/scirpy#286
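The case of a cell carrying two clone ids can be checked with a short plain-Python sketch. It assumes multiple clone ids are joined with a `|` separator, as in the example above; the helper names and the cells are hypothetical and not part of dandelion.

```python
from collections import defaultdict


def cells_per_clone(clone_ids):
    """Map each clone id to its cells; 'x | y' entries count under both clones."""
    members = defaultdict(list)
    for cell, clone_id in clone_ids.items():
        for single in clone_id.split("|"):
            members[single.strip()].append(cell)
    return dict(members)


def bridging_cells(clone_ids):
    """Cells tagged with more than one clone id, which can link two networks."""
    return sorted(cell for cell, clone_id in clone_ids.items() if "|" in clone_id)


# Hypothetical cells; cell4 has two possible contig combinations.
clone_ids = {
    "cell1": "1_1_1_1",
    "cell2": "1_1_1_1",
    "cell3": "1_2_1_2",
    "cell4": "1_1_1_1 | 1_2_1_2",
}

members = cells_per_clone(clone_ids)
bridges = bridging_cells(clone_ids)
print(members)
print(bridges)
```

If a plot shows edges between two clones, listing the bridging cells this way is a quick first thing to look at before suspecting the network construction itself.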

@saramoein372
Author

saramoein372 commented Oct 14, 2021 via email

@zktuong
Owner

zktuong commented Oct 14, 2021

Can you show me what the plot looks like, and the dataframe? It’s difficult for me to imagine how that is possible unless they have the same clone id.

@saramoein372
Author

saramoein372 commented Oct 14, 2021 via email

@zktuong zktuong closed this as completed Oct 14, 2021
@zktuong zktuong reopened this Oct 14, 2021
@zktuong
Owner

zktuong commented Oct 14, 2021

Sure. You can also inspect it in:

import dandelion as ddl
vdj = ddl.read_h5('dandelion_results.h5')
vdj.edges

As I'm unable to see what's wrong with your plot/data and you cannot provide me with the requisite info that I asked for, I will close this issue now.

@zktuong zktuong closed this as completed Oct 14, 2021
Repository owner locked and limited conversation to collaborators Oct 28, 2021
