cluster_counts interpretation #210

Closed
sunsetyerin opened this issue Feb 2, 2023 · 1 comment

Comments

@sunsetyerin

I have a question about the cluster_counts column in the nanocompore sampcomp results.
I saw in previous issues that cluster_counts is the number of reads assigned to each cluster,
but I wonder which value is the number of reads and which is the number of clusters.

control_1:32/12__control_2:21/9__test_1:60/23__test_2:38/16

For example, in control_1:32/12, is 32 the number of reads and 12 the number of clusters?

@lmulroney
Collaborator

Hi @sunsetyerin,

Nanocompore is limited to 2 clusters for the GMM. If 1 cluster fits the data better than 2, no further processing is done and the site is considered unmodified. If 2 clusters fit the data better than 1, the statistical test (usually the logistic regression test) is performed.

From this example there are 32 reads from control 1 assigned to cluster 1 (c1) and 12 reads assigned to cluster 2 (c2). Control 2 has 21 reads assigned to c1 and 9 reads assigned to c2. Test 1 has 60 reads assigned to c1 and 23 reads assigned to c2, and test 2 has 38 reads assigned to c1 and 16 reads assigned to c2.

The basic way to read those lines is:
[sample name]:[number of reads assigned to cluster 1]/[number of reads assigned to cluster 2]__repeated for each further sample.
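
As a rough illustration of that format, here is a minimal Python sketch that splits a cluster_counts value into per-sample read counts. The helper name `parse_cluster_counts` is hypothetical and not part of Nanocompore. It assumes exactly two clusters per sample, as described above, and that sample names do not contain `__`.

```python
def parse_cluster_counts(value):
    """Parse 'sample:c1/c2__sample:c1/c2' into {sample: (c1, c2)}.

    Hypothetical helper, not a Nanocompore function. Assumes two
    clusters per sample and no '__' inside sample names.
    """
    counts = {}
    for entry in value.split("__"):
        # Split on the last colon so a colon inside a sample name is tolerated.
        sample, _, pair = entry.rpartition(":")
        c1, c2 = (int(n) for n in pair.split("/"))
        counts[sample] = (c1, c2)
    return counts


example = "control_1:32/12__control_2:21/9__test_1:60/23__test_2:38/16"
parsed = parse_cluster_counts(example)

for sample, (c1, c2) in parsed.items():
    print(f"{sample}: cluster 1 = {c1} reads, cluster 2 = {c2} reads")
```

With the string from this issue, `parsed["control_1"]` is `(32, 12)`: 32 reads in cluster 1 and 12 reads in cluster 2.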

I hope this explanation helps.
