identify lines where modifications for bubble mismatch can be identified #19

Open
GavinHuttley opened this issue Dec 1, 2022 · 1 comment
@GavinHuttley
Collaborator

We want to collapse bubbles of size k, since those are just single-base mismatches.

Which lines in dbga are most pertinent to this?

@xingjianleng
Owner

Currently, debruijn_pairwise.py and debruijn_msa.py both have an attribute named self.expansion, which holds the merge k-mer indices and the node indices in bubbles. However, as suggested in #17, we should move this calculation to after alignment() is called.

By using the expansion variable, we can obtain the correspondence between bubbles from each sequence (they should appear at the same index in the expansion for each sequence, i.e., we can use [expansion[j][i] for j in range(num_seqs)] to extract bubbles for all sequences).
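As a minimal sketch of that lookup, assume the expansion is a list with one entry per sequence, where merge nodes are stored as integers and bubbles as nested lists of node indices. The helper name `bubbles_at` and the toy values below are made up for illustration and are not taken from dbga.

```python
# Toy stand-in for self.expansion (assumed layout): one list per sequence,
# ints are merge-node indices, nested lists are bubble node indices.
expansion = [
    [0, [1, 2, 3], 4],  # sequence 0
    [0, [5, 6, 7], 4],  # sequence 1
]


def bubbles_at(expansion, i):
    """Return the entry at position i of the expansion for every sequence."""
    num_seqs = len(expansion)
    return [expansion[j][i] for j in range(num_seqs)]


# Position 1 holds a bubble in both sequences.
bubbles = bubbles_at(expansion, 1)
```

Because corresponding bubbles are assumed to sit at the same position in every sequence, a single index `i` is enough to pull out the whole bubble column.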

In debruijn_pairwise.py, the current implementation does not use the expansion. We should refactor the alignment() function to take advantage of the expansion (similar to alignment() in debruijn_msa.py), then change it according to the approach above.

In debruijn_msa.py, we should replace the block of code below using the approach mentioned above. We should compare the bubble lengths across sequences. If they differ by 1, we may be able to collapse the bubble rather than calling the cogent3 alignment.

if type(self.expansion[0][i]) == list:
    # extract bubbles
    bubble_indices = []
    for j in range(len(self.names)):
        bubble_indices.append(deepcopy(self.expansion[j][i]))
    # align the bubble
    # include the tail merge node
    for j in range(len(self.names)):
        bubble_indices[j].append(self.expansion[j][i + 1])
    bubble_alignment = self.bubble_aln(
        bubble_indices=bubble_indices,
        prev_edge_reads=merge_edge_read,
        model=model,
        dm=dm,
        indel_rate=indel_rate,
        indel_length=indel_length,
        prev_merge=prev_merge_str,
    )
    for j in range(len(self.names)):
        aln[j].append(
            bubble_alignment[j][1:]
            if prev_merge_str
            else bubble_alignment[j]
        )
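One possible shape for the check that would guard this block is sketched below. It follows the criterion stated at the top of this issue (a bubble with k nodes in every sequence is treated as a single-base mismatch), which is an assumption about the final rule. The function name `is_single_base_mismatch` is hypothetical and does not exist in dbga.

```python
def is_single_base_mismatch(bubbles, k):
    """Return True if every sequence's bubble has exactly k nodes.

    bubbles: one list of node indices per sequence (assumed layout).
    k: the k-mer size of the de Bruijn graph.
    """
    return all(isinstance(b, list) and len(b) == k for b in bubbles)


# Toy values, k = 3: both branches have 3 nodes, so the bubble qualifies.
same_length = is_single_base_mismatch([[1, 2, 3], [5, 6, 7]], k=3)
# One branch is longer, so this bubble would still go to the aligner.
diff_length = is_single_base_mismatch([[1, 2, 3], [5, 6, 7, 8]], k=3)
```

When the check passes, the bubble could be written into the alignment column by column without calling the aligner. Otherwise the existing bubble_aln() path would still apply.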
