identify lines where modifications for bubble mismatch can be identified #19

GavinHuttley · 2022-12-01T00:39:02Z

We want to collapse bubbles of size k, since those will just be single-base mismatches.

What lines in dbga are most pertinent to this?
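
To illustrate why a bubble of size k suggests a single-base mismatch, here is a minimal sketch that is independent of dbga. The helper names `kmers` and `differing_kmers` are hypothetical, and the sketch assumes that bubble size is counted in k-mer nodes and that the substitution sits at least k - 1 bases from either end of the sequence. A single substitution changes exactly the k consecutive k-mers that overlap it.

```python
def kmers(seq: str, k: int) -> list[str]:
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i : i + k] for i in range(len(seq) - k + 1)]


def differing_kmers(a: str, b: str, k: int) -> list[tuple[str, str]]:
    """Pair up k-mers at the same offset in two equal-length sequences
    and keep only the pairs that differ."""
    return [(x, y) for x, y in zip(kmers(a, k), kmers(b, k)) if x != y]


k = 3
seq1 = "ACGTACGGTCA"
seq2 = "ACGTATGGTCA"  # one substitution at index 5 (C -> T)

pairs = differing_kmers(seq1, seq2, k)
# Each sequence contributes k distinct k-mers around the mismatch,
# which is what a bubble with k nodes per branch would look like.
print(len(pairs) == k)
```

Repeated k-mers can change this picture in a real graph, so the count should be read as the typical case rather than a guarantee.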


xingjianleng · 2022-12-03T03:54:53Z

Currently, there is one attribute in debruijn_pairwise.py and debruijn_msa.py named self.expansion, which includes the merge k-mer indices and node indices in bubbles. However, as suggested in #17, we should move this calculation after alignment() is called.

By using the expansion variable, we can obtain the correspondence between bubbles from each sequence (they should appear at the same index in the expansion for each sequence, i.e., we can use [expansion[j][i] for j in range(num_seqs)] to extract bubbles for all sequences).
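
A minimal sketch of that lookup, under assumptions: `expansion` is taken to be one list per sequence, with merge nodes stored as single indices and bubbles stored as lists of node indices (this matches the `type(...) == list` check in debruijn_msa.py, but the exact layout may differ). The function names `bubble_positions` and `bubbles_at` and the toy data are hypothetical.

```python
def bubble_positions(expansion: list[list]) -> list[int]:
    """Indices in the expansion that hold a bubble rather than a merge node.
    Assumes bubbles sit at the same index for every sequence, so only the
    first sequence is inspected."""
    return [i for i, entry in enumerate(expansion[0]) if isinstance(entry, list)]


def bubbles_at(expansion: list[list], i: int) -> list[list[int]]:
    """Extract the bubble at position i for every sequence."""
    num_seqs = len(expansion)
    return [expansion[j][i] for j in range(num_seqs)]


# Toy data (hypothetical): two sequences, merge node / bubble / merge node.
expansion = [
    [0, [1, 2, 3], 4],
    [5, [6, 7, 8], 9],
]

for i in bubble_positions(expansion):
    print(i, bubbles_at(expansion, i))
```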

In debruijn_pairwise.py, the current implementation does not use the expansion. We should refactor the alignment() function to take advantage of expansion (similar to alignment() in debruijn_msa.py), then change it according to the approach described above.

In debruijn_msa.py, we should replace the block of code below with the approach mentioned above. We should compare the lengths of the bubbles across sequences. If they differ by 1, we may be able to collapse the bubble rather than calling the cogent3 alignment.

DBGA/src/dbga/debruijn_msa.py

Lines 485 to 509 in 0da8dca

    
    if type(self.expansion[0][i]) == list:
        # extract bubbles
        bubble_indices = []
        for j in range(len(self.names)):
            bubble_indices.append(deepcopy(self.expansion[j][i]))
        # align the bubble
        # include the tail merge node
        for j in range(len(self.names)):
            bubble_indices[j].append(self.expansion[j][i + 1])
        bubble_alignment = self.bubble_aln(
            bubble_indices=bubble_indices,
            prev_edge_reads=merge_edge_read,
            model=model,
            dm=dm,
            indel_rate=indel_rate,
            indel_length=indel_length,
            prev_merge=prev_merge_str,
        )
        for j in range(len(self.names)):
            aln[j].append(
                bubble_alignment[j][1:]
                if prev_merge_str
                else bubble_alignment[j]
            )
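
A possible pre-check before calling `self.bubble_aln`, sketched under assumptions rather than taken from dbga: the helpers `branch_lengths`, `length_spread`, and `all_branches_size_k` are hypothetical, and each bubble is assumed to be a list of node indices per sequence. The thread mentions two criteria (bubbles of size k, and bubble lengths differing by 1); the sketch only computes the quantities either criterion would need and leaves the decision to the caller.

```python
def branch_lengths(bubbles: list[list[int]]) -> list[int]:
    """Number of nodes in the bubble for each sequence."""
    return [len(b) for b in bubbles]


def length_spread(bubbles: list[list[int]]) -> int:
    """Difference between the longest and shortest branch."""
    lengths = branch_lengths(bubbles)
    return max(lengths) - min(lengths)


def all_branches_size_k(bubbles: list[list[int]], k: int) -> bool:
    """True when every sequence's branch has exactly k nodes, the case the
    issue describes as a likely single-base mismatch."""
    return all(len(b) == k for b in bubbles)


k = 3
same_size = [[1, 2, 3], [6, 7, 8]]   # both branches have k nodes
uneven = [[1, 2, 3], [6, 7, 8, 9]]   # branch lengths differ by 1

print(all_branches_size_k(same_size, k), length_spread(same_size))
print(all_branches_size_k(uneven, k), length_spread(uneven))
```

If a check like this passes, the bubble could be handled directly and the cogent3 alignment skipped; otherwise the existing `bubble_aln` path would still apply.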

GavinHuttley assigned xingjianleng Dec 1, 2022

xingjianleng assigned GavinHuttley and unassigned xingjianleng Dec 3, 2022
