ctg_hex_pos and ctg_hex_dist are buggy when the contig has indels #10
Comments
Add an example
A temporary compromise; ideally use 'ctg_hex_dist', but it's still buggy (see #10)
…r + strand #10 The actual sequence length w.r.t. the contig may be shorter or longer than the window, depending on the indels in the contig
Actually, ctg_hex_dist could be correct; it's just that ctg_hex_pos is not consistent with genomic coordinates and should be dropped.
Insight: ctg_hex_pos isn't interpretable w.r.t. genome coordinates, but ctg_hex_dist is probably valid.
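To illustrate why a contig-relative position does not translate directly into a genome coordinate, here is a minimal sketch that walks a CIGAR string to map a contig offset onto the reference. The names `parse_cigar` and `contig_to_ref` are hypothetical and not part of kleat; coordinates are assumed to be 0-based, and `ref_start` is assumed to be the leftmost aligned reference position.

```python
import re

# Hypothetical helpers for illustration only; not kleat's API.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Turn a CIGAR string such as '10M2I5M' into [(length, op), ...]."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def contig_to_ref(ref_start, cigar, ctg_pos):
    """Map a 0-based contig offset to a 0-based reference coordinate.

    Returns None when the contig base is inserted or soft-clipped,
    i.e. when it has no counterpart on the reference.
    """
    ref, ctg = ref_start, 0
    for length, op in parse_cigar(cigar):
        if op in "M=X":  # consumes both contig and reference
            if ctg <= ctg_pos < ctg + length:
                return ref + (ctg_pos - ctg)
            ref += length
            ctg += length
        elif op in "IS":  # consumes contig only
            if ctg <= ctg_pos < ctg + length:
                return None
            ctg += length
        elif op in "DN":  # consumes reference only
            ref += length
        # H and P consume neither sequence
    return None
```

With a 2 bp insertion upstream (`10M2I5M`, `ref_start=100`), contig offset 12 maps to reference position 110, whereas naively adding the offset to `ref_start` would give 112. With a 3 bp deletion (`10M3D5M`), contig offset 10 maps to 113 instead of 110.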
Happy to see the PAS hexamers finally match between contig and reference in the third case. Conclusion:
This is because the coordinate information is currently lost when the sequence is extracted from the contig.
Currently, this search function is used to search for hexamers in the contig; it only takes into account the extracted sequence, with the CIGAR information missing.
kleat/kleat/hexamer/search.py
Line 66 in 933e240
Maybe it's easier to define the search window (e.g. 50 bp) w.r.t. the reference; the actual sequence length w.r.t. the contig could then be a few bp more or less than 50 bp.
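A rough sketch of that idea, assuming the contig alignment is available as a CIGAR string plus a 0-based leftmost reference position. The function name `contig_span_for_ref_window` is hypothetical and not taken from kleat. It returns the half-open contig slice whose aligned bases fall inside a reference window, so insertions inside the window lengthen the slice and deletions shorten it.

```python
import re

# Hypothetical helper for illustration only; not kleat's API.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def contig_span_for_ref_window(ref_start, cigar, win_beg, win_end):
    """Return (ctg_beg, ctg_end) covering reference window [win_beg, win_end).

    All coordinates are 0-based and half-open. Returns None when no
    aligned contig base falls inside the window.
    """
    ref, ctg = ref_start, 0
    ctg_beg = ctg_end = None
    for n, op in CIGAR_RE.findall(cigar):
        length = int(n)
        if op in "M=X":  # consumes both contig and reference
            lo = max(ref, win_beg)
            hi = min(ref + length, win_end)
            if lo < hi:  # this aligned block overlaps the window
                if ctg_beg is None:
                    ctg_beg = ctg + (lo - ref)
                ctg_end = ctg + (hi - ref)
            ref += length
            ctg += length
        elif op in "IS":  # consumes contig only
            ctg += length
        elif op in "DN":  # consumes reference only
            ref += length
    if ctg_beg is None:
        return None
    return ctg_beg, ctg_end
```

For a 50 bp reference window `[100, 150)` with `ref_start=100`, the CIGAR `20M2I30M` gives the contig slice `(0, 52)`, i.e. 52 bp, while `20M3D30M` gives `(0, 47)`, i.e. 47 bp. The hexamer search would then run on that slice, and any hit could be converted back to a reference coordinate through the same CIGAR walk.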