Multi letter alleles (e.g. indels) in haplotypes() #427
There was discussion of this in the following PR: #426. @jeromekelleher is keen not to open the multi-letter-allele can of worms, unless there is demand from users. @hyanwong thinks that (in the case of haplotype output) the semantics are reasonably clear, and being able to output haplotype strings containing indels will be needed by users quite soon.
Some of this is probably only relevant once finite-sites tree sequences have been implemented. But (IMO) if we think it is reasonable to concatenate widely spaced SNPs into a string of letters, it is also reasonable to concatenate multi-letter alleles from an infinite-sites TS into haplotypes, as long as every allele at a given site takes the same number of characters. This is what this PR does.
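To make the semantics concrete, here is a minimal standalone sketch of the idea, not the code in this PR. The function name `concatenate_haplotypes` and its plain-list inputs are hypothetical and chosen only for illustration; it assumes the equal-width check is applied per site and that mismatched widths should raise an error.

```python
def concatenate_haplotypes(site_alleles, genotypes):
    """
    Build one haplotype string per sample by concatenating allele strings.

    site_alleles: list with one entry per site, each a list of allele strings,
        e.g. [["A", "T"], ["AC", "--"]].
    genotypes: list with one entry per sample, each a list of allele indexes
        (one index per site).

    Hypothetical helper for illustration only; not part of any library API.
    """
    # Every allele at a site must have the same width, otherwise the
    # resulting haplotype strings would not stay aligned across samples.
    for site_id, alleles in enumerate(site_alleles):
        widths = {len(allele) for allele in alleles}
        if len(widths) != 1:
            raise ValueError(
                f"Alleles at site {site_id} differ in length: {alleles}"
            )
    return [
        "".join(site_alleles[j][index] for j, index in enumerate(sample))
        for sample in genotypes
    ]


if __name__ == "__main__":
    sites = [["A", "T"], ["AC", "--"], ["G", "C"]]
    samples = [[0, 0, 1], [1, 1, 0]]
    for haplotype in concatenate_haplotypes(sites, samples):
        print(haplotype)
```

Because each site contributes a fixed number of characters, all haplotype strings have the same length and a given site always occupies the same columns, which is what keeps the output aligned.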
Also from that PR, a quick summary - do we think it useful to be able to output small indels as aligned haplotypes, e.g.
Note that the current code does not output the `.` between sites (an optional convention to represent non-variable regions between variants). I put them there for clarity, and also because I think it might be a useful addition to address #353 (comment)