Added the write_ms function to write out ms-style output from a tree sequence #854
petrelharp
left a comment
This looks very nice! You've dealt with a lot of things here. See comments, but my main comments are:
- maybe this should be a top-level method, since it can act on many tree sequences (thus getting rid of num_replicates)?
- we need more testing (e.g. testing for equality of positions, and of haplotypes)
- right now it uses haplotypes, and so requires the alleles to be 0/1, but really we just need them to be biallelic; using genotypes or genotype_matrix instead would allow that.
Let me know what you think? Happy to help with something if you're not sure of the best way forward.
python/tests/test_ms.py
Outdated
| @@ -0,0 +1,188 @@ | |||
| # MIT License | |||
| # | |||
| # Copyright (c) 2018-2019 Tskit Developers | |||
-2020
... and omit the next line, as it wasn't present in 2016
python/tskit/ms.py
Outdated
| # | ||
| # MIT License | ||
| # | ||
| # Copyright (c) 2019 Tskit Developers |
2020!
|
|
||
| def verify_num_haplotypes(self, ts, mutation_rate, num_replicates): | ||
| if num_replicates == 1: | ||
| with tempfile.TemporaryDirectory() as temp_dir: |
how about using the file-like io.StringIO(), like for instance here?
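A minimal sketch of the in-memory approach suggested here. `write_example` is a hypothetical stand-in for whatever function writes the ms output; the only point is that an `io.StringIO` buffer can replace a temporary file on disk in a test.

```python
import io


def write_example(output):
    # Hypothetical stand-in for the function under test: it only needs
    # an object that print() can write to, not a real file on disk.
    print("//", file=output)
    print("segsites: 2", file=output)


# Capture the output in memory instead of using tempfile.TemporaryDirectory().
buffer = io.StringIO()
write_example(buffer)
lines = buffer.getvalue().splitlines()
```

The test can then make assertions on `lines` directly, with no cleanup step.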
python/tests/test_ms.py
Outdated
| quantities["num_sites"] = num_sites | ||
| quantities["num_positions"] = num_positions | ||
| quantities["num_haplotypes"] = num_haplotypes | ||
|
|
We could also test for equality of the positions themselves, no? And, haplotypes?
What you have written here is almost an ms file parsing function: might as well go all the way, and just return the whole dict? Then you can test the various aspects of it, like below. This wouldn't be for distribution, just for testing, so no need to worry about making it fast or documented, just simple and obviously correct.
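A test-only parser along these lines might look like the sketch below. The function name `parse_ms` and the dict keys are illustrative assumptions, not code from this pull request; it assumes the usual ms layout of a command line, a seed line, and then one `//` block per replicate.

```python
def parse_ms(text):
    """Parse ms-style text into a header plus a list of replicates (test helper)."""
    lines = text.splitlines()
    result = {"header": lines[0], "replicates": []}
    current = None
    for line in lines[1:]:
        line = line.strip()
        if line == "//":
            # Each "//" line starts a new replicate.
            current = {"num_sites": 0, "positions": [], "haplotypes": []}
            result["replicates"].append(current)
        elif current is None or not line:
            continue  # seed line and blank separators
        elif line.startswith("segsites:"):
            current["num_sites"] = int(line.split(":")[1])
        elif line.startswith("positions:"):
            current["positions"] = [float(x) for x in line.split()[1:]]
        else:
            current["haplotypes"].append(line)
    return result


sample = "ms 2 1\n0\n\n//\nsegsites: 2\npositions: 0.2500 0.7500\n01\n10\n"
parsed = parse_ms(sample)
```

Tests can then compare `parsed["replicates"][i]["positions"]` and `["haplotypes"]` against values computed from the tree sequence, rather than only their counts.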
python/tskit/ms.py
Outdated
| # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
| # SOFTWARE. | ||
| """ | ||
| Convert tree sequences to ms output |
I guess that we're not converting the whole tree sequence, just writing out the genotypes. How about "Write the genotypes in a tree sequence in ms format."?
python/tskit/ms.py
Outdated
| recombination_rate=0, | ||
| migration_rate=0, | ||
| num_loci=1, | ||
| num_replicates=1, |
Hm - looks like you don't need most of these options. Do you need any of them? Other than num_replicates, maybe?
python/tskit/ms.py
Outdated
| variant.position / (tree_sequence.sequence_length) | ||
| for variant in tree_sequence.variants() | ||
| ] | ||
| positions.sort() |
the positions are guaranteed to be sorted.
python/tskit/ms.py
Outdated
| file=output, | ||
| ) | ||
| print(file=output) | ||
| for h in tree_sequence.haplotypes(): |
Hm - so, haplotypes returns the actual alleles, which are thus required to be 0/1. But the genotype_matrix gives the indexes into arrays of alleles, with 0 always the ancestral state. So, more generally, could we do something like this:
    genotypes = tree_sequence.genotype_matrix()
    for k in range(tree_sequence.num_samples):
        # genotypes holds integers, so convert each to str before joining
        print("".join(map(str, genotypes[:, k])), file=output)
(but, what's the - for?)
(The "-" is for missing data @petrelharp )
that's what I guessed... but does ms output missing data?
No, I wouldn't imagine it does. But, I'm not sure how strict we should be here?
I'm not opposed to including missing data! Just wanted to be sure what was happening.
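To make the thread above concrete, here is a rough sketch of turning a sites-by-samples matrix of allele indexes into ms-style rows, writing missing data as "-". Plain nested lists stand in for the array returned by `genotype_matrix()`, and `haplotype_rows` is a hypothetical helper name, not part of this pull request.

```python
MISSING = -1  # tskit reports missing data as -1 in genotype arrays


def haplotype_rows(genotypes):
    """Turn a sites-by-samples matrix of allele indexes into ms-style rows."""
    num_samples = len(genotypes[0]) if genotypes else 0
    rows = []
    for k in range(num_samples):
        # Column k holds the allele index of sample k at every site.
        column = [site[k] for site in genotypes]
        rows.append("".join("-" if g == MISSING else str(g) for g in column))
    return rows


# Two sites (rows) by three samples (columns).
example = [[0, 1, 1], [1, 0, -1]]
rows = haplotype_rows(example)
```

Because the matrix holds indexes rather than allele strings, any biallelic site maps to 0/1 regardless of what the alleles are actually called.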
python/tskit/trees.py
Outdated
| >>> tree_sequences = msprime.simulate(<simulation arguments>, num_replicates=num_replicates) | ||
| >>> with open('output.ms', 'w') as ms_file: | ||
| >>> for tree_sequence in tree_sequences: |
indentation
python/tskit/trees.py
Outdated
| >>> tree_sequences = msprime.simulate(<simulation arguments>, num_replicates=num_replicates) | ||
| >>> with open('output.ms', 'w') as ms_file: | ||
| >>> for tree_sequence in tree_sequences: | ||
| >>> tree_sequence.write_ms(ms_file, mutation_rate=mutation_rate, num_replicates=num_replicates) |
You've got a nice solution to this. But really, write_ms is not a property of a TreeSequence, it's a thing that you can do to one, or maybe many, tree sequences. What if instead of ts.write_ms you did tskit.write_ms(ts)? That way if ts is a TreeSequence, you'd write it out, and if ts is a generator (of tree sequences) then you have replicates, and write them out?
jeromekelleher
left a comment
I think we can simplify this down quite a bit @saurabhbelsare, and it would actually be better to not have a class here. My main question really is, "what is this for and who will use it"; once this is clearer, we can see better what the requirements are in terms of what kind of data we try to represent and what options we need.
python/tskit/ms.py
Outdated
| same tree. Therefore, we must keep track of all breakpoints from the | ||
| simulation and write out a tree for each one. | ||
| """ | ||
| breakpoints = list(self._tree_sequence.breakpoints(True)) + [self._num_loci] |
Yeah, this code makes sense in the msprime implementation but not here. In msprime, we know there are extra breakpoints not present in the tree sequence and we use these to output extra copies of the trees appropriately (otherwise we don't have the same distribution of the number of trees as ms). We can delete all the stuff about breakpoints here.
python/tskit/ms.py
Outdated
| simulation and write out a tree for each one. | ||
| """ | ||
| breakpoints = list(self._tree_sequence.breakpoints(True)) + [self._num_loci] | ||
| if self._num_loci == 1: |
This would be if ts.num_trees == 1
python/tskit/ms.py
Outdated
| print(newick, file=output) | ||
| else: | ||
| j = 1 | ||
| for tree in self._tree_sequence.trees(): |
This can just be:
    for tree in ts.trees():
        newick = tree.newick(precision=self._precision)
        print("[{}]".format(tree.span), newick, file=output)
python/tskit/ms.py
Outdated
|
|
||
| def __write_header(self, output): | ||
| print( | ||
| "ms {} {} # This file is an ms-style output file generated from tskit. The two arguments written are sample size and number of replicates".format( |
I'm not sure we should put comments in the output - it's not part of ms's output, is it?
No, there are no comments in ms's output.
python/tskit/ms.py
Outdated
|
|
||
| def write(self, output): | ||
|
|
||
| if os.path.getsize(output.fileno()) == 0: |
This isn't a good idea I think - it means that you can't write to file-like objects.
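A small illustration of why the `fileno()` check rules out file-like objects: an in-memory text stream has no operating-system file descriptor, so asking for one raises an error.

```python
import io

buffer = io.StringIO()
try:
    buffer.fileno()
    has_descriptor = True
except io.UnsupportedOperation:
    # In-memory streams have no OS-level file descriptor, so
    # os.path.getsize(output.fileno()) cannot work on them.
    has_descriptor = False
```

One alternative, sketched under the assumption that the caller knows when it is writing the first replicate, is to pass that information in explicitly rather than inspecting the output's size.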
python/tskit/ms.py
Outdated
| # Introducing an error to exit if the sequence is not compatible with the ms format # | ||
| ##################################################################################### | ||
| else: | ||
| sys.exit("This tree sequence contains non-biallelic SNPs and is incompatible with the ms format!") |
An exception is more appropriate here, e.g. raise ValueError("not compatible with ms output")
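As a sketch of the difference, the hypothetical helper below (`check_ms_compatible` is an illustrative name) applies the same character check as the diff above but raises instead of exiting, so calling code and tests can catch the failure.

```python
def check_ms_compatible(haplotype):
    """Raise ValueError if a haplotype string uses symbols ms cannot represent."""
    if not set(haplotype).issubset({"0", "1", "-"}):
        raise ValueError("not compatible with ms output")
    return haplotype


try:
    check_ms_compatible("0120")
    message = None
except ValueError as error:
    # Unlike sys.exit, the caller can catch this and decide what to do.
    message = str(error)
```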
|
Thanks for all these suggestions! A lot of the points you've listed come from the fact that I used the existing write_vcf method from tskit as a template, and used existing code from msprime/cli.py to create the write function. I'll work on these points. Also, I'm not sure of the exact target application. Should I post on the GitHub issue thread where this was requested, and tag and ask the people who requested it? Thanks. |
|
Thanks @petrelharp for mentioning this. I wasn't aware of #854. PartialSHIC takes a single ms-style output that contains multiple replicates in it, for example I use 1000 replicates. // // p.s. I will be away for hiking on weekends, but will be back soon and do testing. |
|
I did some quick testing. Maybe this is not the right way, but here is what I did. And then I loaded my previously simulated data stored in a tree sequence. Perhaps I am doing something wrong here. I should learn what GitHub is all about, like what a pull request is and stuff. |
|
Hi, @yunusbb! What you did sounded like it should work, but it depends on what version of tskit you're copying the file over. Here's a quick summary of how to do the git thing: Now, everything you do from this directory only will use the local version of tskit, matching this pull request. If you want to make edits, I'd recommend something different, but this is a quick and easy way to test out what's going on. |
|
Hi, @petrelharp! Thank you for your help with this! |
|
I've updated the write_ms function to address the points raised above. The write_ms function is now a function of tskit, and not tree_sequence. Hence it no longer needs the two different calls depending on whether num_replicates is used or not. All the breakpoint-related code, which was only relevant to msprime, has now been removed. All the other minor points have been fixed. The new tests for positions and genotypes have been added. Let me know how it looks now. |
|
@saurabhbelsare Great stuff! Could you rebase and run pre-commit (see here and here). This will then make CI green. I'll do a proper review tomorrow. Thanks! |
|
Hi @benjeffery, I've run all the pre-commit checks and performed the corresponding modifications, and pushed that version. However, when I tried to do the rebasing, following these instructions, when I run I'm not sure how to fix this. Sorry, I'm still not super on top of working with github. |
|
Sounds like you don't have this fork set as a remote. |
|
Or it might be called |
|
@petrelharp, git remote -v did not show me the tskit-dev repo, so I added it as per @benjeffery's instructions. However, when I run |
|
Hi @saurabhbelsare. Try using the |
|
Hi @grahamgower, That worked, thanks! I got another error when I tried to run When I open trees.py, it is showing me the version of trees.py from the first commit, not the one from my latest commit, which is what I would have expected from the squashing. Hence I'm not sure how to resolve this. Sorry that I need step by step instructions for this, I haven't really done this before and I don't want to break anything. |
|
In general, what you need to do in this case is go through the files with conflicts and find the places like this: and edit them to be the way you want. That'll be a bit annoying here, since you will probably have to do it four times (once for each of your commits). This has maybe got to be difficult because the main branch has moved a good bit since you started this. Want me to do the rebase this time? |
|
Thanks @petrelharp, I've rebased and pushed it. Let me know if it looks right now. |
|
Something went wrong there, @saurabhbelsare - this is not rebased to main. Looking at Maybe you forgot to push? You'll have to do |
|
I did a git push -f after I did the rebase. And I tried it again right now, and I get the message |
|
Hi @saurabhbelsare, thanks for persevering with this! Your branch is still not rebased to then rebase your work on top: |
Hi @petrelharp, I've tried to follow this recipe to do testing but was stuck at the compilation step with errors: #just in case, this was my error during compilation (I've tried to read some gcc documentation and some other things but could not solve this issue) make |
|
@yunusbb sounds like you need Docs at https://tskit.readthedocs.io/en/latest/development.html can help when building tskit. |
|
Thanks @benjeffery ! It worked now. |
|
Hi, To have #892 locally, I followed steps provided by @petrelharp and @benjeffery, i.e. git clone https://github.com/tskit-dev/tskit.git make python3.8 with open('output_1.ms', 'w') as ms_file: my original tree sequence was generated using SLIM and then it was recapitated & mutated using msprime |
|
@yunusbb - great! Let us know if the output works for your pipeline? |
|
@saurabhbelsare Would you like me to rebase this to |
|
Hi @benjeffery, is it possible for you to quickly do the rebase? I'm still getting merge conflicts when I try to do it and can't figure out what's going wrong. Sorry about that. Hi @yunusbb, were you able to generate ms-style output with replicates using the write_ms function from this pull request? The same function should work with and without replicates. Let me know if everything is working the way you need it to. |
|
Hi, @saurabhbelsare! So I did some testing this morning. here is the first successful try:
wc -l test_msprime_3_sim_iters.ms now the failed attempt, with one parameter changed (sample_size=100):reps = msprime.simulate( with open("test_msprime_3_sim_iters.ms", "w") as ms_file: wc -l test_msprime_3_sim_iters.ms As far as I remember, I did not change anything between these attempts. |
|
Hi @yunusbb, I wasn't able to reproduce the problem you are seeing with changing the parameter. Here is my script: Here are my outputs: wc -l test_msprime_3_sim_iters.ms Both the output files have data written out. Is there a mismatch in the tree_sequence object you are giving the write_ms function? In your first example, the output of msprime is to I can add the parameter to manually turn off |
|
Hi @saurabhbelsare , with open('output_1.ms', 'w') as ms_file: Sorry for taking your time on this. Must have been able to spot this myself. Anyway, I am sending you three SLIM generated tree sequence files. I have compressed them using right-click 'Compress' menu in MACOSX finder. SLIM_generated_tree_sequences.zip Here are my commands to load tree sequences and process with write_ms: ts1=tskit.load('slim_neutral_reps_1_Est_Dem_decap_subset_1800.trees') |
Codecov Report
@@ Coverage Diff @@
## main #854 +/- ##
==========================================
- Coverage 93.47% 93.43% -0.05%
==========================================
Files 25 25
Lines 20029 20065 +36
Branches 796 808 +12
==========================================
+ Hits 18723 18747 +24
- Misses 1272 1281 +9
- Partials 34 37 +3
|
|
@saurabhbelsare I've rebased to master - there was only one small conflict. I'd suggest reading https://docs.github.com/en/free-pro-team@latest/github/collaborating-with-issues-and-pull-requests/resolving-a-merge-conflict-using-the-command-line for how to do this next time. |
|
@saurabhbelsare I forgot to say that to fetch the rebased changes, you should not |
|
📖 Docs for this PR can be previewed here |
benjeffery
left a comment
Thanks for picking this up @saurabhbelsare! I think the main method needs some refactoring and simplifying, so I haven't reviewed the tests yet as they will change.
python/tskit/ms.py
Outdated
| """ | ||
|
|
||
|
|
||
| class msWriter: |
I don't think this deserves a class, as it has no lifetime or identity. I also don't think it deserves its own module as a single function; it could be rolled into the write_ms function in trees.py.
I've removed the class entirely and moved all the functions to trees.py
python/tskit/ms.py
Outdated
| tree_sequence, | ||
| print_trees=False, | ||
| precision=4, | ||
| mutation_rate=0, |
mutation_rate has no purpose here, only made sense in the msprime code.
I've removed the mutation_rate argument
python/tskit/ms.py
Outdated
| "ms {} {}".format( | ||
| self._tree_sequence.sample_size, max(self._num_replicates, 1) | ||
| ), |
I'm not sure this is right - my understanding is that the first line of an ms file is the command line used to generate it. Should we be putting " ".join(sys.argv) here?
These two are the first two arguments in the ms file format. Including the rest is a complicated procedure, since the tree_sequence could be generated from different software like msprime or SLiM, and it is not straightforward to recreate all the ms-style command line arguments from that. I discussed this with @petrelharp a while ago and we thought that only having these two basic arguments makes sense.
The question is: what do downstream users expect to see here? If no-one is parsing this line, then it would make sense to put something about provenance here, as @benjeffery suggests. But I do worry that people might be parsing it a bit (well, hopefully just looking for a line starting with ms X Y to skip), and they might be pulling the sample size and number of replicates out of it. So, I guess I vote for leaving it as-is?
python/tskit/ms.py
Outdated
| newick = tree.newick(precision=self._precision) | ||
| print(f"[{tree.span}]", newick, file=output) | ||
|
|
||
| def __write_header(self, output): |
Don't think this needs to be a separate function.
Merged with the main function.
python/tskit/ms.py
Outdated
| ), | ||
| file=output, | ||
| ) | ||
| print("{}".format(999), file=output) |
Is this the random seed line in the ms file? If it has to be numeric then using 0 would be better, at first I assumed 999 had a special meaning.
I've replaced 999 with 0.
python/tskit/ms.py
Outdated
| ] | ||
| for position in positions: | ||
| print( | ||
| "{0:.{1}f}".format(position, self._precision), |
You can use an f string here.
I've replaced the printing with f strings.
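For reference, the f-string equivalent of the `str.format` call in the diff above can be sketched as follows; the variable names and sample values are illustrative only.

```python
precision = 4
positions = [0.123456, 0.5]

# The precision is interpolated into the format spec via a nested replacement field.
old_style = ["{0:.{1}f}".format(p, precision) for p in positions]
new_style = [f"{p:.{precision}f}" for p in positions]
```

Both forms produce the same fixed-point strings, so the change is purely one of readability.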
python/tskit/ms.py
Outdated
| if set(tmp_str).issubset({"0", "1", "-"}): | ||
| print(tmp_str, file=output) | ||
| ################################################# | ||
| # Introducing an error to exit if the sequence # |
Comment isn't needed now we have the exception text.
Removed the comment.
python/tskit/trees.py
Outdated
| writer = ms.msWriter( | ||
| tree_seq, | ||
| print_trees=print_trees, | ||
| precision=precision, | ||
| mutation_rate=mutation_rate, | ||
| num_replicates=num_replicates, | ||
| write_header=True, | ||
| ) | ||
| else: | ||
| writer = ms.msWriter( | ||
| tree_seq, | ||
| print_trees=print_trees, | ||
| precision=precision, | ||
| mutation_rate=mutation_rate, | ||
| num_replicates=num_replicates, | ||
| write_header=False, | ||
| ) |
Here you can have one call to msWriter with write_header=(i==0)
Implemented this change.
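The suggested simplification can be sketched as below. `write_replicate` is a hypothetical stand-in for the per-replicate writer, and the strings are placeholders for real output.

```python
import io


def write_replicate(name, output, write_header):
    # Hypothetical stand-in for the per-replicate writer discussed above.
    if write_header:
        print("header", file=output)
    print(name, file=output)


output = io.StringIO()
for i, name in enumerate(["rep0", "rep1", "rep2"]):
    # Only the first replicate emits the header, so no if/else is needed.
    write_replicate(name, output, write_header=(i == 0))
written = output.getvalue().splitlines()
```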
|
|
Hi @benjeffery, I've implemented all the fixes you've suggested. The tests have also been modified to work with the new structure. Thanks for all the suggestions! Let me know if everything looks good now. Hi @yunusbb, The function now has a write_header argument. Your example runs as follows: Let me know if this is what you are looking for, and if the output is as you expect. |
|
Hi @saurabhbelsare! Fantastic! That is exactly what is needed to process "non-msprime" tree sequences. Thank you for taking the time to do this! |
benjeffery
left a comment
Heading in the right direction @saurabhbelsare! I still think this could be simpler as just one function though.
python/tskit/trees.py
Outdated
| ) | ||
|
|
||
|
|
||
| def print_ms_file_trees(tree_seq, precision, output): |
As this function is only called in one place it doesn't have a life of its own. Best to inline it into print_ms_file.
Inlined the function
python/tskit/trees.py
Outdated
| print(newick, file=output) | ||
|
|
||
|
|
||
| def print_ms_file( |
This function too can be inlined.
Inlined the function
python/tskit/trees.py
Outdated
| are sample size and number of replicates. The second line has a 0 as a substitute | ||
| for the random seed. | ||
| """ | ||
| if isinstance(tree_sequence, collections.Iterable): |
Here you could do tree_sequence=[tree_sequence] if the argument wasn't an iterable. This would let you inline print_ms_file.
Done this modification.
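A rough sketch of that normalisation step, using a hypothetical `FakeTreeSequence` class in place of a real tree sequence. It imports `Iterable` from `collections.abc`, since the bare `collections.Iterable` alias shown in the diff was removed in Python 3.10.

```python
from collections.abc import Iterable


class FakeTreeSequence:
    """Hypothetical stand-in for a single, non-iterable tree sequence object."""


def as_replicates(tree_sequence):
    # Wrap a single object in a list so the rest of the code can always loop.
    if not isinstance(tree_sequence, Iterable):
        tree_sequence = [tree_sequence]
    return tree_sequence


single = FakeTreeSequence()
wrapped = as_replicates(single)
generated = list(as_replicates(FakeTreeSequence() for _ in range(3)))
```

With this in place, one loop body handles both the single tree sequence and the generator-of-replicates cases.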
python/tskit/trees.py
Outdated
| Print out the trees in ms-format from the specified tree sequence. | ||
| """ | ||
| tree = next(tree_seq.trees()) | ||
| newick = tree.newick(precision=precision) |
This is only printing one tree, is that right?
You are right, this was a mistake. I've fixed this now. Also comparing with the ms output, when there is no recombination, and therefore only one tree, ms does not write out the span, while when there are multiple trees, it does. The new print_trees part of the function does that. Also, I looked carefully at the ms manual, and the -T argument which prints trees suppresses the output of genotypes. I have modified the write_ms function to behave accordingly.
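The single-tree versus multi-tree behaviour described above could be sketched like this. `FakeTree` is a hypothetical stand-in carrying only a span and a ready-made Newick string, and placing the bracketed span directly before the Newick string with no separator is an assumption about the ms layout rather than something confirmed in this thread.

```python
from collections import namedtuple

# Hypothetical stand-in for a tree: just a span and a ready-made Newick string.
FakeTree = namedtuple("FakeTree", ["span", "newick"])


def tree_lines(trees):
    """Format trees as described above: omit the span when there is only one tree."""
    trees = list(trees)
    if len(trees) == 1:
        return [trees[0].newick]
    # Assumption: the bracketed span directly precedes the Newick string.
    return [f"[{t.span}]{t.newick}" for t in trees]


one = tree_lines([FakeTree(10, "(1:1,2:1);")])
many = tree_lines([FakeTree(4, "(1:1,2:1);"), FakeTree(6, "(1:2,2:2);")])
```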
…function from print genotypes, and added iterator to print all trees
8b48a3f to 6e3a671
|
|
Hi @benjeffery, I've done all the latest modifications you suggested and (hopefully correctly) rebased and squashed the new commits. Let me know how things look now. |
jeromekelleher
left a comment
LGTM, thanks @saurabhbelsare! I think there's a few things we can improve a little bit, but the basic functionality is here so I think we can merge and file some issues to track the rest.
|
Thanks @saurabhbelsare! |
This is the write_ms function discussed in #727. I have created a new ms.py file with the function, included the write_ms function header in trees.py, and created a test_ms.py in the tests directory. In lines 118-125 of ms.py, I'm introducing a hard exit if the tree sequence contains anything incompatible with the ms format. Let me know if that is the best way to do it, and whether any of the functions need other modifications. Thanks!