A few issues with `genomic_converter()` function #10

IdoBar · 2017-10-02T06:12:08Z

Hi Thierry,

I've been working with the package, mainly to clean, import and convert SNP data to different formats, and while using the genomic_converter() function I came across a few issues with its behaviour:

When exporting to SNPRelate format, it ignores the provided output filename and generates one based on a date signature instead (see related pull request).
Using the vcf.metadata = TRUE argument with a VCF file resulted in an error (object DP not found).
Inconsistent argument rules are confusing: the blacklist.id argument accepts either a file or a data.frame object, while blacklist.genotype only accepts a filename (of a file containing a data.frame). I know this is in the function documentation, but the inconsistency confused me for a while until I double-checked the fine details. I suggest making both arguments work with R objects; that makes much more sense than relying on files.
snp.ld lets you choose the first, last or a random SNP, but to me it makes sense to also allow choosing a SNP that is neither first nor last, because the SNPs at the tag ends are often supported by fewer reads and are less usable for validation (if flanking primers are to be designed).
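To make the blacklist.id / blacklist.genotype suggestion concrete, here is a minimal Python sketch (not radiator's R code) of one way to accept either a file path or an in-memory table and normalize it once, before any filtering. The table is modeled as a list of row dicts standing in for a data.frame; the tab-separated file layout, the INDIVIDUALS column name and the helper names are assumptions for illustration only.

```python
import csv
from typing import Dict, Iterable, List, Mapping, Optional, Union

# A "table" in this sketch is a list of row dicts, standing in for an R data.frame.
Table = List[Dict[str, str]]


def as_table(source: Union[str, Iterable[Mapping[str, str]], None]) -> Optional[Table]:
    """Normalize a filter argument that may be a file path or an in-memory table.

    None -> None (no filter requested)
    str  -> path to a tab-separated file with a header row (assumed layout)
    else -> any iterable of row mappings, copied into plain dicts
    """
    if source is None:
        return None
    if isinstance(source, str):
        with open(source, newline="") as handle:
            return [dict(row) for row in csv.DictReader(handle, delimiter="\t")]
    return [dict(row) for row in source]


def drop_blacklisted_ids(data: Table, blacklist_id) -> Table:
    """Remove rows whose INDIVIDUALS value is blacklisted (hypothetical column name)."""
    blacklist = as_table(blacklist_id)  # path or table: both end up as a Table here
    if not blacklist:
        return data
    bad = {row["INDIVIDUALS"] for row in blacklist}
    return [row for row in data if row["INDIVIDUALS"] not in bad]
```

Normalizing at the top means the downstream filtering code never receives a bare filename string, whichever form the caller used.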

That's it for now, thanks, Ido


thierrygosselin · 2017-10-02T13:52:26Z

Awesome, thanks Ido!

SNPRelate issue is fixed. However, the output option will be removed today with the next release; see my comment on this.
vcf.metadata = TRUE: I saw this problem yesterday, and I think it originates from my last fix, which tried to work around the metadata provided by stacks. The GL field is still declared in the VCF header but is absent from the FORMAT field of each genotype. The packages I'm using, vcfR and pegas, are confused by this. I thought I had fixed the issue with the remaining arguments, but I'll have to do further tests.
blacklist.id and blacklist.genotype: good catch! I've been incrementally adding the option to pass either an object or a file, and was about to test blacklist.genotype. I've finished testing whitelist.markers (it's not in the doc, but it works with an object...). This will be in the next release today.
snp.ld: I'll implement something. Although having more than 2-3 SNPs on a 100 bp read is not really a good sign, it will be a nice addition to test. What should the default behaviour be if you only have 2 SNPs? Use the first or the last one? (I would opt for the first.) And if more than 3 SNPs are present, use one at random.
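A minimal Python sketch of the selection rules discussed here, assuming each locus is represented by the list of its SNP positions on the read. The function name and the behaviour of the "middle" strategy (a random SNP that is neither first nor last, falling back to the first SNP when fewer than 3 SNPs are present) are hypothetical, following the proposals in this thread rather than radiator's documented snp.ld options.

```python
import random
from typing import Optional, Sequence


def select_snp(positions: Sequence[int], strategy: str = "first",
               rng: Optional[random.Random] = None) -> int:
    """Pick one SNP position from a locus (read) that carries several SNPs.

    "first"  -> smallest position
    "last"   -> largest position
    "random" -> any SNP, uniformly
    "middle" -> hypothetical rule: a random SNP that is neither first nor last;
                with fewer than 3 SNPs there is no interior SNP, so use the first.
    """
    if not positions:
        raise ValueError("locus has no SNPs")
    rng = rng or random.Random()
    ordered = sorted(positions)
    if strategy == "first":
        return ordered[0]
    if strategy == "last":
        return ordered[-1]
    if strategy == "random":
        return rng.choice(ordered)
    if strategy == "middle":
        interior = ordered[1:-1]  # drop the SNPs at both tag ends
        return rng.choice(interior) if interior else ordered[0]
    raise ValueError(f"unknown strategy: {strategy!r}")
```

With exactly 3 SNPs the interior set has a single element, so "middle" is deterministic; passing a seeded random.Random makes the random choices reproducible.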

thanks
Thierry

thierrygosselin · 2017-10-02T20:14:24Z

Hi Ido,

SNPRelate: done
vcf.metadata = TRUE: fixed
blacklist.id, blacklist.genotype and whitelist.markers: all behaving the same now.
snp.ld: a new option is implemented and called middle. Details in the function doc.

Cheers
Thierry

IdoBar · 2017-10-03T09:44:12Z

The new version (0.0.6) now fails completely to import data when filters are applied, with the following error:

Error in UseMethod("filter_") : 
   no applicable method for 'filter_' applied to an object of class "character"

I suspect it has something to do with the changes made to accommodate blacklist.genotype as a data.frame, but I haven't looked into that yet.

Reverting back to the older version for now.
Cheers, Ido

thierrygosselin · 2017-10-03T16:18:35Z

Currently checking this alongside another problem I've detected.

This mainly affects VCF files.
To get unique markers, I combine CHROM__LOCUS__POS into the MARKERS column.
The separator is 2 underscores (it's the only one I've found that doesn't interfere with other packages). This is what radiator and grur currently use to export whitelists.
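A small Python sketch of the MARKERS identifier described above, assuming CHROM, LOCUS and POS are available as strings. The helper names are hypothetical. The check protects the round trip: a component that contains `__`, or that starts or ends with `_`, could make the split ambiguous.

```python
from typing import Tuple

SEP = "__"  # two underscores, as described in the thread


def make_marker(chrom: str, locus: str, pos: str) -> str:
    """Join CHROM, LOCUS and POS into one MARKERS identifier."""
    parts = [str(chrom), str(locus), str(pos)]
    # Reject components that would make splitting on "__" ambiguous.
    if any(SEP in p or p.startswith("_") or p.endswith("_") for p in parts):
        raise ValueError(f"component clashes with separator {SEP!r}: {parts}")
    return SEP.join(parts)


def split_marker(marker: str) -> Tuple[str, str, str]:
    """Recover (CHROM, LOCUS, POS) from a MARKERS identifier."""
    chrom, locus, pos = marker.split(SEP)
    return chrom, locus, pos
```

Single underscores inside a component (e.g. `chr_1`) survive the round trip, which is presumably why a doubled underscore works as a separator.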

Since stacks version 1.44, the position of the SNP on the haplotype/read is included in the ID column of the VCF file. The ID column is now no longer unique and no longer corresponds to LOCUS, so it has to be parsed to get back the LOCUS info, which is really a pain. This should have been included in the POS column (the problem was raised on the Google group).
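Assuming the stacks ≥ 1.44 ID takes the form `<LOCUS>_<COL>` (the locus number followed by the SNP's column on the read), recovering LOCUS amounts to dropping everything after the last underscore. This is a hedged Python sketch; the function names are hypothetical, and the ID format should be checked against the actual VCF before relying on it.

```python
from typing import Tuple


def parse_stacks_id(vcf_id: str) -> Tuple[str, str]:
    """Split an ID assumed to look like "<LOCUS>_<COL>" into (LOCUS, COL).

    Splitting on the last underscore removes only the trailing column number.
    """
    locus, sep, col = vcf_id.rpartition("_")
    if not sep or not locus or not col.isdigit():
        raise ValueError(f"unexpected stacks ID: {vcf_id!r}")
    return locus, col


def marker_from_vcf_row(chrom: str, pos: str, vcf_id: str) -> str:
    """Build a CHROM__LOCUS__POS marker from one VCF row (hypothetical helper)."""
    locus, _col = parse_stacks_id(vcf_id)
    return "__".join([chrom, locus, pos])
```

Two SNPs on the same read then share LOCUS but still get distinct markers through POS.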

The whitelists and blacklists were intended to be used in R with these packages and a tidy dataset; before this stacks update, they could also be used with a stacks VCF file.

I suspect the problem is related to whitelist, blacklist and blacklist.genotype containing LOCUS info that is used to filter the VCF file rather than the tidy dataset. I'll have to check this...
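For comparison, a minimal Python sketch of filtering a tidy dataset, where every row carries the full MARKERS identifier, so matching never depends on how the VCF ID column is formatted. The column names (INDIVIDUALS, MARKERS, GT), the missing code `000000` and the choice to erase blacklisted genotypes by setting them to missing are assumptions for this illustration, not a description of radiator's internals.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Row = Dict[str, str]
MISSING = "000000"  # hypothetical missing-genotype code for this sketch


def filter_tidy(data: List[Row],
                whitelist_markers: Optional[Iterable[str]] = None,
                blacklist_id: Optional[Iterable[str]] = None,
                blacklist_genotype: Optional[Iterable[Tuple[str, str]]] = None) -> List[Row]:
    """Apply whitelist/blacklist filters keyed on INDIVIDUALS and MARKERS."""
    keep: Optional[Set[str]] = set(whitelist_markers) if whitelist_markers is not None else None
    drop_ids = set(blacklist_id or ())
    erase = set(blacklist_genotype or ())  # (INDIVIDUALS, MARKERS) pairs
    out = []
    for row in data:
        if keep is not None and row["MARKERS"] not in keep:
            continue  # marker not whitelisted
        if row["INDIVIDUALS"] in drop_ids:
            continue  # whole individual blacklisted
        if (row["INDIVIDUALS"], row["MARKERS"]) in erase:
            row = {**row, "GT": MISSING}  # copy, so the input is not mutated
        out.append(row)
    return out
```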

Otherwise, using a tidy dataset it works as intended.
Thierry

thierrygosselin · 2017-10-03T19:35:29Z

Works with the latest commit.
Re-open the issue if you have problems.

Best
Thierry

thierrygosselin closed this as completed Oct 2, 2017

thierrygosselin reopened this Oct 3, 2017

thierrygosselin closed this as completed Oct 3, 2017
