more flexible file designation for sailfish #5

Closed
stephenturner opened this issue Dec 9, 2015 · 6 comments

@stephenturner

Trying to import sailfish results that aren't in the typical dir/quant.sf and dir/stats.tsv convention, because I've moved/renamed some things:

> sf <- tximport("data/sailfish-txps.txt", type="salmon", gene2tx=grch38_gt)
reading in files
1 Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'data/stats.tsv': No such file or directory

Coming from this bit of code:

tmp2 <- read.table(file.path(dirname(x), "stats.tsv"), 

tximport looks for stats.tsv strictly in the same directory as the results file. I suggest allowing the stats file to be specified explicitly as an argument, while defaulting to the current behavior when it isn't given.
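To make the suggestion concrete, here is a minimal sketch of that lookup. The helper `resolve_stats_file` and its `stats_file` argument are hypothetical names for illustration, not part of tximport:

```r
# Hypothetical helper, not tximport's actual code: pick the stats file,
# preferring an explicit path and otherwise falling back to the sibling
# "stats.tsv" next to the results file.
resolve_stats_file <- function(quant_file, stats_file = NULL) {
  if (is.null(stats_file)) {
    # Default: same directory as the results file
    stats_file <- file.path(dirname(quant_file), "stats.tsv")
  }
  if (!file.exists(stats_file)) {
    stop("stats file not found: ", stats_file)
  }
  stats_file
}
```

With this shape, a renamed results file such as data/sailfish-txps.txt could be paired with a stats file stored anywhere, and existing dir/quant.sf layouts would keep working unchanged.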

@mikelove
Collaborator

mikelove commented Dec 9, 2015

Yeah, I'll have to rework this.

I just commented it out for now

@rob-p
Collaborator

rob-p commented Dec 9, 2015

Moving forward, I may be able to help simplify this. The reason for the (somewhat tortured) structure of the output (e.g. effective lengths being in a separate file) was to maintain backward compatibility with prior versions of Sailfish & Salmon. However, if it would be helpful here (and/or in other contexts), starting in the next release I'd be willing to break backward compatibility of the output format and put the effective length (and any other useful information) directly into the quant.sf file. Also, I think it might be useful to remove the comment character # in front of the line that names the columns, so that the files can be read more easily with the typical tools (e.g. read.table and pandas.read_table). The default could then be to read the effective lengths directly from the quant.sf file and fall back to the current strategy if, e.g., the input is from an older version of Sailfish or Salmon. Thoughts?
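A small illustration of why the leading # matters for read.table: its default comment.char is "#", so a commented header line is discarded before parsing. The data and column names below are toy values for illustration, not the real quant.sf layout:

```r
# Toy input only; these column names are illustrative assumptions.
commented <- c("# Name\tLength\tTPM", "tx1\t1000\t2.5", "tx2\t500\t7.5")
plain     <- c("Name\tLength\tTPM",   "tx1\t1000\t2.5", "tx2\t500\t7.5")

# The '#' line is treated as a comment and skipped, so the column names
# are lost and read.table falls back to V1, V2, V3.
with_hash <- read.table(text = commented, sep = "\t")

# Without the '#', the first line can be used directly as the header.
no_hash <- read.table(text = plain, sep = "\t", header = TRUE)

names(with_hash)
names(no_hash)
```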

@roryk
Contributor

roryk commented Dec 9, 2015

+1 for both of those suggestions.

@mikelove
Collaborator

mikelove commented Dec 9, 2015

+1 for both as well.

Simplifying here will help me with something else I want to do, which is to make it easy for users to swap in readr::read_table, which is 50x faster.

@stephenturner
Author

I was also going to suggest readr. Faster, no stringsAsFactors, tbl_df goodness, etc.

@mikelove
Collaborator

I think we're set here. Rather than go looking for stats.tsv, the effective length will be in the quant.sf file for future versions of Sailfish/Salmon, and tximport will now autodetect whether it's an old or new version.
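For readers wondering what such a check could look like, here is a rough sketch. It is not tximport's implementation; the function name `has_eff_length` is hypothetical, and it assumes the newer format has an uncommented, tab-separated header row containing a column called "EffectiveLength":

```r
# Hypothetical sketch, not tximport's code. Assumes new-style files start
# with a plain tab-separated header that includes an effective-length
# column (named "EffectiveLength" here by assumption).
has_eff_length <- function(quant_file, eff_col = "EffectiveLength") {
  first <- readLines(quant_file, n = 1)
  # Empty file, or a '#'-commented first line: treat as the older format
  if (length(first) == 0 || substr(first, 1, 1) == "#") {
    return(FALSE)
  }
  eff_col %in% strsplit(first, "\t", fixed = TRUE)[[1]]
}
```

A caller could branch on the result: read the effective lengths straight from quant.sf when it returns TRUE, and otherwise fall back to the separate stats.tsv lookup.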
