
Support for unstranded data #32

Open
dominikburri opened this issue Nov 5, 2021 · 2 comments
Labels: enhancement (New feature or request), future (will not be fixed for NOW)

Comments

@dominikburri
Contributor

This issue was transferred from the previous issue on GitLab (number 89).

Original description

It seems that the v0.1 milestone will support only stranded data.
We need to make sure that in later versions we support unstranded data as well.

Comments

@fgypas
It seems that we might not support it in the end, so closing for now.

@uniqueg
I disagree strongly and think it should be part of the published version. The very first test with external real-world data (SARS-related samples) demonstrated that there are still plenty of relevant unstranded libraries around. Besides, whether a library is stranded or not is (unfortunately) not typically reported when uploading samples to SRA, so whenever anyone wants to run a sample from there, it's a gamble whether we will accept it (after first checking with another pipeline) or not. That's a really poor user experience...

@mzavolan
Well... I think generating unstranded data makes no sense, especially now. When analyzing such data, people make choices that are not warranted and that certainly introduce errors (e.g., pooling reads from the plus and minus strands and, of course, discarding regions where there is transcription in both directions). I don't want to spend time implementing and testing choices like this. How do you want to proceed?

@uniqueg
I understand the sentiment but then that argument applies more or less to any data that were obtained with inferior protocols or outdated technology. And to my taste that depends a little bit too much on personal opinion and what kind of resources you have access to. Unstranded data has been proven useful in the past, so I can't really see how analyzing them makes no sense, despite all the obvious drawbacks.
Anyway, how about we leave this issue open and defer the discussion until we know how to proceed with Rhea? If we want to allow our users to analyze samples straight from SRA, I think Rhea should handle the vast majority of samples there, and that would probably mean it should handle unstranded libraries. If we don't care too much about that particular use case and mostly concern ourselves with how Rhea can serve us, then we should probably drop it for the reasons mentioned.

@mkatsanto
This feature will not be implemented in the near future according to our scope. If the public requires it we can implement it in later versions.

@uniqueg
Implement it if reviewers ask for it; otherwise, wait until users ask for it.

@dominikburri dominikburri added the enhancement New feature or request label Nov 5, 2021
@dominikburri
Contributor Author

I created a new branch, support-unstranded, and committed a first fix (6f7f52c) to support unstranded paired-end libraries.

The fix only includes the appropriate keywords for Salmon and kallisto. The results for the samples I'm using look good so far: the MultiQC report shows that the majority of reads map in STAR, Salmon, and kallisto.

ALFA is not yet fixed; it needs a new rule to work properly. Right now, ALFA runs and reports the biotypes, but, as expected, half of the reads map to the "opposite strand".
It needs a new rule because, for unstranded data, only one bedgraph file can be supplied.

Other tools and output files are not tested for correctness.
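The strandedness-dependent options for the quantifiers mentioned above can be sketched as a small lookup. This is a minimal illustration, not the code in commit 6f7f52c: the function name quant_strand_options and the orientation labels are hypothetical. The Salmon library-type codes (IU, ISF, ISR for paired-end; U, SF, SR for single-end) and kallisto's --fr-stranded / --rf-stranded flags follow the tools' documented conventions; kallisto treats reads as unstranded when neither flag is given.

```python
from typing import List, Tuple

# Hypothetical orientation labels (not ZARP's sample-table values):
# "unstranded", "sense" (read 1 on the transcript strand),
# "antisense" (read 1 on the opposite strand, e.g. dUTP libraries).
_SALMON_PAIRED = {"unstranded": "IU", "sense": "ISF", "antisense": "ISR"}
_SALMON_SINGLE = {"unstranded": "U", "sense": "SF", "antisense": "SR"}
_KALLISTO = {
    "unstranded": [],  # no strand flag means unstranded in kallisto
    "sense": ["--fr-stranded"],
    "antisense": ["--rf-stranded"],
}


def quant_strand_options(orientation: str, paired: bool) -> Tuple[str, List[str]]:
    """Return (Salmon --libType value, extra kallisto quant flags)."""
    if orientation not in _KALLISTO:
        raise ValueError(f"unknown orientation: {orientation!r}")
    salmon_lib_type = (_SALMON_PAIRED if paired else _SALMON_SINGLE)[orientation]
    # Return a copy so callers cannot mutate the lookup table.
    return salmon_lib_type, list(_KALLISTO[orientation])
```

For an unstranded paired-end library, quant_strand_options("unstranded", True) returns ("IU", []), i.e. Salmon gets --libType IU and kallisto runs without a strand flag.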

@mkatsanto mkatsanto added this to To do in ZARP Hackathon Mar 14, 2022
@mkatsanto mkatsanto removed this from To do in ZARP Hackathon Mar 14, 2022
@ninsch3000 ninsch3000 added the future will not be fixed for NOW label Apr 20, 2022
@mkatsanto mkatsanto self-assigned this Oct 28, 2022
@mkatsanto mkatsanto removed the future will not be fixed for NOW label Oct 28, 2022
@mkatsanto mkatsanto added this to the submission_related_updates milestone Oct 28, 2022
@mkatsanto
Collaborator

Revisiting this issue:

Tasks to implement this feature:

  • star_rpm rule: there is an option for unstranded data
  • ALFA can support unstranded data
  • build an appropriate test
  • there is a need for an unstranded-specific subworkflow
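For the star_rpm task, STAR's signal-generation mode exposes the strandedness switch directly through --outWigStrand (Stranded or Unstranded). The sketch below assembles such a command; the function name star_rpm_command and the choice of RPM-normalized bedGraph output are assumptions for illustration, not the rule currently in the repository.

```python
from typing import List


def star_rpm_command(bam: str, out_prefix: str, stranded: bool) -> List[str]:
    """Build a STAR command that turns a coordinate-sorted BAM into RPM bedGraph signal.

    Hypothetical helper: only the STAR options themselves are real.
    """
    return [
        "STAR",
        "--runMode", "inputAlignmentsFromBAM",
        "--inputBAMfile", bam,
        "--outWigType", "bedGraph",
        # Unstranded collapses both strands into a single signal track.
        "--outWigStrand", "Stranded" if stranded else "Unstranded",
        "--outWigNorm", "RPM",
        "--outFileNamePrefix", out_prefix,
    ]
```

With Unstranded, STAR collapses the two strands into one signal track per read category instead of one track per strand, which matches the observation that ALFA would then receive only a single bedgraph file.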

@mkatsanto mkatsanto removed this from the submission_related_updates milestone Nov 28, 2022
@mkatsanto mkatsanto added the future will not be fixed for NOW label Nov 28, 2022
@mkatsanto mkatsanto removed their assignment Nov 28, 2022
@ninsch3000 ninsch3000 added this to ideas in ZARP publication Mar 10, 2023