Add transcript fastas to wiki #34

Closed
sr320 opened this issue Feb 2, 2024 · 12 comments
@sr320
Member

sr320 commented Feb 2, 2024

https://github.com/urol-e5/deep-dive/wiki/Species-Characteristics-and-Genomic-Resources

@shedurkin
Contributor

For P.evermanni, I'm pretty sure I have a functioning script to extract all the CDS lines from the gff, get fastas for each, and concatenate and label by parent! The only downside is it's quite slow (I don't know enough bash tricks to make it any more efficient) -- it's processed ~15% of the gff in the last hour, so it'll be running for the rest of the day. I'll let y'all know when it's done
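For readers following along, the approach described above (pull the CDS lines from the GFF, fetch the sequence for each, then concatenate and label by parent) can be sketched in Python. This is a minimal illustration, not the script used in the repo: it assumes GFF3-style `Parent=` attributes, a genome small enough to hold in memory as a dict, and toy data. The function names (`cds_by_parent`, `build_transcripts`, `to_fasta`) are hypothetical.

```python
from collections import defaultdict

# Translation table for complementing DNA (upper and lower case).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_parent(attributes):
    """Return the Parent= value from a GFF3 attribute column, or None.

    Assumption: attributes are GFF3-style 'key=value' pairs separated by ';'.
    """
    for field in attributes.split(";"):
        key, _, value = field.strip().partition("=")
        if key == "Parent":
            return value
    return None


def cds_by_parent(gff_lines):
    """Group CDS features as (chrom, start, end, strand) under their parent ID."""
    groups = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # keep only well-formed CDS records
        parent = parse_parent(cols[8])
        if parent is not None:
            groups[parent].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return groups


def build_transcripts(gff_lines, genome):
    """Concatenate each parent's CDS segments into one sequence."""
    transcripts = {}
    for parent, segments in cds_by_parent(gff_lines).items():
        segments.sort(key=lambda s: s[1])  # order segments by start coordinate
        # GFF coordinates are 1-based and inclusive, so the slice is [start-1:end].
        seq = "".join(genome[chrom][start - 1:end] for chrom, start, end, _ in segments)
        if segments[0][3] == "-":
            seq = reverse_complement(seq)  # minus-strand features read the other way
        transcripts[parent] = seq
    return transcripts


def to_fasta(transcripts):
    """Format a {name: sequence} dict as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in transcripts.items())


# Toy data (hypothetical scaffold and gene names).
genome = {"scaffold_1": "AAACCCGGGTTTACGT"}
gff_lines = [
    "##gff-version 3",
    "\t".join(["scaffold_1", "example", "CDS", "7", "9", ".", "+", "0", "Parent=gene1"]),
    "\t".join(["scaffold_1", "example", "CDS", "1", "3", ".", "+", "0", "Parent=gene1"]),
    "\t".join(["scaffold_1", "example", "CDS", "10", "14", ".", "-", "0", "Parent=gene2"]),
]

transcripts = build_transcripts(gff_lines, genome)
print(to_fasta(transcripts), end="")
```

Grouping in a single pass over the GFF, rather than launching an external tool once per feature, is one common way to cut down the runtime of this kind of job, though whether it helps here depends on how the original script is structured.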

@shedurkin
Contributor

Okie dokie, the rendered code for generating a transcriptome fasta (and running kallisto) for P.evermanni, and the transcriptome fasta itself are both pushed to the deep-dive repo!

@kubu4
Collaborator

kubu4 commented Feb 7, 2024

Nice work! Impressive.

I think your .md file didn't actually get re-rendered (commit is still from last week).

The HTML version got rendered, though:

https://htmlpreview.github.io/?https://github.com/urol-e5/deep-dive/blob/main/E-Peve/code/12-Peve-RNAseq-kallisto.html

I only glanced through the code, but figured I should ask this. Did you take into account that GFF files are 1-based (i.e. start is 1) and BED files are 0-based (i.e. start is 0)? Meaning, if using the GFF as input to bedtools getfasta, you should subtract 1 from the GFF coordinates so bedtools pulls out the proper sequence.

Admittedly, for alignment-free gene expression analysis, this is likely not an issue?
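To make the coordinate question concrete, here is a small Python sketch of the arithmetic: GFF intervals are 1-based and inclusive, BED intervals are 0-based and half-open, so only the start shifts by one. The helper names (`gff_to_bed_interval`, `gff_line_to_bed`) and the toy scaffold are hypothetical. Whether a particular tool applies this shift automatically for GFF input is worth confirming against that tool's documentation rather than inferring from this example.

```python
def gff_to_bed_interval(start, end):
    """Convert a 1-based inclusive GFF interval to a 0-based half-open BED interval."""
    if start < 1 or end < start:
        raise ValueError(f"invalid GFF interval: {start}-{end}")
    return start - 1, end  # only the start changes; the end stays the same


def gff_line_to_bed(line):
    """Turn one tab-separated GFF record into a BED6 line (score set to 0)."""
    cols = line.rstrip("\n").split("\t")
    chrom, feature, strand = cols[0], cols[2], cols[6]
    bed_start, bed_end = gff_to_bed_interval(int(cols[3]), int(cols[4]))
    return "\t".join([chrom, str(bed_start), str(bed_end), feature, "0", strand])


# Toy scaffold: 1-based positions 4..6 hold "CCC".
scaffold = "AAACCCGGGTTT"
gff_start, gff_end = 4, 6

bed_start, bed_end = gff_to_bed_interval(gff_start, gff_end)
converted = scaffold[bed_start:bed_end]    # slice using BED-style coordinates
unconverted = scaffold[gff_start:gff_end]  # GFF numbers misread as BED coordinates

gff_record = "\t".join(["scaffold_1", "example", "CDS", "4", "6", ".", "+", "0", "Parent=gene1"])
bed_record = gff_line_to_bed(gff_record)

print(converted)
print(unconverted)
print(bed_record)
```

In this toy case the unconverted slice drops the first base of the feature, which is the kind of off-by-one error the question is about.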

@shedurkin
Contributor

shedurkin commented Feb 7, 2024

Hmm, i was just assuming that since the bedtools getfasta doc lists gff files as one of the accepted inputs it would distinguish bed and gff files and process them appropriately -- it sounds like that's not the case?

@kubu4
Collaborator

kubu4 commented Feb 7, 2024

"i was just assuming"

Looks like you're not following the "golden rule" of bioinformatics... 😉

I'd definitely like to make the same assumption, but...

@shedurkin
Contributor

haha good point -- it should be straightforward to add a gff -> bed conversion before generating the transcriptome, it'll just take a while to rerun

@shedurkin
Contributor

I see bedops has a feature to convert gff to bed, and I saw in the handbook that it's been used before in the lab, but I don't see bedops in the /home/shared directory of tools on Raven -- is it stored somewhere else or would it need to be installed?

@kubu4
Collaborator

kubu4 commented Feb 7, 2024

I've gone ahead and installed bedops on Raven:

/home/shared/bedops_linux_x86_64-v2.4.41/bin

@sr320
Member Author

sr320 commented Feb 23, 2024

@zbengt @shedurkin status on this?

And if/when done, please provide details on how each file was selected or derived.

@zbengt
Contributor

zbengt commented Feb 23, 2024

Transcript fastas for Pocillopora and Acropora are added to the wiki. Acropora is the transcripts fasta from NCBI (https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_013753865.1/). Pocillopora is the CDS fasta from the Rutgers server (http://cyanophora.rutgers.edu/Pocillopora_meandrina/). These links are included in the wiki as well.

@sr320
Member Author

sr320 commented Feb 25, 2024

@shedurkin can you finish up by adding evermanni?

@shedurkin
Contributor

Added P.evermanni transcripts fasta, as well as links and download dates for original CDS gff and scaffolds fasta files and a link to the code used to generate the transcripts fasta: https://github.com/urol-e5/deep-dive/wiki/Species-Characteristics-and-Genomic-Resources#transcripts-1
