tximeta "couldn't find matching transcriptome, returning un-ranged SummarizedExperiment" #38

Open
mikelove opened this issue Jun 24, 2020 · 27 comments

@mikelove
Collaborator

mikelove commented Jun 24, 2020

Note to users:

tximeta is updated with the Bioconductor release cycle (every 6 months). Older, non-release versions of tximeta cannot be modified (they are frozen once the release/devel cycle moves on). So if on your machine you are using a version of tximeta that is not the release version (or the devel version, hosted here), it will not be able to recognize the latest releases from various sources (GENCODE, Ensembl, etc.).

First, check your version:

packageVersion("tximeta")

Then, you can examine in what version specific txomes are added here:

https://github.com/mikelove/tximeta/blob/master/NEWS

If you have a discrepancy, you should update Bioconductor and tximeta to the latest version in order to match with the latest releases in this table:

https://bioconductor.org/packages/release/bioc/vignettes/tximeta/inst/doc/tximeta.html#Pre-computed_checksums

@MustafaElshani

MustafaElshani commented Aug 30, 2020

Hi Mike

I updated both tximeta and Bioconductor but still got the message:

importing quantifications
reading in files with read_tsv
1 2 3 4 5 6 7 8
couldn't find matching transcriptome, returning non-ranged SummarizedExperiment

I am using gencode.v35 and ran salmon in alignment-based mode.

@MustafaElshani

MustafaElshani commented Aug 30, 2020

I think my issue is that I am using salmon's alignment-based mode (which I require), and in this mode the cmd_info.json file of each quantification does not contain the "index":"/path/to/genecode.v35_salmon_1.3.0" entry, because no index is needed. I guess this is where tximeta gets the information from.

For the alignment-based mode, salmon instead records a "target": "path/to/ReferenceGenome/gencode.v35.transcripts.fa" entry. Maybe this could be used as a fix in a later update?
I'm not entirely sure.

Meanwhile, in addition to the above, I created a salmon index for gencode.v35, took the hash strings from its info.json, and added them to the meta_info.json of each of the counts, and tximeta worked.
Maybe not the best way to go about it, but it worked.
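
The workaround described above can be sketched in Python. This is a minimal sketch under stated assumptions, not part of tximeta or salmon: the function name `copy_index_hashes` is hypothetical, and the key names (`SeqHash` and `NameHash` in the index's `info.json`, `index_seq_hash` and `index_name_hash` in `aux_info/meta_info.json`) are assumptions about what salmon writes. Check them against your own files before relying on it.

```python
import json
from pathlib import Path

# Assumed mapping from keys in the salmon index's info.json to the keys
# that a mapping-mode run would write into aux_info/meta_info.json.
KEY_MAP = {
    "SeqHash": "index_seq_hash",
    "NameHash": "index_name_hash",
}


def copy_index_hashes(index_dir, quant_dir, key_map=KEY_MAP):
    """Copy transcriptome hashes from a salmon index into one quant directory.

    Returns the dict of keys that were written. Values already present in
    meta_info.json are left alone, so mapping-mode output is not modified.
    """
    index_info = json.loads((Path(index_dir) / "info.json").read_text())
    meta_path = Path(quant_dir) / "aux_info" / "meta_info.json"
    meta = json.loads(meta_path.read_text())

    written = {}
    for index_key, meta_key in key_map.items():
        if index_key in index_info and not meta.get(meta_key):
            meta[meta_key] = index_info[index_key]
            written[meta_key] = index_info[index_key]

    # Only rewrite the file when something actually changed.
    if written:
        meta_path.write_text(json.dumps(meta, indent=4) + "\n")
    return written
```

Calling the function once per sample directory would reproduce the manual edit. Keeping a backup of the original `meta_info.json` files is advisable, since this changes salmon's own output.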

@mikelove
Collaborator Author

This is a perfectly valid solution.

I’m not sure if the target file would have the same hash as the transcripts from the source. Can you check for your example? For now, our hash is sensitive to sequence order, for example.
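
To illustrate why sequence order matters, here is a minimal Python sketch that hashes the concatenated sequences of a FASTA file in file order. It is an illustration only: the function `sequence_digest` is hypothetical, and it is not claimed to reproduce the exact digest that salmon or compute_fasta_digest reports.

```python
import hashlib


def sequence_digest(fasta_text):
    """SHA-256 over all sequence lines, concatenated in file order.

    Header lines (starting with '>') and blank lines are skipped.
    """
    digest = hashlib.sha256()
    for line in fasta_text.splitlines():
        line = line.strip()
        if line and not line.startswith(">"):
            digest.update(line.encode("ascii"))
    return digest.hexdigest()


original = ">tx1\nACGT\n>tx2\nGGCC\n"
reordered = ">tx2\nGGCC\n>tx1\nACGT\n"

# Same transcripts, different order: the digests differ.
print(sequence_digest(original) == sequence_digest(reordered))  # False
```

Under this kind of scheme, a target file that lists the same transcripts in a different order would not match a pre-computed checksum.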

@MustafaElshani

MustafaElshani commented Aug 31, 2020

How would you go about getting the checksum of the source file? The file I passed to -t in salmon quant was downloaded from the GENCODE FTP; however, this does not write the hashes into meta_info.json. Does salmon need to build an index to retrieve the hash, or can it do so from the target file?

Kind Regards

@mikelove
Collaborator Author

You could run salmon index on the file and then look into the directory that is created to find the hash.

The lightweight version is to run compute_fasta_digest which can be installed with pip. This is what I use to compute reference transcriptome hashes.

https://github.com/COMBINE-lab/FastaDigest

@MustafaElshani

Apologies for the delay in replying. Yes, running salmon index on the same reference does indeed give the same hash, and so does the fasta digest. I'm just wondering how this could be automated for salmon's alignment-based mode.

Mustafa

@mikelove
Collaborator Author

mikelove commented Sep 4, 2020

Let me ask @rob-p, is it possible to have an option to index the target file as part of quant? Indexing is fairly fast, and then reads quantified with the alignment mode would also benefit from tximeta magic.

@mbergins

Note to users:

tximeta is updated with the Bioconductor release cycle (every 6 months). Older, non-release versions of tximeta cannot be modified (they are frozen once the release/devel cycle moves on). So if on your machine you are using a version of tximeta that is not the release version (or the devel version, hosted here), it will not be able to recognize the latest releases from various sources (GENCODE, Ensembl, etc.). You should update Bioconductor and tximeta to the latest version in order to match with the latest releases in this table:

https://bioconductor.org/packages/release/bioc/vignettes/tximeta/inst/doc/tximeta.html#pre-computed_checksums

Just a quick note, looks like the direct link to the Pre-computed checksums table isn't working because the P in pre-computed isn't capitalized. I think the corrected link should be:

https://bioconductor.org/packages/release/bioc/vignettes/tximeta/inst/doc/tximeta.html#Pre-computed_checksums

Feel free to delete this comment after editing the original post and thanks for all the work on tximeta.

@mikelove
Collaborator Author

Thanks @mbergins

@Ermela1

Ermela1 commented May 14, 2021

Hello,

I am running into a similar issue with tximeta. I think my issue is that I used v38 of GENCODE for the transcriptome and it's not part of the pre-computed checksums listed in the table. Is there a workaround for this until tximeta is updated?

Thank you,
Ermela

@mikelove
Collaborator Author

What is your

packageVersion("tximeta")

Your first place to check is the NEWS file.

If your version is older than the one in which it was added, then your local version of the package won't autodetect it:

https://github.com/mikelove/tximeta/blob/master/NEWS#L19-L21
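
When comparing an installed version against the one listed in NEWS, the components need to be compared as numbers, not as text. A small Python sketch of that comparison follows; `parse_version` is a hypothetical helper and the `needed` value is an illustrative placeholder, not the version in which any particular txome was added.

```python
def parse_version(version):
    """Turn '1.8.5' into (1, 8, 5) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


installed = "1.8.5"
needed = "1.10.0"  # illustrative placeholder, read the real value from NEWS

# Numeric comparison: 1.8.5 is older than 1.10.0.
print(parse_version(installed) < parse_version(needed))  # True
# Plain string comparison gets this wrong, because "8" sorts after "1".
print(installed < needed)  # False
```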

@Ermela1

Ermela1 commented May 14, 2021

My bad! I have an older version

package.version("tximeta")
[1] "1.8.5"

Thank you so much for your help! I will make sure to check the news next time.

@varunviswanath

varunviswanath commented Aug 17, 2021

Hi Mike,

After generating quantification data and attempting to run tximeta, I keep encountering this error.

did not find matching TxDb via 'AnnotationHub'
building TxDb with 'GenomicFeatures' package
Import genomic features from the file as a GRanges object ... trying URL 'ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_38/gencode.v38.annotation.gtf.gz'
Content type 'unknown' length 46556621 bytes (44.4 MB)

Error in download.file(resource(con), destfile) :
cannot open URL 'ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_38/gencode.v38.annotation.gtf.gz'
In addition: Warning messages:
1: In download.file(resource(con), destfile) :
downloaded length 11213312 != reported length 46556621
2: In download.file(resource(con), destfile) :
URL 'ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_38/gencode.v38.annotation.gtf.gz': Timeout of 60 seconds was reached

I'm wondering why there is no matching TxDb found, or why the download will not proceed after multiple attempts.

Thanks.

@mikelove
Collaborator Author

URL 'ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_38/gencode.v38.annotation.gtf.gz': Timeout of 60 seconds was reached

This means that your connection was too slow to download this resource from EBI. Is it possible for you to try again on a faster connection?

@pkerrwall

I'm getting the same error - "couldn't find matching transcriptome, returning non-ranged SummarizedExperiment"
Output of packageVersion("tximeta") = '1.10.0'
I'm using source="Ensembl", organism="Drosophila melanogaster", release="104", genome="BDGP6.32"
I tried the makeLinkedTxome discussed in the tximeta vignette with no luck
I am quantifying in alignment-based mode (using minimap2) and see that MustafaElshani figured out a workaround, but I don't understand what he did and can't implement it. I did create a salmon index and tried to supply it to makeLinkedTxome, but I am still getting the same error.

@mikelove
Collaborator Author

Hi @pkerrwall

If you see my comment from 9/4/2020:

#38 (comment)

...I don't think you have a seqhash in the quantification metadata files in alignment mode (there is no transcriptome to index, right?). Hence there is nothing to match on. I had a proposal, we can bring this up with @rob-p and see his thoughts. If one wanted to have the transcriptome hash be included in quant with alignment mode, one option would be to point salmon quant to a Salmon indexed txome, not used for quant but only for the metadata. Curious everyone's thoughts.

@pkerrwall

pkerrwall commented Aug 28, 2021

Thanks Mike for the quick reply. I'm not sure I understand your reply about the seqhash in the quantification metadata files. I guess we will wait for @rob-p's thoughts on this error.

To give a little background - I'm running into this error trying to run swish from the fishpond package (just following the swish vignette). I ran into an issue trying to run drimseq with the flybase gtf and ended up just providing my own gene to transcript mapping file and got it to work. Is there a way to manually provide the gene to transcript mapping file for swish? Is this the only metadata that swish needs?

I also tried the following (from the tximeta vignette):

# to load from local source
indexDir <- file.path('/home/shared/db/Dmel/ensembl/Drosophila_melanogaster.BDGP6.32.cdna.all.fa_salmon_index') # still generated a salmon index even though I'm not using this because I use alignment-based mode
fastaPath <- file.path('/home/shared/db/Dmel/ensembl/Drosophila_melanogaster.BDGP6.32.cdna.all.fa')
gtfPath <- file.path('/home/shared/db/Dmel/ensembl/Drosophila_melanogaster.BDGP6.32.104.gtf')
suppressPackageStartupMessages(library(tximeta))
makeLinkedTxome(indexDir=indexDir, source="Ensembl", organism="Drosophila melanogaster", release="104", genome="BDGP6.32", fasta=fastaPath, gtf=gtfPath, write=FALSE)

# Read in quants with tximeta
library(tximeta)
# both the following lines generate error "couldn't find matching transcriptome, returning non-ranged SummarizedExperiment"
# se <- tximeta(coldata) # not working
se <- tximeta(coldata, dropInfReps=TRUE, useHub=FALSE) # not working

@mikelove
Collaborator Author

Oh, if you just want to combine transcripts to gene, I believe you can do:

se <- tximeta(coldata, skipMeta=TRUE, txOut=FALSE, tx2gene=tx2gene)

You need the inferential replicates to run swish, so don't use dropInfReps.

Let me know how this goes. If it works (and I think it should) I should add this to the tximeta/swish vignettes.

@pkerrwall

That worked!

I'm now getting an error at
y <- scaleInfReps(y)
Error in infRepError(infRepIdx) : there are no inferential replicates in the assays of 'y'

This is the same error as #35 (comment)
Should I create an issue at https://github.com/mikelove/fishpond/issues for this issue? Before I do, I will read the vignette a little more closely regarding the inferential replicates as you directed the other person in that thread.

@mikelove
Collaborator Author

This means that you need to have run Salmon with Gibbs samples or bootstraps (a requirement for Swish).

For future questions feel free to post to Bioc support site and tag eg tximeta or fishpond (whichever is relevant or both).

I’ll add these details to the vignettes.
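
One way to confirm ahead of time that a salmon output directory contains inferential replicates is to look at its metadata. The following Python sketch is a helper built on assumptions, not part of salmon or fishpond: `has_inferential_replicates` is a hypothetical name, and it assumes that salmon records the replicate count under `num_bootstraps` in `aux_info/meta_info.json` and stores the samples in `aux_info/bootstrap/bootstraps.gz`.

```python
import json
from pathlib import Path


def has_inferential_replicates(quant_dir):
    """Return True if a salmon output directory appears to hold Gibbs or bootstrap samples."""
    aux = Path(quant_dir) / "aux_info"
    meta_path = aux / "meta_info.json"
    if not meta_path.is_file():
        return False
    meta = json.loads(meta_path.read_text())
    # Assumed key: number of Gibbs samples or bootstraps requested at quant time.
    n_reps = int(meta.get("num_bootstraps", 0))
    samples_file = aux / "bootstrap" / "bootstraps.gz"
    return n_reps > 0 and samples_file.is_file()
```

If this returns False for a sample, re-running salmon quant with Gibbs samples or bootstraps enabled would be the likely fix before calling scaleInfReps.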

@pkerrwall

Thanks for your help with this. Good to know about the Gibbs sampling and bootstrap options for salmon and that they are a requirement for swish. You might want to add some simple salmon examples at the beginning of the swish vignette that show how to do this. Thanks for all your hard work in this area!

@mikelove
Collaborator Author

We do have at the beginning, “Importantly, --numGibbsSamples 20 was used to generate 20 inferential replicates with Salmon’s Gibbs sampling procedure. Inferential replicates, either from Gibbs sampling or bootstrapping of reads, are required for the swish method shown below.”

but maybe this needs to be in Quick Start

@pkerrwall

Thanks for pointing that out - I guess I never made it that far down the page :) Yeah - having that message at the beginning of the quick start would be helpful for idiots like me :)

@mikelove
Collaborator Author

Thanks for the feedback @pkerrwall I've updated both vignettes to provide more information as discussed above.

@Pancreas-Pratik

Pancreas-Pratik commented Sep 16, 2022

Note to users:

tximeta is updated with the Bioconductor release cycle (every 6 months). Older, non-release versions of tximeta cannot be modified (they are frozen once the release/devel cycle moves on). So if on your machine you are using a version of tximeta that is not the release version (or the devel version, hosted here), it will not be able to recognize the latest releases from various sources (GENCODE, Ensembl, etc.).

First, check your version:

packageVersion("tximeta")

Then, you can examine in what version specific txomes are added here:

https://github.com/mikelove/tximeta/blob/master/NEWS

If you have a discrepancy, you should update Bioconductor and tximeta to the latest version in order to match with the latest releases in this table:

https://bioconductor.org/packages/release/bioc/vignettes/tximeta/inst/doc/tximeta.html#Pre-computed_checksums

Checking the NEWS: https://github.com/mikelove/tximeta/blob/master/NEWS
was just "the key" for me. I realized I had the Bioconductor release version 1.14 or so, I needed the newest 1.15 which has the mouse M30. Ahhh - mazing. Thank you Dr. Love!

EDIT1: just tried with devel, didn't work, but realized, that I may have downloaded the gtf and fa from different sources (ENSEMBL and GENCODE)... So I went through the tximeta code, and found these lines:

hashfile <- file.path(system.file("extdata",package="tximeta"),"hashtable.csv")
hashtable <- read.csv(hashfile,stringsAsFactors=FALSE)

I plugged them into RStudio and looked up M30 to see what the hashes were; the ones in the hashtable above did not match mine. It was sweet that the gtf and fa links from the EBI are included. Going to re-run the pipeline with those two files from the EBI!
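
The same check can be done outside R. Below is a minimal Python sketch that looks up a sample's transcriptome hash in a copy of hashtable.csv. The function name `find_txome` is hypothetical; the column name `sha256` and the metadata key `index_seq_hash` are assumptions about the file layouts, so adjust them to what your files actually contain.

```python
import csv
import json
from pathlib import Path


def find_txome(quant_dir, hashtable_csv, hash_column="sha256"):
    """Return the hashtable row matching this sample's index hash, or None."""
    meta_path = Path(quant_dir) / "aux_info" / "meta_info.json"
    meta = json.loads(meta_path.read_text())
    sample_hash = meta.get("index_seq_hash")  # assumed key name
    if not sample_hash:
        return None  # e.g. alignment-based mode: nothing to match on
    with open(hashtable_csv, newline="") as handle:
        for row in csv.DictReader(handle):
            if row.get(hash_column) == sample_hash:
                return row
    return None
```

A return value of None would be consistent with the "couldn't find matching transcriptome" message, either because the hash is missing from the sample metadata or because the installed hashtable does not list it.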

@MolyWang

MolyWang commented Jun 5, 2023

New to fishpond, getting the same message for a slightly different reason and looking for help here!
I feel the issue arises from file transfer: the .json files no longer have accurate paths to everything.

R version: 4.2.0
package.version("tximeta")
[1] "1.14.1"
package.version("fishpond")
[1] "2.2.0"

What I did:

  1. I uploaded sequencing files to the Niagara cluster of Compute Canada.
  2. Generated salmon index with decoy following the Salmon tutorial (https://combine-lab.github.io/alevin-tutorial/2019/selective-alignment/) with the command "salmon-1.9.0/bin/salmon index -t <Dmel_gentrome.fa.gz> -d <Dmel_decoys.txt> -p 12 -i <salmon_index_SAF DIR>"
  3. ran salmon 1.9.0 with "salmon quant -i <salmon_index_SAF DIR> -l A -1 <Paired reads 1> -2 <Paired reads 2> --useEM --mimicBT2 --validateMappings --numGibbsSamples 30 --gcBias -o salmon_out/IP_1_S1"
  4. I then downloaded the <salmon_index_SAF_DIR> and the salmon outputs <salmon_out> to my local machine and tried to analyze them with fishpond. (I downloaded them because I could not figure out how to do the analysis on the Niagara cluster. The salmon folder is complete; the folder for each sample contains aux_info, libParams, quant.sf, etc.)

Question: can I somehow let tximeta know the index folder that was used for salmon quantification? Or is there more to change for tximeta to work? (I've tried to update the "index":___ inside cmd_info.json, but that does not seem to help.)

Lots lots of thanks.

-------------- in case needed, here are the files and commands I used to generate the Salmon index -----
From flybase, genome file and transcriptome file from release 6.23
http://ftp.flybase.net/genomes/Drosophila_melanogaster/dmel_r6.23_FB2018_04/fasta/
1. dmel-all-chromosome-r6.23.fasta.gz
2. dmel-all-transcript-r6.23.fasta.gz

grep "^>" <(gunzip -c dmel-all-chromosome-r6.23.fasta.gz) | cut -d " " -f 1 > Dmel_decoys.txt
sed -i.bak -e 's/>//g' Dmel_decoys.txt
cat dmel-all-transcript-r6.23.fasta.gz dmel-all-chromosome-r6.23.fasta.gz > Dmel_gentrome.fa.gz
salmon-1.9.0/bin/salmon index -t <Dmel_gentrome.fa.gz> -d <Dmel_decoys.txt> -p 12 -i <salmon_index_SAF DIR>

@mikelove
Collaborator Author

mikelove commented Jun 5, 2023

Tximeta will "work" for your case in that it will generate an un-ranged SummarizedExperiment. As you are not using GENCODE, Ensembl or RefSeq, it won't automatically download the matching transcriptome metadata. The un-ranged SummarizedExperiment is all you need to continue analysis with fishpond, etc.

However, if you want to have tximeta populate genomic ranges on the SummarizedExperiment, we have developed tools for this. You obviously need to provide the ranges of the transcripts / genes, which would involve having a custom GTF file.

If you have this already, you can follow the linkedTxome instructions in the vignette to have tximeta populate the ranges, but again it's not necessary for using fishpond to have ranges metadata on the SummarizedExperiment.

Perhaps if you have follow-up questions you could post here:

https://support.bioconductor.org
