
Gtf file format error #26

Closed
yashsondhi opened this issue Sep 30, 2021 · 2 comments

@yashsondhi

yashsondhi commented Sep 30, 2021

Hi Zhang,
I am having issues with the GTF file format. I assume this is something I could fix by changing the index? I have attached the output of the cluster run, but I am not sure where I should edit this parameter.

ERROR: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
The specified gene identifier attribute is 'gene_id'
An example of attributes included in your GTF annotation is 'transcript_id "evm.model.chr1.34";'
The program has to terminate.

First few lines of the GTF file:

GWHABGR00000001 EVM transcript 15372 30018 . - . transcript_id "evm.model.chr1.1"; gene_id "evm.TU.chr1.1"
GWHABGR00000001 EVM exon 15372 15520 . - . transcript_id "evm.model.chr1.1"; gene_id "evm.TU.chr1.1";
GWHABGR00000001 EVM exon 16212 16351 . - . transcript_id "evm.model.chr1.1"; gene_id "evm.TU.chr1.1";
GWHABGR00000001 EVM exon 17501 17758 . - . transcript_id "evm.model.chr1.1"; gene_id "evm.TU.chr1.1";
GWHABGR00000001 EVM exon 18192 18405 . - . transcript_id "evm.model.chr1.1"; gene_id "evm.TU.chr1.1";
GWHABGR00000001 EVM exon 18529 18690 . - . transcript_id "evm.model.chr1.1"; gene_id "evm.TU.chr1.1";
GWHABGR00000001 EVM exon 20641 20838 . - . transcript_id "evm.model.chr1.1"; gene_id "evm.TU.chr1.1";
GWHABGR00000001 EVM exon 22769 22861 . - . transcript_id "evm.model.chr1.1"; gene_id "evm.TU.chr1.1";
GWHABGR00000001 EVM exon 23546 23685 . - . transcript_id "evm.model.chr1.1"; gene_id "evm.TU.chr1.1"
Attached log: serial_test_9441360.log

@zhxiaokang
Owner

Hi, as the error message explains, featureCounts expects to find the "gene_id" attribute in the 9th field, but in your GTF file that field only contains "transcript_id". Since you're counting genes, I suggest simply removing the transcript part. The following command should do the job:

cut -d" " -f1-8,11-12 old.gtf > new.gtf
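The `cut` command above writes space-separated fields, while the GTF format separates its nine columns with tabs. As an alternative, here is a minimal sketch that keeps the output tab-delimited and moves gene_id to the front of the 9th (attribute) column instead of dropping transcript_id. The function name `fix_gtf` is hypothetical and not part of featureCounts or this pipeline. It assumes fields 1 to 8 contain no spaces and that attributes follow the `key "value";` pattern shown in the lines above.

```shell
#!/bin/sh
# Hypothetical helper (not part of featureCounts or this pipeline):
# reads a GTF on stdin and writes each line as 8 tab-separated fields
# plus one attribute column that begins with gene_id.
# Assumes fields 1-8 contain no spaces and attributes look like: key "value";
fix_gtf() {
  awk '
  BEGIN { OFS = "\t" }
  /^#/ { print; next }                       # keep comment lines unchanged
  {
    # Default field splitting treats runs of spaces/tabs as one separator,
    # so everything from field 9 on is the (possibly split) attribute column.
    attrs = ""
    for (i = 9; i <= NF; i++) attrs = attrs (i > 9 ? " " : "") $i
    if (!match(attrs, /gene_id "[^"]*";?/)) {
      print "warning: no gene_id on line " NR > "/dev/stderr"
      print                                  # pass the line through unchanged
      next
    }
    gene = substr(attrs, RSTART, RLENGTH)
    if (gene !~ /;$/) gene = gene ";"         # GTF attributes end with ";"
    # Remaining attributes (e.g. transcript_id), tidied up.
    rest = substr(attrs, 1, RSTART - 1) substr(attrs, RSTART + RLENGTH)
    gsub(/[ \t]+/, " ", rest)
    sub(/^ /, "", rest); sub(/ $/, "", rest)
    if (rest != "" && rest !~ /;$/) rest = rest ";"
    attr_col = (rest == "") ? gene : gene " " rest
    print $1, $2, $3, $4, $5, $6, $7, $8, attr_col
  }'
}
```

Usage, with the same file names as the command above: `fix_gtf < old.gtf > new.gtf`. Keeping transcript_id is a design choice; with its default `-g gene_id` setting, featureCounts only looks up gene_id, so dropping the transcript part as suggested above works too. Note that the sketch collapses runs of whitespace inside the attribute column, which is harmless for values like the ones shown here.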

@yashsondhi
Author

yashsondhi commented Sep 30, 2021 via email
