Open
Description
readgff tries to read the sequences in the ##FASTA section as DNA sequences, so it fails on GFF files that contain protein sequences.
Input
GFF file containing a protein sequence downloaded from the ELM database: http://elm.eu.org/downloads.html
Link to the file: http://elm.eu.org/instances.gff?q=SRC_HUMAN
##gff-version 3
P12931 ELM sequence_feature 530 534 . . . ID=LIG_SH2_SFK_CTail_3
P12931 ELM sequence_feature 252 259 . . . ID=LIG_SH3_4
P12931 ELM sequence_feature 72 78 . . . ID=MOD_CDK_SPxK_1
P12931 ELM sequence_feature 1 7 . . . ID=MOD_NMyristoyl
P12931 ELM sequence_feature 526 534 . . . ID=MOD_TYR_CSK
##FASTA
>P12931
MGSNKSKPKDASQRRRSLEPAENVHGAGGGAFPASQTPSKPASADGHRGPSAAFAPAAAEPKLFGGFNSSDTVTSPQRAGPLAGGVTTFVALYDYESRTETDLSFKKGERLQIVNNTEGDWWLAHSLSTGQTGYIPSNYVAPSDSIQAEEWYFGKITRRESERLLLNAENPRGTFLVRESETTKGAYCLSVSDFDNAKGLNVKHYKIRKLDSGGFYITSRTQFNSLQQLVAYYSKHADGLCHRLTTVCPTSKPQTQGLAKDAWEIPRESLRLEVKLGQGCFGEVWMGTWNGTTRVAIKTLKPGTMSPEAFLQEAQVMKKLRHEKLVQLYAVVSEEPIYIVTEYMSKGSLLDFLKGETGKYLRLPQLVDMAAQIASGMAYVERMNYVHRDLRAANILVGENLVCKVADFGLARLIEDNEYTARQGAKFPIKWTAPEAALYGRFTIKSDVWSFGILLTELTTKGRVPYPGMVNREVLDQVERGYRMPCPPECPESLHDLMCQCWRKEPEERPTFEYLQAFLEDYFTSTEPQYQPGENL
Output
julia> p = readgff("elm_instances.gff")
MGSNKSKPKDASQRRRSLEPAENVHGAGGGAFPASQTPSKPASADGHRGPSAAFAPAAAEPKLFGGFNSSDTVTSPQRAGPLAGGVTTFVALYDYESRTETDLSFKKGERLQIVNNTEGDWWLAHSLSTGQTGYIPSNYVAPSDSIQAEEWYFGKITRRESERLLLNAENPRGTFLVRESETTKGAYCLSVSDFDNAKGLNVKHYKIRKLDSGGFYITSRTQFNSLQQLVAYYSKHADGLCHRLTTVCPTSKPQTQGLAKDAWEIPRESLRLEVKLGQGCFGEVWMGTWNGTTRVAIKTLKPGTMSPEAFLQEAQVMKKLRHEKLVQLYAVVSEEPIYIVTEYMSKGSLLDFLKGETGKYLRLPQLVDMAAQIASGMAYVERMNYVHRDLRAANILVGENLVCKVADFGLARLIEDNEYTARQGAKFPIKWTAPEAALYGRFTIKSDVWSFGILLTELTTKGRVPYPGMVNREVLDQVERGYRMPCPPECPESLHDLMCQCWRKEPEERPTFEYLQAFLEDYFTSTEPQYQPGENL
ERROR: Cannot encode byte 0x50 (char 'P') at index 8 to BioSequences.DNAAlphabet{4}()
Stacktrace:
[1] error(s::String)
@ Base .\error.jl:35
[2] throw_encode_error(A::BioSequences.DNAAlphabet{4}, src::Base.CodeUnits{UInt8, String}, soff::Int64)
@ BioSequences C:\Users\dz272503\.julia\packages\BioSequences\Mf23T\src\longsequences\copying.jl:164
[3] encode_chunk
@ C:\Users\dz272503\.julia\packages\BioSequences\Mf23T\src\longsequences\copying.jl:178 [inlined]
[4] encode_chunks!(dst::BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}, startindex::Int64, src::Base.CodeUnits{UInt8, String}, soff::Int64, N::Int64)
@ BioSequences C:\Users\dz272503\.julia\packages\BioSequences\Mf23T\src\longsequences\copying.jl:189
[5] LongSequence
@ C:\Users\dz272503\.julia\packages\BioSequences\Mf23T\src\longsequences\constructors.jl:97 [inlined]
[6] LongSequence
@ C:\Users\dz272503\.julia\packages\BioSequences\Mf23T\src\longsequences\constructors.jl:85 [inlined]
[7] parsechromosome!(input::TranscodingStreams.NoopStream{IOStream}, record::GenomicAnnotations.Record{Gene})
@ GenomicAnnotations.GFF C:\Users\dz272503\.julia\packages\GenomicAnnotations\37yeV\src\GFF\reader.jl:133
[8] tryread!
@ C:\Users\dz272503\.julia\packages\GenomicAnnotations\37yeV\src\GFF\reader.jl:49 [inlined]
[9] iterate(reader::GenomicAnnotations.GFF.Reader{TranscodingStreams.NoopStream{IOStream}}, nextone::GenomicAnnotations.Record{Gene})
@ GenomicAnnotations.GFF C:\Users\dz272503\.julia\packages\GenomicAnnotations\37yeV\src\GFF\reader.jl:42
[10] _collect(cont::UnitRange{Int64}, itr::GenomicAnnotations.GFF.Reader{TranscodingStreams.NoopStream{IOStream}}, ::Base.HasEltype, isz::Base.SizeUnknown)
@ Base .\array.jl:727
[11] collect
@ .\array.jl:716 [inlined]
[12] readgff(input::String)
@ GenomicAnnotations C:\Users\dz272503\.julia\packages\GenomicAnnotations\37yeV\src\utils.jl:87
[13] top-level scope
@ REPL[5]:1
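The index in the error message can be reproduced outside of readgff. The following is a minimal sketch, assuming only that BioSequences is installed; the short string is the first ten residues of the sequence above, and the variable names are illustrative. M, G, S, N and K all happen to be valid IUPAC nucleotide codes, so DNA encoding only fails when it reaches P at index 8. The amino acid constructor is shown to illustrate that the same text encodes as protein; it is not meant as the fix GenomicAnnotations should adopt.

```julia
using BioSequences

# First ten residues of P12931, copied from the ##FASTA section of the input.
seq = "MGSNKSKPKD"

# M, G, S, N and K are valid IUPAC nucleotide codes, so DNA encoding
# only throws once it reaches 'P' (the eighth character).
dna_failed = try
    LongSequence{DNAAlphabet{4}}(seq)
    false
catch err
    println("DNA encoding failed: ", sprint(showerror, err))
    true
end

# The same string is accepted by the amino acid alphabet.
aa = LongSequence{AminoAcidAlphabet}(seq)
println(aa)
```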
Version
julia> versioninfo()
Julia Version 1.11.3
Commit d63adeda50 (2025-01-21 19:42 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 12 × 13th Gen Intel(R) Core(TM) i7-1365U
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, goldmont)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores)
(Downloads) pkg> st
Status `C:\Users\dz272503\Downloads\Project.toml`
[4f8a0a0a] GenomicAnnotations v0.4.5