read_seqs() function not working for me #171

Rikkiff · 2023-12-09T00:51:52Z

I have tried to create a seq track using read_seqs() with either my .fna or my .gff3 file as the argument, but in both cases I get an empty tibble. Meanwhile, read_gff3() works fine with my .gff3 file.

iimog · 2023-12-14T21:31:50Z

This is strange. Can you confirm that this returns a tibble with 6 rows:

read_seqs(ex("emales/emales.fna"))

If so, can you share your fasta file so I can have a look what's going wrong?

dmckeow · 2023-12-14T22:47:55Z

I am having the same issue with the emales example data. Reading in GFF information works, but the tibble returned by read_seqs() has no information in it:

read_seqs(ex("emales/emales.fna"))

Reading 'fasta' with read_seq_len():

* file_id: emales [C:/Users/Dean Mckeown/AppData/Local/R/win-library/4.2/gggenomes/extdata/emales/emales.fna]
# A tibble: 0 × 4
# ℹ 4 variables: file_id, seq_id, seq_desc, length

Rikkiff · 2023-12-15T10:33:51Z

Dear Markus,

Thank you so much for your reply. When I run read_seqs(ex("emales/emales.fna")), the result is an empty tibble with four columns. The same thing happens when I run the function on my own fasta file. Instead, I managed to create a sequence track using read_fai(myfile.fasta.fai). I have attached my fasta file.

Best,
Rikki

On Thu, Dec 14, 2023 at 10:32 PM Markus J. Ankenbrand <***@***.***> wrote: This is strange. Can you confirm that this returns a tibble with 6 rows: read_seqs(ex("emales/emales.fna")) If so, can you share your fasta file so I can have a look what's going wrong?

iimog · 2023-12-15T11:01:47Z

Thank you for checking. I can reproduce the problem on my Windows machine. On Linux it works as expected. My first guess, line endings, does not seem to cause the issue. I'll dig into it.

iimog · 2023-12-15T15:03:29Z

Internally, gggenomes processes sequences from fasta files via the Perl script exec/seq-len. The problem is that Perl is not available on Windows by default. I'm not sure whether it would work even if Perl were available, because the way the script is invoked might not work on Windows at all. So the problem is not related to any specific fasta file. I don't see an easy way to make the Perl script work across platforms. It is probably easier to implement this functionality in R or with an R dependency (e.g. seqinr). What is your opinion @thackl ?
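For illustration, the R-only approach suggested above could look roughly like the following base-R sketch. The function name read_seq_len_r is hypothetical, and this is not the code that was eventually added to gggenomes. It assumes a plain, uncompressed FASTA file where the sequence ID is the first word of each header line, and it loads the whole file into memory, so very large assemblies would need a streaming approach.

```r
# Hypothetical base-R sketch: per-record sequence lengths from a FASTA file,
# without calling Perl. Not gggenomes' actual implementation.
read_seq_len_r <- function(path) {
  # readLines() accepts LF, CRLF and CR as line endings
  lines <- readLines(path, warn = FALSE)
  lines <- trimws(lines, which = "right")  # drop trailing whitespace
  lines <- lines[nzchar(lines)]            # skip empty lines
  is_header <- startsWith(lines, ">")
  # Record index of every line: increases by one at each header line
  rec <- cumsum(is_header)
  headers <- sub("^>", "", lines[is_header])
  # seq_id = first word of the header; seq_desc = the rest (NA if absent)
  seq_id <- sub("\\s.*$", "", headers)
  seq_desc <- ifelse(grepl("\\s", headers),
                     sub("^\\S+\\s+", "", headers), NA_character_)
  # Sum the characters of all sequence lines belonging to each record
  is_seq <- !is_header & rec > 0
  lens <- tapply(nchar(lines[is_seq]),
                 factor(rec[is_seq], levels = seq_along(headers)), sum)
  lens[is.na(lens)] <- 0L  # records that have no sequence lines
  data.frame(seq_id = seq_id, seq_desc = seq_desc,
             length = as.integer(lens), stringsAsFactors = FALSE)
}

# Usage on a small temporary FASTA file
tmp <- tempfile(fileext = ".fna")
writeLines(c(">seqA desc one", "ACGT", "AC", ">seqB", "GGGG"), tmp)
read_seq_len_r(tmp)  # two rows: seqA with length 6, seqB with length 4
```

Because readLines() handles the line endings itself, this avoids both the Perl dependency and the platform-specific way the external script is invoked.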

dmckeow · 2023-12-15T15:45:52Z

I think that I found a way around it.
Instead of read_fasta, I used read_fai on the .fai index of the fasta file. That gives me the required tibble, and I could generate the visualisation. One column is missing, the "file_id", but that could easily be added.
I had to use the fasta index that I downloaded manually, as it does not seem to be available via ex():

read_fai("C:/Users/Dean Mckeown/Downloads/emales/emales/emales.fna.seqkit.fai")

# A tibble: 33 × 3
   seq_id      seq_desc                                 length
 1 BVI_023A    emale_type=EMALE05 is_typespecies=FALSE   19600
 2 Cflag_131   emale_type=EMALE03 is_typespecies=FALSE   32544
 3 RCC970_025  emale_type=EMALE05 is_typespecies=FALSE   20006
 4 RCC970_122  emale_type=EMALE04 is_typespecies=FALSE    5473
 5 BVI_055A    emale_type=EMALE02 is_typespecies=FALSE   23989
 6 BVI_055B    emale_type=EMALE04 is_typespecies=TRUE    19849
 7 Cflag_215   emale_type=EMALE04 is_typespecies=FALSE   12202
 8 RCC970_016A emale_type=EMALE03 is_typespecies=TRUE    19438
 9 RCC970_016B emale_type=EMALE01 is_typespecies=FALSE   20152
10 E4-10_053   emale_type=EMALE05 is_typespecies=FALSE   19840
# ℹ 23 more rows
# ℹ Use print(n = ...) to see more rows
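The missing file_id column can be added in base R. This is a minimal sketch that assumes the read_fai() result is a data frame or tibble with seq_id, seq_desc and length columns; here a two-row stand-in is built by hand from the output above. It also assumes, hypothetically, that file_id should be the file name up to its first dot, which matches the "emales" label that read_seqs() printed earlier.

```r
# Stand-in for the read_fai(...) output: first two rows of the tibble above
fai <- data.frame(
  seq_id   = c("BVI_023A", "Cflag_131"),
  seq_desc = c("emale_type=EMALE05 is_typespecies=FALSE",
               "emale_type=EMALE03 is_typespecies=FALSE"),
  length   = c(19600L, 32544L),
  stringsAsFactors = FALSE
)

# Hypothetical rule: file_id = file name up to its first dot
fai_path <- "emales.fna.seqkit.fai"
file_id <- sub("\\..*$", "", basename(fai_path))  # "emales"

# Prepend file_id so the column order matches read_seqs():
# file_id, seq_id, seq_desc, length
seqs <- data.frame(file_id = file_id, fai, stringsAsFactors = FALSE)
seqs
```

In practice, fai would come from the read_fai() call shown above rather than being typed in by hand.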

Rikkiff · 2023-12-15T15:49:57Z

See my earlier post for the same solution :)

iimog · 2023-12-18T07:32:33Z

Thank you, @Rikkiff and @dmckeow, for documenting your workarounds. I still hope to fix the read_seqs function on Windows or at least issue a warning rather than just returning an empty tibble.

iimog · 2024-07-04T06:38:16Z

read_seqs is now implemented in R in the latest release (which is also available on CRAN 🎉), so this should no longer be an issue.

iimog closed this as completed Jul 4, 2024
