Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introducing sequence breaks? #161

Closed
trnpl opened this issue Oct 12, 2023 · 8 comments
Closed

Introducing sequence breaks? #161

trnpl opened this issue Oct 12, 2023 · 8 comments
Assignees
Labels
enhancement New feature or request

Comments

@trnpl
Copy link

trnpl commented Oct 12, 2023

Hi! First, thanks for this excellent package -- it really generates beautiful plots and I appreciate that it works with eukaryotic datasets as well.

I am using this to generate some microsynteny plots for a locus I am working on. One of the odd features of this locus is that there is about ~10-15kb of non-coding region upstream of my GOI, which I would like to 'trim' so that I can make the useful information bigger. I am imagining a sequence break of at least 5kb or so in the middle of the 'contigs' that I show (see boxed parts of the screenshot below). I browsed the issues page and it seems like you were working on adding some tools to do this but I'm not sure if this particular use case has been implemented. I can likely hack something once I export the image as an svg but looking to hardcode as much of this as possible.
Screen Shot 2023-10-12 at 10 52 29 AM

Thanks for your help!!

@thackl
Copy link
Owner

thackl commented Oct 12, 2023

Interesting case. Do you have some (dummy) data for a reproducible example that I can play with?

@trnpl
Copy link
Author

trnpl commented Oct 12, 2023

Here is a file with info on the sequences and genes. Don't think links are really necessary here for this particular case.

seqsandgenes.zip

@thackl thackl self-assigned this Oct 12, 2023
@thackl thackl added the enhancement New feature or request label Oct 12, 2023
@thackl
Copy link
Owner

thackl commented Oct 12, 2023

How about something like this.

library(gggenomes)

s0 <- read_csv("seqs.csv")
g0 <- read_csv("genes.csv")
g0

# full loci
p0 <- gggenomes(g0, s0) +
  geom_seq() +
  geom_gene(aes(fill=str_sub(feature_id, 1,3)))

# magic zoom in on "default" which mean all genes
p1 <- p0 |> focus(.expand = 1e3) 
p1

# decorate sequence breaks
geom_break <- function(mapping_start = NULL, mapping_end = NULL,
    data_start = seqs(start > 1), data_end=seqs(end < length),
    label="//", size=4, hjust=.75, family="bold", stat="identity",
    na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, ...){

  aes_start <- aes(x=x, y=y)
  aes_start <- gggenomes:::aes_intersect(mapping_start, aes_start)
  
  aes_end <- aes(x=xend, y=y)
  aes_end <- gggenomes:::aes_intersect(mapping_end, aes_end)

  list(
    geom_text(aes_start, data=data_start, label=label, size=size, hjust=hjust,
      family=family, ...),
    geom_text(aes_end, data=data_end, label=label, size=size, hjust=1-hjust,
      family=family, ...)
  )
}

p2 <- p1 + geom_break(label="/")
p2

library(patchwork)
p0 + p1 + p2 + plot_layout(guides="collect") & theme(legend.position = "bottom")
ggsave("break-at-long-non-coding.png", width=10, height=5)

break-at-long-non-coding

@thackl
Copy link
Owner

thackl commented Oct 12, 2023

oh, and in case you want to flip things :)

p3 <- p0 |> 
  focus(.expand=1e3) |> 
  flip(where(~any(str_sub(.x$feature_id,1,3) == "hea" & .x$strand == "-")), .bin_track = genes)
p3

image

@trnpl
Copy link
Author

trnpl commented Oct 13, 2023

Thanks! This I think is workable for my purposes :)

@trnpl trnpl closed this as completed Oct 13, 2023
@trnpl
Copy link
Author

trnpl commented Oct 13, 2023

Wait -- one more question actually. Is there a way to print the amount of nucleotides excluded in the breaks?

@trnpl trnpl reopened this Oct 13, 2023
@thackl
Copy link
Owner

thackl commented Oct 14, 2023

Two options:

  1. locate() does the same as focus, but instead of cutting seqs in the end, it adds a track with the coordinates of the things that focus would cut out. From that table, you can get exact coordinates an compute what was excluded (see code below)
p4 <- p0 |> 
  locate(.expand = 1e3) +
  geom_feat(position = position_nudge(y=-.3))
p4$data$feats$loci

# A tibble: 30 × 15
       y     x  xend bin_id   seq_id start   end     i locus_score locus_id locus_length
   <int> <dbl> <dbl> <chr>    <chr>  <dbl> <dbl> <int>       <int> <glue>          <dbl>
 1    13     0 21286 Dano_sm… Dano_…     1 21286     1           7 Dano_sm…        22101
 2    13 37943 53399 Dano_sm… Dano_… 37944 53399     2           3 Dano_sm…        15456
 3    14     0 20276 Dpal_sm… Dpal_…     1 20276     1           7 Dpal_sm…        21108
 4    14 35778 51267 Dpal_sm… Dpal_… 35779 51267     2           3 Dpal_sm…        15489
 5    12   323 17521 Dpan_sm… Dpan_…   324 17521     1           3 Dpan_sm…        17198
 6    12 33051 54735 Dpan_sm… Dpan_… 33052 54735     2           6 Dpan_sm…        22462
 7    15     0 24170 dana_sm… dana_…     1 24170     1           8 dana_sm…        24985
 8    15 38128 53604 dana_sm… dana_… 38129 53604     2           3 dana_sm…        15476
 9    11     0 16792 datr_sm… datr_…     1 16792     1           5 datr_sm…        17663
10    11 29803 45264 datr_sm… datr_… 29804 45264     2           3 datr_sm…        15461

image

  1. You can manually set the coordinates to focus() on via focus(.loci=<tibble(seq_id, start, end)>). With that you would know the sizes of gaps ahead of time.
# first 0-23k and 35-55k for first two seqs
loci <- tribble(
  ~seq_id, ~start, ~end,
  "dana_smallsynteny", 1, 23e3,
  "dana_smallsynteny", 35e3,55e3,
  "Dano_smallsynteny", 1, 23e3,
  "Dano_smallsynteny", 35e3,55e3
)

p5 <- p0 |> focus(.loci=loci) + geom_seq_label()
p5

image

@thackl thackl closed this as completed Oct 14, 2023
@trnpl
Copy link
Author

trnpl commented Oct 16, 2023

Extremely helpful, thank you so much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants