order of the genome names #26

Closed
ramadatta opened this issue Jul 2, 2020 · 5 comments

Comments

ramadatta commented Jul 2, 2020

Hi,

Thanks for the wonderful package.

I am trying to align the genomes on a specific gene using make_alignment_dummies, but the genomes in the final figure are sorted by name. As a result, some similar genomes appear together at the top, while other genomes with the same gene content end up at the bottom just because their names sort later.

Is there a way to sort the genomes by gene content? Alternatively, is there a way to make ggplot plot the data in exactly the order in which it is passed via dummies and example_genes, if I have already sorted them?

Thanks much!

wilkox (Owner) commented Jul 4, 2020

Hi ramadatta, you can set the order of the genomes (or indeed of any categorical variable mapped to an axis) by making that variable a factor whose levels are in the order you want.
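A minimal base-R sketch of that suggestion, using a hypothetical vector of genome names: factor() sorts its levels alphabetically unless levels are supplied, and ggplot2 uses the level order for discrete axes and facet panels.

```r
# Hypothetical genome names, in no particular order
genomes <- c("Genome3", "Genome1", "Genome2")

# Without explicit levels, factor() sorts the levels alphabetically
levels(factor(genomes))
#> [1] "Genome1" "Genome2" "Genome3"

# Explicit levels fix the order; sorting the factor then follows that order
genomes_f <- factor(genomes, levels = c("Genome2", "Genome3", "Genome1"))
as.character(sort(genomes_f))
#> [1] "Genome2" "Genome3" "Genome1"
```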

If this doesn't work or you need some more help, could you post the code you already have as a reprex?

ramadatta (Author) commented Jul 6, 2020

Thanks very much, @wilkox. I will get back to you on this soon.

ramadatta (Author) commented Jul 7, 2020

library(ggplot2)
library(gggenes)

dummies <- make_alignment_dummies(
  example_genes,
  aes(xmin = start, xmax = end, y = molecule, id = gene),
  on = "genE"
)

ggplot(example_genes, aes(xmin = start, xmax = end, y = factor(molecule), fill = gene)) +
  geom_gene_arrow() +
  geom_blank(data = dummies) +
  facet_wrap(~ molecule, scales = "free", ncol = 1) +
  scale_fill_brewer(palette = "Set3") +
  theme_genes()

# Creating a presence/absence matrix for example genes

PA_matrix <- as.data.frame(with(example_genes, table(molecule, gene)) > 0L) +0L
PA_matrix
#>         genA genB genC genD genE genF protA protB protC protD protE protF
#> Genome1    1    1    1    1    1    1     0     0     1     1     1     1
#> Genome2    1    1    1    1    1    1     1     1     0     0     0     0
#> Genome3    1    1    1    1    1    1     1     1     0     0     0     0
#> Genome4    1    1    1    1    1    1     0     0     0     0     0     0
#> Genome5    1    1    1    1    1    1     0     0     1     1     1     1
#> Genome6    1    1    1    1    1    1     1     1     0     0     0     0
#> Genome7    0    1    1    1    1    1     1     1     1     1     1     1
#> Genome8    0    1    1    1    1    1     1     1     1     1     1     1

# Sorting the presence/absence matrix for example genes

sorted_PA_matrix <- PA_matrix[do.call(order,as.data.frame(PA_matrix)),]
sorted_PA_matrix
#>         genA genB genC genD genE genF protA protB protC protD protE protF
#> Genome7    0    1    1    1    1    1     1     1     1     1     1     1
#> Genome8    0    1    1    1    1    1     1     1     1     1     1     1
#> Genome4    1    1    1    1    1    1     0     0     0     0     0     0
#> Genome1    1    1    1    1    1    1     0     0     1     1     1     1
#> Genome5    1    1    1    1    1    1     0     0     1     1     1     1
#> Genome2    1    1    1    1    1    1     1     1     0     0     0     0
#> Genome3    1    1    1    1    1    1     1     1     0     0     0     0
#> Genome6    1    1    1    1    1    1     1     1     0     0     0     0

sorted_genomes <- row.names(sorted_PA_matrix)
sorted_genomes
#> [1] "Genome7" "Genome8" "Genome4" "Genome1" "Genome5" "Genome2" "Genome3"
#> [8] "Genome6"

# Create sorted_dummies and sorted_example_genes, whose row order the final figure should reflect
sorted_dummies <- dummies[order(unlist(sapply(dummies$molecule, function(x) which(sorted_genomes == x)))),]
sorted_example_genes <- example_genes[order(unlist(sapply(example_genes$molecule, function(x) which(sorted_genomes == x)))),]

#head(example_genes)
#head(sorted_example_genes)

ggplot(sorted_example_genes, aes(xmin = start, xmax = end, y = factor(molecule), fill = gene)) +
  geom_gene_arrow() +
  geom_blank(data = sorted_dummies) +
  facet_wrap(~ molecule, scales = "free", ncol = 1) +
  scale_fill_brewer(palette = "Set3") +
  theme_genes()

Hi @wilkox,

Thank you very much for sending the reprex link. It was useful.

I need the final figure above to show the genomes in the same order as the rows of sorted_PA_matrix. Passing sorted_example_genes and sorted_dummies and using factor(molecule) did not produce the intended order. Could you let me know if I am missing something here? Many thanks in advance.
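Why factor(molecule) alone does not help: factor() builds its levels from the sorted unique values, so the row order of sorted_example_genes is discarded. A small sketch with a hypothetical vector already in the desired order (the first four entries of sorted_genomes above) shows the difference; passing levels = unique(...) keeps the order of first appearance instead.

```r
# Rows already sorted into the desired genome order
molecule <- c("Genome7", "Genome8", "Genome4", "Genome1")

# factor() ignores the row order and sorts the levels alphabetically
levels(factor(molecule))
#> [1] "Genome1" "Genome4" "Genome7" "Genome8"

# unique() keeps the order of first appearance, so using it as the levels
# preserves the sorted row order
levels(factor(molecule, levels = unique(molecule)))
#> [1] "Genome7" "Genome8" "Genome4" "Genome1"
```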

wilkox (Owner) commented Jul 8, 2020

library(ggplot2)
library(gggenes)

dummies <- make_alignment_dummies(
  example_genes,
  aes(xmin = start, xmax = end, y = molecule, id = gene),
  on = "genE"
)

# Creating a presence/absence matrix for example genes
PA_matrix <- as.data.frame(with(example_genes, table(molecule, gene)) > 0L) +0L

# Sorting the presence/absence matrix for example genes
sorted_PA_matrix <- PA_matrix[do.call(order,as.data.frame(PA_matrix)),]

sorted_genomes <- row.names(sorted_PA_matrix)

# Create sorted_dummies and sorted_example_genes, whose row order the final figure should reflect
sorted_dummies <- dummies[order(unlist(sapply(dummies$molecule, function(x) which(sorted_genomes == x)))),]
sorted_example_genes <- example_genes[order(unlist(sapply(example_genes$molecule, function(x) which(sorted_genomes == x)))),]

# Convert molecule to a factor whose levels follow the sorted row order
# (factor() on its own would sort the levels alphabetically)
sorted_example_genes$molecule <- factor(sorted_example_genes$molecule, levels = unique(sorted_example_genes$molecule))
sorted_dummies$molecule <- factor(sorted_dummies$molecule, levels = unique(sorted_dummies$molecule))

ggplot(sorted_example_genes, aes(xmin = start, xmax = end, y = factor(molecule), fill = gene)) +
  geom_gene_arrow() +
  geom_blank(data = sorted_dummies) +
  facet_wrap(~ molecule, scales = "free", ncol = 1) +
  scale_fill_brewer(palette = "Set3") +
  theme_genes()

Created on 2020-07-08 by the reprex package (v0.3.0.9001)
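The fix works because the factor levels, not the row order, decide the panel order. A minimal base-R sketch of the same pipeline on hypothetical toy data (not gggenes' example_genes): do.call(order, ...) sorts the presence/absence table lexicographically by its columns, and the resulting genome names can be passed straight to factor(levels = ...), so reordering the rows is optional. Where a row reorder is still wanted, match() can stand in for the sapply()/which() construction.

```r
# Hypothetical toy data standing in for example_genes (molecule = genome)
genes <- data.frame(
  molecule = c("Genome1", "Genome1", "Genome2", "Genome3", "Genome3"),
  gene     = c("genA", "protC", "genA", "genB", "protC"),
  stringsAsFactors = FALSE
)

# 0/1 presence/absence data frame: one row per genome, one column per gene.
# as.data.frame.matrix() keeps the wide layout
pa <- as.data.frame.matrix(with(genes, table(molecule, gene)) > 0) + 0L

# do.call() passes each column as a separate argument, i.e.
# order(pa$genA, pa$genB, pa$protC): sort by genA, break ties with genB, ...
sorted_genomes <- rownames(pa)[do.call(order, pa)]

# For plotting, only the level order matters; the rows can stay as they are
genes$molecule <- factor(genes$molecule, levels = sorted_genomes)

# If sorted rows are wanted anyway, match() gives each row's position in
# sorted_genomes, replacing the sapply()/which() construction
genes_sorted <- genes[order(match(genes$molecule, sorted_genomes)), ]
```

Since order() on a factor sorts by level order, genes[order(genes$molecule), ] would select the same rows here once molecule has been converted.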

ramadatta (Author) commented

Thank you so much for this, @wilkox. This is what I needed.
