circlepack layout struggles with large number of nodes #345

Closed
jhjlee opened this issue Jun 13, 2023 · 5 comments

@jhjlee

jhjlee commented Jun 13, 2023

Hi Thomas,

Thanks for a great tool!
I have a question about the circlepack layout, which I'm trying to use on data with a single level of hierarchy (hundreds of groups, about 15 of which contain 100 or more nodes). When I try to plot, it runs for a while and eventually throws an "enc3 error", which I believe is related to the size of the enclosing circle?
Is there an upper limit to the number of nodes that can be included in a circlepack plot? Thank you very much.

@schochastics
Contributor

Can you post a reproducible example? That would make it easier to check what is going on.

@dn-ra

dn-ra commented Jul 25, 2023

Hi David,

I'm working with @jhjlee on this data.

Here is a reproducible example that approximates the data we have.

Unfortunately it isn't consistently reproducible. I can run it maybe 5 times, and twice it will execute properly, once it will produce an empty plot, and twice it will result in the problem Hyun Jae describes above. It will hang for a very long time and then produce an enc3 error, or a nonsensical plot. I haven't been able to get it to spit out the enc3 error with this toy data yet, but the hanging will happen after a few tries.

I note that it's normal for it to take some time to actually print out the plot, but creating the plot with ggraph(graph, layout = 'circlepack', weight = size) + geom_node_circle(aes(fill = label, color = factor(depth))) is where the issue comes from.
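
To check that the layout computation, rather than the drawing, is the slow or failing step, the layout can be computed on its own with `create_layout()` from ggraph. The following is only a minimal sketch on a small made-up tree (the names `toy_graph`, `edges`, `nodes` and `lay` are invented for illustration and are not part of the data above); with the real graph, the same call would be timed in the same way.

```r
library(igraph)
library(ggraph)

# Toy two-level tree: a root, three groups, four leaves per group.
groups <- paste0("g", 1:3)
leaves <- paste0("leaf", 1:12)
edges <- rbind(
  data.frame(from = "root", to = groups),
  data.frame(from = rep(groups, each = 4), to = leaves)
)
# Leaves get size 1; the root and the groups get size 0, as in the example data.
nodes <- data.frame(
  name = c("root", groups, leaves),
  size = c(0, rep(0, 3), rep(1, 12))
)
toy_graph <- graph_from_data_frame(edges, vertices = nodes)

# Compute only the circlepack layout (nothing is drawn) and time it.
timing <- system.time(
  lay <- create_layout(toy_graph, layout = "circlepack", weight = size)
)
print(timing)
head(lay)
```

If this call alone hangs or errors on the real data, the problem lies in the layout step and not in `geom_node_circle()` or in printing the plot.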

It seems there is some randomness to the results as it figures out the plot co-ordinates.
Does this have something to do with the inefficiencies of the layout algorithms as per #234 ?
And are there any solutions you can suggest?

Thanks,
Dan

#n of groupings is 50
#n of items is 3204
#n of labels is 10


library(igraph)
library(ggraph)
library(dplyr)
library(magrittr)
library(uuid)
library(RColorBrewer)

items <- uuid::UUIDgenerate(n = 3204) #identifiers of individual items
labels <- c('boston','brisbane','london','paris','newyork','tokyo','moscow','bogota','beijing','johannesburg') #categories for colouring individual items
groupings <- replicate(paste(sample(LETTERS, 4, TRUE), collapse = ''), n = 50) #grouping variable to pack items in

data <- data.frame(items = items, label = sample(labels, 3204, replace = T), group = sample(groupings, 3204, replace = T)) %>% group_by(group)

#root edges
root_edges =  data.frame('a' = 'root', 'b' = group_keys(data)$group)

#item edges
item_edges = mapply(FUN = function(x, y) {
  cbind('a' = x, 'b' = y$items) } ,
  group_keys(data)$group, group_split(data), USE.NAMES = F
) %>% with(do.call(rbind,.)) %>% as.data.frame()

#this object contains all edges for circlepack plot. items within groupings within 'root'
edge_list = as.matrix(rbind(root_edges, item_edges))

#need to add entries in the original data to accommodate `grouping` and `root` as nodes, so get list of missing node names. This is just for code functionality
missing_node_names = data.frame('items' = do.call(c, root_edges) %>% unique())

#this object contains all the metadata required for the graph
vertex_meta <- data %>% mutate(size = 1) %>% full_join(missing_node_names)  %>% 
  mutate(label = replace(as.character(label), is.na(label), 'A-None'), group = replace(as.character(group), is.na(group), 'A-None')) %>% 
  mutate(size = replace(size, is.na(size), 0),
         label = factor(label, exclude = NULL) )

#init graph
graph = igraph::graph_from_data_frame(edge_list, vertices = vertex_meta)

#colors of labels
label_colors = RColorBrewer::brewer.pal(n = 10, name = 'BrBG')
#colors need to be a named vector for plotting to work
names(label_colors) = levels(as.factor(data$label))
#Add background color of white for all `A-None` data points
colors_for_labels = c("A-None" = 'white', label_colors)

#now create plot
graph_by_label = ggraph(graph, layout = 'circlepack', weight = size) + 
  geom_node_circle(aes(fill = label, color = factor(depth))) +
  theme_void() + coord_fixed() +
  scale_fill_manual(values=colors_for_labels, breaks = names(label_colors)) +
  scale_color_manual(values = c("0" = "white", "1" = "black", "2" ="black"), breaks = c()) + guides(fill = guide_legend('Labels'))

#and show
graph_by_label
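
Because the example draws its groups and labels with `sample()`, each run builds a different graph. A hedged sketch of one way to pin down a failing case is to repeat the run under fixed seeds and record the outcome per seed. It assumes the variation comes from R's random number generator, which may not be the whole story (the layout code could have its own source of randomness). `try_seeds` and its argument `f` are hypothetical names; `f` stands for a function that builds the data and computes the layout.

```r
# Run `f` once per seed and record whether it succeeded or raised an error.
# The seed is printed before each attempt so that a hanging run can be
# attributed to a seed even though it never returns.
try_seeds <- function(f, seeds) {
  outcome <- vapply(seeds, function(s) {
    message("trying seed ", s)
    set.seed(s)
    tryCatch({
      f()
      "ok"
    }, error = function(e) paste("error:", conditionMessage(e)))
  }, character(1))
  data.frame(seed = seeds, outcome = outcome, stringsAsFactors = FALSE)
}

# Stand-in for the real workload: fails whenever the random draw is even.
toy_run <- function() {
  x <- sample(10, 1)
  if (x %% 2 == 0) stop("even draw")
  x
}

res <- try_seeds(toy_run, 1:5)
print(res)
```

A seed that reproduces the error or the hang can then be posted together with the code, which makes the report deterministic for whoever investigates it.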

@thomasp85
Owner

Thanks - I can reproduce the behaviour based on the provided code. Will investigate

@dn-ra

dn-ra commented Jan 16, 2024

Hi Thomas, thanks for fixing.
Out of interest could you let me know what caused it?

Thanks,
Dan

@thomasp85
Owner

It was due to rounding errors that led to catastrophic failures, as is so often the case with computational geometry. It was fixed in the D3 version some time ago, but I hadn't kept up to date with it.
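
As a generic illustration of how rounding can break a geometric test (this is not the ggraph or D3 code, and the helper names and the `1e-9` tolerance are assumptions made for the example), consider checking whether one circle contains another. A circle that touches the enclosing circle from the inside should count as contained, but an exact floating-point comparison can reject it.

```r
# Does circle `a` fully contain circle `b`? Exact comparison, no tolerance.
encloses_strict <- function(a, b) {
  d <- sqrt((b$x - a$x)^2 + (b$y - a$y)^2)
  d + b$r <= a$r
}

# Same test with a small relative tolerance (the 1e-9 factor is an
# arbitrary choice for this illustration).
encloses_tolerant <- function(a, b, eps = 1e-9) {
  d <- sqrt((b$x - a$x)^2 + (b$y - a$y)^2)
  d + b$r <= a$r + eps * max(a$r, b$r, 1)
}

big   <- list(x = 0,   y = 0, r = 0.3)
small <- list(x = 0.1, y = 0, r = 0.2)  # touches `big` from the inside

encloses_strict(big, small)    # FALSE: 0.1 + 0.2 is slightly above 0.3 in doubles
encloses_tolerant(big, small)  # TRUE
```

An algorithm that relies on such tests to decide which circles define the enclosing circle can end up in a state that is impossible in exact arithmetic, which is one plausible way an error or a nonsensical layout can arise.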
