handle pre-summed trees #62

Closed
timelyportfolio opened this issue Feb 25, 2018 · 6 comments

@timelyportfolio timelyportfolio commented Feb 25, 2018

promoting #60 (comment) started by @jstrin


@timelyportfolio I was playing with the d2b sunburst example you posted and noticed that the way the data is aggregated and formatted using the treemap::treemap function causes some of the data to be aggregated, resulting in inaccurate counts. For example, if you take a look at the "C" nodes in the example, the raw data (from the JSON) is:

Var      Value
C        3.0517
C.1      3.0517
C.1.a.   0.8251
C.1.b.   1.6427
C.1.c    0.5839

C.1.a + C.1.b + C.1.c = C.1 = C

However, the JavaScript appears to aggregate the data again while moving down the hierarchy for the sunburst. The plotted value for C displays as 9, and C.1 displays as 6. In other words:
C.1.a + C.1.b + C.1.c + C.1 = C.1'
C.1.a + C.1.b + C.1.c + C.1 + C = C'

Where C.1' and C' are the displayed values in the final visualization.
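The arithmetic above can be reproduced with a small standalone sketch. The tree literal and the `sumAllNodes` helper are hypothetical, written only to mirror the "C" branch quoted above; this is not the widget's actual code, just an illustration of adding every node's own value to its descendants' totals.

```javascript
// Hypothetical tree mirroring the "C" branch; every node carries a
// pre-summed size, as in the JSON described above.
const cBranch = {
  name: "C", size: 3.0517,
  children: [{
    name: "C.1", size: 3.0517,
    children: [
      { name: "C.1.a.", size: 0.8251 },
      { name: "C.1.b.", size: 1.6427 },
      { name: "C.1.c", size: 0.5839 }
    ]
  }]
};

// Naive aggregation: a node's total is its own size plus the totals
// of all of its descendants, so pre-summed parents get counted again.
function sumAllNodes(node) {
  const children = node.children || [];
  return node.size + children.reduce((acc, child) => acc + sumAllNodes(child), 0);
}

const c1Displayed = sumAllNodes(cBranch.children[0]); // roughly twice 3.0517
const cDisplayed = sumAllNodes(cBranch);              // roughly three times 3.0517
console.log(c1Displayed.toFixed(2), cDisplayed.toFixed(2));
```

The two totals come out at about two and three times the true value of 3.0517, which matches the 6 and 9 seen in the plot.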

I found that aggregating my data then passing it to the treemap::treepalette function was a quick fix to this:

hier_json <- df %>%
    group_by( index1, index2, index3, index4) %>%
    summarise( size = n( ) ) %>%
    treepalette( ) %>%
    select( index1, index2, index3, index4, size, color = HCL.color) %>%
    d3_nest( value_cols = c( "size", "color"))

If you are looking for assistance putting together htmlwidgets for the d2b sunburst (and/or the d2b bubble chart) I'm happy to do what I can.

@timelyportfolio timelyportfolio commented Feb 25, 2018

@jstrin, thanks so much for pointing this out as I have been meaning to clarify, since this can be both confusing and misleading. Your solution is very nice, so I appreciate both taking the time to find a solution and sharing it here.

The problem arises when a tree is pre-summed (generally not the case in my experience, but definitely the case with treemap) and can be traced to this line. My proposed fix would be to only use size at the leaf level, so the sums would be recalculated at all non-leaf nodes.

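var root = hierarchy(json)
  .sum(function(d) {
    // only use the node's own value if it has no children (i.e. it is a leaf)
    if (!(d.children && d.children.length > 0)) {
      return d[x.options.valueField || "size"];
    }
    // non-leaf nodes contribute nothing themselves; their value is the sum of their descendants
    return 0;
  });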

This would be a potentially breaking change, so I'll let this sit for a while and seek input in case someone has an opinion. Please let me know if you see anything amiss.
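To make the effect of the leaf-only rule easier to check by hand, here is a minimal sketch that does not depend on d3. The `leafOnlyValue` function and the tree literal are hypothetical names introduced for illustration; the tree is a small pre-summed example, not the exact structure d3_nest produces.

```javascript
// Hypothetical pre-summed tree: A, A.1 and A.1.1 all carry the same 5.
const preSummedTree = {
  name: "root",
  children: [
    { name: "A", size: 5, children: [
      { name: "A.1", size: 5, children: [
        { name: "A.1.1", size: 5 }
      ] }
    ] },
    { name: "B", size: 10 }
  ]
};

// Leaf-only aggregation: a leaf contributes its own value, and a
// non-leaf node's value is recomputed purely from its descendants.
function leafOnlyValue(node, valueField = "size") {
  const children = node.children || [];
  if (children.length === 0) {
    return Number(node[valueField]) || 0;
  }
  return children.reduce((acc, child) => acc + leafOnlyValue(child, valueField), 0);
}

const aValue = leafOnlyValue(preSummedTree.children[0]); // 5, not 15
const rootValue = leafOnlyValue(preSummedTree);          // 15
console.log(aValue, rootValue);
```

With this rule the pre-summed values on A and A.1 are ignored, so A stays at 5 instead of growing to 15.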

For testing, we can use the following code.

library(sunburstR)
library(d3r)

df <- data.frame(
  index1 = c(rep("A",3),"B"),
  index2 = c(NA,"A.1","A.1",NA),
  index3 = c(NA, NA, "A.1.1", NA),
  size = c(5,5,5, 10),
  stringsAsFactors = FALSE
)

sunburst(
  d3_nest(df, value_cols="size"),
  count = TRUE
)

... and should expect

image

This would be much easier if we could assume all data passed to sunburst is pre-summed, but I think this case does not fit reality, so I prefer the above approach.

@timelyportfolio timelyportfolio commented Feb 25, 2018

More for testing...

library(sunburstR)
library(d3r)
library(treemap)
library(dplyr)


df <- data.frame(
  index1 = c(rep("A",3),"B"),
  index2 = c(NA,"A.1","A.1",NA),
  index3 = c(NA, NA, "A.1.1", NA),
  size = c(5,5,5, 10),
  stringsAsFactors = FALSE
)

sunburst(
  d3_nest(df, value_cols="size"),
  count = TRUE
)


rhd <- random.hierarchical.data()

sunburst(
  d3_nest(rhd, value_cols="x"),
  valueField = "x"
)

tm <- treemap(
  rhd,
  index = paste0("index", 1:3),
  vSize = "x"
)$tm


sunburst(
  tm %>%
    select(index1, index2, index3, vSize) %>%
    d3_nest(value_cols = "vSize"),
  valueField = "vSize"
)

@timelyportfolio timelyportfolio commented Feb 25, 2018

@jstrin b8a18d1 implements the proposed change. Would you mind testing with a couple trees on your side to see if it works as expected?

@timelyportfolio timelyportfolio commented Feb 25, 2018

from @jstrin


@timelyportfolio sorry for the delayed response. I ran through a handful of tests including using data with truncated paths (e.g., NA values for one of the indices) and it looks good on my end. I'll re-run my test scripts after you push the updated d2b code.

@timelyportfolio timelyportfolio commented Feb 25, 2018

Test code now

library(sunburstR)
library(d3r)
library(treemap)
library(dplyr)


df <- data.frame(
  index1 = c(rep("A",3),"B"),
  index2 = c(NA,"A.1","A.1",NA),
  index3 = c(NA, NA, "A.1.1", NA),
  size = c(5,5,5, 10),
  stringsAsFactors = FALSE
)

sunburst(
  d3_nest(df, value_cols="size"),
  count = TRUE,
  valueField = "size",
  sumNodes = FALSE
)


rhd <- random.hierarchical.data()

sunburst(
  d3_nest(rhd, value_cols="x"),
  valueField = "x"
)

tm <- treemap(
  rhd,
  index = paste0("index", 1:3),
  vSize = "x"
)$tm


sunburst(
  tm %>%
    select(index1, index2, index3, vSize) %>%
    d3_nest(value_cols = "vSize"),
  valueField = "vSize",
  sumNodes = FALSE
)

@timelyportfolio timelyportfolio commented Mar 1, 2018

going to close, since I believe the new sumNodes argument introduced in 8bc0b1c will satisfy this need.
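As a rough sketch of how a toggle like sumNodes could behave (this is an assumption for illustration, not the code from 8bc0b1c), the value accessor can switch between counting every node and counting only leaves. The names `makeValueAccessor` and `totalValue` are hypothetical, and `totalValue` is only a stand-in for the summing that d3's hierarchy does.

```javascript
// Hypothetical helper: builds the per-node value function.
// sumNodes = true  -> every node contributes its own value (data not pre-summed)
// sumNodes = false -> only leaves contribute (data already pre-summed)
function makeValueAccessor(sumNodes, valueField) {
  const field = valueField || "size";
  return function (d) {
    const isLeaf = !(d.children && d.children.length > 0);
    if (sumNodes || isLeaf) {
      return Number(d[field]) || 0;
    }
    return 0;
  };
}

// Minimal stand-in for hierarchical summing: own value plus descendants' totals.
function totalValue(node, accessor) {
  const children = node.children || [];
  return accessor(node) + children.reduce((acc, child) => acc + totalValue(child, accessor), 0);
}

// Hypothetical pre-summed branch, similar in spirit to the test data above.
const branchA = {
  name: "A", size: 5,
  children: [{ name: "A.1", size: 5, children: [{ name: "A.1.1", size: 5 }] }]
};

const summedTotal = totalValue(branchA, makeValueAccessor(true, "size"));    // 15
const leafOnlyTotal = totalValue(branchA, makeValueAccessor(false, "size")); // 5
console.log(summedTotal, leafOnlyTotal);
```

Under these assumptions, pre-summed input such as treemap output would be passed with sumNodes = FALSE, as in the test code above, while raw counts would keep the summing behaviour.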
