Floating stacked bar segments - adding padding to stacked bars
I am proposing a new feature for position_stack and position_fill that adds padding between the stack segments. I believe this can improve the readability and accessibility of stacked bar charts with certain colour palettes.
I have created a PR to make a start at implementing this. I couldn't find this feature in any open or closed issues or PRs, and I couldn't see it in any ggplot extensions. In the alternatives section at the end I show how I was able to do this without changing any ggplot internals, by manually specifying segment min and max coordinates in the data and using geom_rect - but it would be great if it was built into ggplot! :)
The problem
I have come across a situation where I would have liked to make the distinction between stacked bar segments clearer. Using a sequential palette (a smooth colour scale) rather than a categorical palette (distinctly different colours) for stacked bars can lead to some segments having very similar colours.
This kind of palette makes sense when the stacking variable is sequential and you want to show the reader an increase/decrease through the stack items.
Example explained
The example shows some survey question data. The question has 3 categories, A, B and C, and respondents vote on which position they think each category should sit in. There are 6 values on the position scale. This isn't the best example, but I have done it this way to show that both fill and stack work with my changes, and to show some failing cases.
(A more realistic example would be the results from a question like "Here are 6 categories, A to F. Rank them in order of first to last place." Then we have how many votes each category got for each position.)
library(ggplot2)
df <- data.frame(
value = c(
rep("A", 6),
rep("B", 6),
rep("C", 4)
),
rank = factor(c(
1, 2, 3, 4, 5, 6,
1, 2, 3, 4, 5, 6,
3, 4, 5, 6)
),
n = c(
8, 4, 3, 2, 1, 1,
4, 4, 3, 2, 2, 1,
1, 2, 3, 3
)
)
Here is the colour palette, taken from https://analysisfunction.civilservice.gov.uk/policy-store/data-visualisation-colours-in-charts/#section-6.
six_col_pal <- c(
"#092135",
"#12436D",
"#2073BC",
"#6BACE6",
"#ADD1F1",
"#F2F2F2"
)
The first plot shows what happens if we produce a basic stacked bar chart with this data and palette.
df |>
ggplot() +
geom_col(
aes(y = value, x = n, fill = rank),
position = position_stack(),
width = 0.5
) +
theme_minimal() +
scale_fill_discrete(palette = six_col_pal) +
labs(title = "Simply stack")
Whilst you may see no issues here and you may be able to clearly differentiate between the 2 darkest colours and the 2 lightest colours, in certain situations these can be problematic. For certain people, or certain screens, or when displayed at a certain size, these may not be so clear.
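As a rough way to put a number on "similar colours", the sketch below computes the WCAG 2.x contrast ratio between neighbouring palette colours using base R only. It is an illustration I have added here, not part of the proposal or the PR; the helper names relative_luminance and contrast_ratio are made up for this example. A ratio close to 1 means two colours have nearly the same luminance.

```r
# Illustrative check, not part of the proposal: WCAG 2.x contrast ratio
# between neighbouring colours of the six-colour palette.
six_col_pal <- c("#092135", "#12436D", "#2073BC", "#6BACE6", "#ADD1F1", "#F2F2F2")

# Hypothetical helper: relative luminance of one hex colour.
relative_luminance <- function(hex) {
  srgb <- grDevices::col2rgb(hex)[, 1] / 255
  # linearise each sRGB channel as defined by WCAG 2.x
  lin <- ifelse(srgb <= 0.03928, srgb / 12.92, ((srgb + 0.055) / 1.055)^2.4)
  sum(c(0.2126, 0.7152, 0.0722) * lin)
}

# Hypothetical helper: contrast ratio, lighter colour over darker colour.
contrast_ratio <- function(a, b) {
  l <- sort(c(relative_luminance(a), relative_luminance(b)), decreasing = TRUE)
  (l[1] + 0.05) / (l[2] + 0.05)
}

# Ratios for pairs 1-2, 2-3, ..., 5-6.
adjacent <- mapply(contrast_ratio, head(six_col_pal, -1), tail(six_col_pal, -1))
round(unname(adjacent), 2)
```

The first and last values are the ones of interest here, since they correspond to the two darkest and the two lightest colours.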
With what is included in ggplot the best I could find was to draw a border. The colour chosen here is the suggested border colour within the guidance https://analysisfunction.civilservice.gov.uk/policy-store/data-visualisation-colours-in-charts/.
This works well for the most part but can still be tricky for the darkest two colours.
df |>
ggplot() +
geom_col(
aes(y = value, x = n, fill = rank),
position = position_stack(),
width = 0.5,
colour = six_col_pal[2]
) +
theme_minimal() +
scale_fill_discrete(palette = six_col_pal) +
labs(title = "Stack with a dark border")
Using a light border can also help, but there can be similar difficulty with the lightest two colours.
df |>
ggplot() +
geom_col(
aes(y = value, x = n, fill = rank),
position = position_stack(),
width = 0.5,
colour = six_col_pal[6]
) +
theme_minimal() +
scale_fill_discrete(palette = six_col_pal) +
labs(title = "Stack with a light border")
Solution
I added a new argument to position_stack, called padding (named to match existing usage, like in position_dodge2). The padding is a percentage of the panel size, calculated using the largest stack size.
This will 'pad' stack items, creating a gap between them. It also has the side effect of making the items overlap when the padding value is greater than 1, again like with position_dodge2, although that was not what I needed.
This is how it looks.
df |>
ggplot() +
geom_col(
aes(y = value, x = n, fill = rank),
position = position_stack(padding = 0.01),
width = 0.5
) +
theme_minimal() +
scale_fill_discrete(palette = six_col_pal) +
labs(title = "'Floating' stacks, with new padding param")
This shows that it works with both the existing stack position types (position_stack, position_fill).
df |>
ggplot() +
geom_col(
aes(y = value, x = n, fill = rank),
position = position_fill(padding = 0.01),
width = 0.5
) +
theme_minimal() +
scale_fill_discrete(palette = six_col_pal) +
labs(title = "'Floating' stacks work with both fill and stack")
The final version of this that I chose for the graphic looked something like this, with borders.
df |>
ggplot() +
geom_col(
aes(y = value, x = n, fill = rank),
position = position_fill(padding = 0.01),
width = 0.5,
colour = six_col_pal[2]
) +
theme_minimal() +
scale_fill_discrete(palette = six_col_pal) +
labs(title = "'Floating' stacks, works well with border col")
Most of the changes are made in pos_stack. There is also a change in compute_panel where we find the panel_span, which looks at all the data to get the size of the tallest stack. The intention here is to make the gap size consistent across all the stacks, and for the argument (padding = X) to give a similar visual gap across data sets of different scales.
We can't do this in pos_stack, as it only sees a single stack.
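To make the calculation concrete, here is a minimal base R sketch of the idea. It is not the PR's code: pad_one_stack is a hypothetical helper, and it assumes that the gap is padding multiplied by the tallest stack, with half of the gap taken from each side of every internal boundary (the same lo/hi pattern as in pos_stack).

```r
# Hypothetical helper, not the PR's code: pad one stack of positive values.
pad_one_stack <- function(n, padding, panel_span) {
  k <- length(n)
  heights <- c(0, cumsum(n))
  lo <- heights[-(k + 1)]  # lower edge of each segment
  hi <- heights[-1]        # upper edge of each segment
  half_gap <- padding * panel_span / 2  # assumed definition of the gap
  # Leave the first lower edge and the last upper edge alone, so the outer
  # ends of the bar do not move.
  lo[-1] <- lo[-1] + half_gap
  hi[-k] <- hi[-k] - half_gap
  data.frame(lo = lo, hi = hi)
}

# The three stacks from the example data.
n_by_value <- list(
  A = c(8, 4, 3, 2, 1, 1),
  B = c(4, 4, 3, 2, 2, 1),
  C = c(1, 2, 3, 3)
)

# This is the part that needs all the data: the tallest stack in the panel.
panel_span <- max(vapply(n_by_value, sum, numeric(1)))

padded <- lapply(n_by_value, pad_one_stack, padding = 0.01, panel_span = panel_span)
padded$A
```

Because panel_span is shared, every stack gets the same gap in data units, regardless of how tall that individual stack is.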
Failing cases
This does not work for empty states. The example below includes some zeroes. The gap is still calculated for those stack segments, and they are then offset, and a rect is drawn, making 'zero' sections appear when they shouldn't.
df2 <- data.frame(
value = c(rep("A", 6), rep("B", 6), rep("C", 6)),
rank = factor(rep(c(1, 2, 3, 4, 5, 6), 3)),
n = c(8, 4, 3, 2, 1, 1, 5, 4, 3, 2, 2, 1, 0, 0, 1, 2, 3, 3)
)
df2 |>
ggplot() +
geom_col(
aes(y = value, x = n, fill = rank),
position = position_stack(padding = 0.01),
width = 0.5
) +
theme_minimal() +
scale_fill_discrete(palette = six_col_pal) +
labs(title = "'Floating' stacks, breaks on zeroes")
The same problem appears again with very tiny stack segments. Here the n values are in the thousands, apart from a couple of n = 2 values for category C.
df3 <- data.frame(
value = c(rep("A", 6), rep("B", 6), rep("C", 6)),
rank = factor(rep(c(1, 2, 3, 4, 5, 6), 3)),
n = 4000 * c(8, 4, 3, 2, 1, 1, 5, 4, 3, 2, 2, 1, 0, 0, 1, 2, 3, 3) + 2
)
df3 |>
ggplot() +
geom_col(
aes(y = value, x = n, fill = rank),
position = position_stack(padding = 0.01),
width = 0.5
) +
theme_minimal() +
scale_fill_discrete(palette = six_col_pal) +
labs(title = "'Floating' stacks, breaks on tiny segments")
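The two failing cases can be reproduced numerically without ggplot2. This is a sketch under assumptions rather than the PR's code: padded_edges is a hypothetical helper, and it assumes the gap is padding multiplied by the tallest stack, with half of the gap removed from each side of an internal boundary.

```r
# Hypothetical helper: padded lower/upper edges for one stack, plus a flag
# for segments whose upper edge ends up below their lower edge.
padded_edges <- function(n, padding, panel_span) {
  k <- length(n)
  heights <- c(0, cumsum(n))
  lo <- heights[-(k + 1)]
  hi <- heights[-1]
  half_gap <- padding * panel_span / 2  # assumed definition of the gap
  lo[-1] <- lo[-1] + half_gap
  hi[-k] <- hi[-k] - half_gap
  data.frame(n = n, lo = lo, hi = hi, inverted = hi < lo)
}

base <- c(8, 4, 3, 2, 1, 1, 5, 4, 3, 2, 2, 1, 0, 0, 1, 2, 3, 3)
value <- rep(c("A", "B", "C"), each = 6)

# Zeroes in category C (same n as df2).
zero_case <- padded_edges(base[value == "C"], 0.01, max(tapply(base, value, sum)))

# n = 2 next to values in the thousands (same n as df3).
tiny <- 4000 * base + 2
tiny_case <- padded_edges(tiny[value == "C"], 0.01, max(tapply(tiny, value, sum)))

zero_case
tiny_case
```

In both cases the first two segments of category C are narrower than the gap, so they come out with hi below lo. If the PR behaves like this sketch, a rect is still drawn between those two coordinates, which would explain the extra slivers in the plots.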
With negatives the spacing is applied properly, apart from at the axis, because the code skips the first index of the ymins and the last index of the ymaxs (taken from pos_stack: lo[-1] <- lo[-1] + half_gap; hi[-nrow(df)] <- hi[-nrow(df)] - half_gap). So only a minor fail?
df4 <- data.frame(
value = c(rep("A", 6), rep("B", 6), rep("C", 6)),
rank = factor(rep(c(1, 2, 3, 4, 5, 6), 3)),
n = c(8, 4, 3, 2, 1, 1, 5, 4, 3, 2, 2, 1, -5, -2, 1, 2, 3, 3)
)
df4 |>
ggplot() +
geom_col(
aes(y = value, x = n, fill = rank),
position = position_stack(padding = 0.01),
width = 0.5
) +
theme_minimal() +
scale_fill_discrete(palette = six_col_pal) +
labs(title = "'Floating' stacks, breaks on negatives")
Alternatives
I had a look at creating new positions for this, but I feel like it would make the position_* namespace a bit cluttered. I was looking at adding position_stackfloat() and position_fillfloat() - this would be similar to how position_jitterdodge() is a mash-up of other positions. But it seems less clean, and I don't think adding these gaps feels like its own position.
Also for reference this is how I first attempted this, without touching ggplot2 internals. I calculated some rectangles, then used geom_rect to plot specifically using my new coordinates.
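library(dplyr)
library(ggplot2)

df |>
  # order the segments within each bar by their numeric rank
  arrange(value, rank |> as.character() |> as.numeric()) |>
  group_by(value) |>
  mutate(
    xend = cumsum(n),
    xstart = lag(xend, default = 0),
    # shrink each segment by a fixed amount at both ends to leave a gap
    xstart = xstart + 0.075,
    xend = xend - 0.075
  ) |>
  ungroup() |>
  ggplot() +
  geom_rect(
    aes(
      xmin = xstart, xmax = xend,
      # value is a character column, so go via factor to get 1, 2, 3
      ymin = as.numeric(factor(value)) - 0.25,
      ymax = as.numeric(factor(value)) + 0.25,
      fill = rank
    )
  )
# ..... (remaining plot layers elided)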