Wrong categorical color range when "legend_interactive = FALSE" #43

Closed
SimonSchuepbach opened this issue Apr 5, 2023 · 3 comments · Fixed by #46
@SimonSchuepbach

Describe the bug
When plotting with a categorical color range and a non-interactive legend ("legend_interactive = FALSE"), the color range is shifted: some values are drawn in the color of a category other than the one they actually belong to.

Thanks for considering the issue.
I'm happy to answer any questions.
Cheers, Simon

To Reproduce
To reproduce you can use the available "vbz" data.
The code to reproduce is given below. Run it once as is and a second time with the option "legend_interactive = TRUE":

data("vbz")
df <- vbz[[3]]

catmaply(
  df,
  x = trip_id,
  x_order = trip_seq,
  y = stop_name,
  y_order = stop_seq,
  z = occupancy,
  categorical_color_range = TRUE,
  categorical_col = occ_category,
  legend_interactive = FALSE,
  hover_template = paste(
    "Time:", departure_time,
    "<br>Stop:", stop_name,
    "<br>Occupancy:", occupancy,
    "<br>Occupancy Category:", occ_cat_name,
    "<br>No Of Measurements:", number_of_measurements,
    ""
  ),
  legend_col = occ_cat_name
)

Expected behavior
I would expect the colors in the plot to stay the same for each square within the heatmap plot, since neither the values nor their categories change. It's just the legend type which changes. Instead, the colors are shifted such that for some of the squares the category seems to change, which is not true.
With the interactive legend the color category and the variable category seem to match, whereas with the non-interactive legend, they seem to be offset.

Screenshots
Screenshot of a non-interactive example, where a square (courseno.: 17020, station name: Zuerich, Kreuzplatz) with a value of the category "medium high" appears in a color which clearly is within the color palette of the category "high":
image

The very same example with the interactive legend shows that the same square now has a color of the correct category, "medium high":
image

Environment:
Append result of sessionInfo().
sessionInfo.txt

@christophbaur christophbaur added the bug Something isn't working label Apr 5, 2023
@christophbaur
Member

Hey @yvesmauron! Can you take a look at this bug in the next few days?

@yvesmauron
Collaborator

Hi @SimonSchuepbach

It seems that the value ranges of occupancy per occ_category overlap. Is this intended/possible in the vbz dataset, @christophbaur, or did I miss something?

library(dplyr)

data("vbz")
df <- vbz[[3]]

df %>% 
  group_by(occ_category) %>% 
  summarize(
    min_occupancy = min(occupancy),
    max_occupancy = max(occupancy)
  )
# A tibble: 4 × 3
  occ_category min_occupancy max_occupancy
         <int>         <dbl>         <dbl>
1            1           0            31  
2            2          23            61.9
3            3          45.7          93  
4            4          69.2          96.2
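
The overlap can also be flagged programmatically. The following is only a minimal sketch: `find_overlaps` is a hypothetical helper that is not part of catmaply, and the `toy` rows are built from the min/max values in the table above rather than from the full vbz data.

```r
library(dplyr)

# Hypothetical helper (not part of catmaply): summarizes the z range per
# category and flags categories that start at or below the previous maximum.
find_overlaps <- function(df, z, cat) {
  df %>%
    group_by({{ cat }}) %>%
    summarize(min_z = min({{ z }}), max_z = max({{ z }}), .groups = "drop") %>%
    arrange({{ cat }}) %>%
    mutate(overlaps_previous = !is.na(lag(max_z)) & min_z <= lag(max_z))
}

# Toy rows taken from the per-category min/max values reported above
toy <- tibble(
  occ_category = as.integer(c(1, 1, 2, 2, 3, 3, 4, 4)),
  occupancy = c(0, 31, 23, 61.9, 45.7, 93, 69.2, 96.2)
)

overlaps <- find_overlaps(toy, occupancy, occ_category)
```

With these rows, `overlaps_previous` is TRUE for categories 2, 3 and 4, since each starts below the maximum of the category before it.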

Nonetheless, there still seems to be a small issue with the current logic when the ranges are not evenly balanced. For example, evenly distributed z values per category, as shown below, work without issue:

df_test <- tibble(
  x=as.integer(c(1,1,1,1,2,2,2,2)),
  y=as.integer(c(1,2,3,4,1,2,3,4)),
  z=as.integer(c(1,3,5,7,8,6,4,2)),
  z_cat=as.factor(as.integer(c(1,2,3,4,4,3,2,1)))
)

df_test %>%
  group_by(z_cat) %>%
  summarize(
    min_z = min(z),
    max_z = max(z),
    category_range = max(z) - min(z)
  )

# A tibble: 4 × 4
  z_cat min_z max_z category_range
  <fct> <int> <int>          <int>
1 1         1     2              1
2 2         3     4              1
3 3         5     6              1
4 4         7     8              1

catmaply(
  df_test,
  x=x,
  y=y,
  z=z,
  categorical_color_range = TRUE,
  categorical_col = z_cat,
  legend_interactive = FALSE,
  x_range = 2
)

image

However, if the ranges are uneven, the legend is off, as the min/max values of z do not align with the legend:

df_test <- tibble(
  x=as.integer(c(1,1,1,1,2,2,2,2)),
  y=as.integer(c(1,2,3,4,1,2,3,4)),
  z=as.integer(c(1,3,5,7,11,6,4,2)),
  z_cat=as.factor(as.integer(c(1,2,3,4,4,3,2,1)))
)

df_test %>%
  group_by(z_cat) %>%
  summarize(
    min_z = min(z),
    max_z = max(z),
    category_range = max(z) - min(z)
  )

# A tibble: 4 × 4
  z_cat min_z max_z category_range
  <fct> <int> <int>          <int>
1 1         1     2              1
2 2         3     4              1
3 3         5     6              1
4 4         7    11              4

catmaply(
  df_test,
  x=x,
  y=y,
  z=z,
  categorical_color_range = TRUE,
  categorical_col = z_cat,
  legend_interactive = FALSE,
  x_range = 2
)

image

We need to investigate the best option to fix this, e.g. drawing the ranges of the legend based on the ranges of values per category in the dataset, or other options. Possible solutions will be posted in this thread in the coming weeks.
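
As a rough illustration of the first option, the sketch below computes where each category's min/max would sit on a 0..1 color scale for the uneven df_test above. It is not catmaply's implementation. The equal-width comparison at the end is purely an assumption about how a mismatch could arise, and `category_breaks`, `equal_band` and `mismatch` are names made up for this example.

```r
library(dplyr)

df_test <- tibble(
  z = as.integer(c(1, 3, 5, 7, 11, 6, 4, 2)),
  z_cat = as.factor(as.integer(c(1, 2, 3, 4, 4, 3, 2, 1)))
)

z_min <- min(df_test$z)
z_span <- max(df_test$z) - z_min

# Data-driven bands: position of each category's min/max on a 0..1 scale
category_breaks <- df_test %>%
  group_by(z_cat) %>%
  summarize(min_z = min(z), max_z = max(z), .groups = "drop") %>%
  mutate(
    lower = (min_z - z_min) / z_span,
    upper = (max_z - z_min) / z_span
  )

# Assumption for comparison only: the scale is split into equally wide bands
n_cat <- nlevels(df_test$z_cat)
equal_band <- pmin(floor((df_test$z - z_min) / z_span * n_cat) + 1, n_cat)

# z values whose equal-width band differs from their actual category
mismatch <- df_test$z[equal_band != as.integer(df_test$z_cat)]
```

With this data, the data-driven band for category 4 spans 0.6 to 1.0, whereas under the equal-width assumption the z values 3, 5 and 7 each fall into the band below their own category.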

@yvesmauron yvesmauron self-assigned this Apr 29, 2023
@christophbaur
Member

Hey @yvesmauron!
Yes, the categories (occ_category) can overlap, as they depend on the vehicle in this example. This is intended: e.g. a long train with no free seats carries a different total number of passengers than a small bus with no free seats.

Also, the occupancy values are based on "real" measurements, so the calculated min/max values per category are random.

library(catmaply)
library(dplyr)

data("vbz")
df <- vbz[[3]]

df %>%
  group_by(occ_category, vehicle) %>%
  summarize(
    min_occupancy = min(occupancy),
    max_occupancy = max(occupancy)
  ) %>%
  ungroup() %>%
  arrange(vehicle)
# A tibble: 8 × 4
  occ_category vehicle min_occupancy max_occupancy
         <int> <fct>           <dbl>         <dbl>
1            1 DGT             0              31  
2            2 DGT            31.0            61.9
3            3 DGT            62.2            93  
4            4 DGT            93.5            96.2
5            1 GT              0.389          22.5
6            2 GT             23              44.2
7            3 GT             45.7            64.2
8            4 GT             69.2            84.1

Adding the category (occ_category) is part of the preprocessing of the data, and catmaply does not know anything about how the categories are calculated. This is also intended.

The issue reported by @SimonSchuepbach depends on the switch between legend_interactive = FALSE and legend_interactive = TRUE. Catmaply should render the correct category regardless of the state of legend_interactive, shouldn't it? Maybe the vbz example is a bit overloaded and tricky due to the overlapping categories.

Let's try with this one.
Please note: each z has its own category z_cat, with the same name in z_cat_name. The only difference between the two calls is legend_interactive = TRUE versus legend_interactive = FALSE.

df_test <- tibble(
  x=as.integer(c(1,1,1,1,2,2,2,2)),
  y=as.integer(c(1,2,3,4,1,2,3,4)),
  z=as.integer(c(1,3,5,11,1,3,5,11)),
  z_cat=as.integer(c(1,3,5,11,1,3,5,11)),
  z_cat_name=as.character(c(1,3,5,11,1,3,5,11))
)


catmaply(
  df_test,
  x=x,
  y=y,
  z=z,
  categorical_color_range = TRUE,
  color_palette = viridis::inferno,
  categorical_col = z_cat,
  legend_interactive = TRUE,
  legend_col = z_cat_name,
  x_range = 2
)

'5' is one of the orange colors, as expected.
image

VS.

df_test <- tibble(
  x=as.integer(c(1,1,1,1,2,2,2,2)),
  y=as.integer(c(1,2,3,4,1,2,3,4)),
  z=as.integer(c(1,3,5,11,1,3,5,11)),
  z_cat=as.integer(c(1,3,5,11,1,3,5,11)),
  z_cat_name=as.character(c(1,3,5,11,1,3,5,11))
)


catmaply(
  df_test,
  x=x,
  y=y,
  z=z,
  categorical_color_range = TRUE,
  color_palette = viridis::inferno,
  categorical_col = z_cat,
  legend_interactive = FALSE,
  legend_col = z_cat_name,
  x_range = 2
)

'5' is not among the orange colors, but I would expect it there.
image
