Issue with NA handling in scatter plots: Two NAs per category cause incorrect line connection #2419

Open
@marik-ku

Description

Issue Summary

When using plot_ly() in R with a scatter plot (mode = "lines+markers"), missing (NA) values are expected to create gaps in the line plot. However, if exactly two NA values exist per category, the missing values are incorrectly connected by a line instead of creating a gap.

Interestingly, when the hovertemplate is removed, the line plot behaves as expected (i.e., creating a gap for NA values). This issue only occurs when there are exactly two NA values per category; the code works with any other number of NA values.

Additional Discovery:
The issue is resolved if I include the argument split = ~Category, but I cannot find documentation for split in Plotly, which makes me think it may be deprecated. Moreover, when the hovertemplate is removed, the inclusion of split does not work as expected and does not resolve the issue.

Reproducible Example

The following R code demonstrates the issue:

library(plotly)

df <- data.frame(
  Category = rep(c("A", "B"), each = 6),
  Date = c(2020, 2021, 2022, 2023, 2024, 2025, 2020, 2021, 2022, 2023, 2024, 2025),
  Value = c(10, 15, NA, NA, 20, 25, 12, 14, NA, 22, NA, 27)
)

df$Date <- factor(df$Date, levels = unique(df$Date), ordered = TRUE)

plot_ly(
  df,
  x = ~Date,
  y = ~Value,
  color = ~Category,
  type = 'scatter',
  mode = 'lines+markers',
  text = ~Category,
  hovertemplate = paste0("Date: %{x}<br>Category: %{text}")
)

Expected Behavior

  • NA values should create a gap in the line plot, i.e., they should not be connected.
  • This works correctly when there is any number of NA values other than exactly two in any category.

Actual Behavior

  • When there are exactly two NA values per category, the missing values are incorrectly connected by a line instead of creating a gap.
  • When the hovertemplate is removed, the lines create a gap as expected.
  • The issue only arises when there are exactly two NA values per category; any other instance of NA works fine.
  • Including split = ~Category resolves the issue, but:
    • I cannot find any Plotly documentation on split, leading me to believe it may be deprecated.
    • Interestingly, when the hovertemplate is removed, including split does not resolve the issue.
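On the documentation question: `split` is listed among the arguments of `plot_ly()` in the R package help (`?plot_ly`), where it is described as creating one trace per discrete value. It is an argument of the R function rather than a plotly.js trace attribute, which may be why it is hard to find in the general Plotly reference. Below is a sketch of the workaround described above, reusing the data from the reproducible example. Whether it avoids the problem in every plotly version has not been checked here.

```r
library(plotly)

# Same data as in the reproducible example above
df <- data.frame(
  Category = rep(c("A", "B"), each = 6),
  Date = c(2020, 2021, 2022, 2023, 2024, 2025, 2020, 2021, 2022, 2023, 2024, 2025),
  Value = c(10, 15, NA, NA, 20, 25, 12, 14, NA, 22, NA, 27)
)
df$Date <- factor(df$Date, levels = unique(df$Date), ordered = TRUE)

# split = ~Category asks plot_ly() for one trace per category
p <- plot_ly(
  df,
  x = ~Date,
  y = ~Value,
  color = ~Category,
  split = ~Category,
  type = "scatter",
  mode = "lines+markers",
  text = ~Category,
  hovertemplate = paste0("Date: %{x}<br>Category: %{text}")
)
p
```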

Additional Notes
This seems to be an issue specifically triggered by the combination of NA handling and the hovertemplate. I would appreciate further insight on why this happens or suggestions for a workaround to preserve the gap in the case of two NA values.

System Info

R version 4.4.2 (2024-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows Server 2019 x64 (build 19045)

Matrix products: default

Activity

romanzenka commented on Feb 21, 2025

I looked into this in depth: the generated plotly.js data is genuinely missing the null values. I need to investigate further, but at the moment this looks like a genuine bug in the R code.

romanzenka commented on Feb 21, 2025

I found the culprit. The function traceify in plotly_build splits the data into traces by looking for trace attributes of the proper length. However, the hidden parameter .plotlyVariableMapping that is used within the trace does not contain actual values; it contains the list of variable names for the trace. The problem arises when the length of that variable list matches EXACTLY the number of entries in the trace. Here you have 12 data points, of which 8 are kept once the NAs are accounted for, and the number of variables is also 8: x, y, text, hovertemplate, color, .plotlyTraceIndex, .plotlyMissingIndex and .plotlyGroupIndex. When that happens, the list of variables gets cut into two halves, which switches off grouping on the trace whose half no longer contains .plotlyGroupIndex.
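The mechanism can be illustrated in base R without plotly. This is only a sketch: `split_by_length` is a hypothetical, simplified stand-in for a length-based check, not the package's actual code, and the order of the variable names is assumed for illustration.

```r
# Hypothetical stand-in: subset any entry whose length equals the row count
split_by_length <- function(entry, n, keep) {
  if (length(entry) == n) entry[keep] else entry
}

n_rows <- 8                              # rows kept in the original example
keep <- rep(c(TRUE, FALSE), each = 4)    # assumed: first four rows form one trace

y_values <- c(10, 15, 20, 25, 12, 14, 22, 27)
variable_names <- c("x", "y", "text", "hovertemplate", "color",
                    ".plotlyTraceIndex", ".plotlyMissingIndex", ".plotlyGroupIndex")

# Real per-row data is subset as intended
first_y <- split_by_length(y_values, n_rows, keep)

# The vector of names happens to have length 8 too, so it is cut in half as well
first_names <- split_by_length(variable_names, n_rows, keep)
second_names <- split_by_length(variable_names, n_rows, !keep)

".plotlyGroupIndex" %in% first_names   # FALSE: this half lost the grouping variable
".plotlyGroupIndex" %in% second_names  # TRUE
```

With any other row count the names vector would pass through untouched, which is consistent with the bug appearing only for one specific number of non-NA rows.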

This looks like very fragile code, and I need to think more about how it could be fixed. The culprit is https://github.com/plotly/plotly.R/blob/aa619dc2fbc2fa786e15a8d11444a18863661ed4/R/plotly_build.R#L996C57-L996C71, which needs to be replaced with a more robust check for "is this entry splittable?"
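One possible direction for such a check, sketched in base R, is to exclude known metadata entries by name before comparing lengths. The names `metadata_entries` and `split_trace` are hypothetical, this is not necessarily what the eventual fix does, and unlike the real code it does not recurse into nested lists.

```r
# Hypothetical: entries that hold names or settings, never per-row data
metadata_entries <- c(".plotlyVariableMapping")

split_trace <- function(trace, keep) {
  n <- length(keep)
  out <- lapply(names(trace), function(nm) {
    entry <- trace[[nm]]
    # Never subset metadata, even if its length happens to equal n
    if (nm %in% metadata_entries) return(entry)
    if (is.atomic(entry) && length(entry) == n) entry[keep] else entry
  })
  stats::setNames(out, names(trace))
}

trace <- list(
  y = c(10, 15, 20, 25, 12, 14, 22, 27),
  mode = "lines+markers",
  .plotlyVariableMapping = c("x", "y", "text", "hovertemplate", "color",
                             ".plotlyTraceIndex", ".plotlyMissingIndex",
                             ".plotlyGroupIndex")
)

half <- split_trace(trace, rep(c(TRUE, FALSE), each = 4))
length(half$y)                        # 4: per-row data is subset
length(half$.plotlyVariableMapping)   # 8: the name list is left intact
```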

romanzenka commented on Feb 21, 2025

Here is a "minimal" example: 7 rows with 1 NA, so the number of rows drops to 6, which matches the 6 variables present without hovertemplate. Add another variable, any variable, and it starts working. You can break it with an arbitrary number of rows, as long as the number of non-NA rows is exactly 6.

df <- data.frame(
    Category = c("A", "A", "A", "A", "B", "B", "B"),
    Date = c(2020, 2021, 2022, 2023, 2020, 2021, 2022),
    Value = c(1, 2, NA, 4, 2, 3, 4)
)

df$Date <- factor(df$Date, levels = unique(df$Date), ordered = TRUE)

plot_ly(
  df,
  x = ~Date,
  y = ~Value,
  color = ~Category,
  type = 'scatter',
  mode = 'lines+markers')

Shows incorrectly

Image

Should show

Image
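The counting argument from this thread can be checked in base R. In this sketch `lengths_collide` is a hypothetical helper, and the variable lists are taken from the comments above rather than read from plotly itself.

```r
# Hypothetical helper: does the number of non-NA values equal the number of
# mapped variables? That coincidence is the condition described above.
lengths_collide <- function(values, mapped_variables) {
  sum(!is.na(values)) == length(mapped_variables)
}

# Variables present without text and hovertemplate, as listed in the thread
base_variables <- c("x", "y", "color", ".plotlyTraceIndex",
                    ".plotlyMissingIndex", ".plotlyGroupIndex")

minimal_values <- c(1, 2, NA, 4, 2, 3, 4)
original_values <- c(10, 15, NA, NA, 20, 25, 12, 14, NA, 22, NA, 27)

# Minimal example: 6 non-NA rows against 6 variables
minimal_hit <- lengths_collide(minimal_values, base_variables)

# Original report: 8 non-NA rows against 8 variables (text and hovertemplate added)
original_hit <- lengths_collide(original_values,
                                c(base_variables, "text", "hovertemplate"))

# Original data without text and hovertemplate: 8 rows against 6 variables
original_plain <- lengths_collide(original_values, base_variables)

c(minimal_hit, original_hit, original_plain)
```

The last case agrees with the original report, where removing the hovertemplate made the gap reappear.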
added a commit that references this issue on Feb 21, 2025
cbe30cc
linked a pull request that will close this issue on Feb 21, 2025