Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: cassandraIO ReadAll does not let a pipeline handle or retry exceptions #34160

Open
1 of 17 tasks
VardhanThigle opened this issue Mar 4, 2025 · 3 comments
Open
1 of 17 tasks

Comments

@VardhanThigle
Copy link

VardhanThigle commented Mar 4, 2025

What happened?

If anyone runs CassandraIO to read all rows on a fairly large Cassandra Cluster (~50 Nodes, > 2 TB)
and there are any timeout exceptions a set of rows is never read, CassandraIO only logs the error and proceeds.

Root Cause

cassandraIO ReadAll does not let a pipeline handle or retry exceptions

JDBCIO throws exception which gets retried by dataflow runner on other nodes.
In the most ideal case there should be a way to plug in an exception handler to handle such corner cases in production.

Ref in Code

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@Suvrat1629
Copy link

@VardhanThigle So do we just add the same retry logic that is present in the JdbcIO or do we add make like an enum in CassandraIO with say HandlingAction [RETRY,SKIP, FAIL] and then tweak the ReadFn to handle some retrying logic?

@VardhanThigle
Copy link
Author

VardhanThigle commented Mar 6, 2025

@VardhanThigle So do we just add the same retry logic that is present in the JdbcIO or do we add make like an enum in CassandraIO with say HandlingAction [RETRY,SKIP, FAIL] and then tweak the ReadFn to handle some retrying logic?

There are various ways to go about this

  1. [easy but less ideal] We could just log and re-throw and let the runner (like Dataflow) retry. I would appreciate if we also produce metrics here. This is exactly what JDBCIO does.
  2. We could let the user plug-in an exception handler for reader (writer side, the user has more control as the user wirtes the save or delete functions), we need this since ther's no good way to catch or handle exception in Beam across PTransforms after they are wired into a graph, except possible handling the exception within the PTransform.

I would lean away from enums as different exceptions would need different handling.

Also, side request, there are many things that could be made pluggable in Beam's CassandraIO
The connection manager - currently it's very tied to 3.0 interfaces. and 4.0 driver comes with a rich set of retry level configuration at driver level. There's a meta question here, if and when we shift to 4.0 intefaces in CassandraIO for beam, would we still need an exception handling - IMHO yes, atleast we should rethrow, the current approach of silent logging makes data migration pipeline complete with a success state even if I turn down Cassandra for example. If Cassandra is turned down when the pipeline is running, the pipeline should fail with a few (possibly configurable) retries and not go to a success state.

@shivanshjais22
Copy link

@VardhanThigle If Cassandra times out while reading data, some rows are lost because CassandraIO.ReadAll just logs the error and moves on. To fix this, it should retry failed reads or throw an error so that Dataflow can retry automatically, ensuring no data is missed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants