What happened?

If anyone runs CassandraIO to read all rows on a fairly large Cassandra cluster (~50 nodes, > 2 TB) and any timeout exceptions occur, a set of rows is never read: CassandraIO only logs the error and proceeds.

Root Cause

CassandraIO.ReadAll does not let a pipeline handle or retry exceptions. JdbcIO, by contrast, throws the exception, which then gets retried by the Dataflow runner on other workers. Ideally, there should be a way to plug in an exception handler to handle such corner cases in production.

Ref in Code

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components
@VardhanThigle So do we just add the same retry logic that is present in JdbcIO, or do we add an enum in CassandraIO, say HandlingAction [RETRY, SKIP, FAIL], and then tweak the ReadFn to handle the retry logic?
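For illustration, a minimal sketch of what that enum could look like, assuming a hypothetical HandlingAction knob on a ReadFn-style DoFn (none of these names exist in CassandraIO today):

```java
import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.transforms.DoFn;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical sketch only: HandlingAction and this DoFn are illustrative,
// not part of CassandraIO.
public class ReadWithActionFn extends DoFn<String, String> {
  public enum HandlingAction { RETRY, SKIP, FAIL }

  private static final Logger LOG = LoggerFactory.getLogger(ReadWithActionFn.class);
  private static final int MAX_ATTEMPTS = 3; // assumed retry budget

  private final Counter failedReads = Metrics.counter(ReadWithActionFn.class, "failedReads");
  private final HandlingAction action;

  public ReadWithActionFn(HandlingAction action) {
    this.action = action;
  }

  @ProcessElement
  public void processElement(@Element String query, OutputReceiver<String> out)
      throws Exception {
    int attempts = 0;
    while (true) {
      try {
        runQuery(query, out); // placeholder for the real Cassandra read
        return;
      } catch (Exception e) {
        failedReads.inc(); // surface failures as a metric either way
        if (action == HandlingAction.RETRY && ++attempts < MAX_ATTEMPTS) {
          LOG.warn("Retrying failed read, attempt {}", attempts, e);
          continue;
        }
        if (action == HandlingAction.SKIP) {
          LOG.error("Skipping failed read: {}", query, e); // current behavior, made explicit
          return;
        }
        throw e; // FAIL, or RETRY exhausted: let the runner retry the bundle
      }
    }
  }

  private void runQuery(String query, OutputReceiver<String> out) {
    // placeholder: the real ReadFn would execute the token-range query here
  }
}
```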
There are various ways to go about this:

1. [Easy but less ideal] We could just log and re-throw, and let the runner (like Dataflow) retry. I would appreciate it if we also produced metrics here. This is exactly what JdbcIO does.
2. We could let the user plug in an exception handler for the reader (on the writer side the user already has more control, since the user writes the save or delete functions). We need this because there is no good way to catch or handle exceptions in Beam across PTransforms once they are wired into a graph, except by handling the exception within the PTransform itself. (See the sketch below.)

I would lean away from enums, as different exceptions would need different handling.
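A minimal sketch of option 2, assuming a hypothetical ReadExceptionHandler interface that CassandraIO would accept and invoke inside its read DoFn (the names are illustrative, not existing API):

```java
import java.io.Serializable;
import org.apache.beam.sdk.transforms.DoFn;

// Hypothetical sketch: neither ReadExceptionHandler nor a CassandraIO hook for
// it exists today.
public class PluggableHandlerExample {

  /** User-supplied policy: return true to swallow and continue, false to rethrow. */
  public interface ReadExceptionHandler extends Serializable {
    boolean handle(String failedQuery, Exception e);
  }

  /** Simplified read DoFn that defers the decision to the plugged-in handler. */
  public static class HandlingReadFn extends DoFn<String, String> {
    private final ReadExceptionHandler handler;

    public HandlingReadFn(ReadExceptionHandler handler) {
      this.handler = handler;
    }

    @ProcessElement
    public void processElement(@Element String query, OutputReceiver<String> out)
        throws Exception {
      try {
        runQuery(query, out); // placeholder for the real Cassandra read
      } catch (Exception e) {
        if (!handler.handle(query, e)) {
          throw e; // handler asked us to fail the bundle so the runner retries it
        }
        // handler chose to skip: the failure is acknowledged but rows are dropped
      }
    }

    private void runQuery(String query, OutputReceiver<String> out) {
      // placeholder: execute the token-range query against Cassandra
    }
  }

  // Example handler: tolerate timeouts, fail hard on everything else (a real
  // handler would likely target the driver's ReadTimeoutException).
  static final ReadExceptionHandler TIMEOUTS_ONLY =
      (query, e) -> e instanceof java.util.concurrent.TimeoutException;
}
```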
Also, as a side request, there are many things that could be made pluggable in Beam's CassandraIO:

- The connection manager: it is currently very tied to the 3.x driver interfaces, while the 4.x driver comes with a rich set of retry configuration at the driver level (see the sketch below). There is a meta question here: if and when we shift CassandraIO to the 4.x interfaces, would we still need exception handling? IMHO yes; at the least we should rethrow. The current approach of silently logging lets a data-migration pipeline complete with a success state even if, for example, I turn Cassandra down. If Cassandra is turned down while the pipeline is running, the pipeline should fail after a few (possibly configurable) retries, not end in a success state.
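For reference, a sketch of the driver-level retry configuration the 4.x driver exposes; the option names come from the driver's config system, while the values here are arbitrary examples:

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.config.DefaultDriverOption;
import com.datastax.oss.driver.api.core.config.DriverConfigLoader;
import java.time.Duration;

public class DriverV4RetryConfig {
  public static CqlSession connect() {
    // Programmatic equivalent of settings normally placed in application.conf.
    DriverConfigLoader loader =
        DriverConfigLoader.programmaticBuilder()
            // basic.request.timeout: how long the driver waits per request
            .withDuration(DefaultDriverOption.REQUEST_TIMEOUT, Duration.ofSeconds(30))
            // advanced.retry-policy.class: the built-in policy retries e.g. read
            // timeouts a bounded number of times before surfacing the error
            .withString(DefaultDriverOption.RETRY_POLICY_CLASS, "DefaultRetryPolicy")
            .build();
    return CqlSession.builder().withConfigLoader(loader).build();
  }
}
```

Even with driver-level retries, the point above stands: once the driver gives up, the IO should rethrow rather than log and proceed.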
@VardhanThigle If Cassandra times out while reading data, some rows are lost because CassandraIO.ReadAll just logs the error and moves on. To fix this, it should retry failed reads or throw an error so that Dataflow can retry automatically, ensuring no data is missed.
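A sketch of what retry-then-rethrow could look like, using Beam's FluentBackoff in the same spirit as JdbcIO's write retries (org.apache.beam.sdk.util is an internal package, so treat this as illustrative rather than a supported extension point):

```java
import java.io.IOException;
import org.apache.beam.sdk.util.BackOff;
import org.apache.beam.sdk.util.BackOffUtils;
import org.apache.beam.sdk.util.FluentBackoff;
import org.apache.beam.sdk.util.Sleeper;
import org.joda.time.Duration;

// Illustrative: bounded in-process retries with backoff, then rethrow so the
// runner (e.g. Dataflow) fails and retries the bundle instead of losing rows.
public class RetryThenRethrow {
  private static final FluentBackoff BACKOFF =
      FluentBackoff.DEFAULT
          .withMaxRetries(3) // assumed (possibly configurable) retry budget
          .withInitialBackoff(Duration.standardSeconds(1));

  static void readWithRetries(Runnable attempt) throws IOException, InterruptedException {
    Sleeper sleeper = Sleeper.DEFAULT;
    BackOff backoff = BACKOFF.backoff();
    while (true) {
      try {
        attempt.run(); // the actual Cassandra read for one token range
        return;
      } catch (RuntimeException e) {
        // next() sleeps and returns true while budget remains; false when exhausted
        if (!BackOffUtils.next(sleeper, backoff)) {
          throw e; // budget exhausted: surface the failure instead of dropping rows
        }
      }
    }
  }
}
```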