Closed
Description
What happened?
When CassandraIO is used to read all rows from a fairly large Cassandra cluster (~50 nodes, > 2 TB) and any timeout exceptions occur, a set of rows is silently never read: CassandraIO only logs the error and proceeds.
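A simplified sketch of the catch-and-log pattern described above (this is not the actual CassandraIO source; the class and method names are hypothetical). Because the exception is swallowed inside the bundle, the runner considers the bundle successful, never retries it, and the rows behind the failed query are silently missing from the output:

```java
import org.apache.beam.sdk.transforms.DoFn;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustration only; SwallowingReadFn and runQuery do not exist in Beam.
class SwallowingReadFn extends DoFn<String, String> {
  private static final Logger LOG = LoggerFactory.getLogger(SwallowingReadFn.class);

  @ProcessElement
  public void process(@Element String query, OutputReceiver<String> out) {
    try {
      out.output(runQuery(query)); // execute the query against Cassandra and emit rows
    } catch (Exception e) {
      // The bundle still succeeds, so the runner never retries it and the
      // pipeline has no way to observe that these rows were lost.
      LOG.error("Failed to read query {}", query, e);
    }
  }

  private String runQuery(String query) {
    return query; // placeholder for the real Cassandra driver call
  }
}
```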
Root Cause
CassandraIO ReadAll does not let the pipeline handle or retry exceptions.
By contrast, JdbcIO throws the exception, which the Dataflow runner then retries on other nodes.
Ideally, there should be a way to plug in an exception handler to deal with such corner cases in production; see the sketch below.
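A hypothetical sketch of what such a plug-in point could look like. Neither `withExceptionHandler()` nor `ReadExceptionHandler` exist in CassandraIO today; this only illustrates the requested behaviour, i.e. letting the pipeline author choose to rethrow (so the runner retries the bundle, as with JdbcIO) or to route the failure elsewhere:

```java
import java.io.Serializable;

// Hypothetical interface, not part of Beam.
public interface ReadExceptionHandler extends Serializable {

  // Called when reading one split/query fails; implementations may rethrow
  // to fail the bundle and force the runner to retry it.
  void onReadFailure(String query, Exception cause) throws Exception;

  // Hypothetical usage a pipeline author might then write:
  //
  //   CassandraIO.<MyRow>readAll()
  //       .withExceptionHandler((query, cause) -> { throw cause; }); // rethrow -> runner retries
}
```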
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner