Optimise RowData evolution #13340
Conversation
RowDataEvolver recomputes Flink RowType and field getters for every input record that needs to match a destination Iceberg table schema. Cache field getters and column converters to optimise RowData conversion.
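To make the idea concrete, here is a minimal sketch of caching field getters keyed by the source RowType; the class name CachingRowDataEvolver and the cache layout are illustrative only, not the PR's exact code:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.types.logical.RowType;

// Illustrative sketch: reuse field getters instead of rebuilding them for every
// record. Keyed by the source RowType, since the target table schema is fixed
// for a given writer.
class CachingRowDataEvolver {

  private final Map<RowType, RowData.FieldGetter[]> getterCache = new ConcurrentHashMap<>();

  RowData.FieldGetter[] fieldGetters(RowType sourceType) {
    return getterCache.computeIfAbsent(sourceType, type -> {
      RowData.FieldGetter[] getters = new RowData.FieldGetter[type.getFieldCount()];
      for (int i = 0; i < getters.length; i++) {
        // RowData.createFieldGetter is part of Flink's public API
        getters[i] = RowData.createFieldGetter(type.getTypeAt(i), i);
      }
      return getters;
    });
  }
}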
Thanks for improving the performance on the conversion write path @aiborodin! It looks like this PR contains two separate changes:
1. Adding caching to the conversion write path
2. Refactoring RowDataEvolver to dynamically instantiate converter classes (quasi code generation)
I wonder if we can do (1) as a first step. RowDataEvolver has been static so far, and I understand it needs to become an object in order to hold the cache, but perhaps we can start with a central RowDataEvolver instance that caches per source and target schema. I'm not sure the code generation yields much of a performance gain, and I would like to minimize the number of objects being created.
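As a rough illustration of that alternative, the sketch below keeps a single evolver instance with one cache entry per source/target schema pair while leaving the conversion logic itself untouched; the class and the string-based cache key are hypothetical simplifications (Iceberg's Schema does not override equals, so a real key would need more care), and FlinkSchemaUtil.convert is assumed to return the Flink RowType, as in the diff below:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.flink.table.types.logical.RowType;
import org.apache.iceberg.Schema;
import org.apache.iceberg.flink.FlinkSchemaUtil;

// Hypothetical central evolver: the Schema -> RowType conversions are computed
// once per (source, target) schema pair and reused for every record.
class CentralRowDataEvolver {

  private static final class ConvertedTypes {
    final RowType sourceType;
    final RowType targetType;

    ConvertedTypes(RowType sourceType, RowType targetType) {
      this.sourceType = sourceType;
      this.targetType = targetType;
    }
  }

  // String key is a simplification; Schema does not implement equals()/hashCode().
  private final Map<String, ConvertedTypes> cache = new ConcurrentHashMap<>();

  ConvertedTypes typesFor(Schema source, Schema target) {
    String key = source.toString() + "->" + target.toString();
    return cache.computeIfAbsent(
        key,
        ignored ->
            new ConvertedTypes(FlinkSchemaUtil.convert(source), FlinkSchemaUtil.convert(target)));
  }
}

The existing static conversion methods could then take the cached RowTypes as parameters instead of recomputing them per record.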
        data.schema(),
        dataSchema ->
            new RowDataConverter(
                FlinkSchemaUtil.convert(dataSchema), FlinkSchemaUtil.convert(schema)));
Have we measured which conversion steps take the most time? Would it suffice to simply cache the source and target schema while retaining the static conversion code? My gut feeling is that the schema conversion is the most expensive step. Apart from caching the schema, the code here creates a series of objects, which adds to the memory footprint.
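On the measurement question, a JMH skeleton along these lines could check whether the per-record schema conversion really dominates; the schema fields and benchmark setup here are made up for illustration:

import org.apache.flink.table.types.logical.RowType;
import org.apache.iceberg.Schema;
import org.apache.iceberg.flink.FlinkSchemaUtil;
import org.apache.iceberg.types.Types;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

// Hypothetical micro-benchmark: isolates the Schema -> RowType conversion that
// RowDataEvolver currently repeats for every input record.
@State(Scope.Thread)
public class SchemaConversionBenchmark {

  private Schema schema;

  @Setup
  public void setup() {
    schema =
        new Schema(
            Types.NestedField.required(1, "id", Types.LongType.get()),
            Types.NestedField.optional(2, "data", Types.StringType.get()));
  }

  @Benchmark
  public RowType convertSchema() {
    return FlinkSchemaUtil.convert(schema);
  }
}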
 */
public class RowDataConverter implements DataConverter {
  private final RowData.FieldGetter[] fieldGetters;
  private final DataConverter[] dataConverters;
I don't quite understand why we need to break apart RowDataEvolver. Could we simply add a cache in RowDataEvolver? I don't think the quasi code generation here leads to much of a performance gain, while it does add to the memory footprint.
  private final DataConverter keyConverter;
  private final DataConverter valueConverter;

  public MapConverter(MapType sourceType, MapType targetType) {
Do we need separate classes we instantiate for every schema?
@Internal
class DynamicRecordProcessor<T> extends ProcessFunction<T, DynamicRecordInternal>
    implements Collector<DynamicRecord> {
  private static final int ROW_DATA_CONVERTER_CACHE_MAXIMUM_SIZE = 1000;
This should be configurable, similarly to the other caches.
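For illustration, one way the limit could be threaded through, assuming a Caffeine-backed cache like the other caches in the sink; the constructor parameter and option plumbing are hypothetical, not the existing configuration surface:

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.util.function.Function;
import org.apache.iceberg.Schema;

// Hypothetical wiring: the converter cache size comes from sink configuration
// instead of the hard-coded ROW_DATA_CONVERTER_CACHE_MAXIMUM_SIZE constant.
// RowDataConverter is the converter class introduced in this PR; keying by the
// source Schema mirrors the cache key used in the diff above.
class ConfigurableConverterCache {

  private final Cache<Schema, RowDataConverter> converters;

  ConfigurableConverterCache(long maximumSize) {
    this.converters = Caffeine.newBuilder().maximumSize(maximumSize).build();
  }

  RowDataConverter get(Schema sourceSchema, Function<Schema, RowDataConverter> loader) {
    return converters.get(sourceSchema, loader);
  }
}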
    new RowDataConverter(
            FlinkSchemaUtil.convert(data.schema()), FlinkSchemaUtil.convert(newData.f0))
        .convert(data.rowData());
There is no caching here, or is there?