Description
Because of the String -> Fixed Binary conversion, both the readers and the writers are incorrect.
The vectorized reader initializes a FixedBinary reader on a column that we report as a String, which causes an unsupported-type exception:
java.lang.UnsupportedOperationException: Unsupported type: UTF8String
at org.apache.iceberg.arrow.vectorized.ArrowVectorAccessor.getUTF8String(ArrowVectorAccessor.java:82)
at org.apache.iceberg.spark.data.vectorized.IcebergArrowColumnVector.getUTF8String(IcebergArrowColumnVector.java:140)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.sort_addToSorter_0$(Unknown Sour
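On the read side, the missing piece is turning the 16 fixed bytes back into the string form that Spark expects. The following is only a minimal sketch using plain JDK classes; `UuidBytesDecoder` and `toUuidString` are hypothetical names, not part of Iceberg or Spark, and the sketch assumes the bytes are laid out most-significant half first, matching the writer proposed in this issue.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper, not an Iceberg or Spark class.
final class UuidBytesDecoder {
  private UuidBytesDecoder() {}

  // Interprets 16 bytes (most-significant long first) as a UUID
  // and returns its canonical lower-case string form.
  static String toUuidString(byte[] fixed16) {
    if (fixed16.length != 16) {
      throw new IllegalArgumentException("Expected 16 bytes, got " + fixed16.length);
    }
    // ByteBuffer is big-endian by default, so the two getLong calls
    // read the high and low halves in storage order.
    ByteBuffer buf = ByteBuffer.wrap(fixed16);
    long mostSignificant = buf.getLong();
    long leastSignificant = buf.getLong();
    return new UUID(mostSignificant, leastSignificant).toString();
  }
}
```

A real fix would presumably wrap the result in a `UTF8String` inside the vector accessor rather than return a `java.lang.String`.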
The writer is broken because it receives String columns from Spark but needs to write fixed binary.
Something like this is needed as a fix:
private static PrimitiveWriter<UTF8String> uuids(ColumnDescriptor desc) {
  return new UUIDWriter(desc);
}

private static class UUIDWriter extends PrimitiveWriter<UTF8String> {
  // Reused for every value to avoid allocating 16 bytes per row.
  private final ByteBuffer buffer = ByteBuffer.allocate(16);

  private UUIDWriter(ColumnDescriptor desc) {
    super(desc);
  }

  @Override
  public void write(int repetitionLevel, UTF8String string) {
    // Spark hands us the UUID as a string; parse it into its two 64-bit halves.
    UUID uuid = UUID.fromString(string.toString());
    buffer.rewind();
    buffer.putLong(uuid.getMostSignificantBits());
    buffer.putLong(uuid.getLeastSignificantBits());
    // Reset the position so all 16 bytes are visible to the column writer.
    buffer.rewind();
    column.writeBinary(repetitionLevel, Binary.fromReusedByteBuffer(buffer));
  }
}
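To check the byte layout independently of Parquet and Spark, the conversion step can be pulled out into a plain JDK helper. This is a rough sketch only: `UuidBytesEncoder` and `toFixed16` are hypothetical names introduced for illustration, and unlike the writer above it allocates a fresh buffer per call instead of reusing one.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper, not an Iceberg or Spark class.
final class UuidBytesEncoder {
  private UuidBytesEncoder() {}

  // Parses a canonical UUID string and returns its 16-byte form,
  // most-significant long first, as the proposed writer does.
  static byte[] toFixed16(String uuidString) {
    UUID uuid = UUID.fromString(uuidString);
    ByteBuffer buf = ByteBuffer.allocate(16);
    buf.putLong(uuid.getMostSignificantBits());
    buf.putLong(uuid.getLeastSignificantBits());
    // allocate() returns an array-backed buffer, so array() is safe here.
    return buf.array();
  }
}
```

For example, `toFixed16("00112233-4455-6677-8899-aabbccddeeff")` yields the bytes `00 11 22 ... ee ff` in that order, so the stored binary sorts the same way as the hex digits of the string.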