Spark: Cannot read or write UUID columns #4581

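Iceberg stores UUID values in Parquet as 16-byte fixed-length binary, but exposes them to Spark as strings. Because of this String -> fixed binary conversion, both the readers and the writers are incorrect.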
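To make the mismatch concrete, here is a minimal plain-JDK sketch of the two shapes a single UUID takes. It uses no Iceberg or Spark classes. `UuidShapes` and its methods are hypothetical names for illustration. The 16-byte layout assumes the most-significant 64 bits are written first, which matches the `UUIDWriter` snippet in this issue.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper for illustration only: the two forms of one UUID.
final class UuidShapes {
  private UuidShapes() {}

  // What Spark hands the writer: the canonical 36-character string form.
  static String sparkForm(UUID uuid) {
    return uuid.toString();
  }

  // What ends up in the Parquet file: 16 bytes, most significant 64 bits
  // first, then least significant 64 bits (ByteBuffer defaults to big-endian).
  static byte[] parquetForm(UUID uuid) {
    return ByteBuffer.allocate(16)
        .putLong(uuid.getMostSignificantBits())
        .putLong(uuid.getLeastSignificantBits())
        .array();
  }
}
```

For `123e4567-e89b-12d3-a456-426614174000`, the Spark form is a 36-character string, while the Parquet form is 16 bytes starting with `0x12 0x3e`. A reader or writer that assumes one form while the column holds the other fails in the ways described in this issue.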

The vectorized reader initializes a fixed-binary reader for a column that we report to Spark as a string. Spark then calls `getUTF8String` on it, and the accessor throws an unsupported-type exception:

```java
java.lang.UnsupportedOperationException: Unsupported type: UTF8String
	at org.apache.iceberg.arrow.vectorized.ArrowVectorAccessor.getUTF8String(ArrowVectorAccessor.java:82)
	at org.apache.iceberg.spark.data.vectorized.IcebergArrowColumnVector.getUTF8String(IcebergArrowColumnVector.java:140)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.sort_addToSorter_0$(Unknown Sour
```
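On the read side, the missing piece is the reverse conversion: turning the 16 fixed bytes back into the string Spark expects from `getUTF8String`. Below is a minimal plain-JDK sketch that assumes the same layout, with the most-significant bits first. `UuidFixedDecoder` is a hypothetical name. Wiring it into the Arrow accessor is not shown; one option would be to wrap the result with Spark's `UTF8String.fromString`.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper: decodes Iceberg's 16-byte UUID encoding to text.
final class UuidFixedDecoder {
  private UuidFixedDecoder() {}

  static String decode(byte[] fixed) {
    // Fail fast if the column is not the expected fixed(16) encoding.
    if (fixed.length != 16) {
      throw new IllegalArgumentException("Expected 16 bytes, got " + fixed.length);
    }
    ByteBuffer buf = ByteBuffer.wrap(fixed); // big-endian by default
    long msb = buf.getLong();                // bytes 0-7
    long lsb = buf.getLong();                // bytes 8-15
    return new UUID(msb, lsb).toString();    // canonical lowercase form
  }
}
```

Checking the length up front means a misrouted column produces a clear error instead of a silently wrong UUID.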

The writer is broken because it receives string values (`UTF8String`) from Spark but must write 16-byte fixed binary.

Something like this is needed as a fix:
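```java
import java.nio.ByteBuffer;
import java.util.UUID;
import org.apache.iceberg.parquet.ParquetValueWriters.PrimitiveWriter;
import org.apache.parquet.column.ColumnDescriptor;
import org.apache.parquet.io.api.Binary;
import org.apache.spark.unsafe.types.UTF8String;

  // Members of the class that builds the Spark Parquet value writers.
  private static PrimitiveWriter<UTF8String> uuids(ColumnDescriptor desc) {
    return new UUIDWriter(desc);
  }

  // Converts Spark's string form of a UUID into the 16-byte fixed binary
  // that Iceberg stores in Parquet.
  private static class UUIDWriter extends PrimitiveWriter<UTF8String> {
    // Reused for every value; ByteBuffer.allocate is big-endian by default.
    private final ByteBuffer buffer = ByteBuffer.allocate(16);

    private UUIDWriter(ColumnDescriptor desc) {
      super(desc);
    }

    @Override
    public void write(int repetitionLevel, UTF8String string) {
      UUID uuid = UUID.fromString(string.toString());
      buffer.rewind();
      buffer.putLong(uuid.getMostSignificantBits());  // bytes 0-7
      buffer.putLong(uuid.getLeastSignificantBits()); // bytes 8-15
      buffer.rewind();
      // "Reused" tells Parquet the backing bytes will change after this call,
      // so it can copy them if it needs to keep the value.
      column.writeBinary(repetitionLevel, Binary.fromReusedByteBuffer(buffer));
    }
  }
```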
