
Some INSERT statements are ignored with Iceberg #20092

Closed
ricardopereira33 opened this issue Dec 13, 2023 · 7 comments · Fixed by #20207
Labels
bug Something isn't working correctness iceberg Iceberg connector RELEASE-BLOCKER

Comments


ricardopereira33 commented Dec 13, 2023

We noticed a strange issue with Trino version 434 (the latest). Some of our users run many `INSERT INTO ...` statements against an Iceberg table with Trino. After we upgraded from version 428 to 434, it seems that some inserts are silently ignored. The insert actually writes the data file to object storage and commits a new snapshot reporting that it added X records (the stats in the Trino UI show the same numbers). However, when we query the table, none of those records are returned. We also tried querying the Iceberg table with Spark, and it returns nothing either.

[Three screenshots from 2023-12-11 showing the Trino UI stats for the affected inserts]

We analysed the metadata and manifest files and found the following:

Up to sequence number 15934, everything was fine. Then two inserts ran almost at the same time and committed two new snapshots:

29819-ccca636b-3279-4fe2-a733-bc3eed379312.metadata.json  03:56:15
29819-982db761-e158-42ee-8b5e-2c2ac92681ac.metadata.json  03:56:12

Normally the metadata file names follow a pattern where the numeric prefix increments with every new snapshot, but in this case both files share the same prefix (we are not sure whether that is expected). In any case, if we open both metadata files, we can see the references to the manifest lists (Avro files), and the later metadata file includes the snapshot from the earlier one, so the concurrent commits appear to have been reconciled.
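To illustrate the observation above (a hypothetical helper, not Trino or Iceberg code): extracting the numeric prefix from the two metadata file names shows that both carry version 29819, which suggests the two commits were produced against the same base metadata version.

```python
def metadata_version(filename: str) -> int:
    """Extract the leading version number from an Iceberg metadata file name
    of the form '<version>-<uuid>.metadata.json'."""
    return int(filename.split("-", 1)[0])

files = [
    "29819-ccca636b-3279-4fe2-a733-bc3eed379312.metadata.json",
    "29819-982db761-e158-42ee-8b5e-2c2ac92681ac.metadata.json",
]

versions = [metadata_version(f) for f in files]
print(versions)            # [29819, 29819]
print(len(set(versions)))  # 1 -> both commits share the same base version
```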

  • Last snapshot entry in 29819-982db761-e158-42ee-8b5e-2c2ac92681ac.metadata.json:
 {
    "sequence-number" : 15935,
    "snapshot-id" : 2349488274696194790,
    "parent-snapshot-id" : 4184787990992439919,
    "timestamp-ms" : 1702007762898,
    "summary" : {
      "operation" : "append",
      "trino_query_id" : "20231208_035545_04129_jywup",
      "added-data-files" : "1",
      "added-records" : "14",
      "added-files-size" : "14047",
      "changed-partition-count" : "1",
      "total-records" : "2670050571",
      "total-files-size" : "211155960909",
      "total-data-files" : "36273",
      "total-delete-files" : "1616",
      "total-position-deletes" : "5429420",
      "total-equality-deletes" : "0"
    },
    "manifest-list" : "gs://...b2a6894c12fb.avro",
    "schema-id" : 3
  }
  • Last two snapshot entries in 29819-ccca636b-3279-4fe2-a733-bc3eed379312.metadata.json:
{
    "sequence-number" : 15935,
    "snapshot-id" : 2349488274696194790,
    "parent-snapshot-id" : 4184787990992439919,
    "timestamp-ms" : 1702007762898,
    "summary" : {
      "operation" : "append",
      "trino_query_id" : "20231208_035545_04129_jywup",
      "added-data-files" : "1",
      "added-records" : "14",
      "added-files-size" : "14047",
      "changed-partition-count" : "1",
      "total-records" : "2670050571",
      "total-files-size" : "211155960909",
      "total-data-files" : "36273",
      "total-delete-files" : "1616",
      "total-position-deletes" : "5429420",
      "total-equality-deletes" : "0"
    },
    "manifest-list" : "gs://...b2a6894c12fb.avro",
    "schema-id" : 3
},
{
    "sequence-number" : 15936,
    "snapshot-id" : 6897514136146417222,
    "parent-snapshot-id" : 2349488274696194790,
    "timestamp-ms" : 1702007773563,
    "summary" : {
      "operation" : "append",
      "trino_query_id" : "20231208_035556_04136_jywup",
      "added-data-files" : "1",
      "added-records" : "6",
      "added-files-size" : "14039",
      "changed-partition-count" : "1",
      "total-records" : "2670050577",
      "total-files-size" : "211155974948",
      "total-data-files" : "36274",
      "total-delete-files" : "1616",
      "total-position-deletes" : "5429420",
      "total-equality-deletes" : "0"
    },
    "manifest-list" : "gs://...5cfc3f50c6c9.avro",
    "schema-id" : 3
  } ],
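As a side note, the snapshot summaries quoted above are internally consistent: the running total grows by exactly the number of added records, so the metadata itself claims the insert succeeded. A small consistency check (a sketch over the numbers above, not Iceberg code):

```python
# Snapshot summaries copied from the metadata excerpts above.
snapshots = [
    {"sequence-number": 15935, "added-records": 14, "total-records": 2_670_050_571},
    {"sequence-number": 15936, "added-records": 6,  "total-records": 2_670_050_577},
]

# Each snapshot's total-records should grow by exactly its added-records.
for prev, curr in zip(snapshots, snapshots[1:]):
    delta = curr["total-records"] - prev["total-records"]
    assert delta == curr["added-records"], (prev, curr)
    print(f"snapshot {curr['sequence-number']}: total grew by {delta}, as reported")
```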

When we inspected the Avro manifest-list files, we could see the problem. The commit before these two inserts has the following information for sequence number 15934:

{'avro.schema': ..., 
'avro.codec': b'deflate', 
'snapshot-id': b'4184787990992439919', 
'format-version': b'2', 
'sequence-number': b'15934', 
'iceberg.schema': ...
'parent-snapshot-id': b'7860648970688177621'}
{'manifest_path': 'gs://...954ea5f5a0a0-m0.avro', 'manifest_length': 13879, 'sequence_number': 15934, 'min_sequence_number': 15934, 'added_snapshot_id': 4184787990992439919, ...}

However, for the two new commits, each Avro manifest list contains this:

  • gs://...b2a6894c12fb.avro:
{'avro.schema': ..., 
'avro.codec': b'deflate', 
'snapshot-id': b'2349488274696194790', 
'format-version': b'2', 
'sequence-number': b'15935', 
'iceberg.schema': ..., 
'parent-snapshot-id': b'4184787990992439919'}
{'manifest_path': 'gs://...954ea5f5a0a0-m0.avro', 'manifest_length': 13879, 'sequence_number': 15934, 'min_sequence_number': 15934, 'added_snapshot_id': 4184787990992439919, ...}
  • gs://...5cfc3f50c6c9.avro:
{'avro.schema': ..., 
    'avro.codec': b'deflate', 
    'snapshot-id': b'6897514136146417222', 
    'format-version': b'2', 
    'sequence-number': b'15936', 
    'iceberg.schema': ..., 
    'parent-snapshot-id': b'2349488274696194790'
}
{'manifest_path': 'gs://...954ea5f5a0a0-m0.avro', 'manifest_length': 13879, 'sequence_number': 15934, 'min_sequence_number': 15934, 'added_snapshot_id': 4184787990992439919, ...}

For some reason, no new -mX.avro manifest file is referenced for these inserts…
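The difference between a healthy append and what we observed can be sketched as a toy model (pure illustration, no Iceberg APIs; the new manifest name is hypothetical): on a correct append, the new snapshot's manifest list should contain the parent's manifests plus one new -mX.avro manifest for the freshly written data file, but here both new manifest lists contain only the parent's manifest.

```python
# Manifest list of the parent snapshot (sequence 15934), as quoted above.
parent_manifests = ["gs://...954ea5f5a0a0-m0.avro"]

def expected_append(parent: list, new_manifest: str) -> list:
    """On a correct append, the child's manifest list is the parent's
    manifests plus one new manifest for the appended data file."""
    return parent + [new_manifest]

# What a healthy append for sequence 15935 would look like
# (the new manifest name is hypothetical):
expected = expected_append(parent_manifests, "gs://...<new-uuid>-m0.avro")

# What we actually found in both gs://...b2a6894c12fb.avro and
# gs://...5cfc3f50c6c9.avro: only the parent's manifest, nothing new.
observed = ["gs://...954ea5f5a0a0-m0.avro"]

print(len(expected) - len(observed))  # 1 -> one manifest (and its data file) is lost
```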

@findinpath findinpath added bug Something isn't working correctness iceberg Iceberg connector labels Dec 13, 2023
@findinpath
Contributor

Slack discussion:

https://trinodb.slack.com/archives/CJ6UC075E/p1702376393926019

Contributor

findinpath commented Dec 13, 2023

Snapshot IDs sequence for the table:
7860648970688177621 -> 4184787990992439919 -> 2349488274696194790 -> 6897514136146417222

The snapshots 2349488274696194790 and 6897514136146417222 have:

  • its own, monotonically ascending sequence number ✔️
  • its own manifest list file ✔️
  • nearly identical manifest list contents ✖️

Only the header changed, carrying the new sequence number and snapshot IDs:

{'avro.schema': ..., 
'avro.codec': b'deflate', 
'snapshot-id': b'2349488274696194790', 
'format-version': b'2', 
'sequence-number': b'15935', 
'iceberg.schema': ..., 
'parent-snapshot-id': b'4184787990992439919'}

Relevant Trino code:

AppendFiles appendFiles = isMergeManifestsOnWrite(session) ? transaction.newAppend() : transaction.newFastAppend();
ImmutableSet.Builder<String> writtenFiles = ImmutableSet.builder();
for (CommitTaskData task : commitTasks) {
    DataFiles.Builder builder = DataFiles.builder(icebergTable.spec())
            .withPath(task.getPath())
            .withFileSizeInBytes(task.getFileSizeInBytes())
            .withFormat(table.getFileFormat().toIceberg())
            .withMetrics(task.getMetrics().metrics());
    if (!icebergTable.spec().fields().isEmpty()) {
        String partitionDataJson = task.getPartitionDataJson()
                .orElseThrow(() -> new VerifyException("No partition data for partitioned table"));
        builder.withPartition(PartitionData.fromJson(partitionDataJson, partitionColumnTypes));
    }
    appendFiles.appendFile(builder.build());
    writtenFiles.add(task.getPath());
}
// try to leave as little garbage as possible behind
if (table.getRetryMode() != NO_RETRIES) {
    cleanExtraOutputFiles(session, writtenFiles.build());
}
commit(appendFiles, session);
transaction.commitTransaction();

Relevant Iceberg code:
https://github.com/apache/iceberg/blob/3112ec91617ef080604f47ac92c255b517458522/core/src/main/java/org/apache/iceberg/SnapshotProducer.java#L226-L244

A relevant piece of information: for the INSERT whose data file is not reflected in the table, we see:

      "added-data-files" : "1",
      "added-records" : "6",

which means that there actually were newDataFiles in the snapshot producer.

https://github.com/apache/iceberg/blob/7240752e18278b3454be141c2e4b121079004d8b/core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java#L933-L935

However, the manifest file corresponding to the new data file is missing 🙄


yoniiny commented Dec 14, 2023

@findinpath I think it's probably a regression introduced in Iceberg 1.4. If it's the same issue, here's the relevant PR that should fix it: apache/iceberg#9230

Member

findepi commented Dec 18, 2023

@alexjo2144 posted a temporary fix for Trino -- #20159
thank you Alex!

Member

findepi commented Dec 19, 2023

It's disturbing that io.trino.testing.BaseConnectorTest#testInsertRowConcurrently didn't catch the problem.
I would expect it to catch it, assuming no special setup is needed for a repro.
@yoniiny any idea what we were missing in this test?

Member

findepi commented Dec 20, 2023

#20159 would fix the problem, but the current plan is to have a better fix (#20159 (comment)).

@findinpath
Contributor

#20207 should address the reported issue.

4 participants