Specify the exception for read failures arising due to partial data loss on the file system #423

abhishek-shanthkumar · 2024-09-23T12:12:26Z

IndexedDB data stored on the file system can be lost due to factors beyond the browser's control, such as when users directly delete the files containing the data.
Total data loss of the database or object store is visible to the script as an absence of data, and can be appropriately handled by it.
However, since the current specification does not mandate consistent data storage, there is a chance of partial data loss where the entry for a record exists in the object store but the contents of the record are lost.
For example, both Chromium and Firefox store blobs and large values in files separate from the main database, which only contains references to these files.
When those files are deleted, the references in the database become dangling and attempts to read the data fail permanently.

The specification does not cover the above scenario, and hence the DOMException thrown varies by browser:

Chromium:
- name: NotFoundError (updated recently)
- message: "Failed to read large IndexedDB value"
Firefox:
- name: UnknownError
- message: "The operation failed for reasons unrelated to the database itself and not covered by any other error code."

Specifying a common DOMException for such cases of partial data loss will help developers handle these scenarios consistently across browsers.

The text was updated successfully, but these errors were encountered:

abhishek-shanthkumar · 2024-09-23T12:18:09Z

It is important to note that a failure to read data stored on the disk can happen due to transient, recoverable reasons as well (low memory, lack of file descriptors, etc.). We can decide whether to specify an error for these scenarios too or to leave it to the implementer, but we should intend to distinguish between the recoverable and unrecoverable cases so that developers can handle the two cases appropriately. More details here - https://chromestatus.com/feature/5140210640486400

abhishek-shanthkumar · 2024-09-23T12:20:39Z

CC @asutherland, @domenic, @inexorabletash, @SteveBeckerMSFT

evanstade · 2024-09-23T18:40:51Z

According to the WebIDL spec UnknownError should be reserved for transient problems. We should probably update the IDB spec to reflect that distinction.

It was pointed out elsewhere that NotFoundError is usually used (in IDB) for cases where the most likely source of the error is the usage of the API, i.e. an index or object store is missing because it was never added.

I'd propose pulling NotReadableError into the IDB spec, which is described by WebIDL as "The I/O read operation failed." (No mention of transient or permanent.) The message text, AFAICT, is left to the implementation; in Chromium we'd likely add some more color such as "A file is missing and the value is irrecoverable. The record should be deleted or updated". Transient read errors should probably be reported as UnknownError (as FF already does -- Chromium uses DataError which seems not correct.)

asutherland · 2024-09-23T22:44:58Z

I agree that it's worth standardizing the error for cases where it's reported, and NotReadableError seems appropriately descriptive.

That said, it's been our belief in Firefox that unless we spec some kind of content opt-in to avoid the behavior, data corruption should result in the containing storage bucket being cleared. Multiple storage buckets seems like the most practical approach since it normalizes a hopefully uncommon case (data corruption) into a common case that code should already be handling (data clearing due to storage pressure).

Related discussions: Storage corruption reporting.

abhishek-shanthkumar · 2024-09-25T11:18:46Z

Thanks, Evan, Andrew. Let me work on a spec PR to:

Specify in the description of UnknownError that it should be used for transient reasons.
Add an entry for NotReadableError under https://w3c.github.io/IndexedDB/#exceptions.
Add an additional step under all applicable retrieval steps that says some version of If the stored value is not readable due to a permanent error (such as data loss), throw a "NotReadableError" DOMException.

Thoughts? Should we specify the conditions for throwing a NotReadableError in a common place instead of for individual methods?

data corruption should result in the containing storage bucket being cleared

We discussed this briefly for Chromium too. Yes, deleting the storage bucket entirely would leave things in as consistent a state as possible, but it is a destructive action whose impact could vary based on how the website is using IndexedDB. Hence, it seemed better to surface the error clearly to the website so that it could take the right corrective action for its use case. We could discuss this further, though, and also perhaps solicit developer feedback.

asutherland · 2024-09-26T00:35:36Z

Thoughts? Should we specify the conditions for throwing a NotReadableError in a common place instead of for individual methods?

This seems good, thank you. I don't have a strong opinion other than, if there aren't many cases of needing the verbiage to throw the error and the verbiage isn't too verbose, having them in-line as opposed to in an extracted algorithm makes it more likely people will see that that can happen?

data corruption should result in the containing storage bucket being cleared

We discussed this briefly for Chromium too. Yes, deleting the storage bucket entirely would leave things in as consistent a state as possible, but it is a destructive action whose impact could vary based on how the website is using IndexedDB. Hence, it seemed better to surface the error clearly to the website so that it could take the right corrective action for its use case. We could discuss this further, though, and also perhaps solicit developer feedback.

Yes, it would be good to discuss this further, and it need not be on this issue.

I agree with the rationale that if a site can take corrective action, it would be good to enable the site to take corrective action. Unfortunately we've found many users of IDB don't really handle errors, so it would be desirable to have some kind of strong signal that the site is taking responsibility for dealing with the error. But I suppose that could be as simple as "the default behavior of a NotReadableError is clearing your bucket, so call preventDefault if you think you can clean things up".

A meta issue for us is that right now, for non-Private Browsing Mode (PBM), we really only have SQLite's concept of detecting corruption and structured deserialization errors from JS, so if corruption happens, it's very hard to tell what still might be good. For PBM we actually use AEAD encryption on a database page level and blob chunk basis so we potentially have a much better idea of what is good and what is bad, and we might potentially move to using AEAD for non-PBM cases to that end in order to be able to better characterize the integrity of data.

abhishek-shanthkumar · 2024-10-16T07:54:00Z

@asutherland, I've created #431 to discuss further on whether we should spec deletion of site data on corruption.

abhishek-shanthkumar mentioned this issue Oct 8, 2024

Specify the error code for failures when reading a value from disk #430

Merged

5 tasks

inexorabletash closed this as completed in #430 Oct 15, 2024

abhishek-shanthkumar mentioned this issue Oct 15, 2024

Delete entire site data on data corruption/loss? #431

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Specify the exception for read failures arising due to partial data loss on the file system #423

Specify the exception for read failures arising due to partial data loss on the file system #423

abhishek-shanthkumar commented Sep 23, 2024

abhishek-shanthkumar commented Sep 23, 2024

abhishek-shanthkumar commented Sep 23, 2024

evanstade commented Sep 23, 2024 •

edited

Loading

asutherland commented Sep 23, 2024

abhishek-shanthkumar commented Sep 25, 2024

asutherland commented Sep 26, 2024

abhishek-shanthkumar commented Oct 16, 2024

Specify the exception for read failures arising due to partial data loss on the file system #423

Specify the exception for read failures arising due to partial data loss on the file system #423

Comments

abhishek-shanthkumar commented Sep 23, 2024

abhishek-shanthkumar commented Sep 23, 2024

abhishek-shanthkumar commented Sep 23, 2024

evanstade commented Sep 23, 2024 • edited Loading

asutherland commented Sep 23, 2024

abhishek-shanthkumar commented Sep 25, 2024

asutherland commented Sep 26, 2024

abhishek-shanthkumar commented Oct 16, 2024

evanstade commented Sep 23, 2024 •

edited

Loading