Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: Use SQLite to replace AsyncStorage #5309

Merged
merged 11 commits into from
Apr 12, 2022

Conversation

gnprice
Copy link
Member

@gnprice gnprice commented Mar 24, 2022

This adds our own sound, nearly-drop-in replacement for RN's AsyncStorage.

The interface is (at least for now) nearly the same as the upstream AsyncStorage; it skips some methods we don't use, but when you call the methods it does have, it will always satisfy the spec for how the upstream AsyncStorage behaves.

The difference is that it also satisfies a tighter spec: each operation either happens completely or not at all. No operation corrupts the database or has only partial effect, even if the process is killed or encounters I/O errors.

This is accomplished by using SQLite, and doing each operation in a transaction. The upstream AsyncStorage does the same thing on Android; but on iOS, it uses an ad-hoc database which is susceptible to complete corruption if interrupted.

(Somehow I'd thought that the upstream version was unsound even on Android, when it comes to multiSet! But, looking at the implementation again in the past week, it sure seems to use a single transaction; and that's true already in the initial release from 2015.)

So this gives us soundness on both platforms (#4841), which is a big relief and opens up opportunities for more fearless improvements to our schema. It also gives us direct control over the database, which we'll use for migrations and to get more flexibility in the schema (in particular for #5006.)

Fixes: #4841

Copy link
Contributor

@chrisbobbe chrisbobbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!! See a few small comments below, and also a bug I'm seeing on iOS that I've started a CZO thread about: https://chat.zulip.org/#narrow/stream/243-mobile-team/topic/.23M5309.20sound.20storage/near/1352458

@@ -44,6 +44,9 @@ export default class ZulipAsyncStorage {
NativeModules.TextCompressionModule
&& header === NativeModules.TextCompressionModule.header
) {
// TODO: It'd be real nice to do this decompression on the native
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: The native (de)compression is already done on the native side, right, it's just that there's this awkward back-and-forth between native and JS. Maybe clearer to say "call this decompression" instead of "do this decompression"?

@@ -5,6 +5,28 @@ import CompressedAsyncStorage from '../CompressedAsyncStorage';
import * as logging from '../../utils/logging';
import * as eg from '../../__tests__/lib/exampleData';

test('smoke-test all methods, without mock', async () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you mean "without mock", here and in the commit message? The code still runs through node_modules/@react-native-async-storage/async-storage/jest/async-storage-mock.js, which we set up in our jest/jestSetup.js, doesn't it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I guess "without mock" is a better description of how things are at the end of the branch, after switching to the new SQLite-based storage, than it is at this point.

Even at this point in the branch, there is a real difference between these tests and the existing ones: the existing ones are more pure unit tests, inspecting what calls this CompressedAsyncStorage code makes to the underlying layer, while these tests are more integration tests, inspecting the overall results (which reflect the consequences of those calls, albeit simulated by a mock.) I'll reword.

Comment on lines 67 to 68
const row = rows[0];
return row ? row.value : null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Any preference between this and return rows[0]?.value ?? null;?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that happens is:

Warning ----------------------------------------------------------------------------- src/storage/AsyncStorage.js:145:12

This use of optional chaining (`?.`) is unnecessary because `rows[0]` [1] cannot be nullish or because an earlier `?.`
will short-circuit the nullish case. [unnecessary-optional-chain]

   src/storage/AsyncStorage.js:145:12
   145|     return rows[0]?.value ?? null;
                   ^^^^^^^^^^^^^^

References:
   src/storage/AsyncStorage.js:145:12
   145|     return rows[0]?.value ?? null;
                   ^^^^^^^ [1]

I.e., a consequence of Flow's (reasonable) choice to let array accesses unsoundly assume you've separately made sure they're in bounds. So then one needs to suppress that warning and explain.

Given that the root of it is that Flow assumes array accesses are in bounds, perhaps the best solution is to satisfy Flow's expectations and actually check the bounds before making the array access:

    return rows.length > 0 ? rows[0].value : null;

Comment on lines 20 to 25
// But on iOS, it's not so simple. Values over 1024 characters
// (RCTInlineValueThreshold, in RNCAsyncStorage.m) are written as separate
// files. Values up to that threshold go into a single JSON blob (for the
// whole key-value store) that's written as one file. So really our
// sanest path is probably to keep AsyncStorage as a dependency
// indefinitely, and use it to read the old data if the new doesn't exist.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, yeah. Should we make a rough plan for how to approach AsyncStorage version upgrades in future? We'll lose stress-testing in our regular dev work for bugs triggered by upgrading it. Er, and I guess also for bugs triggered by not upgrading it; for example, if our RN version advances such that something breaks in AsyncStorage.

Also, a nit: this file itself is named AsyncStorage, but I think the AsyncStorage you mean here is @react-native-async-storage/async-storage; do you think that needs to be disambiguated?

Copy link
Member Author

@gnprice gnprice Apr 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, a nit: this file itself is named AsyncStorage, but I think the AsyncStorage you mean here is @react-native-async-storage/async-storage; do you think that needs to be disambiguated?

Looks like in a later commit I had the same thought, and made it "the legacy AsyncStorage". I'll backport that bit.


Hm, yeah. Should we make a rough plan for how to approach AsyncStorage version upgrades in future? We'll lose stress-testing in our regular dev work for bugs triggered by upgrading it. Er, and I guess also for bugs triggered by not upgrading it; for example, if our RN version advances such that something breaks in AsyncStorage.

Hmm. Yeah, I agree that this is a potential source of bugs in principle. I think it's not worth attempting to monitor for such a bug, though, for two reasons:

  • The AsyncStorage library has been quite stable. As far as I've seen, there hasn't been any history of introducing bugs that would prevent reading old data.

  • Our degree of dependence on it is going to rapidly diminish. A few weeks after we release with this change, most of our existing users will have upgraded, and then we'll never invoke the legacy AsyncStorage for them again. For new installs, the only thing we'll need from LegacyAsyncStorage is for it to successfully return an empty list from getAllKeys(), which hopefully keeps it pretty unlikely that it would break.

    Then even for the remaining existing users belatedly upgrading over time, our needs are only slightly more than that: in particular purely read-only, never writing to the legacy storage again. Reads are generally a lot lower-risk than writes, so that should also keep the chances low of a bug that would affect us.

In the long term, my plan is:

  async _migrateFromLegacyAsyncStorage(tx) {
    // TODO: It would be nice to reduce the LegacyAsyncStorage dependency to
    //   be read-only -- in particular, for new installs to stop creating an
    //   empty legacy store, which on Android happens just from initializing
    //   the legacy AsyncStorage module.  This will basically mean vendoring
    //   the library into src/third-party/, and then stripping out
    //   everything not needed for read-only use.

but I think we can do that sometime in the future when we feel like reducing dependencies.


describe('AsyncStorage: migration from legacy AsyncStorage', () => {
beforeAll(async () => {
AsyncStorage.devForgetState();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny nit: Since AsyncStorage.devForgetState is marked async, and since we want all this stuff in beforeAll to be done by a certain time, since it's a setup step, I would probably await this, even though it would be NFC with .devForgetState's current implementation. Or not mark .devForgetState as async, if that seems like a good change to its interface.

Same in afterEach, below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I agree. When a function is async / returns a Promise, callers should await it unless they have a specific reason not to.

// AsyncStorage provides its own mock for `.setItem`, but it's
// not set up to simulate failure. So, mock that behavior
// ourselves, and reset to the global mock when we're done.
// Mock `.setItem` to simulate failure, and reset when we're done.
const globalMock = AsyncStorage.setItem;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

storage tests [nfc]: Trim comments about AsyncStorage mocks

Nit: In the commit that follows, globalMock won't be a good name for this, because it won't be a mock.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, do we still need to invoke @react-native-async-storage/async-storage/jest/async-storage-mock in our jest/jestSetup.js at the end of this branch?

…Ah, I guess we do end up depending on that mock code, in the "AsyncStorage: migration from legacy AsyncStorage" tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

…Ah, I guess we do end up depending on that mock code, in the "AsyncStorage: migration from legacy AsyncStorage" tests.

Yeah, exactly.

Nit: In the commit that follows, globalMock won't be a good name for this, because it won't be a mock.

Hmm true. Maybe I'll call it savedMethod.

@chrisbobbe
Copy link
Contributor

[…] and also a bug I'm seeing on iOS that I've started a CZO thread about: https://chat.zulip.org/#narrow/stream/243-mobile-team/topic/.23M5309.20sound.20storage/near/1352458

We've decided to go with a workaround for now.

@gnprice
Copy link
Member Author

gnprice commented Apr 11, 2022

Thanks for the review! Revision pushed with those changes. I also included a commit on top adding a troubleshooting entry with that workaround for the debugging issue.

gnprice and others added 11 commits April 12, 2022 10:45
This name is clearer about what this code actually does.

We'll want to add other "Zulip" customization in the form of
migrations specific to our app-level schema.  The natural place
for those won't be as part of this same module.
Both so all methods have at least minimal coverage, and for a bit of
integration testing.
This should do everything we need, when installed fresh.
Just needs some code to migrate data from the old AsyncStorage;
we'll add that shortly, before we switch to actually using this.
This sets us up for using constructor arguments to customize
things like the set of migrations.
Hmm, somehow I'd thought that the upstream version was unsound
even on Android when it comes to multiSet!  But, looking at the
implementation now, it sure seems to use a single transaction; and
that's true already in the initial release from 2015.

Anyway, this gives us soundness on both platforms, which is a big
relief and opens up opportunities for more fearless improvements to
our schema.  It also gives us direct control over the database, which
we'll use for migrations and to get more flexibility in the schema.
This is equivalent as long as we're using the upstream AsyncStorage,
where these are just static functions and not methods.  But with our
replacement, as its jsdoc explains, they're actual methods and it
will be necessary to invoke them as such in order to have the
appropriate `this`.
As database migrations typically do, this requires tracking whether
the migration has already been run.  We do that with a scheme that
will naturally extend to handle app-level migrations in the future.
We're about to switch to our own AsyncStorage replacement.  That
will make many of the specifics in these comments no longer true --
it doesn't provide mocks for these, instead it runs the actual code
-- but the upshot for what we want to mock in these tests remains
the same.

So, shorten these comments to what we'll want them to be after that
change.  The resulting comments are all accurate already, too.

Similarly, give a less specific name to the locals where we save the
old values of these methods before mocking them for these tests.
One test case changes: `setItem` now returns void instead of null.

I'm not sure if the upstream AsyncStorage really returns null here,
or if it too returns void and just the mock is off.

Anyway void is clearly the natural thing here -- it's basically a
function that returns no information -- so keep returning that
ourselves, and update the test.  Real callers should never care.

Fixes: zulip#4841
@chrisbobbe chrisbobbe merged commit d017a05 into zulip:main Apr 12, 2022
@chrisbobbe
Copy link
Contributor

Thanks, LGTM! Merged.

@gnprice gnprice deleted the pr-sqlite-storage branch April 12, 2022 18:56
chrisbobbe added a commit to chrisbobbe/zulip-mobile that referenced this pull request Apr 15, 2024
…test)

Changelog:
  https://github.com/react-native-async-storage/async-storage/releases

This new version has a "privacy manifest", which should help us
toward zulip#5847.

For why this upgrade is unlikely to break things, see:
  zulip#5309 (comment)

For why the version was constrained with ~ instead of ^ see
47b7ead, an Expo SDK upgrade:

  [T]he `expo upgrade` command
  went through a phase it described as "Updating packages to
  compatible versions (where known)," and that changed several of our
  dependency ranges, even of non-Expo things.

It would actually be better to remove the dependency entirely. It's
our legacy way of storing data on the device, and we have a
migration to the new way that's now been active for over two years
(see caf3bf9). The proportion of our users who still benefit from
what this dependency offers will be vanishingly small at this point.

But I'm not totally comfortable removing it without more careful
thought than we have time for right now in this legacy codebase.
(One step we'd want to consider, if doing that, is adding and
releasing a migration to clear out the contents of the legacy
storage.)

Tested on my iPhone and on the office Android device, on both a
first and a subsequent startup of the app. No crashes or errors
observed.

Related: zulip#5847
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Store persistent data in a sound way
2 participants