Add some additional details to the pitfalls doc #509

Open · wants to merge 2 commits into main

22 changes: 18 additions & 4 deletions packages/apollo-forest-run/docs/normalization-pitfalls.md
@@ -151,8 +151,8 @@ with all the updates that came from different sources (mutations, subscriptions,
3. Reading data from the `store` to return it back to the requester in the shape of the original GraphQL result

But there is more... Clients also keep track of "active" queries and whenever data in the store changes,
-they essentially re-read those active queries, using the data from the store and notify any query observers
-about the change.
+they essentially re-read those active queries, using the data from the store and notify any query observers
+about the change. The granularity of re-reading (i.e. at the query level vs. the fragment level) has different [performance implications][15].
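
To make that loop concrete, here is a minimal TypeScript sketch of query-level re-reading. All names (`MiniStore`, `ActiveQuery`) are illustrative, not any client's actual API; real clients add result diffing and fragment-level granularity on top of this.

```ts
// Minimal sketch of the "re-read active queries and notify observers" loop.
// All names are illustrative; this is not any client's actual API.
type StoreData = Map<string, unknown>;

interface ActiveQuery<T = unknown> {
  read: (store: StoreData) => T; // re-reads this query from the store
  lastResult?: T;
  observers: Set<(result: T) => void>;
}

class MiniStore {
  private data: StoreData = new Map();
  private activeQueries = new Set<ActiveQuery>();

  watch(query: ActiveQuery, observer: (result: unknown) => void): void {
    query.observers.add(observer);
    this.activeQueries.add(query);
  }

  write(key: string, value: unknown): void {
    this.data.set(key, value);
    // Query-level granularity: after every write, EVERY active query is
    // re-read in full, and observers are notified when the result changed.
    for (const query of this.activeQueries) {
      const next = query.read(this.data);
      if (next !== query.lastResult) {
        query.lastResult = next;
        query.observers.forEach((notify) => notify(next));
      }
    }
  }
}
```

Fragment-level granularity would instead track which fragments touch which entities and re-read only those, which is where the performance trade-offs linked above come from.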

Read more:

@@ -169,6 +169,12 @@ Also, _in theory_, normalized caches allow you to reduce the number of network requests:
if the data requested by your query was already fetched by another query,
the client can just retrieve it from the cache and bypass the network request.
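
For example, with Apollo Client's default `cache-first` fetch policy (shown here with a placeholder endpoint and schema), a query whose fields are all already in the normalized cache resolves without a network round-trip:

```ts
import { ApolloClient, InMemoryCache, gql } from "@apollo/client";

const client = new ApolloClient({
  uri: "https://example.com/graphql", // placeholder endpoint
  cache: new InMemoryCache(),
});

const USER_NAME = gql`
  query UserName($id: ID!) {
    user(id: $id) {
      id
      name
    }
  }
`;

// If a previous query (e.g. a list query) already put `User:42` with its
// `id` and `name` into the cache, this resolves from the cache and no
// request is sent; otherwise it falls through to the network.
const result = await client.query({
  query: USER_NAME,
  variables: { id: "42" },
  fetchPolicy: "cache-first",
});
```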

+However, note that _even if you cannot avoid a network request_, a normalized cache that integrates
+with suspense (such as Relay's) makes it trivial to achieve the following behavior:
+when a user navigates from a list view to a detail view, the transition can occur instantly,
+and the components for which there is enough data (such as the detail view header) can render
+immediately, while the rest of the detail view is replaced by a suspense boundary.
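
A sketch of that behavior in React. The two hooks are hypothetical stand-ins for a suspense-integrated cache binding (in the spirit of Relay's `useFragment`/`usePreloadedQuery`), not a specific client's API:

```tsx
import { Suspense } from "react";

// Hypothetical hooks, for illustration only:
// - useHeaderFragment reads fields the list view already fetched (cache hit)
// - useDetailQuery suspends until the remaining detail data arrives
declare function useHeaderFragment(id: string): { name: string };
declare function useDetailQuery(id: string): { bio: string };

function DetailHeader({ id }: { id: string }) {
  const { name } = useHeaderFragment(id); // fulfilled from the cache
  return <h1>{name}</h1>; // renders on the very first paint
}

function DetailBody({ id }: { id: string }) {
  const { bio } = useDetailQuery(id); // suspends while fetching
  return <p>{bio}</p>;
}

export function DetailView({ id }: { id: string }) {
  return (
    <>
      <DetailHeader id={id} />
      <Suspense fallback={<span>Loading…</span>}>
        <DetailBody id={id} />
      </Suspense>
    </>
  );
}
```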
Comment on lines +172 to +176

Contributor

I feel this section doesn't belong here - it should probably go under Example 1. Missing fields.

We should also clarify that this is only possible with UI libraries like React that support this kind of rendering interruption (not sure how it would work with something like Vue or Angular).

Also, I'd argue that the key requirement for this behavior is "fragment-level" binding between components and data, not normalization per se. As long as the cache can return data for an individual fragment, this behavior will work (normalization is one possible implementation, but not the only one).
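
Apollo's `InMemoryCache` exposes exactly such a fragment-level read; the fragment and field names below are made up for illustration:

```ts
import { InMemoryCache, gql } from "@apollo/client";

const cache = new InMemoryCache();

// Fragment-level read: returns the data for one fragment on one entity,
// regardless of which query originally fetched it (or null on a miss).
const header = cache.readFragment({
  id: "User:42", // default normalized id: `${__typename}:${id}`
  fragment: gql`
    fragment DetailHeader on User {
      name
      avatarUrl
    }
  `,
});
```

Any cache that can answer this call — normalized or not — can support the instant-transition behavior; normalization is just one way to index the data for it.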


However, normalization in GraphQL clients comes at a cost, and also has surprising pitfalls that we will cover below.

## Performance costs of normalization
@@ -575,18 +581,24 @@ keep results in sync).

Potential benefits:

-- Much simpler architecture with way less data transformations
+- Much simpler architecture with far fewer data transformations.
- Natural garbage collection (dispose the whole query and all associated metadata when not needed)
-- `O(1)` `read` complexity (simply returns the object that is already up-to-date)
+- `O(1)` `read` complexity (simply returns the object that is already up-to-date). Memoization works out of the box,
+  and there is no need for subscriptions (see the sketch after this list).
- `write` performance comparable to `normalization` due to the need to synchronize different results
(although much better for initial writes and writes that do not affect other results)
- Stale data still available when consistency cannot be achieved by syncing with other operations
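
An illustrative sketch of the read path of such a document cache (assumed names, not ForestRun's actual implementation):

```ts
// Sketch of a document (per-operation) cache: results are stored whole,
// keyed by operation, and kept in sync on write. Names are illustrative.
type OperationKey = string; // e.g. query name + serialized variables

class DocumentCache<TResult = unknown> {
  private results = new Map<OperationKey, TResult>();

  // O(1) read: returns the stored object, which is already up to date,
  // so the stable reference memoizes naturally in the UI layer.
  read(key: OperationKey): TResult | undefined {
    return this.results.get(key);
  }

  write(key: OperationKey, result: TResult): void {
    this.results.set(key, result);
    // The expensive part lives here: syncing overlapping data into the
    // other cached results (comparable in cost to normalization).
  }

  // Natural garbage collection: dropping a query drops all of its data.
  evict(key: OperationKey): void {
    this.results.delete(key);
  }
}
```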

Limitations:

- No cross-query data reads. `read` only allowed for queries that were written previously (+reads for subsets of queries).
+- Relatedly, no cross-query consistency, meaning that if a user modifies data on a given page and navigates back to a previous
+  page, the data on the previous page will be inconsistent. This adds a burden on developers to invalidate data in response to mutations.
Comment on lines +595 to +596

Contributor

I think this note misses the point of this experimental cache. See the beginning of this section:

> query results are stored separately and are kept in sync on write without intermediate normalized store

Our experimental document-based cache keeps all operations up-to-date on incoming operations automatically. We just don't use full data normalization/re-reads for this, but instead use normalized diffs for incoming data, which are applied to all cached operations (currently eagerly, but they could also be applied lazily on reads).

So this example is not quite correct. Admittedly, the implementation turned out to be even more complicated in some aspects (incremental and normalized diffing/updating), but simpler in others (e.g. garbage collection is still rather trivial).

- Potentially higher memory consumption as strings are not de-duplicated from source JSON
(could be mitigated with background compaction, also less important with natural GC)
+- One obviously cannot get [data masking][14]
+  without introducing a step in which data is read from a source, so the simplicity gains may not be as drastic as assumed.
+- Inability to reuse data, meaning that transitions from list views to detailed views cannot be perceptually instant.
Comment on lines +599 to +601

Contributor

Depending on the strictness of the data-masking definition, I guess. Type-level data masking is possible today with any client via type generation (e.g. with graphql codegen).

Full runtime data masking indeed requires additional filtering of data, but I wonder how much value `Exact<FragmentType>` vs `FragmentType` actually provides, as long as types guard us against field misuse. A big one is probably a higher memoization hit rate for `React.memo`, but for performance-critical parts one could rely on fragment aliases to isolate frequently changing data (sadly not part of the spec yet).
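
A sketch of the type-level masking mentioned here, using hypothetical helper types loosely modeled on codegen's `FragmentType` pattern:

```ts
// Hypothetical fragment-ref branding, for illustration only.
declare const fragmentBrand: unique symbol;
type FragmentRef<TData> = { [fragmentBrand]: TData };

type UserHeaderFragment = { readonly name: string };

// Unmasking helper: resolves a branded ref to its fragment's data.
declare function readFragmentData<TData>(ref: FragmentRef<TData>): TData;

function renderUserHeader(user: FragmentRef<UserHeaderFragment>): string {
  const data = readFragmentData(user);
  return data.name; // OK
  // return data.email; // type error: `email` is not in this fragment,
  // even if the runtime object happens to carry it (no runtime filtering).
}
```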

I also don't see conceptual blockers for this example with a document-based cache:

> transitions from list views to detailed views cannot be perceptually instant

(in our case this requires either using the same fragment in the list/item view, or explicitly annotating which queries could be used as a source for a specific item fragment as a fallback)

But I completely agree with this statement:

> so the simplicity gains may not be as drastic as assumed

Our experience shows that a document-based cache with full syncing and fragment-level component binding is not much simpler than a normalized cache in terms of implementation (most of the complexity in the existing implementation comes from the requirement to be a drop-in replacement for Apollo `InMemoryCache`).
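
A rough sketch of the approach this comment describes (normalized diffs applied to cached operations); the helper names and index structure are assumptions, not ForestRun's actual code:

```ts
// Incoming data is diffed per entity, and the diff is applied to every
// cached operation result that embeds that entity. Names are illustrative.
type EntityId = string; // e.g. "User:42"
type FieldPatch = Record<string, unknown>; // only the changed fields

// Hypothetical helper: finds embedded copies of `entityId` inside a cached
// result tree and merges `patch` into them.
declare function patchEntityInTree(
  tree: unknown,
  entityId: EntityId,
  patch: FieldPatch,
): void;

function applyNormalizedDiff(
  cachedResults: Map<string, unknown>, // operation key -> cached result
  entityIndex: Map<EntityId, Set<string>>, // entity -> operations embedding it
  diff: Map<EntityId, FieldPatch>,
): void {
  for (const [entityId, patch] of diff) {
    for (const opKey of entityIndex.get(entityId) ?? []) {
      // Applied eagerly here; per the comment above, this could also be
      // deferred and applied lazily on read.
      patchEntityInTree(cachedResults.get(opKey), entityId, patch);
    }
  }
}
```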


### 3. Graph cache

@@ -611,3 +623,5 @@ And complexity is even higher than in existing cache implementations.
[11]: https://en.wikipedia.org/wiki/Idempotence
[12]: https://formidable.com/open-source/urql/docs/basics/document-caching/
[13]: https://github.com/convoyinc/apollo-cache-hermes/blob/master/docs/Motivation.md
+[14]: https://relay.dev/docs/principles-and-architecture/thinking-in-relay/#data-masking
+[15]: https://quoraengineering.quora.com/Choosing-Quora-s-GraphQL-client