Scope
The purpose of this experiment is to explore some ideas for reducing the size of query result payloads in Relay and GraphQL, which are currently serialized as JSON for delivery over the wire.
Possibilities
- Use of a less redundant format for encoding potentially repetitive tree-shaped data; for example, JSON Graph or something like it.
- Given such a format as an object, encoding it in a binary format such as MessagePack or FlatBuffers in order to get a denser representation for transmission over the wire.
- Consider teaching the server the semantics of the `RelayRecordStore`, so that instead of encoding a tree of data, the server could send a result set across the wire as a series of operations necessary to update a flattened store (NOTE: `RelayRecordStore` already is a normalized, flattened store).
- Instead of vending materialized objects to views, make client-side proxying readers that could produce data on-demand from one of these dense representations (such as FlatBuffers), which could be stored as-is inside the store. Potential wins here revolve around reduced memory footprint, especially in the absence of garbage collection, but also independently of it, given that even in Relay it is possible to have some overfetching of data that is requested and not actually used in the views.
- Make a transform that would enable component-authors to continue to read data using simple property access (eg. `this.props.viewer.notifications.count`, which would transpile to `this.props.viewer().notifications().count()` or any of a number of other possible variants). Note that we've previously experimented with lazily-constructed reader methods and noted some speed-ups, at least with test data. These might not play out in the real world, but being able to defer materialization of view data from a compact binary representation until the last possible moment could increase the chances of this working.
- Indicate the presence of subfields using bit arrays rather than textual field names.
- Try to do better than standard gzip by using compression algorithms optimized for compressing small data payloads, such as preshared dictionaries constructed based on static analysis of the schema and the subset of it actually used by an application (sketched below).
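To make that last idea a little more concrete, here is a minimal sketch using Node's built-in zlib, whose deflate format (unlike gzip) accepts a preset dictionary. The dictionary contents below are invented for illustration; a real one would be generated from the schema and the app's queries:

```js
// Sketch: raw zlib deflate with a preshared dictionary of schema terms that
// recur in payloads. Both server and client must hold the same dictionary.
const zlib = require('zlib');

// Hypothetical dictionary; in practice it would come from static analysis of
// the schema and the subset of it actually used by the application.
const dictionary = Buffer.from(
  '"id""node""edges""cursor""pageInfo""hasNextPage""viewer""count"',
  'utf8'
);

function compress(payload) {
  const json = Buffer.from(JSON.stringify(payload), 'utf8');
  return zlib.deflateSync(json, {dictionary});
}

function decompress(buffer) {
  return JSON.parse(zlib.inflateSync(buffer, {dictionary}).toString('utf8'));
}
```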
On the use of gzip
I've got tests in here that measure the effect of gzipping the payload. This is to give us a sense of relative size, and some notion of time cost, but bear in mind that all data coming from the server gets gzipped anyway transparently, so it's not something that we'd have to explicitly do.
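For what it's worth, the gzip numbers in the tables below can be approximated with Node's built-in zlib; this is just a sketch of the kind of measurement involved (the fixture path is made up), not the actual test harness:

```js
// Sketch: how a payload's size responds to different gzip compression levels.
const zlib = require('zlib');

function gzippedSizes(payload) {
  const json = Buffer.from(JSON.stringify(payload), 'utf8');
  const sizes = {raw: json.length};
  for (const level of [1, 6, 9]) {
    sizes[`gzip(${level})`] = zlib.gzipSync(json, {level}).length;
  }
  return sizes;
}

// Hypothetical fixture; the real tests use the TodoMVC and newsfeed payloads.
console.log(gzippedSizes(require('./fixtures/todos.json')));
```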
Results
I compared payload sizes and wall-clock times for a number of different processing pipelines; a toy sketch of the main transformations and of the timing loop appears after the column descriptions below. The columns in the following table mean:
- op: The operation(s) applied in the pipeline.
- "passthrough": Data is passed through as a JSON-able object, and gets stringified.
- "messagepackize": The data object is processed into the MessagePack binary format.
- "leonize": The data is converted to LEON ("Little Endian Object") format, a binary representation choosen for its similarity to FlatBuffers.
- "flatten": I looked at JSON Graph, asked what pieces of it were
redundant and verbose, and made a Relay (node-centric) flattened
representation of the data that ended up looking very like the way we store
data in
RelayRecordData(albeit without the features ofGraphQLRange; instead, connections are just stored using arrays). - "thin": Like "flatten", but without the redundant embedding of node IDs inside their normalized record storage (because the ID is present in the key that points to the record).
- "arrayrefize": Like "thin", but instead of using a key/value map to store normalized records (where the keys can potentially be quite lengthy, Base-64 encoded things), store them in an array with small integer indices.
- "gzip": Passes the data from a previous step in the pipeline through gzip compression.
- size: The byte count of the data after processing.
- %: Percent size relative to the baseline (the stringified "passthrough").
- time: Time, in milliseconds, to run the data through the pipeline 100 times.
- reverse: Time, in milliseconds, to run the data through the inverse equivalent to the pipeline 100 times. In other words, this is the "decoding" counterpart to the "encoding" step. I did not implement this inverse operation for all of the pipelines.
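To make the "flatten"/"thin"/"arrayrefize" distinction concrete, and to show roughly how a row of the tables is produced, here is a toy sketch. The record shapes, the msgpack-lite dependency, and the timing loop are illustrative assumptions rather than the exact code behind these numbers:

```js
// Toy illustration of the three normalized shapes for a single node whose ID
// is a lengthy Base-64 string:
//
//   flatten:      {'VG9kbzow': {id: 'VG9kbzow', text: 'Buy milk'}}
//   thin:         {'VG9kbzow': {text: 'Buy milk'}}   // ID lives only in the key
//   arrayrefize:  {records: [{text: 'Buy milk'}]}    // referenced by index 0
//
// And a sketch of composing and timing a pipeline such as
// "messagepackize + gzip(6)" over 100 iterations:
const zlib = require('zlib');
const msgpack = require('msgpack-lite'); // assumed MessagePack implementation

const pipeline = (...steps) => input => steps.reduce((acc, step) => step(acc), input);
const stringify = data => Buffer.from(JSON.stringify(data), 'utf8');
const messagepackize = data => msgpack.encode(data);
const gzip = level => buffer => zlib.gzipSync(buffer, {level});

function measure(payload, run, iterations = 100) {
  const start = Date.now();
  let output;
  for (let i = 0; i < iterations; i++) {
    output = run(payload);
  }
  return {size: output.length, time: Date.now() - start};
}

// e.g. const row = measure(payload, pipeline(messagepackize, gzip(6)));
```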
Sample data results
This was "fake" data with a fair bit of redundancy encoded in it that I generated by hacking the TodoMVC example:
| op | size | % | time | reverse |
|---|---|---|---|---|
| passthrough | 25789 | 100.00% | 17.74ms | 27.89ms |
| passthrough + gzip(1) | 2241 | 8.69% | 391.99ms | n/a |
| passthrough + gzip(6) | 1891 | 7.33% | 527.76ms | n/a |
| passthrough + gzip(9) | 1797 | 6.97% | 683.41ms | n/a |
| messagepackize | 21349 | 82.78% | 85.11ms | 103.54ms |
| messagepackize + gzip(1) | 2189 | 8.49% | 193.17ms | n/a |
| messagepackize + gzip(6) | 1866 | 7.24% | 305.09ms | n/a |
| messagepackize + gzip(9) | 1782 | 6.91% | 411.25ms | n/a |
| leonize | 24506 | 95.03% | 85.34ms | 136.53ms |
| leonize + gzip(1) | 2268 | 8.79% | 205.18ms | n/a |
| leonize + gzip(6) | 1921 | 7.45% | 283.36ms | n/a |
| leonize + gzip(9) | 1832 | 7.10% | 356.84ms | n/a |
| flatten | 23313 | 90.40% | 91.49ms | n/a |
| flatten + gzip(1) | 3496 | 13.56% | 453.11ms | n/a |
| flatten + gzip(6) | 2889 | 11.20% | 559.32ms | n/a |
| flatten + gzip(9) | 2785 | 10.80% | 702.81ms | n/a |
| flatten + messagepackize | 18594 | 72.10% | 168.77ms | n/a |
| flatten + messagepackize + gzip(1) | 3315 | 12.85% | 283.93ms | n/a |
| flatten + messagepackize + gzip(6) | 2757 | 10.69% | 344.60ms | n/a |
| flatten + messagepackize + gzip(9) | 2704 | 10.49% | 473.94ms | n/a |
| thin | 20889 | 81.00% | 97.95ms | n/a |
| thin + gzip(1) | 3081 | 11.95% | 415.11ms | n/a |
| thin + gzip(6) | 2480 | 9.62% | 496.12ms | n/a |
| thin + gzip(9) | 2395 | 9.29% | 629.82ms | n/a |
| thin + messagepackize | 16662 | 64.61% | 156.70ms | n/a |
| thin + messagepackize + gzip(1) | 2892 | 11.21% | 282.12ms | n/a |
| thin + messagepackize + gzip(6) | 2386 | 9.25% | 344.54ms | n/a |
| thin + messagepackize + gzip(9) | 2314 | 8.97% | 414.01ms | n/a |
| arrayrefize | 18387 | 71.30% | 112.70ms | n/a |
| arrayrefize + gzip(1) | 2803 | 10.87% | 421.14ms | n/a |
| arrayrefize + gzip(6) | 2425 | 9.40% | 472.12ms | n/a |
| arrayrefize + gzip(9) | 2376 | 9.21% | 552.89ms | n/a |
| arrayrefize + messagepackize | 14019 | 54.36% | 171.58ms | n/a |
| arrayrefize + messagepackize + gzip(1) | 2576 | 9.99% | 272.92ms | n/a |
| arrayrefize + messagepackize + gzip(6) | 2256 | 8.75% | 340.27ms | n/a |
| arrayrefize + messagepackize + gzip(9) | 2236 | 8.67% | 378.97ms | n/a |
Observations
It is really hard to beat gzip. MessagePack is disappointing when it comes to compactness, but gives a marginal benefit as an intermediate step between initial processing and the final gzipping step.
Even the smallest first-phase ("arrayrefize") format, which is 71.30% of the size of the basic object+stringified format, ends up compressing worse than the naive "redundant" JSON representation (8.67% compared to 6.91%, even with the small boost from including MessagePack in the pipeline).
In short, if you try to eliminate redundancy by applying some kind of clever transformation or restructuring your data object before piping it to gzip, you will probably just do a crappy job of reimplementing gzip at the wrong layer of abstraction, and get worse results overall.
We should probably just represent the data "naturally", and let gzip do the rest.
However, all of this was based on testing a single result payload, and an artificial one at that, so I wanted to do a test with another data point.
Newsfeed query results
This query result represents the initial data necessary to render the first few stories on the Facebook newsfeed. It is larger, may or may not have a different degree of internal redundancy (I didn't look over the actual data with a magnifying glass), and uses lengthy, Base-64 encoded IDs.
| op | size | % | time | reverse |
|---|---|---|---|---|
| passthrough | 61637 | 100.00% | 37.05ms | 63.45ms |
| passthrough + gzip(1) | 12278 | 19.92% | 1063.56ms | n/a |
| passthrough + gzip(6) | 10266 | 16.66% | 1228.81ms | n/a |
| passthrough + gzip(9) | 10202 | 16.55% | 1280.92ms | n/a |
| messagepackize | 55137 | 89.45% | 118.07ms | 169.56ms |
| messagepackize + gzip(1) | 12104 | 19.64% | 526.86ms | n/a |
| messagepackize + gzip(6) | 10671 | 17.31% | 613.71ms | n/a |
| messagepackize + gzip(9) | 10649 | 17.28% | 600.60ms | n/a |
| leonize | 59854 | 97.11% | 150.39ms | 301.10ms |
| leonize + gzip(1) | 12789 | 20.75% | 528.62ms | n/a |
| leonize + gzip(6) | 11012 | 17.87% | 740.05ms | n/a |
| leonize + gzip(9) | 10989 | 17.83% | 704.96ms | n/a |
| flatten | 81788 | 132.69% | 174.48ms | n/a |
| flatten + gzip(1) | 13645 | 22.14% | 1571.88ms | n/a |
| flatten + gzip(6) | 11439 | 18.56% | 1743.39ms | n/a |
| flatten + gzip(9) | 11342 | 18.40% | 1763.25ms | n/a |
| flatten + messagepackize | 75336 | 122.23% | 240.31ms | n/a |
| flatten + messagepackize + gzip(1) | 13398 | 21.74% | 689.82ms | n/a |
| flatten + messagepackize + gzip(6) | 11920 | 19.34% | 869.47ms | n/a |
| flatten + messagepackize + gzip(9) | 11866 | 19.25% | 913.43ms | n/a |
| thin | 68702 | 111.46% | 157.63ms | n/a |
| thin + gzip(1) | 12887 | 20.91% | 1275.43ms | n/a |
| thin + gzip(6) | 10956 | 17.78% | 1544.59ms | n/a |
| thin + gzip(9) | 10867 | 17.63% | 1513.19ms | n/a |
| thin + messagepackize | 62512 | 101.42% | 261.12ms | n/a |
| thin + messagepackize + gzip(1) | 12617 | 20.47% | 616.97ms | n/a |
| thin + messagepackize + gzip(6) | 11379 | 18.46% | 797.79ms | n/a |
| thin + messagepackize + gzip(9) | 11328 | 18.38% | 818.44ms | n/a |
| arrayrefize | 54301 | 88.10% | 161.09ms | n/a |
| arrayrefize + gzip(1) | 12156 | 19.72% | 1039.70ms | n/a |
| arrayrefize + gzip(6) | 10477 | 17.00% | 1240.24ms | n/a |
| arrayrefize + gzip(9) | 10434 | 16.93% | 1219.25ms | n/a |
| arrayrefize + messagepackize | 47884 | 77.69% | 280.34ms | n/a |
| arrayrefize + messagepackize + gzip(1) | 11859 | 19.24% | 637.90ms | n/a |
| arrayrefize + messagepackize + gzip(6) | 10902 | 17.69% | 719.39ms | n/a |
| arrayrefize + messagepackize + gzip(9) | 10889 | 17.67% | 707.08ms | n/a |
Fascinatingly, here the "flatten" and even the "thin" strategies actually caused the result to get bigger. Although gzip does end up undoing a lot of the damage, the simple "passthrough" + gzip still comes out in front. Notably, and unlike in the TodoMVC example, adding an intermediate MessagePack step did not improve the overall results.
These results again support the idea that a naive+gzip approach is likely to be a viable default. So in this instance, perhaps, "Dumb is the new Clever".
Conclusions and future work
I mostly focused on size here and didn't have time to really consider the implications of processing time, nor the client side aspects of the problem. Open questions:
- Even though flattened formats don't compress as well as the naive response format, can we get other wins (for example, cheaper writes, particularly as we align the network format with what the Relay store expects to use internally)?
- How much cheaper would writes have to be to justify delaying the arrival of the query data?
- What about a streamier response format that we could write more granularly to the store as it arrives?
- Would a streamier format be substantially larger across the wire if expressed in terms of write operations (eg. write field here, append there, etc.), and would gzip again prove capable of mitigating the downside? (A purely hypothetical sketch of such a stream appears after this list.)
- [Crazy Town] If anything were possible, and it were viable to maintain server-side state about content in the client-side store, how small could we make "delta" updates? Does Relay (which tries very hard to avoid overfetching) leave little to be won here? How big would the upside be if we could avoid shipping data about nodes referenced deep in the query but which are already on the client (ie. not just redundant encoding vs normalized encoding, but redundant encoding vs not encoding the data at all)? Is having the server track previously-queried data insurmountably more difficult than having it track queries to which a client is subscribed?
- Would keeping a binary format like FlatBuffers in the store and materializing them on demand be a net memory win? And would that have flow-on performance benefits? Would the read-time overhead be negligible?
- How do we trade write-time cost (for example, merging binary storage to make reads cheaper) against read-time cost?
- What would we do if the sky were the limit and we could change everything in the stack, including the HTTP protocol? How would we make a much more concurrent, batchy, streamy data retrieval system across client and server?
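As one purely hypothetical illustration of the "streamier" format mentioned above: the response could be a sequence of write operations (one per line, say) that the client applies to a normalized store as they arrive. Nothing here was measured; the operation names and shapes are invented:

```js
// Hypothetical newline-delimited operation stream, applied incrementally:
//
//   {"op": "putRecord", "id": "VG9kbzow", "fields": {"text": "Buy milk", "complete": false}}
//   {"op": "putField", "id": "VXNlcjox", "field": "name", "value": "Alice"}
//   {"op": "appendEdge", "id": "VXNlcjox", "connection": "todos", "edgeId": "VG9kbzow"}

function applyOperation(store, operation) {
  const record = store[operation.id] || (store[operation.id] = {});
  switch (operation.op) {
    case 'putRecord':
      Object.assign(record, operation.fields);
      break;
    case 'putField':
      record[operation.field] = operation.value;
      break;
    case 'appendEdge':
      (record[operation.connection] = record[operation.connection] || []).push(operation.edgeId);
      break;
  }
}

// Each line of the response would be parsed and applied as soon as it arrives:
// applyOperation(store, JSON.parse(line));
```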