
v0.11.0

@giovanni-guidini tagged this 06 Apr 20:39
* Introduce `SessionTotalsArray` class

Currently we've been having infra problems with a customer that uploads a LOT of carryforward flags. Some database columns receive inserts so large (hundreds of MB) that the DB fails. These huge inserts are caused by the `session_totals` array, because each `SessionTotals` is stored at the array index equal to the ID of the session that generated those totals.

So for many carryforward flags (we think) you end up with huge arrays filled with `null` and only a few session_totals at the very end. Also, if we shift lines, we might even clear out the totals and be left with a huge array of nothing but nulls.

As a quick-and-dirty solution to that problem we'll be encoding this array in a `SessionTotalsArray` class, which stores a `real_length` (so we know the index of any appended totals) plus the non-null totals keyed by their index. We hope that for large totals arrays this will save enough space to protect the DB.
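A minimal sketch of the sparse idea, assuming totals can be any value and using the hypothetical name `SparseTotalsArray` (not the library's actual `SessionTotalsArray` API). Only non-null entries are stored, keyed by their original index, and `real_length` remembers how long the dense list would be so that appends land at the right index:

```python
from typing import Any, Dict, List, Optional


class SparseTotalsArray:
    """Hypothetical sparse stand-in for a dense session_totals list."""

    def __init__(self, real_length: int = 0, non_null_items: Optional[Dict[int, Any]] = None):
        self.real_length = real_length
        # Copy so callers cannot mutate the internal dict by accident.
        self.non_null_items: Dict[int, Any] = dict(non_null_items or {})

    @classmethod
    def from_dense(cls, dense: List[Optional[Any]]) -> "SparseTotalsArray":
        # Keep only the slots that actually hold totals.
        items = {idx: value for idx, value in enumerate(dense) if value is not None}
        return cls(real_length=len(dense), non_null_items=items)

    def append(self, totals: Any) -> int:
        # A new entry goes at the next index the dense list would have used.
        index = self.real_length
        self.non_null_items[index] = totals
        self.real_length += 1
        return index

    def to_dense(self) -> List[Optional[Any]]:
        # Rebuild the legacy list, filling the gaps with None.
        return [self.non_null_items.get(idx) for idx in range(self.real_length)]


arr = SparseTotalsArray.from_dense([None] * 500 + [{"c": 80}])
print(arr.non_null_items)  # {500: {'c': 80}}
print(arr.append({"c": 90}))  # 501
```

Storing the length separately is what keeps indexes stable: without it, an append onto a dict of a few entries would have no way of knowing that the next session ID is 501 rather than 1.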

I still want to study how much these changes actually compress the data.
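One rough way to run that study is to compare the JSON size of both encodings on a synthetic array. This sketch assumes a hypothetical shape (each totals entry as a short list, the sparse form as an object with string keys plus a `"meta"` entry); the real on-disk format may differ, so treat the numbers it prints as indicative only:

```python
import json
from typing import Any, List, Optional


def legacy_encoding(dense: List[Optional[Any]]) -> str:
    # Legacy style: the whole list, nulls included.
    return json.dumps(dense)


def sparse_encoding(dense: List[Optional[Any]]) -> str:
    # Hypothetical sparse style: only non-null entries keyed by index,
    # plus the original length so appends still get the right index.
    encoded = {str(idx): value for idx, value in enumerate(dense) if value is not None}
    encoded["meta"] = {"real_length": len(dense)}
    return json.dumps(encoded)


if __name__ == "__main__":
    # Many empty session slots followed by a couple of real totals.
    dense = [None] * 10_000 + [[0, 10, 8, 2], [0, 12, 9, 3]]
    print(len(legacy_encoding(dense)), len(sparse_encoding(dense)))
```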

Yes, these are dangerous changes, so I'll be testing more thoroughly after hooking them up to api and worker.

* Make sure that the items in the `non_null_items` are `SessionTotals`

First step into integration hell. I had issues when integrating `SessionTotalsArray` into the worker because the class would sometimes receive lists as items of `non_null_items` (when pulling from the DB) and other times `ReportTotals` (when processing a report).

When the items were `ReportTotals`, I had issues encoding the `SessionTotalsArray`. So now, when creating it, we first convert every item of `non_null_items` into `ReportTotals`.

One interesting benefit of doing this is that `ReportTotals` encoding includes a step to remove trailing zeros from the array, further reducing the final size of the encoded object.
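A sketch of that normalization step, assuming a hypothetical `Totals` dataclass in place of the real `ReportTotals`/`SessionTotals` classes (whose actual fields and methods are not reproduced here). Items arriving as plain lists, as when read back from the database, become objects, and encoding drops trailing zeros:

```python
from dataclasses import astuple, dataclass
from typing import Any, Dict, List, Union


@dataclass
class Totals:
    # Hypothetical, trimmed-down field list; not the real ReportTotals layout.
    files: int = 0
    lines: int = 0
    hits: int = 0
    misses: int = 0
    partials: int = 0

    def to_database(self) -> List[int]:
        # Encode as a list and drop trailing zeros to save space.
        encoded = list(astuple(self))
        while encoded and encoded[-1] == 0:
            encoded.pop()
        return encoded


def normalize_items(items: Dict[int, Union[Totals, List[Any]]]) -> Dict[int, Totals]:
    """Ensure every value is a Totals instance, whatever form it arrived in."""
    normalized: Dict[int, Totals] = {}
    for key, value in items.items():
        if isinstance(value, Totals):
            normalized[key] = value
        elif isinstance(value, (list, tuple)):
            # Lists from the database: missing trailing fields default to 0.
            normalized[key] = Totals(*value)
        else:
            raise TypeError(f"Unexpected totals item for key {key}: {value!r}")
    return normalized
```

Because the stripped zeros are exactly the dataclass defaults, decoding a shortened list with `Totals(*value)` restores the same object.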

* Fix session_totals when deleting and carry forwarding sessions

Step two into integration hell.

This bug came up when integrating the worker. In one test we were deleting the session with ID 0. You can see in `editable.py` that, when deleting sessions (which is relevant in the carryforward context), we were iterating over the session totals to generate new ones. Because `__iter__` is defined, the iteration returned the session totals but with the wrong index: the index is now the session's key in the `non_null_items` dict, NOT its position in an array.

To solve this, we explicitly delete the entry from the `non_null_items` dict using the given key.
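The difference can be illustrated with a plain dict standing in for `non_null_items` (a hypothetical simplification; the function names are made up for this sketch). Treating the iteration position as the session ID misses the target and renumbers everything, while deleting by key keeps the remaining session IDs intact:

```python
from typing import Dict


def delete_by_position(non_null_items: Dict[int, str], session_id: int) -> Dict[int, str]:
    # Buggy: iterating yields only the values, so their position in the
    # iteration is mistaken for the session ID.
    return {
        position: totals
        for position, totals in enumerate(non_null_items.values())
        if position != session_id
    }


def delete_by_key(non_null_items: Dict[int, str], session_id: int) -> Dict[int, str]:
    # Fixed: remove exactly the given key and leave the rest untouched.
    result = dict(non_null_items)
    result.pop(session_id, None)
    return result


items = {3: "a", 5: "b"}
print(delete_by_position(items, 3))  # {0: 'a', 1: 'b'}
print(delete_by_key(items, 3))  # {5: 'b'}
```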

You can also see in the `test_carryforward.py` file that this uncovered tests that were getting erroneous passes before. By creating the `session_totals` as proper `SessionTotalsArray` objects and keeping track of which session_totals should be carried forward (via the flag, "simple" or "complex"), we can now see that the results are correct.

As usual, we can't promise a bug-free experience with 100% certainty, but this seems to be a step in the right direction.

* Foreshadow the next release number for SessionTotalsArray changes

* Small touches to SessionTotalsArray + use that in NetworkFile

Changes to SessionTotalsArray, such as making the constructor's default value immutable (from `{}` to `None`), and better type hints.
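Why `None` rather than `{}`: Python evaluates a default value once, when the function is defined, so a mutable `{}` default is shared by every instance created without an explicit argument. A minimal illustration with hypothetical class names:

```python
from typing import Dict, Optional


class SharedDefault:
    def __init__(self, non_null_items: Dict[int, str] = {}):  # noqa: B006 (deliberately buggy)
        # Every instance built without an argument gets the SAME dict object.
        self.non_null_items = non_null_items


class SafeDefault:
    def __init__(self, non_null_items: Optional[Dict[int, str]] = None):
        # A fresh dict is created per instance when no argument is given.
        self.non_null_items = non_null_items if non_null_items is not None else {}


a, b = SharedDefault(), SharedDefault()
a.non_null_items[0] = "totals"
print(b.non_null_items)  # {0: 'totals'}, b sees a's change

c, d = SafeDefault(), SafeDefault()
c.non_null_items[0] = "totals"
print(d.non_null_items)  # {}
```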

Also, SessionTotalsArray now expands correctly on iteration (meaning it matches what the legacy/expanded format for session_totals should be).
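A sketch of what expanding on iteration means, assuming a hypothetical minimal class (`ExpandableTotals`) with a `session_count` and a `non_null_items` dict: iteration yields one slot per session, with `None` where no totals are stored, so the result lines up with the legacy list:

```python
from typing import Any, Dict, Iterator, Optional


class ExpandableTotals:
    """Hypothetical minimal container: sparse storage, dense iteration."""

    def __init__(self, session_count: int = 0, non_null_items: Optional[Dict[int, Any]] = None):
        self.session_count = session_count
        self.non_null_items: Dict[int, Any] = dict(non_null_items or {})

    def __iter__(self) -> Iterator[Optional[Any]]:
        # Walk every session index so positions match session IDs,
        # yielding None for sessions that have no stored totals.
        for idx in range(self.session_count):
            yield self.non_null_items.get(idx)


totals = ExpandableTotals(session_count=4, non_null_items={1: "t1", 3: "t3"})
print(list(totals))  # [None, 't1', None, 't3']
```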

Bigger changes around `NetworkFile`, which was not using `SessionTotalsArray` but now does.

* Change `real_length` to `session_count` in `SessionTotalsArray`

To improve readability and keep better context over time, we're renaming `real_length` to `session_count`.

* Add flag to fallback to legacy report style on save

Because the new report format is not backwards compatible, there's a chance the deploy will go terribly wrong.
To make sure we can quickly revert to the old style, we are adding a feature flag to enable saving reports in the legacy style.

The best-case scenario is that we won't have to use it, but it's no good to be prepared only for the best-case scenario.
Thanks @scott-codecov for this good idea.
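A minimal sketch of such a switch, assuming a hypothetical environment variable `SAVE_LEGACY_SESSION_TOTALS` and a hypothetical `encode_session_totals` helper; the real flag's name and how it is read are not stated here, and the new-style layout shown is illustrative only:

```python
import os
from typing import Any, Dict, List, Optional, Union


def legacy_style_enabled() -> bool:
    # Hypothetical flag: any of these values turns on legacy saving.
    return os.environ.get("SAVE_LEGACY_SESSION_TOTALS", "").lower() in ("1", "true", "yes")


def encode_session_totals(
    session_count: int, non_null_items: Dict[int, Any], legacy: Optional[bool] = None
) -> Union[List[Optional[Any]], Dict[str, Any]]:
    if legacy is None:
        legacy = legacy_style_enabled()
    if legacy:
        # Legacy style: dense list, one slot per session, None for gaps.
        return [non_null_items.get(idx) for idx in range(session_count)]
    # New style: only non-null entries plus the session count.
    encoded: Dict[str, Any] = {str(idx): value for idx, value in non_null_items.items()}
    encoded["meta"] = {"session_count": session_count}
    return encoded
```

Passing `legacy` explicitly keeps the function easy to test, while leaving it as `None` defers to the flag, so flipping the environment variable is enough to fall back without a redeploy of new code.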