Add support for the new PAE JSON format #1

Augustin-Zidek · 2022-07-28T10:57:23Z

Adds support for the new PAE JSON format while also keeping backwards compatibility with the legacy format.
The new format makes the PAE JSON about 4x smaller when stored compressed, and the saving is even larger when stored uncompressed (we don't store redundant residue indices and the predicted aligned error is rounded to integers).
The new format parses about 3x faster.
I kept the `dtype` of the matrix as `np.float64` to not break any existing code, but since PAE is now integers, `np.int32`, `np.float32` or even `np.float16` could be used if RAM usage is too high.
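
For readers who want to see what handling both layouts might look like, here is a minimal sketch, not the code from this PR. The function name `parse_pae` is hypothetical, and the JSON key names (`predicted_aligned_error` for the new layout; `residue1`, `residue2`, `distance` for the legacy one) are assumptions that should be checked against real files. The `dtype` argument illustrates the memory trade-off mentioned above.

```python
import json

import numpy as np


def parse_pae(text, dtype=np.float64):
    """Return the PAE matrix from either JSON layout (key names are assumed)."""
    data = json.loads(text)
    # Assumption: the payload is a single dict, possibly wrapped in a list.
    entry = data[0] if isinstance(data, list) else data

    if "predicted_aligned_error" in entry:
        # Assumed new layout: a nested N x N list of integer errors.
        return np.array(entry["predicted_aligned_error"], dtype=dtype)

    # Assumed legacy layout: three flat, parallel lists with 1-based indices.
    r1 = np.asarray(entry["residue1"], dtype=np.int64) - 1
    r2 = np.asarray(entry["residue2"], dtype=np.int64) - 1
    n = int(max(r1.max(), r2.max())) + 1
    pae = np.zeros((n, n), dtype=dtype)
    pae[r1, r2] = entry["distance"]
    return pae


# Tiny hand-written inputs for illustration only (not real predictions).
new_json = '[{"predicted_aligned_error": [[0, 3], [4, 0]]}]'
legacy_json = (
    '[{"residue1": [1, 1, 2, 2], "residue2": [1, 2, 1, 2],'
    ' "distance": [0.2, 3.0, 4.0, 0.2]}]'
)

full = parse_pae(new_json)                     # default np.float64
compact = parse_pae(new_json, dtype=np.float16)
legacy = parse_pae(legacy_json)
```

In this sketch `compact.nbytes` is a quarter of `full.nbytes`, which is the kind of saving a smaller `dtype` would give on large matrices.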


tristanic · 2022-07-28T11:13:51Z

Thanks!

Will merge this... but can you tell me a little more about this?

> and the predicted aligned error is rounded to integers

As written, that's a bit worrisome for a few different reasons. First is the 5-fold loss of precision (previously I believe it was reported in steps of 0.2 Å, now just 1 Å steps). Second, PAE values less than 0.5 would presumably round down to zero, causing potential divide-by-zero issues for any code that wants to use `1/PAE`. You could get the same level of compression without loss of precision or introduction of zeros by storing the data as `int(round(pae*10))`... is that a possibility?
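
To make the concern concrete, here is a small sketch comparing the two encodings on a few made-up PAE values. The helper names are hypothetical and the sample values are illustrative, not taken from any real prediction.

```python
def encode_int(pae):
    """Round to whole Angstroms, as the new format is described as doing."""
    return int(round(pae))


def encode_tenths(pae):
    """Proposed alternative: store tenths of an Angstrom as an integer."""
    return int(round(pae * 10))


def decode_tenths(stored):
    """Recover the value in Angstroms from the tenths encoding."""
    return stored / 10


# Illustrative values only, on the 0.2 A grid mentioned above.
values = [0.2, 0.4, 1.6, 31.8]

as_int = [encode_int(v) for v in values]
as_tenths = [encode_tenths(v) for v in values]
restored = [decode_tenths(s) for s in as_tenths]

# Count how many entries would break a naive 1/PAE weighting.
zeros_int = sum(1 for s in as_int if s == 0)
zeros_tenths = sum(1 for s in as_tenths if s == 0)
```

With these inputs the whole-number encoding turns the two smallest values into zeros, while the tenths encoding keeps every value non-zero and restores it exactly. Values below 0.05 would still round to zero under the tenths scheme, so code that uses `1/PAE` may want a guard either way.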

tristanic merged commit f407c60 into tristanic:main Jul 28, 2022
