Add support for the new PAE JSON format #1

Merged
merged 1 commit into tristanic:main on Jul 28, 2022

Augustin-Zidek
Contributor

* Adds support for the new PAE JSON format while keeping backwards compatibility with the legacy format.
* The new format makes the PAE JSON about 4x smaller when stored compressed, and the reduction is even larger when stored uncompressed (we no longer store the redundant residue indices, and the predicted aligned error is rounded to integers).
* The new format parses about 3x faster.
* I kept the `dtype` of the matrix as `np.float64` to avoid breaking existing code, but since PAE values are now integers, `np.int32`, `np.float32` or even `np.float16` could be used if RAM usage is too high.
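A minimal sketch of how a reader might accept both layouts and expose the `dtype` choice. The JSON key names (`predicted_aligned_error`, `residue1`, `residue2`, `distance`), the one-element list wrapper and the 1-based residue indices are assumptions based on the AlphaFold DB file layout, not details stated in this PR; `parse_pae_json` is a hypothetical helper, not the function this PR adds.

```python
import json

import numpy as np


def parse_pae_json(text, dtype=np.float64):
    """Hypothetical helper: return the PAE matrix from either JSON layout.

    Assumed layouts (AlphaFold DB style; key names are assumptions):
      new:    [{"predicted_aligned_error": [[...], ...], ...}]
      legacy: [{"residue1": [...], "residue2": [...], "distance": [...], ...}]
    """
    data = json.loads(text)
    # Both layouts are assumed to wrap a single dict in a one-element list.
    if isinstance(data, list):
        data = data[0]
    if "predicted_aligned_error" in data:
        # New format: already a dense N x N list of integers.
        return np.array(data["predicted_aligned_error"], dtype=dtype)
    # Legacy format: one entry per residue pair, with 1-based residue indices.
    r1 = np.asarray(data["residue1"], dtype=np.int64)
    r2 = np.asarray(data["residue2"], dtype=np.int64)
    dist = np.asarray(data["distance"], dtype=dtype)
    n = int(max(r1.max(), r2.max()))
    pae = np.zeros((n, n), dtype=dtype)
    pae[r1 - 1, r2 - 1] = dist
    return pae


new_fmt = '[{"predicted_aligned_error": [[0, 3], [4, 1]]}]'
legacy_fmt = ('[{"residue1": [1, 1, 2, 2], "residue2": [1, 2, 1, 2], '
              '"distance": [0.2, 3.4, 4.0, 0.6]}]')

print(parse_pae_json(new_fmt).tolist())     # [[0.0, 3.0], [4.0, 1.0]]
print(parse_pae_json(legacy_fmt).tolist())  # [[0.2, 3.4], [4.0, 0.6]]
# float16 needs 2 bytes per entry instead of 8 for float64: 2 x 2 x 2 = 8 bytes.
print(parse_pae_json(new_fmt, dtype=np.float16).nbytes)  # 8
```

Keeping `np.float64` as the default preserves the old behaviour; callers that need to save RAM can opt into a narrower `dtype`, which cuts memory by 4x for `np.float16`.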
@tristanic
Owner

Thanks!

Will merge this... but can you tell me a little more about this?

> and the predicted aligned error is rounded to integers

As written, that's a bit worrisome for a couple of reasons. First is the 5-fold loss of precision (previously I believe it was reported in steps of 0.2 Å, now just 1 Å steps). Second, PAE values less than 0.5 would presumably round down to zero, causing potential divide-by-zero issues for any code that wants to use `1/PAE`. You could get the same level of compression without loss of precision or introduction of zeros by storing the data as `int(round(pae*10))`... is that a possibility?
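To make the two rounding schemes concrete, here is a small sketch with made-up PAE values (not data from this PR). It uses `np.round`, NumPy's vectorised equivalent of Python's `round`, and shows that rounding to whole Å turns the sub-0.5 Å values into zeros that break `1/PAE`, while the suggested `int(round(pae*10))` encoding keeps 0.1 Å resolution and is decoded by dividing by 10.

```python
import numpy as np

# Made-up PAE values in Angstroms, chosen to avoid .5 ties
# (np.round, like Python's round, rounds ties to even).
pae = np.array([0.2, 0.4, 1.8, 7.6, 31.6])

# Scheme in this PR: round to whole Angstroms (max error 0.5 A).
as_int = np.round(pae).astype(np.int32)

# Suggested scheme: int(round(pae * 10)), i.e. tenths of an Angstrom (max error 0.05 A).
as_tenths = np.round(pae * 10).astype(np.int32)
decoded = as_tenths / 10.0  # decode back to Angstroms

print(as_int.tolist())     # [0, 0, 2, 8, 32]
print(as_tenths.tolist())  # [2, 4, 18, 76, 316]

# 1/PAE: integer rounding produces zeros, hence inf; the tenths
# encoding avoids zeros for these values (it still would for PAE < 0.05).
with np.errstate(divide="ignore"):
    inv_int = 1.0 / as_int
inv_tenths = 1.0 / decoded
print(int(np.isinf(inv_int).sum()))  # 2
```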

tristanic merged commit f407c60 into tristanic:main on Jul 28, 2022