
Implements pickle support for Page models #5998

Closed
wants to merge 3 commits into from

Conversation

@pySilver (Contributor) commented May 1, 2020

Hi!

This PR is missing tests, but it would be nice if a maintainer could review it and perhaps adopt these changes into the project.

My PR fixes the broken pickle support for Page models (Django models are picklable by default). With this patch you can use popular caching solutions such as the beloved django-cacheops (or any other caching add-on that needs to cache querysets and/or model instances).

The problem was originally mentioned in #1988.
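For context, a minimal sketch of what this unblocks, since Django's cache backends pickle values internally (MerchantIndexPage is the Page subclass from the test case further down; the import path is assumed):

from django.core.cache import cache

from merchants.models import MerchantIndexPage  # assumed import path for the example model

page = MerchantIndexPage.objects.first()

# cache.set() pickles the value under the hood, so this only works once
# Page instances (and their StreamField values) are picklable.
cache.set("merchant-index-page", page, timeout=300)
cached_page = cache.get("merchant-index-page")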

@squash-labs (bot) commented May 1, 2020

Manage this branch in Squash

Test this branch here: https://pysilverpickle-streamfield-5fefm.squash.io

@pySilver (Contributor, Author) commented May 4, 2020

The simplest test case would be something like this:

In [1]: import pickle

In [2]: m = MerchantIndexPage.objects.first()

In [3]: pickle.dumps(m)
Out[3]: b'\x80\x04\x95\xba\x03\x00\x00\x00\x00\x00\x00\x8c\x15django.db.models.base\x94\x8c\x0emodel_unpickle\x94\x93\x94\x8c\tmerchants\x94\x8c\x11MerchantIndexPage\x94\x86\x94\x85\x94R\x94}\x94(\x8c\x06_state\x94h\x00\x8c\nModelState\x94\x93\x94)\x81\x94}\x94(\x8c\x0cfields_cache\x94}\x94\x8c\x06adding\x94\x89\x8c\x02db\x94\x8c\x07default\x94ub\x8c\x02id\x94J\xee\x10\x01\x00\x8c\x04path\x94\x8c\x0c00010002001M\x94\x8c\x05depth\x94K\x03\x8c\x08numchild\x94K\x00\x8c\x05title\x94\x8c\x15Twoje ulubione sklepy\x94\x8c\x0bdraft_title\x94\x8c\x15Twoje ulubione sklepy\x94\x8c\x04slug\x94\x8c\x06sklepy\x94\x8c\x0fcontent_type_id\x94K<\x8c\x04live\x94\x88\x8c\x17has_unpublished_changes\x94\x89\x8c\x08url_path\x94\x8c\x0b/pl/sklepy/\x94\x8c\x08owner_id\x94K\x01\x8c\tseo_title\x94\x8c\x00\x94\x8c\rshow_in_menus\x94\x89\x8c\x12search_description\x94h%\x8c\ngo_live_at\x94N\x8c\texpire_at\x94N\x8c\x07expired\x94\x89\x8c\x06locked\x94\x89\x8c\tlocked_at\x94N\x8c\x0clocked_by_id\x94N\x8c\x12first_published_at\x94\x8c\x08datetime\x94\x8c\x08datetime\x94\x93\x94C\n\x07\xe4\x02\x13\x01\x01\x07\x02O\xdd\x94\x8c\x04pytz\x94\x8c\x04_UTC\x94\x93\x94)R\x94\x86\x94R\x94\x8c\x11last_published_at\x94h1C\n\x07\xe4\x02\x13\x01\x01\x07\x02O\xdd\x94h6\x86\x94R\x94\x8c\x1alatest_revision_created_at\x94h1C\n\x07\xe4\x02\x13\x01\x01\x07\x01\x81I\x94h6\x86\x94R\x94\x8c\x10live_revision_id\x94M\x12\x03\x8c\x0bpage_ptr_id\x94J\xee\x10\x01\x00\x8c\nmenu_title\x94h%\x8c\x0fsocial_image_id\x94N\x8c\x1f_wagtail_cached_site_root_paths\x94]\x94(K\x02\x8c\x0f/your-location/\x94\x8c\x15http://localhost:8000\x94\x87\x94K\x03\x8c\x04/pl/\x94\x8c\x18http://localhost:8000/pl\x94\x87\x94e\x8c\r_relative_url\x94\x8c\x0b/pl/sklepy/\x94\x8c\x0f_django_version\x94\x8c\x053.0.5\x94ub.'

In [4]: p = pickle.dumps(m)

In [5]: p
Out[5]: b'\x80\x04\x95\xba\x03\x00\x00\x00\x00\x00\x00\x8c\x15django.db.models.base\x94\x8c\x0emodel_unpickle\x94\x93\x94\x8c\tmerchants\x94\x8c\x11MerchantIndexPage\x94\x86\x94\x85\x94R\x94}\x94(\x8c\x06_state\x94h\x00\x8c\nModelState\x94\x93\x94)\x81\x94}\x94(\x8c\x0cfields_cache\x94}\x94\x8c\x06adding\x94\x89\x8c\x02db\x94\x8c\x07default\x94ub\x8c\x02id\x94J\xee\x10\x01\x00\x8c\x04path\x94\x8c\x0c00010002001M\x94\x8c\x05depth\x94K\x03\x8c\x08numchild\x94K\x00\x8c\x05title\x94\x8c\x15Twoje ulubione sklepy\x94\x8c\x0bdraft_title\x94\x8c\x15Twoje ulubione sklepy\x94\x8c\x04slug\x94\x8c\x06sklepy\x94\x8c\x0fcontent_type_id\x94K<\x8c\x04live\x94\x88\x8c\x17has_unpublished_changes\x94\x89\x8c\x08url_path\x94\x8c\x0b/pl/sklepy/\x94\x8c\x08owner_id\x94K\x01\x8c\tseo_title\x94\x8c\x00\x94\x8c\rshow_in_menus\x94\x89\x8c\x12search_description\x94h%\x8c\ngo_live_at\x94N\x8c\texpire_at\x94N\x8c\x07expired\x94\x89\x8c\x06locked\x94\x89\x8c\tlocked_at\x94N\x8c\x0clocked_by_id\x94N\x8c\x12first_published_at\x94\x8c\x08datetime\x94\x8c\x08datetime\x94\x93\x94C\n\x07\xe4\x02\x13\x01\x01\x07\x02O\xdd\x94\x8c\x04pytz\x94\x8c\x04_UTC\x94\x93\x94)R\x94\x86\x94R\x94\x8c\x11last_published_at\x94h1C\n\x07\xe4\x02\x13\x01\x01\x07\x02O\xdd\x94h6\x86\x94R\x94\x8c\x1alatest_revision_created_at\x94h1C\n\x07\xe4\x02\x13\x01\x01\x07\x01\x81I\x94h6\x86\x94R\x94\x8c\x10live_revision_id\x94M\x12\x03\x8c\x0bpage_ptr_id\x94J\xee\x10\x01\x00\x8c\nmenu_title\x94h%\x8c\x0fsocial_image_id\x94N\x8c\x1f_wagtail_cached_site_root_paths\x94]\x94(K\x02\x8c\x0f/your-location/\x94\x8c\x15http://localhost:8000\x94\x87\x94K\x03\x8c\x04/pl/\x94\x8c\x18http://localhost:8000/pl\x94\x87\x94e\x8c\r_relative_url\x94\x8c\x0b/pl/sklepy/\x94\x8c\x0f_django_version\x94\x8c\x053.0.5\x94ub.'

In [6]: m2 = pickle.loads(p)

In [7]: m2
Out[7]: <MerchantIndexPage: Twoje ulubione sklepy>

In [8]: m2 == m
Out[8]: True

@lb- (Member) commented May 10, 2020

@pySilver - just checking: is this a StreamField problem only, or a problem with all fields in general? The issue referenced and the code changes look like StreamField fixes only, but the title says Page models.

Is the intent of this PR to cover StreamField only, or are there problems with other field types?

@pySilver (Contributor, Author) commented

@lb- it's a broader problem than just StreamField. Basically, anything that extends Block wouldn't work without the changes I've made in the Block class. StreamField is in fact a different story, since its value is converted into a StreamValue that needs instructions on how to (un)pickle itself.
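For illustration only (this is not the exact diff in this PR), a sketch of the kind of hook the base Block class needs, assuming Block.deconstruct() returns the (path, args, kwargs) triple Wagtail already uses for migrations:

from django.utils.module_loading import import_string


def _unpickle_block(path, args, kwargs):
    """Recreate a block instance from its deconstructed definition."""
    block_cls = import_string(path)
    return block_cls(*args, **kwargs)


def block_reduce(self):
    """Hypothetical Block.__reduce__: pickle the block's definition rather than its state."""
    path, args, kwargs = self.deconstruct()
    return (_unpickle_block, (path, args, kwargs))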

@pySilver (Contributor, Author) commented

@lb- have you had any time to look into that?

@lb- (Member) commented May 13, 2020

@pySilver thanks for your patience; we had a bit of discussion about this within the core team.
Everyone agrees it would be great to have pickling support for StreamField values, but we need to ensure that the fixes that get into Wagtail cover both pickle and unpickle scenarios plus unit tests (to ensure this does not break in the future).

The important thing there will be testing to ensure everything works as expected both before sending an object to and retrieving the object from the cache.

Are you OK to continue work on this, cover both pickle/unpickle scenarios, and add unit tests, including for some more complex scenarios such as custom StructBlock + ListBlock usage?

In general though, this appears to be a great start for this PR and a good direction to head in.
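For reference, a rough sketch of what such a round-trip test could look like (LinkBlock is a made-up custom StructBlock, and the assertions will only pass once pickling support is in place):

import pickle

from django.test import SimpleTestCase

from wagtail.core import blocks
from wagtail.core.blocks.stream_block import StreamValue


class LinkBlock(blocks.StructBlock):
    # made-up custom StructBlock combining a CharBlock with a ListBlock
    title = blocks.CharBlock()
    links = blocks.ListBlock(blocks.URLBlock())


class StreamValuePickleTest(SimpleTestCase):
    def test_round_trip(self):
        stream_block = blocks.StreamBlock([("link", LinkBlock())])
        value = stream_block.to_python([
            {"type": "link", "value": {"title": "Wagtail", "links": ["https://wagtail.io"]}},
        ])
        restored = pickle.loads(pickle.dumps(value))
        self.assertIsInstance(restored, StreamValue)
        self.assertEqual(restored[0].block_type, "link")
        self.assertEqual(restored[0].value["title"], "Wagtail")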

@pySilver (Contributor, Author) commented

@lb- thanks for the feedback. OK, I'll do my best to create some tests, since it's fairly simple. I'll try to complete it sometime next week.

@lb- (Member) commented May 14, 2020

Thanks

@pySilver (Contributor, Author) commented Jun 25, 2020

@lauantai

Wow, this took a while. I've discovered that this is not a fully correct approach. The problem is that it would recreate very expensive block objects many, many times during unpickling, which kills the idea of pickling for caching. I've dug into the Django sources once again, specifically django.db.models.fields.Field.__reduce__, and it turns out Django loads the field directly from the model definition during unpickling, so we need to do the same.

All the fuss is about StreamField returning a StreamValue from its to_python method. That's why the original Django code does not work here: StreamValue has no __reduce__ method, while every object inheriting from Django's model Field does.

Making it work looks a little bit hackish, and I'd love to hear your opinion and ideas on it. Basically, we need to tie the model field to the StreamValue returned from StreamField.to_python, and give StreamValue a __reduce__ method that is used during unpickling.

The demo code is below:

import json

from django.db.models.fields import _load_field

from wagtail.core.blocks.stream_block import StreamValue
from wagtail.core.fields import StreamField


def stream_value_loader(app_label, model_name, field_name, field_value) -> StreamValue:
    """Returns StreamValue from pickled data"""
    stream_field = _load_field(app_label, model_name, field_name)
    stream_value = stream_field.stream_block.to_python(field_value)
    stream_value._field = stream_field
    return stream_value


def stream_value_reducer(self) -> tuple:
    """Reducer to make this class pickleable."""
    return (
        stream_value_loader,
        (
            self._field.model._meta.app_label,
            self._field.model._meta.object_name,
            self._field.name,
            self.get_prep_value(),
        ),
    )


def stream_field_to_python(self, value):
    """
    Patch for StreamField.to_python

    * Adds StreamValue._field reference to associated StreamField so we can unpickle later.
    """
    if value is None or value == "":
        ret = StreamValue(self.stream_block, [])
        ret._field = self
        return ret
    elif isinstance(value, StreamValue):
        ret = value
        ret._field = self
        return ret
    elif isinstance(value, str):
        try:
            unpacked_value = json.loads(value)
        except ValueError:
            # value is not valid JSON; most likely, this field was previously a
            # rich text field before being migrated to StreamField, and the data
            # was left intact in the migration. Return an empty stream instead
            # (but keep the raw text available as an attribute, so that it can be
            # used to migrate that data to StreamField)
            ret = StreamValue(self.stream_block, [], raw_text=value)
            ret._field = self
            return ret

        if unpacked_value is None:
            # we get here if value is the literal string 'null'. This should probably
            # never happen if the rest of the (de)serialization code is working properly,
            # but better to handle it just in case...
            ret = StreamValue(self.stream_block, [])
            ret._field = self
            return ret

        ret = self.stream_block.to_python(unpacked_value)
        ret._field = self
        return ret
    else:
        # See if it looks like the standard non-smart representation of a
        # StreamField value: a list of (block_name, value) tuples
        try:
            [None for (x, y) in value]
        except (TypeError, ValueError):
            # Give up trying to make sense of the value
            raise TypeError(
                "Cannot handle %r (type %r) as a value of StreamField"
                % (value, type(value))
            )

        # Test succeeded, so return as a StreamValue-ified version of that value
        ret = StreamValue(self.stream_block, value)
        ret._field = self
        return ret


def patch_blocks():
    """Patches Wagtail StreamField & StreamValue so it can be pickled"""
    StreamValue.__reduce__ = stream_value_reducer
    StreamField.to_python = stream_field_to_python

To make it more resilient, StreamValue.__reduce__ should also check for the absence of _field, but I've left that out for simplicity. Anyway, making absolutely any type of Block picklable would also require the changes from the original pull request, but I'm not sure that's something we want. I'm solving my use case here: caching models, not pickling everything in Wagtail.
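A quick usage sketch under the same assumptions (patch applied once at startup, e.g. from an AppConfig.ready() hook; MerchantIndexPage and its body StreamField are placeholders):

import pickle

patch_blocks()  # monkey-patch StreamValue.__reduce__ and StreamField.to_python

page = MerchantIndexPage.objects.first()
data = pickle.dumps(page.body)    # the reducer stores (app_label, model name, field name, prep value)
restored = pickle.loads(data)     # the loader re-imports the field and rebuilds the StreamValue

assert len(restored) == len(page.body)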

@pySilver (Contributor, Author) commented Jul 6, 2020

@lb- ping :)

@lb- (Member) commented Jul 6, 2020

haha, thanks for the ping @pySilver - I will try to take another look at this soon.

I am not the best member of the team to dig into this fully, though; I will flag it with the core team and see who else can take a look as well.

@pySilver (Contributor, Author) commented

(code comment updated)

@pySilver (Contributor, Author) commented

Btw, the latest 2.10.1 release breaks pickling for BaseSettings by adding a page_url attribute that cannot be serialized.
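A hypothetical workaround sketch, not verified against the 2.10.1 internals; it assumes page_url ends up in the instance __dict__ and simply drops it before pickling:

class PicklableSettingsMixin:
    """Mix into the affected settings model to strip the unpicklable helper before pickling."""

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("page_url", None)  # assumption: the unpicklable helper lives in __dict__
        return state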

@zerolab zerolab mentioned this pull request Nov 9, 2020
Base automatically changed from master to main March 3, 2021 17:08
@pySilver pySilver closed this by deleting the head repository Jun 21, 2023