PageChooserBlock/ImageChooserBlock values go null when restoring from loaddata #2218

mx-moth · 2016-02-10T00:42:10Z

If a database dump is taken using ./manage.py dumpdata > data.json, and then restored to an empty database (after taking care of #2217) using ./manage.py loaddata data.json, StreamField chooser blocks can loose their value.

This behaviour is dependent upon the order of the data from dumpdata, so is not 100% reproducible without some manual intervention. It does happen without manual intervention on occasion though.

My best guess is that StreamFields will validate their data for validity and integrity when they are unserialized. If the page instance a PageChooserBlock references, or the image an ImageChooserBlock references has not been loaded yet, the StreamField will replace the instance ID with null. When the loaddata import is finished, the database is not how it should be, and any ChooserBlocks that were loaded in the wrong order will be empty.

To reproduce:

wagtail start testdata

Add a StreamField with some chooser blocks to the HomePage:

from wagtail.wagtailadmin.edit_handlers import StreamFieldPanel
from wagtail.wagtailcore import blocks
from wagtail.wagtailcore.fields import StreamField
from wagtail.wagtailcore.models import Page
from wagtail.wagtailimages.blocks import ImageChooserBlock


class HomePage(Page):
body = StreamField([
    ('page', blocks.PageChooserBlock()),
    ('image', ImageChooserBlock()),
])

content_panels = Page.content_panels + [
    StreamFieldPanel('body'),
]

./manage.py makemigrations; ./manage.py migrate
Edit the HomePage from the admin, choosing some pages and images in the StreamField.

Dump, flush and load the data:

$ ./manage.py dumpdata > data.json
$ ./manage.py sqlflush | ./manage.py dbshell
$ ./manage.py loaddata data.json

Go back to the admin, or use ./manage.py dbshell, to observe that HomePage.body no longer correctly references the chosen pages and images. Instead, these chooser blocks are blank as if nothing has been chosen, with null in the fields instead of the page/image id.

As mentioned, this behaviour is dependant upon the order of the data dump, which is not consistent across runs. You can manually ensure the incorrect behaviour happens by editing data.json and moving the HomePage instances above the pages/images referenced in the StreamField. The rest of the data loading will work, including foreign keys to instances yet to be created.

The text was updated successfully, but these errors were encountered:

Fixes wagtail#2218

gasman · 2016-11-24T17:15:37Z

I suspect this happens at the point where StreamBlock.get_prep_value is called, to write the StreamField value back to the database - this recursively calls get_prep_value on each child block, constructing the block value from the raw JSONish data in the process (if necessary), which here results in a useless round trip from JSONish data (= page/image ID) to native block value (= page/image instance) and back again.

A possible fix would be to make StreamBlock.get_prep_value aware of lazy StreamValues, to avoid the round trip:

if StreamBlock.get_prep_value receives a StreamValue, call get_prep_value() or (get_raw_stream_data() or some other sensible name) on it
StreamValue.get_prep_value() works as follows, for each block in the stream:
- if this StreamValue is lazy and the item exists in self.stream_data but not self._bound_blocks (ref: Make StreamField mutable (issue #2876) #2886 (comment)), use the stream_data entry as is
- otherwise, take the result of calling get_prep_value on the native block value.

zxchew2014 · 2018-02-21T02:59:22Z

Check how can we solve it, as I also have this issue.
possible to give the situation

Currently lazy StreamFields contain initial JSON data read from the database. When written back to the database, instead of reusing the JSON, it is first converted to Python objects and then converted back to JSON. This not only potentially introduces extraneous database queries but also makes it more difficult for developers to access the raw JSON for e.g. migrations. This change fixes wagtail#2218 where using loaddata to load PageChooserBlock/ImageChooserBlock values in a StreamField may lead to broken references. To verify, follow the instructions in that issue to create a data dump with a page and images, then edit the dump so that the page is loaded before the image. You'll see that page-image references are properly loaded even if the dump order isn't in dependency order. This change includes some slight changes to two API unit tests to get them to pass. In those tests we manually create StreamField data without providing an ID. The test was checking to see that the data retrieved from the API did contain an ID, because the previous logic would add one during the JSON->Python->JSON round trip. This change removes that roundtrip and so the test needs to provide an ID in the test data. Unfortunately this does mean that if anyone is using a similar mechanism to create StreamField data manually, and counting on IDs being added automatically even if they are never accessed outside of the API, this will break. As this kind of creation is unsupported I wasn't sure if I should add some kind of logic to handle this; one potential place to do that would be in wagtail.api.v2.serializers.StreamField: if the field is lazy, make sure that it's fully realized as bound blocks before returning the API representation.

gannetson added a commit to gannetson/wagtail that referenced this issue Jun 15, 2016

Fix ForeignKey selectbox in ChooserBlock

d001cac

Fixes wagtail#2218

gannetson mentioned this issue Jun 15, 2016

Fix ForeignKey selectbox in ChooserBlock #2714

Closed

mvantellingen pushed a commit to mvantellingen/wagtail that referenced this issue Nov 18, 2016

Fix ForeignKey selectbox in ChooserBlock

2da61cf

Fixes wagtail#2218

mvantellingen mentioned this issue Nov 18, 2016

Fix ForeignKey selectbox in ChooserBlock (2) #3169

Closed

gasman added component:Streamfield type:Bug labels Nov 24, 2016

gasman added this to the real-soon-now milestone Nov 24, 2016

gasman mentioned this issue Feb 27, 2018

Streamfield json outputs int pks even when dumpdata uses natural keys #4317

Closed

chosak mentioned this issue Aug 9, 2018

Reuse lazy StreamField values in get_prep_value #4727

Closed

gasman added a commit to gasman/wagtail that referenced this issue Oct 5, 2018

Reuse lazy StreamField values in get_prep_value. Fixes wagtail#2218

181304a

gasman mentioned this issue Oct 5, 2018

Reuse lazy StreamField values in get_prep_value. Fixes #2218 #4801

Closed

gasman closed this as completed in dc0ae76 Oct 10, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PageChooserBlock/ImageChooserBlock values go null when restoring from loaddata #2218

PageChooserBlock/ImageChooserBlock values go null when restoring from loaddata #2218

mx-moth commented Feb 10, 2016

gasman commented Nov 24, 2016

zxchew2014 commented Feb 21, 2018

PageChooserBlock/ImageChooserBlock values go null when restoring from loaddata #2218

PageChooserBlock/ImageChooserBlock values go null when restoring from loaddata #2218

Comments

mx-moth commented Feb 10, 2016

gasman commented Nov 24, 2016

zxchew2014 commented Feb 21, 2018