Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PageChooserBlock/ImageChooserBlock values go null when restoring from loaddata #2218

Closed
mx-moth opened this issue Feb 10, 2016 · 2 comments
Closed

Comments

@mx-moth
Copy link
Contributor

mx-moth commented Feb 10, 2016

If a database dump is taken using ./manage.py dumpdata > data.json, and then restored to an empty database (after taking care of #2217) using ./manage.py loaddata data.json, StreamField chooser blocks can loose their value.

This behaviour is dependent upon the order of the data from dumpdata, so is not 100% reproducible without some manual intervention. It does happen without manual intervention on occasion though.

My best guess is that StreamFields will validate their data for validity and integrity when they are unserialized. If the page instance a PageChooserBlock references, or the image an ImageChooserBlock references has not been loaded yet, the StreamField will replace the instance ID with null. When the loaddata import is finished, the database is not how it should be, and any ChooserBlocks that were loaded in the wrong order will be empty.

To reproduce:

  • wagtail start testdata

  • Add a StreamField with some chooser blocks to the HomePage:

    from wagtail.wagtailadmin.edit_handlers import StreamFieldPanel
    from wagtail.wagtailcore import blocks
    from wagtail.wagtailcore.fields import StreamField
    from wagtail.wagtailcore.models import Page
    from wagtail.wagtailimages.blocks import ImageChooserBlock
    
    
    class HomePage(Page):
    body = StreamField([
        ('page', blocks.PageChooserBlock()),
        ('image', ImageChooserBlock()),
    ])
    
    content_panels = Page.content_panels + [
        StreamFieldPanel('body'),
    ]
  • ./manage.py makemigrations; ./manage.py migrate

  • Edit the HomePage from the admin, choosing some pages and images in the StreamField.

  • Dump, flush and load the data:

    $ ./manage.py dumpdata > data.json
    $ ./manage.py sqlflush | ./manage.py dbshell
    $ ./manage.py loaddata data.json
  • Go back to the admin, or use ./manage.py dbshell, to observe that HomePage.body no longer correctly references the chosen pages and images. Instead, these chooser blocks are blank as if nothing has been chosen, with null in the fields instead of the page/image id.

As mentioned, this behaviour is dependant upon the order of the data dump, which is not consistent across runs. You can manually ensure the incorrect behaviour happens by editing data.json and moving the HomePage instances above the pages/images referenced in the StreamField. The rest of the data loading will work, including foreign keys to instances yet to be created.

@gasman
Copy link
Collaborator

gasman commented Nov 24, 2016

I suspect this happens at the point where StreamBlock.get_prep_value is called, to write the StreamField value back to the database - this recursively calls get_prep_value on each child block, constructing the block value from the raw JSONish data in the process (if necessary), which here results in a useless round trip from JSONish data (= page/image ID) to native block value (= page/image instance) and back again.

A possible fix would be to make StreamBlock.get_prep_value aware of lazy StreamValues, to avoid the round trip:

  • if StreamBlock.get_prep_value receives a StreamValue, call get_prep_value() or (get_raw_stream_data() or some other sensible name) on it
  • StreamValue.get_prep_value() works as follows, for each block in the stream:

@zxchew2014
Copy link

Check how can we solve it, as I also have this issue.
possible to give the situation

chosak added a commit to chosak/wagtail that referenced this issue Aug 9, 2018
Currently lazy StreamFields contain initial JSON data read from the
database. When written back to the database, instead of reusing the
JSON, it is first converted to Python objects and then converted back to
JSON. This not only potentially introduces extraneous database queries
but also makes it more difficult for developers to access the raw JSON
for e.g. migrations.

This change fixes wagtail#2218 where using loaddata to load
PageChooserBlock/ImageChooserBlock values in a StreamField may lead
to broken references. To verify, follow the instructions in that issue
to create a data dump with a page and images, then edit the dump so that
the page is loaded before the image. You'll see that page-image
references are properly loaded even if the dump order isn't in
dependency order.

This change includes some slight changes to two API unit tests to get
them to pass. In those tests we manually create StreamField data without
providing an ID. The test was checking to see that the data retrieved
from the API did contain an ID, because the previous logic would add one
during the JSON->Python->JSON round trip. This change removes that
roundtrip and so the test needs to provide an ID in the test data.

Unfortunately this does mean that if anyone is using a similar mechanism
to create StreamField data manually, and counting on IDs being added
automatically even if they are never accessed outside of the API, this
will break. As this kind of creation is unsupported I wasn't sure if I
should add some kind of logic to handle this; one potential place to do
that would be in wagtail.api.v2.serializers.StreamField: if the field is
lazy, make sure that it's fully realized as bound blocks before
returning the API representation.
chosak added a commit to chosak/wagtail that referenced this issue Aug 9, 2018
Currently lazy StreamFields contain initial JSON data read from the
database. When written back to the database, instead of reusing the
JSON, it is first converted to Python objects and then converted back to
JSON. This not only potentially introduces extraneous database queries
but also makes it more difficult for developers to access the raw JSON
for e.g. migrations.

This change fixes wagtail#2218 where using loaddata to load
PageChooserBlock/ImageChooserBlock values in a StreamField may lead
to broken references. To verify, follow the instructions in that issue
to create a data dump with a page and images, then edit the dump so that
the page is loaded before the image. You'll see that page-image
references are properly loaded even if the dump order isn't in
dependency order.

This change includes some slight changes to two API unit tests to get
them to pass. In those tests we manually create StreamField data without
providing an ID. The test was checking to see that the data retrieved
from the API did contain an ID, because the previous logic would add one
during the JSON->Python->JSON round trip. This change removes that
roundtrip and so the test needs to provide an ID in the test data.

Unfortunately this does mean that if anyone is using a similar mechanism
to create StreamField data manually, and counting on IDs being added
automatically even if they are never accessed outside of the API, this
will break. As this kind of creation is unsupported I wasn't sure if I
should add some kind of logic to handle this; one potential place to do
that would be in wagtail.api.v2.serializers.StreamField: if the field is
lazy, make sure that it's fully realized as bound blocks before
returning the API representation.
gasman added a commit to gasman/wagtail that referenced this issue Oct 5, 2018
@gasman gasman closed this as completed in dc0ae76 Oct 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment