Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace data_json TextField with data JSONField in BaseLogEntry #8021

Merged
merged 1 commit into from
Feb 28, 2022

Conversation

laymonage
Copy link
Member

@laymonage laymonage commented Feb 22, 2022

Fixes #6942.

Since we only support Django 3.2+, we can now replace fields that hold JSON data using TextField with the cross-database JSONField from Django 3.1.

Not sure if this should be a breaking change, though. The public API mostly stays the same, as users are likely to use the previously @cached_property data in order to access the deserialized data. The log_action() method also accepts data instead of data_json, so this change would mostly be unnoticable. Unless there are users that access data_json directly...

More docs would probably needed. Let me know where to add more info.

The next one would be wagtailcore.PageRevision.content_json. I tried to include it in this PR, but it seems trickier than I expected as it requires some changes to django-modelcluster.

@@ -978,9 +978,9 @@ Database fields

A foreign key to the user that triggered the action.

.. attribute:: data_json
.. attribute:: data
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably need to add a note here somewhere to indicate the name change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this would be good - you could pair it with a .. versionchanged:: version to let people know about the details of the change (see https://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

class Migration(migrations.Migration):

dependencies = [
('wagtailcore', '0067_alter_modellogentry_data_json_and_more'),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can squash the migrations if desired.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I think combining the migrations would be good.

@@ -100,7 +99,7 @@ def log_action(self, instance, action, **kwargs):
% (instance,)
)

data = kwargs.pop("data", "")
data = kwargs.pop("data", {}) or {}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just in case the user uses "", [], or other falsy values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The kwargs.pop("data", {}), if data isn't present, will return {} and then immediately get overwritten because {} is also an empty value? Nitpick, but I think None might be clearer here - also, I suggest adding a comment to the effect that this is to prevent "" due to the old TextField type, since usually we wouldn't need to be quite so defensive.

@zerolab
Copy link
Contributor

zerolab commented Feb 22, 2022

Note: this fixes #6942
@laymonage got time to submit another one for #5280 while you're in this space?

@laymonage
Copy link
Member Author

laymonage commented Feb 22, 2022

Note: this fixes #6942 @laymonage got time to submit another one for #5280 while you're in this space?

Thanks. I'll take a look into it, but it's likely to be trickier.

Copy link
Member

@jacobtoppm jacobtoppm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few little tweaks and testing on postgres - generally this is looking great!

Maybe also worth a brief note in upgrade considerations for the new release (along with the other PR)

@@ -978,9 +978,9 @@ Database fields

A foreign key to the user that triggered the action.

.. attribute:: data_json
.. attribute:: data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this would be good - you could pair it with a .. versionchanged:: version to let people know about the details of the change (see https://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html)

class Migration(migrations.Migration):

dependencies = [
('wagtailcore', '0067_alter_modellogentry_data_json_and_more'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I think combining the migrations would be good.

@@ -100,7 +99,7 @@ def log_action(self, instance, action, **kwargs):
% (instance,)
)

data = kwargs.pop("data", "")
data = kwargs.pop("data", {}) or {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The kwargs.pop("data", {}), if data isn't present, will return {} and then immediately get overwritten because {} is also an empty value? Nitpick, but I think None might be clearer here - also, I suggest adding a comment to the effect that this is to prevent "" due to the old TextField type, since usually we wouldn't need to be quite so defensive.

@laymonage
Copy link
Member Author

laymonage commented Feb 23, 2022

Testing this PR on wagtail-bakery using docker-wagtail-develop:

Before

image

image

image

After

image

image

I can query with JSON-specific operators on psql:

image

I can also query with JSONField-specific path lookup, and the model field type will be dict once retrieved:

image

Potential problem

We use "" (an empty string) as the default :

def log_page_action(self, action, revision, has_content_changes):
PageLogEntry.objects.log_action(
instance=revision.page.specific,
action=action,
data="",
revision=None if action == "wagtail.create" else revision,
user=revision.user,
timestamp=revision.created_at,
content_changed=has_content_changes,
)

data = kwargs.pop("data", "")

When the data is migrated to JSONField, this empty value will still be stored as an empty string (JSON empty string, to be exact):

image

This still works fine on PostgreSQL, but it breaks on Oracle, which does not support storing scalar values on the top-level of a JSON column (it can only store JSON object or array). See https://github.com/django/django/blob/7fd2deb3e86a021b3108027d786d45657243532c/django/db/backends/oracle/features.py#L66

Possible fix

Create a migration that changes all "" into {} in data_json prior to the AlterField operations. (Done)

@laymonage laymonage force-pushed the use-jsonfield-logentry branch 3 times, most recently from 9f52875 to 43bc490 Compare February 24, 2022 05:32
@squash-labs
Copy link

squash-labs bot commented Feb 24, 2022

Manage this branch in Squash

Test this branch here: https://laymonageuse-jsonfield-logentr-qjkom.squash.io

@laymonage laymonage force-pushed the use-jsonfield-logentry branch 5 times, most recently from 2eecc02 to ef86147 Compare February 25, 2022 12:37
Copy link
Member

@jacobtoppm jacobtoppm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @laymonage!

@jacobtoppm jacobtoppm merged commit bf8d5b3 into wagtail:main Feb 28, 2022
@Scotchester
Copy link
Contributor

Hi @laymonage,

I want to report something that just happened to me that may be related to this PR.

I was attempting to dumpdata to back up a production site and loaddata to get its data into a local dev site and I ran into an issue loading a PageLogEntry:

db_1                    | 2022-08-30 14:11:49.637 UTC [294] ERROR:  null value in column "data" violates not-null constraint
db_1                    | 2022-08-30 14:11:49.637 UTC [294] DETAIL:  Failing row contains (2, Services, wagtail.publish, null, 2022-04-11 19:15:26.045+00, t, f, 22, 4, 1, 1, 49cef16c-4ef3-4492-a03e-58a59e3637da).
db_1                    | 2022-08-30 14:11:49.637 UTC [294] STATEMENT:  INSERT INTO "wagtailcore_pagelogentry" ("id", "content_type_id", "label", "action", "data", "timestamp", "uuid", "user_id", "content_changed", "deleted", "page_id", "revision_id") VALUES (2, 22, 'Services', 'wagtail.publish', NULL, '2022-04-11T19:15:26.045000+00:00'::timestamptz, '49cef16c-4ef3-4492-a03e-58a59e3637da'::uuid, 1, true, false, 4, 1) RETURNING "wagtailcore_pagelogentry"."id"
cms_1                   | Traceback (most recent call last):
cms_1                   |   File "/home/cms/.local/lib/python3.8/site-packages/django/db/backends/utils.py", line 89, in _execute
cms_1                   |     return self.cursor.execute(sql, params)
cms_1                   | psycopg2.errors.NotNullViolation: null value in column "data" violates not-null constraint
cms_1                   | DETAIL:  Failing row contains (2, Services, wagtail.publish, null, 2022-04-11 19:15:26.045+00, t, f, 22, 4, 1, 1, 49cef16c-4ef3-4492-a03e-58a59e3637da).

Here is the relevant snipped of the dumped JSON:

{
  "model": "wagtailcore.pagelogentry",
  "pk": 2,
  "fields": {
    "content_type": [
      "standard_page",
      "standardpage"
    ],
    "label": "Services",
    "action": "wagtail.publish",
    "data": null,
    "timestamp": "2022-04-11T19:15:26.045Z",
    "uuid": "49cef16c-4ef3-4492-a03e-58a59e3637da",
    "user": [
      "kmrichar"
    ],
    "content_changed": true,
    "deleted": false,
    "page": 4,
    "revision": 1
  }
},

And if I directly access the production DB, the data field has no value:

>>> PageLogEntry.objects.get(pk=2).data
>>> PageLogEntry.objects.get(pk=2).data == None
True

Potentially relevant details:

  • This log entry occurred in Wagtail 2.15.3 and am now trying to do this dump/load on 3.0.1, so I wondered if it might be due to the changes in this PR.
  • The page that the log entry was associated with has since been deleted (we were still on 2.15.3 when it was deleted), so I wondered if it may be unrelated to this PR and just related to how PageLogEntry handles the situation where it's referring to a deleted page, but looking at the definitions of PageLogEntry and BaseLogEntry, I don't see anything that would modify data on delete.

It seems like the data field was stored as an empty string "" originally, and should have been converted to an empty dict in the migration, but it doesn't seem to have worked.

Any idea what might be happening here?

@laymonage
Copy link
Member Author

Hey @Scotchester, I'm not too sure about it but I'm guessing that for some reason the log entry was created with an explicit data_json=None, which is saved on the database as 'null' back when the field was still a TextField. The migration only replaces JSON empty strings on the database ('""') to JSON empty objects ('{}'), so the 'null' value was left as-is.

After the upgrade, I assume doing dumpdata would mean that:

When doing loaddata, the null is interpreted as an SQL NULL by Django. As the data field doesn't have null=True, it crashes with your given output.


Do you have somewhere in your code that creates a log entry with data_json=None prior to the upgrade?

@laymonage
Copy link
Member Author

Almost a year later, but I think I've found the cause of the above issue when investigating a potential problem in #9590. Apparently the wagtail.publish log entries created before this PR (pre-Wagtail 3.0) might have their data value set to the JSON null as a result of this line:

Any new logs created since then will always normalise to an empty JSON object {} as a result of this line:

data = kwargs.pop("data", None) or {}

The 0068_log_entry_empty_object migration unfortunately only handles the other case where the empty data was stored as an empty string. So, some of the old log entries will still have the JSON null as its value. I'm not sure if we should have another migration to handle this, given that it's been over a year and we haven't heard of such issues. However, I'm sure people will run into this when they run the migrations in #9590. We could fix the migrations in #9590, or add a new migration before the ones in that PR to fix this, or just do both.

laymonage added a commit to laymonage/wagtail that referenced this pull request Jul 31, 2023
Old logs may have the data set to JSON `null` instead of an empty JSON
object `{}`.

See wagtail#8021 (comment)
and wagtail#9590 (comment)
for more details.
laymonage added a commit to laymonage/wagtail that referenced this pull request Jul 31, 2023
Old logs may have the data set to JSON `null` instead of an empty JSON
object `{}`.

See wagtail#8021 (comment)
and wagtail#9590 (comment)
for more details.
laymonage added a commit to laymonage/wagtail that referenced this pull request Jul 31, 2023
Old logs may have the data set to JSON `null` instead of an empty JSON
object `{}`.

See wagtail#8021 (comment)
and wagtail#9590 (comment)
for more details.
laymonage added a commit that referenced this pull request Jul 31, 2023
Old logs may have the data set to JSON `null` instead of an empty JSON
object `{}`.

See #8021 (comment)
and #9590 (comment)
for more details.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JSONField for BaseLogEntry data_json
5 participants