import: Fix rendered_content in imported messages. #10258
Hello @zulip/server-misc members, this pull request was labeled with the "area: export/import" label, so you may want to check it out!
edf5064 to fe1ed61
@@ -861,6 +872,7 @@ def import_message_data(import_dir: Path) -> None:

 re_map_foreign_keys(data, 'zerver_message', 'id', related_table='message', id_field=True)
 bulk_import_model(data, Message)
+fix_message_rendered_content(data, 'zerver_message')
I think we want something like this. The reason is that in production use, it's quite annoying to have the import crash here (because the tool isn't designed to be rerun without starting from the beginning), whereas it's only mildly annoying to have a couple messages that didn't markdown-render at first.
  re_map_foreign_keys(data, 'zerver_message', 'id', related_table='message', id_field=True)
  bulk_import_model(data, Message)
- fix_message_rendered_content(data, 'zerver_message')
+ try:
+     fix_message_rendered_content(data, 'zerver_message')
+ except Exception as e:
+     logging.warning("Error in markdown rendering for message ID %s; continuing" % (<message_id>))
Needs some cleanup. And then I think we can have a tool that tries to render any un-rendered messages that one can run afterwards.
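To make the suggestion above concrete, here is a minimal, self-contained sketch of the "log and continue" idea applied per message, so the warning can name the message that failed. It is not the Zulip implementation: `render_messages_tolerantly`, the `render` callable, and the plain-dict message shape are all hypothetical stand-ins for illustration.

```python
import logging
from typing import Any, Callable, Dict, List


def render_messages_tolerantly(
    messages: List[Dict[str, Any]],
    render: Callable[[str], str],
) -> List[int]:
    """Render each message; on failure, log a warning and move on.

    `messages` are plain dicts with 'id', 'content' and 'rendered_content'
    keys (a hypothetical shape, not Zulip's Message model). Returns the IDs
    of messages that failed, so a follow-up tool could retry them later.
    """
    failed_ids = []  # type: List[int]
    for message in messages:
        try:
            message['rendered_content'] = render(message['content'])
        except Exception:
            # Do not abort the whole import over one bad message.
            logging.warning(
                "Error in markdown rendering for message ID %s; continuing",
                message['id'],
            )
            failed_ids.append(message['id'])
    return failed_ids
```

The returned list of failed IDs is what a later "render any un-rendered messages" tool could take as input, assuming such a tool is written.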
fe1ed61 to 8c8240c
After the messages have been imported, set the rendered_content of the messages instead of leaving it as 'None'. Fixes zulip#9168
@timabbott I have updated this
I tweaked the logging code to print something to show where the time goes for markdown-rendering, and it's definitely in total a huge fraction of the import time:
I'm going to merge this, because correctness is more important than speed, but we should probably look at some combination of (1) optimizing it (I'd be willing to bet that we re-query a lot of data unnecessarily with the import process) and/or (2) doing the markdown rendering in a separate parallel phase after we finish importing all the messages. May be worth starting with (1), though; I know
Merged as 2630011. After doing some code reading, the performance piece looks kinda annoying to optimize. I'll open an issue, which we can tackle a bit later.
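For reference, option (2) from the comment above, rendering in a separate phase once all messages are imported, might look roughly like the sketch below. Everything in it is an assumption for illustration: `render_in_separate_phase`, the `render` callable, and the ID-to-content mapping are hypothetical, and none of it is taken from the Zulip codebase.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional


def render_in_separate_phase(
    contents: Dict[int, str],
    render: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[int, Optional[str]]:
    """Render already-imported messages concurrently, after the import.

    `contents` maps a message ID to its raw content (hypothetical shape).
    Returns a mapping of message ID to rendered HTML, or None where
    rendering raised, so failures can be retried separately.
    """
    def safe_render(content: str) -> Optional[str]:
        try:
            return render(content)
        except Exception:
            return None

    message_ids = list(contents)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with message_ids.
        rendered = list(pool.map(safe_render, [contents[i] for i in message_ids]))
    return dict(zip(message_ids, rendered))
```

Threads keep the sketch short, but they only help if rendering spends its time waiting on I/O such as database queries. If rendering is CPU-bound, a process pool would probably be needed instead, at the cost of more setup.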
(Huge thanks for fixing this!)
Thank you for the review!