import: Fix rendered_content in imported messages. #10258

Closed

Conversation

@rheaparekh (Collaborator) commented Aug 9, 2018:

After the messages have been imported, set the rendered_content of each message instead of leaving it as None.

Fixes #9168
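
For context, the change conceptually amounts to something like the sketch below. This is only an illustration of the idea, not the PR's code: render_markdown (and its location in zerver.lib.message) and the rendered_content_version field are assumptions about the Zulip codebase.

    # Illustrative sketch only (not the PR's implementation): after bulk-importing
    # Message rows, backfill rendered_content for rows where it is still None.
    from zerver.lib.message import render_markdown  # assumed helper and module path
    from zerver.models import Message

    def backfill_rendered_content(message_ids):
        for message in Message.objects.filter(id__in=message_ids,
                                              rendered_content=None):
            message.rendered_content = render_markdown(message, message.content)
            message.rendered_content_version = 1  # assumed versioning field
            message.save(update_fields=['rendered_content',
                                        'rendered_content_version'])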

@zulipbot (Member) commented Aug 9, 2018:

Hello @zulip/server-misc members, this pull request was labeled with the "area: export/import" label, so you may want to check it out!

@rheaparekh force-pushed the import_message_search branch 2 times, most recently from edf5064 to fe1ed61 on August 9, 2018 at 17:59.
@@ -861,6 +872,7 @@ def import_message_data(import_dir: Path) -> None:
         re_map_foreign_keys(data, 'zerver_message', 'id', related_table='message', id_field=True)
         bulk_import_model(data, Message)
+        fix_message_rendered_content(data, 'zerver_message')
A Sponsor Member left a review comment on the added line:
I think we want something like this. The reason is that in production use, it's quite annoying to have the import crash here (because the tool isn't designed to be rerun without starting from the beginning), whereas it's only mildly annoying to have a couple messages that didn't markdown-render at first.

 
         re_map_foreign_keys(data, 'zerver_message', 'id', related_table='message', id_field=True)
         bulk_import_model(data, Message)
-        fix_message_rendered_content(data, 'zerver_message')
+        try:
+            fix_message_rendered_content(data, 'zerver_message')
+        except Exception as e:
+            logging.warning("Error in markdown rendering for message ID %s; continuing" % (<message_id>))

Needs some cleanup. And then I think we can have a tool that tries to render any un-rendered messages that one can run afterwards.
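
The follow-up tool mentioned here could be a small management command along these lines; again a sketch reusing the assumed render_markdown helper, not a description of an existing Zulip command:

    # Hypothetical management command: re-render any messages left with
    # rendered_content=None because rendering failed (or was skipped) at import time.
    import logging

    from django.core.management.base import BaseCommand

    from zerver.lib.message import render_markdown  # assumed helper
    from zerver.models import Message

    class Command(BaseCommand):
        help = "Render markdown for imported messages missing rendered_content."

        def handle(self, *args, **options):
            for message in Message.objects.filter(rendered_content=None):
                try:
                    message.rendered_content = render_markdown(message, message.content)
                    message.save(update_fields=['rendered_content'])
                except Exception:
                    logging.warning("Failed to render message %d; skipping", message.id)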

@rheaparekh (Collaborator, Author) commented:

@timabbott I have updated this

@timabbott (Sponsor Member) commented:

I tweaked the logging code to show where the time goes during markdown rendering, and rendering is clearly a huge fraction of the total import time:

2018-08-09 22:02:46.447 INFO [] Importing message dump /tmp/slack-export/messages-000001.json
2018-08-09 22:02:46.694 INFO [] Successfully imported <class 'zerver.models.Message'> from zerver_message.
2018-08-09 22:03:02.790 INFO [] Successfully rendered markdown for messages in dump /tmp/slack-export/messages-000001.json
2018-08-09 22:03:03.225 INFO [] Successfully imported <class 'zerver.models.UserMessage'> from zerver_usermessage.
2018-08-09 22:03:03.235 INFO [] Importing message dump /tmp/slack-export/messages-000002.json
2018-08-09 22:03:03.489 INFO [] Successfully imported <class 'zerver.models.Message'> from zerver_message.
2018-08-09 22:03:19.966 INFO [] Successfully rendered markdown for messages in dump /tmp/slack-export/messages-000002.json
2018-08-09 22:03:20.529 INFO [] Successfully imported <class 'zerver.models.UserMessage'> from zerver_usermessage.
2018-08-09 22:03:20.538 INFO [] Importing message dump /tmp/slack-export/messages-000003.json

I'm going to merge this, because correctness is more important than speed, but we should probably look at some combination of (1) optimizing it (I'd be willing to bet that we re-query a lot of data unnecessarily with the import process) and/or (2) doing the markdown rendering in a separate parallel phase after we finish importing all the messages. May be worth starting with (1), though; I know bugdown does a few database queries in its outer layer to get e.g. the set of users in the organization, and we're clearly doing those in a loop here.
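
A schematic of idea (1): do the realm-wide queries once per dump file and reuse the results for every message in it. The fetch_* callables and the renderer's context parameter are placeholders, since bugdown's actual caching hooks are not shown here:

    # Schematic only: hoist realm-wide lookups out of the per-message loop.
    from typing import Any, Callable, Dict, List

    def render_dump(messages: List[Any],
                    realm_id: int,
                    renderer: Callable[[Any, Dict[str, Any]], str],
                    fetch_realm_users: Callable[[int], Any],
                    fetch_realm_emoji: Callable[[int], Any]) -> List[str]:
        # Run the expensive realm-wide queries a single time per dump file...
        context = {
            'users': fetch_realm_users(realm_id),
            'emoji': fetch_realm_emoji(realm_id),
        }
        # ...and reuse the prefetched state for every message, instead of letting
        # the renderer's outer layer re-query the database on each iteration.
        return [renderer(message, context) for message in messages]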

@timabbott (Sponsor Member) commented:

Merged as 2630011.

After doing some code reading, the performance piece looks kinda annoying to optimize. I'll open an issue, which we can tackle a bit later.

@timabbott closed this on Aug 9, 2018.
@timabbott (Sponsor Member) commented:

(Huge thanks for fixing this!)

@timabbott (Sponsor Member) commented:

See #10264 for the performance follow-up issue (probably a lower-priority than some other projects for now, just because it's likely difficult). While thinking about this, I added 4c4b6d1.

@rheaparekh deleted the import_message_search branch on August 9, 2018 at 23:25.
@rheaparekh (Collaborator, Author) commented:

Thank you for the review!
