database: Add database index for private messages. #9753

Closed
wants to merge 6 commits into from

Conversation

shubham-padia
Member

@shubham-padia shubham-padia commented Jun 14, 2018

Fixes #6896.

For commit 1:
For set_initial_value_of_is_private_flag, I tried out two approaches for the migration.

The first was:

    UserMessage = apps.get_model("zerver", "UserMessage")
    all_objects = UserMessage.objects.all()

    for obj in all_objects:
        recipient_type = obj.message.recipient.type
        if recipient_type == 1 or recipient_type == 3:
            obj.flags |= UserMessage.flags.is_private
        else:
            obj.flags &= ~UserMessage.flags.is_private
        obj.save(update_fields=['flags'])

And the second one was the currently implemented one.

Although I suspected that the second approach might be faster, I measured the execution time of both. The first approach took about 1.44s on my db and the second about 0.066s, so I went with the second approach.

For (hah, don't think I can call this testing) testing the migration, I ran

    for obj in stream_objects:
        if obj.message.recipient.type != 2 or obj.flags.is_private:
            print('fail')

similarly for private_objects.
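
The flag arithmetic used in the first approach can be tried on plain integers. This is a minimal sketch, not the migration itself: it assumes the is_private bit has the value 2048 (the mask that shows up in the generated SQL in the query plan in this thread), and `updated_flags` is a hypothetical helper name.

```python
IS_PRIVATE = 2048  # assumed bit value, taken from the `flags | 2048` SQL in this thread

# Recipient types 1 and 3 are the ones treated as private in the loop above.
PRIVATE_RECIPIENT_TYPES = {1, 3}


def updated_flags(flags: int, recipient_type: int) -> int:
    """Return flags with the is_private bit set or cleared (hypothetical helper)."""
    if recipient_type in PRIVATE_RECIPIENT_TYPES:
        return flags | IS_PRIVATE   # set the bit, leave other bits untouched
    return flags & ~IS_PRIVATE      # clear the bit, leave other bits untouched
```

Both branches only touch the one bit, which is why the same logic can be pushed into a single bulk UPDATE instead of saving each row.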

For commit 2:
Other commits like 8bb812c also added the SQL to the changelog. Since there is no changelog file right now, I haven't added it yet.
I've also added the migration to the upgrade-zulip-stage-2 script. From what the script states, it is used to upgrade to a newer version of Zulip rather than to any specific version, so I added the migration to it like the other commits did.

@zulipbot
Member

Hello @zulip/server-api, @zulip/server-production members, this pull request was labeled with the "area: api", "area: production" labels, so you may want to check it out!

@shubham-padia
Member Author

pinging @timabbott for review. It'd be great to have this reviewed and merged in order to start with adding another flag in #7459.

@timabbott
Sponsor Member

@shubham-padia this looks good, except that I'm a bit worried that the migration to add the new flag might be really slow and need to be batched. The only real way to test that is to try it, so I'm going to test-deploy this to chat.zulip.org and see what happens. If it doesn't go well, we'll need to batch it (e.g. by user).

@timabbott
Sponsor Member

OK, conclusion was that the migration's runtime was at least 2 minutes on chat.zulip.org, which is way too long for a synchronous migration. So we'll need to do it in batches, using atomic=False, and probably looping over the users (one transaction per user).

@shubham-padia
Member Author

Cool! Working on that.

@shubham-padia
Member Author

@timabbott I've updated the PR with the requested changes

apps: StateApps, schema_editor: DatabaseSchemaEditor) -> None:
UserMessage = apps.get_model("zerver", "UserMessage")
UserProfile = apps.get_model("zerver", "UserProfile")
user_profiles = UserProfile.objects.all()
Contributor

@showell showell Jun 20, 2018

Code looks good overall, but one comment.

It's a simple optimization here to fetch just the user ids with .values('id'). I know it's not the bottleneck here, but you're not buying much simplicity by pulling in the fat user objects for all Nk users.

stream_objects = UserMessage.objects.filter(message__recipient__type = 2,
user_profile = user_profile)
private_objects.update(flags=F('flags').bitor(UserMessage.flags.is_private))
stream_objects.update(flags=F('flags').bitand(~UserMessage.flags.is_private))
Contributor

@showell showell Jun 20, 2018

I'm wondering if it would be cheaper to do it like this:

  • clear all flags where is_private = True (probably very few, since the old flag was rare)
  • set all flags for private_objects

This will result in fewer updates and allow you to replace the second query that has to join to Recipient via Message with a simpler query.
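
The two-step idea can be sketched with the standard-library sqlite3 module. Everything here is an assumption made for illustration: the table and column names are a simplified, hypothetical schema rather than Zulip's, SQLite stands in for PostgreSQL, and 2048 is assumed as the is_private bit.

```python
import sqlite3

IS_PRIVATE = 2048  # assumed bit value for the is_private flag


def backfill_is_private(conn: sqlite3.Connection) -> None:
    """Two-step backfill on a simplified, hypothetical schema."""
    with conn:  # commit both statements together
        # Step 1: clear the bit only on rows where it is currently set.
        # This needs no join at all.
        conn.execute(
            "UPDATE usermessage SET flags = flags & ~? WHERE (flags & ?) != 0",
            (IS_PRIVATE, IS_PRIVATE),
        )
        # Step 2: set the bit for rows whose message has a private recipient.
        conn.execute(
            """
            UPDATE usermessage SET flags = flags | ?
            WHERE message_id IN (
                SELECT m.id FROM message m
                JOIN recipient r ON r.id = m.recipient_id
                WHERE r.type IN (1, 3)
            )
            """,
            (IS_PRIVATE,),
        )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE recipient (id INTEGER PRIMARY KEY, type INTEGER);
    CREATE TABLE message (id INTEGER PRIMARY KEY, recipient_id INTEGER);
    CREATE TABLE usermessage (id INTEGER PRIMARY KEY, message_id INTEGER, flags INTEGER);
    INSERT INTO recipient VALUES (1, 1), (2, 2);
    INSERT INTO message VALUES (10, 1), (20, 2);
    -- Row 2 carries a stale is_private bit on a stream message.
    INSERT INTO usermessage VALUES (1, 10, 1), (2, 20, 2049);
    """
)
backfill_is_private(conn)
rows = conn.execute("SELECT id, flags FROM usermessage ORDER BY id").fetchall()
```

Only step 2 has to join through the message table. Step 1 touches just the rows that already have the bit, which should be few if the old flag was rare.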

@showell
Contributor

showell commented Jun 20, 2018

I added a couple comments, both of which are easy-to-implement performance optimizations.

@shubham-padia
Member Author

@showell Thanks for the review, I've updated the PR :)

@showell
Contributor

showell commented Jun 20, 2018

Thanks @shubham-padia! Your changes address my comments, and overall LGTM, but I'll wait for Tim to merge this, for fairly obvious reasons. 😄

@timabbott
Sponsor Member

This seems to be really slow; we end up doing a table scan of zerver_message for each user. I think we want to do it one-recipient-at-a-time instead, since that avoids the table scan issue.

@showell
Contributor

showell commented Jun 23, 2018

Why is it table scanning Message when message_id is part of the join? Also aren’t there more PM recipients than users? I suppose we could force all flags to 1 then invert them for UMs attached to stream recipients. I suspect this might just be inherently slow and we should strategize to make the slowness mostly transparent to users?

@timabbott
Sponsor Member

I'm currently testing this as the main migration block:

    Recipient = apps.get_model("zerver", "Recipient")
    recipient_ids = Recipient.objects.filter(type__in=[1,3]).values_list("id", flat=True)

    total = len(recipient_ids)
    i = 0
    for recipient_id in recipient_ids:
        i += 1
        print("Processing %s %s/%s" % (recipient_id, i, total))
        sys.stdout.flush()
        with transaction.atomic():
            private_objects = UserMessage.objects.filter(message__recipient_id = recipient_id)
            private_objects.update(flags=F('flags').bitor(UserMessage.flags.is_private))

It also seems fairly slow; this will end up being a fairly expensive migration to execute :/.

@timabbott
Sponsor Member

(Regardless, I think that this branch is missing a commit to make the is:private narrow actually use the new index by doing a query on UserMessage?).

@shubham-padia
Member Author

Since #6896 only mentioned adding the index, I thought making the narrow use the index was a follow-up issue. I'll add the commit in this PR then.

@timabbott
Sponsor Member

It's table-scanning Message because, even though we have an index on Message__recipient_id, when the list of recipient IDs is a few thousand long (as it is), there isn't an efficient way for Postgres to filter Message for the matching rows using the index, so it table scans and then filters instead.

@showell
Contributor

showell commented Jun 23, 2018

Why isn’t it starting with UserMessage rows that match the user_profile_id index and then finding the Message in O(1) and filtering on recipient id?

@timabbott
Sponsor Member

timabbott commented Jun 23, 2018

The query plan is below:

zulip=> explain analyze UPDATE "zerver_usermessage" SET "flags" = ("zerver_usermessage"."flags" | 2048) WHERE "zerver_usermessage"."id" IN (SELECT V0."id" AS Col1 FROM "zerver_usermessage" V0 INNER JOIN "zerver_message" V2 ON (V0."message_id" = V2."id") WHERE (V0."user_profile_id" = 79 AND V2."recipient_id" IN (SELECT U0."id" AS Col1 FROM "zerver_recipient" U0 WHERE U0."type" IN (1, 3))));

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on zerver_usermessage  (cost=203478.70..403933.86 rows=23562 width=44) (actual time=1261.376..1261.376 rows=0 loops=1)
   ->  Nested Loop  (cost=203478.70..403933.86 rows=23562 width=44) (actual time=1260.864..1261.109 rows=10 loops=1)
         ->  HashAggregate  (cost=203478.13..203713.75 rows=23562 width=22) (actual time=1260.806..1260.855 rows=10 loops=1)
               ->  Hash Join  (cost=146558.77..203419.23 rows=23562 width=22) (actual time=811.029..1260.782 rows=10 loops=1)
                     Hash Cond: (v2.recipient_id = u0.id)
                     ->  Hash Join  (cost=146274.47..202581.59 rows=23562 width=20) (actual time=803.195..1238.252 rows=64840 loops=1)
                           Hash Cond: (v0.message_id = v2.id)
                           ->  Index Scan using zerver_usermessage_06037614 on zerver_usermessage v0  (cost=0.57..52390.90 rows=23562 width=14) (actual time=0.071..288.141 rows=64840 loops=1)
                                 Index Cond: (user_profile_id = 79)
                           ->  Hash  (cost=134730.40..134730.40 rows=664040 width=14) (actual time=802.696..802.696 rows=602819 loops=1)
                                 Buckets: 65536  Batches: 2  Memory Usage: 14156kB
                                 ->  Seq Scan on zerver_message v2  (cost=0.00..134730.40 rows=664040 width=14) (actual time=0.046..590.753 rows=602819 loops=1)
                     ->  Hash  (cost=171.69..171.69 rows=9009 width=10) (actual time=5.239..5.239 rows=8913 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 383kB
                           ->  Seq Scan on zerver_recipient u0  (cost=0.00..171.69 rows=9009 width=10) (actual time=0.017..2.796 rows=8913 loops=1)
                                 Filter: (type = ANY ('{1,3}'::integer[]))
                                 Rows Removed by Filter: 248
         ->  Index Scan using zerver_usermessage_pkey on zerver_usermessage  (cost=0.57..8.49 rows=1 width=26) (actual time=0.021..0.022 rows=1 loops=10)
               Index Cond: (id = v0.id)
 Total runtime: 1261.809 ms
(20 rows)

@timabbott
Sponsor Member

    UserProfile = apps.get_model("zerver", "UserProfile")
    user_profile_ids = UserProfile.objects.all().order_by("id").values_list("id", flat=True)
    Recipient = apps.get_model("zerver", "Recipient")
    Subscription = apps.get_model("zerver", "Subscription")

    total = len(user_profile_ids)
    for user_id in user_profile_ids:
        print(user_id, total)
        sys.stdout.flush()
        recipient_ids = Subscription.objects.filter(
            recipient__type__in=[1, 3], user_profile_id=user_id,
        ).values_list("recipient_id", flat=True)
        # recipient_ids = Recipient.objects.filter(type__in=[1,3]).values_list("id", flat=True)
        with transaction.atomic():
            # We only need to do this because a previous migration didn't clean the field.               
            private_objects = UserMessage.objects.filter(user_profile__id = user_id,
                                                         message__recipient_id__in=recipient_ids)
            private_objects.update(flags=F('flags').bitor(UserMessage.flags.is_private))
        print("Processed %s/%s" % (user_id, total))

seems to do considerably better.

self.send_personal_message(self.example_email("hamlet"), user_profile.email,
content="test")
message = most_recent_message(user_profile)
assert(UserMessage.objects.get(user_profile=user_profile, message=message).flags.is_private.is_set)
Sponsor Member

self.assertTrue is the right way to write this.

Sponsor Member

(In general, one doesn't use the assert keyword in unit tests; it produces less nice output than the self.assert* methods.)

Member Author

Cool, I'll push the change with the other changes and will take care of this in future :)

@timabbott
Sponsor Member

I tweaked the first commit to add a variable NON_API_FLAGS and pushed back to this PR.

I think we need to also add a check in the update_message_flags view to block client interaction with flags in the NON_API_FLAGS list; it looks like we don't currently have validation of the flag validity beyond this:

    flagattr = getattr(UserMessage.flags, flag)                                                                                          

(which I bet 500s with an invalid flag name) so we'll need to add something new with some tests to give a nice "Invalid flag 'foo'" type error message.
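
A minimal sketch of that kind of check, written as plain Python so it runs on its own. The flag set is illustrative, `InvalidFlagError` stands in for Zulip's JsonableError, and `validate_flag` is a hypothetical helper, not the code that ended up in the view.

```python
from typing import Set

# Illustrative stand-ins for the flag names on UserMessage.flags.
ALL_FLAGS: Set[str] = {"read", "starred", "is_private"}
NON_API_FLAGS: Set[str] = {"is_private"}


class InvalidFlagError(Exception):
    """Stand-in for JsonableError in this sketch."""


def validate_flag(flag: str) -> str:
    """Reject unknown flags and flags that clients may not modify."""
    valid_flags = ALL_FLAGS - NON_API_FLAGS
    if flag not in valid_flags:
        # Exit early with a readable message instead of letting getattr fail.
        raise InvalidFlagError("Invalid flag: '%s'" % (flag,))
    return flag
```

Checking membership before calling getattr means an unknown name and an internal-only name both produce the same client-facing error.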

@shubham-padia
Member Author

@timabbott This is ready for another review.

Sponsor Member

@timabbott timabbott left a comment

I made a few of the changes mentioned here, and pushed the branch back. Can you re-test the migration and fix the tests?

# `flags` is a list. Do not use `flags` here by mistake.
flagattr = getattr(UserMessage.flags, flag)
else:
raise JsonableError(_("Invalid flag: %s" % (flag,)))
Sponsor Member

We should do an early-exit with JsonableError. Also, I renamed flags to valid_flags so we don't need the comment.

UserMessage = apps.get_model("zerver", "UserMessage")
Message = apps.get_model("zerver", "Message")
i = 0
total = Message.objects.count()
Sponsor Member

This total approach is buggy, because if messages are being sent while this runs, the total we queried before is wrong. At least if we're using atomic=False which we should for this expensive migration.

Sponsor Member

(Using the looping flow from migration 0177 is robust against this sort of thing)

Sponsor Member

(we need to adapt it a bit for the fact the query is sparse, though; see the code I just pushed)

zerver/models.py Outdated
# Zulip backend, and don't make sense to expose to the API. A
# good example is is_private, which is just a denormalization of
# message.recipient_type for database query performance.
NON_API_FLAGS = {"is_private"}
Sponsor Member

I added a commit that also puts active_mobile_push_notification in this list.


with self.assertRaises(JsonableError):
do_update_message_flags(user_profile, get_client("website"), 'remove', 'invalid', [message])

Sponsor Member

test_events.py is really just for event system races; we should move your test changes. git grep messages/flags shows that test_messages.py is probably where this belongs.

Member Author

@timabbott Can you have a look at test_messages.py? I'm not sure which test case class this should fall under.
(When adding the test initially I had a look at test_messages.py, but I didn't find any relevant test case classes.)

Sponsor Member

I'd add a new test function in MessageAccessTests, next to the starring tests.

"test")

for msg in self.get_messages():
self.assertTrue('is_private' not in msg['flags'])
Sponsor Member

I think there's a self.assertNotIn.
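
For reference, a small standalone unittest sketch of the membership assertions. The message payload is a hypothetical stand-in for what `self.get_messages()` returns in the test above.

```python
import unittest


class FlagAssertionExample(unittest.TestCase):
    def test_is_private_hidden(self) -> None:
        msg = {"flags": ["read"]}  # hypothetical API payload
        # On failure, assertNotIn reports both the member and the container,
        # whereas assertTrue(x not in y) only reports "False is not true".
        self.assertNotIn("is_private", msg["flags"])
        self.assertIn("read", msg["flags"])
```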

@timabbott
Sponsor Member

I test-deployed this on chat.zulip.org, and the index works and seems to fix the original performance problem, which is great! So just a bit more cleanup and we can merge this.

@timabbott
Sponsor Member

I guess one of the more important pieces of cleanup is we should have the migration abort early if there are no messages in the database (the CI failure):

Aug 09 21:31:59   File "/srv/zulip-venv-cache/c5ebf65c6930c83dc2e5ef4011cf534e70de3346/zulip-py3-venv/lib/python3.6/site-packages/django/core/management/__init__.py", line 364, in execute_from_command_line
Aug 09 21:31:59     utility.execute()
Aug 09 21:31:59   File "/srv/zulip-venv-cache/c5ebf65c6930c83dc2e5ef4011cf534e70de3346/zulip-py3-venv/lib/python3.6/site-packages/django/core/management/__init__.py", line 356, in execute
Aug 09 21:31:59     self.fetch_command(subcommand).run_from_argv(self.argv)
Aug 09 21:31:59   File "/srv/zulip-venv-cache/c5ebf65c6930c83dc2e5ef4011cf534e70de3346/zulip-py3-venv/lib/python3.6/site-packages/django/core/management/base.py", line 283, in run_from_argv
Aug 09 21:31:59     self.execute(*args, **cmd_options)
Aug 09 21:31:59   File "/srv/zulip-venv-cache/c5ebf65c6930c83dc2e5ef4011cf534e70de3346/zulip-py3-venv/lib/python3.6/site-packages/django/core/management/base.py", line 330, in execute
Aug 09 21:31:59     output = self.handle(*args, **options)
Aug 09 21:31:59   File "/srv/zulip-venv-cache/c5ebf65c6930c83dc2e5ef4011cf534e70de3346/zulip-py3-venv/lib/python3.6/site-packages/django/core/management/commands/migrate.py", line 204, in handle
Aug 09 21:31:59     fake_initial=fake_initial,
Aug 09 21:31:59   File "/srv/zulip-venv-cache/c5ebf65c6930c83dc2e5ef4011cf534e70de3346/zulip-py3-venv/lib/python3.6/site-packages/django/db/migrations/executor.py", line 115, in migrate
Aug 09 21:31:59     state = self._migrate_all_forwards(state, plan, full_plan, fake=fake, fake_initial=fake_initial)
Aug 09 21:31:59   File "/srv/zulip-venv-cache/c5ebf65c6930c83dc2e5ef4011cf534e70de3346/zulip-py3-venv/lib/python3.6/site-packages/django/db/migrations/executor.py", line 145, in _migrate_all_forwards
Aug 09 21:31:59     state = self.apply_migration(state, migration, fake=fake, fake_initial=fake_initial)
Aug 09 21:31:59   File "/srv/zulip-venv-cache/c5ebf65c6930c83dc2e5ef4011cf534e70de3346/zulip-py3-venv/lib/python3.6/site-packages/django/db/migrations/executor.py", line 244, in apply_migration
Aug 09 21:31:59     state = migration.apply(state, schema_editor)
Aug 09 21:31:59   File "/srv/zulip-venv-cache/c5ebf65c6930c83dc2e5ef4011cf534e70de3346/zulip-py3-venv/lib/python3.6/site-packages/django/db/migrations/migration.py", line 129, in apply
Aug 09 21:31:59     operation.database_forwards(self.app_label, schema_editor, old_state, project_state)
Aug 09 21:31:59   File "/srv/zulip-venv-cache/c5ebf65c6930c83dc2e5ef4011cf534e70de3346/zulip-py3-venv/lib/python3.6/site-packages/django/db/migrations/operations/special.py", line 193, in database_forwards
Aug 09 21:31:59     self.code(from_state.apps, schema_editor)
Aug 09 21:31:59   File "/home/circleci/zulip/zerver/migrations/0182_set_initial_value_is_private_flag.py", line 28, in set_initial_value_of_is_private_flag
Aug 09 21:31:59     if count == 0 and range_end >= Message.objects.last().id:
Aug 09 21:31:59 AttributeError: 'NoneType' object has no attribute 'id'
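
The failure comes from calling `.id` on the result of `Message.objects.last()`, which is None when the table is empty. A minimal sketch of an id-range batching loop that exits early in that case is below. It is plain Python with a hypothetical helper name, not the code in migration 0182. It assumes the caller passes the largest existing id, or None for an empty table.

```python
from typing import Iterator, Optional, Tuple


def id_ranges(max_id: Optional[int], batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (lower, upper) id ranges covering 0..max_id.

    max_id is None when the table is empty, mirroring
    Message.objects.last() returning None.
    """
    if max_id is None:
        return  # nothing to migrate; stop before anything dereferences `.id`
    lower = 0
    while lower <= max_id:
        upper = min(lower + batch_size - 1, max_id)
        yield (lower, upper)
        lower = upper + 1
```

Each yielded range could then drive one filtered update in its own transaction. Rows created after max_id was read are outside the ranges, so new rows would need the flag set at write time.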

shubham-padia and others added 6 commits August 10, 2018 04:11
See the comment for why this is correct; basically, this flag is used
only for internal accounting, and would only confuse API clients.
The reasoning here is similar to `is_private`; this flag is only used
for internal accounting inside the Zulip server.
Raise error if flag is present in NON_API_FLAGS or is not present in
UserMessage.flags.
@shubham-padia
Member Author

@timabbott This is ready for another review.

do_update_message_flags(user_profile, get_client("website"), 'remove', first_non_api_flag, [message])

with self.assertRaises(JsonableError):
do_update_message_flags(user_profile, get_client("website"), 'remove', 'invalid', [message])
Sponsor Member

In our tests, we usually prefer API interaction to calling raw actions.py methods where possible. I'll just redo this that way.

Sponsor Member

         )
-        user_profile = self.example_user('hamlet')
-        for first_non_api_flag in UserMessage.NON_API_FLAGS:
-            break
 
-        with self.assertRaises(JsonableError):
-            do_update_message_flags(user_profile, get_client("website"), 'remove', first_non_api_flag, [message])
+        self.login(self.example_email("hamlet"))
+        result = self.client_post("/json/messages/flags",
+                                  {"messages": ujson.dumps([message]),
+                                   "op": "add",
+                                   "flag": "invalid"})
+        self.assert_json_error(result, "Invalid flag: 'invalid'")
 
-        with self.assertRaises(JsonableError):
-            do_update_message_flags(user_profile, get_client("website"), 'remove', 'invalid', [message])
+        result = self.client_post("/json/messages/flags",
+                                  {"messages": ujson.dumps([message]),
+                                   "op": "add",
+                                   "flag": "is_private"})
+        self.assert_json_error(result, "Invalid flag: 'is_private'")
+
+        result = self.client_post("/json/messages/flags",
+                                  {"messages": ujson.dumps([message]),
+                                   "op": "add",
+                                   "flag": "active_mobile_push_notification"})
+        self.assert_json_error(result, "Invalid flag: 'active_mobile_push_notification'")
 
     def change_star(self, messages: List[int], add: bool=True, **kwargs: Any) -> HttpResponse:

@timabbott
Sponsor Member

Nice, I changed the one thing I mentioned above, and merged this as the series of commits ending with bdaff17 (note that I also moved the more extensive commit message and closes to the last commit). Thanks @shubham-padia!!

@timabbott timabbott closed this Aug 9, 2018
4 participants