Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sessions: Migrate to custom indexed session model. #29668

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

abdelrahman725
Copy link
Contributor

@abdelrahman725 abdelrahman725 commented Apr 10, 2024

Fixes #19490.

CZO discussion

New custom Session model indexed by realm and user.

Created RealmSession model with fields realm, user and ip_address.

Customized SessionStore to work with RealmSession.

Migrate code to RealmSession to get the performant benefits of the index (e.g. zerver/lib/sessions.py has the most impact)

This introduced one extra query for zerver/tests/test_signup.py and zerver/tests/test_home.py, so we increment the query count param for assert_database_query_count()

@zulipbot zulipbot added size: XL area: performance Performance and scalability issues area: production labels Apr 10, 2024
@zulipbot
Copy link
Member

Hello @zulip/server-production members, this pull request was labeled with the "area: production" label, so you may want to check it out!

Copy link
Collaborator

@alexmv alexmv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create RealmSession model with fields: realm, user and ip_address,
indexed by realm and user.

Indexing the IP address would also be useful.

This change introduced one extra query for 'assert_database_query_count()'

Which query, and why?

It also looks like you'll need to add some tests for complete coverage.

Comment on lines +41 to +44
for session in RealmSession.objects.filter(realm=user_profile.realm, user=user_profile):
delete_session(session)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to do O(n) SQL queries for the delete.

That's fine for now, but (as a future commit) we would like to do this in three steps:

  • Fetch the matching session keys
  • Bulk-delete all of the matching sessions from the database using the filter
  • Bulk-delete all of the matching sessions from the cache using the session keys

Comment on lines +46 to +49
for session in RealmSession.objects.filter(realm=realm):
delete_session(session)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.

Comment on lines -60 to 54
for session in Session.objects.all():
for session in RealmSession.objects.all():
delete_session(session)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.

Comment on lines 56 to 62
for session in RealmSession.objects.filter(user__isnull=False).select_related("realm", "user"):
# Guaranteed to be UserProfile since we are excluding Null users from the queryset.
user_profile = cast(UserProfile, session.user)
# Guaranteed to be Realm since a user (at least in this session model) always has a realm.
user_realm = cast(Realm, session.realm)

if not user_profile.is_active or user_realm.deactivated:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no reason to not do these filters in SQL, with:

from django.db.models import Q

for session in RealmSession.objects.filter(Q(user__is_active=False) | Q(realm__deactivated=True)):
    # ...

These casts would also have been better expressed as user_realm = assert_is_not_none(session.realm) -- we don't want to cast (we almost never do), we just want to tell mypy that we know that it won't be None.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and i think that filter will also remove the need for the select_related and if condition checks because we now have the right filtered sessions so we can delete them directly

Comment on lines 13 to 41
operations = [
migrations.CreateModel(
name="RealmSession",
fields=[
(
"session_key",
models.CharField(
max_length=40, primary_key=True, serialize=False, verbose_name="session key"
),
),
("session_data", models.TextField(verbose_name="session data")),
("expire_date", models.DateTimeField(db_index=True, verbose_name="expire date")),
("ip_address", models.GenericIPAddressField(null=True)),
(
"realm",
models.ForeignKey(
null=True, on_delete=django.db.models.deletion.CASCADE, to="zerver.realm"
),
),
(
"user",
models.ForeignKey(
null=True,
on_delete=django.db.models.deletion.CASCADE,
to=settings.AUTH_USER_MODEL,
),
),
],
),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a new table. That implies that we need to copy over the data from the existing sessions table -- we don't want to suddenly log out everyone who had sessions in the old table.

Bear in mind that this is a large table. We will want to use a INSERT INTO ... SELECT or similar to do this copy as performantly as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where should this code (for copying over the data) live ?

We want it be raw SQL query not django orm, right ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be a new database migration, probably with a RunSQL migration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. After INSERT INTO... query is successful, should i delete data from existing session table, TRUNCATE TABLE django_session ?

  2. What is the reason for prefering raw SQL over django orm, performance ?

Comment on lines +34 to +39
if instance.user is not None:
instance.realm_id = instance.user.realm_id
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user logs out, and instance.user becomes None, should we be unset'ing realm_id?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When user logs out, that instance gets deleted

@override
def create_model_instance(self, data: Dict[str, Union[int, str]]) -> RealmSession:
session_object = super().create_model_instance(data)
assert isinstance(session_object, RealmSession)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implies a bug or limitation in the typing of create_model_instance, since it should ideally be able to determine this type from the get_model_class type, above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and in that case assert isinstance is the right way to silent that false alarm from mypy ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case the right way to silence the false alarm is to file a bug with the project whose types aren't correct. We can then make this:

session_object: RealmSession = super().create_model_instance(data)  # type: ignore[assignment]  # link to bug report

...so we can know to come back and clean this up later.

@@ -205,6 +205,8 @@ def home_real(request: HttpRequest) -> HttpResponse:
# that we're a spectator, not a logged-in user.
user_profile = None

request.session["realm_id"] = realm.id
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this line required? Why is it only necessary here, and not in some middleware to be more generic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe needs more investigation, but i put it there based on this suggestion from CZO

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should have an argument for why this change is the right change, not just that it seems to work.

@abdelrahman725 abdelrahman725 force-pushed the session-indexing branch 7 times, most recently from efa5c75 to ef3ad42 Compare April 18, 2024 05:40
@abdelrahman725
Copy link
Contributor Author

abdelrahman725 commented Apr 20, 2024

This change introduced one extra query for 'assert_database_query_count()'

Which query, and why?

@alexmv

UPDATE "zerver_realmsession" SET "session_data" = 'some value"',
"expire_date" = 'some value'::timestamptz, 
"realm_id" = 2, 
"user_id" = 10,
"ip_address" = NULL 
WHERE "zerver_realmsession"."session_key" = 'some value'

That extra query is triggered because in create_realm_session_store() called by zerver.models.create_model_instance() we add the custom fields to RealmSession instance when they are available in the session data, so we need to update that table to save those fields, hence the query.

Fixes zulip#19490 and has other performance benefits.

Create RealmSession model with custom fields: realm, user and ip_address,
indexed by (realm, user) and ip_address.

Migrate code to use that model, "zerver/lib/sessions" has most impact.

Create data_migration to copy over data from
django_session table to zerver_realmsession table.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: performance Performance and scalability issues area: production size: XL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide an internal function to deactivate multiple users more efficiently
3 participants