New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sessions: Migrate to custom indexed session model. #29668
base: main
Are you sure you want to change the base?
Conversation
Hello @zulip/server-production members, this pull request was labeled with the "area: production" label, so you may want to check it out! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Create RealmSession model with fields: realm, user and ip_address,
indexed by realm and user.
Indexing the IP address would also be useful.
This change introduced one extra query for 'assert_database_query_count()'
Which query, and why?
It also looks like you'll need to add some tests for complete coverage.
for session in RealmSession.objects.filter(realm=user_profile.realm, user=user_profile): | ||
delete_session(session) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to do O(n) SQL queries for the delete.
That's fine for now, but (as a future commit) we would like to do this in three steps:
- Fetch the matching session keys
- Bulk-delete all of the matching sessions from the database using the filter
- Bulk-delete all of the matching sessions from the cache using the session keys
for session in RealmSession.objects.filter(realm=realm): | ||
delete_session(session) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above.
for session in Session.objects.all(): | ||
for session in RealmSession.objects.all(): | ||
delete_session(session) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above.
zerver/lib/sessions.py
Outdated
for session in RealmSession.objects.filter(user__isnull=False).select_related("realm", "user"): | ||
# Guaranteed to be UserProfile since we are excluding Null users from the queryset. | ||
user_profile = cast(UserProfile, session.user) | ||
# Guaranteed to be Realm since a user (at least in this session model) always has a realm. | ||
user_realm = cast(Realm, session.realm) | ||
|
||
if not user_profile.is_active or user_realm.deactivated: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no reason to not do these filters in SQL, with:
from django.db.models import Q
for session in RealmSession.objects.filter(Q(user__is_active=False) | Q(realm__deactivated=True)):
# ...
These casts would also have been better expressed as user_realm = assert_is_not_none(session.realm)
-- we don't want to cast (we almost never do), we just want to tell mypy that we know that it won't be None.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and i think that filter will also remove the need for the select_related
and if
condition checks because we now have the right filtered sessions so we can delete them directly
operations = [ | ||
migrations.CreateModel( | ||
name="RealmSession", | ||
fields=[ | ||
( | ||
"session_key", | ||
models.CharField( | ||
max_length=40, primary_key=True, serialize=False, verbose_name="session key" | ||
), | ||
), | ||
("session_data", models.TextField(verbose_name="session data")), | ||
("expire_date", models.DateTimeField(db_index=True, verbose_name="expire date")), | ||
("ip_address", models.GenericIPAddressField(null=True)), | ||
( | ||
"realm", | ||
models.ForeignKey( | ||
null=True, on_delete=django.db.models.deletion.CASCADE, to="zerver.realm" | ||
), | ||
), | ||
( | ||
"user", | ||
models.ForeignKey( | ||
null=True, | ||
on_delete=django.db.models.deletion.CASCADE, | ||
to=settings.AUTH_USER_MODEL, | ||
), | ||
), | ||
], | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a new table. That implies that we need to copy over the data from the existing sessions table -- we don't want to suddenly log out everyone who had sessions in the old table.
Bear in mind that this is a large table. We will want to use a INSERT INTO ... SELECT
or similar to do this copy as performantly as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where should this code (for copying over the data) live ?
We want it be raw SQL query not django orm, right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be a new database migration, probably with a RunSQL migration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
After
INSERT INTO...
query is successful, should i delete data from existing session table,TRUNCATE TABLE django_session
? -
What is the reason for prefering raw SQL over django orm, performance ?
if instance.user is not None: | ||
instance.realm_id = instance.user.realm_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the user logs out, and instance.user
becomes None
, should we be unset'ing realm_id
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When user logs out, that instance gets deleted
zerver/models/sessions.py
Outdated
@override | ||
def create_model_instance(self, data: Dict[str, Union[int, str]]) -> RealmSession: | ||
session_object = super().create_model_instance(data) | ||
assert isinstance(session_object, RealmSession) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implies a bug or limitation in the typing of create_model_instance
, since it should ideally be able to determine this type from the get_model_class
type, above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and in that case assert isinstance
is the right way to silent that false alarm from mypy
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case the right way to silence the false alarm is to file a bug with the project whose types aren't correct. We can then make this:
session_object: RealmSession = super().create_model_instance(data) # type: ignore[assignment] # link to bug report
...so we can know to come back and clean this up later.
zerver/views/home.py
Outdated
@@ -205,6 +205,8 @@ def home_real(request: HttpRequest) -> HttpResponse: | |||
# that we're a spectator, not a logged-in user. | |||
user_profile = None | |||
|
|||
request.session["realm_id"] = realm.id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this line required? Why is it only necessary here, and not in some middleware to be more generic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe needs more investigation, but i put it there based on this suggestion from CZO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should have an argument for why this change is the right change, not just that it seems to work.
efa5c75
to
ef3ad42
Compare
UPDATE "zerver_realmsession" SET "session_data" = 'some value"',
"expire_date" = 'some value'::timestamptz,
"realm_id" = 2,
"user_id" = 10,
"ip_address" = NULL
WHERE "zerver_realmsession"."session_key" = 'some value' That extra query is triggered because in |
ef3ad42
to
964de7d
Compare
61aadc0
to
2edb41e
Compare
2edb41e
to
b86cb52
Compare
b86cb52
to
e222570
Compare
e222570
to
9cf24bc
Compare
9cf24bc
to
4946373
Compare
Fixes zulip#19490 and has other performance benefits. Create RealmSession model with custom fields: realm, user and ip_address, indexed by (realm, user) and ip_address. Migrate code to use that model, "zerver/lib/sessions" has most impact. Create data_migration to copy over data from django_session table to zerver_realmsession table.
4946373
to
8cc9fc2
Compare
Fixes #19490.
CZO discussion
New custom Session model indexed by realm and user.
Created
RealmSession
model with fieldsrealm
,user
andip_address
.Customized
SessionStore
to work withRealmSession
.Migrate code to
RealmSession
to get the performant benefits of the index (e.g.zerver/lib/sessions.py
has the most impact)This introduced one extra query for
zerver/tests/test_signup.py
andzerver/tests/test_home.py
, so we increment the query count param forassert_database_query_count()