-
-
Notifications
You must be signed in to change notification settings - Fork 7.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
narrow: Add support for anchoring messages by date. #25677
base: main
Are you sure you want to change the base?
Conversation
These are the backend changes will add the frontend changes in the next PR. |
890b4d5
to
61c2d67
Compare
anchor_date
operator for message filtering.anchor:"date"
for message filtering.
240491d
to
971ebfb
Compare
@@ -819,6 +819,7 @@ def test_generate_and_render_curl_with_array_example(self) -> None: | |||
"curl -sSX GET -G http://localhost:9991/api/v1/messages \\", | |||
" -u BOT_EMAIL_ADDRESS:BOT_API_KEY \\", | |||
" --data-urlencode anchor=43 \\", | |||
" --data-urlencode anchor_date=2023-05-18T00:00:00Z \\", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably don't want to add these in this file, since the whole point is to have a real example input, and this parameter should not be used with that value of anchor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I missed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not very sure how to pass the test without changing the example anchor value to date
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't want anchor_date
documented here,intentionally_undocumented
of REQ
might be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Umm, so do you want me to use intentionally_undocumented
in REQ
of anchor_date
? But doesn't it mean we don't anymore need the documentation for anchor_date
in zulip.yaml
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, right then.I guess we don't want to do that. I think we want to pass exclude=["anchor_date"]
. May be look at other places in the file where we skip few fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that exclude
option sounds correct.
zerver/lib/narrow.py
Outdated
@@ -1276,6 +1297,25 @@ def fetch_messages( | |||
) | |||
|
|||
with get_sqlalchemy_connection() as sa_conn: | |||
if anchor_date: | |||
# The anchor_date value needs to be parsed here because the date query can | |||
# only be applied after all the other narrow conditions have been applied. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can it only be applied at the end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've improved the comment to clarify this:
# `date_anchor` is not a filter but a mechanism to obtain the anchor value.
# This anchor value points to the message ID having date sent closest to
# the `anchor_date`. This anchor is utilized to restrict the query within
# a specific range. It can only be computed at this point, considering the
# set of messages/query that have already undergone the narrow conditions.
.order_by( | ||
extract( | ||
"epoch", literal_column("zerver_message.date_sent") - literal(anchor_date) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this extract
part do? It might deserve an explanatory comment on what this implements, since it's not a pattern we use often.
Also, if you can get the actual database query that this generates, and run EXPLAIN ANALYZE
on the query in a manage.py dbshell
, it'd be nice to see the query plan this uses. Here's the block to uncomment to get the query:
## Uncomment the following to get all database queries logged to the console
# 'django.db': {
# 'level': 'DEBUG',
# 'handlers': ['console'],
# 'propagate': False,
# },
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uncommenting that part of the code produced so many logs :') I used another hack to generate the db query log:
SELECT id AS message_id, subject, rendered_content, pgroonga_match_positions_character(rendered_content, pgroonga_query_extract_keywords(escape_html(%(escape_html_1)s))) AS content_matches, pgroonga_match_positions_character(escape_html(subject), pgroonga_query_extract_keywords(escape_html(%(escape_html_1)s))) AS topic_matches
FROM zerver_message
WHERE recipient_id = %(recipient_id_1)s AND (search_pgroonga &@~ escape_html(%(escape_html_1)s)) ORDER BY abs(EXTRACT(epoch FROM zerver_message.date_sent - %(param_1)s))
with the following parameters:
- `%(escape_html_1)s`: 'foo'
- `%(recipient_id_1)s`: 30
- `%(param_1)s`: `datetime.datetime(2023, 6, 6, 0, 0)`
On running EXPLAIN ANALYZE
I got:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
---------------
Sort (cost=4.05..4.05 rows=1 width=335) (actual time=1.465..1.466 rows=0 loops=1)
Sort Key: (abs(date_part('epoch'::text, (date_sent - '2023-06-06 00:00:00+00'::timestamp with time zone))))
Sort Method: quicksort Memory: 25kB
-> Index Scan using zerver_message_search_pgroonga on zerver_message (cost=0.00..4.04 rows=1 width=335) (actual time=1.362..1.363 r
ows=0 loops=1)
Index Cond: (search_pgroonga &@~ 'foo'::text)
Filter: (recipient_id = 30)
Rows Removed by Filter: 1
Planning Time: 109.001 ms
Execution Time: 3.895 ms
(9 rows)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The extract("epoch", literal_column("zerver_message.date_sent") - literal(anchor_date))
expression calculates the difference between the two timestamps and extracts it as an epoch value. By ordering the query based on this epoch difference, the closest message to the anchor_date
will be selected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a comment to explain the use extract
function:
# Order the query results based on the absolute difference between the
# `date_sent` column and `anchor_date`. The `extract` function converts
# the timestamps to epoch values for accurate ordering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we call this epoch_difference
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the database have messages sorted by date already?
Would it be possible to find the closest date with sth like binary search instead of finding difference in timestamps for every message? Not an expert with db queries, so I'm not aware if that's actually possible.
May be start a discussion in czo - #backend, to check if this is already efficient or others have better ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the database have messages sorted by date already?
Mostly -- there's exceptions to that rule because of data imports. But that's true within an individual realm.
Hmm. What I'm worried about here is that we're not taking advantage of the database index that we have on the date_sent
column, and this operation could be very inefficient... but that index isn't very useful, since it isn't limited to a given realm or anything, so effectively what we're doing is asking the database to walk all the message matching whatever the rest of the query is, and sort those by this date filter.
One option would be to just first find the message ID in the realm that is closest to the target date via some sort of binary search mechanism -- worst case we add a new (realm, date_sent)
index to make this step efficient.
One option would be to figure out a different query construction that we can use to ask the database to walk a new (realm, date_sent)
index or (recipient, date_sent)
index.
if not anchor_date: | ||
raise JsonableError(_("Missing 'anchor_date' argument.")) | ||
# This anchor type is used to search for the message id closes to the anchor_date. | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to do the not anchor_date
check inside parse_anchor_date_value
? I'm not sure if that'd be cleaner or not.
I kinda feel like we should only be parsing anchor_date
if anchor
was set in a way such that it should be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have moved the validation logic inside parse_anchor_date_value
.
zerver/openapi/zulip.yaml
Outdated
- name: anchor_date | ||
in: query | ||
description: | | ||
Datetime (in ISO 8601 format) to search for the messages closest to this datetime. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add a **Changes**
entry documenting the API change with a feature level bump; check out the new application feature tutorial for a guide on this process.
I think this should also explicitly state that datetimes that do not include a timezone will be interpreted as UTC... assuming that's the behavior we intend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
zerver/openapi/zulip.yaml
Outdated
@@ -6243,12 +6255,24 @@ paths: | |||
- `oldest`: The oldest message. | |||
- `first_unread`: The oldest unread message matching the | |||
query, if any; otherwise, the most recent message. | |||
- `date`: The closest message near the searched date. Need to set | |||
`anchor_date` argument when using this anchor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be edited to clearly specify the boundary conditions, like we do for first_unread
. For example "The oldest message on or after the date indicated in the anchor_date
parameter, if any; otherwise, the most recent message".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
zerver/tests/test_message_fetch.py
Outdated
result = orjson.loads(payload.content) | ||
self.assertEqual(result["anchor"], msg_ids[0]) | ||
|
||
# case search_datetime > datetime of latest message sent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to see this test improved to more clearly verify all the corner cases in the actual logic, with more comments like this one, and also checking which message IDs were returned, not just the anchor.
I would also like to see a case using the date-only ISO format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added more test cases.
Thanks for the work on this @akshatdalton! I posted a batch of initial comments. I think the main structural thing that we need to decide in "api design" is how we want to handle the timezone/UTC issue. |
971ebfb
to
4d3293e
Compare
anchor:"date"
for message filtering.
@timabbott I've updated the PR addressing the comments. |
4d3293e
to
a70e23f
Compare
a70e23f
to
a1b99f7
Compare
a1b99f7
to
6450e60
Compare
6450e60
to
470c754
Compare
message_lists.current.selected_id = () => -1; | ||
|
||
all_messages_data.all_messages_data = { | ||
all_messages: () => messages, | ||
visibly_empty: () => false, | ||
first: () => ({id: 900}), | ||
last: () => ({id: 1100}), | ||
}; | ||
|
||
message_fetch.load_messages_for_narrow = (opts) => { | ||
assert.deepEqual(opts, { | ||
cont: opts.cont, | ||
msg_list: opts.msg_list, | ||
anchor: "date", | ||
anchor_date: "2023-06-02", | ||
}); | ||
|
||
opts.cont({anchor: 55}, {anchor: "date"}); | ||
}; | ||
|
||
narrow.activate([{operator: "date", operand: "2023-06-02"}], {trigger: "search"}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case, does cover the new added lines: https://github.com/zulip/zulip/pull/25677/files#diff-6c6eeaa26f7145a3d41a485431ee2442ca53a7aded6a4fdbc265e9575326834dR796-R800. After investigating I found that the test is returning from: https://github.com/zulip/zulip/pull/25677/files#diff-6c6eeaa26f7145a3d41a485431ee2442ca53a7aded6a4fdbc265e9575326834dR786-R789 even though I have set visibly_empty: () => false
in the mock.
- type: integer | ||
example: 43 | ||
example: date |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find any way where without changing this example value I can pass the tests in test_openapi.py
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as 175286a#r1236887608?
Date and time (in ISO 8601 format) to search for messages closest to this datetime. | ||
If only the date is provided, the timezone will be interpreted as UTC. | ||
|
||
**Changes**: New in Zulip 7.1 (feature level 188). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
7.1
is a mistake I'll correct it along with the other changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, it should just say 8.0, since we don't backport this sort of change to the 7.x style stable releases.
search_datetime = datetime.datetime(1888, 6, 2, tzinfo=datetime.timezone.utc) | ||
search_date_str = search_datetime.isoformat() | ||
result = get_message_search_result(search_date_str) | ||
self.assertEqual(result["anchor"], 151) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can someone help me - what's the easiest way to fetch the message ID of the first message sent? I'll replace it with 151
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you're getting at -- self.send_stream_message
returns the message ID of a message you sent.
Is the issue you want the very oldest message available to a user? One option would be to just do a GET /messages
query with anchor: "oldest"
without a date filter, and verify the results agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the issue you want the very oldest message available to a user?
yes; thanks for the suggestion.
anchor_date, | ||
cont(data, opts) { | ||
if (!select_immediately) { | ||
update_selection({ | ||
anchor_type: opts.anchor, | ||
anchor: data.anchor, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to add these changes to update the focus view after fetching the response from the server (along with
these changes) because:
- as suggested in the chat to use
then_select_id
logic to update the focus - the current code path doesn't work that way - it doesn't usethen_select_id
logic rather it calls this callback functioncont
in the very end to update the focus - the
near:
filter works differently and we can't leverage the same logic for thisdate:
filter becausenear
has a message ID specified which we may either find or not. This same message ID is set to theid_info
object which is passed to thiscont
function (and hence thenear:
filter does not direct depend on the fetched server anchor value to update the focus). The correct value if found by theclosest_id
function inMessageList
(all this happens inside the callback functioncont
).
This adds a new `anchor_date` parameter which needs to be used in conjunction with the `anchor` parameter set to `date` value. It allows to anchor the view to the messages that are closest to the specified searched date (provided via `anchor_date`). Fixes: zulip#25436. Signed-off-by: Akshat <akshat25iiit@gmail.com>
Signed-off-by: Akshat <akshat25iiit@gmail.com>
Signed-off-by: Akshat <akshat25iiit@gmail.com>
470c754
to
7915ce0
Compare
4ec3636
to
88b200c
Compare
Heads up @akshatdalton, we just merged some commits that conflict with the changes you made in this pull request! You can review this repository's recent commits to see where the conflicts occur. Please rebase your feature branch against the |
@akshatdalton do you have time to rebase this PR and get the backend tests passing? I'd like to try to integrate the API part of this next. |
Sure, will do it! |
This adds a new
anchor_date
parameter which needs to be used in conjunction with theanchor
parameter set todate
value. It allows to anchor the view to the messages that are closest to the specified searched date (provided viaanchor_date
).Fixes: #25436.
Self-review checklist
(variable names, code reuse, readability, etc.).
Communicate decisions, questions, and potential concerns.
Individual commits are ready for review (see commit discipline).
Completed manual review and testing of the following: