narrow: Add support for anchoring messages by date. #25677

Open. Wants to merge 3 commits into base: main

Conversation

akshatdalton
Member

@akshatdalton akshatdalton commented May 19, 2023

This adds a new anchor_date parameter, which must be used together with the anchor parameter set to "date". It anchors the view to the messages closest to the specified date (provided via anchor_date).

Fixes: #25436.
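
As a rough illustration of the request shape described above, here is a minimal sketch that only builds the query string for `GET /messages`. The helper name `build_date_anchor_url` is hypothetical and not part of this PR; `num_before` and `num_after` are the existing paging parameters of that endpoint, and the host is the development server URL used in the curl example further down.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_date_anchor_url(
    base_url: str, anchor_date: str, num_before: int = 50, num_after: int = 50
) -> str:
    """Build a GET /messages URL that anchors on a date (hypothetical helper)."""
    params = {
        # anchor must be the literal string "date" for anchor_date to apply.
        "anchor": "date",
        "anchor_date": anchor_date,
        "num_before": num_before,
        "num_after": num_after,
    }
    return f"{base_url}/api/v1/messages?{urlencode(params)}"


url = build_date_anchor_url("http://localhost:9991", "2023-05-18T00:00:00Z")
query = parse_qs(urlsplit(url).query)
print(query["anchor"], query["anchor_date"])
```

The point of the sketch is only that the two parameters travel together: `anchor_date` has no meaning unless `anchor` is `date`.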

Self-review checklist
  • Self-reviewed the changes for clarity and maintainability
    (variable names, code reuse, readability, etc.).

Communicate decisions, questions, and potential concerns.

  • Explains differences from previous plans (e.g., issue description).
  • Highlights technical choices and bugs encountered.
  • Calls out remaining decisions and concerns.
  • Automated tests verify logic where appropriate.

Individual commits are ready for review (see commit discipline).

  • Each commit is a coherent idea.
  • Commit message(s) explain reasoning and motivation for changes.

Completed manual review and testing of the following:

  • Visual appearance of the changes.
  • Responsiveness and internationalization.
  • Strings and tooltips.
  • End-to-end functionality of buttons, interactions and flows.
  • Corner cases, error conditions, and easily imagined bugs.

@akshatdalton
Member Author

These are the backend changes; I will add the frontend changes in the next PR.

@akshatdalton akshatdalton force-pushed the issue_#25436 branch 2 times, most recently from 890b4d5 to 61c2d67 Compare May 19, 2023 13:47
@akshatdalton akshatdalton changed the title narrow: Introduce anchor_date operator for message filtering. narrow: Add support for anchor:"date" for message filtering. May 19, 2023
@akshatdalton akshatdalton force-pushed the issue_#25436 branch 7 times, most recently from 240491d to 971ebfb Compare May 19, 2023 19:20
@@ -819,6 +819,7 @@ def test_generate_and_render_curl_with_array_example(self) -> None:
"curl -sSX GET -G http://localhost:9991/api/v1/messages \\",
" -u BOT_EMAIL_ADDRESS:BOT_API_KEY \\",
" --data-urlencode anchor=43 \\",
" --data-urlencode anchor_date=2023-05-18T00:00:00Z \\",
Member

We probably don't want to add these in this file, since the whole point is to have a real example input, and this parameter should not be used with that value of anchor.

Member Author

Ah, I missed

Member Author

I'm not very sure how to pass the test without changing the example anchor value to date.

Collaborator

@chdinesh1089 chdinesh1089 May 28, 2023

If we don't want anchor_date documented here, intentionally_undocumented of REQ might be helpful.

Member Author

Umm, so do you want me to use intentionally_undocumented in the REQ for anchor_date? But wouldn't that mean we no longer need the documentation for anchor_date in zulip.yaml?

Collaborator

Oh, right. I guess we don't want to do that, then. I think we want to pass exclude=["anchor_date"]. Maybe look at other places in the file where we skip a few fields.

Member

Yeah, that exclude option sounds correct.

@@ -1276,6 +1297,25 @@ def fetch_messages(
)

with get_sqlalchemy_connection() as sa_conn:
    if anchor_date:
        # The anchor_date value needs to be parsed here because the date query can
        # only be applied after all the other narrow conditions have been applied.
Member

Why can it only be applied at the end?

Member Author

@akshatdalton akshatdalton May 27, 2023

I've improved the comment to clarify this:

# `anchor_date` is not a filter but a mechanism to obtain the anchor value.
# This anchor value points to the ID of the message whose date sent is
# closest to `anchor_date`. The anchor is used to restrict the query to
# a specific range. It can only be computed at this point, from the set
# of messages that remain after the narrow conditions have been applied.
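
To make the ordering argument concrete, here is a small in-memory sketch (plain Python, not the PR's SQLAlchemy code). The `Message` dataclass and `resolve_date_anchor` are hypothetical names used only for illustration. It shows that computing the closest message before applying the narrow can yield an anchor that is not part of the narrowed view.

```python
import datetime
from dataclasses import dataclass
from typing import Callable, List, Optional

UTC = datetime.timezone.utc


@dataclass
class Message:
    id: int
    recipient_id: int
    date_sent: datetime.datetime


def resolve_date_anchor(
    messages: List[Message],
    narrow: Callable[[Message], bool],
    anchor_date: datetime.datetime,
) -> Optional[int]:
    """Apply the narrow first, then pick the ID of the closest message by date."""
    candidates = [m for m in messages if narrow(m)]
    if not candidates:
        return None
    closest = min(
        candidates,
        key=lambda m: abs((m.date_sent - anchor_date).total_seconds()),
    )
    return closest.id


messages = [
    Message(1, 30, datetime.datetime(2023, 6, 1, tzinfo=UTC)),
    Message(2, 31, datetime.datetime(2023, 6, 6, tzinfo=UTC)),
    Message(3, 30, datetime.datetime(2023, 6, 10, tzinfo=UTC)),
]
target = datetime.datetime(2023, 6, 6, tzinfo=UTC)

# Narrowed to recipient 30, the closest message is ID 3 (4 days away).
narrowed_anchor = resolve_date_anchor(messages, lambda m: m.recipient_id == 30, target)
# Without the narrow, ID 2 would win, but it is not in the narrowed view.
unnarrowed_anchor = resolve_date_anchor(messages, lambda m: True, target)
print(narrowed_anchor, unnarrowed_anchor)
```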

.order_by(
    extract(
        "epoch", literal_column("zerver_message.date_sent") - literal(anchor_date)
    )
Member

@timabbott timabbott May 23, 2023

What does this extract part do? It might deserve an explanatory comment on what this implements, since it's not a pattern we use often.

Also, if you can get the actual database query that this generates, and run EXPLAIN ANALYZE on the query in a manage.py dbshell, it'd be nice to see the query plan this uses. Here's the block to uncomment to get the query:

        ## Uncomment the following to get all database queries logged to the console
        # 'django.db': {
        #     'level': 'DEBUG',
        #     'handlers': ['console'],
        #     'propagate': False,
        # },
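
The commented-out block above is a logger entry in a dictConfig-style logging configuration. A minimal standalone sketch of that shape follows; the surrounding dict and the `console` handler definition are assumptions for illustration, not Zulip's actual settings. In Django, SQL statements are emitted by the `django.db.backends` logger (a child of `django.db`), and only when `DEBUG` is enabled.

```python
import logging
import logging.config

# Illustrative configuration only; a real project defines this in its settings.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        # The uncommented form of the block quoted above.
        "django.db": {
            "level": "DEBUG",
            "handlers": ["console"],
            "propagate": False,
        },
    },
}

logging.config.dictConfig(LOGGING)

# Child loggers such as django.db.backends inherit the DEBUG level.
effective_level = logging.getLogger("django.db.backends").getEffectiveLevel()
print(logging.getLevelName(effective_level))
```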

Member Author

@akshatdalton akshatdalton May 26, 2023

Uncommenting that part of the code produced so many logs :') I used another hack to generate the db query log:

SELECT id AS message_id, subject, rendered_content,
    pgroonga_match_positions_character(rendered_content, pgroonga_query_extract_keywords(escape_html(%(escape_html_1)s))) AS content_matches,
    pgroonga_match_positions_character(escape_html(subject), pgroonga_query_extract_keywords(escape_html(%(escape_html_1)s))) AS topic_matches
FROM zerver_message
WHERE recipient_id = %(recipient_id_1)s AND (search_pgroonga &@~ escape_html(%(escape_html_1)s))
ORDER BY abs(EXTRACT(epoch FROM zerver_message.date_sent - %(param_1)s))

with the following parameters:

- `%(escape_html_1)s`: 'foo'
- `%(recipient_id_1)s`: 30
- `%(param_1)s`: `datetime.datetime(2023, 6, 6, 0, 0)`

On running EXPLAIN ANALYZE I got:

                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Sort  (cost=4.05..4.05 rows=1 width=335) (actual time=1.465..1.466 rows=0 loops=1)
   Sort Key: (abs(date_part('epoch'::text, (date_sent - '2023-06-06 00:00:00+00'::timestamp with time zone))))
   Sort Method: quicksort  Memory: 25kB
   ->  Index Scan using zerver_message_search_pgroonga on zerver_message  (cost=0.00..4.04 rows=1 width=335) (actual time=1.362..1.363 rows=0 loops=1)
         Index Cond: (search_pgroonga &@~ 'foo'::text)
         Filter: (recipient_id = 30)
         Rows Removed by Filter: 1
 Planning Time: 109.001 ms
 Execution Time: 3.895 ms
(9 rows)

Member Author

The extract("epoch", literal_column("zerver_message.date_sent") - literal(anchor_date)) expression calculates the difference between the two timestamps and extracts it as an epoch value. By ordering the query based on this epoch difference, the closest message to the anchor_date will be selected.

Member Author

I've added a comment to explain the use of the extract function:

# Order the query results based on the absolute difference between the
# `date_sent` column and `anchor_date`. The `extract` function converts
# the timestamps to epoch values for accurate ordering.
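
The ordering described in that comment can be tried out with a small stand-in. The sketch below uses SQLite from the Python standard library rather than PostgreSQL, so `strftime('%s', ...)` plays the role of `EXTRACT(epoch FROM ...)`; the `message` table and the helper name are assumptions for illustration. As in the logged query, the difference is wrapped in `abs(...)` so that messages before and after the target date compete on equal terms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, date_sent TEXT)")
conn.executemany(
    "INSERT INTO message (id, date_sent) VALUES (?, ?)",
    [
        (1, "2023-06-01 00:00:00"),
        (2, "2023-06-05 12:00:00"),
        (3, "2023-06-09 00:00:00"),
    ],
)


def closest_message_id(conn: sqlite3.Connection, anchor_date: str):
    """Return the ID of the message whose date_sent is nearest to anchor_date."""
    row = conn.execute(
        """
        SELECT id
        FROM message
        ORDER BY abs(
            CAST(strftime('%s', date_sent) AS INTEGER)
            - CAST(strftime('%s', ?) AS INTEGER)
        )
        LIMIT 1
        """,
        (anchor_date,),
    ).fetchone()
    return None if row is None else row[0]


print(closest_message_id(conn, "2023-06-06 00:00:00"))
```

Without the `abs(...)`, an ascending sort on the signed difference would always return the earliest message, since it has the most negative difference.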

Collaborator

Should we call this epoch_difference instead?

Collaborator

Does the database have messages sorted by date already?

Would it be possible to find the closest date with something like binary search, instead of computing the timestamp difference for every message? I'm not an expert with db queries, so I don't know whether that's actually possible.

Maybe start a discussion in czo - #backend, to check if this is already efficient or whether others have better ideas?

Member

Does the database have messages sorted by date already?

Mostly -- there are exceptions to that rule because of data imports. But that's true within an individual realm.

Hmm. What I'm worried about here is that we're not taking advantage of the database index that we have on the date_sent column, and this operation could be very inefficient... but that index isn't very useful, since it isn't limited to a given realm or anything, so effectively what we're doing is asking the database to walk all the messages matching whatever the rest of the query is, and sort those by this date expression.

One option would be to just first find the message ID in the realm that is closest to the target date via some sort of binary search mechanism -- worst case we add a new (realm, date_sent) index to make this step efficient.

Another option would be to figure out a different query construction that we can use to ask the database to walk a new (realm, date_sent) index or (recipient, date_sent) index.
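
One common index-friendly shape for a "nearest row" lookup is to run two range queries, one on each side of the target, and compare the two candidates. The sketch below shows that idea with SQLite as a stand-in; the table, the index name, and the integer epoch `date_sent` column are assumptions for illustration, and it makes no claim about what query plan PostgreSQL would choose for Zulip's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE message (id INTEGER PRIMARY KEY, realm_id INTEGER, date_sent INTEGER)"
)
# Hypothetical composite index matching the (realm, date_sent) idea above.
conn.execute("CREATE INDEX message_realm_date_sent ON message (realm_id, date_sent)")
conn.executemany(
    "INSERT INTO message (id, realm_id, date_sent) VALUES (?, ?, ?)",
    [(1, 1, 100), (2, 1, 200), (3, 2, 240), (4, 1, 300)],
)


def closest_message_id(conn: sqlite3.Connection, realm_id: int, target: int):
    """Find the nearest message by date using two bounded range scans."""
    after = conn.execute(
        "SELECT id, date_sent FROM message "
        "WHERE realm_id = ? AND date_sent >= ? ORDER BY date_sent ASC LIMIT 1",
        (realm_id, target),
    ).fetchone()
    before = conn.execute(
        "SELECT id, date_sent FROM message "
        "WHERE realm_id = ? AND date_sent < ? ORDER BY date_sent DESC LIMIT 1",
        (realm_id, target),
    ).fetchone()
    candidates = [row for row in (before, after) if row is not None]
    if not candidates:
        return None
    # Compare at most two rows instead of sorting every matching message.
    return min(candidates, key=lambda row: abs(row[1] - target))[0]


print(closest_message_id(conn, realm_id=1, target=240))
```

Each query has an equality condition on the leading index column and a range plus `ORDER BY` on the second, which is the pattern a composite index is designed to serve. Combining this with arbitrary narrow conditions is the harder part and is not addressed here.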

if not anchor_date:
    raise JsonableError(_("Missing 'anchor_date' argument."))
# This anchor type is used to search for the message ID closest to the anchor_date.
return None
Member

@timabbott timabbott May 23, 2023

Would it make sense to do the not anchor_date check inside parse_anchor_date_value? I'm not sure if that'd be cleaner or not.

I kinda feel like we should only be parsing anchor_date if anchor was set in a way such that it should be used.

Member Author

I have moved the validation logic inside parse_anchor_date_value.

- name: anchor_date
  in: query
  description: |
    Datetime (in ISO 8601 format) to search for the messages closest to this datetime.
Member

We should add a **Changes** entry documenting the API change with a feature level bump; check out the new application feature tutorial for a guide on this process.

I think this should also explicitly state that datetimes that do not include a timezone will be interpreted as UTC... assuming that's the behavior we intend.
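
A minimal sketch of the "no timezone means UTC" interpretation, assuming that is the intended behavior. The function name `parse_iso_datetime_as_utc` is hypothetical and is not the PR's `parse_anchor_date_value`; it only illustrates the rule using the standard library.

```python
import datetime

UTC = datetime.timezone.utc


def parse_iso_datetime_as_utc(value: str) -> datetime.datetime:
    """Parse an ISO 8601 date or datetime, treating naive values as UTC."""
    # Older Python versions of fromisoformat() reject a trailing "Z".
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # No offset given: interpret the value as UTC rather than local time.
        parsed = parsed.replace(tzinfo=UTC)
    return parsed.astimezone(UTC)


print(parse_iso_datetime_as_utc("2023-06-02"))
print(parse_iso_datetime_as_utc("2023-06-02T05:30:00+05:30"))
```

A date-only input such as `2023-06-02` becomes midnight UTC on that day, while an input with an explicit offset is converted to UTC rather than having its offset discarded.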

Member Author

done

@@ -6243,12 +6255,24 @@ paths:
- `oldest`: The oldest message.
- `first_unread`: The oldest unread message matching the
query, if any; otherwise, the most recent message.
- `date`: The closest message near the searched date. Need to set
`anchor_date` argument when using this anchor.
Member

This should be edited to clearly specify the boundary conditions, like we do for first_unread. For example "The oldest message on or after the date indicated in the anchor_date parameter, if any; otherwise, the most recent message".
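
The boundary rule in the suggested wording ("oldest message on or after the date, otherwise the most recent message") can be pinned down with a short sketch over a date-sorted list. The function name `date_anchor` and the `(message_id, date_sent)` tuples are assumptions for illustration; note that this rule differs from "closest message", which could also pick a message before the date.

```python
import bisect
import datetime
from typing import List, Optional, Tuple

UTC = datetime.timezone.utc


def date_anchor(
    messages: List[Tuple[int, datetime.datetime]], anchor_date: datetime.datetime
) -> Optional[int]:
    """messages must be (message_id, date_sent) pairs sorted by date_sent."""
    if not messages:
        return None
    dates = [date_sent for _, date_sent in messages]
    # Index of the first message sent on or after anchor_date.
    index = bisect.bisect_left(dates, anchor_date)
    if index == len(messages):
        # Every message is older than anchor_date: use the most recent one.
        return messages[-1][0]
    return messages[index][0]


messages = [
    (10, datetime.datetime(2023, 6, 1, tzinfo=UTC)),
    (11, datetime.datetime(2023, 6, 5, tzinfo=UTC)),
    (12, datetime.datetime(2023, 6, 9, tzinfo=UTC)),
]
print(date_anchor(messages, datetime.datetime(2023, 6, 6, tzinfo=UTC)))
```

The three interesting cases are a date before the oldest message (the oldest message is returned), a date between messages (the next newer message), and a date after the newest message (the newest message).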

Member Author

done

result = orjson.loads(payload.content)
self.assertEqual(result["anchor"], msg_ids[0])

# case search_datetime > datetime of latest message sent
Member

I'd like to see this test improved to more clearly verify all the corner cases in the actual logic, with more comments like this one, and also checking which message IDs were returned, not just the anchor.

I would also like to see a case using the date-only ISO format.

Member Author

Added more test cases.

@timabbott
Member

Thanks for the work on this @akshatdalton! I posted a batch of initial comments. I think the main structural thing that we need to decide in "api design" is how we want to handle the timezone/UTC issue.

@akshatdalton akshatdalton changed the title narrow: Add support for anchor:"date" for message filtering. narrow: Add support for anchoring messages by date. May 27, 2023
@akshatdalton
Member Author

@timabbott I've updated the PR addressing the comments.

api_docs/changelog.md (outdated, resolved)
Comment on lines +216 to +236
message_lists.current.selected_id = () => -1;

all_messages_data.all_messages_data = {
    all_messages: () => messages,
    visibly_empty: () => false,
    first: () => ({id: 900}),
    last: () => ({id: 1100}),
};

message_fetch.load_messages_for_narrow = (opts) => {
    assert.deepEqual(opts, {
        cont: opts.cont,
        msg_list: opts.msg_list,
        anchor: "date",
        anchor_date: "2023-06-02",
    });

    opts.cont({anchor: 55}, {anchor: "date"});
};

narrow.activate([{operator: "date", operand: "2023-06-02"}], {trigger: "search"});
Member Author

This test case does not cover the newly added lines: https://github.com/zulip/zulip/pull/25677/files#diff-6c6eeaa26f7145a3d41a485431ee2442ca53a7aded6a4fdbc265e9575326834dR796-R800. After investigating, I found that the test returns early from https://github.com/zulip/zulip/pull/25677/files#diff-6c6eeaa26f7145a3d41a485431ee2442ca53a7aded6a4fdbc265e9575326834dR786-R789, even though I have set visibly_empty: () => false in the mock.

- type: integer
example: 43
example: date
Member Author

I couldn't find any way to pass the tests in test_openapi.py without changing this example value.

Collaborator

Date and time (in ISO 8601 format) to search for messages closest to this datetime.
If only the date is provided, the timezone will be interpreted as UTC.

**Changes**: New in Zulip 7.1 (feature level 188).
Member Author

7.1 is a mistake; I'll correct it along with the other changes.

Member

yeah, it should just say 8.0, since we don't backport this sort of change to the 7.x style stable releases.

search_datetime = datetime.datetime(1888, 6, 2, tzinfo=datetime.timezone.utc)
search_date_str = search_datetime.isoformat()
result = get_message_search_result(search_date_str)
self.assertEqual(result["anchor"], 151)
Member Author

Can someone help me: what's the easiest way to fetch the message ID of the first message sent? I'll use it to replace the hardcoded 151.

Member

I'm not sure what you're getting at -- self.send_stream_message returns the message ID of a message you sent.

Is the issue you want the very oldest message available to a user? One option would be to just do a GET /messages query with anchor: "oldest" without a date filter, and verify the results agree.

Member Author

Is the issue you want the very oldest message available to a user?

yes; thanks for the suggestion.

Comment on lines +516 to +521
anchor_date,
cont(data, opts) {
if (!select_immediately) {
update_selection({
anchor_type: opts.anchor,
anchor: data.anchor,
Member Author

I had to add these changes to update the focused view after fetching the response from the server (along with these changes) because:

  • The chat suggested using the then_select_id logic to update the focus, but the current code path doesn't work that way: rather than using then_select_id, it calls the callback function cont at the very end to update the focus.
  • The near: filter works differently, and we can't reuse its logic for the date: filter, because near has a message ID specified, which we may or may not find. That same message ID is set on the id_info object, which is passed to the cont function (hence the near: filter does not directly depend on the anchor value fetched from the server to update the focus). The correct value is found by the closest_id function in MessageList (all of this happens inside the callback function cont).

@akshatdalton
Member Author

@zulipbot add "buddy review"
@zulipbot add "mentor review"

@zulipbot added the "buddy review" (GSoC buddy review needed) and "mentor review" (GSoC mentor review needed) labels Jun 18, 2023
This adds a new `anchor_date` parameter, which must be
used together with the `anchor` parameter set to `date`.
It anchors the view to the messages closest to the
specified date (provided via `anchor_date`).

Fixes: zulip#25436.

Signed-off-by: Akshat <akshat25iiit@gmail.com>
@zulipbot
Member

Heads up @akshatdalton, we just merged some commits that conflict with the changes you made in this pull request! You can review this repository's recent commits to see where the conflicts occur. Please rebase your feature branch against the upstream/main branch and resolve your pull request's merge conflicts accordingly.

@timabbott
Member

@akshatdalton do you have time to rebase this PR and get the backend tests passing? I'd like to try to integrate the API part of this next.

@akshatdalton
Member Author

Sure, will do it!

Labels: buddy review (GSoC buddy review needed), has conflicts, mentor review (GSoC mentor review needed), post release (issues to focus attention on after the current major release), size: XL

Successfully merging this pull request may close this issue: Add Search Operators by Date

4 participants