Code blocks in documentation search #10139
This is a known problem with the Algolia search indexing, along with #8159, although it seems we didn't already have an open issue for it, so thanks for reporting!
Thanks for the report @tbrlpld! I can confirm this is the case, though only for code blocks. Inline code formatting is still searchable as long as it's inside a heading, paragraph, or list item. Excluding code blocks is intentional and follows Algolia's recommendations: they state that indexing code blocks creates a lot of noise, since code is highly repetitive. We discussed this at the last core team meeting and decided to give it a go anyway, as we now have much more control over how the indexing of the docs is configured. That way we'll be able to compare results with and without code blocks indexed.
Awesome. Thanks for the update @thibaudcolas |
I have updated the Documentation search wiki page with a copy of our crawler configuration. There are still a few steps to go through before we can try out code block indexing, but we're getting closer. Once we're ready to try this indexing, here is the `recordExtractor`:

```javascript
recordExtractor: ({ helpers }) => {
  return helpers.docsearch({
    recordProps: {
      lvl1: ["header h1", "article h1", "main h1", "h1", "head > title"],
      // "pre" in the last selector also extracts code blocks as content
      content: ["article p, article li", "main p, main li", "p, li, pre"],
      lvl0: {
        selectors: "",
        defaultValue: "Documentation",
      },
      lvl2: ["article h2", "main h2", "h2"],
      lvl3: ["article h3", "main h3", "h3"],
      lvl4: ["article h4", "main h4", "h4"],
      lvl5: ["article h5", "main h5", "h5"],
      lvl6: ["article h6", "main h6", "h6"],
    },
    aggregateContent: true,
    recordVersion: "v3",
  });
},
```

The only difference is the …
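To compare results with and without code blocks indexed, both variants of `recordProps` could be generated from one place. This is a minimal sketch, not part of the Algolia crawler API: `buildRecordProps` is a hypothetical helper, it assumes the crawler config file can define a top-level function, and it assumes the baseline (no code blocks) configuration uses `"p, li"` as its last `content` selector.

```javascript
// Hypothetical helper (not an Algolia API): builds the recordProps object
// passed to helpers.docsearch(), optionally adding "pre" so that code
// blocks are extracted as content records.
function buildRecordProps({ indexCodeBlocks = false } = {}) {
  // Assumption: the baseline fallback selector without code blocks is "p, li".
  const fallbackContent = indexCodeBlocks ? "p, li, pre" : "p, li";
  return {
    lvl0: { selectors: "", defaultValue: "Documentation" },
    lvl1: ["header h1", "article h1", "main h1", "h1", "head > title"],
    lvl2: ["article h2", "main h2", "h2"],
    lvl3: ["article h3", "main h3", "h3"],
    lvl4: ["article h4", "main h4", "h4"],
    lvl5: ["article h5", "main h5", "h5"],
    lvl6: ["article h6", "main h6", "h6"],
    content: ["article p, article li", "main p, main li", fallbackContent],
  };
}

// The recordExtractor could then switch between the two variants:
// recordExtractor: ({ helpers }) =>
//   helpers.docsearch({
//     recordProps: buildRecordProps({ indexCodeBlocks: true }),
//     aggregateContent: true,
//     recordVersion: "v3",
//   }),
```

Keeping a single flag means the two indexes differ only in the `pre` selector, so any change in results can be attributed to code block indexing.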
Now ready to be picked up (was waiting on #8159). If anyone has suggested searches to try this out with, please post them here. While investigating this, I think I might have also spotted another related issue:
Interesting, it still does not seem to find this page when searching for "image_url": Dynamic image serve view (Wagtail Documentation 4.2.1, docs.wagtail.org).
@tbrlpld I've added this specific example at https://wagtail-docs-search-comparison.netlify.app/#image_url. I think that result is already there, even in the first screenshot you shared in this issue? We get one more result with code search turned on, but it seems to be a partial match on "image" rather than a match of …
Right, yeah, the page is there. I guess I was confused because it's listed due to the substring match on "generate_image_url" rather than the exact match on "image_url", which also appears multiple times on the page. But yes, it looks like it now highlights one exact match too 👍
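The difference between a substring hit inside `generate_image_url` and an exact hit on `image_url` can be illustrated with a small classifier. This is not how Algolia tokenizes text: it assumes an identifier is a run of word characters (`\w+`, so underscores stay inside the identifier), `matchKind` is a hypothetical name, and the sample strings are made up.

```javascript
// Illustrative only: classifies how a query matches identifiers in a text.
// Assumption: identifiers are runs of letters, digits and underscores, so
// "generate_image_url" is a single identifier.
function matchKind(text, query) {
  const identifiers = text.match(/\w+/g) ?? [];
  if (identifiers.includes(query)) return "exact";
  if (identifiers.some((id) => id.includes(query))) return "substring";
  return "none";
}

console.log(matchKind("url = generate_image_url(image, 'fill-800x600')", "image_url")); // "substring"
console.log(matchKind("{% image_url page.photo 'width-400' %}", "image_url")); // "exact"
```

A search engine that only sees the first kind of hit would surface the right page for the wrong reason, which matches the confusion described above.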
Searching for … Currently, all Algolia results point to the Panel API. For comparison, it can sometimes be useful to run a Google search with:
Thanks Coen, I've added this specific query to the comparison. It's a very interesting one because it highlights how Google, Algolia, and RTD (Read the Docs) differ in what they index and how they return results:
There are a couple of things we could try in Algolia to improve its results:
Could we rank matches in code blocks lower than matches in other content? If so, I assume that would make pages that mention … Edit: I just realised that it isn't a code block but rather a signature. Still, the same approach could potentially apply there too.
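The suggestion to rank code block matches lower can be pictured as a re-ranking step in which matches found in code count for less than matches in prose. This sketch does not use any Algolia API: the hit shape (`url`, `source`, `matches`) and the 0.5 weight for code are assumptions chosen for illustration.

```javascript
// Hypothetical hit shape: each hit records where its matches were found.
// "prose" = heading/paragraph/list text, "code" = a <pre> code block.
const SOURCE_WEIGHT = { prose: 1, code: 0.5 }; // assumed weights, tune as needed

function scoreHit(hit) {
  // Unknown sources fall back to full weight.
  return hit.matches * (SOURCE_WEIGHT[hit.source] ?? 1);
}

function rankHits(hits) {
  // Sort a copy so the caller's array is left untouched; highest score first.
  return [...hits].sort((a, b) => scoreHit(b) - scoreHit(a));
}

const hits = [
  { url: "/a", source: "code", matches: 3 },
  { url: "/b", source: "prose", matches: 2 },
  { url: "/c", source: "code", matches: 1 },
];
console.log(rankHits(hits).map((h) => h.url)); // [ '/b', '/a', '/c' ]
```

Down-weighting rather than excluding code keeps pages like the one discussed above findable, while prose explanations of a term still rank first.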
When using search on the docs, the surfaced results seem to ignore words that are found in code blocks (inline or block).
E.g. the search for `image_url` only reveals two entries: https://docs.wagtail.org/en/stable/search.html?q=image_url
The second result contains the search term more precisely and multiple times, but those other instances are not shown in the results view. I'm not sure if this is because the page was already linked. If so, it would be nice to see the better-matching results first.
https://docs.wagtail.org/en/stable/advanced_topics/images/image_serve_view.html#generating-dynamic-image-urls-in-python
But there are other pages in the docs that the search missed completely:
https://docs.wagtail.org/en/stable/advanced_topics/performance.html#image-urls
I wonder if we could configure Algolia to pick up code blocks better.