Improve docs search #5825

mffap · 2023-05-12T06:32:23Z

Currently there are some issues with search, namely:

API docs fill all search results instead of returning a mix of concepts, guides, and API pages (--> hierarchy? context based on the active tab?) [example: search for "self-service" vs. search for "mfa"]
Synonyms should be handled (e.g., permissions <> roles <> grants)
Search finds "orphaned" pages, i.e., pages that are not in the sidebar
Technical: Analyze index/extraction errors
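A minimal sketch of how orphaned pages could be detected, assuming we already have two URL lists: one from the crawl and one from the sidebar configuration. The helper name `findOrphanedPages` and the example URLs are hypothetical; an orphan is simply any crawled URL that no sidebar entry points to.

```javascript
// Hypothetical helper: report pages the crawler reaches that no sidebar entry links to.
// Assumes both URL lists were collected beforehand (crawl output, sidebar config).
function findOrphanedPages(crawledUrls, sidebarUrls) {
  // Normalize trailing slashes so ".../concepts/" and ".../concepts" compare equal.
  const normalize = (url) => url.replace(/\/+$/, "");
  const inSidebar = new Set(sidebarUrls.map(normalize));
  return crawledUrls
    .map(normalize)
    .filter((url) => !inSidebar.has(url))
    .sort();
}

// Example URLs are illustrative only.
const orphans = findOrphanedPages(
  [
    "https://zitadel.com/docs/guides/start",
    "https://zitadel.com/docs/guides/old-page/",
    "https://zitadel.com/docs/concepts/",
  ],
  ["https://zitadel.com/docs/guides/start", "https://zitadel.com/docs/concepts"]
);
console.log(orphans); // [ 'https://zitadel.com/docs/guides/old-page' ]
```

The resulting list could then be fed into either removing the pages or excluding them from `pathsToMatch`.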

Acceptance Criteria


docs(api): update api path #5876
Show API methods as sub of resource (APIs > Core Resources > Authentication method - ...)
Evaluate if API search can be optimized based on main tab
Create synonyms for roles, mfa, and other typical terms (not in the oss plan)
docs: remove orphaned pages #5826
Clear index errors
Weight legal pages lower (do they come up in search?)
docs: optimize titles for search #5880
docs(search): add getMissingResultsUrl #5893
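For the synonyms task, a sketch of how term groups could be turned into Algolia synonym records. The record shape (`objectID`, `type`, `synonyms`, and `input` for one-way synonyms) follows Algolia's synonym object; the helper name, the `objectID` scheme, and the "mfa" expansions (including "2fa") are assumptions for illustration. Uploading the records is left out.

```javascript
// Sketch: build Algolia-style synonym records from simple term groups.
// Helper name, objectID scheme, and example terms are hypothetical.
function buildSynonymRecords({ multiWay = {}, oneWay = {} }) {
  const records = [];
  for (const [id, terms] of Object.entries(multiWay)) {
    // Multi-way: every term in the group matches every other term.
    records.push({ objectID: `syn-${id}`, type: "synonym", synonyms: terms });
  }
  for (const [input, expansions] of Object.entries(oneWay)) {
    // One-way: searching `input` also matches the expansions, not the reverse.
    records.push({
      objectID: `oneway-${input}`,
      type: "oneWaySynonym",
      input,
      synonyms: expansions,
    });
  }
  return records;
}

const synonymRecords = buildSynonymRecords({
  multiWay: { authorization: ["permissions", "roles", "grants"] },
  oneWay: { mfa: ["multi-factor authentication", "2fa"] },
});
console.log(JSON.stringify(synonymRecords, null, 2));
```

One-way synonyms suit abbreviations like "mfa", where expanding the short form is helpful but the reverse would add noise.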

Include API methods/calls in the generated documentation #5929

Resources

Current draft (actions):

actions: [
    {
      indexName: "zitadel",
      pathsToMatch: [
        "https://zitadel.com/docs/**",
        "!https://zitadel.com/docs/apis/resources/**",
      ],
      recordExtractor: ({ $, helpers }) => {
        // lvl0 = "<active top-nav item> - <uppermost active sidebar category>";
        // empty parts are skipped, and "General" is used if both are empty.
        const navItem = $(".navbar__item.navbar__link--active").last().text().trim();
        const sidebarCategory = $(".menu__link.menu__link--sublist.menu__link--active")
          .first()
          .text()
          .trim();
        const lvl0 =
          [navItem, sidebarCategory].filter(Boolean).join(" - ") || "General";

        return helpers.docsearch({
          recordProps: {
            lvl0: {
              selectors: "",
              defaultValue: lvl0,
            },
            lvl1: ".menu__link.menu__link--sublist.menu__link--active:last",
            lvl2: ["header h1", "article h1"],
            lvl3: ["header h2", "article h2"],
            lvl4: "article h3",
            lvl5: "article h4",
            lvl6: "article h5, article td:first-child",
            content: "article p, article li, article td:last-child",
          },
          indexHeadings: true,
          aggregateContent: true,
          recordVersion: "v3",
        });
      },
    },
    {
      indexName: "zitadel",
      pathsToMatch: ["https://zitadel.com/docs/apis/resources/**"],
      recordExtractor: ({ $, helpers }) => {
        // lvl0 = active top-nav item, falling back to "APIs"
        const lvl0 =
          $(".navbar__item.navbar__link--active").last().text().trim() || "APIs";

        return helpers.docsearch({
          recordProps: {
            lvl0: {
              selectors: "",
              defaultValue: lvl0,
            },
            lvl1: ".menu__link.menu__link--sublist.menu__link--active:first",
            lvl2: ["header h2", "article h2"],
            lvl3: "article h3",
            lvl4: "article h4",
            lvl5: "article h5",
            lvl6: "article h6, article td:first-child",
            content: "article p:first",
          },
          indexHeadings: true,
          aggregateContent: true,
          recordVersion: "v3",
        });
      },
    },
  ],
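The lvl0 fallback in the first extractor is easier to reason about as a pure function. In an expression like `a + " - " + b || "General"`, `+` binds tighter than `||`, so it parses as `(a + " - " + b) || "General"`; the concatenation always contains " - " and is never falsy, so the fallback never fires. A sketch of the intended behavior, with `buildLvl0` as a hypothetical name and the example labels made up:

```javascript
// Sketch: intended lvl0 logic, isolated from cheerio for testing.
function buildLvl0(navItem, sidebarCategory, fallback = "General") {
  // Drop empty parts before joining so we never emit a dangling " - ".
  const parts = [navItem, sidebarCategory]
    .map((part) => (part || "").trim())
    .filter(Boolean);
  return parts.length > 0 ? parts.join(" - ") : fallback;
}

// The precedence pitfall, for comparison: evaluates to " - ", never "General".
const buggy = "" + " - " + "" || "General";
console.log(JSON.stringify(buggy)); // " - "
console.log(buildLvl0("Guides", "Integrate")); // Guides - Integrate
console.log(buildLvl0("", "")); // General
```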


mffap · 2023-05-12T06:32:44Z

@dakshitha feel free to add other observations

mffap · 2023-05-13T07:51:35Z

One issue identified regarding hierarchies: the crawler picks up either the top-level nav item or the active sidebar element as lvl0, never both. That means as soon as you go down the hierarchy in the sidebar, you lose the context of the top nav, which does not make sense with our current setup.

We should preserve at least the top nav as lvl0 and the uppermost category of the sidebar as lvl1, given our current information architecture.
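A sketch of the proposed hierarchy, assuming the top-nav label, the active sidebar categories (ordered top-down), and the page title have already been extracted as plain strings. `buildHierarchy` and the example labels are hypothetical; the point is that every level is kept instead of choosing one source for lvl0.

```javascript
// Sketch: top nav -> lvl0, uppermost sidebar category -> lvl1,
// deeper categories and the page title after that.
function buildHierarchy(navItem, activeSidebarCategories, pageTitle) {
  const levels = [navItem, ...activeSidebarCategories, pageTitle].filter(Boolean);
  const hierarchy = {};
  // DocSearch v3 records use lvl0..lvl6; levels beyond lvl6 are dropped here.
  levels.slice(0, 7).forEach((label, i) => {
    hierarchy[`lvl${i}`] = label;
  });
  return hierarchy;
}

console.log(buildHierarchy("Guides", ["Integrate", "Login"], "Hosted login"));
```

With the either/or behavior described above, the same page would only carry the sidebar label at lvl0 and lose "Guides".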

mffap · 2023-05-14T07:11:25Z

Regarding generated docs:

We can make Algolia crawl the API docs into a separate index to allow for their different structure.
Can we add custom classes to some generated elements (missing h1, description, base path, path)? --> would make it easier to select content
Some pages in "API" are generated and some are not. The two types follow different formatting, which matters to the crawler. I could create separate crawl logic for the generated pages, but they cannot be identified from the URL alone; the only way would be to hard-code their paths in the crawler (not future-proof). It might be worth moving the generated files ("Core Resources") under one path to make selection easier. --> drawback: breaks existing documentation links...

mffap · 2023-05-14T07:32:25Z

> Can we add custom classes to some generated elements (missing h1, description, base path, path)? --> would make it easier to select content

Could not figure out how to manipulate the md template (only the mdx content), so we have to find another way. The same applies to the missing h1 issue.

mffap · 2023-05-14T07:39:46Z

> Some pages in "API" are generated and some are not. The two types follow different formatting, which matters to the crawler. I could create separate crawl logic for the generated pages, but they cannot be identified from the URL alone; the only way would be to hard-code their paths in the crawler (not future-proof). It might be worth moving the generated files ("Core Resources") under one path to make selection easier. --> drawback: breaks existing documentation links...

@fforootd @hifabienne @dakshitha any opinions on this? You can see an example of the two indices in the issue description; it should be clear from that what the struggle is :)

fforootd · 2023-05-15T15:18:20Z

> Some pages in "API" are generated and some are not. The two types follow different formatting, which matters to the crawler. I could create separate crawl logic for the generated pages, but they cannot be identified from the URL alone; the only way would be to hard-code their paths in the crawler (not future-proof). It might be worth moving the generated files ("Core Resources") under one path to make selection easier. --> drawback: breaks existing documentation links...
>
> @fforootd @hifabienne @dakshitha any opinions on this? You can see an example of the two indices in the issue description; it should be clear from that what the struggle is :)

Hm, good question.

Isn't everything in /api supposed to be ranked lower than the other content anyway?

mffap · 2023-05-15T15:28:13Z

> Isn't everything in /api supposed to be ranked lower than the other content anyway?

Yes, but that's not the point :) Ranking can be configured per index; since we currently have only one index, all pages get the same rank (see the hierarchy issue above). The real issue is that the generated pages follow a different format than the other pages, so we need to parse them differently. At the moment they can only be separated by hard-coding their paths.
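Since all pages share one index, one option is to emit a per-record numeric weight that a custom ranking could sort on. This is a sketch only: the section prefixes (including `/docs/legal/`), the weight values, and the helper name `pageWeight` are assumptions, not the current crawler config.

```javascript
// Hypothetical per-section weights; higher means "rank higher" if a custom
// ranking sorts on this attribute in descending order.
const SECTION_WEIGHTS = [
  { prefix: "/docs/apis/", weight: 10 },
  { prefix: "/docs/legal/", weight: 0 },
];
const DEFAULT_WEIGHT = 50;

function pageWeight(url) {
  const { pathname } = new URL(url);
  // First matching prefix wins; unmatched pages get the default weight.
  const match = SECTION_WEIGHTS.find(({ prefix }) => pathname.startsWith(prefix));
  return match ? match.weight : DEFAULT_WEIGHT;
}

console.log(pageWeight("https://zitadel.com/docs/guides/start")); // 50
console.log(pageWeight("https://zitadel.com/docs/apis/resources/mgmt")); // 10
```

This keeps everything in one index while still letting API and legal pages sink below guides and concepts.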

I would actually like to put all generated content under a subpath like /api/resources.
With that we can apply a different crawler action/index to that subpath. It also makes exclusions in .gitignore etc. cleaner.
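A simplified model of how a dedicated subpath lets the two crawler actions split URLs without hard-coding individual pages. It assumes a URL belongs to an action if it matches at least one positive pattern and no `!`-prefixed pattern, and it only supports `*` and `**`; it is a sketch for reasoning about the config, not the crawler's actual matcher.

```javascript
// Convert a glob with "*" (no slash) and "**" (anything) into an anchored RegExp.
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const pattern = escaped
    .replace(/\*\*/g, "\u0000") // placeholder so "**" is not treated as two "*"
    .replace(/\*/g, "[^/]*")
    .replace(/\u0000/g, ".*");
  return new RegExp(`^${pattern}$`);
}

function matchesPaths(url, pathsToMatch) {
  const positive = pathsToMatch.filter((p) => !p.startsWith("!"));
  const negative = pathsToMatch.filter((p) => p.startsWith("!")).map((p) => p.slice(1));
  return (
    positive.some((p) => globToRegExp(p).test(url)) &&
    !negative.some((p) => globToRegExp(p).test(url))
  );
}

// Patterns copied from the draft config above; action names are hypothetical.
const actions = {
  docs: ["https://zitadel.com/docs/**", "!https://zitadel.com/docs/apis/resources/**"],
  apiResources: ["https://zitadel.com/docs/apis/resources/**"],
};

function routeUrl(url) {
  return Object.keys(actions).filter((name) => matchesPaths(url, actions[name]));
}

console.log(routeUrl("https://zitadel.com/docs/guides/start")); // [ 'docs' ]
console.log(routeUrl("https://zitadel.com/docs/apis/resources/mgmt")); // [ 'apiResources' ]
```

Because the exclusion mirrors the second action's pattern, every URL lands in exactly one action, which avoids duplicate records when both actions write to the same index.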

fforootd · 2023-05-15T15:34:35Z

Ok, we can do this already IMO. We can separate the whole /api into its own index.
In my mind it makes no difference if it is generated or not.

(I am not sure if docsearch can do this out of the box)

mffap · 2023-05-15T15:43:43Z

> Ok, we can do this already IMO. We can separate the whole /api into its own index. In my mind it makes no difference if it is generated or not.
>
> (I am not sure if docsearch can do this out of the box)

Yes, it can. I can make that change; I was just checking whether either of you would veto it.

fforootd · 2023-05-15T15:44:41Z

No veto on my end 😁

mffap · 2023-05-20T10:35:34Z

Updated the crawler. Not sure yet whether the secondary index will be included in the search; I need to check whether multiple actions can write to the same index. Having multiple indices causes issues in DocSearch, since we can only configure one index.

mffap · 2023-05-21T13:43:04Z

@dakshitha I've updated the crawl. Results should be reflected in the current search behavior. Can you please have a look? Any suggestions on how we could improve further?

dakshitha · 2023-05-25T10:49:26Z

Search works well for me now.

mffap · 2023-05-25T12:51:47Z

Done. The remaining items are tracked as follow-up issues.

mffap added the docs Improvements or additions to documentation label May 12, 2023

mffap self-assigned this May 12, 2023

mffap mentioned this issue May 12, 2023

docs: remove orphaned pages #5826

Closed

This was referenced May 16, 2023

docs(api): update api path #5876

Merged

docs: optimize titles for search #5880

Merged

mffap mentioned this issue May 20, 2023

docs(search): add getMissingResultsUrl #5893

Closed

mffap mentioned this issue May 25, 2023

Include API methods/calls in the generated documentation #5929

Open

mffap closed this as completed May 25, 2023
