Improve docs search #5825

Closed
9 of 10 tasks
mffap opened this issue May 12, 2023 · 14 comments

@mffap
Member

mffap commented May 12, 2023

Currently there are some issues with search, namely:

  • API docs fill all search results instead of a mix of concepts, guides, and API pages (--> hierarchy? context based on the active tab?) [example: search for "self-service" vs. search for "mfa"]
  • Synonyms should be handled (e.g., permissions <> roles <> grants)
  • Search finds "orphaned" pages, i.e., pages that are not in the sidebar
  • Technical: Analyze index/extraction errors
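
To illustrate the synonym point, here is a minimal sketch of one synonym group in the record shape Algolia uses for regular multi-way synonyms (`objectID`, `type: "synonym"`, `synonyms`). The helper name `buildSynonymGroup` and the `objectID` scheme are assumptions made for this example, and uploading the record to the index is left out.

```javascript
// Hypothetical helper: builds one multi-way synonym record.
// Field names follow Algolia's synonym object format; the objectID
// scheme ("syn-" + sorted terms) is an assumption for illustration.
function buildSynonymGroup(terms) {
  // Normalize, de-duplicate, and sort so the same terms always give the same ID.
  const unique = [...new Set(terms.map((t) => t.trim().toLowerCase()))].sort();
  if (unique.length < 2) {
    throw new Error("a synonym group needs at least two distinct terms");
  }
  return {
    objectID: "syn-" + unique.join("-"),
    type: "synonym",
    synonyms: unique,
  };
}

const permissionSynonyms = buildSynonymGroup(["permissions", "roles", "grants"]);
console.log(JSON.stringify(permissionSynonyms));
```

With a group like this in place, a search for any one of the three terms would be expected to match records containing the others.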

Acceptance Criteria

Resources

Current draft (actions):

actions: [
    {
      indexName: "zitadel",
      pathsToMatch: [
        "https://zitadel.com/docs/**",
        "!https://zitadel.com/docs/apis/resources/**",
      ],
      recordExtractor: ({ $, helpers }) => {
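        // lvl0 is "<active navbar item> - <active sidebar category>", or "General" if both are empty.
        // Join the non-empty parts explicitly: in `a + " - " + b || "General"` the `+` binds
        // tighter than `||`, so the result always contains " - " and the fallback never applies.
        const navItem = $(".navbar__item.navbar__link--active").last().text();
        const sidebarCategory = $(
          ".menu__link.menu__link--sublist.menu__link--active:first"
        )
          .last()
          .text();
        const lvl0 =
          [navItem, sidebarCategory].filter(Boolean).join(" - ") || "General";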

        return helpers.docsearch({
          recordProps: {
            lvl0: {
              selectors: "",
              defaultValue: lvl0,
            },
            lvl1: ".menu__link.menu__link--sublist.menu__link--active:last",
            lvl2: ["header h1", "article h1"],
            lvl3: ["header h2", "article h2"],
            lvl4: "article h3",
            lvl5: "article h4",
            lvl6: "article h5, article td:first-child",
            content: "article p, article li, article td:last-child",
          },
          indexHeadings: true,
          aggregateContent: true,
          recordVersion: "v3",
        });
      },
    },
    {
      indexName: "zitadel",
      pathsToMatch: ["https://zitadel.com/docs/apis/resources/**"],
      recordExtractor: ({ $, helpers }) => {
        // lvl0: active navbar item, falling back to "APIs"
        const lvl0 =
          $(".navbar__item.navbar__link--active").last().text() || "APIs";

        return helpers.docsearch({
          recordProps: {
            lvl0: {
              selectors: "",
              defaultValue: lvl0,
            },
            lvl1: ".menu__link.menu__link--sublist.menu__link--active:first",
            lvl2: ["header h2", "article h2"],
            lvl3: "article h3",
            lvl4: "article h4",
            lvl5: "article h5",
            lvl6: "article h6, article td:first-child",
            content: "article p:first",
          },
          indexHeadings: true,
          aggregateContent: true,
          recordVersion: "v3",
        });
      },
    },
  ],

mffap added the "docs" label (Improvements or additions to documentation) May 12, 2023
mffap self-assigned this May 12, 2023

@mffap
Member Author

mffap commented May 12, 2023

@dakshitha feel free to add other observations

@mffap
Member Author

mffap commented May 13, 2023

One issue identified regarding hierarchies: the crawler picks up either the top-level nav or the active sidebar element as lvl0, but never both. As soon as you go down the hierarchy in the sidebar, you lose the context of the top nav. With our current setup that does not make sense.

With our current information architecture, we should preserve at least the top nav as lvl0 and the uppermost sidebar category as lvl1.
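
A minimal sketch of that rule, written as a plain function over text that has already been extracted from the page. The helper name `buildHierarchy`, its parameters, and the example labels ("Guides", "Integrate", "Login Users") are assumptions for illustration; this is not the crawler config itself.

```javascript
// Hypothetical helper: derives lvl0/lvl1 from already-extracted text.
// navItem: text of the active top-nav item (may be empty).
// activeCategories: texts of the active sidebar categories, outermost first.
function buildHierarchy(navItem, activeCategories, fallback = "Documentation") {
  const nav = (navItem || "").trim();
  const categories = activeCategories.map((c) => c.trim()).filter(Boolean);
  return {
    // The top nav always provides lvl0, so its context survives at any sidebar depth.
    lvl0: nav || fallback,
    // Uppermost sidebar category, or null when the page sits outside any category.
    lvl1: categories.length > 0 ? categories[0] : null,
  };
}

const nested = buildHierarchy("Guides", ["Integrate", "Login Users"]);
const bare = buildHierarchy("", []);
console.log(nested, bare);
```

Each part gets its own fallback here. Concatenating the parts first and applying `||` afterwards would not work, because the separator alone already makes the string truthy.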

@mffap
Member Author

mffap commented May 14, 2023

Regarding generated docs:

  • We can make Algolia crawl the API docs with a separate index to allow for the different structure
  • Can we add custom classes to some generated elements (missing h1, description, base path, path)? --> makes it easier to select content
  • Some pages in "API" are generated and some are not. The two types follow different formatting, which matters to the crawler. I could create separate crawl logic for the generated pages, but they are not identifiable from the URL alone; the only way would be to hard-code the relevant paths in the crawler (not optimal for future-proofing). It might be worth moving the generated files ("Core Resources") under one path to make selection easier. --> drawback: breaks documentation links...

@mffap
Member Author

mffap commented May 14, 2023

  • Can we add custom classes to some generated elements (missing h1, description, base path, path)? --> makes it easier to select content

Could not figure out how to manipulate the md template (only mdx content), so we have to find another way.
Also re: h1 issue

@mffap
Member Author

mffap commented May 14, 2023

  • Some pages in "API" are generated and some are not. The two types follow different formatting, which matters to the crawler. I could create separate crawl logic for the generated pages, but they are not identifiable from the URL alone; the only way would be to hard-code the relevant paths in the crawler (not optimal for future-proofing). It might be worth moving the generated files ("Core Resources") under one path to make selection easier. --> drawback: breaks documentation links...

@fforootd @hifabienne @dakshitha any opinions on this? You can see an example of the two indices in the issue description, should be clear from that what the struggle is :)

@fforootd
Member

  • Some pages in "API" are generated and some are not. The two types follow different formatting, which matters to the crawler. I could create separate crawl logic for the generated pages, but they are not identifiable from the URL alone; the only way would be to hard-code the relevant paths in the crawler (not optimal for future-proofing). It might be worth moving the generated files ("Core Resources") under one path to make selection easier. --> drawback: breaks documentation links...

@fforootd @hifabienne @dakshitha any opinions on this? You can see an example of the two indices in the issue description, should be clear from that what the struggle is :)

Hm good question.

Shouldn't everything in /api be ranked lower than the other content?

@mffap
Member Author

mffap commented May 15, 2023

Shouldn't everything in /api be ranked lower than the other content?

Yes, but that is not the relevant point :) Ranking can be set per index; since we only have one index at the moment, everything has the same rank (see the hierarchy issue above). The real issue is that the generated pages follow a different format than the other pages, so we need to parse them differently. At the moment they can only be separated by hard-coding the path.

I would actually like to put all generated content in a subpath like /api/resources.
With that, we can apply a different crawler/index to the subpath. It also makes exclusion in .gitignore etc. cleaner.
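
A rough sketch of how such a subpath split could route pages, using the two `pathsToMatch` lists from the draft in the issue description. The glob matcher below is a simplified stand-in written for this example (not the crawler's own matcher), the negation semantics are assumed, and the two page URLs are made up.

```javascript
// Simplified glob matcher for illustration only: supports "**" (any characters)
// and "*" (any characters except "/"). It is NOT the crawler's own matcher.
function globToRegExp(glob) {
  // Escape regex metacharacters except "*", which is handled below.
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const pattern = escaped
    .replace(/\*\*/g, "\u0000") // placeholder so "**" survives the "*" pass
    .replace(/\*/g, "[^/]*")
    .replace(/\u0000/g, ".*");
  return new RegExp("^" + pattern + "$");
}

// Assumed semantics: a URL matches when at least one positive pattern
// matches and no negated ("!"-prefixed) pattern matches.
function matchesPaths(url, pathsToMatch) {
  const positives = pathsToMatch.filter((p) => !p.startsWith("!"));
  const negatives = pathsToMatch
    .filter((p) => p.startsWith("!"))
    .map((p) => p.slice(1));
  const hit = (patterns) => patterns.some((p) => globToRegExp(p).test(url));
  return hit(positives) && !hit(negatives);
}

const docsPaths = [
  "https://zitadel.com/docs/**",
  "!https://zitadel.com/docs/apis/resources/**",
];
const apiPaths = ["https://zitadel.com/docs/apis/resources/**"];

// Made-up page URLs for the example.
const guide = "https://zitadel.com/docs/guides/example";
const generated = "https://zitadel.com/docs/apis/resources/example";
console.log(matchesPaths(guide, docsPaths), matchesPaths(guide, apiPaths));
console.log(matchesPaths(generated, docsPaths), matchesPaths(generated, apiPaths));
```

Under these assumptions every page lands in exactly one action, and no individual generated page has to be hard-coded.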

@fforootd
Member

fforootd commented May 15, 2023

Ok, we can do this already IMO. We can separate the whole /api into its own index.
In my mind it makes no difference if it is generated or not.

(I am not sure if docsearch can do this out of the box)

@mffap
Member Author

mffap commented May 15, 2023

Ok, we can do this already IMO. We can separate the whole /api into its own index. In my mind it makes no difference if it is generated or not.

(I am not sure if docsearch can do this out of the box)

Yes, it does. I can make that change. I was just checking whether any of you had a veto.

@fforootd
Member

No veto on my end 😁

@mffap
Member Author

mffap commented May 20, 2023

Updated the crawler. I'm not sure whether the secondary index will be included in the search, so I need to check whether multiple actions can write to the same index. Having multiple indices causes issues in DocSearch, since we can only define one index there.
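
For reference, a small sketch that groups actions by `indexName`, using the index name and paths from the draft in the issue description. It only inspects the shape of the config; whether the crawler accepts several actions writing to one index is exactly the open question here. The helper name `groupActionsByIndex` is hypothetical.

```javascript
// Hypothetical helper: collects the pathsToMatch of all actions per indexName.
function groupActionsByIndex(actions) {
  const groups = new Map();
  for (const action of actions) {
    const paths = groups.get(action.indexName) || [];
    groups.set(action.indexName, paths.concat(action.pathsToMatch));
  }
  return groups;
}

// Index name and paths copied from the draft; extractors omitted.
const draftActions = [
  {
    indexName: "zitadel",
    pathsToMatch: [
      "https://zitadel.com/docs/**",
      "!https://zitadel.com/docs/apis/resources/**",
    ],
  },
  {
    indexName: "zitadel",
    pathsToMatch: ["https://zitadel.com/docs/apis/resources/**"],
  },
];

const indexGroups = groupActionsByIndex(draftActions);
console.log(indexGroups.size === 1 ? "single index" : "multiple indices");
```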

@mffap
Member Author

mffap commented May 21, 2023

@dakshitha I've updated the crawl. Results should be reflected in the current search behavior. Can you please have a look? Any suggestions on how we could improve further?

@dakshitha
Member

Search works well for me now.

@mffap
Member Author

mffap commented May 25, 2023

Done. Rest as follow-up issues.

mffap closed this as completed May 25, 2023
3 participants