Commit

more images
jeremyorr-hm committed Jul 25, 2023
1 parent 1526599 commit 2653a86
Showing 3 changed files with 7 additions and 0 deletions.
Binary file added docs/guide/img/allowed-path.png
Binary file added docs/guide/img/blocked-path.png
7 changes: 7 additions & 0 deletions docs/guide/user-guide/03-configure-project.md
@@ -19,10 +19,17 @@ To configure a sitemap as a starting url use the full path to the sitemap e.g. w

#### Allowed Path Patterns
The allowed path pattern refines the content the indexer chooses to index. When it is set, the indexer evaluates the URL path of each document being processed. If the URL contains a match for an allowed path pattern, the content is extracted and indexed; otherwise the content is ignored. If the allowed path is left blank, all content is extracted and indexed (unless it matches a blocked path pattern). Multiple allowed path patterns can be set, and content that matches any one of them is extracted.
::: tip
Individual path entries in the configuration are only recognised once you press Return after typing them; each is then displayed as a separate entry.
:::

![Allowed Path](../img/allowed-path.png)

#### Blocked Path Patterns
The blocked path pattern can be used to explicitly ignore content matching a certain pattern. Blocked path patterns are evaluated after allowed paths, so any content matching a blocked path is ignored regardless of whether it also matches an allowed path. Multiple blocked path patterns can be set, or the field can be left blank.

![Blocked Path](../img/blocked-path.png)
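
The interaction between allowed and blocked path patterns can be sketched as follows. This is a minimal illustration, not the indexer's actual implementation: the function name `should_index` is hypothetical, and the sketch assumes patterns are treated as regular expressions searched anywhere in the URL path, which the guide does not confirm.

```python
import re
from urllib.parse import urlparse


def should_index(url, allowed_patterns, blocked_patterns):
    """Hypothetical helper mirroring the rules described above.

    Assumption: each pattern is a regular expression searched
    anywhere in the URL path.
    """
    path = urlparse(url).path

    # A blank allowed list means all content is a candidate for indexing.
    allowed = not allowed_patterns or any(
        re.search(pattern, path) for pattern in allowed_patterns
    )

    # Blocked patterns are evaluated after allowed ones and always win.
    blocked = any(re.search(pattern, path) for pattern in blocked_patterns)

    return allowed and not blocked


allowed = ["/docs/"]
blocked = ["/docs/archive/"]

in_docs = should_index("https://www.example.com/docs/guide/intro", allowed, blocked)
in_archive = should_index("https://www.example.com/docs/archive/old", allowed, blocked)
in_blog = should_index("https://www.example.com/blog/post", allowed, blocked)
```

Here `in_docs` is indexed, `in_archive` is ignored because the blocked pattern overrides the allowed match, and `in_blog` is ignored because it matches no allowed pattern.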

#### XPath
The indexer uses machine learning to infer the primary content of a page or paragraph, but this sometimes fails to identify the content correctly. The XPath config parameter can be used to explicitly instruct the indexer to extract content that matches the XPath.
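
To get a feel for what an XPath selects before entering it in the configuration, something like the following can help. It is only a sketch: the sample page, the `main-content` id, and the `extract_by_xpath` helper are made up for illustration, and Python's standard-library `xml.etree.ElementTree` supports only a subset of XPath and requires well-formed markup, so the indexer's own XPath handling may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page used only for illustration.
PAGE = """<html><body>
<div id="nav">Home | Docs</div>
<div id="main-content"><p>First paragraph.</p><p>Second paragraph.</p></div>
</body></html>"""


def extract_by_xpath(markup, xpath):
    """Return the whitespace-normalised text of every element matching xpath."""
    root = ET.fromstring(markup)
    results = []
    for node in root.findall(xpath):
        # Join all text fragments under the node, then collapse whitespace.
        text = " ".join(node.itertext())
        results.append(" ".join(text.split()))
    return results


# Select only the main content container, skipping the navigation block.
extracted = extract_by_xpath(PAGE, ".//div[@id='main-content']")
```

The expression targets a single container by its `id`, so the navigation text is left out of `extracted`.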
