Extraction strategy for deep crawling #936
Unanswered
franjefriten asked this question in Forums - Q&A
The documentation specifically says to use a prescribed schema structure when using JsonXPathExtractionStrategy or JsonCssExtractionStrategy. However, this leaves only the possibility of shallow crawling, not deep crawling: both strategies build an lxml tree from the HTML of a single page and require a base selector to be given. Therefore, you cannot extract data from a nested page whose URL is found in the HTML of the base page.
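To make the limitation concrete, here is a minimal sketch of what a two-level extraction would need to do. It does not use the library's API: it relies only on Python's standard library, and the `PAGES` dictionary, the `extract` helper, and the `crawl_two_levels` function are hypothetical names invented for illustration. The in-memory pages stand in for real HTTP fetching. The point is that a single base selector only ever sees one tree, so following a link requires a second extraction pass on the linked page.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical in-memory "site" (URL -> well-formed HTML); stands in for real fetching.
PAGES = {
    "https://example.com/list": (
        "<html><body>"
        "<div class='item'><a href='/item/1'>First</a></div>"
        "<div class='item'><a href='/item/2'>Second</a></div>"
        "</body></html>"
    ),
    "https://example.com/item/1": (
        "<html><body><p class='desc'>Details of first</p></body></html>"
    ),
    "https://example.com/item/2": (
        "<html><body><p class='desc'>Details of second</p></body></html>"
    ),
}


def extract(html, base_selector, fields):
    """Single-page extraction: one tree, one base selector, relative field paths.

    `fields` maps a field name to (relative_path, attribute_or_None).
    """
    root = ET.fromstring(html)
    rows = []
    for node in root.findall(base_selector):
        row = {}
        for name, (path, attr) in fields.items():
            target = node.find(path)
            if target is None:
                row[name] = None
            elif attr is None:
                row[name] = target.text
            else:
                row[name] = target.get(attr)
        rows.append(row)
    return rows


def crawl_two_levels(start_url):
    """Pass 1 extracts items and links from the base page; pass 2 visits each link."""
    items = extract(
        PAGES[start_url],
        ".//div[@class='item']",
        {"title": ("a", None), "href": ("a", "href")},
    )
    for item in items:
        # Resolve the relative link against the base page's URL.
        url = urljoin(start_url, item["href"])
        # Second extraction pass, on a different page with its own selectors.
        detail = extract(
            PAGES[url],
            ".//body",
            {"description": ("p[@class='desc']", None)},
        )
        item["url"] = url
        item["description"] = detail[0]["description"] if detail else None
    return items


if __name__ == "__main__":
    for row in crawl_two_levels("https://example.com/list"):
        print(row)
```

In this sketch the nested page gets its own selector set, which is exactly what a single schema with one base selector cannot express.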