ISSUE: When the rootUrl does not match the urlFilter criteria, recursive scraping terminates at the root page.
RECOMMENDED SOLUTION: Do not apply the urlFilter to the rootUrl, or add an option to exclude the rootUrl from scraping.
DESIRED: I'm specifying a rootUrl and would like the scraper to recurse through all hyperlinks. The rootUrl will not be downloaded in this scenario. When the scraper finds a hyperlink ending in .abc it should download the file.
ACTUAL: The rootUrl (see code below) does not meet the urlFilter criteria, so the scraper stops without recursing. The scraper should find a hyperlink to http://trillian.mit.edu/~jc/music/book/SCD/Book45.abc on the rootUrl page, among other .abc URLs, but it does not. Note that when I set the rootUrl to an .abc URL, e.g. the one above, the file downloads as expected.
See this discussion thread for more detail.
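To make the difference between the desired and actual behavior concrete, here is a minimal sketch in Python. It is not the scraper's real code or API: the in-memory `SITE` map, the `example.test` URLs, and the `crawl`, `is_abc`, and `filter_gates_traversal` names are all hypothetical, and the "gating" branch is only an assumption about why recursion stops at the root. The idea is that the filter should decide what gets downloaded, not which pages get expanded for links.

```python
from collections import deque
from typing import Callable, Dict, List, Set

# Hypothetical in-memory "site": page URL -> hyperlinks found on that page.
SITE: Dict[str, List[str]] = {
    "http://example.test/index.html": [
        "http://example.test/tunes/",
        "http://example.test/about.html",
    ],
    "http://example.test/tunes/": [
        "http://example.test/tunes/Book45.abc",
        "http://example.test/tunes/Book46.abc",
    ],
    "http://example.test/about.html": [],
}


def is_abc(url: str) -> bool:
    """Stand-in for the urlFilter in the report: accept only .abc files."""
    return url.endswith(".abc")


def crawl(
    root_url: str,
    url_filter: Callable[[str], bool],
    filter_gates_traversal: bool,
) -> List[str]:
    """Return the URLs that would be downloaded, in breadth-first order.

    filter_gates_traversal=True models the reported behavior (assumed):
    a URL that fails the filter is dropped, so its links are never followed.
    filter_gates_traversal=False models the desired behavior: the filter
    only decides what is downloaded, and every page is still expanded.
    """
    downloaded: List[str] = []
    seen: Set[str] = {root_url}
    queue = deque([root_url])
    while queue:
        url = queue.popleft()
        if url_filter(url):
            downloaded.append(url)
        elif filter_gates_traversal:
            continue  # dropped before its links are inspected
        for link in SITE.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return downloaded


root = "http://example.test/index.html"
reported = crawl(root, is_abc, filter_gates_traversal=True)
desired = crawl(root, is_abc, filter_gates_traversal=False)
print(reported)
print(desired)
```

In this model the gated crawl downloads nothing because the root fails the filter, while the ungated crawl reaches both `.abc` files without downloading the root page. Starting the gated crawl directly at an `.abc` URL still downloads that file, which is consistent with the note in the report.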