Allow multiple URLs as input to the spiders #36
Comments
1 is friendlier with a no-code approach. If we are OK with only supporting HTTP/HTTPS start requests, we could automatically interpret input URLs with a different protocol as links to lists of input URLs, which would allow us to support both 1 and 2 with a single spider parameter. Or we could follow an approach similar to that of |
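A minimal sketch of the scheme-based interpretation suggested in this comment, not an agreed design. The function name `interpret_input` and the return labels `"start_url"` and `"url_list"` are hypothetical, and rejecting inputs without a scheme is an assumption.

```python
from urllib.parse import urlparse


def interpret_input(url: str) -> str:
    """Classify a spider input by its URL scheme (hypothetical helper).

    HTTP/HTTPS inputs are treated as start URLs; any other scheme is
    treated as a link to a list of input URLs.
    """
    scheme = urlparse(url).scheme.lower()
    if not scheme:
        # Assumption: an input without a scheme is rejected rather than guessed.
        raise ValueError(f"Input has no URL scheme: {url!r}")
    if scheme in {"http", "https"}:
        return "start_url"
    return "url_list"
```

One consequence of this approach is that a URL list served over plain HTTP/HTTPS cannot be told apart from a start URL, which is why it only works if start requests are restricted to HTTP/HTTPS and lists use some other scheme.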
What if we add a checkbox to indicate that the URL is an input file instead? It seems easier to use than having to prefix |
What if we had a separate |
The downside is that now you have to validate that people are not using |
Alternatively, we could have a single field and decide what to do with it based on the response headers from a HEAD request:
The downside is that this method requires one additional request, and network or ban issues could affect its outcome.
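A rough sketch of the header-based decision, written as a pure function so it can be applied to the headers of whatever HEAD response the spider receives. The exact header rules from the comment were not preserved here, so the rule below (a `text/plain` media type means "URL list", anything else means "start URL") is purely an assumption, as are the names `classify_from_head`, `"url_list"` and `"start_url"`.

```python
from typing import Mapping


def classify_from_head(headers: Mapping[str, str]) -> str:
    """Decide how to treat an input URL from HEAD response headers.

    Assumption: a text/plain media type marks a file of input URLs;
    every other (or missing) Content-Type marks a regular start URL.
    """
    content_type = ""
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "content-type":
            content_type = value
            break
    # Drop parameters such as "; charset=utf-8".
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/plain":
        return "url_list"
    return "start_url"
```

Falling back to `"start_url"` when the header is missing keeps the common single-URL case working even if the HEAD request yields nothing useful.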
Yeah, then I think the boolean option is the way to go. Any naming suggestions?
Description:
The current ecommerce spider in the repository accepts a single input URL. This means that crawling different categories in a website requires the creation of multiple spiders. However, this approach is impractical for several reasons:
Proposed Solution:
Implementing this feature can be achieved through various methods
My preferred solution is number 2, as it is usually the simplest and most versatile of them. It allows updating the set of URLs in the file without having to reconfigure the spider, and there are many options for hosting such files without getting locked into a specific provider.
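As an illustration of the file-based option, here is a minimal sketch of turning the contents of such a file into a list of start URLs. The file format (one URL per line, blank lines ignored, duplicates dropped) and the name `parse_url_list` are assumptions; the issue does not specify them.

```python
from typing import List


def parse_url_list(text: str) -> List[str]:
    """Parse a URL list file body into start URLs (hypothetical format).

    Assumes one URL per line; surrounding whitespace and blank lines are
    ignored, and duplicate URLs are dropped while preserving order.
    """
    urls: List[str] = []
    seen = set()
    for line in text.splitlines():
        url = line.strip()
        if not url or url in seen:
            continue
        seen.add(url)
        urls.append(url)
    return urls
```

A spider could download the file once at startup, pass its body through a function like this, and issue one start request per resulting URL.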