Allow extraction of Item data without discovery process #35
Comments
Option 2 sounds good to me.
We could call it
Does it make sense to have a different spider per item type? We may end up with tens of different templates just for this.
What would be the main advantage of using zyte-spider-templates over simply using Zyte API directly to perform product extraction (for example, https://docs.zyte.com/zyte-api/usage/extract.html)? Or is this more about leveraging Scrapy Cloud features such as periodic jobs, logging, and storing items in Hubstorage?
It can be helpful for a no-code path, especially in combination with #36.
Description:
Currently, the ecommerce spider in the repository accepts the URL of an ecommerce website as input and then performs a discovery process to find Products. Some use cases require passing a Product URL directly to retrieve that product's information, without crawling the website. Examples include monitoring stock or price changes for a particular product, or decoupling the discovery and extraction processes.
Proposed Solution:
1. Update the ecommerce spider: Add a new crawling strategy (EcommerceCrawlStrategy) named disabled to bypass any further navigation on the provided page. The spider should return the expected output item type, Product in this case.
2. Create a new spider: Instead of extending the existing spider, we can build a new spider that is focused only on extraction, without any crawling functionality. In this case, we could drop the crawl_strategy and max_requests input parameters. Additionally, we can make the spider work with item types apart from Product by adding an output type selector with values like Product, ProductNavigation, Article, etc.
I prefer the second solution because it is more versatile and keeps the ecommerce spider aligned with its original concept.