data:image URLs are not fully ignored even though urlFilter is set to filter out URLs containing 'data:image'. It looks like two URLs are extracted from a single data:image URL: one with the prefix data:image/gif;base64, and another with the part after the comma, R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
website-scraper:debug filtering out { url: "data:image/gif;base64", filename: "undefined", depth: 3 } by url filter +0ms
website-scraper:debug found requested resource for { url: "http://example.com/product/abc/R0lGODlhAQABAIAAAAAAAP///R0lGODlhAQABAIAAAAAAAP///R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7", filename: "undefined", depth: 3 } +0ms
The second part, R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7, is appended to the base URL and requested as if it were a relative link.
Expected behavior: There should not be any downloads from URLs like Base URL + data:image URL
Actual behavior: HTML is downloaded as yH5BAEAAAAALAAAAAABAAEAAAIBRAA7_1.html
Additional Information
The downloaded HTML contains an extra space between data:image/gif;base64, and R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
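To illustrate why the filter catches only one of the two URLs, here is a minimal sketch. The actual options are not shown in this issue, so the `urlFilter` below is an assumption about what such a filter might look like. It relies only on the documented behavior that website-scraper calls `urlFilter(url)` and keeps the URL when the function returns true. The two test URLs are copied from the debug output above.

```javascript
// Assumed filter: the reporter's real options are not included in the issue.
// website-scraper keeps a URL when urlFilter(url) returns true.
const urlFilter = (url) => !url.includes('data:image');

// First extracted URL: the prefix before the comma.
const prefixOnly = 'data:image/gif;base64';

// Second extracted URL: the base64 payload resolved against the base URL.
// It no longer contains the substring 'data:image'.
const resolvedRemainder =
  'http://example.com/product/abc/R0lGODlhAQABAIAAAAAAAP///R0lGODlhAQABAIAAAAAAAP///R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7';

console.log(urlFilter(prefixOnly));        // false (filtered out)
console.log(urlFilter(resolvedRemainder)); // true (passes the filter)
```

Because the second URL has already lost its data:image prefix by the time the filter sees it, a substring check on 'data:image' cannot reject it. This matches the "filtering out" and "found requested resource" lines in the log.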
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Configuration
version: 4.2.3
options:
Description
The page contains image like:
I'm sorry, I cannot provide the real URL. It is an internal URL on the customer's side.