Assets from minified CSS not downloaded #169

Closed
joshas opened this issue May 23, 2018 · 3 comments

Comments

@joshas

joshas commented May 23, 2018

Background images from minified CSS were not downloaded. In minified CSS they are stored like this:

background:url(../images/logo/main.svg) no-repeat;
The same applies to webfonts declared in CSS.

@marcstern

As most modern websites now serve minified CSS, this is a major problem.

@marcstern

In htsparse.c, line 1348 reads:

                      if (!nc && (nc = strfield(html, "url")) && (!isalnum(*(html - 1))) && *(html - 1) != '_') {  // url(url)
                        expected = '('; // opening parenthesis
                        expected_end = ")";     // end: closing parenthesis
                        can_avoid_quotes = 1;
                        quotes_replacement = ')';
                      }

So this is supposed to be supported!?
Has this really been a known, unresolved bug for 4 years?
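
For illustration, here is a minimal standalone sketch of the behavior one would expect for single-line CSS: every `url(...)` target on the line is collected, not just the first. It is not HTTrack code. The function name `extract_css_urls` and the `MAX_URL_LEN` limit are hypothetical, and the rules it applies (optional quotes, `)` as terminator, rejecting a match preceded by an alphanumeric character or `_`) are only modeled on what the snippet above appears to check.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_URL_LEN 256  /* hypothetical limit, chosen for this sketch */

/* Hypothetical helper, not part of HTTrack: collects the target of every
   url(...) in css, up to max_urls entries. Returns the number found.
   Simplifications: matching is case-sensitive and CSS escapes are ignored. */
static size_t extract_css_urls(const char *css, char urls[][MAX_URL_LEN],
                               size_t max_urls) {
  size_t count = 0;
  const char *p = css;

  while (count < max_urls && (p = strstr(p, "url(")) != NULL) {
    /* Skip matches that are the tail of a longer identifier, e.g. "myurl(". */
    if (p > css && (isalnum((unsigned char) p[-1]) || p[-1] == '_')) {
      p += 4;
      continue;
    }

    const char *start = p + 4;
    while (*start == ' ' || *start == '\t')
      start++;

    /* Quotes are optional: the target ends at the matching quote if one
       opens it, otherwise at the closing parenthesis. */
    char terminator = ')';
    if (*start == '"' || *start == '\'')
      terminator = *start++;

    const char *end = strchr(start, terminator);
    if (end == NULL)
      break;  /* unterminated url(: stop scanning */

    size_t len = (size_t) (end - start);
    /* Trim trailing blanks before ')' in the unquoted form. */
    if (terminator == ')')
      while (len > 0 && (start[len - 1] == ' ' || start[len - 1] == '\t'))
        len--;

    if (len > 0 && len < MAX_URL_LEN) {
      memcpy(urls[count], start, len);
      urls[count][len] = '\0';
      count++;
    }
    p = end + 1;  /* keep scanning the rest of the same line */
  }
  return count;
}
```

The key point is the last statement of the loop: scanning resumes right after the previous match instead of moving on to the next line, so a one-line stylesheet with several `url(...)` references yields all of them.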

@foxtacles
Copy link

foxtacles commented Apr 2, 2023

I'm experiencing the same issue: the parser finds and downloads the first asset in a minified CSS file, but then stops. After some testing, the only workaround I found is to place at least one line break between consecutive asset URLs.

Effectively, that means working with minified CSS (and probably other files that go through the "javascript" parser code path) isn't possible right now. A minimal reproducible example:

.div{background:url(http://www.domain.com/quote-left.png) left no-repeat}.div2{background:url(http://www.domain.com/quote-right.png) left no-repeat}

It will get the first image, but not the second. This however works fine:

.div{background:url(http://www.domain.com/quote-left.png) left no-repeat}
.div2{background:url(http://www.domain.com/quote-right.png) left no-repeat}
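
Where the CSS can be preprocessed before it is mirrored, the line-break workaround could be automated. The following is a minimal sketch under that assumption. `split_css_rules` is a hypothetical helper, not an HTTrack function, and it has not been verified against HTTrack itself. It inserts a newline after each `}` that is outside a quoted string, which turns the failing one-line example into the working two-line form.

```c
#include <stddef.h>

/* Hypothetical preprocessing helper, not part of HTTrack: copies css into
   out and adds '\n' after every '}' that is outside a quoted string and is
   not already followed by a newline or the end of input.
   Returns 1 on success, 0 if out is too small.
   Simplification: a '}' inside an unquoted url(...) is not special-cased. */
static int split_css_rules(const char *css, char *out, size_t out_size) {
  size_t n = 0;
  char quote = 0;  /* active string delimiter, or 0 when outside a string */

  if (out_size == 0)
    return 0;

  for (const char *p = css; *p != '\0'; p++) {
    /* Each iteration writes at most two bytes; keep room for the NUL too. */
    if (n + 2 >= out_size)
      return 0;

    out[n++] = *p;
    if (quote) {
      if (*p == '\\' && p[1] != '\0')
        out[n++] = *++p;  /* copy the escaped character verbatim */
      else if (*p == quote)
        quote = 0;        /* string closed */
    } else if (*p == '"' || *p == '\'') {
      quote = *p;         /* string opened */
    } else if (*p == '}' && p[1] != '\n' && p[1] != '\0') {
      out[n++] = '\n';    /* separate this rule from the next one */
    }
  }
  out[n] = '\0';
  return 1;
}
```

Newlines between rules do not change how a browser interprets the stylesheet, so the rewritten file should render the same as the original.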

I could imagine a fix shouldn't be too hard. I took a look at https://github.com/xroche/httrack/blob/master/src/htsparse.c but the code is quite a challenge to grasp (I may give it another shot once I have more time on my hands). To anyone who is already familiar with this code, I believe fixing this might be quite worthwhile considering most CSS etc. is minified today.

@xroche xroche closed this as completed Jan 27, 2024