Ignore pages with noindex #9
Comments
Hi, thanks for your note - I'm glad this helps! Regarding your use-case, unfortunately there's no built-in mechanism at the moment that does that... =/ So for now you may need to use the ...

We may consider adding this feature in a future release, but implementation comes with its own set of challenges. The quick way may be to string-detect, but it's definitely not robust, because

<meta name="robots" content="noindex" />
<meta content="noindex" name="robots">
<meta
name="robots"
content="noindex">

are all valid HTML. The other way would be to parse the HTML (using JSDOM or such), but it's a non-trivial task and a resource-intensive operation that will significantly impact speed. Alternatively, you can continue to include
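To illustrate the point about string detection, here is a minimal sketch in plain JavaScript. The function names `naiveHasNoindex` and `hasNoindex` are hypothetical and not part of the tool. The first looks for one exact spelling and misses the other two variants above; the second reads each `<meta>` tag's attributes in any order. It is still regex-based, so it is only an approximation of real HTML parsing (for example, it would also match a tag inside an HTML comment).

```javascript
// Naive approach: look for one exact spelling of the tag.
function naiveHasNoindex(html) {
  return html.includes('<meta name="robots" content="noindex"');
}

// More tolerant sketch: find each <meta ...> tag, collect its attributes in
// any order, then check for name="robots" with a "noindex" directive.
function hasNoindex(html) {
  const metaTags = html.match(/<meta\b[^>]*>/gi) || [];
  return metaTags.some((tag) => {
    const attrs = {};
    // Matches key="value", key='value' or key=value pairs.
    const attrPattern = /([a-z-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/gi;
    let match;
    while ((match = attrPattern.exec(tag)) !== null) {
      const value = match[2] ?? match[3] ?? match[4] ?? '';
      attrs[match[1].toLowerCase()] = value.toLowerCase();
    }
    // The content attribute may hold several comma-separated directives.
    const directives = (attrs.content || '').split(',').map((d) => d.trim());
    return attrs.name === 'robots' && directives.includes('noindex');
  });
}

const variants = [
  '<meta name="robots" content="noindex" />',
  '<meta content="noindex" name="robots">',
  '<meta\n  name="robots"\n  content="noindex">',
];

for (const html of variants) {
  console.log(naiveHasNoindex(html), hasNoindex(html));
}
// prints:
// true true
// false true
// false true
```

A proper parser remains the safer choice if correctness matters more than speed; the sketch only shows why a single substring check is not enough.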
It could be possible to only parse the ... node-html-parser claims it can parse an HTML file in under 2ms, which wouldn't be too much of a speed hit, bearing in mind that most people would likely only use this tool before deploying their changes to a web server. I run it directly after prettier, which ends up spending up to 750ms per HTML file. I'll make some changes and see how an implementation of this could affect runtime.
@zerodevx So I've made a version of the tool which follows ... It's slower than the normal version by roughly 4x. Benchmarking with 529 HTML files (totalling 50 MB), I found that following the noindex tags took about 1400-1500ms, while ignoring them took about 350-380ms. At the moment I've implemented it as an argument which needs to be manually enabled. I'll open a PR and see what you think.
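As a quick sanity check on the "roughly 4x" figure, the reported timing ranges can be turned into a slowdown range. This is just arithmetic on the numbers quoted above; the variable and function names are made up for illustration.

```javascript
// Timings reported in the thread, in milliseconds, for 529 HTML files.
const withNoindexCheck = { min: 1400, max: 1500 };
const withoutNoindexCheck = { min: 350, max: 380 };

// Best case: fastest slow run vs slowest fast run; worst case: the reverse.
function slowdownRange(slow, fast) {
  return {
    lowest: slow.min / fast.max,
    highest: slow.max / fast.min,
  };
}

const range = slowdownRange(withNoindexCheck, withoutNoindexCheck);
console.log(range.lowest.toFixed(2), range.highest.toFixed(2));
// prints: 3.68 4.29
```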
That's great work! Looking through it right now.
Looks really good to me. I'll merge #10 and release a new minor. Thanks for your contribution! 🎉
Thank you so much for this package! I really love using it and it saves me a lot of pain.
I wanted to ask if you could add an option to ignore pages which have been set to have indexing turned off...
e.g. pages with...