
Incorporate an option for following links within the same domain to a certain depth #8

Open
snshn opened this issue Aug 23, 2019 · 3 comments
Labels: feature request (Suggestions related to functionality enhancements)

snshn (Member) commented Aug 23, 2019

Suggested by HN user ajxs, source: https://news.ycombinator.com/item?id=20774594

snshn changed the title from "Incorporating an option for following links within the same domain to a certain depth" to "Incorporate an option for following links within the same domain to a certain depth" on Aug 23, 2019
dibstern commented

+1, this is what would make this the tool that I need

snshn self-assigned this Aug 23, 2019
snshn added the "feature request" label Aug 25, 2019
Alch-Emi (Contributor) commented Dec 7, 2019

How would this work with pages that are linked to multiple times? Would only one link work, or would the page and every resource it links to be duplicated?

snshn (Member, Author) commented Dec 7, 2019

If JS were something we could always rely on, we'd be able to have just one data URL link to a given sub-page, with other links to the same page pointing at it via something like href="javascript:<click the first link to this resource on the page>". But we can't assume JS is always on, not to mention that one of monolith's features is stripping the document of JS (mostly for security and privacy reasons). So the only realistic approach is to cache nested data URLs but still include them in the final output. Limiting the depth and having code to avoid infinite loops would be the key here, but it's hard to predict what may go wrong; it's a very big and complex feature.

Since the main goal of the program is to save the resource as one file, the output should remain a single file even when sub-pages within the same domain are embedded as data URLs. That will undoubtedly make the file very large and hard to edit, because the hrefs' data URLs will contain whole pages along with their assets; but I'm sure people who archive web resources this way understand that, and will mostly use the feature for the convenience of having one file on their filesystem representing the resource, even if it's very big and ugly. So we can't really save each sub-page as a separate file and then just link to it from everywhere, unless we implement two modes for this feature: one where the output is a single file, and one where monolithic files are saved next to one another. We'd need to implement an -o flag to support the latter, since the usual stdout output can't tell where the monolithic HTML file is going to be saved.
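
A rough sketch (in Rust, since monolith is written in Rust) of the cache-plus-depth-limit idea described above. This is not monolith's actual API; `fetch_html`, `extract_same_domain_links`, and `to_data_url` are hypothetical stand-ins, stubbed out so the example stays self-contained:

```rust
use std::collections::HashMap;

/// Pretend to download a page; a real version would perform an HTTP GET.
fn fetch_html(url: &str) -> Option<String> {
    let _ = url;
    None
}

/// Very rough link-extraction stand-in; a real version would walk the DOM.
fn extract_same_domain_links(html: &str, domain: &str) -> Vec<String> {
    let _ = (html, domain);
    Vec::new()
}

/// Wrap an HTML document into a data URL (base64 encoding omitted for brevity).
fn to_data_url(html: &str) -> String {
    format!("data:text/html,{}", html)
}

/// Depth-limited embedding of same-domain sub-pages.
/// The cache maps a URL to its already-built data URL, so a page that is
/// linked to multiple times is only processed once, and every link to it
/// ends up pointing at the same (duplicated) data URL in the output.
fn embed_page(
    url: &str,
    domain: &str,
    depth: usize,
    cache: &mut HashMap<String, String>,
) -> Option<String> {
    if let Some(data_url) = cache.get(url) {
        return Some(data_url.clone());
    }
    let mut html = fetch_html(url)?;
    if depth > 0 {
        for link in extract_same_domain_links(&html, domain) {
            if let Some(embedded) = embed_page(&link, domain, depth - 1, cache) {
                // Swap the plain href target for the embedded data URL.
                html = html.replace(&link, &embedded);
            }
        }
    }
    let data_url = to_data_url(&html);
    cache.insert(url.to_string(), data_url.clone());
    Some(data_url)
}

fn main() {
    let mut cache = HashMap::new();
    // Depth 2: the root page plus two levels of same-domain links.
    let result = embed_page("https://example.com/", "example.com", 2, &mut cache);
    println!("{:?}", result.map(|html| html.len()));
}
```

Because the depth parameter strictly decreases, recursion terminates even when pages link back to each other; the cache only avoids re-fetching and re-encoding pages that are linked more than once.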

snshn pushed a commit to snshn/monolith that referenced this issue Dec 25, 2020