Scraping from archives feature #336

catfromplan9 · 2023-05-07T17:01:38Z

Add a feature to scrape from archive sites. A flag would enable detection of archive.today pages (there are a few backup domains people use, so don't hardcode the domain). If one is found, edit the HTML and remove the divs that contain the archive site's own markup, leaving behind just the original site contents. I did this manually and I'm sure it could be automated. For archive.org, you can parse out an HTML field on the page that contains a link to the original, un-archived webpage.
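A minimal sketch of the detection step, assuming it is done on the page URL rather than on the HTML. The function name `detect_archive` and the host table are hypothetical, and the domains listed are only examples of mirrors I have seen. The table is passed in as data, so new mirrors can be added without touching the logic:

```python
from typing import Dict, Optional, Set
from urllib.parse import urlsplit

# Illustrative host table (an assumption, not a complete list).
# In a real tool this would come from configuration so that
# new backup domains can be added without code changes.
DEFAULT_ARCHIVE_HOSTS: Dict[str, Set[str]] = {
    "archive.today": {"archive.today", "archive.ph", "archive.is", "archive.md"},
    "archive.org": {"web.archive.org"},
}


def detect_archive(
    url: str, hosts: Dict[str, Set[str]] = DEFAULT_ARCHIVE_HOSTS
) -> Optional[str]:
    """Return the archive service a URL belongs to, or None if it is not archived."""
    host = (urlsplit(url).hostname or "").lower()
    for service, names in hosts.items():
        if host in names:
            return service
    return None
```

For example, `detect_archive("https://archive.ph/abcde")` returns `"archive.today"`, while an ordinary site URL returns `None`. Removing the archive's wrapper divs would be a separate step that depends on each site's markup, which is not sketched here.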

Also, add another flag to disable link conversion on the page when the archive option is on. Link conversion could work by looking for a second https:// or http:// after the start of the link.
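The link conversion idea can be sketched roughly like this. The function name `unwrap_archived_link` is hypothetical, and the sketch assumes archived links embed the original URL verbatim after the archive prefix:

```python
import re

# Matches the start of an http or https URL, case-insensitively.
_SCHEME = re.compile(r"https?://", re.IGNORECASE)


def unwrap_archived_link(url: str) -> str:
    """Return the part of the link starting at the second scheme, if there is one.

    Links that do not start with a scheme, or that contain only one,
    are returned unchanged.
    """
    matches = list(_SCHEME.finditer(url))
    if len(matches) >= 2 and matches[0].start() == 0:
        return url[matches[1].start():]
    return url
```

Usage: `unwrap_archived_link("https://web.archive.org/web/20230507170138/https://example.com/page")` gives `"https://example.com/page"`. One caveat: a normal link that carries another URL in its query string (for example a redirect parameter) would also be cut, so this should probably only run on links whose host is a known archive.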

You could support other archive sites with this feature, but I only know of these two. I did this manually with a site I archived using monolith, and I haven't seen any tool that converts archive.org or archive.today pages back into their original format.
