This repository has been archived by the owner on Sep 17, 2020. It is now read-only.

Support directly replay of URLs that are not pages (was: Problem playing back WARC with long URL:s) #77

Open
peterk opened this issue Dec 12, 2018 · 2 comments

Comments

peterk commented Dec 12, 2018

I have WARC files collected with node-warc 3.1.0 that cannot be opened in Webrecorder player (it reports "No pages found"). The only distinguishing characteristic is that the files were archived from Facebook posts with long URLs. Other files archived with the same tool seem to work fine.

Listing URLs from the WARC with warcio works. Not sure if this is a bug in Webrecorder player or node-warc. Example file in the related node-warc issue: N0taN3rd/node-warc#25

Version details:
webrecorder player 1.6.1 (Mac)
webrecorder 4.1.5 (@e926c65)
pywb 2.1.1 (@3e0bb49)
har2warc 1.0.4
warcio 1.6.2

peterk commented Dec 12, 2018

The same file opens correctly in OpenWayback.

@ikreymer
Member

To clarify, the issue is not that the file doesn't load; it's related to page detection.
To make things easier for the user, when opening non-Webrecorder WARCs we attempt to "detect" which URLs are pages, and you are right that long URLs are occasionally rejected. (The other option is for squidwarc to write the page metadata directly, as Webrecorder does, and @N0taN3rd and I are looking into that as well.)
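
For illustration only, a heuristic of this kind might look like the sketch below. The `CapturedResponse` type, its field names, and the `MAX_PAGE_URL_LEN` cutoff are all assumptions made up for this example, not Webrecorder Player's actual detection code. The point is just to show how a URL-length check could drop an otherwise valid HTML capture.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical cutoff, chosen only for illustration.
# It is NOT taken from Webrecorder Player or pywb.
MAX_PAGE_URL_LEN = 200


@dataclass
class CapturedResponse:
    """Minimal stand-in for one response record in a WARC (assumed shape)."""
    url: str
    status: int
    mime: str


def looks_like_page(rec: CapturedResponse, max_url_len: int = MAX_PAGE_URL_LEN) -> bool:
    """Guess whether a capture is a top-level page rather than an embedded resource."""
    if rec.status != 200:
        return False
    if not rec.mime.lower().startswith("text/html"):
        return False
    # A length check like this one is what would reject long Facebook post URLs.
    return len(rec.url) <= max_url_len


def detect_pages(records: Iterable[CapturedResponse]) -> List[str]:
    """Return the URLs of all captures that pass the heuristic, in input order."""
    return [rec.url for rec in records if looks_like_page(rec)]
```

With a heuristic shaped like this, a WARC whose only HTML captures have very long URLs would yield an empty page list even though every record is intact and replayable.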

OpenWayback does not have any such page detection, but it lets you enter URLs directly. We also need to add support for loading an arbitrary URL that you already know, even if it's not detected as a page.
We plan to make exploring the WARC easier as well.
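
As a rough sketch of what direct URL replay could mean, the lookup below resolves an exact URL against an index of all captured records, without consulting any page list. `build_url_index` and `lookup_captures` are hypothetical names, and the `(url, timestamp)` tuples are an assumed stand-in for whatever index the player really uses.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_url_index(captures: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each captured URL to its sorted capture timestamps.

    `captures` is assumed to be (url, timestamp) pairs, with timestamps as
    14-digit strings (YYYYMMDDhhmmss) so that string order is time order.
    """
    index: Dict[str, List[str]] = defaultdict(list)
    for url, timestamp in captures:
        index[url].append(timestamp)
    for timestamps in index.values():
        timestamps.sort()
    return dict(index)


def lookup_captures(index: Dict[str, List[str]], url: str) -> List[str]:
    """Return all capture timestamps for an exact URL, or an empty list."""
    return index.get(url, [])
```

Because the lookup never asks whether a URL was classified as a page, a long URL that page detection skipped would still resolve as long as a record for it exists.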

@ikreymer ikreymer changed the title Problem playing back WARC with long URL:s Support directly replay of URLs that are not pages (was: Problem playing back WARC with long URL:s) Dec 12, 2018