Incomplete output on broken HTML like https://distrowatch.com/table.php?distribution=void #373

Open
eMPee584 opened this issue Mar 29, 2024 · 5 comments

Comments

@eMPee584

Nice tool, but I guess there's a lot of broken HTML on the web 😅
I tried monolith 'https://distrowatch.com/table.php?distribution=void' > distrowatch.com--void-linux.htm but the output is missing quite a lot compared to fetching the page with curl.
Maybe this is part of the problem: https://validator.w3.org/nu/?showoutline=yes&doc=https%3A%2F%2Fdistrowatch.com%2Ftable.php%3Fdistribution%3Dvoid

@snshn
Member

snshn commented Mar 29, 2024

Hello Marcel,

I was able to get an almost exact 1:1 copy of the page on the web with that command. Could you please point out what seems to be missing in the saved file?

@eMPee584
Author

It was missing the huge table following an H2 tag..
Oh wow, what the heck.. now it works here too 🤔
Unfortunately I had overwritten the file I previously got to diff with curl.. that was created with
monolith --no-css --no-images --no-js 'https://distrowatch.com/table.php?distribution=void':
distrowatch.com--void-linux.txt
Hmm, actually that is still missing it.. gotta run now, got a train to catch

@snshn
Member

snshn commented Mar 29, 2024

Interesting. Could you please try saving it again and wait for another train? It doesn't look like I'm able to reproduce it on my end.

@snshn
Member

snshn commented Mar 29, 2024

Looks like it needs to either have JS or CSS to render those tables, or alternatively you can provide this flag: -n. It'll unwrap NOSCRIPT tags and make it look the way things look in browsers that don't have JS enabled.
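
A rough way to check whether the -n flag changes anything for a given page is to save it with and without the flag and count a few tags in each result. The sketch below assumes monolith is on PATH; tag_count and the output file names plain.htm and unwrapped.htm are hypothetical choices for illustration, not part of monolith.

```shell
#!/bin/sh
# Sketch: compare monolith output with and without -n (unwrap NOSCRIPT).
# tag_count is a hypothetical helper, not something monolith provides.

# Count opening tags of element $2 in file $1 (case-insensitive).
tag_count() {
    grep -o -i "<$2[ >]" "$1" | wc -l | tr -d ' '
}

# Usage: sh noscript-check.sh URL
# Without arguments only the helper is defined and nothing is fetched.
if [ "$#" -gt 0 ]; then
    monolith --no-css --no-images --no-js "$1" > plain.htm
    monolith --no-css --no-images --no-js -n "$1" > unwrapped.htm
    for f in plain.htm unwrapped.htm; do
        echo "$f: $(tag_count "$f" noscript) noscript, $(tag_count "$f" table) table"
    done
fi
```

If both files report the same counts, the missing table is probably not explained by NOSCRIPT handling alone.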

@eMPee584
Author

eMPee584 commented Apr 28, 2024

Ok coming back to this, a more interesting observation is that output fluctuates... Try this command repeatedly:

monolith 'https://distrowatch.com/table.php?distribution=void' > distrowatch.com--void-linux.$(date +%F.%H%Mh%S).htm

The -n option actually makes no difference with that..
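
To make the fluctuation easier to see than by eyeballing the saved files, the repeated runs can be summarized by size and table-row count. This is only a sketch under assumptions: monolith is on PATH, and row_count, save_runs and the run-N.htm file names are hypothetical helpers and names chosen for illustration.

```shell
#!/bin/sh
# Sketch: save one URL several times and print size and <tr> count per run.
# row_count and save_runs are hypothetical helpers, not part of monolith.

# Count opening <tr> tags in file $1 (case-insensitive).
row_count() {
    grep -o -i '<tr[ >]' "$1" | wc -l | tr -d ' '
}

# Save URL $1 into directory $2, $3 times, printing "file bytes rows" per run.
save_runs() {
    i=1
    while [ "$i" -le "$3" ]; do
        out="$2/run-$i.htm"
        monolith "$1" > "$out"
        echo "$out $(wc -c < "$out" | tr -d ' ') $(row_count "$out")"
        i=$((i + 1))
    done
}

# Usage: sh fluctuation.sh URL [RUNS]
# Without arguments only the helpers are defined and nothing is fetched.
if [ "$#" -gt 0 ]; then
    dir=$(mktemp -d)
    save_runs "$1" "$dir" "${2:-5}"
fi
```

Runs whose row count differs noticeably from the others are the ones worth diffing against a curl download of the same page.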
