[generic] HTML content starting with BOM will be incorrectly identified as direct link to a video #4753

Closed
naglis opened this issue Jan 20, 2015 · 0 comments
Labels
bug

Comments

@naglis
Collaborator

@naglis naglis commented Jan 20, 2015

The direct link detection in the generic extractor (added in 4e262a8) incorrectly treats HTML as a direct link to a video when the HTML starts with a BOM (byte order mark). Some examples:

This makes the generic extractor useless on some sites (see #4534).
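A minimal sketch of why the BOM breaks the check. It assumes the direct link heuristic decodes the first bytes of the response as UTF-8 and looks for optional whitespace followed by `<`. The function name `naive_is_html` is hypothetical and not taken from 4e262a8. A UTF-8 BOM decodes to U+FEFF, which `\s` does not match, and a UTF-16 BOM decodes to replacement characters, so both HTML pages are reported as "not HTML":

```python
import re


def naive_is_html(first_bytes: bytes) -> bool:
    # Hypothetical reconstruction of the sniffing test: decode as UTF-8
    # and require optional whitespace followed by '<'.
    text = first_bytes.decode('utf-8', 'replace')
    return re.match(r'^\s*<', text) is not None


plain = b'<!DOCTYPE html><html>'
utf8_bom = b'\xef\xbb\xbf' + plain                # UTF-8 BOM, then HTML
utf16_bom = '\ufeff<html>'.encode('utf-16-le')   # UTF-16-LE BOM, then HTML

print(naive_is_html(plain))      # True
print(naive_is_html(utf8_bom))   # False: U+FEFF is not whitespace for \s
print(naive_is_html(utf16_bom))  # False: FF FE are invalid UTF-8 bytes
```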

I've started working on a solution that simply strips the BOM, but soon realized that would not be enough: we would also need to decode first_bytes using the encoding the BOM indicates. So I thought I'd bring this up; maybe you have better or simpler ideas?
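One possible shape for the "strip the BOM and decode with the matching encoding" idea, as a sketch rather than the extractor's actual code: look for a known BOM at the start of first_bytes, drop it, decode the remainder with the codec it announces, and otherwise fall back to UTF-8. The helper name `is_html_with_bom` and the `BOMS` table are assumptions for illustration. The UTF-32 entries must be checked before the UTF-16 ones, because the UTF-32-LE BOM (`FF FE 00 00`) begins with the UTF-16-LE BOM (`FF FE`):

```python
import re

# Byte order marks and the codec each one implies (hypothetical table).
# UTF-32 entries precede UTF-16 because FF FE 00 00 starts with FF FE.
BOMS = [
    (b'\xef\xbb\xbf', 'utf-8'),
    (b'\x00\x00\xfe\xff', 'utf-32-be'),
    (b'\xff\xfe\x00\x00', 'utf-32-le'),
    (b'\xff\xfe', 'utf-16-le'),
    (b'\xfe\xff', 'utf-16-be'),
]


def is_html_with_bom(first_bytes: bytes) -> bool:
    for bom, encoding in BOMS:
        if first_bytes.startswith(bom):
            # Strip the BOM and decode with the encoding it announces;
            # 'replace' tolerates a code unit cut off at the read limit.
            text = first_bytes[len(bom):].decode(encoding, 'replace')
            break
    else:
        # No BOM present: keep the plain UTF-8 assumption.
        text = first_bytes.decode('utf-8', 'replace')
    return re.match(r'^\s*<', text) is not None


print(is_html_with_bom(b'\xef\xbb\xbf<html>'))                     # True
print(is_html_with_bom('\ufeff  <html>'.encode('utf-16-be')))      # True
print(is_html_with_bom(b'\x00\x00\x00\x18ftypmp42'))               # False
```

The last call uses the start of an MP4 header, so real media still takes the direct link path. The `for ... else` keeps the no-BOM case identical to the original UTF-8 test, which limits the change to responses that actually begin with a BOM.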
