Missing dependencies in list plus ERROR due to unsanitized special characters #8

Open
cooperdk opened this issue Jul 6, 2022 · 2 comments

cooperdk commented Jul 6, 2022

Hi,

Your dependency list is missing one package that is not available by default (named below).
Why not include a requirements.txt file instead, as is the usual Python convention?

lxml
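
For reference, a requirements.txt covering this would contain the single line `lxml` and be installed with `pip install -r requirements.txt`. Below is a minimal sketch of a startup check that reports missing packages before the tool runs. The names `missing_modules` and `REQUIRED` are hypothetical and not taken from the project, and the list only includes the one package named in this issue.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical list: lxml is the only package named in this issue.
REQUIRED = ["lxml"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages: " + ", ".join(missing))
        print("Install them with: pip install " + " ".join(missing))
```

This works for lxml because its import name matches its pip name. For packages where the two differ, the check would need a mapping between them.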

Also, you need to sanitize the URLs to avoid errors with international and special characters. It's really easy:

sanitized_string = htmlentities(unsanitized_string)

You should just append the sanitized URL to the queue, I imagine.
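
Note that `htmlentities` is the name of a PHP function, and Python has no built-in by that name. One way to get a similar effect for URLs in Python is percent-encoding with the standard library's `urllib.parse`. The following is a sketch under that assumption, not the project's actual code. `sanitize_url` and `queue` are hypothetical names, and whether this resolves the reported error has not been verified here.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def sanitize_url(url):
    """Percent-encode non-ASCII and unsafe characters in the path, query and fragment.

    "%" is kept as a safe character so that URLs which are already
    percent-encoded are not encoded a second time. The scheme and host
    are passed through unchanged.
    """
    parts = urlsplit(url)
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%+")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


# Hypothetical crawl queue: append the sanitized URL instead of the raw one.
queue = []
queue.append(sanitize_url("http://example.com/café page.html"))
```

Because `%` is treated as safe, a literal percent sign that is not part of an escape sequence is left as-is. That trade-off keeps the function safe to call on links that are already encoded.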

timbly5000 commented Jul 15, 2022

Great utility, thanks.

I have seen the exact same thing with lxml and UTF-8 web pages.
For example: http://www.themadhowes.org.uk/kpop/subtitles.html

wiejakp (Owner) commented Dec 26, 2022

@cooperdk, I'm not familiar with Python standards. Would you mind creating a PR with those changes?
