Decoding of HTML entities in title does not work properly #264

Closed
paxter opened this issue Oct 23, 2021 · 2 comments

Comments

@paxter
Contributor

paxter commented Oct 23, 2021

Using: https://www.imdb.com/title/tt13950332/

Test code

$imdb = new \Imdb\Title(13950332);
echo $imdb->title();

Getting

While the Rest of Us Die: Secrets of America&#x27;s Shadow Government

Expecting

While the Rest of Us Die: Secrets of America's Shadow Government

I looked into the code and found the following line in the title_year() function:

$this->main_title = htmlspecialchars_decode($match['title']);

To get it working properly I replaced the line with:

$this->main_title = html_entity_decode($match['title'], ENT_QUOTES, 'UTF-8');

I'm not sure whether this is the proper or best solution, but it works in my case. I also replaced the other calls to htmlspecialchars_decode() in that function.
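A minimal sketch, assuming PHP 7 or later and run outside imdbphp, of why the original line fails and why the replacement fixes it. The flags are spelled out explicitly because the defaults differ between PHP versions; the raw string is the `<title>` text for tt13950332 as quoted later in this thread.

```php
<?php
// Raw <title> text for tt13950332 as IMDb serves it (quoted in this thread).
$raw = 'While the Rest of Us Die: Secrets of America&#x27;s Shadow Government';

// What the original line does on PHP < 8.1: the default flags there are
// ENT_COMPAT | ENT_HTML401, which leave single-quote entities untouched.
$compat = htmlspecialchars_decode($raw, ENT_COMPAT | ENT_HTML401);

// The replacement proposed above: decode every HTML 4.01 entity, quotes included.
$full = html_entity_decode($raw, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// A narrower alternative: keep htmlspecialchars_decode but enable ENT_QUOTES.
$quotes = htmlspecialchars_decode($raw, ENT_QUOTES | ENT_HTML401);

echo $compat, PHP_EOL; // While the Rest of Us Die: Secrets of America&#x27;s Shadow Government
echo $full, PHP_EOL;   // While the Rest of Us Die: Secrets of America's Shadow Government
echo $quotes, PHP_EOL; // While the Rest of Us Die: Secrets of America's Shadow Government
```

Both fixes give the same result for this title; they only differ for entities outside the five HTML-special characters, such as `&auml;`.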

@tboothman
Owner

tboothman commented Oct 23, 2021

IMDb seems to escape only the characters that mean something to HTML, so I think htmlspecialchars_decode is an appropriate function to use here. Non-ASCII characters are sent as raw UTF-8.
This looks like a silly mistake made in PHP a long time ago, probably with good intentions: single quotes were left out of these functions' default flags (ENT_COMPAT). It has been fixed very recently, though; from PHP 8.1 the default is ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401 (https://php.watch/versions/8.1/html-entity-default-value-changes).

php > $a = '&amp;&quot;&#x27;&#039;';
php > echo htmlspecialchars_decode($a);
&"&#x27;&#039;
php > echo html_entity_decode($a);
&"&#x27;&#039;
php > echo html_entity_decode($a, ENT_QUOTES, 'UTF-8');
&"''
php > echo htmlspecialchars_decode($a, ENT_QUOTES);
&"''
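The distinction drawn above between the two functions can be checked directly. This sketch uses a made-up string (not an IMDb title) that mixes an escaped ampersand with the named entity `&auml;`: htmlspecialchars_decode only decodes entities for `&`, `"`, `'`, `<` and `>`, while html_entity_decode decodes any entity in the chosen doctype's table (HTML 4.01 here).

```php
<?php
// Made-up input, for illustration only: one HTML-special entity (&amp;)
// and one named entity for a non-ASCII letter (&auml;).
$s = 'Tom &amp; Jerry, Nausica&auml;';

// Decodes only entities for &, ", ', < and > (quotes enabled via ENT_QUOTES).
$special = htmlspecialchars_decode($s, ENT_QUOTES | ENT_HTML401);

// Decodes every entity known to HTML 4.01, producing UTF-8 output.
$all = html_entity_decode($s, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $special, PHP_EOL; // Tom & Jerry, Nausica&auml;
echo $all, PHP_EOL;     // Tom & Jerry, Nausicaä
```

Since IMDb sends letters like ä as raw UTF-8, as the `<title>` examples in this comment show, the narrower function loses nothing here; html_entity_decode would only matter for a source that used named entities for non-ASCII text.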

Some examples of <title> elements:

Nausicaä of the Valley of the Wind
Forhøret
&quot;Firefly&quot; The Train Job (TV Episode 2002)
While the Rest of Us Die: Secrets of America&#x27;s Shadow Government
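A minimal sketch of the narrower approach applied to the four `<title>` examples above. `decodeImdbTitle` is a hypothetical helper for illustration, not imdbphp's actual code; the flags are passed explicitly so the result is the same before and after PHP 8.1.

```php
<?php
// Hypothetical helper (not imdbphp's code): decode the HTML-special
// entities in an IMDb <title>, including single quotes.
function decodeImdbTitle(string $raw): string
{
    return htmlspecialchars_decode($raw, ENT_QUOTES | ENT_HTML401);
}

// The four <title> examples quoted above.
$examples = [
    'Nausicaä of the Valley of the Wind',
    'Forhøret',
    '&quot;Firefly&quot; The Train Job (TV Episode 2002)',
    'While the Rest of Us Die: Secrets of America&#x27;s Shadow Government',
];

foreach ($examples as $raw) {
    echo decodeImdbTitle($raw), PHP_EOL;
}
// Nausicaä of the Valley of the Wind
// Forhøret
// "Firefly" The Train Job (TV Episode 2002)
// While the Rest of Us Die: Secrets of America's Shadow Government
```

Raw UTF-8 letters pass through untouched, and both `&quot;` and `&#x27;` are decoded.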

tboothman added a commit that referenced this issue Oct 23, 2021
@paxter
Contributor Author

paxter commented Oct 23, 2021

Thanks for the fast reply. Your solution works for me too. 👍
