Incorrect processing of <script type="text/html"> #26

Closed
roadster31 opened this issue Jan 5, 2019 · 3 comments


roadster31 commented Jan 5, 2019

Hello,

I often use the construct <script id="some-id" type="text/html"> some HTML code </script> to hold HTML fragments that are later injected into the DOM. HtmlMin processes the HTML between <script> and </script> incorrectly: the closing tags inside the block are dropped.

What is this feature about (expected vs actual behaviour)?

Source code:

<!doctype html>
<html lang="fr">
<head>
    <title>Test</title>
</head>
<body>
    A Body

    <script id="elements-image-1" type="text/html">
        <div class="place badge-carte">Place du Village<br>250m - 2mn à pied</div>
        <div class="telecabine badge-carte">Télécabine du Chamois<br>250m - 2mn à pied</div>
        <div class="situation badge-carte"><img src="https://domain.tld/assets/frontOffice/kneiss/template-assets/assets/dist/img/08ecd8a.png" alt=""></div>
    </script>
</body>
</html>

Expected behaviour:

<!DOCTYPE html><html lang="fr"><head><title>Test</title></head><body>A Body<script id="elements-image-1" type="text/html">
        <div class="place badge-carte">Place du Village<br>250m - 2mn à pied</div>
        <div class="telecabine badge-carte">Télécabine du Chamois<br>250m - 2mn à pied</div>
        <div class="situation badge-carte"><img src="https://domain.tld/assets/frontOffice/kneiss/template-assets/assets/dist/img/08ecd8a.png" alt=""></div>
    </script></body></html>

Actual behaviour:

<!DOCTYPE html><html lang="fr"><head><title>Test</title></head><body>A Body<script id="elements-image-1" type="text/html">
        <div class="place badge-carte">Place du Village<br>250m - 2mn à pied
        <div class="telecabine badge-carte">Télécabine du Chamois<br>250m - 2mn à pied
        <div class="situation badge-carte"><img src="https://domain.tld/assets/frontOffice/kneiss/template-assets/assets/dist/img/08ecd8a.png" alt="">
    </script></body></html>

How can I reproduce it?

Use the above source code.

Does it take minutes, hours or days to fix?

Not sure about that. Maybe minutes, if the fix is just to ignore the content of <script type="text/html">?

Any additional information?

Thanks for your work :)


roadster31 commented Jan 5, 2019

After a few tests, it seems that DOMDocument::loadHTML() is the root cause of this problem. Loading the test document and saving it immediately gives the following result, in which the closing </div> tags are missing:

<!DOCTYPE html>
<?xml encoding="UTF-8" ?><html lang="fr"><head><title>Test</title></head><body>
    A Body

    <script id="elements-image-1" type="text/html">
        <div class="place badge-carte">Place du Village<br>250m - 2mn &agrave; pied
        <div class="telecabine badge-carte">T&eacute;l&eacute;cabine du Chamois<br>250m - 2mn &agrave; pied
        <div class="situation badge-carte"><img src="https://domain.tld/assets/frontOffice/kneiss/template-assets/assets/dist/img/08ecd8a.png" alt="">
    </script></body></html>

I'll investigate and get back to you if I find something interesting about that.
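
For anyone who wants to check this on their own setup, the load-and-save round trip described above can be sketched as follows. This is a minimal sketch using a shortened copy of the test document; it only counts closing </div> tags before and after, and the count after the round trip may differ depending on the PHP and libxml2 versions installed.

```php
<?php
// Shortened copy of the test document from the report.
$source = <<<'HTML'
<!doctype html>
<html lang="fr">
<head><title>Test</title></head>
<body>
    <script id="elements-image-1" type="text/html">
        <div class="place badge-carte">Place du Village</div>
        <div class="telecabine badge-carte">Telecabine du Chamois</div>
    </script>
</body>
</html>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($source);
libxml_clear_errors();

// Serialize straight away, with no other processing in between.
$output = (string) $doc->saveHTML();

$closingBefore = substr_count($source, '</div>');
$closingAfter  = substr_count($output, '</div>');

printf(
    "closing </div> tags: %d in source, %d after round trip\n",
    $closingBefore,
    $closingAfter
);
```

If the second number is lower than the first, the tags were lost by the parser itself, before HtmlMin did any work.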

roadster31 commented

After digging through Stack Overflow, it seems that the only possible solution is to parse the HTML as XML, after first converting void tags such as <br> and <img> to self-closing form so that the XML loader receives a valid XML document:

https://stackoverflow.com/questions/19788017/how-to-combine-phps-domdocument-with-a-javascript-template
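
Short of re-parsing the page as XML, another way to act on the "ignore <script type="text/html"> content" idea from the report is to hide those blocks from the parser and put them back afterwards. The sketch below is only an illustration of that idea, not the fix that was shipped: protectHtmlTemplates(), restoreHtmlTemplates() and the @@HTML_TEMPLATE_n@@ token format are hypothetical names, and it assumes the token passes through the parser and minifier unchanged (for example because the templates sit inside <body>).

```php
<?php
// Hypothetical helpers, not part of HtmlMin or simple_html_dom.

/**
 * Swap every <script type="text/html">...</script> block for a plain-text
 * token. Returns [protected markup, map of token => original block].
 */
function protectHtmlTemplates(string $html): array
{
    $saved = [];
    // Matches a script element whose type attribute is exactly text/html.
    $pattern = '~<script\b[^>]*\btype\s*=\s*(["\'])text/html\1[^>]*>.*?</script\s*>~is';

    $protected = preg_replace_callback(
        $pattern,
        function (array $m) use (&$saved): string {
            $token = '@@HTML_TEMPLATE_' . count($saved) . '@@';
            $saved[$token] = $m[0]; // keep the block byte for byte
            return $token;
        },
        $html
    );

    // preg_replace_callback() returns null on a regex error: change nothing.
    if ($protected === null) {
        return [$html, []];
    }

    return [$protected, $saved];
}

/** Put the original blocks back in place of their tokens. */
function restoreHtmlTemplates(string $html, array $saved): string
{
    return strtr($html, $saved);
}

$source = '<body>A Body<script id="t" type="text/html">'
    . '<div class="a">x<br>y</div></script></body>';

[$protected, $saved] = protectHtmlTemplates($source);
// $protected is what would be handed to the parser or minifier.
$restored = restoreHtmlTemplates($protected, $saved);
```

The trade-off is that the template content is never minified, which matches the expected behaviour shown in the report, where the block is left untouched.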

voku added a commit to voku/simple_html_dom that referenced this issue Jan 6, 2019

voku commented Jan 6, 2019

Fixed in version 3.1.3.
