I use DOMDocument to manipulate HTML in PHP 7. The Cyrillic text displays correctly on the page, but when I open "View page source" it is not readable: the characters appear as HTML entities instead of the text itself, which should read: Здесь осн

What might be wrong? The <meta> charset is utf-8. My code:

$dom = new DOMDocument();
if (@$dom->loadHTML(mb_convert_encoding("<div>$body</div>", 'HTML-ENTITIES', 'UTF-8'), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD)) {

    // https://stackoverflow.com/questions/29493678/loadhtml-libxml-html-noimplied-on-an-html-fragment-generates-incorrect-tags

    $container = $dom->getElementsByTagName('div')->item(0);
    $container = $container->parentNode->removeChild($container);

    // Empty the document before re-attaching the container's children
    while ($dom->firstChild)
        $dom->removeChild($dom->firstChild);

    while ($container->firstChild )
        $dom->appendChild($container->firstChild);

    $xpath = new DOMXPath($dom); 
    $headlines = $xpath->query("//h2");
    // some code..

    return $dom->saveHTML();
}

1 Answer

The problem is with $dom->saveHTML(). You need to pass the root node as a parameter, like this:

return $dom->saveHTML((new \DOMXPath($dom))->query('/')->item(0));

Then the page source suddenly renders differently, with the entities substituted by the actual characters. If it does not, double-check the values of $dom->encoding and $dom->substituteEntities; they should read UTF-8 and TRUE.
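
To see the difference side by side, here is a minimal sketch. The sample string in $body is made up for illustration, and mb_encode_numericentity() is used as a stand-in for the question's mb_convert_encoding(..., 'HTML-ENTITIES', 'UTF-8'), which is deprecated as of PHP 8.2. The exact output may vary with the PHP and libxml versions, so treat the comments as what typically happens rather than a guarantee.

```php
<?php
// Hypothetical input for illustration; any UTF-8 HTML fragment works.
$body = '<h2>Здесь</h2><p>основной текст</p>';

// Encode non-ASCII characters as numeric entities so libxml reads them correctly.
// Assumption: this replaces the question's mb_convert_encoding() call.
$html = mb_encode_numericentity("<div>$body</div>", [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Whole-document dump: non-ASCII characters typically come back as entities.
$escaped = $dom->saveHTML();

// Node dump: passing a node typically leaves the characters as readable UTF-8.
$readable = $dom->saveHTML($dom->documentElement);

echo $escaped, PHP_EOL, $readable, PHP_EOL;
```

Note that in this sketch the wrapping <div> is kept, so $dom->documentElement covers the whole fragment. In the question's code the wrapper is removed and the document ends up with several top-level nodes; $dom->documentElement then refers only to the first element, whereas the query('/') form above passes the document node itself.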

4 Comments

How did you know that, my friend? I read a lot of sources and nobody wrote about this solution
From memory, I had the same problem with my own framework years ago. The shorter syntax would be $dom->saveHTML($dom->documentElement);
@sirjay Others found the solution as well. The behaviour is not documented on php.net. Google has no results about this either, so it must be something in how the saveHTML function passes parameters to libxml. I doubt the PHP team knows about this; there is no bug report. It's something users found out for themselves.
Incredible, this fix saved our bacon still in 2024. Would never have known how to fix this issue without this!
