13

I'm new to DOM parsing in PHP:
I have a HTML file that I'm trying to parse. It has a bunch of DIVs like this:

<div id="interestingbox"> 
   <div id="interestingdetails" class="txtnormal">
        <div>Content1</div>
        <div>Content2</div>
   </div>
</div>

<div id="interestingbox"> 
......

I'm trying to get the contents of the many div boxes using php. How can I use the DOM parser to do this?

Thanks!

4 Answers 4

20

First i have to tell you that you can't use the same id on two different divs; there are classes for that point. Every element should have an unique id.

Code to get the contents of the div with id="interestingbox"

$html = '
<html>
<head></head>
<body>
<div id="interestingbox"> 
   <div id="interestingdetails" class="txtnormal">
        <div>Content1</div>
        <div>Content2</div>
   </div>
</div>

<div id="interestingbox2"><a href="#">a link</a></div>
</body>
</html>';


$dom_document = new DOMDocument();

$dom_document->loadHTML($html);

//use DOMXpath to navigate the html with the DOM
$dom_xpath = new DOMXpath($dom_document);

// if you want to get the div with id=interestingbox
$elements = $dom_xpath->query("*/div[@id='interestingbox']");

if (!is_null($elements)) {

  foreach ($elements as $element) {
    echo "\n[". $element->nodeName. "]";

    $nodes = $element->childNodes;
    foreach ($nodes as $node) {
      echo $node->nodeValue. "\n";
    }

  }
}

//OUTPUT
[div]  {
        Content1
        Content2
}

Example with classes:

$html = '
<html>
<head></head>
<body>
<div class="interestingbox"> 
   <div id="interestingdetails" class="txtnormal">
        <div>Content1</div>
        <div>Content2</div>
   </div>
</div>

<div class="interestingbox"><a href="#">a link</a></div>
</body>
</html>';

//the same as before.. just change the xpath

[...]

$elements = $dom_xpath->query("*/div[@class='interestingbox']");

[...]

//OUTPUT
[div]  {
        Content1
        Content2
}

[div]  {
a link
}

Refer to the DOMXPath page for more details.

Sign up to request clarification or add additional context in comments.

Comments

6

I got this to work using simplehtmldom as a start:

$html = file_get_html('example.com');
foreach ($html->find('div[id=interestingbox]') as $result)
{
    echo $result->innertext;
}

1 Comment

this is very easy to use
0

Very nice function from http://www.sitepoint.com/forums/showthread.php?611393-php5-need-something-like-innerHTML-instead-of-nodeValue

function innerXML($node) 

{ 

    $doc  = $node->ownerDocument; 

    $frag = $doc->createDocumentFragment(); 

    foreach ($node->childNodes as $child) 

    { 

        $frag->appendChild($child->cloneNode(TRUE)); 

    } 

    return $doc->saveXML($frag); 

}  


$dom = new DOMDocument(); 

$dom->loadXML(' 

<html> 

<body> 

<table> 

<tr> 

    <td id="foo">  

        The first bit of Data I want 

        <br />The second bit of Data I want 

        <br />The third bit of Data I want 

    </td> 

</tr> 

</table> 

<body> 

<html> 



'); 

$xpath = new DOMXPath($dom); 

$node = $xpath->evaluate("/html/body//td[@id='foo' ]"); 

$dataString = innerXML($node->item(0)); 
$dataArr = explode("<br />", $dataString); 

$dataUno = $dataArr[0]; 
$dataDos = $dataArr[1]; 
$dataTres = $dataArr[2]; 

echo "firstdata = $nameUno<br />seconddata = $nameDos<br />thirddata = $nameTres<br />"  

Comments

-1

WebExtractor: https://github.com/knyga/webextractor It can parse page with css, regex, xpath selectors.

Look package and tests for examples:

use WebExtractor\DataExtractor\DataExtractorFactory; use WebExtractor\DataExtractor\DataExtractorTypes; use WebExtractor\Client\Client;

$factory = DataExtractorFactory::getFactory(); $extractor = $factory->createDataExtractor(DataExtractorTypes::CSS); $client = new Client; $content = $client->get('https://en.wikipedia.org/wiki/2014_Winter_Olympics'); $extractor->setContent($content); $h1 = $extractor->setSelector('h1')->extract();

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.