Extract data from HTML table row column

Question

How can I extract data from an HTML table in PHP? The data is in this format:

Table 1

<tr><td class="body" valign="top"><a href="example"><b>DATA</b></a></td><td class="body" valign="top">Data_Text</td></tr>

Table 2

<tr><th><div id="Data">Data</div></th><td>Data_Text_1</td><td>Data_Text_2</td></tr>

Table 3

<tr><td width="120"><a href="example" target="_blank">DATA</a></td><td>Data_Text</td></tr>

I want to get Data and Data_Text (or Data_Text_1 and Data_Text_2) from the three tables.
I've used:

$html = file_get_contents($link);
$doc = new DOMDocument();
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$nodes  = $xpath->query('//td[]');
$nodes2 = $xpath->query('//td[]');

But it doesn't show any data!

I'll offer a bounty on this question the day after tomorrow.

There seems to be some mistake: you cannot obtain "Data_Text" from Table 2, because it has no text node with that string value. Please edit and correct.
– Dimitre Novatchev, Commented Apr 29, 2012 at 4:21

pdizz · Accepted Answer · 2012-04-29 15:11:27Z


Using the Simple HTML DOM parser (simple_html_dom.php):

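<?php

// Simple HTML DOM parser (third-party library)
include 'simple_html_dom.php';

$html = file_get_html('thetable.html');

// Print the combined text of each table row on its own line
$rows = $html->find('tr');
foreach ($rows as $row) {
    echo $row->plaintext, PHP_EOL;
}

?>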

Or select the 'td' cells instead:

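<?php

// Simple HTML DOM parser (third-party library)
include 'simple_html_dom.php';

$html = file_get_html('thetable.html');

// Print the text of each table cell on its own line
$cells = $html->find('td');
foreach ($cells as $cell) {
    echo $cell->plaintext, PHP_EOL;
}

?>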

edited Apr 29, 2012 at 15:11

answered Apr 29, 2012 at 3:39

Nicolás Ozimica · Accepted Answer · 2012-04-29 03:59:51Z

Given an HTML document called xpathTables.html like this:

<html>
  <body>
    <table>
      <tbody>
        <tr><td class="body" valign="top"><a href="example"><b>DATA</b></a></td><td class="body" valign="top">Data_Text</td></tr>
      </tbody> 
    </table>

    <table>
      <tbody>
        <tr><th><div id="Data">Data</div></th><td>Data_Text_1</td><td>Data_Text_2</td></tr>
      </tbody>
    </table>

    <table>
      <tbody>
        <tr><td width="120"><a href="example" target="_blank">DATA</a></td><td>Data_Text</td></tr>
      </tbody>
    </table>
  </body>
</html>

And this PHP script:

<?php

$link = "xpathTables.html";

$html = file_get_contents($link);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$tables = $doc->getElementsByTagName('table');

// Table 1: the label sits inside <a><b>, the value is the second td.body cell
$nodes  = $xpath->query('.//tbody/tr/td/a/b', $tables->item(0));
var_dump($nodes->item(0)->nodeValue);
$nodes  = $xpath->query('.//tbody/tr/td[@class="body"]', $tables->item(0));
var_dump($nodes->item(1)->nodeValue);

// Table 2: the label sits inside <th><div>, the values are the two td cells
$nodes  = $xpath->query('.//tbody/tr/th/div[@id="Data"]', $tables->item(1));
var_dump($nodes->item(0)->nodeValue);
$nodes  = $xpath->query('.//tbody/tr/td', $tables->item(1));
var_dump($nodes->item(0)->nodeValue);
var_dump($nodes->item(1)->nodeValue);

// Table 3: the label sits inside <a>, the value is the second td cell
$nodes  = $xpath->query('.//tbody/tr/td/a', $tables->item(2));
var_dump($nodes->item(0)->nodeValue);
$nodes  = $xpath->query('.//tbody/tr/td', $tables->item(2));
var_dump($nodes->item(1)->nodeValue);

You get this output:

string(4) "DATA"
string(9) "Data_Text"
string(4) "Data"
string(11) "Data_Text_1"
string(11) "Data_Text_2"
string(4) "DATA"
string(9) "Data_Text"

I didn't fully understand your question, so I made this example to show all the text nodes your tables contain. If you are only interested in some of those nodes, pick the XPath queries that select them.

I included the table and tbody tags just to make the example more HTML-like.
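If the per-table queries above are more specific than you need, a row-oriented loop is one alternative. The following is only a sketch: the helper name extractRows is hypothetical, the sample markup is copied from the question and wrapped in table elements without tbody, and it assumes the first cell of every row is the label and the remaining cells are its values.

```php
<?php

// Sample markup copied from the three rows in the question.
$html = <<<'HTML'
<html><body>
<table><tr><td class="body" valign="top"><a href="example"><b>DATA</b></a></td><td class="body" valign="top">Data_Text</td></tr></table>
<table><tr><th><div id="Data">Data</div></th><td>Data_Text_1</td><td>Data_Text_2</td></tr></table>
<table><tr><td width="120"><a href="example" target="_blank">DATA</a></td><td>Data_Text</td></tr></table>
</body></html>
HTML;

/**
 * Hypothetical helper: returns one array per <tr>, holding the trimmed
 * text of each direct <th>/<td> child in document order.
 */
function extractRows(string $html): array
{
    // Collect libxml parse warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//tr') as $tr) {
        $cells = [];
        // The query is evaluated relative to the current row.
        foreach ($xpath->query('./th | ./td', $tr) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        $rows[] = $cells;
    }
    return $rows;
}

foreach (extractRows($html) as $cells) {
    // Assumption: the first cell is the label, the rest are its values.
    $label = array_shift($cells);
    echo $label, ': ', implode(', ', $cells), PHP_EOL;
}
```

Because the query starts at //tr, it does not depend on whether the page wraps its rows in tbody.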

Dimitre Novatchev · Accepted Answer · 2012-04-29 04:39:28Z

Use this single XPath expression:

/*/table/tr//text()[normalize-space()]

This selects every text node that does not consist solely of whitespace characters and that is a descendant of any tr element that is a child of a table element that is a child of the top element of the document.

XSLT-based verification:
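From PHP, a similar expression can be evaluated with DOMXPath. This is a minimal sketch rather than a definitive implementation: the sample markup is an abridged copy of the tables in the question, and the path is loosened to //table//tr (an assumption on my part), because a parsed HTML page normally has its tables under body rather than directly under the top element.

```php
<?php

// Abridged sample markup, indented so that it contains
// whitespace-only text between the tags.
$html = <<<'HTML'
<html>
 <body>
  <table>
   <tr>
    <td class="body" valign="top"><a href="example"><b>DATA</b></a></td>
    <td class="body" valign="top">Data_Text</td>
   </tr>
  </table>
  <table>
   <tr>
    <th><div id="Data">Data</div></th>
    <td>Data_Text_1</td>
    <td>Data_Text_2</td>
   </tr>
  </table>
 </body>
</html>
HTML;

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// [normalize-space()] keeps only text nodes that contain
// something other than whitespace.
$texts = [];
foreach ($xpath->query('//table//tr//text()[normalize-space()]') as $node) {
    $texts[] = trim($node->nodeValue);
}

print_r($texts);
```

Each selected text node ends up as one entry of $texts, in document order.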

 <xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select=
   "/*/table/tr//text()[normalize-space()]"/>

. . . . . . .
  <xsl:for-each select=
    "/*/table/tr//text()[normalize-space()]">
    "<xsl:copy-of select="."/>"
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied against the following XML document:

<html>
 <table>
    <tr>
        <td class="body" valign="top">
            <a href="example">
                <b>DATA</b>
            </a>
        </td>
        <td class="body" valign="top">Data_Text</td>
    </tr>
 </table>

 <table>
    <tr>
        <th>
            <div id="Data">Data</div>
        </th>
        <td>Data_Text_1</td>
        <td>Data_Text_2</td>
    </tr>
 </table>

 <table>
    <tr>
        <td width="120">
            <a href="example" target="_blank">DATA</a>
        </td>
        <td>Data_Text</td>
    </tr>
 </table>
</html>

the XPath expression is evaluated and the selected text nodes are output twice: first concatenated, as the direct result of the evaluation, and then one per line, each surrounded by quotes:

DATAData_TextDataData_Text_1Data_Text_2DATAData_Text

. . . . . . .

"DATA"

"Data_Text"

"Data"

"Data_Text_1"

"Data_Text_2"

"DATA"

"Data_Text"
