How to run Node.js in PHP and how to import the scraping result into PHP

Question

I am writing a PHP page in which one of the functions scrapes data from the internet with Puppeteer. Thanks to ggorlen's help, my JS works properly. Now I want to run the Node.js script from my PHP. I searched the internet and tried to imitate some examples, but it fails. Here is my PHP (Bulletin Translator.php):

<!DOCTYPE html>
<html>     
<head>
<meta charset="utf-8" />
<title>contacts.php</title>
</head>
<body text="blue">

<?php
   exec('cd js');
   exec('node index.js'); 
?>

<?php
// Some php code here.
?>

</body>
</html>

The scraping JS is inside the js folder; the folder layout is shown in two screenshots (Structure1, Structure2).

index.js:

const puppeteer = require('puppeteer');

//var date_in_YMD = new Date();

(async ()=>
{
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    await page.goto(weather_report_chin_html)
    // let's just call them tweetHandle 
    //const bulletin_urls = await page.$$('div.leftBody > ul[class^="list"]');

    const bulletin_urls = await page.$$('div.leftBody');

    // loop thru all handles
    for(const bulletin_url of bulletin_urls)
    {
        try
        {
            const data = await page.$$eval(".NEW", els => els.map(el => (
            {
                text: el.textContent,
                href: el.href,
            })));
            console.log(data);
        }
        catch(err)
        {
            console.error(err);
        }
    }

    await browser.close()
}) ();

What shall I do to run Node.js in my PHP? And how shall I import my scraping result into my PHP? Any suggestions will be appreciated.

Is there some reason you don't just use a headless chrome library for PHP directly?
– ADyson, Commented Apr 27, 2024 at 14:48
Just as an FYI, exec runs whatever you pass it in a new child process, and once complete the process is destroyed. This means that you can't chain multiple calls as you are doing.
– Chris Haas, Commented Apr 27, 2024 at 14:56
If it were me I would scrape the data in an express.js script and "fetch" it from the webpage.
– pguardiario, Commented Apr 29, 2024 at 0:32
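
The effect Chris Haas describes in the comments above is not specific to PHP: any API that starts a new shell for each call behaves the same way. As a minimal sketch, the snippet below reproduces it with Node's built-in `child_process` module (chosen only because the question already uses Node; the helper name `cwdOfChild` is hypothetical). A `cd` issued in one child process has no effect on the next one, whereas a directory supplied to the same call that runs the command does.

```javascript
const { execSync, execFileSync } = require("node:child_process");
const path = require("node:path");

// Hypothetical helper: start a fresh Node child process and return its working directory.
function cwdOfChild(options = {}) {
  return execFileSync(
    process.execPath,
    ["-e", "process.stdout.write(process.cwd())"],
    { encoding: "utf8", ...options }
  );
}

const parentDir = process.cwd();

// First child: a shell that changes its own directory and then exits.
execSync("cd ..");

// Second child: starts from the parent's directory again, so the cd above is lost.
const afterSeparateCd = cwdOfChild();

// Supplying the directory in the same call that runs the command does take effect.
const withCwdOption = cwdOfChild({ cwd: path.dirname(parentDir) });

console.log({ parentDir, afterSeparateCd, withCwdOption });
```

In PHP the same idea suggests a few options: combine both steps in one call, such as `exec('cd js && node index.js')`; call `chdir('js')` before `exec`; or pass the script path directly, as `node js/index` does in the accepted answer.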

ggorlen · Accepted Answer · 2024-04-28 02:54:53Z

Please read my answer to your last question. You don't need Puppeteer for this. If you're using PHP, use it directly rather than Node. Working with PHP is going to be faster, easier to code and more maintainable in every respect:

<?php

use DiDom\Document;

require_once("vendor/autoload.php");

$url = "<Your URL>";
$html = file_get_contents($url);

if (!$html) {
    throw new Exception("Failed to fetch URL");
}

$document = new Document($html);

$data = [];
foreach ($document->find(".NEW") as $element) {
    $text = $element->text();
    $href = $element->getAttribute("href");
    $data[] = ["text" => $text, "href" => $href];
}

echo json_encode($data, JSON_PRETTY_PRINT) . "\n";
echo count($data) . "\n";

?>

I got this working on Ubuntu 22.04 using the following answers:

Since there are actual use cases for running Puppeteer from PHP and extracting the result (this probably isn't one of them), here's how to do it, for reference.

js/index.js:

const puppeteer = require("puppeteer"); // ^22.7.1

const url = "<Your URL>";

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.setJavaScriptEnabled(false);
  await page.setRequestInterception(true);
  page.on("request", req => {
    req.url() === url ? req.continue() : req.abort();
  });
  await page.goto(url, {waitUntil: "domcontentloaded"});
  const data = await page.$$eval(".NEW", els => els.map(el => ({
    text: el.textContent,
    href: el.href,
  })));
  console.log(JSON.stringify({data}));
})()
  .catch(err => console.log(JSON.stringify({error: err.message})))
  .finally(() => browser?.close());
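
One detail worth spelling out is why this script prints `JSON.stringify({data})` where the question's script used `console.log(data)`. PHP's `exec` returns only the last line of the command's output, and `json_decode` needs valid JSON. The sketch below uses made-up sample data and a hypothetical `isJson` helper to compare the two output styles; it is an illustration, not part of the answer's script.

```javascript
const util = require("node:util");

// Made-up sample standing in for the scraped results.
const data = [{ text: "Example bulletin", href: "https://example.com/bulletin" }];

// console.log(data) formats objects the way util.inspect does: readable, but not JSON.
const inspected = util.inspect(data);

// JSON.stringify without an indent argument always yields a single line.
const json = JSON.stringify({ data });

// Hypothetical helper: report whether a string parses as JSON.
function isJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

console.log(isJson(inspected)); // false
console.log(isJson(json)); // true
console.log(json.includes("\n")); // false
```

Because the JSON form is always a single line, the last line that `exec` hands back to PHP contains the whole result.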

index.php:

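<?php

// exec() returns only the last line of output, which is the single JSON line printed by js/index.js.
$output = exec("node js/index");
$result = json_decode($output, true);

if (!is_array($result)) {
    // Node failed to start or printed something that is not JSON.
    var_export("Could not parse output from Node");
}
elseif (array_key_exists("error", $result)) {
    var_export($result["error"]);
}
else {
    var_export($result["data"]);
}

?>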
