I am writing a PHP page in which one function scrapes data from the internet with Puppeteer. Thanks to ggorlen for his help, my JS works properly. Now I want to run the Node.js script from my PHP. I searched the internet and tried to imitate some examples, but it fails. Here is my PHP (Bulletin Translator.php):

<!DOCTYPE html>
<html>     
<head>
<meta charset="utf-8" />
<title>contacts.php</title>
</head>
<body text="blue">

<?php
   exec('cd js');
   exec('node index.js'); 
?>

<?php
// Some php code here.
?>

</body>
</html>

The scraping script is inside the js folder, as shown in these screenshots of the folder structure: Structure1, Structure2

index.js:

const puppeteer = require('puppeteer');

//var date_in_YMD = new Date();

(async ()=>
{
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    await page.goto(weather_report_chin_html)
    // let's just call them tweetHandle 
    //const bulletin_urls = await page.$$('div.leftBody > ul[class^="list"]');

    const bulletin_urls = await page.$$('div.leftBody');

    // loop thru all handles
    for(const bulletin_url of bulletin_urls)
    {
        try
        {
            const data = await page.$$eval(".NEW", els => els.map(el => (
            {
                text: el.textContent,
                href: el.href,
            })));
            console.log(data);
        }
        catch(err)
        {
            console.error(err);
        }
    }

    await browser.close()
}) ();

What should I do to run Node.js from my PHP? And how should I import my scraping result into my PHP? Any suggestions will be appreciated.

3 Comments
  • Is there some reason you don't just use a headless Chrome library for PHP directly? (Commented Apr 27, 2024 at 14:48)
  • Just as an FYI, exec runs whatever you pass it in a new child process, and once complete the process is destroyed. This means that you can’t chain multiple calls as you are doing. (Commented Apr 27, 2024 at 14:56)
  • If it were me I would scrape the data in an express.js script and "fetch" it from the webpage. (Commented Apr 29, 2024 at 0:32)
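
To illustrate the comment about child processes: here is a minimal sketch using Node's own child_process module (not PHP's exec, but the process behavior is the same idea). It assumes a temp directory as a stand-in for the js folder; `childCwd` and `target` are names made up for this example. A `cd` run in one child process has no effect on the next one, whereas setting the working directory for the child itself does.

```javascript
const { execSync, execFileSync } = require("node:child_process");
const os = require("node:os");

// Ask a fresh Node child process to print its working directory.
function childCwd(options = {}) {
  return execFileSync(process.execPath, ["-p", "process.cwd()"], {
    encoding: "utf8",
    ...options,
  }).trim();
}

const target = os.tmpdir(); // stand-in for the "js" folder (assumption)

// 1. `cd` runs in its own shell, which exits immediately afterwards.
execSync(`cd "${target}"`);
const afterSeparateCd = childCwd(); // still the parent's directory

// 2. The `cwd` option sets the directory for that one child only.
const withCwdOption = childCwd({ cwd: target });

console.log({ afterSeparateCd, withCwdOption });
```

On the PHP side, the equivalent fixes would be to call chdir("js") before exec, or to put both commands in one call such as exec("cd js && node index.js").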

1 Answer

Please read my answer to your last question. You don't need Puppeteer for this. If you're using PHP, use it directly rather than Node. Working with PHP is going to be faster, easier to code and more maintainable in every respect:

<?php

use DiDom\Document;

require_once("vendor/autoload.php");

$url = "<Your URL>";
$html = file_get_contents($url);

if (!$html) {
    throw new Exception("Failed to fetch URL");
}

$document = new Document($html);

$data = [];
foreach ($document->find(".NEW") as $element) {
    $text = $element->text();
    $href = $element->getAttribute("href");
    $data[] = ["text" => $text, "href" => $href];
}

echo json_encode($data, JSON_PRETTY_PRINT) . "\n";
echo count($data) . "\n";

?>

I got this working on Ubuntu 22.04 using the following answers:


Since there are actual use cases for running Puppeteer from PHP and extracting the result (this probably isn't one of them), here's how to do it, for reference.

js/index.js:

const puppeteer = require("puppeteer"); // ^22.7.1

const url = "<Your URL>";

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.setJavaScriptEnabled(false);
  await page.setRequestInterception(true);
  page.on("request", req => {
    req.url() === url ? req.continue() : req.abort();
  });
  await page.goto(url, {waitUntil: "domcontentloaded"});
  const data = await page.$$eval(".NEW", els => els.map(el => ({
    text: el.textContent,
    href: el.href,
  })));
  console.log(JSON.stringify({data}));
})()
  .catch(err => console.log(JSON.stringify({error: err.message})))
  .finally(() => browser?.close());

index.php:

<?php

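// exec() returns the last line of output; the script prints one line of JSON.
$output = exec("node js/index");
$result = json_decode($output, true);

if (!is_array($result)) {
    // node failed to run, or printed something other than JSON
    echo "Unexpected output from node\n";
}
elseif (array_key_exists("error", $result)) {
    var_export($result["error"]);
}
else {
    var_export($result["data"]);
}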

?>
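
For reference, the contract between the two scripts is that the Node side always prints exactly one line of JSON, with either a data key or an error key, so the PHP side can decode it without guessing. Here is a rough sketch of that pattern on its own, without Puppeteer. `toJsonLine`, `fakeScrape` and `failingScrape` are hypothetical names for illustration, not part of any library.

```javascript
// Run an async task and always resolve to one line of JSON:
// {"data": ...} on success, {"error": "..."} on failure.
async function toJsonLine(task) {
  try {
    return JSON.stringify({ data: await task() });
  } catch (err) {
    return JSON.stringify({ error: err.message });
  }
}

// Hypothetical stand-ins for the scraping step.
const fakeScrape = async () => [
  { text: "Bulletin", href: "https://example.com/" },
];
const failingScrape = async () => {
  throw new Error("navigation failed");
};

const demo = (async () => {
  const ok = await toJsonLine(fakeScrape);
  const bad = await toJsonLine(failingScrape);
  console.log(ok); // one line for the success case
  console.log(bad); // one line for the failure case
  return { ok, bad };
})();
```

Keeping the output to a single line matters because PHP's exec returns only the last line of the command's output.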