
In the past I have used the function Import with the "Hyperlinks" element to collect hyperlinks and data from the internet.

For example:

Import["https://www.google.nl/search?q=Wolfram", "Hyperlinks"]

gives a list of all hyperlinks on the page. In the next step I can use these hyperlinks to scrape the web pages. This works fine.
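A minimal sketch of that next step, assuming the search page returns ordinary absolute links. `absoluteLinks` is a hypothetical helper name, and only the first three pages are fetched to keep the example quick:

```mathematica
(* Hypothetical helper: keep only absolute http(s) links; relative or
   "javascript:" links cannot be passed to Import directly *)
absoluteLinks[links_List] := Select[links, StringStartsQ[#, "http://" | "https://"] &]

links = Import["https://www.google.nl/search?q=Wolfram", "Hyperlinks"];

(* Import each page as plain text; Quiet hides messages from pages that fail to load *)
texts = Quiet@Import[#, "Plaintext"] & /@ Take[absoluteLinks[links], UpTo[3]];
```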

When I try the same with the search engine DuckDuckGo:

Import["https://duckduckgo.com/?q=%22Wolfram%22&t=h_&ia=web", "Hyperlinks"]

the output is:

{ "https://duckduckgo.com//?t=h_", "https://duckduckgo.com/html/?q=%22Wolfram%22" }

When I open DuckDuckGo in a browser and type a query, I get a full list of search results. My question is:

How can I collect this data using Wolfram Language?


1 Answer


Let's step back and do something simpler:

Import["https://duckduckgo.com/?q=%22Wolfram%22&t=h_&ia=web"]
(* "Ignore this box please.

DuckDuckGo

You are being redirected to the non-JavaScript site.
Click here if it doesn't happen automatically." *)

Oh, we are being redirected! To where?

Import[
 "https://duckduckgo.com/?q=%22Wolfram%22&t=h_&ia=web",
 "Hyperlinks"
 ]
(* {"https://duckduckgo.com//?t=h_", "https://duckduckgo.com/html/?q=%22Wolfram%22"} *)

Aha! There it is: "https://duckduckgo.com/html/?q=%22Wolfram%22"

So we had better use "https://duckduckgo.com/html/?q=" directly:

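(* Query DuckDuckGo's non-JavaScript HTML endpoint; the query is wrapped in
   URL-encoded quotes (%22) for an exact-phrase search *)
getLinks[query_String] := Union@Import[
   StringTemplate["https://duckduckgo.com/html/?q=%22``%22"]@
    URLEncode[query]
   , "Hyperlinks"
   ]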
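As an alternative to StringTemplate, URLBuild can assemble the same kind of URL and take care of the percent-encoding itself. This is a sketch, not the answer's method; `ddgURL` and `getLinksAlt` are hypothetical names, and the resulting URL should be equivalent to the hand-built one (space encoding may differ in detail):

```mathematica
(* Hypothetical helper: URLBuild percent-encodes the query value,
   including the surrounding quotes, so no manual %22 is needed *)
ddgURL[query_String] :=
  URLBuild["https://duckduckgo.com/html/", {"q" -> "\"" <> query <> "\""}]

(* Same idea as getLinks, but using the URL built above *)
getLinksAlt[query_String] := Union@Import[ddgURL[query], "Hyperlinks"]
```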


getLinks["Mathematica"]
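The raw output also contains DuckDuckGo's own navigation links. A possible post-processing step keeps only external result URLs. It assumes, without confirmation from the answer, that result links may be wrapped in a DuckDuckGo redirect of the form `.../l/?uddg=<encoded URL>`; `unwrapLink` and `externalLinks` are hypothetical names:

```mathematica
(* Hypothetical helper: if the link carries a "uddg" parameter (assumed redirect
   format), return the decoded target URL; otherwise return the link unchanged *)
unwrapLink[link_String] := With[
  {m = StringCases[link, RegularExpression["[?&]uddg=([^&]+)"] :> "$1"]},
  If[m === {}, link, URLDecode[First[m]]]
  ]

(* Keep absolute links that do not point back to duckduckgo.com *)
externalLinks[links_List] := DeleteDuplicates@Select[
   unwrapLink /@ links,
   StringStartsQ[#, "http://" | "https://"] && StringFreeQ[#, "duckduckgo.com"] &
   ]
```

Usage would then be `externalLinks[getLinks["Mathematica"]]`. If DuckDuckGo changes its markup, the regular expression is the part to adjust.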
  • Hi, this is a nice answer. At the bottom of the DuckDuckGo page you can go to the next page. Do you have a suggestion for how to do that with a Mathematica script? Commented Jul 12, 2018 at 9:00
  • @MichielvanMens No, I don't. The "Next" button is a "POST" form, so it gets complicated. In any case, you are moving the goalposts to a place that is no longer a Mathematica question. Let's stay on-topic. Commented Jul 12, 2018 at 17:21
  • @MichielvanMens In Mathematica 11.3, look at "WebDriver-Chrome", which supports "ClickElement". Commented Jul 17, 2018 at 17:50
  • @MichielvanMens I have posted a question and answer that may help you. Commented Jul 26, 2018 at 7:25
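Following up on the WebDriver suggestion: the 11.3 interface went through StartExternalSession["WebDriver-Chrome"], while Mathematica 12.0 and later expose the same browser control through StartWebSession and WebExecute. A rough sketch of paging with the latter, assuming version 12.0+ and a local Chrome installation. The XPath for the "Next" button is an assumption about DuckDuckGo's markup, not something confirmed in this thread:

```mathematica
(* Sketch assuming Mathematica 12.0+ and a local Chrome installation *)
session = StartWebSession["Chrome"];

WebExecute[session, "OpenPage" -> "https://duckduckgo.com/html/?q=%22Mathematica%22"];
firstPage = WebExecute[session, "PageHyperlinks"];

(* Clicking the button lets the browser submit the POST form for us;
   the XPath below is an assumed locator for the "Next" button *)
WebExecute[session, "ClickElement" -> "XPath" -> "//input[@value='Next']"];
Pause[2]; (* crude wait for the next results page to load *)
secondPage = WebExecute[session, "PageHyperlinks"];

DeleteObject[session]; (* close the browser session *)
```

Because the browser performs the form submission, this avoids reconstructing the POST request by hand.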
