Webpage does not load in python selenium with Chrome

Question

I am trying to crawl rugdoc.io. When I open the site manually, I first see a page that says "Checking your browser. Just wait a moment." After a second or so, the actual content is displayed. When I open it with selenium, I stay on the "wait a moment" page forever.
How can it work manually but not with selenium? How can rugdoc.io know that the page is being accessed automatically? Does selenium open Chrome with some extra options? My code:

import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

options = webdriver.ChromeOptions()
options.binary_location = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
# Selenium 4 deprecates executable_path and chrome_options in favor of Service and options
service = Service("/Users/lukas.denk/Downloads/chromedriver")
driver = webdriver.Chrome(service=service, options=options)

driver.get("https://rugdoc.io/")
time.sleep(10)

# still the "just wait a moment" webpage
loaded_webpage_should_be_here = driver.page_source

Chrome version: 100.0.4896.127 (arm64).
ChromeDriver version: 100.0.4896.60.
macOS: 12.3.1 (M1 Max).
selenium version: 4.1.3.
Python version: 3.8

EDIT: This may be related to selenium having problems with pages that redirect to another page (see, e.g., here). When I visit rugdoc.io, it seems to redirect me to https://rugdoc.io/?__cf_chl_tk=hkaULMeBxwgnTv0SgwmOY62fuDatlRLnupbDymXWWs0-1650454179-0-gaNycGzNBpE and then back to rugdoc.io.
However, the solution in the linked Stack Overflow answer proposes a driver.navigate().to() function, which does not exist in the Python selenium bindings.

Claudio Batista · Accepted Answer · 2022-04-20 11:28:23Z

Had to run your code to understand your problem :(

The issue you are running into is Cloudflare's DDoS protection. It blocks requests coming from a webdriver in order to protect the site against automated traffic and DDoS attacks :)

You can check this webdriver alternative that doesn't have those restrictions: https://github.com/ultrafunkamsterdam/undetected-chromedriver
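As a rough illustration of how that package might be used here, the sketch below opens the page with undetected-chromedriver and polls until the interstitial is gone. The helper names (`is_challenge_page`, `wait_for_content`, `fetch`) are hypothetical, and the marker string is simply the wording quoted in the question, so the real challenge page may need a different check. Whether this gets past the protection on any given site is not guaranteed.

```python
import time

# Assumption: the interstitial contains this text, as quoted in the question.
CHALLENGE_MARKERS = ("Checking your browser",)


def is_challenge_page(page_source):
    """Return True if the HTML still looks like the interstitial page."""
    return any(marker in page_source for marker in CHALLENGE_MARKERS)


def wait_for_content(driver, timeout=30.0, poll=1.0):
    """Poll driver.page_source until the challenge text disappears.

    Raises TimeoutError if the challenge is still shown after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        source = driver.page_source
        if not is_challenge_page(source):
            return source
        if time.monotonic() >= deadline:
            raise TimeoutError("still on the challenge page")
        time.sleep(poll)


def fetch(url, timeout=30.0):
    """Open `url` with undetected-chromedriver and return the loaded HTML."""
    # Imported lazily so the helpers above work without the package installed.
    import undetected_chromedriver as uc  # pip install undetected-chromedriver

    driver = uc.Chrome()
    try:
        driver.get(url)
        return wait_for_content(driver, timeout=timeout)
    finally:
        driver.quit()
```

Calling `fetch("https://rugdoc.io/")` would replace the fixed `time.sleep(10)` from the question with a poll that returns as soon as the challenge text is no longer present.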


2 Comments

Lukas Over a year ago

But do you know how it detects that I am crawling with webdriver? Since webdriver just opens the webpage once, there is no reason for the webdriver to behave differently from a real person. At least in theory.

Claudio Batista Over a year ago

There are several ways that this detection occurs. Check this blog post for some examples and how to (manually) bypass them: piprogramming.org/articles/…
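One widely known signal, given here as general background rather than as a summary of the linked post, is the `navigator.webdriver` property, which the WebDriver specification says is true in an automated browser. The sketch below reads that flag through a driver and adds the Chrome switch that is commonly used to hide it. The function names are hypothetical, and hiding this one flag is unlikely to be sufficient against Cloudflare on its own.

```python
def automation_flag(driver):
    """Return navigator.webdriver as page scripts would see it."""
    return driver.execute_script("return navigator.webdriver")


def hide_automation_flag(options):
    """Add the Chrome switch commonly used to stop exposing navigator.webdriver.

    `options` is expected to be a selenium ChromeOptions instance (anything
    with an add_argument method works for this sketch).
    """
    options.add_argument("--disable-blink-features=AutomationControlled")
    return options
```

With a plain selenium session, `automation_flag(driver)` would be expected to return `True`, which is one thing a page script can check.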
