I am trying to crawl rugdoc.io. When I open the site manually, I first see a page that says "Checking your browser. Just wait a moment." Then, after a second or so, the actual content is displayed.
When I open it with Selenium, I always stay stuck on the "wait a moment" page.
How can it work manually but not with Selenium? How can rugdoc.io know that the page is being accessed by an automated browser? Does Selenium start Chrome with some extra options?
My code:
import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

options = webdriver.ChromeOptions()
options.binary_location = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
# Selenium 4 style: driver path goes through Service, options through the `options` keyword
service = Service("/Users/lukas.denk/Downloads/chromedriver")
driver = webdriver.Chrome(service=service, options=options)
driver.get("https://rugdoc.io/")
time.sleep(10)
# still the "just wait a moment" webpage
loaded_webpage_should_be_here = driver.page_source
Chrome version: 100.0.4896.127 (arm64).
ChromeDriver version: 100.0.4896.60.
macOS: 12.3.1 (M1 Max).
Selenium version: 4.1.3.
Python version: 3.8
EDIT: It may have something to do with Selenium having problems with pages that redirect to another page (see, e.g., here). When I visit rugdoc.io, it seems to redirect me to https://rugdoc.io/?__cf_chl_tk=hkaULMeBxwgnTv0SgwmOY62fuDatlRLnupbDymXWWs0-1650454179-0-gaNycGzNBpE and then back to rugdoc.io.
However, the solution in the Stack Overflow link proposes using a driver.navigate().to() function, which does not exist in the Python Selenium bindings.
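To make the "still stuck" state easier to observe than with a fixed time.sleep(10), here is a minimal sketch of a polling check. It only detects the waiting page, it does not get past it. The helper names (looks_like_challenge, wait_past_challenge) are made up for illustration; the two signals it checks are the __cf_chl_tk query parameter from the redirect URL above and the "Checking your browser" text, both taken from what I observed and possibly not the only markers the site uses.

```python
import time
from urllib.parse import urlparse, parse_qs

# Both markers are assumptions based on what the question describes.
CHALLENGE_PARAM = "__cf_chl_tk"
CHALLENGE_TEXT = "Checking your browser"


def looks_like_challenge(url, page_source):
    """Return True if the URL or the page source matches the waiting page."""
    has_token = CHALLENGE_PARAM in parse_qs(urlparse(url).query)
    return has_token or CHALLENGE_TEXT in page_source


def wait_past_challenge(get_state, timeout=10.0, poll=0.5):
    """Poll until the page no longer looks like the waiting page.

    get_state is any zero-argument callable returning (url, page_source).
    Returns True if the waiting page went away within `timeout` seconds,
    False if it was still showing when the timeout expired.
    """
    deadline = time.monotonic() + timeout
    while looks_like_challenge(*get_state()):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    return True
```

With the driver from my code this would be called as wait_past_challenge(lambda: (driver.current_url, driver.page_source)). In my case I would expect it to return False, since the page never leaves the waiting state.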