
I've been trying to capture data from the following link:

I'm able to identify several frames:

from time import sleep
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

SCROLL_PAUSE_TIME = 2
CYCLES = 2

browser = webdriver.Firefox(firefox_options=opt)  # opt (Firefox options) is defined earlier
browser.get(pge)  # pge holds the page URL
sleep(1)
comment_button = browser.find_elements_by_class_name('Ob2kfd')
sleep(1)
comment_button[0].click()
sleep(1)
html = browser.find_element_by_tag_name('html')
frames = browser.find_elements_by_tag_name('iframe')

This finds the frames:

[<selenium.webdriver.remote.webelement.WebElement (session="bbe62090fb83ba8774d855278b17b007", element="0.46172414237768167-3")>,
 <selenium.webdriver.remote.webelement.WebElement (session="bbe62090fb83ba8774d855278b17b007", element="0.46172414237768167-4")>,
 <selenium.webdriver.remote.webelement.WebElement (session="bbe62090fb83ba8774d855278b17b007", element="0.46172414237768167-5")>,
 <selenium.webdriver.remote.webelement.WebElement (session="bbe62090fb83ba8774d855278b17b007", element="0.46172414237768167-6")>,
 <selenium.webdriver.remote.webelement.WebElement (session="bbe62090fb83ba8774d855278b17b007", element="0.46172414237768167-7")>,
 <selenium.webdriver.remote.webelement.WebElement (session="bbe62090fb83ba8774d855278b17b007", element="0.46172414237768167-8")>]

Now the part that does not work: I'm unable to switch to the frame that has the reviews. I've tried many approaches:

browser.switch_to.frame(browser.find_element_by_tag_name("iframe"))

WebDriverWait(browser,10).until(EC.frame_to_be_available_and_switch_to_it((browser.find_element_by_tag_name("iframe"))))

WebDriverWait(browser, 20).until(EC.element_to_be_clickable((browser.find_element_by_tag_name("iframe"))))

browser.switch_to.default_content()

browser.switch_to.parent_frame()

browser.switch_to.frame(frames[0])

browser.switch_to.frame(frames[1])

#etc
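
For reference, my understanding of the pattern these calls are supposed to follow is roughly the sketch below; the CSS selector is only a placeholder, not one I have verified on this page:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(browser, 10)
# Passing a locator tuple lets Selenium re-find the iframe once it is available
# and switch into it in one step.
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe.reviews")))  # placeholder selector
# ...interact with elements inside the frame here...
browser.switch_to.default_content()  # switch back to the top-level document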

I also tried to find frame IDs using the browser's inspector and switch by ID, but I'm new to this:

browser.switch_to.frame("gci_91f30755d6a6b787dcc2a4062e6e9824.js")

Basically I want to scroll down through the reviews, but it seems I'm stuck in the wrong frame:

sleep(2)
for _ in range(CYCLES):
    html.send_keys(Keys.DOWN)   # this scrolls the main page, not the reviews pane
    sleep(SCROLL_PAUSE_TIME)

but nothing works.
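
In case it clarifies what I'm after, this is roughly the kind of scrolling I'd like to achieve, assuming the reviews pane can be located by some selector; 'div.review-pane' here is just a placeholder, not a class name I have verified on the page:

# Sketch: scroll a specific scrollable container rather than the whole page.
pane = browser.find_element_by_css_selector('div.review-pane')  # placeholder selector
for _ in range(CYCLES):
    browser.execute_script('arguments[0].scrollTop = arguments[0].scrollHeight;', pane)
    sleep(SCROLL_PAUSE_TIME)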

Please note this is not a duplicate. I appreciate that there are a few other posts with similar problems, but I've really tried every approach mentioned and nothing seems to work. If someone can help it would really be much appreciated. You can also try it yourself from the page link; it does not seem to work.

  • You mentioned being unable to switch to the frame that has the reviews. Are you trying to scrape the review texts, e.g. "Great service, in English too!", "Very good experience, personal are super professional and kind.", etc.? Commented Apr 17, 2019 at 8:55
  • Yes, I want to scrape every comment. Commented Apr 17, 2019 at 9:02
  • Update the question with what exactly you mean by ...every comments... Commented Apr 17, 2019 at 9:03
  • I can't seem to be able to edit the question, unfortunately... Commented Apr 17, 2019 at 9:07
  • If I can be shown how to get the scrolling to happen in the right frame, as per my question, then I will be able to capture the comments. My problem is not being able to switch frames. Commented Apr 17, 2019 at 9:09

1 Answer


The element is not inside any iframe. Try using WebDriverWait with a CSS selector. I tried this with Chrome and it works fine, printing the text "127 Google reviews" that you are looking for.

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.google.com/search?rlz=1C1GCEU_en__835__835&ei=gfW0XKjNFeXjkgW8g6PYDA&q=huawei%20stores%20italy&oq=huawei+stores+italy&gs_l=psy-ab.3..0i22i30l3.4596.6067..6522...1.0..0.99.530.7......0....1..gws-wiz.......0i71j0i20i263j0i67j0j33i160.QYVpoia0BL4&npsic=0&rflfq=1&rlha=0&rllag=45499827,9211657,4837&tbm=lcl&rldimm=17275164572016510107&lqi=ChNodWF3ZWkgc3RvcmVzIGl0YWx5IgOIAQFaDwoNaHVhd2VpIHN0b3Jlcw&ved=2ahUKEwiKrMm8g9PhAhXQ4KQKHY9KDc8QvS4wAXoECAoQHQ&rldoc=1&tbs=lrf:!2m1!1e3!2m1!1e16!3sIAE,lf:1,lf_ui:4#rlfi=hd:;si:17275164572016510107,l,ChNodWF3ZWkgc3RvcmVzIGl0YWx5IgOIAQFaDwoNaHVhd2VpIHN0b3Jlcw;mv:!1m2!1d45.5258666!2d9.274078399999999!2m2!1d45.443338999999995!2d9.1152195;tbs:lrf:!2m1!1e3!2m1!1e16!3sIAE,lf:1,lf_ui:4")

# Wait for the "<N> Google reviews" link text to become clickable, then print it.
element = WebDriverWait(driver, 20).until(expected_conditions.element_to_be_clickable((By.CSS_SELECTOR, 'span.fl span a span')))
print(element.text)

# Collect the review snippets that are already rendered on the page.
elements = WebDriverWait(driver, 20).until(expected_conditions.presence_of_all_elements_located((By.CSS_SELECTOR, 'div.Jtu6Td span')))

for ele in elements:
    print(ele.text)

The first print outputs:

127 Google reviews

The second loop prints three review comments. To see more comments you have to click on "More Google reviews"; I think you can do that part yourself.

Very good experience, personal are super professional and kind. I became now huawei client :)
The team is  really kind and extremely prepared
Good store everything you want from Huawei.

EDITED:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException


driver=webdriver.Chrome()
driver.get("https://www.google.com/search?rlz=1C1GCEU_en__835__835&ei=gfW0XKjNFeXjkgW8g6PYDA&q=huawei%20stores%20italy&oq=huawei+stores+italy&gs_l=psy-ab.3..0i22i30l3.4596.6067..6522...1.0..0.99.530.7......0....1..gws-wiz.......0i71j0i20i263j0i67j0j33i160.QYVpoia0BL4&npsic=0&rflfq=1&rlha=0&rllag=45499827,9211657,4837&tbm=lcl&rldimm=17275164572016510107&lqi=ChNodWF3ZWkgc3RvcmVzIGl0YWx5IgOIAQFaDwoNaHVhd2VpIHN0b3Jlcw&ved=2ahUKEwiKrMm8g9PhAhXQ4KQKHY9KDc8QvS4wAXoECAoQHQ&rldoc=1&tbs=lrf:!2m1!1e3!2m1!1e16!3sIAE,lf:1,lf_ui:4#rlfi=hd:;si:17275164572016510107,l,ChNodWF3ZWkgc3RvcmVzIGl0YWx5IgOIAQFaDwoNaHVhd2VpIHN0b3Jlcw;mv:!1m2!1d45.5258666!2d9.274078399999999!2m2!1d45.443338999999995!2d9.1152195;tbs:lrf:!2m1!1e3!2m1!1e16!3sIAE,lf:1,lf_ui:4")
# Wait for the "<N> Google reviews" link text, then derive the total review count from it.
element = WebDriverWait(driver, 20).until(expected_conditions.element_to_be_clickable((By.CSS_SELECTOR, 'span.fl span a span')))
print(element.text)
no_of_review = int(element.text.split()[0])  # e.g. "127 Google reviews" -> 127
print(no_of_review)

# Click "More Google reviews" using JavaScript (a plain .click() may be intercepted).
elemore = WebDriverWait(driver, 20).until(expected_conditions.element_to_be_clickable((By.XPATH, '//span[text()="More Google reviews"]')))
driver.execute_script("arguments[0].click();", elemore)

# Initial batch of reviews shown in the reviews panel.
all_reviews = WebDriverWait(driver, 3).until(expected_conditions.presence_of_all_elements_located((By.CSS_SELECTOR, 'div.gws-localreviews__google-review')))

# Keep scrolling until as many reviews are loaded as the count reported above.
while len(all_reviews) < no_of_review:
    # Scroll the last loaded review into view to trigger lazy loading of the next batch.
    driver.execute_script('arguments[0].scrollIntoView(true);', all_reviews[-1])
    # Wait for the loading spinner to disappear before re-collecting the reviews.
    WebDriverWait(driver, 1).until_not(expected_conditions.presence_of_element_located((By.CSS_SELECTOR, 'div[class$="activityIndicator"]')))
    all_reviews = driver.find_elements_by_css_selector('div.gws-localreviews__google-review')

reviews = []
for review in all_reviews:
    try:
        # Longer reviews expose the full text in a 'review-full-text' span.
        full_text_element = review.find_element_by_css_selector('span.review-full-text')
        reviews.append(full_text_element.get_attribute('textContent'))
    except NoSuchElementException:
        # Short reviews only have the regular snippet span.
        full_text_element = review.find_element_by_css_selector('span[class^="r-"]')
        reviews.append(full_text_element.get_attribute('textContent'))

print(reviews)

2 Comments

Hi, thank you for your help. I try to click on the button but it does not work: comment_button = browser.find_elements_by_class_name('xwP61c'); comment_button[0].click(), but nothing happens.
Try the following code to click on the "More" button: elemore = WebDriverWait(driver,20).until(expected_conditions.element_to_be_clickable((By.XPATH,'//span[text()="More Google reviews"]'))); driver.execute_script("arguments[0].click();", elemore)
