Problem extracting text from XPath using Selenium in Python

Shared on July 17, 2022 | Tags: python, selenium, selenium-webdriver, xpath | Q&A
Posted: 2022-01-27 00:38:31
Question:

I'm using Selenium to extract data from the following page.

Page URL: www2.miami-dadeclerk.com/cef/CitationSearch.aspx

Click Folio and search for folio number 0131230371470, then click the first result.

I used the following code to extract some of the fields:

import pandas as pd
from selenium.webdriver.common.by import By

# driver is an already-open WebDriver showing the citation detail page
templist = []

# Both of these read the whole lblCitationHeader span, so they return the same text
status = driver.find_element(By.XPATH, './/*[@id="lblCitationHeader"]').text
total_due = driver.find_element(By.XPATH, './/*[@id="lblCitationHeader"]').text
issue_dept = driver.find_element(By.XPATH, './/*[@id="form1"]/div[4]/div[9]/div/div/div[2]/table/tbody/tr[5]/td[2]').text
lien_placed = driver.find_element(By.XPATH, './/*[@id="lblLienPlaced"]').text

Table_dict = {
    'Status': status,
    'Total Due': total_due,
    'Issuing Department': issue_dept,
    'Lien_Placed': lien_placed
    }

templist.append(Table_dict)
df = pd.DataFrame(templist)

The result is:

    Status  Total Due   Issuing Department  Lien_Placed
0   Citation No.: 2010 - S001916 Issue Date: 1/ ... Citation No.: 2010 - S001916 Issue Date: 1/ ... 05 ANIMAL SERVICES DEPARTMENT (305) 629-7387    

The problem is that the entire text of lblCitationHeader ends up under both Status and Total Due.

To get the individual values, I copied their XPaths:

Status: //*[@id="lblCitationHeader"]/text()[3]
Total Due: //*[@id="lblCitationHeader"]/text()[4]

When I use one of them in my code:

status = driver.find_element(By.XPATH, './/*[@id="lblCitationHeader"]/text()[3]').text

I get the following error:

Message: invalid selector: Unable to locate an element with the xpath expression .//*[@id="lblCitationHeader"]/text()[3]"] because of the following error:
SyntaxError: Failed to execute 'evaluate' on 'Document': The string './/*[@id="lblCitationHeader"]/text()[3]"]' is not a valid XPath expression.
  (Session info: chrome=96.0.4664.110)

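I understand that XPath is used to locate elements, not text, but I can't work out how to reach the part where the text is stored and return it.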
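Because find_element() only ever returns elements, an expression ending in text()[3] selects a text node and is rejected there however it is written. One workaround, sketched here rather than taken from the page or the answer, is to let the browser evaluate the XPath itself with document.evaluate() and XPathResult.STRING_TYPE, which returns the string value of the first matching node, text nodes included. The helper name xpath_text is hypothetical; driver is assumed to be an ordinary Selenium WebDriver.

```python
def xpath_text(driver, xpath):
    """Evaluate `xpath` in the browser and return the string value of its first match.

    find_element() only accepts expressions that select elements; document.evaluate()
    with XPathResult.STRING_TYPE also accepts ones that select text nodes, such as
    '//*[@id="lblCitationHeader"]/text()[3]'. Returns "" when nothing matches.
    """
    script = (
        "return document.evaluate(arguments[0], document, null, "
        "XPathResult.STRING_TYPE, null).stringValue;"
    )
    # execute_script passes `xpath` to the JavaScript snippet as arguments[0]
    return driver.execute_script(script, xpath).strip()
```

Usage would look like status = xpath_text(driver, '//*[@id="lblCitationHeader"]/text()[3]') and total_due = xpath_text(driver, '//*[@id="lblCitationHeader"]/text()[4]'). Text-node positions copied from DevTools also count whitespace-only text nodes, so they may shift if the page markup changes.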


The values I want to extract are:

Status, Total Due, Issuing Dept, Lien Placed


Answer 1:

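On the current page only the STATUS, TOTAL DUE and ISSUING DEPT fields have a value. Status and Total Due are not separate elements but text nodes inside span#lblCitationHeader, so find_element() cannot return them; instead they are read from the span's childNodes with execute_script(). To extract the values, induce WebDriverWait for visibility_of_element_located(), and you can use the following Locator Strategies: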

Code block:

driver.get("https://www2.miami-dadeclerk.com/cef/CitationSearch.aspx")
# Open the Folio tab, search for the folio number and open the first result
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.LINK_TEXT, "Folio"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input#txtFolioNumber"))).send_keys("0131230371470")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input#btnFolioSearch"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "table tbody>tr>td>a>span"))).click()
# Status and Total Due are bare text nodes inside span#lblCitationHeader, so read them with JavaScript
header = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span#lblCitationHeader")))
status = driver.execute_script('return arguments[0].childNodes[5].textContent;', header).strip()
total_due = driver.execute_script('return arguments[0].lastChild.textContent;', header).strip()
# Issuing Department is a real element: the span in the first td following the "Issuing Department:" label
issue_dept = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//strong[contains(., 'Issuing Department:')]//following::td[1]/span"))).text
print(f"{status}--{total_due}--{issue_dept}")

Console output:

* DEPARTMENT CLOSED *--$0.00--05 ANIMAL SERVICES DEPARTMENT (305) 629-7387

Note: you have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
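The two execute_script() calls in the answer work because the header values are text nodes sitting between child elements of span#lblCitationHeader, and childNodes, unlike find_element(), includes text nodes (whitespace-only ones too). That same fact means hard-coded positions such as childNodes[5] may break if the markup changes. The offline sketch below uses only the standard library's html.parser to list an element's direct child nodes the way childNodes does, then pairs each <strong> label with the text that follows it. The sample markup, its values, and the assumption that labels are wrapped in <strong> are all hypothetical, and child_nodes() and labeled_values() are illustrative helpers, not part of Selenium.

```python
from html.parser import HTMLParser

# Void elements such as <br> never get an end tag, so they must not change the depth.
VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


class ChildNodeCollector(HTMLParser):
    """Record the direct children of the element whose id is target_id, in order.

    Text children become ["#text", text]; element children become [tag, inner_text],
    roughly mirroring the DOM's element.childNodes (whitespace text nodes included).
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # 0 = outside target, 1 = directly inside it, >1 = inside a child
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        if self.depth == 0:
            if dict(attrs).get("id") == self.target_id:
                self.depth = 1
            return
        if self.depth == 1:
            self.nodes.append([tag, ""])  # a new direct child element
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 1:
            self.nodes.append(["#text", data])  # a direct text node
        elif self.depth > 1:
            self.nodes[-1][1] += data  # text inside the currently open child element


def child_nodes(html, target_id):
    """Return the direct child nodes of the element with id target_id as tuples."""
    parser = ChildNodeCollector(target_id)
    parser.feed(html)
    parser.close()
    return [tuple(node) for node in parser.nodes]


def labeled_values(nodes):
    """Pair each <strong> label with the next non-blank text node after it."""
    values, label = {}, None
    for name, text in nodes:
        if name == "strong":
            label = text.strip().rstrip(":")
        elif name == "#text" and label and text.strip():
            values[label] = text.strip()
            label = None
    return values


# Hypothetical markup for illustration only; the real page's structure may differ.
SAMPLE = ('<span id="lblCitationHeader"><strong>Status:</strong> OPEN<br>'
          '<strong>Total Due:</strong> $12.50</span>')

nodes = child_nodes(SAMPLE, "lblCitationHeader")
print(nodes)
# [('strong', 'Status:'), ('#text', ' OPEN'), ('br', ''), ('strong', 'Total Due:'), ('#text', ' $12.50')]
print(labeled_values(nodes))
# {'Status': 'OPEN', 'Total Due': '$12.50'}
```

With Selenium, the span's markup could be fed in via header.get_attribute("outerHTML"). Looking values up by label rather than by index is a design choice meant to survive extra whitespace nodes, but it only helps if the real page really marks labels this way.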