Ghost.py: returning a list of forms

Question:

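I just installed Ghost.py to scrape some websites that require JavaScript. Is there any way to get a reusable list of the forms on the current page, the way the mechanize module does with mechanize.Browser().forms()? Or, if not, can I pass the page (with all the JavaScript already loaded) to the mechanize library and have it fill in and submit the forms?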


http://stackoverflow.com/questions/15513699/how-can-i-extract-the-list-of-urls-obtained-during-a-html-page-render-in-python – 2013-05-27 07:31:32

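Selenium can do this for you, if you don't mind a browser window popping up on your screen. It can also run headless, but that is trickier to set up. A simple approach: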

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()

url = "http://www.w3schools.com/html/html_forms.asp"
driver.get(url)
# get a list of the page's forms as Selenium WebElements
# (webdriver API ref: http://selenium-python.readthedocs.org/en/latest/api.html)
# find_elements(By.XPATH, ...) replaces find_elements_by_xpath, which newer Selenium releases removed
forms = driver.find_elements(By.XPATH, ".//form")
for i, form in enumerate(forms):
    print(i, form.text)

# the last form, index number 5, has input tags of type "text" and "submit"
"""
<form name="input0" target="_blank" action="html_form_action.asp" method="get">
Username:
<input type="text" name="user" size="20">
<input type="submit" value="Submit">
</form>
"""

# get the input WebElements from this form WebElement
inputs = forms[5].find_elements(By.XPATH, ".//input")
# write text to the text input, then submit the form
inputs[0].send_keys('hihi frds!')
inputs[1].submit()
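
If what you want is a reusable list of plain data (closer to what mechanize.Browser().forms() gives you) rather than live WebElements, one option is to take the rendered HTML and parse it yourself. Here is a minimal sketch using only the standard library. It assumes you already have the rendered markup as a string (with Selenium that would be driver.page_source). FormLister and list_forms are names made up for this illustration, and the sketch only looks at <input> tags, not <select> or <textarea>.

```python
from html.parser import HTMLParser


class FormLister(HTMLParser):
    """Hypothetical helper: collect every <form> and its <input> fields."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one dict per <form>, in document order
        self._current = None   # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "name": attrs.get("name"),
                "action": attrs.get("action"),
                # HTML forms default to GET when no method is given
                "method": (attrs.get("method") or "get").lower(),
                "inputs": [],
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            # inputs default to type "text" when the attribute is missing
            self._current["inputs"].append(
                {"type": attrs.get("type", "text"), "name": attrs.get("name")}
            )

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def list_forms(html):
    """Return a list of dicts describing the forms found in an HTML string."""
    parser = FormLister()
    parser.feed(html)
    parser.close()
    return parser.forms


# Stand-in for driver.page_source; the markup is the form quoted above.
html = """
<form name="input0" target="_blank" action="html_form_action.asp" method="get">
Username: <input type="text" name="user" size="20">
<input type="submit" value="Submit">
</form>
"""

for i, form in enumerate(list_forms(html)):
    print(i, form["name"], form["action"], [f["type"] for f in form["inputs"]])
```

The resulting dicts are detached from the browser, so they can be stored or compared freely, but filling in and submitting a form still has to go through the driver as shown above.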

Nice post... but couldn't I do this with the PhantomJS driver instead of the Firefox webdriver, by changing just one line (driver = webdriver.PhantomJS)?