With a for statement I get the expected result. Why can't I get the expected result with a while statement?

Problem description:

I want to check, in a web browser, how code from 'Web Scraping with Python' behaves. With a for statement I get the expected result. With a while statement, however, I do not. Why?

Scraping by following Wikipedia URLs

Environment

· Python 3.6.0

· bottle 0.13-dev

· mod_wsgi-4.5.15

Apache error log

No output

The browser shows ERR_EMPTY_RESPONSE.

The scraping never finishes processing.

index.py

from urllib.request import urlopen 
from bs4 import BeautifulSoup 
from bottle import route, view 
import datetime 
import random 
import re 

@route('/') 
@view("index_template") 

def index(): 
    random.seed(datetime.datetime.now()) 
    html = urlopen("https://en.wikipedia.org/wiki/Kevin_Bacon") 
    internalLinks=[] 
    links = getLinks("/wiki/Kevin_Bacon") 
    while len(links) > 0: 
     newArticle = links[random.randint(0, len(links)-1)].attrs["href"] 
     internalLinks.append(newArticle) 
     links = getLinks(newArticle) 
    return dict(internalLinks=internalLinks) 

def getLinks(articleUrl): 
    html = urlopen("http://en.wikipedia.org"+articleUrl) 
    bsObj = BeautifulSoup(html, "html.parser") 
    return bsObj.find("div", {"id":"bodyContent"}).findAll("a", href=re.compile("^(/wiki/)((?!:).)*$"))

With a for statement, I get the expected result.

Result: web browser output

['/wiki/Michael_C._Hall', '/wiki/Elizabeth_Perkins', 
'/wiki/Paul_Erd%C5%91s', '/wiki/Geoffrey_Rush', 
'/wiki/Virtual_International_Authority_File']

index.py

from urllib.request import urlopen 
from bs4 import BeautifulSoup 
from bottle import route, view 
import datetime 
import random 
import re 
@route('/') 
@view("index_template") 
def index(): 
    random.seed(datetime.datetime.now()) 
    html = urlopen("https://en.wikipedia.org/wiki/Kevin_Bacon") 
    internalLinks=[] 
    links = getLinks("/wiki/Kevin_Bacon") 
    for i in range(5): 
     newArticle = links[random.randint(0, len(links)-1)].attrs["href"] 
     internalLinks.append(newArticle) 
    return dict(internalLinks=internalLinks) 
def getLinks(articleUrl): 
    html = urlopen("http://en.wikipedia.org"+articleUrl) 
    bsObj = BeautifulSoup(html, "html.parser") 
    return bsObj.find("div", {"id":"bodyContent"}).findAll("a", href=re.compile("^(/wiki/)((?!:).)*$"))

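Have you tried adding a breakpoint and stepping through your code to see how far it gets? Or at least adding some 'print' statements to see what it is extracting? – Soviut

Also, please remove all code that is unrelated to your problem: the wsgi code, the views, and so on. They make it hard to figure out what to focus on. – Soviut

I removed the wsgi code. – re1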

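Answer

The length of your links list never reaches 0, so the while loop keeps running until the connection times out.

Your for loop works because it iterates over a range, so it exits once it reaches the end of that range.

You never explained why you want to use a while loop, but if you want to exit after a certain number of iterations, you need to use a counter.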

counter = 0

# this loop exits after 5 iterations
while counter < 5:
    print(counter)  # do something

    counter += 1  # increment the counter after each iteration

The above will print (one number per line):

0 1 2 3 4
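
Applied to the scraper in the question, the counter idea might look like the sketch below. It is a minimal illustration, not the original code: `FAKE_WIKI`, `get_links`, `random_walk`, and `max_hops` are hypothetical names, and an in-memory dictionary stands in for the real `getLinks` network calls so the example runs offline.

```python
import random

# Hypothetical in-memory link graph standing in for Wikipedia pages,
# so the sketch runs without network access.
FAKE_WIKI = {
    "/wiki/A": ["/wiki/B", "/wiki/C"],
    "/wiki/B": ["/wiki/A", "/wiki/C"],
    "/wiki/C": ["/wiki/A", "/wiki/B"],
}


def get_links(article_url):
    """Hypothetical stand-in for getLinks(): return the hrefs found on a page."""
    return FAKE_WIKI.get(article_url, [])


def random_walk(start_url, max_hops=5):
    """Follow random links, stopping after max_hops or at a page with no links."""
    visited = []
    links = get_links(start_url)
    hops = 0
    # Both conditions matter: the counter bounds the walk, and the
    # emptiness check avoids choosing from an empty list.
    while links and hops < max_hops:
        new_article = random.choice(links)
        visited.append(new_article)
        links = get_links(new_article)
        hops += 1
    return visited


print(random_walk("/wiki/A"))
```

With the real `getLinks`, the same loop condition would presumably stop the walk after `max_hops` requests instead of running until the connection times out.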

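I had mistakenly assumed that following the links would eventually bring the length of the linked list down to 0. – re1

Just to be clear, you don't have a linked list, you have a list of links ;) – Soviut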
