Scrapy: how to yield from multiple sources

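Question:

A few days ago I asked this question: "scrapy getting values from multiple sites", and I learned how to pass a value from WEBSITE1 to WEBSITE2. That lets me yield information from both sites, but it does not scale when I have 10 different websites.

I could keep passing values from function to function, but that seems silly. A more efficient approach would be to receive the information in the parse function and yield it from there. Here is pseudocode for what I want to achieve.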

import scrapy

class GotoSpider(scrapy.Spider):
    name = 'goto'
    allowed_domains = ['first.com', 'second.com', 'third.com']
    start_urls = ['http://first.com/']

    def parse(self, response):
        name = response.xpath(...)
        # Desired behaviour (pseudocode): a Request object is not the
        # callback's return value, so this does not work as written.
        price1 = scrapy.Request('http://second.com/', callback=self.parse_check)
        price2 = scrapy.Request('http://third.com/', callback=self.parse_check2)
        yield (name, price1, price2)

    def parse_check(self, response):
        price = response.xpath(...)
        return price

    def parse_check2(self, response):
        price = response.xpath(...)
        return price

Answer:

Take a look at scrapy-inline-requests, which may be what you are looking for. Your example would become something like:

import scrapy
from inline_requests import inline_requests

class GotoSpider(scrapy.Spider):
    name = 'goto'
    allowed_domains = ['first.com', 'second.com', 'third.com']
    start_urls = ['http://first.com/']

    @inline_requests
    def parse(self, response):
        name = response.xpath(...)

        # Each yielded Request hands its response back to this same callback.
        response1 = yield scrapy.Request('http://second.com/')
        price1 = response1.xpath(...)
        response2 = yield scrapy.Request('http://third.com/')
        price2 = response2.xpath(...)

        # dict() needs keyword arguments to build the item.
        yield dict(name=name, price1=price1, price2=price2)
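To make the `response1 = yield scrapy.Request(...)` pattern less mysterious, here is a rough, pure-Python sketch of the general idea: a driver resumes a generator callback by sending each response back into it. This is not the implementation of scrapy-inline-requests (which works asynchronously through Scrapy's engine); `FakeRequest`, `FakeResponse`, `PAGES`, `fake_download` and `run_inline` are hypothetical names invented for illustration, and the page contents are made up.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generator, List


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy.Request."""
    url: str


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a downloaded response."""
    url: str
    text: str


# Made-up canned pages so the sketch runs without network access.
PAGES: Dict[str, str] = {
    "http://second.com/": "10.50",
    "http://third.com/": "9.99",
}


def fake_download(request: FakeRequest) -> FakeResponse:
    """Pretend to fetch a page by looking it up in PAGES."""
    return FakeResponse(url=request.url, text=PAGES[request.url])


def run_inline(callback: Callable[..., Generator], response: FakeResponse) -> List[dict]:
    """Drive a generator callback synchronously: answer every yielded
    request with its response, and collect anything else as an item."""
    items = []
    gen = callback(response)
    try:
        value = next(gen)
        while True:
            if isinstance(value, FakeRequest):
                # Resume the callback; the response becomes the value
                # of the `yield` expression inside it.
                value = gen.send(fake_download(value))
            else:
                items.append(value)
                value = next(gen)
    except StopIteration:
        pass
    return items


def parse(response: FakeResponse):
    """Callback written in the same style as the spider above."""
    name = response.text
    response1 = yield FakeRequest("http://second.com/")
    response2 = yield FakeRequest("http://third.com/")
    yield dict(name=name, price1=response1.text, price2=response2.text)


items = run_inline(parse, FakeResponse("http://first.com/", "widget"))
print(items)
```

The point of the sketch is only that the callback stays a single function while the driver does the shuttling of responses; in a real spider the decorator takes that role and the downloads are not blocking.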

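Hmm, that looks easy. But why isn't it built in? I think it is a common idea to scrape several sites and compare prices on the fly. – daniel

If I remember correctly, there was some discussion on GitHub about integrating it into Scrapy, but I don't remember the details. That is more a question for the developers.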
