Scrapy/Python/XPath - How to extract data from within data?

Question:

I'm new to Scrapy and have only just started looking into XPath.

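I'm trying to extract the title and link from the HTML list items inside a div. The code below is how I thought I would do it (select the ul inside the div by its ID, then loop through the list items):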

def parse(self, response):
    for t in response.xpath('//*[@id="categories"]/ul'):
        for x in t.xpath('//li'):
            item = TgmItem()
            item['title'] = x.xpath('a/text()').extract()
            item['link'] = x.xpath('a/@href').extract()
            yield item

But I get the same results as with this attempt:

def parse(self, response):
    for x in response.xpath('//li'):
        item = TgmItem()
        item['title'] = x.xpath('a/text()').extract()
        item['link'] = x.xpath('a/@href').extract()
        yield item

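In both cases the exported CSV file contains the li data from the whole page source, top to bottom, rather than only the items inside the div...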

I'm no expert and have already tried a few things; if anyone could give me some pointers, it would be much appreciated.

Answer:

You need to start the XPath expression used in the inner loop with a dot:

for t in response.xpath('//*[@id="categories"]/ul'): 
    for x in t.xpath('.//li'):

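This makes it search within the scope of the current element rather than the whole page. An expression that starts with // is absolute: it always matches from the document root, even when it is called on a nested selector, which is why both attempts in the question return every li on the page.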

See "Working with relative XPaths" in the Scrapy selectors documentation for a fuller explanation.
