没有模块命名记录,而试图记录scrapy抓取
问题描述:
当我试图记录使用Frontera和scrapy抓取时它给出了一个错误,说没有模块命名记录,但是,我无法理解为什么它会出现我遵循了从official link录制的步骤。 请帮助,并感谢你的一样。 回溯是:没有模块命名记录,而试图记录scrapy抓取
2017-07-04 15:38:57 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: alexa)
2017-07-04 15:38:57 [scrapy.utils.log] INFO: Overridden settings: {'AUTOTHROTTLE_MAX_DELAY': 3.0, 'DOWNLOAD_MAXSIZE': 10485760, 'SPIDER_MODULES': ['alexa.spiders'], 'CONCURRENT_REQUESTS_PER_DOMAIN': 10, 'CONCURRENT_REQUESTS': 256, 'RANDOMIZE_DOWNLOAD_DELAY': False, 'RETRY_ENABLED': False, 'DUPEFILTER_CLASS': 'alexa.bloom_filter1.BLOOMDupeFilter', 'AUTOTHROTTLE_START_DELAY': 0.25, 'REACTOR_THREADPOOL_MAXSIZE': 20, 'BOT_NAME': 'alexa', 'AJAXCRAWL_ENABLED': True, 'COOKIES_ENABLED': False, 'SCHEDULER': 'frontera.contrib.scrapy.schedulers.frontier.FronteraScheduler', 'DOWNLOAD_TIMEOUT': 120, 'AUTOTHROTTLE_ENABLED': True, 'NEWSPIDER_MODULE': 'alexa.spiders'}
2017-07-04 15:38:57 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.throttle.AutoThrottle']
Unhandled error in Deferred:
2017-07-04 15:38:57 [twisted] CRITICAL: Unhandled error in Deferred:
2017-07-04 15:38:57 [twisted] CRITICAL:
Traceback (most recent call last):
File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
result = g.send(result)
File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/crawler.py", line 95, in crawl
six.reraise(*exc_info)
File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/crawler.py", line 77, in crawl
self.engine = self._create_engine()
File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/crawler.py", line 102, in _create_engine
return ExecutionEngine(self, lambda _: self.stop())
File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/core/engine.py", line 69, in __init__
self.downloader = downloader_cls(crawler)
File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/core/downloader/__init__.py", line 88, in __init__
self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/middleware.py", line 58, in from_crawler
return cls.from_settings(crawler.settings, crawler)
File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/middleware.py", line 34, in from_settings
mwcls = load_object(clspath)
File "/root/scrapy/scrapy/local/lib/python2.7/site-packages/scrapy/utils/misc.py", line 44, in load_object
mod = import_module(module)
File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
__import__(name)
ImportError: No module named recording
答
我跟随official doc有同样的问题。我在scrapinghub blogpost后发现了一个solutiobn。
问题是,官方文档已弃用。它使用不存在了中间件:
SPIDER_MIDDLEWARES.update({
'frontera.contrib.scrapy.middlewares.recording.CrawlRecorderSpiderMiddleware': 1000,
})
DOWNLOADER_MIDDLEWARES.update({
'frontera.contrib.scrapy.middlewares.recording.CrawlRecorderDownloaderMiddleware': 1000,
})
除了使用recording
中间件的,你需要使用一个scheduler
。
SPIDER_MIDDLEWARES.update({
'frontera.contrib.scrapy.middlewares.schedulers.SchedulerSpiderMiddleware': 1000,
})
DOWNLOADER_MIDDLEWARES.update({
'frontera.contrib.scrapy.middlewares.schedulers.SchedulerDownloaderMiddleware': 1000,
})
我建议你在[frontera on GitHub](https://github.com/scrapinghub/frontera/issues/new)上打开一个问题。 –