Spark Streaming vs. Storm: A Comparison
| Feature | Storm (Trident) | Spark Streaming | Notes |
|---|---|---|---|
| Parallelism model | Task-parallel continuous computational engine using a DAG | Data-parallel general-purpose batch processing engine built on Spark | |
| Processing model | One at a time: processes one event (message) at a time. Trident: micro-batch, processes multiple events at a time | Micro-batch: processes multiple events at a time | |
| Latency | Sub-second. Trident: a few seconds | A few seconds | |
| Fault tolerance | At least once. Trident: exactly once | Exactly once | |
| Origin | BackType and Twitter | UCB | |
| Implementation language | Clojure | Scala | |
| API support | Java, Python, Ruby, etc. | Scala, Java, Python | |
| Platform integration | N/A (based on ZooKeeper) | Spark (so real-time processing and historical data processing can be unified or share code) | |
| Production use and support | Storm has been around for several years and has run in production at Twitter since 2011, as well as at many other companies. | Meanwhile, Spark Streaming is a newer project; its only production deployment (that I am aware of) has been at Sharethrough since 2013. | |
| Hadoop distribution support | Storm is the streaming solution in the Hortonworks Hadoop data platform. | Spark Streaming is in both MapR's distribution and Cloudera's Enterprise data platform. Databricks | |
| Cluster integration and deployment | Depends on ZooKeeper; standalone, Mesos | Standalone, YARN, Mesos | |
| Google Trends | | | |
| Bug burn-down chart | https://issues.apache.org/jira/browse/STORM/ | https://issues.apache.org/jira/browse/SPARK/ | Spark issues are visibly resolved much more promptly than Storm's |
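The difference between one-at-a-time and micro-batch processing in the table can be illustrated with a small sketch. This is plain Python, not Storm or Spark code. The names `one_at_a_time`, `micro_batch`, and `count_words` are hypothetical. Batches are cut by event count here so that the example is deterministic, whereas a real micro-batch engine such as Spark Streaming cuts them by time interval.

```python
from itertools import islice


def one_at_a_time(events, handle):
    """Invoke the handler once per event (record-at-a-time style)."""
    calls = 0
    for event in events:
        handle([event])
        calls += 1
    return calls


def micro_batch(events, handle, batch_size):
    """Group events into small batches and invoke the handler once per batch.

    Assumption: batches are cut by count, not by a time interval.
    """
    iterator = iter(events)
    calls = 0
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        handle(batch)
        calls += 1
    return calls


def count_words(events, runner, **kwargs):
    """Run a word count with the given runner; return (counts, handler calls)."""
    counts = {}

    def handle(batch):
        for word in batch:
            counts[word] = counts.get(word, 0) + 1

    calls = runner(events, handle, **kwargs)
    return counts, calls
```

Calling `count_words(events, one_at_a_time)` and `count_words(events, micro_batch, batch_size=3)` on the same input yields the same counts, but the micro-batch runner invokes the handler far fewer times. That amortized per-call overhead is one plausible reason micro-batching trades latency for throughput.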
Thanks for the article!
Could you please explain this point in a bit more detail? "But, it relies on transactions to update state, which is slower and often has to be implemented by the user."
If I want to write my output to a persistent store e.g. redis, then why would it be slower in Storm than in Spark Streaming?
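As a rough illustration of what "transactions to update state" can mean, the sketch below shows how at-least-once delivery can be turned into an exactly-once state update by storing a batch id next to the value. It is plain Python with an in-memory stand-in for a store such as Redis. `CounterStore`, `deliver_with_replay`, and the `txid` field are hypothetical names, not part of the Storm or Spark API. The sketch assumes batches arrive in order and that a replayed batch carries the same id and the same events. It shows only the extra bookkeeping involved and does not measure the relative cost in either system.

```python
class CounterStore:
    """In-memory stand-in for an external store (assumption: not a real client)."""

    def __init__(self):
        self.count = 0
        self.last_txid = None  # id of the last batch applied to `count`

    def apply_at_least_once(self, txid, batch):
        # No duplicate check: a replayed batch is counted again.
        self.count += len(batch)

    def apply_exactly_once(self, txid, batch):
        # Skip a batch that was already applied; otherwise update the value
        # and the batch id together (a single transaction in a real store).
        if txid == self.last_txid:
            return False
        self.count += len(batch)
        self.last_txid = txid
        return True


def deliver_with_replay(apply):
    """Deliver two batches, replaying batch 1 as if its ack had been lost."""
    deliveries = [
        (1, ["a", "b", "c"]),
        (1, ["a", "b", "c"]),  # replay of batch 1
        (2, ["d", "e"]),
    ]
    for txid, batch in deliveries:
        apply(txid, batch)
```

With `apply_at_least_once` the replayed batch is double-counted. With `apply_exactly_once` the counter reflects each event once, at the cost of reading and writing the batch id on every update.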