hadoop/hdfs QJM notes

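In Hadoop HDFS there is a role called the JournalNode. Its job is to store the edit log, which records every modification made to HDFS. If the NameNode (NN) goes down, its state can be recovered by replaying this log.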

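In the original (pre-HA) model, these edit log files lived on the NameNode (NN) itself. A Secondary NameNode periodically merged them into a new checkpoint image, compacting the log, and the merged image was then copied back to the NN. The key problem is that if this one NN goes down, the whole cluster goes down. HA solves this problem with two changes: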

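1. Have the NN write the edit log to JournalNodes on several machines instead of one, so that losing a single machine does not lose the log. An edit counts as written once a majority (quorum) of the JournalNodes acknowledge it, so 2N+1 JournalNodes can tolerate N failures.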

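2. Run two NNs, one active and one standby. If the active NN fails, ZooKeeper is used to promote the standby to active. By replaying the edits stored on the JournalNodes, the standby brings its state up to the point where the active NN failed. This makes the cluster considerably more fault tolerant.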
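QJM (Quorum Journal Manager) treats an edit as written once a majority of JournalNodes acknowledge it, which is why several machines are used rather than one. Below is a minimal Python sketch of that arithmetic. It assumes an odd JournalNode count, as the Hadoop documentation recommends. The function names are hypothetical and are not Hadoop APIs.

```python
def quorum_size(num_journal_nodes: int) -> int:
    """Acknowledgements an edit needs before it counts as durably written."""
    return num_journal_nodes // 2 + 1


def tolerated_failures(num_journal_nodes: int) -> int:
    """JournalNodes that may be down while writes can still reach a majority."""
    return num_journal_nodes - quorum_size(num_journal_nodes)


def write_succeeds(acks: int, num_journal_nodes: int) -> bool:
    """True if enough JournalNodes acknowledged the edit."""
    return acks >= quorum_size(num_journal_nodes)


for n in (3, 5, 7):
    print(f"{n} JournalNodes: need {quorum_size(n)} acks, "
          f"tolerate {tolerated_failures(n)} down")
```

With an even count such as 4, the quorum is 3 and only 1 failure is tolerated. That is no better than 3 nodes, which is why odd counts are the usual choice.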

The original text follows:

Background

Prior to Hadoop 2.0.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster. Each cluster had a single NameNode, and if that machine or process became unavailable, the cluster as a whole would be unavailable until the NameNode was either restarted or brought up on a separate machine.

This impacted the total availability of the HDFS cluster in two major ways:

  • In the case of an unplanned event such as a machine crash, the cluster would be unavailable until an operator restarted the NameNode.

  • Planned maintenance events such as software or hardware upgrades on the NameNode machine would result in windows of cluster downtime.

The HDFS High Availability feature addresses the above problems by providing the option of running two redundant NameNodes in the same cluster in an Active/Passive configuration with a hot standby. This allows a fast failover to a new NameNode in the case that a machine crashes, or a graceful administrator-initiated failover for the purpose of planned maintenance.


From the NN's perspective

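1. There is now one NN and several JournalNodes, and the NN has to write data to those JournalNodes, so some class must be responsible for this. That class is QuorumJournalManager, shown in the figure below.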

[Figure: QuorumJournalManager]

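Note the AsyncLoggerSet: this is the set used to communicate with the multiple JournalNodes. What it actually holds are AsyncLogger instances created by IPCLoggerChannel.Factory, as shown below: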

[Figure: AsyncLoggerSet and IPCLoggerChannel.Factory]
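The relationship between AsyncLoggerSet, AsyncLogger and the factory can be roughly modeled in Python as follows. This sketch rests on several assumptions. The classes are simplified stand-ins, not the Hadoop classes. The factory here only takes a journal ID and an address, whereas the real IPCLoggerChannel.Factory takes more parameters. The host names are made up. Port 8485 is the default JournalNode RPC port.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Address = Tuple[str, int]


@dataclass
class AsyncLogger:
    """Simplified stand-in: one channel to one JournalNode for one journal."""
    journal_id: str
    address: Address


# Hypothetical factory type: build one logger per JournalNode address.
LoggerFactory = Callable[[str, Address], AsyncLogger]


@dataclass
class AsyncLoggerSet:
    """Holds one logger per JournalNode; the NN side talks to all of them."""
    loggers: List[AsyncLogger] = field(default_factory=list)

    @classmethod
    def create(cls, journal_id: str, addresses: List[Address],
               factory: LoggerFactory) -> "AsyncLoggerSet":
        # One logger per JournalNode, all writing the same journal.
        return cls([factory(journal_id, addr) for addr in addresses])

    def majority_size(self) -> int:
        return len(self.loggers) // 2 + 1


addresses = [("jn1.example.com", 8485), ("jn2.example.com", 8485),
             ("jn3.example.com", 8485)]
logger_set = AsyncLoggerSet.create("mycluster", addresses, AsyncLogger)
print(len(logger_set.loggers), logger_set.majority_size())  # 3 2
```

The factory is passed in rather than hard-coded. This makes it easy to substitute fake loggers, for example in tests.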


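2. With a class that manages communication with multiple JournalNodes in place, the NN only needs to call the QuorumJournalManager API. Next, let's see how these APIs are implemented, taking the doFinalize() method as an example.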

a. The QuorumJournalManager method

[Figure: QuorumJournalManager]

b. Next, look at the concrete implementation in loggers: a ListenableFuture is returned for each AsyncLogger. (AsyncLoggerSet)

[Figure: AsyncLoggerSet]

(IPCLoggerChannel)

[Figure: IPCLoggerChannel]
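The fan-out in step b, where each logger returns its own future, can be modeled with Python's concurrent.futures. Here, Future plays the role of Guava's ListenableFuture. This sketch makes two assumptions. First, each channel owns a single-threaded executor, so calls to the same JournalNode stay in order. Second, the RPC is replaced by a plain function. FakeLoggerChannel and do_finalize_all are hypothetical names.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List


class FakeLoggerChannel:
    """Hypothetical stand-in for one IPC channel to one JournalNode."""

    def __init__(self, name: str, rpc: Callable[[str], None]):
        self.name = name
        self._rpc = rpc  # pretend RPC proxy method
        # One worker thread per channel keeps calls to this node ordered.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def do_finalize(self, journal_id: str) -> Future:
        # Returns immediately; the "RPC" runs on the channel's own thread.
        return self._executor.submit(self._rpc, journal_id)

    def close(self) -> None:
        self._executor.shutdown(wait=True)


def do_finalize_all(channels: List[FakeLoggerChannel],
                    journal_id: str) -> Dict[str, Future]:
    """Fan the call out: one future per JournalNode, keyed by node name."""
    return {ch.name: ch.do_finalize(journal_id) for ch in channels}


def ok(journal_id: str) -> None:
    return None


def unreachable(journal_id: str) -> None:
    raise ConnectionError("JournalNode unreachable")


channels = [FakeLoggerChannel("jn1", ok), FakeLoggerChannel("jn2", ok),
            FakeLoggerChannel("jn3", unreachable)]
calls = do_finalize_all(channels, "mycluster")
for name, fut in calls.items():
    # exception() waits for the call to finish, then returns None or the error.
    print(name, "failed" if fut.exception() is not None else "ok")
for ch in channels:
    ch.close()
```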

I stopped reading at this point; what follows is presumably RPC plumbing.

Now let's unpack QuorumCall.create(calls) from step b, which records the results of the calls to the different JournalNodes. (AsyncLoggerSet)

[Figure: AsyncLoggerSet]

(QuorumCall)

[Figure: QuorumCall]
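The bookkeeping that QuorumCall performs can be sketched as follows. A callback is registered on every future, and each outcome is sorted into successes or exceptions. The caller can then block until a threshold is reached. This is a simplified Python model, not the Hadoop class. The three thresholds (enough responses, enough successes, or too many exceptions) illustrate the idea. QuorumCallSketch and wait_for are hypothetical names.

```python
import threading
import time
from concurrent.futures import Future
from typing import Dict, Hashable


class QuorumCallSketch:
    """Simplified model of QuorumCall: collect per-node outcomes of futures
    and let a caller block until enough of them have arrived."""

    def __init__(self, calls: Dict[Hashable, Future]):
        self.successes: Dict[Hashable, object] = {}
        self.exceptions: Dict[Hashable, BaseException] = {}
        self._cond = threading.Condition()
        for key, fut in calls.items():
            # Default argument binds this node's key into its callback.
            fut.add_done_callback(lambda f, k=key: self._record(k, f))

    def _record(self, key: Hashable, fut: Future) -> None:
        with self._cond:
            exc = fut.exception()  # future is already done, so no blocking
            if exc is None:
                self.successes[key] = fut.result()
            else:
                self.exceptions[key] = exc
            self._cond.notify_all()

    def wait_for(self, min_responses: int, min_successes: int,
                 max_exceptions: int, timeout_s: float) -> None:
        """Return once enough nodes responded, enough succeeded, or more
        than max_exceptions failed; raise TimeoutError otherwise."""
        deadline = time.monotonic() + timeout_s
        with self._cond:
            while True:
                responses = len(self.successes) + len(self.exceptions)
                if (responses >= min_responses
                        or len(self.successes) >= min_successes
                        or len(self.exceptions) > max_exceptions):
                    return
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    raise TimeoutError("timed out waiting for JournalNode responses")
                self._cond.wait(remaining)


# Futures created by hand for illustration; normally an executor makes them.
f1, f2, f3 = Future(), Future(), Future()
q = QuorumCallSketch({"jn1": f1, "jn2": f2, "jn3": f3})
f1.set_result(None)
f3.set_exception(ConnectionError("jn3 down"))
f2.set_result(None)
q.wait_for(min_responses=3, min_successes=2, max_exceptions=1, timeout_s=1.0)
print(sorted(q.successes), sorted(q.exceptions))  # ['jn1', 'jn2'] ['jn3']
```

In this sketch, waiting only stops the clock. Whether a majority actually succeeded is a separate check made afterwards.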

Finally, the results of the calls to the different JournalNodes are processed (QuorumJournalManager):

[Figures: QuorumJournalManager]
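Once the waiting is over, the NN side only has to decide whether a majority of JournalNodes succeeded. Below is a hedged sketch of that final check. It assumes the per-node successes and exceptions have already been collected into dicts. QuorumException and check_write_quorum are hypothetical names, not the Hadoop code.

```python
from typing import Dict, Hashable


class QuorumException(Exception):
    """Hypothetical error: too few JournalNodes succeeded to form a majority."""


def check_write_quorum(successes: Dict[Hashable, object],
                       exceptions: Dict[Hashable, BaseException],
                       num_loggers: int) -> Dict[Hashable, object]:
    """Return the successful results if a majority succeeded; otherwise
    raise one exception that lists every node's failure."""
    majority = num_loggers // 2 + 1
    if len(successes) >= majority:
        return successes
    details = "; ".join(
        f"{node}: {exc!r}"
        for node, exc in sorted(exceptions.items(), key=lambda kv: str(kv[0])))
    raise QuorumException(
        f"only {len(successes)}/{num_loggers} JournalNodes succeeded, "
        f"need {majority}: {details}")


print(check_write_quorum({"jn1": None, "jn2": None},
                         {"jn3": ConnectionError("down")}, 3))
try:
    check_write_quorum({"jn1": None},
                       {"jn2": TimeoutError("slow"), "jn3": ConnectionError("down")}, 3)
except QuorumException as e:
    print(e)
```

A single failed JournalNode out of three is tolerated silently. Two failures turn into one error that names every node.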


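Everything so far happens on the NN side. Next comes the receiving side, the JournalNode. The main class is JournalNode, a thin wrapper around operations on the local log. It exposes operations matching those called from the NN side, for example doFinalize(String journalId).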

[Figure: JournalNode]
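On the JournalNode side, the pattern is a lookup from journal ID to a per-journal object, followed by delegation. Below is a minimal in-memory Python sketch that rests on assumptions. Journal here only records a flag, whereas the real class works on local storage. JournalNodeSketch and get_or_create_journal are illustrative names. The journal ID is typically the HDFS nameservice ID.

```python
import threading
from typing import Dict


class Journal:
    """Per-journal state on one JournalNode (simplified, in memory only)."""

    def __init__(self, journal_id: str):
        self.journal_id = journal_id
        self.finalized = False

    def do_finalize(self) -> None:
        # The real class would act on local on-disk storage; here we flip a flag.
        self.finalized = True


class JournalNodeSketch:
    """Thin wrapper: route each call to the Journal for the given journal ID."""

    def __init__(self):
        self._journals: Dict[str, Journal] = {}
        self._lock = threading.Lock()  # RPC handlers may run concurrently

    def get_or_create_journal(self, journal_id: str) -> Journal:
        with self._lock:
            journal = self._journals.get(journal_id)
            if journal is None:
                journal = Journal(journal_id)
                self._journals[journal_id] = journal
            return journal

    def do_finalize(self, journal_id: str) -> None:
        self.get_or_create_journal(journal_id).do_finalize()


jn = JournalNodeSketch()
jn.do_finalize("mycluster")
print(jn.get_or_create_journal("mycluster").finalized)  # True
```

Keeping one Journal per ID lets a single JournalNode serve several namespaces without mixing their logs.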



Reference: https://blog.csdn.net/androidlushangderen/article/details/48415073