HDFS Data Replicas

Background

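  1. Communication between two machines in different racks must be routed through switches, which adds latency;
  2. Network bandwidth within a rack is higher than bandwidth between racks;
  3. The probability of an entire rack failing is far lower than that of a single machine failing.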

Communication between two nodes in different racks has to go through switches. In most cases, network bandwidth between machines in the same rack is greater than network bandwidth between machines in different racks. The chance of rack failure is far less than that of node failure.

Replica Placement Policy

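  How replicas are placed directly affects the reliability and performance of HDFS. Taking the default replication factor of 3 as an example, HDFS's current policy puts the first replica on the writer's own machine if the writer runs on a DataNode (otherwise on a random DataNode in the writer's rack), the second replica on a node in a different (remote) rack, and the third replica on a different node in that same remote rack, as shown in the figure below: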
[Figure: HDFS data replica placement]
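This policy preserves data reliability while improving write performance, because it minimizes cross-rack writes: only one transfer in the write pipeline crosses a rack boundary.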

This policy improves write performance without compromising data reliability or read performance.
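The default three-replica placement can be sketched as below. This is a minimal illustration, not HDFS's own placement code: the `Node` type, `choose_targets`, and `inter_rack_hops` are hypothetical names, and the sketch assumes at least two racks with at least two DataNodes each, ignoring load, free space, and node health. It also counts how many transfers in the writer → replica 1 → replica 2 → replica 3 pipeline cross a rack boundary.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A DataNode (or client) identified by name and rack path (hypothetical model)."""
    name: str
    rack: str


def choose_targets(writer: Node, nodes: list[Node], rng: random.Random) -> list[Node]:
    """Pick 3 replica locations following the default policy described above.

    Assumes at least two racks, each with at least two DataNodes; load,
    free space and node health are ignored in this sketch.
    """
    # Replica 1: the writer's own node if it is a DataNode,
    # otherwise a random DataNode on the writer's rack.
    if writer in nodes:
        first = writer
    else:
        first = rng.choice([n for n in nodes if n.rack == writer.rack])
    # Replica 2: a random node on a different (remote) rack.
    second = rng.choice([n for n in nodes if n.rack != first.rack])
    # Replica 3: a different node on the same remote rack as replica 2.
    third = rng.choice([n for n in nodes if n.rack == second.rack and n != second])
    return [first, second, third]


def inter_rack_hops(writer: Node, pipeline: list[Node]) -> int:
    """Count transfers in writer -> r1 -> r2 -> r3 that cross a rack boundary."""
    hops = 0
    prev = writer
    for node in pipeline:
        if node.rack != prev.rack:
            hops += 1
        prev = node
    return hops


if __name__ == "__main__":
    # Hypothetical cluster: 9 DataNodes spread over 3 racks (3 per rack).
    cluster = [Node(f"dn{i}", f"/rack{i // 3}") for i in range(9)]
    targets = choose_targets(cluster[0], cluster, random.Random(42))
    print([n.name for n in targets], "inter-rack hops:", inter_rack_hops(cluster[0], targets))
```

Whatever the random choices, `inter_rack_hops` comes out as 1: the only rack crossing is from replica 1 to replica 2, which is the reduction in inter-rack write traffic this policy aims for.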

Nearest-Replica-First Principle

  When serving a read request, HDFS prefers the replica closest to the reader, to minimize global bandwidth consumption.

To minimize global bandwidth consumption and read latency, HDFS tries to satisfy a read request from a replica that is closest to the reader. If there exists a replica on the same rack as the reader node, then that replica is preferred to satisfy the read request. If HDFS cluster spans multiple data centers, then a replica that is resident in the local data center is preferred over any remote replica.
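Choosing the nearest replica can be sketched with a tree distance over topology paths such as `/dc1/rack1/dn1`: the distance is the number of hops from each node up to their closest common ancestor, so the same node scores 0, the same rack 2, another rack in the same data center 4, and another data center 6. This follows the idea behind Hadoop's rack-aware network distance, but `distance` and `sort_by_distance` are hypothetical helpers and the three-level path layout is an assumption.

```python
def distance(a: str, b: str) -> int:
    """Tree distance between two topology paths such as '/dc1/rack1/dn1'.

    Counts hops from each node up to their closest common ancestor
    (hypothetical helper modeled on rack-aware network distance).
    """
    pa = a.strip("/").split("/")
    pb = b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def sort_by_distance(reader: str, replicas: list[str]) -> list[str]:
    """Order replica locations from closest to farthest from the reader."""
    return sorted(replicas, key=lambda r: distance(reader, r))


if __name__ == "__main__":
    reader = "/dc1/rack1/dn1"
    replicas = ["/dc2/rack1/dn1", "/dc1/rack2/dn5", "/dc1/rack1/dn2"]
    # Same rack (2) < other rack, same data center (4) < other data center (6).
    print(sort_by_distance(reader, replicas))
    # -> ['/dc1/rack1/dn2', '/dc1/rack2/dn5', '/dc2/rack1/dn1']
```

Python's `sorted` is stable, so equally distant replicas keep their input order here; Hadoop's topology sorting also randomizes replicas at the same distance to spread read load.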

Replica Recovery

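  When a DataNode fails (the NameNode detects this through missing heartbeats), the NameNode first selects healthy DataNodes and then copies the affected blocks to them, restoring each block's replica count to its target level.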
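The recovery step can be sketched as a planning function. This is a simplified, hypothetical model rather than the NameNode's actual replication logic: `block_map` maps a block ID to the set of DataNodes holding it, and `plan_rereplication` copies each under-replicated block from one surviving holder to live nodes that lack it. It ignores rack placement, bandwidth throttling, and replication priority.

```python
def plan_rereplication(
    block_map: dict[str, set[str]],
    dead: str,
    live: set[str],
    target: int = 3,
) -> list[tuple[str, str, str]]:
    """Return (block, source, destination) copy tasks after DataNode `dead` fails.

    Simplified, hypothetical model: ignores racks, bandwidth and priorities.
    """
    tasks = []
    for block, holders in sorted(block_map.items()):
        survivors = sorted(h for h in holders if h != dead)
        missing = target - len(survivors)
        if missing <= 0 or not survivors:
            # Fully replicated, or no surviving copy to copy from (block is lost).
            continue
        source = survivors[0]  # any healthy replica can serve as the source
        candidates = sorted(n for n in live if n != dead and n not in survivors)
        for dest in candidates[:missing]:
            tasks.append((block, source, dest))
    return tasks


if __name__ == "__main__":
    block_map = {
        "blk_1": {"dn1", "dn2", "dn3"},
        "blk_2": {"dn2", "dn4", "dn5"},
        "blk_3": {"dn1", "dn2"},
    }
    live = {"dn2", "dn3", "dn4", "dn5"}
    for task in plan_rereplication(block_map, dead="dn1", live=live):
        print(task)
```

A block whose only replicas were on the failed node has nothing left to copy from and is skipped, which is why spreading replicas across racks matters in the first place.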

References:

  1. http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html