Building a YARN High-Availability Cluster on Hadoop 2.8.5 (YARN HA)

Prerequisite: Building an HDFS High-Availability Cluster on Hadoop 2.8.5 (HDFS HA)

https://blog.csdn.net/u014635374/article/details/104997451

Once HDFS HA is set up, you only need to modify yarn-site.xml as described below.

YARN HA Setup

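Like HDFS, a YARN cluster can be configured for high availability. This section explains the HA architecture of a YARN cluster and the concrete steps for setting it up.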

7.1 Architecture

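In a Hadoop YARN cluster, the ResourceManager tracks the cluster's resources and schedules applications (for example, MapReduce jobs). Before Hadoop 2.4, a cluster had only one ResourceManager, so when it went down the whole cluster was affected. The high-availability feature adds redundancy in the form of an active/standby ResourceManager pair, so that failover is possible.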

The YARN HA architecture is shown in the figure below:

[Figure: YARN HA architecture]

    

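As with HDFS HA, only one ResourceManager is active at any given time. When automatic failover is not enabled, one of the ResourceManagers must be switched to the active state manually. Automatic failover can be implemented with ZooKeeper: when the active ResourceManager stops responding or fails, the other ResourceManager is automatically elected as the active one through ZooKeeper. Unlike HDFS HA, the ZKFC in a ResourceManager runs only as a thread inside the ResourceManager rather than as a separate process.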
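
The active/standby state described above can be inspected from the command line with `yarn rmadmin`. The following is a minimal sketch, assuming the ResourceManager IDs are `rm1` and `rm2` (the IDs configured later in this article) and that the `yarn` command is on the PATH. The `YARN_BIN` variable and the two function names are illustrative helpers, not part of Hadoop.

```shell
#!/bin/sh
# Hypothetical helpers. YARN_BIN can be overridden, e.g. for a dry run.
YARN_BIN="${YARN_BIN:-yarn}"

# Print the HA state of one ResourceManager ID (e.g. rm1).
rm_state() {
    "$YARN_BIN" rmadmin -getServiceState "$1"
}

# Manually promote a ResourceManager to active.
# Intended for clusters where automatic failover is disabled.
rm_make_active() {
    "$YARN_BIN" rmadmin -transitionToActive "$1"
}

# Example usage (needs a running cluster):
#   rm_state rm1
#   rm_state rm2
```

With automatic failover enabled, as in the configuration used in this article, the state query is the useful part; the manual transition is mainly relevant when failover is handled by hand.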

7.2 Building the YARN Cluster

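Overall approach: finish the configuration on centoshadoop1 first, then send it to centoshadoop2, centoshadoop3, and centoshadoop4. The roles of the nodes in the cluster are assigned as shown in the table below: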

Cluster role assignment for YARN HA with ZooKeeper

Node             Roles
centoshadoop1    ResourceManager, NodeManager, QuorumPeerMain (ZooKeeper process)
centoshadoop2    ResourceManager, NodeManager, QuorumPeerMain (ZooKeeper process)
centoshadoop3    NodeManager, QuorumPeerMain (ZooKeeper process)
centoshadoop4    NodeManager, QuorumPeerMain (ZooKeeper process)

YARN HA configuration steps:

7.2.1 Configuring yarn-site.xml

To enable HA, additional properties must be added to the Hadoop configuration file yarn-site.xml. The complete contents of yarn-site.xml are as follows:

    <!-- Auxiliary service run on the NodeManager; it must be set to mapreduce_shuffle for MapReduce programs to run -->

    <property>

         <name>yarn.nodemanager.aux-services</name>

         <value>mapreduce_shuffle</value>

    </property>

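    <!-- Host and port of the ResourceManager (the default port is 8032). If not specified, the ResourceManager starts by default on the node where the YARN start command is run. The following properties configure rm1. -->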

    <property>

        <name>yarn.resourcemanager.address.rm1</name>

        <value>centoshadoop1:8032</value>

    </property>

    <property>

        <name>yarn.resourcemanager.scheduler.address.rm1</name>

        <value>centoshadoop1:8030</value>

    </property>

    <property>

        <name>yarn.resourcemanager.resource-tracker.address.rm1</name>

        <value>centoshadoop1:8031</value>

    </property>

    <property>

        <name>yarn.resourcemanager.admin.address.rm1</name>

        <value>centoshadoop1:8033</value>

    </property>

    <!-- Configure rm2 -->

    <property>

        <name>yarn.resourcemanager.address.rm2</name>

        <value>centoshadoop2:8032</value>

    </property>

    <property>

        <name>yarn.resourcemanager.scheduler.address.rm2</name>

        <value>centoshadoop2:8030</value>

    </property>

   <property>

        <name>yarn.resourcemanager.resource-tracker.address.rm2</name>

        <value>centoshadoop2:8031</value>

    </property>

    <property>

        <name>yarn.resourcemanager.admin.address.rm2</name>

        <value>centoshadoop2:8033</value>

    </property>

    <property>

        <name>yarn.resourcemanager.ha.admin.address.rm2</name>

        <value>centoshadoop2:23142</value>

    </property>

    <property>

        <name>yarn.resourcemanager.ha.admin.address.rm1</name>

        <value>centoshadoop1:23142</value>

    </property>

    <!-- YARN HA configuration -->

    <!-- Enable ResourceManager HA -->

    <property>

         <name>yarn.resourcemanager.ha.enabled</name>

         <value>true</value>

    </property>

    <!-- Identifier of the ResourceManager cluster -->

    <property>

         <name>yarn.resourcemanager.cluster-id</name>

         <value>yarncluster</value>

    </property>

    <!-- List of ResourceManager IDs in the cluster; the properties below refer to these IDs -->

    <property>

        <name>yarn.resourcemanager.ha.rm-ids</name>

        <value>rm1,rm2</value>

    </property>

    <!-- Enable automatic failover -->

    <property>

         <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>

         <value>true</value>

    </property>

    <!-- Host name of the node running ResourceManager 1 -->

    <property>

         <name>yarn.resourcemanager.hostname.rm1</name>

         <value>centoshadoop1</value>

    </property>

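    <!--

       Set rm1 on centoshadoop1 and rm2 on centoshadoop2.

       Note: it is common to copy a finished configuration file to the other machines, but this value must be changed on the other ResourceManager machine.

    -->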

    <property>

        <name>yarn.resourcemanager.ha.id</name>

        <value>rm1</value>

        <description>If we want to launch more than one RM in single node,we need this configuration</description>

    </property>

    <!-- Host name of the node running ResourceManager 2 -->

    <property>

         <name>yarn.resourcemanager.hostname.rm2</name>

         <value>centoshadoop2</value>

    </property>

    <!-- Web UI address of ResourceManager 1 -->

    <property>

         <name>yarn.resourcemanager.webapp.address.rm1</name>

         <value>centoshadoop1:8088</value>

    </property>

    <!-- Web UI address of ResourceManager 2 -->

    <property>

         <name>yarn.resourcemanager.webapp.address.rm2</name>

         <value>centoshadoop2:8088</value>

    </property>

    <!-- ZooKeeper connection address for the state store; the hosts listed must be the nodes that actually run ZooKeeper -->

    <property>

        <name>yarn.resourcemanager.zk-state-store.address</name>

        <value>node-1:2181,node-2:2181,node-3:2181</value>

    </property>

    <!-- ZooKeeper ensemble address list; the hosts listed must be the nodes that actually run ZooKeeper -->

    <property>

         <name>yarn.resourcemanager.zk-address</name>

         <value>node-1:2181,node-2:2181,node-3:2181</value>

    </property>

    <!-- Enable ResourceManager recovery after a restart (default: false) -->

    <property>

         <name>yarn.resourcemanager.recovery.enabled</name>

         <value>true</value>

    </property>

    <!-- Class used to store ResourceManager state -->

    <property>

         <name>yarn.resourcemanager.store.class</name>

<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>

    </property>

  <property>

        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>

        <value>org.apache.hadoop.mapred.ShuffleHandler</value>

    </property>

    <!-- Enable log aggregation -->

    <property>

        <name>yarn.log-aggregation-enable</name>

        <value>true</value>

    </property>

    <!-- Retain aggregated logs for 7 days -->

    <property>

        <name>yarn.log-aggregation.retain-seconds</name>

        <value>604800</value>

    </property>

    <property>

        <name>yarn.nodemanager.local-dirs</name>

        <value>/home/hadoop/yarn/local</value>

    </property>

    <property>

        <name>yarn.nodemanager.log-dirs</name>

        <value>/home/hadoop/yarn/logs</value>

    </property>

    <property>

        <name>mapreduce.shuffle.port</name>

        <value>23080</value>

    </property>

 

    <!-- Client failover proxy provider class -->

    <property>

        <name>yarn.client.failover-proxy-provider</name>

       <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>

    </property>

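    <!-- Base znode path in ZooKeeper used for leader election. This is a path inside ZooKeeper, not a directory on the local file system, so it does not need to match the dataDir in zoo.cfg. -->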

    <property>

        <name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>

        <value>/usr/local/zookeeper/data</value>

       <description>Optional setting. The default value is /yarn-leader-election</description>

    </property>

 

Create the following two directories on both centoshadoop1 and centoshadoop2:

mkdir -p /home/hadoop/yarn/local

mkdir -p /home/hadoop/yarn/logs

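Starting the ResourceManager on centoshadoop2 fails with the following error:

WARN org.apache.hadoop.ipc.Client: Failed to connect to server: centoshadoop1/192.168.227.140:8031: retries get failed due to exceeded maximum allowed retries number: 0

If the firewall is blocking the ResourceManager ports, open them and reload the firewall rules: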

firewall-cmd --zone=public --add-port=8030/tcp --permanent

firewall-cmd --zone=public --add-port=8031/tcp --permanent

firewall-cmd --zone=public --add-port=8032/tcp --permanent

firewall-cmd --zone=public --add-port=8033/tcp --permanent  

firewall-cmd --zone=public --add-port=23142/tcp --permanent

firewall-cmd --reload
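
The firewall-cmd calls above can be condensed into a loop. This is only a sketch: the port list is copied from the commands above, while the `FIREWALL_CMD` override and the function name `open_rm_ports` are hypothetical conveniences added here so the loop can be dry-run with `echo`.

```shell
#!/bin/sh
# FIREWALL_CMD is an illustrative override hook; it defaults to the real tool.
FIREWALL_CMD="${FIREWALL_CMD:-firewall-cmd}"

# Ports copied from the individual firewall-cmd lines in this article.
RM_PORTS="8030 8031 8032 8033 23142"

# Open every ResourceManager port permanently, then reload the rules.
open_rm_ports() {
    for port in $RM_PORTS; do
        "$FIREWALL_CMD" --zone=public --add-port="${port}/tcp" --permanent || return 1
    done
    "$FIREWALL_CMD" --reload
}

# Example usage (run with sufficient privileges on the node):
#   open_rm_ports
# Dry run that only prints the arguments:
#   FIREWALL_CMD=echo; open_rm_ports
```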

Distribute the file to the other three machines:

scp -r yarn-site.xml [email protected]:/home/hadoop/hadoop-ha/hadoop/hadoop-2.8.5/etc/hadoop/
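
Copying the file to all three remaining nodes can be scripted. The sketch below assumes the remote account is called `hadoop` (a guess based on the /home/hadoop paths; adjust it to your environment) and that passwordless SSH is already set up. `SCP_CMD`, `REMOTE_USER`, and `distribute_yarn_site` are illustrative names introduced here.

```shell
#!/bin/sh
# Assumptions: REMOTE_USER and SCP_CMD are hypothetical, overridable settings.
# Host names and the target directory are the ones used in this article.
SCP_CMD="${SCP_CMD:-scp}"
REMOTE_USER="${REMOTE_USER:-hadoop}"
CONF_DIR="/home/hadoop/hadoop-ha/hadoop/hadoop-2.8.5/etc/hadoop"

# Copy yarn-site.xml from this node to the other three nodes.
distribute_yarn_site() {
    for host in centoshadoop2 centoshadoop3 centoshadoop4; do
        "$SCP_CMD" "$CONF_DIR/yarn-site.xml" "${REMOTE_USER}@${host}:${CONF_DIR}/" || return 1
    done
}

# Example usage:
#   distribute_yarn_site
```

After copying, remember that yarn.resourcemanager.ha.id still has to be changed on centoshadoop2.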

Modify yarn-site.xml on centoshadoop2 as follows:

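 <!--

       Set rm1 on centoshadoop1 and rm2 on centoshadoop2.

       Note: it is common to copy a finished configuration file to the other machines, but this value must be changed on the other ResourceManager machine.

-->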

<property>

        <name>yarn.resourcemanager.ha.id</name>

        <value>rm2</value>

        <description>If we want to launch more than one RM in single node,we need this configuration</description>

</property>
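
Because this one value differs between the two ResourceManager nodes, it is easy to forget after copying the file. The following sketch rewrites the value with awk instead of editing by hand. The function name `set_ha_id` is hypothetical, and the script assumes the `<value>` element follows the `<name>` element of the same property, as it does in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: set yarn.resourcemanager.ha.id in a yarn-site.xml file.
#   $1 = path to yarn-site.xml
#   $2 = new ID, e.g. rm2
set_ha_id() {
    awk -v id="$2" '
        # Remember that we are inside the ha.id property.
        /<name>yarn\.resourcemanager\.ha\.id<\/name>/ { in_prop = 1 }
        # Rewrite the first <value> that follows, then stop looking.
        in_prop && /<value>/ {
            sub(/<value>[^<]*<\/value>/, "<value>" id "</value>")
            in_prop = 0
        }
        { print }
    ' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Example usage on centoshadoop2:
#   set_ha_id /home/hadoop/hadoop-ha/hadoop/hadoop-2.8.5/etc/hadoop/yarn-site.xml rm2
```

Other properties, including yarn.resourcemanager.ha.rm-ids, are left untouched because the match is on the exact property name.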

Stop the ResourceManager process on each machine:

cd /home/hadoop/hadoop-ha/hadoop/hadoop-2.8.5/

sbin/yarn-daemon.sh stop resourcemanager

Start the ResourceManager process on centoshadoop1 and centoshadoop2:

sbin/yarn-daemon.sh start resourcemanager

 

Monitor the ResourceManager logs on centoshadoop1 and centoshadoop2:

cd /home/hadoop/hadoop-ha/hadoop/hadoop-2.8.5/logs

tail -f yarn-hadoop-resourcemanager-centoshadoop1.log

tail -f yarn-hadoop-resourcemanager-centoshadoop2.log

 

Monitor the NodeManager log on each of the four machines:

tail -f yarn-hadoop-nodemanager-centoshadoop1.log

tail -f yarn-hadoop-nodemanager-centoshadoop2.log

tail -f yarn-hadoop-nodemanager-centoshadoop3.log

tail -f yarn-hadoop-nodemanager-centoshadoop4.log

Open http://centoshadoop1:8088/cluster in a browser.

 

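If you visit the standby ResourceManager at http://centoshadoop2:8088, the browser is automatically redirected to http://centoshadoop1:8088. This happens because the active ResourceManager runs on centoshadoop1, and requests to the standby ResourceManager are redirected to the active one.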

 

YARN HA failover test

Stop the ResourceManager process on centoshadoop1 (the active node):

sbin/yarn-daemon.sh stop resourcemanager

Then open http://centoshadoop2:8088/cluster; the ResourceManager on centoshadoop2 should now be the active one.
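
The failover can also be confirmed from the command line rather than the web UI. The sketch below polls `yarn rmadmin -getServiceState` until the given ResourceManager reports `active`. The function name `wait_until_active` and the `YARN_BIN` and `POLL_INTERVAL` variables are illustrative assumptions, and the retry count of 10 is arbitrary.

```shell
#!/bin/sh
# Hypothetical helper settings; both can be overridden.
YARN_BIN="${YARN_BIN:-yarn}"

# Poll until the given ResourceManager ID reports "active".
#   $1 = ResourceManager ID (e.g. rm2)
#   $2 = maximum number of attempts (default 10)
wait_until_active() {
    rm_id="$1"
    tries="${2:-10}"
    while [ "$tries" -gt 0 ]; do
        state=$("$YARN_BIN" rmadmin -getServiceState "$rm_id" 2>/dev/null)
        if [ "$state" = "active" ]; then
            echo "$rm_id is active"
            return 0
        fi
        tries=$((tries - 1))
        sleep "${POLL_INTERVAL:-2}"
    done
    echo "$rm_id did not become active" >&2
    return 1
}

# Example usage after stopping the ResourceManager on centoshadoop1:
#   wait_until_active rm2
```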


Note: during the setup, monitor the logs on every node until the logs on all nodes look normal.