Hadoop ResourceManager HA: client connects to ResourceManager at /0.0.0.0:8032
Question:
A follow-up to: Hadoop: Connecting to ResourceManager failed
Hadoop 2.6.1
I have ResourceManager HA configured.
When I kill the 'local' ResourceManager (to test the cluster), failover occurs and the ResourceManager on the other server becomes active. Unfortunately, when I then try to run a job using the 'local' NodeManager instance, it does not fail the request over to the now-active ResourceManager.
[email protected]:~$ jps
26738 Jps
23463 DataNode
23943 DFSZKFailoverController
24297 NodeManager
25690 ResourceManager
23710 JournalNode
23310 NameNode
# kill and restart the ResourceManager so that failover occurs
[email protected]:~$ kill -9 25690
~/hadoop/sbin/yarn-daemon.sh start resourcemanager
[email protected]:~$ ~/hadoop/bin/yarn rmadmin -getServiceState rm1
standby
[email protected]:~$ ~/hadoop/bin/yarn rmadmin -getServiceState rm2
active
#run my class:
14:56:51.476 [main] INFO o.apache.samza.job.yarn.ClientHelper - trying to connect to RM 0.0.0.0:8032
2015-10-29 14:56:51 RMProxy [INFO] Connecting to ResourceManager at /0.0.0.0:8032
14:56:51.572 [main] DEBUG o.a.h.s.a.util.KerberosName - Kerberos krb5 configuration not found, setting default realm to empty
2015-10-29 14:56:51 NativeCodeLoader [WARN] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14:56:51.575 [main] DEBUG o.a.hadoop.util.PerformanceAdvisory - Falling back to shell based
2015-10-29 14:56:52 Client [INFO] Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-10-29 14:56:53 Client [INFO] Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
yarn-site.xml:
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>clusterstaging</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2,rm3</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>stg-hadoop106</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>stg-hadoop107</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm3</name>
  <value>stg-hadoop108</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>A:2181,B:2181,C:2181</value>
</property>
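Since the client log above shows a fallback to 0.0.0.0:8032, one thing worth checking is whether the per-RM client addresses resolve. As a sketch (assuming the defaults were not overridden elsewhere), they can also be spelled out explicitly; the values below are derived from the hostname.rmX entries in this config, and 8032 is the default client port:

```xml
<!-- Hypothetical explicit per-RM client addresses; normally derived
     automatically from yarn.resourcemanager.hostname.rmX -->
<property>
  <name>yarn.resourcemanager.address.rm1</name>
  <value>stg-hadoop106:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.address.rm2</name>
  <value>stg-hadoop107:8032</value>
</property>
```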
I did not configure yarn.resourcemanager.hostname, as it should work 'as is' - correct me if I'm wrong :)
I also tried setting yarn.client.failover-proxy-provider, but without success.
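For reference, the usual value for that property in Hadoop 2.x is the stock configured-list provider; this is a sketch of how it is typically set, not something taken from this cluster's config:

```xml
<property>
  <name>yarn.client.failover-proxy-provider</name>
  <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
</property>
```

With HA enabled this provider is also the default, so setting it explicitly mainly rules out a misconfigured override.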
Any ideas? Or am I wrong to expect the client to find the active RM node on its own?
Also, do you know how to switch a node between active and standby when 'automatic failover' is enabled? I tried:
~/hadoop/bin/yarn rmadmin -failover rm1 rm2
Exception in thread "main" java.lang.UnsupportedOperationException: RMHAServiceTarget doesn't have a corresponding ZKFC address
~/hadoop/bin/yarn rmadmin -transitionToActive rm1 rm2
Automatic failover is enabled for [email protected]
Refusing to manually manage HA state, since it may cause a split-brain scenario or other incorrect state.
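When automatic failover is enabled, rmadmin refuses plain manual transitions as shown above; as a sketch, the --forcemanual flag overrides that safety check (use with care, since it bypasses the coordination that normally prevents two active RMs):

```shell
# Force a manual transition even though automatic failover is enabled
~/hadoop/bin/yarn rmadmin -transitionToStandby --forcemanual rm2
```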
Answer:
If you enable RM HA in automatic failover mode, you cannot manually transition the active RM to standby or vice versa. You should also provide the yarn.client.failover-proxy-provider parameter - the class the client uses to fail over to the active RM - and configure yarn.resourcemanager.hostname.<rm-id> to identify each RM (i.e., rm1, rm2).
If automatic failover is not enabled, you can trigger a transition with: yarn rmadmin -transitionToStandby rm1
Please make the above changes and reply with the results.