Trafodion Troubleshooting - HBase is not available
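Symptom
Immediately after starting HBase, we ran hbcheck as the Trafodion user to check the HBase status. It reported that HBase was not available, with the following error: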
HBase is not available
HBase not available. Waiting 10 seconds.
ZooKeeper Quorum: dev02.esgyn.cn,dev01.esgyn.cn,dev03.esgyn.cn, ZooKeeper Port : 2181
Caught an exception trying to check the status of the HBase cluster: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=2, exceptions:
Mon Mar 18 11:04:41 CST 2019, RpcRetryingCaller{globalStartTime=1552878281536, pause=100, retries=2}, org.apache.hadoop.hbase.MasterNotRunningException: com.google.protobuf.ServiceException: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.ipc.ServerNotRunningYetException): org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet
at org.apache.hadoop.hbase.master.HMaster.checkServiceStarted(HMaster.java:2366)
at org.apache.hadoop.hbase.master.MasterRpcServices.isMasterRunning(MasterRpcServices.java:938)
at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:55654)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2191)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:183)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:163)
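Analysis
During startup, HBase goes through a process of bringing its Regions online. The more Regions the cluster has, the longer this process takes. HBase is not ready to serve requests until the process finishes, so running hbcheck immediately after startup reports HBase as not available.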
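Because hbcheck only succeeds once the Regions are online, the wait can be automated. The following Python sketch re-runs hbcheck until its output contains "HBase is available!" or a timeout expires. It assumes hbcheck is on the PATH of the trafodion user and prints its status to stdout, as in the output shown in this article. The function names, the 30-second interval and the 30-minute timeout are illustrative assumptions, not part of Trafodion.

```python
import subprocess
import time

# Success text printed by hbcheck (taken from the output in this article).
SUCCESS_MARKER = "HBase is available!"


def run_hbcheck(cmd):
    """Run the command and return its stdout; stderr is discarded here
    (hbcheck itself reports that it writes stderr to hbcheck.log)."""
    result = subprocess.run(list(cmd), stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL, universal_newlines=True)
    return result.stdout


def wait_for_hbase(cmd=("hbcheck",), timeout_s=1800.0, interval_s=30.0,
                   run=run_hbcheck, sleep=time.sleep, clock=time.monotonic):
    """Hypothetical helper: re-run hbcheck until HBase is reported available.

    Returns True on success, False once timeout_s has elapsed.
    run, sleep and clock are injectable so the loop can be tested offline.
    """
    deadline = clock() + timeout_s
    while True:
        if SUCCESS_MARKER in run(cmd):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A startup script could call `wait_for_hbase()` and only continue (for example, with starting Trafodion) when it returns True. Polling with a deadline avoids both guessing a fixed sleep and waiting forever if HBase never comes up.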
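Solution
Monitor how the HBase Region count changes in the CDH Manager or Ambari web UI: the number of Regions grows from 0 to its maximum over a period of time. Taking CDH Manager as an example, the chart is as follows:
Once the number of online Regions reaches its maximum, run hbcheck again. The output is as follows: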
$ hbcheck
Stderr being written to the file: /opt/trafodion/esgyndb/logs/hbcheck.log
ZooKeeper Quorum: dev02.esgyn.cn,dev01.esgyn.cn,dev03.esgyn.cn, ZooKeeper Port : 2181
HBase is available!
HBase version: 1.2.0-cdh5.13.3
HMaster: dev02.esgyn.cn,60000,1552878205113
Number of RegionServers available:3
RegionServer #1: dev02.esgyn.cn,60020,1552878204669
RegionServer #2: dev03.esgyn.cn,60020,1552878201843
RegionServer #3: dev04.esgyn.cn,60020,1552878202102
Number of Dead RegionServers:0
Number of regions: 3986
Number of regions in transition: 0
Average load: 1328.6666666666667
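To make this final check repeatable, the summary that hbcheck prints can be parsed. The Python sketch below assumes the exact output format shown above (HBase 1.2.0-cdh5.13.3). The function names parse_hbcheck and hbase_ready, and the readiness rule (HBase available, no dead RegionServers, no Regions in transition), are hypothetical choices, not something hbcheck itself defines.

```python
import re

# Regexes assume the hbcheck summary format shown above.
_COUNTERS = {
    "live_servers": r"Number of RegionServers available:\s*(\d+)",
    "dead_servers": r"Number of Dead RegionServers:\s*(\d+)",
    "regions": r"Number of regions:\s*(\d+)",  # does not match "regions in transition"
    "in_transition": r"Number of regions in transition:\s*(\d+)",
}


def parse_hbcheck(output):
    """Return the availability flag, the integer counters and the average load."""
    info = {"available": "HBase is available!" in output}
    for key, pattern in _COUNTERS.items():
        match = re.search(pattern, output)
        if match:
            info[key] = int(match.group(1))
    match = re.search(r"Average load:\s*([0-9.]+)", output)
    if match:
        info["average_load"] = float(match.group(1))
    return info


def hbase_ready(info):
    """Hypothetical rule: available, no dead RegionServers, no Regions in transition."""
    return (info.get("available", False)
            and info.get("dead_servers") == 0
            and info.get("in_transition") == 0)
```

For the output above, parse_hbcheck yields 3986 Regions on 3 live RegionServers with 0 Regions in transition, so hbase_ready returns True. The reported Average load is simply the Region count divided by the number of live RegionServers: 3986 / 3 = 1328.67.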