HBase startup failure: zookeeper.MetaTableLocator "Failed verification of hbase:meta"
After reformatting with hadoop namenode -format, I started HBase with ./start-hbase.sh. The HMaster and HRegionServer processes both came up, but after a few seconds the HMaster process shut itself down while the HRegionServer processes kept running. The log shows the following errors:
2015-04-08 10:49:12,164 INFO [worker01:16020.activeMasterManager] master.MasterFileSystem: Log folder hdfs://ns1/hbase/WALs/worker05,16020,1428461295266 belongs to an existing region server
2015-04-08 10:49:12,177 INFO [worker01:16020.activeMasterManager] master.MasterFileSystem: Log folder hdfs://ns1/hbase/WALs/worker06,16020,1428461284585 belongs to an existing region server
2015-04-08 10:49:12,180 INFO [worker01:16020.activeMasterManager] master.MasterFileSystem: Log folder hdfs://ns1/hbase/WALs/worker07,16020,1428461270920 belongs to an existing region server
2015-04-08 10:49:12,300 INFO [worker01:16020.activeMasterManager] zookeeper.MetaTableLocator: Failed verification of hbase:meta,,1 at address=worker05,16020,1428456823337, exception=org.apache.hadoop.hbase.NotServingRegionException: Region hbase:meta,,1 is not online on worker05,16020,1428461295266
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2740)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:859)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegionInfo(RSRpcServices.java:1137)
at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:20862)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)
2015-04-08 10:49:12,305 INFO [worker01:16020.activeMasterManager] master.MasterFileSystem: Log dir for server worker05,16020,1428456823337 does not exist
2015-04-08 10:49:12,305 INFO [worker01:16020.activeMasterManager] master.SplitLogManager: dead splitlog workers [worker05,16020,1428456823337]
2015-04-08 10:49:12,306 INFO [worker01:16020.activeMasterManager] master.SplitLogManager: started splitting 0 logs in [] for [worker05,16020,1428456823337]
2015-04-08 10:49:12,322 INFO [worker01:16020.activeMasterManager] master.SplitLogManager: finished splitting (more than or equal to) 0 bytes in 0 log files in [] in 16ms
2015-04-08 10:49:12,323 INFO [worker01:16020.activeMasterManager] zookeeper.MetaTableLocator: Deleting hbase:meta region location in ZooKeeper
2015-04-08 10:49:12,361 INFO [worker01:16020.activeMasterManager] master.AssignmentManager: Assigning hbase:meta,,1.1588230740 to worker07,16020,1428461270920
2015-04-08 10:49:12,361 INFO [worker01:16020.activeMasterManager] master.RegionStates: Transition {1588230740 state=OFFLINE, ts=1428461352349, server=null} to {1588230740 state=PENDING_OPEN, ts=1428461352361, server=worker07,16020,1428461270920}
2015-04-08 10:49:12,435 INFO [worker01:16020.activeMasterManager] master.ServerManager: AssignmentManager hasn't finished failover cleanup; waiting
2015-04-08 10:49:12,503 INFO [AM.ZK.Worker-pool3-t1] master.RegionStates: Transition {1588230740 state=PENDING_OPEN, ts=1428461352361, server=worker07,16020,1428461270920} to {1588230740 state=OPENING, ts=1428461352503, server=worker07,16020,1428461270920}
2015-04-08 10:49:13,096 INFO [AM.ZK.Worker-pool3-t2] master.RegionStates: Transition {1588230740 state=OPENING, ts=1428461352503, server=worker07,16020,1428461270920} to {1588230740 state=OPEN, ts=1428461353096, server=worker07,16020,1428461270920}
2015-04-08 10:49:13,101 INFO [AM.ZK.Worker-pool3-t2] coordination.ZkOpenRegionCoordination: Handling OPENED of 1588230740 from worker01,16020,1428461337650; deleting unassigned node
2015-04-08 10:49:13,110 INFO [AM.ZK.Worker-pool3-t3] master.RegionStates: Onlined 1588230740 on worker07,16020,1428461270920
2015-04-08 10:49:13,111 INFO [worker01:16020.activeMasterManager] master.HMaster: hbase:meta assigned=1, rit=false, location=worker07,16020,1428461270920
2015-04-08 10:49:13,206 INFO [worker01:16020.activeMasterManager] hbase.MetaMigrationConvertingToPB: hbase:meta doesn't have any entries to update.
2015-04-08 10:49:13,206 INFO [worker01:16020.activeMasterManager] hbase.MetaMigrationConvertingToPB: META already up-to date with PB serialization
2015-04-08 10:49:13,224 INFO [worker01:16020.activeMasterManager] master.AssignmentManager: Clean cluster startup. Assigning user regions
2015-04-08 10:49:13,230 INFO [worker01:16020.activeMasterManager] master.AssignmentManager: Joined the cluster in 24ms, failover=false
2015-04-08 10:49:13,247 INFO [worker01:16020.activeMasterManager] master.TableNamespaceManager: Namespace table not found. Creating...
2015-04-08 10:49:13,296 FATAL [worker01:16020.activeMasterManager] master.HMaster: Failed to become active master
org.apache.hadoop.hbase.TableExistsException: hbase:namespace
at org.apache.hadoop.hbase.master.handler.CreateTableHandler.checkAndSetEnablingTable(CreateTableHandler.java:151)
at org.apache.hadoop.hbase.master.handler.CreateTableHandler.prepare(CreateTableHandler.java:124)
at org.apache.hadoop.hbase.master.TableNamespaceManager.createNamespaceTable(TableNamespaceManager.java:233)
at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:86)
at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:868)
at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:719)
at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:165)
at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1425)
at java.lang.Thread.run(Thread.java:745)
2015-04-08 10:49:13,298 FATAL [worker01:16020.activeMasterManager] master.HMaster: Master server abort: loaded coprocessors are: []
2015-04-08 10:49:13,298 FATAL [worker01:16020.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown.
org.apache.hadoop.hbase.TableExistsException: hbase:namespace
at org.apache.hadoop.hbase.master.handler.CreateTableHandler.checkAndSetEnablingTable(CreateTableHandler.java:151)
at org.apache.hadoop.hbase.master.handler.CreateTableHandler.prepare(CreateTableHandler.java:124)
at org.apache.hadoop.hbase.master.TableNamespaceManager.createNamespaceTable(TableNamespaceManager.java:233)
at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:86)
at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:868)
at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:719)
at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:165)
at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1425)
at java.lang.Thread.run(Thread.java:745)
2015-04-08 10:49:13,298 INFO [worker01:16020.activeMasterManager] regionserver.HRegionServer: STOPPED: Unhandled exception. Starting shutdown.
2015-04-08 10:49:13,298 INFO [master/worker01/192.168.217.11:16020] regionserver.HRegionServer: Stopping infoServer
2015-04-08 10:49:13,312 INFO [master/worker01/192.168.217.11:16020] mortbay.log: Stopped SelectChannelConnector@0.0.0.0:16030
2015-04-08 10:49:13,415 INFO [master/worker01/192.168.217.11:16020] regionserver.HRegionServer: stopping server worker01,16020,1428461337650
2015-04-08 10:49:13,415 INFO [master/worker01/192.168.217.11:16020] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x24c969390e4001e
2015-04-08 10:49:13,430 INFO [master/worker01/192.168.217.11:16020-EventThread] zookeeper.ClientCnxn: EventThread shut down
2015-04-08 10:49:13,430 INFO [master/worker01/192.168.217.11:16020] zookeeper.ZooKeeper: Session: 0x24c969390e4001e closed
2015-04-08 10:49:13,431 INFO [master/worker01/192.168.217.11:16020] regionserver.HRegionServer: stopping server worker01,16020,1428461337650; all regions closed.
2015-04-08 10:49:13,431 INFO [master/worker01/192.168.217.11:16020] master.HMaster: Stopping master jetty server
2015-04-08 10:49:13,431 INFO [master/worker01/192.168.217.11:16020] mortbay.log: Stopped SelectChannelConnector@0.0.0.0:16010
2015-04-08 10:49:13,433 INFO [worker01,16020,1428461337650-BalancerChore] balancer.BalancerChore: worker01,16020,1428461337650-BalancerChore exiting
2015-04-08 10:49:13,433 INFO [worker01,16020,1428461337650-ClusterStatusChore] balancer.ClusterStatusChore: worker01,16020,1428461337650-ClusterStatusChore exiting
2015-04-08 10:49:13,433 INFO [CatalogJanitor-worker01:16020] master.CatalogJanitor: CatalogJanitor-worker01:16020 exiting
2015-04-08 10:49:13,434 INFO [worker01:16020.oldLogCleaner] cleaner.LogCleaner: worker01:16020.oldLogCleaner exiting
2015-04-08 10:49:13,434 INFO [worker01:16020.oldLogCleaner] master.ReplicationLogCleaner: Stopping replicationLogCleaner-0x14c9693ba2f001f, quorum=worker06:2181,worker05:2181,worker07:2181, baseZNode=/hbase
2015-04-08 10:49:13,435 INFO [worker01:16020.archivedHFileCleaner] cleaner.HFileCleaner: worker01:16020.archivedHFileCleaner exiting
2015-04-08 10:49:13,504 INFO [worker01:16020.oldLogCleaner] zookeeper.ZooKeeper: Session: 0x14c9693ba2f001f closed
2015-04-08 10:49:13,505 INFO [worker01:16020.activeMasterManager-EventThread] zookeeper.ClientCnxn: EventThread shut down
2015-04-08 10:49:13,608 INFO [master/worker01/192.168.217.11:16020] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x14c9693ba2f001e
2015-04-08 10:49:13,761 INFO [worker01:16020.activeMasterManager-EventThread] zookeeper.ClientCnxn: EventThread shut down
2015-04-08 10:49:13,762 INFO [master/worker01/192.168.217.11:16020] zookeeper.ZooKeeper: Session: 0x14c9693ba2f001e closed
2015-04-08 10:49:13,847 INFO [worker01,16020,1428461337650.splitLogManagerTimeoutMonitor] master.SplitLogManager$TimeoutMonitor: worker01,16020,1428461337650.splitLogManagerTimeoutMonitor exiting
2015-04-08 10:49:13,848 INFO [master/worker01/192.168.217.11:16020] flush.MasterFlushTableProcedureManager: stop: server shutting down.
2015-04-08 10:49:13,848 INFO [master/worker01/192.168.217.11:16020] ipc.RpcServer: Stopping server on 16020
2015-04-08 10:49:13,848 INFO [RpcServer.listener,port=16020] ipc.RpcServer: RpcServer.listener,port=16020: stopping
2015-04-08 10:49:13,855 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopped
2015-04-08 10:49:13,855 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopping
2015-04-08 10:49:13,945 INFO [master/worker01/192.168.217.11:16020] zookeeper.RecoverableZooKeeper: Node /hbase/rs/worker01,16020,1428461337650 already deleted, retry=false
2015-04-08 10:49:13,949 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2015-04-08 10:49:13,950 INFO [master/worker01/192.168.217.11:16020] zookeeper.ZooKeeper: Session: 0x14c9693ba2f001d closed
2015-04-08 10:49:13,950 INFO [master/worker01/192.168.217.11:16020] regionserver.HRegionServer: stopping server worker01,16020,1428461337650; zookeeper connection closed.
2015-04-08 10:49:13,950 INFO [master/worker01/192.168.217.11:16020] regionserver.HRegionServer: master/worker01/192.168.217.11:16020 exiting
2015-04-08 10:49:13,952 INFO [Shutdown] mortbay.log: Shutdown hook executing
2015-04-08 10:49:13,952 INFO [Shutdown] mortbay.log: Shutdown hook complete
I searched around online and found no answer. Then it occurred to me that the problem only appeared after I formatted HDFS; everything had worked before. Could it be caused by stale files left behind in ZooKeeper? So I deleted the version-2 directory under dataDir=/opt/zookeeper-3.4.5/data and rebooted the servers. After starting everything again, the HMaster process ran normally. Opening the dataDir=/opt/zookeeper-3.4.5/data directory again, I found that a new version-2 had been created, along with a zookeeper_server.pid file. At the time I wondered whether the missing zookeeper_server.pid file had been the cause; in fact zkServer.sh simply writes that PID file into the data directory on every start, so it was almost certainly not. The FATAL TableExistsException: hbase:namespace points at the likely real culprit: ZooKeeper still held HBase state from before the format (under the /hbase znode, including the old hbase:namespace table registration), so the freshly formatted HDFS and the stale ZooKeeper state disagreed. Deleting version-2 wiped that stale state.
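The ZooKeeper data-dir cleanup described above can be sketched as a small script. The paths are assumptions taken from this post; `run` only prints the commands by default (DRY_RUN=1) so you can review them before executing. A less invasive alternative, if only HBase state is stale, is to stop HBase and delete just the /hbase znode from zkCli.sh (`rmr /hbase` on ZooKeeper 3.4.x) instead of wiping the whole data directory.

```shell
#!/bin/sh
# Sketch of the ZooKeeper data cleanup described above.
# Paths are assumptions from the post; adjust to your install.
ZK_HOME=/opt/zookeeper-3.4.5
ZK_DATA=$ZK_HOME/data
DRY_RUN=${DRY_RUN:-1}   # default: only print what would be done

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

run "$ZK_HOME/bin/zkServer.sh" stop
run rm -rf "$ZK_DATA/version-2"   # snapshots/txn logs; keep the myid file!
run "$ZK_HOME/bin/zkServer.sh" start
```

Set DRY_RUN=0 only after checking the printed commands; note that version-2 holds all znodes, so any other data stored in this ensemble is lost too.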
So reformatting HDFS can clearly cause quite a few unexpected problems. Today's experience taught me: before reformatting, first stop the related processes, delete Hadoop's tmp directory (on both master and slave nodes), and also watch out for the files under ZooKeeper's dataDir. Then format strictly following the steps below:
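The pre-format cleanup can be sketched as follows. The host names and the hadoop.tmp.dir path are assumptions based on the itcast cluster used in the steps below; the function only prints the per-node commands so you can review them first.

```shell
#!/bin/sh
# Print the per-node tmp-dir cleanup commands for review.
# Hosts and tmp path are assumed from the post; adjust to your cluster.
TMP_DIR=/itcast/hadoop-2.4.1/tmp
HOSTS="itcast01 itcast02 itcast05 itcast06 itcast07"

cleanup_cmds() {
  for h in $HOSTS; do
    echo "ssh $h rm -rf $TMP_DIR"
  done
}

cleanup_cmds          # review the commands first
# cleanup_cmds | sh   # then actually run them
```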
### Note: follow the steps below in strict order
1. Start the ZooKeeper cluster (run zk on itcast05, itcast06, and itcast07)
cd /itcast/zookeeper-3.4.5/bin/
./zkServer.sh start
# Check status: there should be one leader and two followers
./zkServer.sh status
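If you want to check each node's role from a script, the Mode line of the `zkServer.sh status` output can be parsed. A small sketch, assuming the ZooKeeper 3.4.x output format ("Mode: leader" / "Mode: follower"):

```shell
#!/bin/sh
# zk_mode: pull the role out of `zkServer.sh status` output.
# Assumes the ZooKeeper 3.4.x format, e.g. a line "Mode: follower".
zk_mode() {
  printf '%s\n' "$1" | sed -n 's/^Mode: //p'
}

# Usage on each zk node:
# zk_mode "$(/itcast/zookeeper-3.4.5/bin/zkServer.sh status 2>&1)"
```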
2. Start the JournalNodes (run on itcast05, itcast06, and itcast07)
cd /itcast/hadoop-2.4.1
sbin/hadoop-daemon.sh start journalnode
# Verify with the jps command: itcast05, itcast06, and itcast07 should each now show a JournalNode process
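The jps check can be scripted too. A sketch: `has_proc` is a hypothetical helper that just greps a jps listing (which prints "<pid> <ClassName>" per line) for a process name.

```shell
#!/bin/sh
# has_proc: does a `jps` listing contain the given process name?
has_proc() {
  printf '%s\n' "$1" | grep -q " $2\$"
}

# Usage on each node:
# has_proc "$(jps)" JournalNode && echo "JournalNode is up"
```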
3. Format HDFS
# Run on itcast01:
hdfs namenode -format
# Formatting generates files under the hadoop.tmp.dir configured in core-site.xml; here I configured /itcast/hadoop-2.4.1/tmp. Then copy /itcast/hadoop-2.4.1/tmp to /itcast/hadoop-2.4.1/ on itcast02:
scp -r tmp/ itcast02:/itcast/hadoop-2.4.1/    # skip this step if tmp is empty
4. If the cluster uses ZooKeeper-based HDFS HA, format ZK (running it on itcast01 is enough); otherwise skip this step
hdfs zkfc -formatZK
5. Start HDFS (run on itcast01)
sbin/start-dfs.sh
6. Start YARN (##### Note #####: run start-yarn.sh on itcast03. The NameNode and ResourceManager are put on separate machines for performance reasons: both consume a lot of resources, so they are kept apart, and each must then be started on its own machine.)
sbin/start-yarn.sh
7. Start HBase
hbase/bin/start-hbase.sh
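Since the original symptom was HMaster dying a few seconds after start, it is worth re-checking after step 7. A minimal sketch: `master_alive` is a hypothetical helper that greps a jps listing, and the hbase path is the one used above.

```shell
#!/bin/sh
# master_alive: does a `jps` listing contain an HMaster process?
master_alive() {
  printf '%s\n' "$1" | grep -q ' HMaster$'
}

# Usage on the master node, some seconds after start-hbase.sh:
# sleep 30
# master_alive "$(jps)" && echo "HMaster is still up" \
#   || echo "HMaster exited again - check the master log"
# echo "status" | hbase/bin/hbase shell   # cluster summary
```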