The Hadoop DataNode won't start
Today I found that one DataNode in the Hadoop cluster had gone down:
By that point the DataNode had been dead for almost 60 hours (I had been temporarily pulled onto other work, figured the disks were tough, and between that and the overtime I let it slip; a bit negligent of me).
I hurried to start the single node:
hadoop-daemon.sh start datanode
My heart instantly sank: it didn't come up. I hurried to check the logs, and the log file showed the following:
************************************************************/
2018-08-30 13:50:27,885 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: registered UNIX signal handlers for [TERM, HUP, INT]
2018-08-30 13:50:28,213 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid dfs.datanode.data.dir /data10/dn :
java.io.FileNotFoundException: File file:/data10/dn does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
        at org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:139)
        at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:156)
        at org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:2057)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:2099)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2081)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1973)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2020)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2196)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2220)
2018-08-30 13:50:28,263 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2018-08-30 13:50:28,319 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2018-08-30 13:50:28,319 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2018-08-30 13:50:28,322 INFO org.apache.hadoop.hdfs.server.data
The exception:
2018-08-30 13:50:28,213 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid dfs.datanode.data.dir /data10/dn :
java.io.FileNotFoundException: File file:/data10/dn does not exist
So, go look for the directory:
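A typical check looks something like this (the paths assume the /data10 mount named in the log):

ls -ld /data10/dn        # does the configured data dir still exist?
df -h /data10            # is the mount still there and readable?
dmesg | tail -n 50       # a failed disk usually leaves traces in the kernel log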
Congratulations, jackpot: the disk was dead.
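To get the DataNode back up in the meantime, the usual options (a sketch, assuming /data10/dn was only one of several entries in dfs.datanode.data.dir) are to drop the dead directory from the config or to let HDFS tolerate the failed volume:

# Option 1: edit hdfs-site.xml on this node and remove /data10/dn from
#           the dfs.datanode.data.dir list.
# Option 2: set dfs.datanode.failed.volumes.tolerated to >= 1 so startup
#           survives one bad volume.
# Then restart just this DataNode:
hadoop-daemon.sh stop datanode
hadoop-daemon.sh start datanode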
What needed to be done at this point was to check disk usage on the other nodes. If there was enough disk space, the next step was to check whether the Regions of the existing tables were still adequate.
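A quick way to check capacity with standard commands (the /data* glob is an assumption based on the /data10 naming here):

hdfs dfsadmin -report    # per-DataNode DFS used/remaining, as seen by the NameNode
df -h /data*             # local view of the data disks, run on each node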
My personal assumption was that with the DataNode process down, HBase's data flushes should be affected; put plainly, data might no longer be writable to HDFS (forgive my ignorance: I have since confirmed that this idea is wrong, and I'll cover it in a later article).
Here is the HBase screenshot from that moment:
The regions were normal; I'll pin this question down later (there's more urgent work to do, and I'm just a code monkey).
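Region health can also be confirmed from the command line instead of the web UI; a sketch for HBase 1.x, where hbck defaults to a read-only consistency check:

echo "status 'simple'" | hbase shell   # per-RegionServer region counts and load
hbase hbck                             # check regions against hbase:meta and HDFS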