Implementing Hadoop HA
I. Cluster planning
ZooKeeper cluster:
192.168.100.12 (bigdata12)
192.168.100.13 (bigdata13)
192.168.100.14 (bigdata14)
Hadoop cluster:
192.168.100.12 (bigdata12) NameNode1 ResourceManager1 Journalnode
192.168.100.13 (bigdata13) NameNode2 ResourceManager2 Journalnode
192.168.100.14 (bigdata14) DataNode1 NodeManager1
192.168.100.15 (bigdata15) DataNode2 NodeManager2
II. Preparation
1. Disable the firewall
systemctl stop firewalld.service
systemctl disable firewalld.service
2. Install the JDK
tar -zxvf jdk-8u144-linux-x64.tar.gz -C ~/training
3. Configure environment variables
On bigdata12, bigdata13, and bigdata14:
vi ~/.bash_profile
JAVA_HOME=/root/training/jdk1.8.0_144
export JAVA_HOME
PATH=$JAVA_HOME/bin:$PATH
export PATH
HADOOP_HOME=/root/training/hadoop-2.7.3
export HADOOP_HOME
PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export PATH
ZOOKEEPER_HOME=/root/training/zookeeper-3.4.10
export ZOOKEEPER_HOME
PATH=$ZOOKEEPER_HOME/bin:$PATH
export PATH
On bigdata15:
vi ~/.bash_profile
JAVA_HOME=/root/training/jdk1.8.0_144
export JAVA_HOME
PATH=$JAVA_HOME/bin:$PATH
export PATH
HADOOP_HOME=/root/training/hadoop-2.7.3
export HADOOP_HOME
PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export PATH
Apply the environment variables on every server:
source ~/.bash_profile
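After sourcing the profile, it is worth confirming that the tools are actually reachable through PATH. The following is a minimal sketch; check_tools is a hypothetical helper name, not part of Hadoop or the JDK.

```shell
#!/bin/sh
# Report whether each required command can be found on PATH.
# check_tools is a hypothetical helper used only for this check.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found   $tool -> $(command -v "$tool")"
        else
            echo "missing $tool"
            missing=1
        fi
    done
    # Non-zero exit status if at least one tool was not found.
    return "$missing"
}
```

On bigdata12, bigdata13, and bigdata14 this could be called as check_tools java hadoop hdfs zkServer.sh; on bigdata15, which has no ZooKeeper, as check_tools java hadoop hdfs.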
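4. Configure passwordless SSH login
(1) Generate a public/private key pair on every machine
ssh-keygen -t rsa
(2) Copy each machine's public key to the other machines; run the following on all four machines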
ssh-copy-id -i .ssh/id_rsa.pub [email protected]
ssh-copy-id -i .ssh/id_rsa.pub [email protected]
ssh-copy-id -i .ssh/id_rsa.pub [email protected]
ssh-copy-id -i .ssh/id_rsa.pub [email protected]
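The four ssh-copy-id calls can also be generated in a loop. This is a sketch under two assumptions that are not spelled out above: the login user is root (the key path used later is /root/.ssh/id_rsa) and the targets are the four host names from the cluster plan. copy_key_commands is a hypothetical helper name.

```shell
#!/bin/sh
# Print one ssh-copy-id command per cluster node.
# Assumptions: the login user is root and the nodes are bigdata12..bigdata15.
HOSTS="bigdata12 bigdata13 bigdata14 bigdata15"

copy_key_commands() {
    for host in $HOSTS; do
        echo "ssh-copy-id -i $HOME/.ssh/id_rsa.pub root@$host"
    done
}
```

Calling copy_key_commands only prints the commands, so they can be reviewed before being pasted into the shell on each machine.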
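(*) Keep the clocks of all machines in sync
In Xshell, open Tools and choose the first option (Send Key Input To All Sessions), then run date -s "2018-08-08 22:32:15" in all sessions at once. The change takes effect immediately, but after a reboot the system time reverts to its old value. To make the setting permanent you have to edit the configuration files; the steps are described in the fully distributed Hadoop setup post on my blog.
If the clocks differ between machines, MapReduce jobs may run into problems.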
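One way to see whether the clocks really agree is to compare epoch timestamps over SSH. The sketch below assumes passwordless login is already working; clock_drift and drift_from are hypothetical helper names, and the SSH round trip itself adds a small delay to the measured value.

```shell
#!/bin/sh
# Absolute difference, in seconds, between two epoch timestamps.
clock_drift() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}

# Compare the local clock with a remote host (needs passwordless SSH).
drift_from() {
    remote=$(ssh "$1" date +%s) || return 1
    clock_drift "$(date +%s)" "$remote"
}
```

For example, drift_from bigdata13 prints the drift in seconds between the local machine and bigdata13.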
5. Configure host name resolution
vi /etc/hosts
192.168.100.12 bigdata12
192.168.100.13 bigdata13
192.168.100.14 bigdata14
192.168.100.15 bigdata15
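Because the same four entries have to be added on every machine, a small idempotent helper avoids duplicate lines when the step is repeated. This is only a sketch; add_host_entry is a hypothetical function, not an existing tool.

```shell
#!/bin/sh
# Append "ip name" to a hosts file unless the name is already mapped there.
# add_host_entry is a hypothetical helper, not part of any standard tool.
add_host_entry() {
    file=$1
    ip=$2
    name=$3
    # Match the name as a whole word after some whitespace.
    if ! grep -qE "[[:space:]]$name([[:space:]]|\$)" "$file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}
```

A call such as add_host_entry /etc/hosts 192.168.100.12 bigdata12, repeated for the other three hosts, leaves /etc/hosts unchanged if the entries already exist.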
III. Configure ZooKeeper (install on 192.168.100.12)
Configure ZooKeeper on the master node (bigdata12).
(*) Edit /root/training/zookeeper-3.4.10/conf/zoo.cfg:
dataDir=/root/training/zookeeper-3.4.10/tmp
server.1=bigdata12:2888:3888
server.2=bigdata13:2888:3888
server.3=bigdata14:2888:3888
(*) In /root/training/zookeeper-3.4.10/tmp, create a file named myid that contains this server's id (1 on bigdata12, matching server.1 in zoo.cfg):
echo 1 > /root/training/zookeeper-3.4.10/tmp/myid
(*) Copy the configured ZooKeeper directory to bigdata13 and bigdata14, then edit the myid file on each node: change 1 to 2 on bigdata13 and 1 to 3 on bigdata14.
scp -r /root/training/zookeeper-3.4.10/ bigdata13:/root/training
scp -r /root/training/zookeeper-3.4.10/ bigdata14:/root/training
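Editing myid by hand on each node is easy to get wrong, so the mapping from host name to id can be written down once. The sketch below mirrors the server.N lines in zoo.cfg; zk_id_for_host and write_myid are hypothetical helpers, and the usage assumes that hostname on each node returns bigdata12, bigdata13, or bigdata14.

```shell
#!/bin/sh
# Map a host name to its ZooKeeper server id, following server.N in zoo.cfg.
zk_id_for_host() {
    case "$1" in
        bigdata12) echo 1 ;;
        bigdata13) echo 2 ;;
        bigdata14) echo 3 ;;
        *) echo "unknown ZooKeeper host: $1" >&2; return 1 ;;
    esac
}

# Write the id into <dataDir>/myid; dataDir defaults to the value in zoo.cfg.
write_myid() {
    host=$1
    data_dir=${2:-/root/training/zookeeper-3.4.10/tmp}
    id=$(zk_id_for_host "$host") || return 1
    mkdir -p "$data_dir" && echo "$id" > "$data_dir/myid"
}
```

After the scp, running write_myid "$(hostname)" on each ZooKeeper node would set the correct id.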
IV. Install the Hadoop cluster (on bigdata12)
tar -zxvf hadoop-2.7.3.tar.gz -C /root/training/
1. Edit hadoop-env.sh
export JAVA_HOME=/root/training/jdk1.8.0_144
2. Edit core-site.xml
<configuration>
<!-- Set the HDFS nameservice to ns1 -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://ns1</value>
</property>
<!-- Hadoop temporary directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/root/training/hadoop-2.7.3/tmp</value>
</property>
<!-- ZooKeeper quorum addresses -->
<property>
<name>ha.zookeeper.quorum</name>
<value>bigdata12:2181,bigdata13:2181,bigdata14:2181</value>
</property>
</configuration>
3. Edit hdfs-site.xml (declares which NameNodes belong to the nameservice)
<configuration>
<!-- Set the HDFS nameservice to ns1; this must match the value in core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>ns1</value>
</property>
<!-- ns1 contains two NameNodes, nn1 and nn2 -->
<property>
<name>dfs.ha.namenodes.ns1</name>
<value>nn1,nn2</value>
</property>
<!-- RPC address of nn1 -->
<property>
<name>dfs.namenode.rpc-address.ns1.nn1</name>
<value>bigdata12:9000</value>
</property>
<!-- HTTP address of nn1 -->
<property>
<name>dfs.namenode.http-address.ns1.nn1</name>
<value>bigdata12:50070</value>
</property>
<!-- RPC address of nn2 -->
<property>
<name>dfs.namenode.rpc-address.ns1.nn2</name>
<value>bigdata13:9000</value>
</property>
<!-- HTTP address of nn2 -->
<property>
<name>dfs.namenode.http-address.ns1.nn2</name>
<value>bigdata13:50070</value>
</property>
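<!-- Where the NameNode edit log is stored on the JournalNodes. Only two JournalNodes are listed here; the Hadoop documentation calls for at least three so that the failure of one can be tolerated. -->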
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://bigdata12:8485;bigdata13:8485/ns1</value>
</property>
<!-- Local directory where each JournalNode stores its data -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/root/training/hadoop-2.7.3/journal</value>
</property>
<!-- Enable automatic failover of the NameNode -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Class that clients use to locate the active NameNode after a failover -->
<property>
<name>dfs.client.failover.proxy.provider.ns1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing methods; separate multiple methods with line breaks, one method per line -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<!-- sshfence requires passwordless SSH, so the private key is given here -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<!-- Connection timeout for sshfence, in milliseconds -->
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
</configuration>
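Once core-site.xml and hdfs-site.xml are in place, the effective values can be read back with hdfs getconf -confKey, which loads the site files on the local node. The wrapper below is a minimal sketch; check_conf_key is a hypothetical helper name.

```shell
#!/bin/sh
# Compare the effective value of a configuration key with the expected one.
# Relies on `hdfs getconf -confKey <key>`; check_conf_key itself is hypothetical.
check_conf_key() {
    key=$1
    expected=$2
    actual=$(hdfs getconf -confKey "$key" 2>/dev/null) || actual=""
    if [ "$actual" = "$expected" ]; then
        echo "OK   $key=$actual"
    else
        echo "FAIL $key: expected '$expected', got '$actual'"
        return 1
    fi
}
```

Typical calls for this setup would be check_conf_key fs.defaultFS hdfs://ns1, check_conf_key dfs.nameservices ns1, and check_conf_key dfs.ha.namenodes.ns1 nn1,nn2.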
4. Edit mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
5. Edit yarn-site.xml
<configuration>
<!-- Enable ResourceManager high availability -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- Cluster id of the ResourceManagers -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yrc</value>
</property>
<!-- Logical ids of the ResourceManagers -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- Host name of each ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>bigdata12</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>bigdata13</value>
</property>
<!-- Addresses of the ZooKeeper cluster -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>bigdata12:2181,bigdata13:2181,bigdata14:2181</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
6. Edit slaves
bigdata14
bigdata15
7. Copy the configured Hadoop directory to the other nodes
scp -r /root/training/hadoop-2.7.3/ [email protected]:/root/training/
scp -r /root/training/hadoop-2.7.3/ [email protected]:/root/training/
scp -r /root/training/hadoop-2.7.3/ [email protected]:/root/training/
V. Start the ZooKeeper cluster (run on bigdata12, bigdata13, and bigdata14)
zkServer.sh start
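Whether the ensemble has actually formed can be checked with zkServer.sh status, which reports a line starting with "Mode:" on each node; with three servers, one leader and two followers are expected. The sketch below extracts that line. zk_mode and zk_cluster_modes are hypothetical helpers, and the remote call assumes passwordless SSH plus the ~/.bash_profile settings from the preparation step.

```shell
#!/bin/sh
# Read `zkServer.sh status` output on stdin and print only the mode
# (for example leader, follower, or standalone).
zk_mode() {
    sed -n 's/^Mode: *//p'
}

# Query every ZooKeeper node over SSH and print its mode.
# The profile is sourced because non-interactive SSH does not load it.
zk_cluster_modes() {
    for host in bigdata12 bigdata13 bigdata14; do
        mode=$(ssh "$host" '. ~/.bash_profile; zkServer.sh status' 2>/dev/null | zk_mode)
        echo "$host: ${mode:-not running}"
    done
}
```

On a single node, zkServer.sh status 2>&1 | zk_mode prints the mode of the local server; zk_cluster_modes does the same for all three nodes.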
VI. Start the JournalNodes on bigdata12 and bigdata13
hadoop-daemon.sh start journalnode
VII. Format HDFS (run on bigdata12)
1. hdfs namenode -format
Check the log for a line like this: 18/08/09 00:00:15 INFO common.Storage: Storage directory /root/training/hadoop-2.7.3/tmp/dfs/name has been successfully formatted.
2. Copy /root/training/hadoop-2.7.3/tmp to /root/training/hadoop-2.7.3 on bigdata13
3. Format ZooKeeper
hdfs zkfc -formatZK
Log: 18/08/09 00:02:42 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/ns1 in ZK
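Step 2 above is described without a command. One possible form is sketched below, assuming root login over SSH and the directory layout shown in the format log; standby_sync_commands is a hypothetical helper that only prints the commands.

```shell
#!/bin/sh
# Print the commands for step 2: copy the formatted metadata to the standby
# NameNode and confirm that the VERSION file arrived.
# Assumptions: root login, and hadoop.tmp.dir as set in core-site.xml.
HADOOP_DIR=/root/training/hadoop-2.7.3
STANDBY_HOST=bigdata13

standby_sync_commands() {
    echo "scp -r $HADOOP_DIR/tmp root@$STANDBY_HOST:$HADOOP_DIR/"
    echo "ssh root@$STANDBY_HOST ls $HADOOP_DIR/tmp/dfs/name/current/VERSION"
}
```

Hadoop also provides hdfs namenode -bootstrapStandby, which is run on the standby while the first NameNode is up; the sketch stays with the plain copy described in step 2.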
VIII. Start the Hadoop cluster on bigdata12
start-dfs.sh
start-yarn.sh
Log:
Starting namenodes on [bigdata12 bigdata13]
bigdata12: starting namenode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-namenode-hadoop13.out
bigdata13: starting namenode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-namenode-hadoop12.out
bigdata14: starting datanode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-datanode-hadoop15.out
bigdata15: starting datanode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-datanode-hadoop14.out
bigdata13: starting zkfc, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-zkfc-bigdata13.out
bigdata12: starting zkfc, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-zkfc-bigdata12.out
The ResourceManager on bigdata13 has to be started separately.
Command: yarn-daemon.sh start resourcemanager
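Check the result in the web UI (bigdata12:50070 and bigdata13:50070, as configured in hdfs-site.xml).
To test failover, look up the process ID on bigdata12 and kill the process (here: kill -9 5382), then check bigdata13 again.
The web UI then shows that Hadoop HA is working. This feature depends on the ZooKeeper cluster.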
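The same check can be made from the command line with hdfs haadmin -getServiceState and yarn rmadmin -getServiceState, using the ids nn1/nn2 and rm1/rm2 from the configuration files. The wrapper below is a sketch; ha_states is a hypothetical helper name, and a node whose process has been killed is reported as unreachable.

```shell
#!/bin/sh
# Print the HA state (active or standby) of both NameNodes and both
# ResourceManagers. nn1/nn2 and rm1/rm2 are the ids from hdfs-site.xml
# and yarn-site.xml.
ha_states() {
    for nn in nn1 nn2; do
        state=$(hdfs haadmin -getServiceState "$nn" 2>/dev/null) || state=unreachable
        echo "NameNode $nn: $state"
    done
    for rm in rm1 rm2; do
        state=$(yarn rmadmin -getServiceState "$rm" 2>/dev/null) || state=unreachable
        echo "ResourceManager $rm: $state"
    done
}
```

Running ha_states before and after the kill makes the switch visible: one NameNode should report active before, and the other one afterwards.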
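Errors encountered:
On startup an error occurred: one NameNode was missing.
Check the log output.
The cause was a problem with formatting the NameNode. It has to be formatted again after deleting the old tmp directory and the log files. (Note: reformatting loses the data that was previously stored on the DataNodes.)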