Installing highly available HDFS on Linux, plus the YARN-based resource management framework
I. Installing highly available HDFS
1. Install and configure ZooKeeper (pick three servers: node5, node6, node7)
1) Copy zookeeper to /home on the Linux machine and unpack it
tar -zxvf zookeeper-3.4.6.tar.gz
2) Create and edit zoo.cfg
vi conf/zoo.cfg
with the following content:
tickTime=2000
dataDir=/opt/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=node5:2888:3888 //the 1 in server.1 is the server id (the value written into myid)
server.2=node6:2888:3888
server.3=node7:2888:3888
3) Create ZooKeeper's myid file (on all three servers)
mkdir /opt/zookeeper (this is the dataDir directory)
vi /opt/zookeeper/myid
The value is 1 on node5, 2 on node6 and 3 on node7
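The same myid files can also be written with echo rather than vi; a minimal equivalent, run on the matching node:
echo 1 > /opt/zookeeper/myid (on node5)
echo 2 > /opt/zookeeper/myid (on node6)
echo 3 > /opt/zookeeper/myid (on node7)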
4) Copy zookeeper to the other two machines
scp -r zookeeper-3.4.6/ root@node6:/home/
scp -r zookeeper-3.4.6/ root@node7:/home/
5) Start ZooKeeper
A) Configure the environment variable:
export PATH=$PATH:/home/zookeeper-3.4.6/bin
B) Copy the environment settings to the other two servers:
scp ~/.bash_profile root@node6:~/
scp ~/.bash_profile root@node7:~/
C) Start it (on all three servers that have ZooKeeper installed)
zkServer.sh start
D) Check ZooKeeper's startup log
tail -100 zookeeper.out (errors may appear at first, before the whole ensemble is up; they go away after a while)
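Besides the log, one way to confirm the ensemble actually formed is to ask each node for its role; on ZooKeeper 3.4.6 this can be done with:
zkServer.sh status (run on node5, 6 and 7; one node should report Mode: leader and the other two Mode: follower)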
2. Delete the files left over from the earlier non-HA HDFS setup, on every server
1) Delete the masters file
rm -rf /home/hadoop-2.5.1/etc/hadoop/masters
2) Delete the data files generated by the non-HA HDFS configuration
rm -rf /opt/hadoop-2.5
3. Edit the configuration files (four servers: node5, 6, 7, 8; node5 and node8 will be the namenodes)
1) vi /home/hadoop-2.5.1/etc/hadoop/hdfs-site.xml
<configuration>
  <property><name>dfs.permissions</name><value>false</value></property>
  <property><name>dfs.nameservices</name><value>zxl</value></property>
  <property><name>dfs.ha.namenodes.zxl</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.zxl.nn1</name><value>node5:9000</value></property>
  <property><name>dfs.namenode.rpc-address.zxl.nn2</name><value>node8:9000</value></property>
  <property><name>dfs.namenode.http-address.zxl.nn1</name><value>node5:50070</value></property>
  <property><name>dfs.namenode.http-address.zxl.nn2</name><value>node8:50070</value></property>
  <!-- Note: in real production the journalnodes are best placed on three unrelated servers; the name abc is arbitrary.
       The semicolon-separated list means the abc directory is used on all three servers. -->
  <property><name>dfs.namenode.shared.edits.dir</name><value>qjournal://node6:8485;node7:8485;node8:8485/abc</value></property>
  <property><name>dfs.client.failover.proxy.provider.zxl</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
  <property><name>dfs.ha.fencing.methods</name><value>sshfence</value></property>
  <property><name>dfs.ha.fencing.ssh.private-key-files</name><value>/root/.ssh/id_dsa</value></property>
  <property><name>dfs.journalnode.edits.dir</name><value>/opt/journalnode</value></property>
</configuration>
2) vi /home/hadoop-2.5.1/etc/hadoop/core-site.xml
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://zxl</value></property>
  <property><name>hadoop.tmp.dir</name><value>/opt/hadoop-2.5</value></property>
</configuration>
3) vi /home/hadoop-2.5.1/etc/hadoop/hdfs-site.xml again and add:
<property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>
4) vi /home/hadoop-2.5.1/etc/hadoop/core-site.xml and add:
<property><name>ha.zookeeper.quorum</name><value>node5:2181,node6:2181,node7:2181</value></property>
5) vi /home/hadoop-2.5.1/etc/hadoop/slaves to list the datanodes:
node6
node7
node8
6) Copy the configuration files from node5 to the other servers
scp /home/hadoop-2.5.1/etc/hadoop/* root@node6:/home/hadoop-2.5.1/etc/hadoop/
scp /home/hadoop-2.5.1/etc/hadoop/* root@node7:/home/hadoop-2.5.1/etc/hadoop/
scp /home/hadoop-2.5.1/etc/hadoop/* root@node8:/home/hadoop-2.5.1/etc/hadoop/
4. Start and check
1) Start the journalnodes (on node6, 7 and 8, the machines listed in dfs.namenode.shared.edits.dir, which here are the same machines as the datanodes)
hadoop-daemon.sh start journalnode (this starts a single journalnode, not the whole set, so run it on each of the three)
2) Check the startup log for errors
tail -100 /home/hadoop-2.5.1/logs/hadoop-root-journalnode-node6.log
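As an extra check beyond the log, jps on node6, 7 and 8 should now list a JournalNode process; for example:
jps (expect a line like "1875 JournalNode"; the pid shown here is only illustrative)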
5. Initialize a namenode (on node5 or node8, the two machines configured as namenodes above)
This time on node8:
hdfs namenode -format (it is best to comment out the 127.0.0.1 line in /etc/hosts first)
6. On node5, copy over the metadata generated by the format on node8
scp -r root@node8:/opt/hadoop-2.5 /opt
After the copy, finish with the following steps:
A) Start the namenode that was just formatted (node8): hadoop-daemon.sh start namenode
B) On the namenode that was not formatted (node5), run: hdfs namenode -bootstrapStandby
C) Start the second namenode (node5): hadoop-daemon.sh start namenode
7. Format ZooKeeper (on either of the namenode machines)
hdfs zkfc -formatZK
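To confirm the format took effect, the ZooKeeper client can be used to look for the HA znode that zkfc creates (hadoop-ha is the default root it uses); a minimal check:
zkCli.sh -server node5:2181
ls / (the listing should now contain hadoop-ha)
ls /hadoop-ha (the listing should contain the nameservice zxl)
quit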
8. Start the cluster (run the scripts on node5; node5 already has passwordless login to the other servers, so no passwords are needed)
First stop the daemons started above with stop-dfs.sh, then start everything with start-dfs.sh
1) If an error like "node7: datanode running as process 1312. Stop it first." appears,
log in to node7 and run: kill -9 1312
then start node7's datanode on its own with: hadoop-daemon.sh start datanode
2) Run jps to check the processes (QuorumPeerMain is the ZooKeeper process)
If the ZooKeeper process has died,
restart ZooKeeper with: zkServer.sh start
9. Access test
1) In a browser open: 192.168.13.135:50070
   and 192.168.13.138:50070
Check that both namenodes are up and working
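The same active/standby information is also available from the command line, using the nn1/nn2 ids configured in hdfs-site.xml; for example:
hdfs haadmin -getServiceState nn1 (prints active or standby)
hdfs haadmin -getServiceState nn2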
2) At this point node5's namenode is active and node8's is standby
Kill node5's namenode forcibly and check whether node8 takes over, i.e. verify high availability
On node5, run jps to find the namenode pid and kill it: kill -9 1856
3) If the browser shows node8 still in standby, something went wrong
Check node8's log: tail -100 /home/hadoop-2.5.1/logs/hadoop-root-zkfc-node8.log
In this case node8 had no passwordless login to itself or to the other machine (check with ssh nodex)
Fix: a) rm -rf ~/.ssh/* to clear out the whole directory
b) Stop the whole cluster (run on node5, since node5 already has passwordless login)
stop-dfs.sh
c) Set up passwordless login from node8 to itself
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Set up passwordless login from node8 to the other machine
scp ~/.ssh/id_dsa.pub root@node5:/opt (run on node8)
cat /opt/id_dsa.pub >> ~/.ssh/authorized_keys (run on node5)
Because node8's authorized_keys file was deleted, passwordless login from node5 to node8 also has to be set up again
scp ~/.ssh/id_dsa.pub root@node8:/opt (run on node5)
cat /opt/id_dsa.pub >> ~/.ssh/authorized_keys (run on node8)
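Before restarting anything, it is worth confirming that the key exchange now works in both directions; each of these should print the date without asking for a password:
ssh node8 date (run on node5)
ssh node5 date (run on node8)
ssh node8 date (run on node8 itself, to check local passwordless login)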
10. Re-run the access test
1) Kill all remaining processes with killall java (sometimes stopping HDFS does not fully succeed)
2) Start ZooKeeper (on node5, 6, 7)
zkServer.sh start
3) Start HDFS on node5
start-dfs.sh (if a namenode fails to come up, it can be restarted with hadoop-daemon.sh start namenode)
4) At this point node5 is still active
Kill node5's namenode process: kill -9 4043
5) Now the browser can no longer reach node5, and node8's state changes to active
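As a further sanity check that the cluster is still usable after the failover, a small read/write test can be run against the zxl nameservice; the /test path below is just an example:
hdfs dfs -mkdir /test
hdfs dfs -put /etc/hosts /test
hdfs dfs -ls /test (the file should be listed even though the original active namenode is down)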
II. The YARN-based resource management framework
Configure YARN (the resourcemanager is analogous to the namenode, except that it holds no data)
1) Edit the configuration files (node5 and node8 will be the resourcemanagers)
A) vi /home/hadoop-2.5.1/etc/hadoop/yarn-site.xml with the following content
<configuration>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
  <property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>
  <!-- the cluster-id just has to differ from the zxl nameservice configured above -->
  <property><name>yarn.resourcemanager.cluster-id</name><value>zxl2</value></property>
  <property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
  <!-- using the hostname node5 here requires it to be mapped in /etc/hosts -->
  <property><name>yarn.resourcemanager.hostname.rm1</name><value>node5</value></property>
  <property><name>yarn.resourcemanager.hostname.rm2</name><value>node8</value></property>
  <!-- the ZooKeeper ensemble -->
  <property><name>yarn.resourcemanager.zk-address</name><value>node5:2181,node6:2181,node7:2181</value></property>
</configuration>
B) vi /home/hadoop-2.5.1/etc/hadoop/mapred-site.xml
mapred-site.xml does not exist yet, so rename the template first:
mv mapred-site.xml.template mapred-site.xml, then set the content to:
<configuration>
  <!-- run MapReduce on the YARN framework -->
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
</configuration>
Note: only the resourcemanagers need explicit configuration, not the nodemanagers; the nodemanagers
must run on the same servers as the datanodes, so the datanode list already tells Hadoop which machines run nodemanagers
2) Copy the configuration files
scp /home/hadoop-2.5.1/etc/hadoop/* root@node6:/home/hadoop-2.5.1/etc/hadoop/
scp /home/hadoop-2.5.1/etc/hadoop/* root@node7:/home/hadoop-2.5.1/etc/hadoop/
scp /home/hadoop-2.5.1/etc/hadoop/* root@node8:/home/hadoop-2.5.1/etc/hadoop/
3) Start YARN (on node5, which has passwordless login)
start-yarn.sh (note: this script does not start the standby resourcemanager; it has to be started manually)
(start-all.sh starts everything; to stop, use stop-dfs.sh and stop-yarn.sh)
Start the standby manually on node8: yarn-daemon.sh start resourcemanager
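Which resourcemanager is currently active can be checked from the command line with the rm1/rm2 ids configured in yarn-site.xml; for example:
yarn rmadmin -getServiceState rm1 (expect active)
yarn rmadmin -getServiceState rm2 (expect standby)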
4) Access from a browser
http://node5:8088 (8088 is the monitoring port of the resourcemanager applications page)
While node5 is running normally, visiting http://node8:8088 redirects to node5
When node5 is stopped, node8 takes over from node5
http://node5:50070 (shows state active)
http://node8:50070 (shows state standby)
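To verify that MapReduce jobs really run on YARN, the example jar shipped with Hadoop can be submitted; a minimal sketch, assuming the standard 2.5.1 tarball layout for the jar path and using /input and /output only as example paths:
hdfs dfs -mkdir -p /input
hdfs dfs -put /etc/hosts /input
hadoop jar /home/hadoop-2.5.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar wordcount /input /output
hdfs dfs -cat /output/part-r-00000 (the finished job should also show up on the 8088 page)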