Big Data Architecture and Configuration (Part 3): Common Components, Kafka Cluster, Hadoop High Availability
I. Common Component: Zookeeper
1. What is Zookeeper
Zookeeper is an open-source distributed application coordination service.
Zookeeper is used to guarantee transactional consistency of data across a cluster.
Typical Zookeeper use cases:
Distributed locks across a cluster
Unified naming service for a cluster
Distributed coordination service
2. Zookeeper Roles and Characteristics
- Leader: accepts proposal requests from all Followers, coordinates and initiates voting on proposals, and handles internal data exchange with all Followers
- Follower: serves clients directly and votes on proposals, while also exchanging data with the Leader
- Observer: serves clients directly but does not vote on proposals, while also exchanging data with the Leader
3. Zookeeper Roles and Elections
A server has no role when it starts (LOOKING)
Roles are assigned through an election
The election produces one Leader; the remaining servers become Followers
Leader election rules
More than half of the machines in the cluster must vote for the Leader
If the cluster has n servers, the Leader must receive the votes of n/2+1 servers (see the small example after this list)
If the Leader dies, a new Leader is elected
If half of the machines die, the cluster goes down
If not enough votes are collected, the election is restarted; if fewer than n/2+1 machines take part in the vote, the cluster stops working
Observers are not counted in the total number of voting machines
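As a quick illustration of the n/2+1 rule above (a sketch, not part of the original material), the following bash loop prints the quorum size and the number of failed servers each cluster size can tolerate:
for n in 3 4 5 6 7; do
    # quorum = more than half of the servers; tolerated failures = n - quorum
    echo "servers=$n  quorum=$(( n / 2 + 1 ))  tolerated_failures=$(( (n - 1) / 2 ))"
done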
4. Zookeeper Scalability: Principles and Design
When a client submits a request, a read request is answered directly from the server's local replica of the database, while a write request has to go through the consensus protocol (Zab)
The Zab protocol specifies that every write request from a client is forwarded to the single Leader of the ZK service. The Leader turns the request into a Proposal, and the other servers vote on that Proposal. The Leader then collects the votes, and once more than half of them are in, it sends a commit notification to all servers. Finally, when the server the client is connected to receives that notification, it applies the update to its in-memory state and replies to the client's write request
In this protocol a Zookeeper server actually performs two duties: accepting connections and operation requests from clients, and voting on operation results. These two duties constrain each other as the Zookeeper cluster is scaled out
From the way Zab handles writes, adding Followers increases the load of the voting phase: the Leader must wait for more than half of the servers to respond, so adding nodes raises the chance that a slow machine drags out the whole vote. As the cluster grows, write throughput therefore drops
So we are forced to trade off the desire to serve more clients against the desire to keep good throughput. To break this coupling, a non-voting server type, the Observer, was introduced. An Observer accepts client connections and forwards write requests to the Leader, but the Leader never asks it to vote; like the other servers it only receives the vote result (as in step 3 above)
Observers change the scalability picture for Zookeeper: many Observer nodes can be added without seriously hurting write throughput. This is not a silver bullet, because the notification phase of the protocol still scales linearly with the number of servers, but the per-server cost of that phase is very low, so it is unlikely to become the bottleneck
Observers improve the scalability of read performance
Observers make wide-area (WAN) deployments practical
5. Zookeeper Cluster
Installing and configuring the Zookeeper cluster
Rename the sample configuration file to zoo.cfg
# mv zoo_sample.cfg zoo.cfg
Append the following to the end of zoo.cfg (a full minimal zoo.cfg is sketched after these lines)
server.1=node61:2888:3888
server.2=node62:2888:3888
server.3=node63:2888:3888
server.4=nn60:2888:3888:observer
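For reference, a complete minimal zoo.cfg would look like the sketch below; the first five settings are the defaults shipped in zoo_sample.cfg (2888 is the quorum port, 3888 the leader-election port, and the host flagged observer does not vote):
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/tmp/zookeeper
clientPort=2181
server.1=node61:2888:3888
server.2=node62:2888:3888
server.3=node63:2888:3888
server.4=nn60:2888:3888:observer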
Zookeeper cluster installation and configuration (continued)
Create the directory specified by dataDir
# mkdir /tmp/zookeeper
In that directory, create a myid file whose id matches this host (a one-loop way to do this is sketched below)
About the myid file
- The myid file contains a single number
- Note: make sure every server's myid file contains a different id
- The id in server.id must match the id in the myid file
- The id range is 1~255
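The per-host myid files can also be written in one loop from nn60; this is only a sketch, assuming passwordless ssh (including nn60 to itself) and the host-to-id mapping used in this article:
for pair in node61:1 node62:2 node63:3 nn60:4; do
    host=${pair%%:*}; id=${pair##*:}
    # create the dataDir and write this host's id into myid
    ssh "$host" "mkdir -p /tmp/zookeeper && echo $id > /tmp/zookeeper/myid"
done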
Installing and configuring the Zookeeper cluster (continued)
Start the cluster and verify (run on every cluster node)
# /usr/local/zookeeper/bin/zkServer.sh start
Check the role
# /usr/local/zookeeper/bin/zkServer.sh status
Zookeeper administration documentation
http://zookeeper.apache.org/doc/r3.4.10/zookeeperAdmin.html
[[email protected] ~]# cat /etc/hosts ==>> configure this on all machines, and install java-1.8.0-openjdk-devel on every machine
192.168.4.60 nn60
192.168.4.61 node61
192.168.4.62 node62
192.168.4.63 node63
192.168.4.64 node64
[[email protected] ~]# tar -xf zookeeper-3.4.13.tar.gz
[[email protected] ~]# mv zookeeper-3.4.13 /usr/local/zookeeper
[[email protected] ~]# cd /usr/local/zookeeper/conf/
[[email protected] conf]# mv zoo_sample.cfg zoo.cfg
[[email protected] conf]# vim zoo.cfg ==>> add the following to the file
server.1=node61:2888:3888
server.2=node62:2888:3888
server.3=node63:2888:3888
server.4=nn60:2888:3888:observer
[[email protected] conf]# for i in {61..63};do rsync -aSH --delete /usr/local/zookeeper/ 192.168.4.$i:/usr/local/zookeeper/ -e 'ssh' & done ==>> copy zookeeper to the other hosts
[1] 10367
[2] 10368
[3] 10369
[[email protected] conf]# mkdir /tmp/zookeeper
[[email protected] conf]# ssh node61 mkdir /tmp/zookeeper
[[email protected] conf]# ssh node62 mkdir /tmp/zookeeper
[[email protected] conf]# ssh node63 mkdir /tmp/zookeeper
[[email protected] conf]# echo 4 >/tmp/zookeeper/myid
[[email protected] conf]# ssh node61 'echo 1 >/tmp/zookeeper/myid'
[[email protected] conf]# ssh node62 'echo 2 >/tmp/zookeeper/myid'
[[email protected] conf]# ssh node63 'echo 3 >/tmp/zookeeper/myid'
[[email protected] conf]# /usr/local/zookeeper/bin/zkServer.sh start
[[email protected] conf]# ssh node61 '/usr/local/zookeeper/bin/zkServer.sh start'
[[email protected] conf]# ssh node62 '/usr/local/zookeeper/bin/zkServer.sh start'
[[email protected] conf]# ssh node63 '/usr/local/zookeeper/bin/zkServer.sh start'
[[email protected] conf]# ssh node63 'netstat -ntlup|grep :2181'
tcp6 0 0 :::2181 :::* LISTEN 5133/java
[[email protected] conf]# ssh node62 'netstat -ntlup|grep :2181'
tcp6 0 0 :::2181 :::* LISTEN 4144/java
[[email protected] conf]# ssh node61 'netstat -ntlup|grep :2181'
[[email protected] conf]# /usr/local/zookeeper/bin/zkServer.sh status
[[email protected] ~]# /usr/local/zookeeper/bin/zkServer.sh stop
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED
[[email protected] ~]# yum -y install telnet
Trying 192.168.4.61...
Connected to node61.
Escape character is '^]'.
Connection closed by foreign host.
[[email protected] conf]# /usr/local/zookeeper/bin/zkServer.sh start
[[email protected] conf]# vim api.sh
#!/bin/bash
# Query each Zookeeper server's role via the "stat" four-letter command on port 2181
function getstatus(){
    exec 9<>/dev/tcp/$1/2181 2>/dev/null        # open a TCP connection to the server
    echo stat >&9                               # send the stat command
    MODE=$(cat <&9 |grep -Po "(?<=Mode:).*")    # extract the Mode: line (leader/follower/observer)
    exec 9<&-                                   # close the connection
    echo ${MODE:-NULL}                          # print NULL if the server did not answer
}
for i in node{61..63} nn60;do
    echo -ne "${i}\t"
    getstatus ${i}
done
[[email protected] conf]# chmod 755 api.sh
[[email protected] conf]# ./api.sh
node61 follower
node62 leader
node63 follower
nn60 observer
II. Kafka Cluster (deployed on node61, node62, node63)
1. What is Kafka
What is Kafka
Kafka is a distributed messaging system developed by LinkedIn
Kafka is written in Scala
Kafka is a kind of message middleware
Why use Kafka
Decoupling, redundancy, better scalability, buffering
Ordering guarantees, flexibility, peak shaving
Asynchronous communication
2. Kafka Roles
Kafka roles and cluster structure
producer: publishes messages
consumer: reads and processes messages
topic: a category of messages
partition: each topic contains one or more partitions
broker: a Kafka cluster contains one or more servers
Kafka uses Zookeeper to manage the cluster configuration and to elect the Leader
3. Kafka Cluster Installation and Configuration
Installing and configuring the Kafka cluster
Kafka depends on Zookeeper: before building the Kafka cluster, first make sure a working Zookeeper cluster is available
Install the OpenJDK runtime
Copy Kafka to all cluster hosts
Edit the configuration file
Start and verify
server.properties
broker.id
Each server's broker.id must be unique
zookeeper.connect
The Zookeeper cluster address; you do not have to list every node, a subset is enough (see the sketch below)
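To make the two settings concrete, the lines below show what they might look like on node61 (line numbers in the stock kafka_2.12-2.1.0 server.properties may differ slightly); the sed line is just one possible sketch for deriving broker.id from the host's last IP octet, assuming hostname -i resolves to the 192.168.4.x address:
# key lines in config/server.properties on node61
#   broker.id=61
#   zookeeper.connect=node61:2181,node62:2181,node63:2181
# optional: set broker.id from the last octet of this host's IP address
id=$(hostname -i | awk -F. '{print $4}')
sed -i "s/^broker.id=.*/broker.id=${id}/" /usr/local/kafka/config/server.properties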
Installing and configuring the Kafka cluster (continued)
Start the service on all hosts
# ./bin/kafka-server-start.sh -daemon config/server.properties
Verify
jps should show a Kafka process
netstat should show port 9092 listening (a quick check loop is sketched below)
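A small sketch of that check, run from nn60 as root and assuming passwordless ssh to the three brokers:
for h in node61 node62 node63; do
    echo "== $h =="
    # confirm the Kafka JVM is up and port 9092 is listening on each broker
    ssh "$h" 'jps | grep Kafka; netstat -ntlup | grep :9092'
done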
Cluster verification and message publishing
Create a topic (a way to double-check the topic afterwards is sketched below)
# ./bin/kafka-topics.sh --create --partitions 1 --replication-factor 1 --zookeeper node63:2181 --topic aa
Producer
# ./bin/kafka-console-producer.sh --broker-list node62:9092 --topic aa
Consumer
# ./bin/kafka-console-consumer.sh --bootstrap-server node61:9092 --topic aa
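To confirm the topic was created and see where its partition and replica landed, the standard topic tooling can be used (a quick sketch; with the kafka_2.12-2.1.0 build used here the --zookeeper form is still accepted):
# ./bin/kafka-topics.sh --list --zookeeper node63:2181
# ./bin/kafka-topics.sh --describe --zookeeper node63:2181 --topic aa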
[[email protected] ~]# tar -xf kafka_2.12-2.1.0.tgz
[[email protected] ~]# mv kafka_2.12-2.1.0 /usr/local/kafka
[[email protected] ~]# cd /usr/local/kafka/config/
[[email protected] config]# vim server.properties ==>> edit the configuration file
21 broker.id=61
123 zookeeper.connect=node61:2181,node62:2181,node63:2181
[[email protected] config]# for i in 62 63;do rsync -aSH --delete /usr/local/kafka 192.168.4.$i:/usr/local/;done
[[email protected] ~]# vim +21 /usr/local/kafka/config/server.properties ==>> change broker.id
21 broker.id=62
[[email protected] ~]# vim +21 /usr/local/kafka/config/server.properties
21 broker.id=63
[[email protected] ~]# /usr/local/kafka/bin/kafka-server-start.sh -daemon /usr/local/kafka/config/server.properties ==>> start the cluster
[[email protected] ~]# jps
4388 DataNode
5111 QuorumPeerMain
5660 Kafka ==>> the Kafka process appears
5677 Jps
[[email protected] ~]# /usr/local/kafka/bin/kafka-server-start.sh -daemon /usr/local/kafka/config/server.properties
[[email protected] ~]# jps
4144 QuorumPeerMain
3590 DataNode
5452 Kafka
5469 Jps
[[email protected] ~]# /usr/local/kafka/bin/kafka-server-start.sh -daemon /usr/local/kafka/config/server.properties
[[email protected] ~]# jps
4176 DataNode
5546 Kafka
5611 Jps
5133 QuorumPeerMain
[[email protected] ~]# /usr/local/kafka/bin/kafka-topics.sh --create --partitions 1 --replication-factor 1 --zookeeper node63:2181 --topic aa ==>> test the setup by creating a topic
Created topic "aa".
[[email protected] ~]# /usr/local/kafka/bin/kafka-console-producer.sh --broker-list node62:9092 --topic aa ==>> simulate a producer and publish messages
>>add
>a
>d
[[email protected] ~]# /usr/local/kafka/bin/kafka-console-consumer.sh --bootstrap-server node62:9092 --topic aa ==>> simulate a consumer and receive the messages
>>add
>a
>d
III. Hadoop High Availability
1) Hadoop High Availability
1. Why We Need the NameNode
The NameNode is the core of HDFS, and HDFS is the core component of Hadoop, so the NameNode is critical to the Hadoop cluster
If the NameNode goes down the cluster becomes unavailable, and if NameNode data is lost the data of the entire cluster is lost. Since NameNode data is updated frequently, NameNode high availability is a must
2. Solutions
The project officially offers two solutions
HDFS with NFS
HDFS with QJM
Similarities and differences between the two solutions
NFS solution           QJM solution
NN                     NN
ZK                     ZK
ZKFailoverController   ZKFailoverController
NFS                    JournalNode
Comparison of the two HA solutions
Both provide hot standby
Both use one Active NN and one Standby NN
Both use Zookeeper and ZKFC for automatic failover
Both use the configured fencing methods to isolate the former Active NN during failover
The NFS solution keeps the shared data on shared storage, so the high availability of NFS itself also has to be designed for
QJM needs no shared storage, but every DN must know the locations of both NNs and send block reports and heartbeats to both the Active and the Standby NN
3. Chosen Solution
Why QJM is used
It solves the NameNode single point of failure
Hadoop's HA solution for HDFS: HDFS runs two NameNodes, one Active and one Standby. The Active NameNode serves clients, for example handling their RPC requests, while the Standby NameNode does not serve clients and only synchronizes the Active NameNode's state so it can take over when the Active fails
A typical HA cluster
The two NameNodes are configured on two independent machines; at any moment one NameNode is in the active state and the other is in the standby state
The active NameNode responds to all clients in the cluster; the standby NameNode only acts as a replica, guaranteeing a fast failover when it is needed
4. NameNode High Availability
NameNode HA architecture
To keep the Standby Node in sync with the Active Node, both nodes communicate with a group of independent processes called JournalNodes (JNS). When the Active Node updates the namespace, it sends the edit log records to a majority of the JNS. The Standby Node reads these edits from the JNS and continuously watches for changes to the log
The Standby Node applies the edits to its own namespace. When a failover happens, the Standby makes sure it has read all edits from the JNS before promoting itself to Active, so that at the moment of failover the namespace held by the Standby is fully in sync with the Active
The NameNode is updated very frequently. To keep primary and standby data consistent and to support fast failover, it is essential that the Standby Node holds the latest block locations in the cluster. To achieve this, the DataNodes are configured with the addresses of both NameNodes, keep heartbeat connections with both, and report block locations to both
At any moment only one NameNode may be Active, otherwise cluster operations become chaotic: the two NameNodes end up with two different data states, which can lead to data loss or other anomalies. This situation is commonly called "split-brain" (communication between nodes is cut and different DataNodes in the cluster see different Active NameNodes)
For the JNS, only one NameNode is allowed to be the writer at any time. During failover, the former Standby Node takes over all Active duties and becomes responsible for writing edit records to the JNS; this mechanism prevents any other NameNode from continuing to act as Active
5. NameNode Architecture
NameNode HA architecture diagram
System plan
Host                  Role                               Software
192.168.4.60 nn60     NameNode1                          Hadoop
192.168.4.66 nn66     NameNode2                          Hadoop
192.168.4.61 node61   DataNode, JournalNode, Zookeeper   HDFS, Zookeeper
192.168.4.62 node62   DataNode, JournalNode, Zookeeper   HDFS, Zookeeper
192.168.4.63 node63   DataNode, JournalNode, Zookeeper   HDFS, Zookeeper
[[email protected] ~]# /usr/local/hadoop/sbin/stop-all.sh ==>> stop all services
[[email protected] ~]# /usr/local/zookeeper/bin/zkServer.sh start ==>> start the service one machine at a time
[[email protected] ~]# /usr/local/zookeeper/bin/zkServer.sh start
[[email protected] ~]# /usr/local/zookeeper/bin/zkServer.sh start
[[email protected] ~]# /usr/local/zookeeper/bin/zkServer.sh start
[[email protected] ~]# /usr/local/zookeeper/conf/api.sh
node61 follower
node62 leader
node63 follower
nn60 observer
[[email protected] ~]# vim /etc/hosts
192.168.4.60 nn60
192.168.4.66 nn66
192.168.4.61 node61
192.168.4.62 node62
192.168.4.63 node63
[[email protected] ~]# for i in {61..63} 66;do rsync -aSH --delete /etc/hosts 192.168.4.$i:/etc/hosts -e 'ssh' & done
[1] 12162
[2] 12163
[3] 12164
[4] 12165
[[email protected] ~]# rsync -aSH /usr/local/hadoop/ 192.168.4.66:/usr/local/hadoop/
[[email protected] ~]# vim /etc/ssh/ssh_config
Host *
GSSAPIAuthentication yes
StrictHostKeyChecking no
[[email protected] ~]# mkdir /var/hadoop
[[email protected] ~]# yum -y install java-1.8.0-openjdk-devel
[[email protected] ~]# scp /root/.ssh/id_rsa nn66:/root/.ssh/
[[email protected] ~]# rm -fr /var/hadoop/* ==>> delete /var/hadoop/* on all hosts
[[email protected] ~]# ssh node61 'rm -fr /var/hadoop/*'
[[email protected] ~]# ssh node62 'rm -fr /var/hadoop/*'
[[email protected] ~]# ssh node63 'rm -fr /var/hadoop/*'
[[email protected] ~]# ssh nn66 'rm -fr /var/hadoop/*'
6. core-site Configuration
core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value> ==>> the name mycluster is arbitrary; it acts as a logical group name and clients address that group
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/hadoop</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>node61:2181,node62:2181,node63:2181</value>
</property>
[[email protected] ~]# vim /usr/local/hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/hadoop</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>node61:2181,node62:2181,node63:2181</value> ==>> the Zookeeper addresses
</property>
<property>
<name>hadoop.proxyuser.nfsuser.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.nfsuser.hosts</name>
<value>*</value>
</property>
</configuration>
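Once this file is in place (and synced to all hosts), a quick sanity check, sketched here, is to ask Hadoop how it parsed the settings:
# /usr/local/hadoop/bin/hdfs getconf -confKey fs.defaultFS
# /usr/local/hadoop/bin/hdfs getconf -confKey ha.zookeeper.quorum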
7. hdfs-site Configuration
hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
The SecondaryNameNode has no role in an HA setup, so it is turned off here
The NameNodes are defined below
<!-- set the HDFS nameservice name to mycluster -->
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
Name the cluster's two NameNodes nn1 and nn2
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
Configure the RPC ports of nn1 and nn2
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>nn60:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>nn66:8020</value>
</property>
Configure the HTTP ports of nn1 and nn2
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>nn60:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>nn66:50070</value>
</property>
Specify the path on the journalnodes where the NameNode metadata (edits) is stored
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://node61:8485;node62:8485;node63:8485/mycluster</value>
</property>
Specify the path where the journalnode stores its log files
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/var/hadoop/journal</value>
</property>
Specify the Java class that HDFS clients use to contact the Active NameNode
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
Configure the fencing mechanism as SSH
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
Specify the location of the SSH private key
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
Enable automatic failover
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
[[email protected] ~]# vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>nn60:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>nn66:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>nn60:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>nn66:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://node61:8485;node62:8485;node63:8485/mycluster</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/var/hadoop/journal</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
</configuration>
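After syncing this file to all hosts, it can be sanity-checked the same way; a quick sketch:
# /usr/local/hadoop/bin/hdfs getconf -namenodes
# /usr/local/hadoop/bin/hdfs getconf -confKey dfs.ha.namenodes.mycluster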
8. YARN High Availability
ResourceManager high availability
RM high availability works on the same principle as the NN and also relies on ZK. Only the key parts of the configuration file are shown here; interested readers can study and test the rest on their own
yarn.resourcemanager.hostname
As before, since cluster (HA) mode is used, this single-hostname option should not be set
9. yarn-site Configuration
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>node61:2181,node62:2181,node63:2181</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-ha</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>nn60</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>nn66</value>
</property>
[[email protected] ~]# vim /usr/local/hadoop/etc/hadoop/yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>node61:2181,node62:2181,node63:2181</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-ha</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>nn60</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>nn66</value>
</property>
</configuration>
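Once the two ResourceManagers are running (they are started in the verification section below), their states can be queried with the rmadmin commands that also appear later in this article:
# /usr/local/hadoop/bin/yarn rmadmin -getServiceState rm1
# /usr/local/hadoop/bin/yarn rmadmin -getServiceState rm2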
[[email protected] ~]# for i in {61..63} 66;do rsync -aSH --delete /usr/local/hadoop/ 192.168.4.$i:/usr/local/hadoop/ -e 'ssh' & done ==>> sync to all machines
[1] 12461
[2] 12462
[3] 12463
[4] 12464
[[email protected] ~]# for i in {60..63} 66;do ssh 192.168.4.$i 'rm -fr /usr/local/hadoop/logs';done ==>> remove old logs to make troubleshooting easier
2) High-Availability Verification
1. Cluster Initialization
Initialization (a consolidated script for these steps is sketched after this list)
ALL: all machines
nodeX: node61 node62 node63
ALL: sync the configuration to all cluster machines
NN1: initialize the ZK cluster state
# ./bin/hdfs zkfc -formatZK
nodeX: start the journalnode service
# ./sbin/hadoop-daemon.sh start journalnode
NN1: format the NameNode
# ./bin/hdfs namenode -format
NN2: sync the data to the local /var/hadoop/dfs
# rsync -aSH nn60:/var/hadoop/dfs /var/hadoop/
NN1: initialize the shared edits on the JournalNodes
# ./bin/hdfs namenode -initializeSharedEdits
nodeX: stop the journalnode service
# ./sbin/hadoop-daemon.sh stop journalnode
Start the cluster
NN1: start HDFS
# ./sbin/start-dfs.sh
NN1: start YARN
# ./sbin/start-yarn.sh
NN2: start the standby ResourceManager
# ./sbin/yarn-daemon.sh start resourcemanager
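The whole initialization sequence above can also be driven from NN1 (nn60) in one go; the script below is only a sketch that mirrors those steps, assuming passwordless ssh to node61-node63 and Hadoop installed under /usr/local/hadoop everywhere:
#!/bin/bash
H=/usr/local/hadoop
$H/bin/hdfs zkfc -formatZK                                  # NN1: initialize the ZK cluster state
for n in node61 node62 node63; do                           # nodeX: start journalnode
    ssh $n "$H/sbin/hadoop-daemon.sh start journalnode"
done
$H/bin/hdfs namenode -format                                # NN1: format the NameNode
rsync -aSH /var/hadoop/dfs nn66:/var/hadoop/                # push the metadata to NN2 (the transcript below pulls it from NN2 instead)
$H/bin/hdfs namenode -initializeSharedEdits                 # NN1: initialize the JournalNodes
for n in node61 node62 node63; do                           # nodeX: stop journalnode
    ssh $n "$H/sbin/hadoop-daemon.sh stop journalnode"
done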
[[email protected] ~]# /usr/local/hadoop/bin/hdfs zkfc -formatZK ==>> initialize the ZK cluster state
[[email protected] ~]# /usr/local/hadoop/sbin/hadoop-daemon.sh start journalnode ==>> start the journalnode service on each node
starting journalnode, logging to /usr/local/hadoop/logs/hadoop-root-journalnode-node61.out
[[email protected] ~]# jps
5111 QuorumPeerMain
7480 JournalNode
7516 Jps
[[email protected] ~]# /usr/local/hadoop/sbin/hadoop-daemon.sh start journalnode
starting journalnode, logging to /usr/local/hadoop/logs/hadoop-root-journalnode-node62.out
[[email protected] ~]# jps
4144 QuorumPeerMain
6064 JournalNode
6113 Jps
[[email protected] ~]# /usr/local/hadoop/sbin/hadoop-daemon.sh start journalnode
starting journalnode, logging to /usr/local/hadoop/logs/hadoop-root-journalnode-node63.out
[[email protected] ~]# jps
5133 QuorumPeerMain
7581 JournalNode
7613 Jps
[[email protected] ~]# /usr/local/hadoop/bin/hdfs namenode -format ==>> format the NameNode
[[email protected] ~]# ls /var/hadoop/
dfs
[[email protected] ~]# cd /var/hadoop/
[[email protected] hadoop]# ls
[[email protected] hadoop]# rsync -aSH nn60:/var/hadoop/ /var/hadoop/
[[email protected] hadoop]# ls
dfs
[[email protected] ~]# /usr/local/hadoop/bin/hdfs namenode -initializeSharedEdits ==>> initialize the shared edits on the JournalNodes
[[email protected] ~]# /usr/local/hadoop/sbin/hadoop-daemon.sh stop journalnode ==>> stop the journalnode service
stopping journalnode
[[email protected] ~]# jps
7585 Jps
5111 QuorumPeerMain
[[email protected] ~]# /usr/local/hadoop/sbin/hadoop-daemon.sh stop journalnode
stopping journalnode
[[email protected] ~]# jps
4144 QuorumPeerMain
6145 Jps
[[email protected] ~]# /usr/local/hadoop/sbin/hadoop-daemon.sh stop journalnode
stopping journalnode
[[email protected] ~]# jps
5133 QuorumPeerMain
7662 Jps
[[email protected] ~]# /usr/local/hadoop/sbin/start-all.sh ==>> start the whole cluster
[[email protected] ~]# /usr/local/hadoop/sbin/yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-nn66.out
2. Cluster Verification
Check the NameNode states
# ./bin/hdfs haadmin -getServiceState nn1
# ./bin/hdfs haadmin -getServiceState nn2
Check the ResourceManager states
# ./bin/yarn rmadmin -getServiceState rm1
# ./bin/yarn rmadmin -getServiceState rm2
Check node information
# ./bin/hdfs dfsadmin -report
# ./bin/yarn node -list
Access files in the cluster
# ./bin/hadoop fs -mkdir /input
# ./bin/hadoop fs -ls hdfs://mycluster/
Test the Active/Standby failover (the full sequence is sketched below)
# ./sbin/hadoop-daemon.sh stop namenode
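A sketch of the full failover test (the transcript below shows the same thing step by step): check which NameNode is active, stop that namenode process on its host, then confirm the other one has taken over:
# ./bin/hdfs haadmin -getServiceState nn1 ==>> confirm which NameNode is currently active
# ./sbin/hadoop-daemon.sh stop namenode ==>> run on the host of the active NameNode (nn60 here)
# ./bin/hdfs haadmin -getServiceState nn2 ==>> the former standby should now report active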
[[email protected] ~]# /usr/local/hadoop/bin/hdfs haadmin -getServiceState nn1
active
[[email protected] ~]# /usr/local/hadoop/bin/hdfs haadmin -getServiceState nn2
standby
[[email protected] ~]# /usr/local/hadoop/bin/yarn rmadmin -getServiceState rm1
active
[[email protected] ~]# /usr/local/hadoop/bin/yarn rmadmin -getServiceState rm2
standby
[[email protected] ~]# /usr/local/hadoop/bin/hdfs dfsadmin -report
Live datanodes (3):
[[email protected] ~]# /usr/local/hadoop/bin/yarn node -list
Total Nodes:3
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
node61:45669 RUNNING node61:8042 0
node62:41813 RUNNING node62:8042 0
node63:41828 RUNNING node63:8042 0
[[email protected] ~]# /usr/local/hadoop/bin/hadoop fs -ls /
[[email protected] ~]# /usr/local/hadoop/bin/hadoop fs -mkdir /abc
[[email protected] ~]# /usr/local/hadoop/bin/hadoop fs -ls /
Found 1 items
drwxr-xr-x - root supergroup 0 2020-08-09 16:35 /abc
[[email protected] ~]# /usr/local/hadoop/bin/hadoop fs -put *.txt /abc
[[email protected] ~]# /usr/local/hadoop/bin/hadoop fs -ls hdfs://mycluster/aa
[[email protected] ~]# /usr/local/hadoop/bin/hdfs haadmin -getServiceState nn1
active
[[email protected] ~]# /usr/local/hadoop/sbin/hadoop-daemon.sh stop namenode
stopping namenode
[[email protected] ~]# /usr/local/hadoop/bin/hdfs haadmin -getServiceState nn1
20/08/09 16:39:27 INFO ipc.Client: Retrying connect to server: nn60/192.168.4.60:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Operation failed: Call From nn60/192.168.4.60 to nn60:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
[[email protected] ~]# /usr/local/hadoop/bin/hdfs haadmin -getServiceState nn2
active
[[email protected] ~]# /usr/local/hadoop/bin/hdfs haadmin -getServiceState nn2
active
[[email protected] ~]# /usr/local/hadoop/bin/yarn rmadmin -getServiceState rm1
active
[[email protected] ~]# /usr/local/hadoop/sbin/yarn-daemon.sh stop resourcemanager
stopping resourcemanager
[[email protected] ~]# /usr/local/hadoop/bin/yarn rmadmin -getServiceState rm2
active
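To bring the cluster back to its original state after these failover tests, the stopped daemons can simply be restarted on nn60; with automatic failover enabled they should come back in the standby role (a sketch, run on nn60):
# /usr/local/hadoop/sbin/hadoop-daemon.sh start namenode ==>> restart the stopped NameNode
# /usr/local/hadoop/sbin/yarn-daemon.sh start resourcemanager ==>> restart the stopped ResourceManager
# /usr/local/hadoop/bin/hdfs haadmin -getServiceState nn1 ==>> nn1 should now report standby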