5-Node Fully Distributed Hadoop + ZooKeeper Cluster (HA)
Host and role planning (yes = the role runs on that node):

role              master   slave1   slave2   slave3   slave4
namenode          yes      yes      no       no       no
datanode          no       no       yes      yes      yes
resourcemanager   yes      yes      no       no       no
journalnode       yes      yes      yes      yes      yes
zookeeper         yes      yes      yes      yes      yes

(In the /etc/hosts file and in the commands below these five nodes appear as cdh1 through cdh5.)
Software planning:

software      version              notes
JDK           JDK 1.7              latest stable release
zookeeper     zookeeper 3.4.6      stable release
hadoop        hadoop 2.7.3         stable release

(The steps below actually install the CDH 5.9.1 builds: zookeeper-3.4.5-cdh5.9.1 and hadoop-2.6.0-cdh5.9.1.)
Checks before installing the cluster:
- clock synchronization: date
- hosts file check: /etc/hosts
- disable the firewall: chkconfig iptables off
[root@cdh1 software]# cat /etc/hosts | grep cdh
192.168.137.141 cdh1
192.168.137.142 cdh2
192.168.137.143 cdh3
192.168.137.144 cdh4
192.168.137.145 cdh5
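The firewall item from the checklist is not shown above; on CentOS 6 (the iptables service is assumed here) it can be checked and disabled on each node with:
service iptables status          # show the current firewall state
service iptables stop            # stop it for the running system
chkconfig iptables off           # keep it disabled across reboots
chkconfig --list iptables        # all runlevels should now show "off"
Repeat this on every node (or run it later through runRemoteCmd.sh once the helper scripts are in place).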
[root@cdh1 software]# cat /etc/profile |grep JAVA
############JAVA_HOME############
export JAVA_HOME=/usr/java/jdk1.7.0_79
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
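A quick sanity check that the JDK settings are in effect on a node:
source /etc/profile
echo $JAVA_HOME                  # should print /usr/java/jdk1.7.0_79
java -version                    # should report 1.7.0_79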
----- Clock synchronization check
Check the current system time:
date
If the system time differs from the actual time, fix it as follows.
Use cdh1 as the NTP server; the other nodes take their time from cdh1.
First sync cdh1 itself against a public pool once:
ntpdate pool.ntp.org
Then edit /etc/ntp.conf on cdh1 in three places.
-> First change: allow hosts on the local network to query this server:
# Hosts on local network are less restricted.
restrict 192.168.137.0 mask 255.255.255.0 nomodify notrap
-> Second change: comment out the three public server entries, since the cluster has no external network access:
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool
#server 0.centos.pool.ntp.org
#server 1.centos.pool.ntp.org
#server 2.centos.pool.ntp.org
-> Third change: add the local clock as the fallback time source:
server 127.127.1.0 # local clock
fudge 127.127.1.0 stratum 10
-> After the changes, restart the ntpd service:
service ntpd restart
On the other nodes (cdh2-cdh5), use crontab -e to add a job that syncs the time against cdh1 every 10 minutes:
##sync time
0-59/10 * * * * /usr/sbin/ntpdate cdh1
Run a one-off sync immediately:
/usr/sbin/ntpdate cdh1
To also sync the hardware (BIOS) clock, edit /etc/sysconfig/ntpd on each node:
vi /etc/sysconfig/ntpd
# Drop root to id 'ntp:ntp' by default.
SYNC_HWCLOCK=yes            # allow syncing the hardware clock as well
OPTIONS="-u ntp:ntp -p /var/run/ntpd.pid -g"
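To verify the setup, the standard ntp tools can be queried, for example:
/usr/sbin/ntpdate -q cdh1        # on a client node: query cdh1 without changing the clock
ntpq -p                          # on cdh1: list upstream peers and their offsets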
(3) Create the hadoop group and user, configure sudo, and set up passwordless ssh
groupadd hadoop
useradd -g hadoop hadoop
passwd hadoop               # set the password
[root@cdh1 ~]# cat /etc/sudoers |grep hadoop
hadoop ALL=(ALL) NOPASSWD:ALL,/usr/sbin/passwd root
su - hadoop
ssh-keygen -t rsa
cd .ssh/
cat id_rsa.pub >> authorized_keys
chmod 600 authorized_keys
Run the steps above on every node in the cluster.
Then append every node's public key (id_rsa.pub) to the authorized_keys file on cdh1 (the master):
cat ~/.ssh/id_rsa.pub | ssh hadoop@cdh1 'cat >> ~/.ssh/authorized_keys'      # run this on every node
Finally distribute cdh1's authorized_keys file to all the other nodes:
scp -r authorized_keys hadoop@cdh2:~/.ssh/
scp -r authorized_keys hadoop@cdh3:~/.ssh/
scp -r authorized_keys hadoop@cdh4:~/.ssh/
scp -r authorized_keys hadoop@cdh5:~/.ssh/
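To confirm that passwordless login now works from any node to any other, a quick loop such as the following can be run on each node; every ssh should print the remote hostname without asking for a password:
for h in cdh1 cdh2 cdh3 cdh4 cdh5; do ssh hadoop@$h hostname; done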
(4) Helper scripts (deploy.sh / runRemoteCmd.sh)
mkdir -p /home/hadoop/tools
Upload the two scripts to this directory and make them executable.
[hadoop@cdh1 tools]$ cat deploy.conf
#### NOTES
# There is crontab job using this config file which would compact log files and remove old log file.
# please be carefully while modifying this file until you know what crontab exactly do
#hdp
cdh1,all,namenode,zookeeper,resourcemanager,
cdh2,all,slave,namenode,zookeeper,resourcemanager,
cdh3,all,slave,datanode,zookeeper,
cdh4,all,slave,datanode,zookeeper,
cdh5,all,slave,datanode,zookeeper,
The order of these lines must not be changed.
[hadoop@cdh1 tools]$ chmod u+x deploy.sh
[hadoop@cdh1 tools]$ chmod u+x runRemoteCmd.sh
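deploy.sh and runRemoteCmd.sh themselves are not reproduced in these notes. A minimal sketch of what they might look like, assuming the deploy.conf format shown above (hostname followed by comma-separated tags), is:

--- runRemoteCmd.sh (hypothetical sketch) ---
#!/bin/bash
# usage: runRemoteCmd.sh "<command>" <tag>
# run <command> over ssh on every host whose deploy.conf line contains <tag>
cmd=$1
tag=$2
confFile=/home/hadoop/tools/deploy.conf
for host in $(grep -v '^#' "$confFile" | awk -F, -v t="$tag" '$0 ~ t {print $1}'); do
    echo "*** $host ***"
    ssh "$host" "source ~/.bash_profile; $cmd"
done

--- deploy.sh (hypothetical sketch) ---
#!/bin/bash
# usage: deploy.sh <src> <destDir> <tag>
# copy a file or directory to <destDir> on every host whose deploy.conf line contains <tag>
src=$1
dest=$2
tag=$3
confFile=/home/hadoop/tools/deploy.conf
for host in $(grep -v '^#' "$confFile" | awk -F, -v t="$tag" '$0 ~ t {print $1}'); do
    echo "*** copying $src to $host:$dest ***"
    scp -r "$src" "$host:$dest"
done

The real scripts may differ; only their calling convention, as used in the rest of these notes, matters.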
Also add /home/hadoop/tools to the PATH:
[hadoop@cdh1 tools]$ su root
Password:
[root@cdh1 tools]# vi /etc/profile
PATH=/home/hadoop/tools:$PATH
export PATH
Use runRemoteCmd.sh to create the app directory on every node:
runRemoteCmd.sh "mkdir /home/hadoop/app" all
(5) Deploy ZooKeeper
tar xf zookeeper-3.4.5-cdh5.9.1.tar.gz
mv zookeeper-3.4.5-cdh5.9.1 hadoop-2.6.0-cdh5.9.1 sqoop-1.4.6-cdh5.9.1 hive-1.1.0-cdh5.9.1 hbase-1.2.0-cdh5.9.1 ./app/
[hadoop@cdh1 conf]$ pwd
/home/hadoop/app/zookeeper-3.4.5-cdh5.9.1/conf
[hadoop@cdh1 conf]$ cp -rp zoo_sample.cfg zoo.cfg
Add the following to zoo.cfg:
dataDir=/home/hadoop/data/zookeeper/zkdata
dataLogDir=/home/hadoop/data/zookeeper/zkdatalog
# the port at which the clients will connect
clientPort=2181
server.1=cdh1:2888:3888
server.2=cdh2:2888:3888
server.3=cdh3:2888:3888
mkdir -p /home/hadoop/data/zookeeper/zkdata
mkdir -p /home/hadoop/data/zookeeper/zkdatalog
Distribute the zookeeper directory to all nodes:
deploy.sh zookeeper-3.4.5-cdh5.9.1 /home/hadoop/app all
Use runRemoteCmd.sh to create the data and log directories on every node:
[hadoop@cdh1 app]$ runRemoteCmd.sh "mkdir -p /home/hadoop/data/zookeeper/zkdata" all      # data directory
[hadoop@cdh1 app]$ runRemoteCmd.sh "mkdir -p /home/hadoop/data/zookeeper/zkdatalog" all   # log directory
On cdh1, write this node's id into the myid file under zkdata:
[hadoop@cdh1 zkdata]$ pwd
/home/hadoop/data/zookeeper/zkdata
[hadoop@cdh1 zkdata]$ cat myid
1
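The myid file must be different on each ZooKeeper node and must match the server.N entries in zoo.cfg (server.1=cdh1, server.2=cdh2, server.3=cdh3). One way to set all three from cdh1:
echo 1 > /home/hadoop/data/zookeeper/zkdata/myid
ssh cdh2 'echo 2 > /home/hadoop/data/zookeeper/zkdata/myid'
ssh cdh3 'echo 3 > /home/hadoop/data/zookeeper/zkdata/myid'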
Update the environment variables:
[hadoop@cdh1 ~]$ cat .bash_profile |grep ZOOKEEPER
#############ZOOKEEPER_HOME##
export ZOOKEEPER_HOME=/home/hadoop/app/zookeeper-3.4.5-cdh5.9.1
export PATH=$PATH:$ZOOKEEPER_HOME/bin
deploy.sh /home/hadoop/.bash_profile /home/hadoop/ all                        # distribute the environment file
runRemoteCmd.sh "source /home/hadoop/.bash_profile" all
runRemoteCmd.sh "$ZOOKEEPER_HOME/bin/zkServer.sh start" zookeeper             # start zookeeper
[hadoop@cdh1 app]$ runRemoteCmd.sh "jps" zookeeper                            # check the zookeeper processes
[hadoop@cdh1 app]$ runRemoteCmd.sh "$ZOOKEEPER_HOME/bin/zkServer.sh status" zookeeper
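zkServer.sh status should report one leader and two followers. As a further check, connect to the ensemble with the ZooKeeper CLI, for example:
$ZOOKEEPER_HOME/bin/zkCli.sh -server cdh1:2181      # connect to the ensemble
ls /                                                # inside the CLI; a fresh ensemble lists at least /zookeeper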
============== Install the Hadoop cluster ==============
[hadoop@cdh1 hadoop]$ pwd
/home/hadoop/app/hadoop-2.6.0-cdh5.9.1/etc/hadoop
Manually add JAVA_HOME to hadoop-env.sh, mapred-env.sh and yarn-env.sh:
export JAVA_HOME=/usr/java/jdk1.7.0_79
Edit core-site.xml (all of the following property blocks go inside the <configuration> element):
<property>
<name>fs.defaultFS</name>
<value>hdfs://cluster1</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/data/tmp</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>cdh1:2181,cdh2:2181,cdh3:2181</value>
</property>
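hadoop.tmp.dir above points to /home/hadoop/data/tmp; the directory can be created on every node up front with the runRemoteCmd.sh helper, e.g.:
runRemoteCmd.sh "mkdir -p /home/hadoop/data/tmp" all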
Edit hdfs-site.xml:
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- block replication factor is 3 -->
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<!-- HDFS permission checking disabled -->
<property>
<name>dfs.nameservices</name>
<value>cluster1</value>
</property>
<!-- nameservice ID; must match the value of fs.defaultFS. With NameNode HA there are two NameNodes, and cluster1 is the single logical entry point exposed to clients -->
<property>
<name>dfs.ha.namenodes.cluster1</name>
<value>cdh1,cdh2</value>
</property>
<!-- the NameNodes belonging to nameservice cluster1; these are logical names and only need to be unique -->
<property>
<name>dfs.namenode.rpc-address.cluster1.cdh1</name>
<value>cdh1:9000</value>
</property>
<!-- cdh1 RPC address -->
<property>
<name>dfs.namenode.http-address.cluster1.cdh1</name>
<value>cdh1:50070</value>
</property>
<!-- cdh1 HTTP address -->
<property>
<name>dfs.namenode.rpc-address.cluster1.cdh2</name>
<value>cdh2:9000</value>
</property>
<!-- cdh2 RPC address -->
<property>
<name>dfs.namenode.http-address.cluster1.cdh2</name>
<value>cdh2:50070</value>
</property>
<!-- cdh2 HTTP address -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- enable automatic failover -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://cdh1:8485;cdh2:8485;cdh3:8485/cluster1</value>
</property>
<!-- shared edits directory on the JournalNode quorum -->
<property>
<name>dfs.client.failover.proxy.provider.cluster1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- the class responsible for failing over to the other NameNode when cluster1's active NameNode fails -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop/data/journaldata/jn</value>
</property>
<!-- local disk path where each JournalNode stores the shared edits -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>shell(/bin/true)</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>10000</value>
</property>
<!-- fencing settings to avoid split-brain -->
<property>
<name>dfs.namenode.handler.count</name>
<value>100</value>
</property>
Edit the slaves file (it lists the DataNode hosts, matching the planning table):
cdh3
cdh4
cdh5
Edit mapred-site.xml (create it from mapred-site.xml.template if it does not already exist):
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- jobhistory properties -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>cdh1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>cdh1:19888</value>
</property>
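Note that start-dfs.sh / start-yarn.sh (used at the end of these notes) do not start the JobHistory server configured above; once the cluster is up it can be started separately on cdh1 with:
sbin/mr-jobhistory-daemon.sh start historyserver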
Edit yarn-site.xml:
<property>
<name>yarn.resourcemanager.connect.retry-interval.ms</name>
<value>2000</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-rm-cluster</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>cdh1</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>cdh2</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.zk.state-store.address</name>
<value>cdh1:2181,cdh2:2181,cdh3:2181</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>cdh1:2181,cdh2:2181,cdh3:2181</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>cdh1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>cdh1:8034</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>cdh1:8088</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>cdh2:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>cdh2:8034</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>cdh2:8088</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
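Before starting anything, the configured Hadoop directory still has to be copied to the other nodes and the JournalNode storage path created. With the helper scripts from earlier this might look like:
deploy.sh /home/hadoop/app/hadoop-2.6.0-cdh5.9.1 /home/hadoop/app all         # push hadoop with the edited configs to every node
runRemoteCmd.sh "mkdir -p /home/hadoop/data/journaldata/jn" all               # dfs.journalnode.edits.dir from hdfs-site.xml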
Start the JournalNodes on all nodes:
runRemoteCmd.sh "/home/hadoop/app/hadoop-2.6.0-cdh5.9.1/sbin/hadoop-daemon.sh start journalnode" all
Format HDFS on cdh1:
hdfs namenode -format
Stop the JournalNodes again (run from cdh1; start-dfs.sh brings them back up):
runRemoteCmd.sh "/home/hadoop/app/hadoop-2.6.0-cdh5.9.1/sbin/hadoop-daemon.sh stop journalnode" all
Start HDFS and YARN from cdh1:
sbin/start-dfs.sh
sbin/start-yarn.sh
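The first-time HA initialization and the standby daemons are not spelled out above. A typical sequence for the configuration shown here (a sketch, not necessarily the exact steps used originally) would be roughly:

# on cdh1, after "hdfs namenode -format": initialize the HA state in ZooKeeper
bin/hdfs zkfc -formatZK

# on cdh1: start the freshly formatted NameNode so the standby can copy from it
sbin/hadoop-daemon.sh start namenode

# on cdh2: pull the NameNode metadata from the active NameNode
bin/hdfs namenode -bootstrapStandby

# then run start-dfs.sh / start-yarn.sh on cdh1 as above, and finally on cdh2:
sbin/yarn-daemon.sh start resourcemanager      # start-yarn.sh only starts the local ResourceManager

# basic verification
hdfs haadmin -getServiceState cdh1             # expect one active and one standby NameNode
hdfs haadmin -getServiceState cdh2
yarn rmadmin -getServiceState rm1              # expect one active and one standby ResourceManager
yarn rmadmin -getServiceState rm2
runRemoteCmd.sh "jps" all                      # check the expected daemons on every node
# web UIs: http://cdh1:50070 (HDFS) and http://cdh1:8088 (YARN)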