5-node Hadoop + ZooKeeper fully distributed cluster (HA)

Host planning 


Hosts: master, slave1, slave2, slave3, slave4
Roles to distribute across them: namenode, datanode, resourcemanager, journalnode, zookeeper
(The exact role-to-host mapping used in this walkthrough can be read from the configuration files below.)


Software planning 


Software     Version             Notes
JDK          JDK 1.7             latest stable
zookeeper    zookeeper 3.4.6     stable release
hadoop       hadoop 2.7.3        stable release
(The install steps below actually use the CDH 5.9.1 builds: zookeeper-3.4.5-cdh5.9.1 and hadoop-2.6.0-cdh5.9.1.)


Pre-install checks 


Clock synchronization      date
hosts file check           /etc/hosts
Disable the firewall       chkconfig iptables off
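The three checks can be run on every node up front; a minimal sketch (assuming CentOS 6, where the iptables service and chkconfig are used as above):

date                      # confirm the system clock
cat /etc/hosts            # confirm all five cdh entries are present
service iptables stop     # stop the firewall immediately
chkconfig iptables off    # keep it disabled after reboot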







[root@cdh1 software]# cat /etc/hosts | grep cdh
192.168.137.141        cdh1 
192.168.137.142        cdh2 
192.168.137.143        cdh3 
192.168.137.144        cdh4
192.168.137.145        cdh5 






[root@cdh1 software]# cat /etc/profile |grep JAVA
############JAVA_HOME##########
export JAVA_HOME=/usr/java/jdk1.7.0_79
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
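After editing /etc/profile, the JDK can be verified; a quick check (the exact version string depends on your build):

source /etc/profile
java -version     # should report a 1.7.0_79 JDK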








----- Clock synchronization check 
Check the current system time:
date 
If the system time does not match the real time, do the following.
Use cdh1 as the NTP server; the other nodes take their time from cdh1.


ntpdate pool.ntp.org


On cdh1, edit /etc/ntp.conf.
-》First modification: allow hosts on the local network to sync from this server
# Hosts on local network are less restricted.
restrict 192.168.137.0 mask 255.255.255.0 nomodify notrap


-》Second modification: comment out the three public servers, since there is no external network access
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool 
#server 0.centos.pool.ntp.org
#server 1.centos.pool.ntp.org
#server 2.centos.pool.ntp.org
-》Third modification: add the local clock as the NTP source
server  127.127.1.0     # local clock
fudge   127.127.1.0 stratum 10
-》After the changes, restart the ntpd service
service ntpd restart




On the other nodes, use crontab -e to add a scheduled job that syncs the time automatically:
##sync time
0-59/10 * * * * /usr/sbin/ntpdate cdh1


Run a one-off time sync:
/usr/sbin/ntpdate cdh1


vi /etc/sysconfig/ntpd
Sync the hardware (BIOS) clock as well:
# Drop root to id 'ntp:ntp' by default.
SYNC_HWCLOCK=yes   # allow the system time to be written back to the BIOS clock
OPTIONS="-u ntp:ntp -p /var/run/ntpd.pid -g"
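Once ntpd is restarted on cdh1 and the cron job is in place on the other nodes, synchronization can be verified; a minimal sketch (ntpq ships with the ntp package):

ntpq -p                   # on cdh1: the local clock source should be listed and selected
/usr/sbin/ntpdate cdh1    # on cdh2..cdh5: force an immediate sync against cdh1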


(3) Create the hadoop group and user, set up sudo, and configure passwordless SSH


groupadd hadoop
useradd -g hadoop hadoop
passwd hadoop   --- set the password 




[root@cdh1 ~]# cat /etc/sudoers |grep hadoop
hadoop  ALL=(ALL)       NOPASSWD:ALL,/usr/sbin/passwd root


su - hadoop 
ssh-keygen -t rsa
cd .ssh/
cat id_rsa.pub >> authorized_keys
chmod 600 authorized_keys
Run the steps above on every node in the cluster.
Then append every node's public key id_rsa.pub to the authorized_keys file on hadoop-master (cdh1):
cat ~/.ssh/id_rsa.pub | ssh hadoop@cdh1 'cat >> ~/.ssh/authorized_keys'     (run this command on every node)


Then distribute the authorized_keys file from hadoop-master (cdh1) to all the other nodes:
scp -r authorized_keys hadoop@cdh2:~/.ssh/


scp -r authorized_keys hadoop@cdh3:~/.ssh/


scp -r authorized_keys hadoop@cdh4:~/.ssh/


scp -r authorized_keys hadoop@cdh5:~/.ssh/
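Passwordless login can then be confirmed from cdh1; a minimal sketch (each hostname should print without a password prompt):

for h in cdh1 cdh2 cdh3 cdh4 cdh5; do
    ssh hadoop@"$h" hostname
done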






(4) Using the helper scripts


mkdir -p /home/hadoop/tools
Upload the scripts into this directory and make them executable.


[hadoop@cdh1 tools]$ cat deploy.conf
#### NOTES
# There is a crontab job using this config file which compacts log files and removes old log files.
# Please be careful when modifying this file unless you know exactly what that crontab job does.
#hdp
hadoop-master,all,namenode,zookeeper,resourcemanager,
cdh1,all,slave,namenode,zookeeper,resourcemanager,
cdh2,all,slave,datanode,zookeeper,
cdh3,all,slave,datanode,zookeeper,
cdh4,all,slave,datanode,zookeeper,


The order of the lines above must not be changed. 




[hadoop@cdh1 tools]$ chmod u+x deploy.sh
[hadoop@cdh1 tools]$ chmod u+x runRemoteCmd.sh


We also need to add the /home/hadoop/tools directory to the PATH.


[hadoop@cdh1 tools]$ su root
Password:
[root@cdh1 tools]# vi /etc/profile
PATH=/home/hadoop/tools:$PATH 
export PATH
 


runRemoteCmd.sh "mkdir /home/hadoop/app" all 
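The deploy.sh and runRemoteCmd.sh scripts themselves are not shown in this walkthrough. A minimal sketch of what they might look like, assuming the deploy.conf format above (host,tag,tag,... per line) and the usage deploy.sh <src> <dest_dir> <tag> and runRemoteCmd.sh "<command>" <tag>; the scripts you actually upload may differ:

#!/bin/bash
# deploy.sh -- copy a file or directory to every host whose deploy.conf line carries the given tag
src=$1; dest=$2; tag=$3
conf=/home/hadoop/tools/deploy.conf
grep -v '^#' "$conf" | awk -F, -v t="$tag" '{for(i=2;i<=NF;i++) if($i==t){print $1; break}}' |
while read -r host; do
    scp -r "$src" "$host":"$dest"
done

#!/bin/bash
# runRemoteCmd.sh -- run a shell command on every host whose deploy.conf line carries the given tag
cmd=$1; tag=$2
conf=/home/hadoop/tools/deploy.conf
grep -v '^#' "$conf" | awk -F, -v t="$tag" '{for(i=2;i<=NF;i++) if($i==t){print $1; break}}' |
while read -r host; do
    echo "*** $host ***"
    ssh "$host" "source ~/.bash_profile; $cmd"
done

Sourcing ~/.bash_profile before the remote command is what lets later calls such as runRemoteCmd.sh "jps" zookeeper find Java and ZooKeeper on each node.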


(5) Deploy ZooKeeper
tar xf zookeeper-3.4.5-cdh5.9.1.tar.gz
mv zookeeper-3.4.5-cdh5.9.1 hadoop-2.6.0-cdh5.9.1 sqoop-1.4.6-cdh5.9.1 hive-1.1.0-cdh5.9.1 hbase-1.2.0-cdh5.9.1 ./app/






[hadoop@cdh1 conf]$ pwd
/home/hadoop/app/zookeeper-3.4.5-cdh5.9.1/conf


[hadoop@cdh1 conf]$ cp -rp zoo_sample.cfg zoo.cfg


Add the following to zoo.cfg:
dataDir=/home/hadoop/data/zookeeper/zkdata  
dataLogDir=/home/hadoop/data/zookeeper/zkdatalog
# the port at which the clients will connect
clientPort=2181
server.1=cdh1:2888:3888
server.2=cdh2:2888:3888
server.3=cdh3:2888:3888


mkdir -p /home/hadoop/data/zookeeper/zkdata
mkdir -p /home/hadoop/data/zookeeper/zkdatalog
deploy.sh  zookeeper-3.4.5-cdh5.9.1 /home/hadoop/app all 


 
Create the directories on all nodes with the remote command helper runRemoteCmd.sh:


[hadoop@cdh1 app]$ runRemoteCmd.sh "mkdir -p /home/hadoop/data/zookeeper/zkdata" all      // data directory
[hadoop@cdh1 app]$ runRemoteCmd.sh "mkdir -p /home/hadoop/data/zookeeper/zkdatalog" all   // log directory
 
On each ZooKeeper node, the myid file under zkdata must match the server.N id from zoo.cfg. On cdh1:
[hadoop@cdh1 zkdata]$ pwd
/home/hadoop/data/zookeeper/zkdata
[hadoop@cdh1 zkdata]$ cat myid
1
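zoo.cfg maps server.1, server.2 and server.3 to cdh1, cdh2 and cdh3, so each node's myid has to carry the matching number. A minimal sketch that writes all three from cdh1 over ssh:

i=1
for h in cdh1 cdh2 cdh3; do
    ssh hadoop@"$h" "echo $i > /home/hadoop/data/zookeeper/zkdata/myid"
    i=$((i+1))
done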


Update the environment variables 


[hadoop@cdh1 ~]$ cat .bash_profile |grep ZOOKEEPER
#############ZOOKEEPER_HOME##
export ZOOKEEPER_HOME=/home/hadoop/app/zookeeper-3.4.5-cdh5.9.1
export PATH=$PATH:$ZOOKEEPER_HOME/bin


deploy.sh /home/hadoop/.bash_profile /home/hadoop/ all      (distribute the environment variables)
runRemoteCmd.sh "source /home/hadoop/.bash_profile" all
runRemoteCmd.sh "$ZOOKEEPER_HOME/bin/zkServer.sh start" zookeeper      (start ZooKeeper)


[hadoop@cdh1 app]$ runRemoteCmd.sh "jps" zookeeper      (check the ZooKeeper processes)


[hadoop@cdh1 app]$ runRemoteCmd.sh "$ZOOKEEPER_HOME/bin/zkServer.sh status" zookeeper




============== Install the Hadoop cluster ============= 


[hadoop@cdh1 hadoop]$ pwd
/home/hadoop/app/hadoop-2.6.0-cdh5.9.1/etc/hadoop


Manually add JAVA_HOME to hadoop-env.sh, mapred-env.sh, and yarn-env.sh:
export JAVA_HOME=/usr/java/jdk1.7.0_79
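Instead of opening each env file by hand, the same export line can be appended in one go; a minimal sketch run from the etc/hadoop directory shown above:

cd /home/hadoop/app/hadoop-2.6.0-cdh5.9.1/etc/hadoop
for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do
    echo 'export JAVA_HOME=/usr/java/jdk1.7.0_79' >> "$f"
done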




Edit the core-site.xml file:
<property>
        <name>fs.defaultFS</name>
        <value>hdfs://cluster1</value>
</property>
<property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/data/tmp</value>
</property>
<property>
        <name>ha.zookeeper.quorum</name>
        <value>cdh1:2181,cdh2:2181,cdh3:2181</value>
</property>








Edit the hdfs-site.xml file:
<property>
        <name>dfs.replication</name>
        <value>3</value>
</property>
<!-- block replication factor of 3 -->
<property>
        <name>dfs.permissions</name>
        <value>false</value>
</property>
<property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
</property>
<!-- HDFS permission checking disabled -->
<property>
        <name>dfs.nameservices</name>
        <value>cluster1</value>
</property>
<!-- nameservice; must match the value of fs.defaultFS. With HA there are two namenodes, and cluster1 is the single logical entry point exposed to clients -->
<property>
        <name>dfs.ha.namenodes.cluster1</name>
        <value>cdh1,cdh2</value>
</property>
<!-- logical IDs of the namenodes under nameservice cluster1; the names are arbitrary as long as they do not repeat -->
<property>
        <name>dfs.namenode.rpc-address.cluster1.cdh1</name>
        <value>cdh1:9000</value>
</property>
<!-- cdh1 RPC address -->
<property>
        <name>dfs.namenode.http-address.cluster1.cdh1</name>
        <value>cdh1:50070</value>
</property>
<!-- cdh1 HTTP address -->
<property>
        <name>dfs.namenode.rpc-address.cluster1.cdh2</name>
        <value>cdh2:9000</value>
</property>
<!-- cdh2 RPC address -->
<property>
        <name>dfs.namenode.http-address.cluster1.cdh2</name>
        <value>cdh2:50070</value>
</property>
<!-- cdh2 HTTP address -->
<property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
</property>
<!-- enable automatic failover -->
<property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://cdh1:8485;cdh2:8485;cdh3:8485/cluster1</value>
</property>
<!-- JournalNode quorum that holds the shared edit log -->
<property>
        <name>dfs.client.failover.proxy.provider.cluster1</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- class that locates the active namenode of cluster1 when a failover happens -->
<property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/hadoop/data/journaldata/jn</value>
</property>
<!-- local disk path where the JournalNodes store the shared namenode edits -->
<property>
        <name>dfs.ha.fencing.methods</name>
        <value>shell(/bin/true)</value>
</property>
<property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>10000</value>
</property>
<!-- fencing settings to guard against split-brain -->
<property>
        <name>dfs.namenode.handler.count</name>
        <value>100</value>
</property>
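hadoop.tmp.dir (core-site.xml) and dfs.journalnode.edits.dir (hdfs-site.xml) point at directories that are never created elsewhere in this walkthrough; Hadoop will usually create them on demand, but creating them up front avoids permission surprises. A sketch using the same helper script:

runRemoteCmd.sh "mkdir -p /home/hadoop/data/tmp" all
runRemoteCmd.sh "mkdir -p /home/hadoop/data/journaldata/jn" all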




Edit the slaves file:
cdh2
cdh3


Edit mapred-site.xml:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>


<!-- jobhistory properties -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>cdh1:10020</value>
</property>


<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>cdh1:19888</value>
</property>




Edit yarn-site.xml:






<property>
        <name>yarn.resourcemanager.connect.retry-interval.ms</name>
        <value>2000</value>
</property>
<property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
</property>
<property>
        <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
        <value>true</value>
</property>
<property>
        <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
        <value>true</value>
</property>
<property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yarn-rm-cluster</value>
</property>
<property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
</property>
<property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>cdh1</value>
</property>
<property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>cdh2</value>
</property>
<property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
</property>
<property>
        <name>yarn.resourcemanager.zk.state-store.address</name>
        <value>cdh1:2181,cdh2:2181,cdh3:2181</value>
</property>
<property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>cdh1:2181,cdh2:2181,cdh3:2181</value>
</property>
<property>
        <name>yarn.resourcemanager.address.rm1</name>
        <value>cdh1:8032</value>
</property>
<property>
        <name>yarn.resourcemanager.scheduler.address.rm1</name>
        <value>cdh1:8034</value>
</property>
<property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>cdh1:8088</value>
</property>
<property>
        <name>yarn.resourcemanager.address.rm2</name>
        <value>cdh2:8032</value>
</property>
<property>
        <name>yarn.resourcemanager.scheduler.address.rm2</name>
        <value>cdh2:8034</value>
</property>
<property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>cdh2:8088</value>
</property>
<property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
</property>
<property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
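The walkthrough configures Hadoop only on cdh1; before any daemons are started, the configured directory presumably has to be pushed to the other nodes, the same way the ZooKeeper directory was. A sketch using the same deploy.sh helper:

deploy.sh /home/hadoop/app/hadoop-2.6.0-cdh5.9.1 /home/hadoop/app all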








runRemoteCmd.sh "/home/hadoop/app/hadoop-2.6.0-cdh5.9.1/sbin/hadoop-daemon.sh start journalnode" all
hdfs namenode -format      (format the namenode; run on cdh1 only)


runRemoteCmd.sh "/home/hadoop/app/hadoop-2.6.0-cdh5.9.1/sbin/hadoop-daemon.sh stop journalnode" all      (run from the cdh1 node)


start-dfs.sh
start-yarn.sh
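The steps above stop at formatting the first namenode and running the start scripts; a typical Hadoop 2.x HA bring-up also synchronises the standby namenode and initialises the failover-controller state in ZooKeeper. A minimal sketch of those remaining commands (standard Hadoop 2.x CLI, paths assume the layout used above; verify against your build), run from the hadoop-2.6.0-cdh5.9.1 directory on cdh1:

sbin/hadoop-daemon.sh start namenode                           # start the freshly formatted namenode on cdh1
ssh cdh2 '/home/hadoop/app/hadoop-2.6.0-cdh5.9.1/bin/hdfs namenode -bootstrapStandby'   # copy its metadata to the standby
bin/hdfs zkfc -formatZK                                        # create the HA znode used for automatic failover
sbin/start-dfs.sh                                              # brings up namenodes, datanodes, journalnodes and zkfc
sbin/start-yarn.sh                                             # brings up the local resourcemanager and the nodemanagers
ssh cdh2 '/home/hadoop/app/hadoop-2.6.0-cdh5.9.1/sbin/yarn-daemon.sh start resourcemanager'   # the standby RM is not started by start-yarn.sh
bin/hdfs haadmin -getServiceState cdh1                         # check which namenode is active
bin/yarn rmadmin -getServiceState rm1                          # check which resourcemanager is active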