Building a Fully Distributed Hadoop Cluster with Docker
I. Environment
1. Linux
[root@localhost docker-hadoop]# uname -a
Linux localhost.localdomain 3.10.0-957.10.1.el7.x86_64 #1 SMP Mon Mar 18 15:06:45 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost docker-hadoop]# cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core)
2. Docker
[root@localhost docker-hadoop]# docker version
Client:
Version: 18.09.4
API version: 1.39
Go version: go1.10.8
Git commit: d14af54266
Built: Wed Mar 27 18:34:51 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.4
API version: 1.39 (minimum version 1.12)
Go version: go1.10.8
Git commit: d14af54
Built: Wed Mar 27 18:04:46 2019
OS/Arch: linux/amd64
Experimental: false
[root@localhost docker-hadoop]#
3. Java
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
4. Hadoop
[root@hadoop2 /]# hadoop version
Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.3.jar
[root@hadoop2 /]#
II. Machine Plan
Three machines: one master and two slaves.
Hostname: hadoop2, IP address: 172.19.0.2 (master)
Hostname: hadoop3, IP address: 172.19.0.3 (slave)
Hostname: hadoop4, IP address: 172.19.0.4 (slave)
III. Building the Images
1. Build the centos-ssh image
Note: Docker Hub already has an image with the SSH service installed, komukomo/centos-sshd, so I pull it directly here rather than building my own.
Run a container:
docker run -itd --name centos-ssh komukomo/centos-sshd /bin/bash
Start the SSH service:
[root@centos-ssh /]# /usr/sbin/sshd
[root@centos-ssh /]#
Check that the SSH service is running:
[root@centos-ssh /]# netstat -antp | grep sshd
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 19/sshd
tcp 0 0 :::22 :::* LISTEN 19/sshd
[root@centos-ssh /]#
Set up passwordless login:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
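The same setup can be done in one shot. This is a sketch that also sets the directory and file permissions OpenSSH expects (the permission values are standard sshd conventions, not from the original transcript), and skips key generation if a key already exists:

```shell
# Passwordless-login setup; sshd rejects keys when permissions are too loose.
mkdir -p ~/.ssh && chmod 700 ~/.ssh
if [ ! -f ~/.ssh/id_rsa ] && command -v ssh-keygen >/dev/null 2>&1; then
  ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
fi
touch ~/.ssh/authorized_keys
if [ -f ~/.ssh/id_rsa.pub ]; then
  cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
fi
chmod 600 ~/.ssh/authorized_keys
```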
Verify that passwordless login works (the password is root):
[root@centos-ssh /]# ssh root@localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is e5:ab:55:1b:73:c4:51:33:c6:3b:45:a0:b2:34:e7:74.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
[root@centos-ssh ~]# ssh root@localhost
Last login: Fri Apr 12 07:20:00 2019 from localhost
[root@centos-ssh ~]# exit
logout
Connection to localhost closed.
[root@centos-ssh ~]#
Commit the container as a new centos-ssh image:
[root@localhost ~]# docker commit centos-ssh centos-ssh
sha256:97ef260595ae36d81c9f26b6ed0ed5d13502b7699e079554928ec8cc6fc1b159
[root@localhost ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
centos-ssh latest 97ef260595ae 4 seconds ago 410MB
centos7-ssh latest 4e4796f7e8ef About an hour ago 289MB
<none> <none> e08ee32cfd93 About an hour ago 289MB
<none> <none> 3cff40060339 About an hour ago 289MB
centos-tools latest bb563754f296 4 hours ago 391MB
jquery134/mycentos v1.0 8c63d14863d3 4 days ago 354MB
tomcat latest f1332ae3f570 13 days ago 463MB
nginx latest 2bcb04bdb83f 2 weeks ago 109MB
centos latest 9f38484d220f 4 weeks ago 202MB
ubuntu latest 94e814e2efa8 4 weeks ago 88.9MB
jdeathe/centos-ssh latest f68976440f24 6 weeks ago 226MB
komukomo/centos-sshd latest d969d0bdc7ac 2 years ago 289MB
[root@localhost ~]#
2. Build the hadoop image on top of centos-ssh
Build context on the host: the Dockerfile sits alongside jdk-8u101-linux-x64.tar.gz and hadoop-2.7.3.tar.gz, which the ADD instructions below reference.
Dockerfile:
FROM centos-ssh
ADD jdk-8u101-linux-x64.tar.gz /usr/local/
RUN mv /usr/local/jdk1.8.0_101 /usr/local/jdk1.8
ENV JAVA_HOME /usr/local/jdk1.8
ENV PATH $JAVA_HOME/bin:$PATH
ADD hadoop-2.7.3.tar.gz /usr/local
RUN mv /usr/local/hadoop-2.7.3 /usr/local/hadoop
ENV HADOOP_HOME /usr/local/hadoop
ENV PATH $HADOOP_HOME/bin:$PATH
RUN yum install -y which sudo
Build the hadoop image:
[root@localhost Hadoop]# docker build -t="hadoop" .
3. Create three containers from the hadoop image and start the SSH service in each
Create a custom network:
[root@localhost Hadoop]# docker network create --subnet=172.19.0.0/16 mynetwork
522bc0ed2d6048e5f303245d0c85ae36e62d0735f1d2e9ca5c73a11f103c1954
[root@localhost Hadoop]# docker network ls
NETWORK ID NAME DRIVER SCOPE
76cf156331a1 bridge bridge local
2059529b97fd bridge1 bridge local
2c43b9b438d5 host host local
522bc0ed2d60 mynetwork bridge local
6c75caf7d102 none null local
[root@localhost Hadoop]#
Run the containers:
[root@localhost Hadoop]# docker run -itd --name hadoop2 --net mynetwork --ip 172.19.0.2 --add-host hadoop2:172.19.0.2 --add-host hadoop3:172.19.0.3 --add-host hadoop4:172.19.0.4 -P hadoop /bin/bash
10c1a242c22efd92d8f9007f4f51f5ff6c9e4511daa6d5fd29152ab1ac43c0e5
[root@localhost Hadoop]# docker run -itd --name hadoop3 --net mynetwork --ip 172.19.0.3 --add-host hadoop2:172.19.0.2 --add-host hadoop3:172.19.0.3 --add-host hadoop4:172.19.0.4 -P hadoop /bin/bash
8276aa51a9584ba23aab9cbcc069a157ea34f95cb21eba67189f1bc7347cca81
[root@localhost Hadoop]# docker run -itd --name hadoop4 --net mynetwork --ip 172.19.0.4 --add-host hadoop2:172.19.0.2 --add-host hadoop3:172.19.0.3 --add-host hadoop4:172.19.0.4 -P hadoop /bin/bash
ea17f5a50d5a1c5e2effe26c84e93387440debb91316026a9c7f5dc3700cca56
[root@localhost Hadoop]#
Start the SSH service in each of the three containers:
[root@localhost Hadoop]# docker exec -d hadoop2 /usr/sbin/sshd
[root@localhost Hadoop]# docker exec -d hadoop3 /usr/sbin/sshd
[root@localhost Hadoop]# docker exec -d hadoop4 /usr/sbin/sshd
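The three exec calls can also be written as a loop. A sketch, assuming the three containers are already up; it is guarded so it no-ops on a machine without docker:

```shell
# Start sshd inside every cluster container.
NODES="hadoop2 hadoop3 hadoop4"
if command -v docker >/dev/null 2>&1; then
  for n in $NODES; do
    docker exec -d "$n" /usr/sbin/sshd
  done
fi
```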
Verify the environment: Java, Hadoop, SSH, network connectivity, and passwordless login:
[root@hadoop2 /]# java -version
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
[root@hadoop2 /]# javac -version
javac 1.8.0_101
[root@hadoop2 /]# ssh root@172.19.0.3
Last login: Fri Apr 12 08:07:46 2019 from hadoop2
[root@hadoop3 ~]# exit
logout
Connection to 172.19.0.3 closed.
[root@hadoop2 /]# hadoop version
Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.3.jar
[root@hadoop2 /]# ping hadoop3
PING hadoop3 (172.19.0.3) 56(84) bytes of data.
64 bytes from hadoop3 (172.19.0.3): icmp_seq=1 ttl=64 time=0.248 ms
64 bytes from hadoop3 (172.19.0.3): icmp_seq=2 ttl=64 time=0.145 ms
^C
--- hadoop3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1936ms
rtt min/avg/max/mdev = 0.145/0.196/0.248/0.053 ms
[root@hadoop2 /]# ping hadoop4
PING hadoop4 (172.19.0.4) 56(84) bytes of data.
64 bytes from hadoop4 (172.19.0.4): icmp_seq=1 ttl=64 time=0.233 ms
64 bytes from hadoop4 (172.19.0.4): icmp_seq=2 ttl=64 time=0.095 ms
^C
--- hadoop4 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1754ms
rtt min/avg/max/mdev = 0.095/0.164/0.233/0.069 ms
4. Configure Hadoop
In /usr/local/hadoop/etc/hadoop/hadoop-env.sh, add the JAVA_HOME setting:
export JAVA_HOME=/usr/local/jdk1.8
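This can also be done from the shell instead of hand-editing. A sketch where HADOOP_CONF is a placeholder that falls back to a scratch directory for illustration; on the real nodes, point it at /usr/local/hadoop/etc/hadoop:

```shell
# Append the JAVA_HOME line to hadoop-env.sh non-interactively.
HADOOP_CONF=${HADOOP_CONF:-$(mktemp -d)}   # placeholder; use the real conf dir in practice
touch "$HADOOP_CONF/hadoop-env.sh"
echo 'export JAVA_HOME=/usr/local/jdk1.8' >> "$HADOOP_CONF/hadoop-env.sh"
```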
core-site.xml (note: fs.default.name is the deprecated alias of fs.defaultFS; it still works in Hadoop 2.x):
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop2/</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
</configuration>
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop2:9001</value>
<description>Lets you check HDFS status through the web UI</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Each block is kept as 2 replicas</description>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop2:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop2:19888</value>
</property>
</configuration>
yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop2:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop2:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop2:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoop2:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop2:8088</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>1</value>
</property>
</configuration>
slaves
hadoop3
hadoop4
Copy the configured Hadoop directory to hadoop3 and hadoop4:
scp -rq /usr/local/hadoop hadoop3:/usr/local
scp -rq /usr/local/hadoop hadoop4:/usr/local
Format the NameNode (run on the master, only before the first start):
bin/hdfs namenode -format
Start the cluster by running the start-all.sh script on the master:
/usr/local/hadoop/sbin/start-all.sh
Check which daemons came up:
On hadoop2:
[root@hadoop2 bin]# jps
643 ResourceManager
310 NameNode
492 SecondaryNameNode
956 Jps
[root@hadoop2 bin]#
On hadoop3:
[root@hadoop3 /]# jps
369 Jps
153 DataNode
250 NodeManager
[root@hadoop3 /]#
On hadoop4:
[root@hadoop4 /]# jps
144 NodeManager
263 Jps
47 DataNode
[root@hadoop4 /]#
Note: you can commit hadoop2, hadoop3, and hadoop4 as images, which makes it easy to re-create the containers later with different port mappings and the like:
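The commit step itself might look like the following. A sketch, assuming the three containers exist and you want images named after the nodes; it is guarded so it no-ops without docker. After committing, the old containers are removed so they can be re-run from the new images with the desired port mappings:

```shell
# Commit each running container to an image of the same name, then remove it.
NODES="hadoop2 hadoop3 hadoop4"
if command -v docker >/dev/null 2>&1; then
  for n in $NODES; do
    docker commit "$n" "$n"   # image name matches the node name
    docker rm -f "$n"         # free the name before re-running
  done
fi
```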
[root@localhost Hadoop]# docker run -itd --name hadoop2 --net mynetwork --ip 172.19.0.2 --add-host hadoop2:172.19.0.2 --add-host hadoop3:172.19.0.3 --add-host hadoop4:172.19.0.4 -p 8088:8088 -p 50070:50070 -p 19888:19888 hadoop2 /bin/bash
10c1a242c22efd92d8f9007f4f51f5ff6c9e4511daa6d5fd29152ab1ac43c0e5
[root@localhost Hadoop]# docker run -itd --name hadoop3 --net mynetwork --ip 172.19.0.3 --add-host hadoop2:172.19.0.2 --add-host hadoop3:172.19.0.3 --add-host hadoop4:172.19.0.4 -P hadoop3 /bin/bash
8276aa51a9584ba23aab9cbcc069a157ea34f95cb21eba67189f1bc7347cca81
[root@localhost Hadoop]# docker run -itd --name hadoop4 --net mynetwork --ip 172.19.0.4 --add-host hadoop2:172.19.0.2 --add-host hadoop3:172.19.0.3 --add-host hadoop4:172.19.0.4 -P hadoop4 /bin/bash
ea17f5a50d5a1c5e2effe26c84e93387440debb91316026a9c7f5dc3700cca56
[root@localhost Hadoop]#