Building a Fully Distributed Hadoop Cluster with Docker

一、Environment

1、Linux

[root@localhost docker-hadoop]# uname -a
Linux localhost.localdomain 3.10.0-957.10.1.el7.x86_64 #1 SMP Mon Mar 18 15:06:45 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost docker-hadoop]# cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core) 

2、Docker

[root@localhost docker-hadoop]# docker version
Client:
 Version:           18.09.4
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        d14af54266
 Built:             Wed Mar 27 18:34:51 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.4
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       d14af54
  Built:            Wed Mar 27 18:04:46 2019
  OS/Arch:          linux/amd64
  Experimental:     false
[root@localhost docker-hadoop]# 

3、Java

java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)

4、Hadoop

[root@hadoop2 /]# hadoop version
Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.3.jar
[root@hadoop2 /]# 

二、Machine plan

Three machines, one master and two slaves:

Hostname: hadoop2, IP address: 172.19.0.2 (master)
Hostname: hadoop3, IP address: 172.19.0.3 (slave)
Hostname: hadoop4, IP address: 172.19.0.4 (slave)

三、Building the images

1、Build the centos-ssh image

Note: Docker Hub already has an image with the ssh service installed, komukomo/centos-sshd. I simply pull it here instead of building my own.
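
Pull the image first if it is not already present locally:

docker pull komukomo/centos-sshd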

Run a container:

docker run -itd --name centos-ssh komukomo/centos-sshd /bin/bash

Start the ssh service:

[root@centos-ssh /]# /usr/sbin/sshd 
[root@centos-ssh /]# 

Check that the ssh service is listening:

[root@centos-ssh /]# netstat -antp | grep sshd
tcp        0      0 0.0.0.0:22                  0.0.0.0:*                   LISTEN      19/sshd             
tcp        0      0 :::22                       :::*                        LISTEN      19/sshd             
[root@centos-ssh /]# 

Set up passwordless login:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
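
If sshd still prompts for a password after this, the usual culprit is permissions on ~/.ssh; tightening them (a common fix, not part of the original steps) usually resolves it:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys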

Verify that passwordless login now works (the root password is root):

[root@centos-ssh /]# ssh root@localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is e5:ab:55:1b:73:c4:51:33:c6:3b:45:a0:b2:34:e7:74.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
[root@centos-ssh ~]# ssh root@localhost
Last login: Fri Apr 12 07:20:00 2019 from localhost
[root@centos-ssh ~]# exit
logout
Connection to localhost closed.
[root@centos-ssh ~]# 

Commit the container as a new centos-ssh image:

[root@localhost ~]# docker commit centos-ssh centos-ssh
sha256:97ef260595ae36d81c9f26b6ed0ed5d13502b7699e079554928ec8cc6fc1b159
[root@localhost ~]# docker images
REPOSITORY             TAG                 IMAGE ID            CREATED             SIZE
centos-ssh             latest              97ef260595ae        4 seconds ago       410MB
centos7-ssh            latest              4e4796f7e8ef        About an hour ago   289MB
<none>                 <none>              e08ee32cfd93        About an hour ago   289MB
<none>                 <none>              3cff40060339        About an hour ago   289MB
centos-tools           latest              bb563754f296        4 hours ago         391MB
jquery134/mycentos     v1.0                8c63d14863d3        4 days ago          354MB
tomcat                 latest              f1332ae3f570        13 days ago         463MB
nginx                  latest              2bcb04bdb83f        2 weeks ago         109MB
centos                 latest              9f38484d220f        4 weeks ago         202MB
ubuntu                 latest              94e814e2efa8        4 weeks ago         88.9MB
jdeathe/centos-ssh     latest              f68976440f24        6 weeks ago         226MB
komukomo/centos-sshd   latest              d969d0bdc7ac        2 years ago         289MB
[root@localhost ~]# 

2、Build the hadoop image from the centos-ssh image

Directory structure on the host at build time (reconstructed; the original showed this layout in a screenshot):

Hadoop/
├── Dockerfile
├── hadoop-2.7.3.tar.gz
└── jdk-8u101-linux-x64.tar.gz

Dockerfile contents:

FROM centos-ssh
# ADD auto-extracts local tar.gz archives into the target directory
ADD jdk-8u101-linux-x64.tar.gz /usr/local/
RUN mv /usr/local/jdk1.8.0_101 /usr/local/jdk1.8
ENV JAVA_HOME /usr/local/jdk1.8
ENV PATH $JAVA_HOME/bin:$PATH

ADD hadoop-2.7.3.tar.gz /usr/local
RUN mv /usr/local/hadoop-2.7.3 /usr/local/hadoop
ENV HADOOP_HOME /usr/local/hadoop
ENV PATH $HADOOP_HOME/bin:$PATH

# which and sudo are needed by the Hadoop control scripts
RUN yum install -y which sudo

Build the hadoop image:

[root@localhost Hadoop]# docker build -t="hadoop" .

3、Create three containers from the hadoop image and start the ssh service in each

Create a custom network:

[root@localhost Hadoop]# docker network create --subnet=172.19.0.0/16 mynetwork
522bc0ed2d6048e5f303245d0c85ae36e62d0735f1d2e9ca5c73a11f103c1954
[root@localhost Hadoop]# docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
76cf156331a1        bridge              bridge              local
2059529b97fd        bridge1             bridge              local
2c43b9b438d5        host                host                local
522bc0ed2d60        mynetwork           bridge              local
6c75caf7d102        none                null                local
[root@localhost Hadoop]# 
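
Optionally, verify the subnet assignment (an extra check, not in the original):

docker network inspect mynetwork | grep -i subnet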

Run the containers:

[root@localhost Hadoop]# docker run -itd --name hadoop2  --net mynetwork  --ip 172.19.0.2 --add-host hadoop2:172.19.0.2 --add-host hadoop3:172.19.0.3 --add-host hadoop4:172.19.0.4 -d -P hadoop /bin/bash
10c1a242c22efd92d8f9007f4f51f5ff6c9e4511daa6d5fd29152ab1ac43c0e5
[root@localhost Hadoop]# docker run -itd --name hadoop3  --net mynetwork  --ip 172.19.0.3 --add-host hadoop2:172.19.0.2 --add-host hadoop3:172.19.0.3 --add-host hadoop4:172.19.0.4 -d -P hadoop /bin/bash
8276aa51a9584ba23aab9cbcc069a157ea34f95cb21eba67189f1bc7347cca81
[root@localhost Hadoop]# docker run -itd --name hadoop4  --net mynetwork  --ip 172.19.0.4 --add-host hadoop2:172.19.0.2 --add-host hadoop3:172.19.0.3 --add-host hadoop4:172.19.0.4 -d -P hadoop /bin/bash
ea17f5a50d5a1c5e2effe26c84e93387440debb91316026a9c7f5dc3700cca56
[root@localhost Hadoop]# 

Start the ssh service in each of the three containers:

[root@localhost Hadoop]# docker exec -d hadoop2 /usr/sbin/sshd
[root@localhost Hadoop]# docker exec -d hadoop3 /usr/sbin/sshd
[root@localhost Hadoop]# docker exec -d hadoop4 /usr/sbin/sshd
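
Or as a loop (an equivalent convenience form):

for c in hadoop2 hadoop3 hadoop4; do docker exec -d $c /usr/sbin/sshd; done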

Verify the environment: java, hadoop, ssh, network connectivity, and passwordless login:
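
The transcript below runs inside the hadoop2 container; the attach step is implied and would be:

docker exec -it hadoop2 /bin/bash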

[root@hadoop2 /]# java -version
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
[root@hadoop2 /]# javac -version
javac 1.8.0_101
[root@hadoop2 /]# ssh root@172.19.0.3
Last login: Fri Apr 12 08:07:46 2019 from hadoop2
[root@hadoop3 ~]# exit
logout
Connection to 172.19.0.3 closed.
[root@hadoop2 /]# hadoop version
Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.3.jar
[root@hadoop2 /]# ping hadoop3
PING hadoop3 (172.19.0.3) 56(84) bytes of data.
64 bytes from hadoop3 (172.19.0.3): icmp_seq=1 ttl=64 time=0.248 ms
64 bytes from hadoop3 (172.19.0.3): icmp_seq=2 ttl=64 time=0.145 ms
^C
--- hadoop3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1936ms
rtt min/avg/max/mdev = 0.145/0.196/0.248/0.053 ms
[root@hadoop2 /]# ping hadoop4
PING hadoop4 (172.19.0.4) 56(84) bytes of data.
64 bytes from hadoop4 (172.19.0.4): icmp_seq=1 ttl=64 time=0.233 ms
64 bytes from hadoop4 (172.19.0.4): icmp_seq=2 ttl=64 time=0.095 ms
^C
--- hadoop4 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1754ms
rtt min/avg/max/mdev = 0.095/0.164/0.233/0.069 ms

4、Configure Hadoop

In /usr/local/hadoop/etc/hadoop/hadoop-env.sh, set JAVA_HOME:

 export JAVA_HOME=/usr/local/jdk1.8
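
This can also be done non-interactively from the host (a convenience one-liner; the appended export overrides the default JAVA_HOME setting earlier in the file):

docker exec hadoop2 bash -c 'echo "export JAVA_HOME=/usr/local/jdk1.8" >> /usr/local/hadoop/etc/hadoop/hadoop-env.sh'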

core-site.xml (note: fs.default.name is deprecated in favor of fs.defaultFS; it still works in 2.7.3 but logs a deprecation warning)

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
	<property>
		<name>fs.default.name</name>
		<value>hdfs://hadoop2/</value>
	</property>
	<property>
		<name>io.file.buffer.size</name>
		<value>131072</value>
	</property>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/home/hadoop/tmp</value>
		<description>Abase for other temporary directories.</description>
	</property>
</configuration>

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
	<property>
		<name>dfs.namenode.secondary.http-address</name>
		<value>hadoop2:9001</value>
		<description>View HDFS status through the web UI</description>
	</property>
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>/home/hadoop/dfs/name</value>
	</property>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>/home/hadoop/dfs/data</value>
	</property>
	<property>
		<name>dfs.replication</name>
		<value>2</value>
		<description>Each block has 2 replicas</description>
	</property>
	<property>
		<name>dfs.webhdfs.enabled</name>
		<value>true</value>
	</property>
</configuration>

mapred-site.xml
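
Note: the Hadoop 2.7.3 distribution ships only mapred-site.xml.template, so create the file before editing it (assuming the install path above):

cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml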

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
	<property>
		<name>mapreduce.jobhistory.address</name>
		<value>hadoop2:10020</value>
	</property>
	<property>
		<name>mapreduce.jobhistory.webapp.address</name>
		<value>hadoop2:19888</value>
	</property>
</configuration>

yarn-site.xml (the yarn.nodemanager.resource.* limits are kept small, 1024 MB and 1 vcore, so the three containers fit on one host)

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
	<!-- Site specific YARN configuration properties -->
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	<property>
		<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
		<value>org.apache.hadoop.mapred.ShuffleHandler</value>
	</property>
	<property>
		<name>yarn.resourcemanager.address</name>
		<value>hadoop2:8032</value>
	</property>
	<property>
		<name>yarn.resourcemanager.scheduler.address</name>
		<value>hadoop2:8030</value>
	</property>
	<property>
		<name>yarn.resourcemanager.resource-tracker.address</name>
		<value>hadoop2:8031</value>
	</property>
	<property>
		<name>yarn.resourcemanager.admin.address</name>
		<value>hadoop2:8033</value>
	</property>
	<property>
		<name>yarn.resourcemanager.webapp.address</name>
		<value>hadoop2:8088</value>
	</property>
	<property>
		<name>yarn.nodemanager.resource.memory-mb</name>
		<value>1024</value>
	</property>
	<property>
		<name>yarn.nodemanager.resource.cpu-vcores</name>
		<value>1</value>
	</property>
</configuration>

slaves (the worker hosts that will run DataNode and NodeManager)

hadoop3
hadoop4

Copy the configured Hadoop directory to hadoop3 and hadoop4 (run on hadoop2; this relies on the passwordless ssh set up earlier):

scp -rq /usr/local/hadoop hadoop3:/usr/local
scp -rq /usr/local/hadoop hadoop4:/usr/local

Format the NameNode (run once, on hadoop2, from /usr/local/hadoop):

bin/hdfs namenode -format

Start the cluster by running the start-all.sh script on the master (hadoop2):
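
The script lives under sbin, which the Dockerfile does not add to PATH, so call it by its full path (start-all.sh is deprecated in favor of start-dfs.sh plus start-yarn.sh, but it still works in 2.7.3):

/usr/local/hadoop/sbin/start-all.sh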

Check that the cluster processes started:

On hadoop2:

[root@hadoop2 bin]# jps
643 ResourceManager
310 NameNode
492 SecondaryNameNode
956 Jps
[root@hadoop2 bin]# 

On hadoop3:

[root@hadoop3 /]# jps
369 Jps
153 DataNode
250 NodeManager
[root@hadoop3 /]# 

On hadoop4:

[root@hadoop4 /]# jps
144 NodeManager
263 Jps
47 DataNode
[root@hadoop4 /]# 
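
As an extra check (not shown in the original), you can ask the NameNode on hadoop2 how many DataNodes have registered:

hdfs dfsadmin -report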

Note: you can commit hadoop2, hadoop3, and hadoop4 as images (see the reconstruction below), which makes it easy to recreate the containers later with different port mappings and similar changes:

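The commit step itself appeared only as a screenshot in the original; a plausible reconstruction is to commit each container to an image of the same name and remove the old containers before recreating them:

docker commit hadoop2 hadoop2
docker commit hadoop3 hadoop3
docker commit hadoop4 hadoop4
docker rm -f hadoop2 hadoop3 hadoop4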

[root@localhost Hadoop]# docker run -itd --name hadoop2  --net mynetwork  --ip 172.19.0.2 --add-host hadoop2:172.19.0.2 --add-host hadoop3:172.19.0.3 --add-host hadoop4:172.19.0.4 -d -p 8088:8088 -p 50070:50070 -p 19888:19888 hadoop2 /bin/bash
10c1a242c22efd92d8f9007f4f51f5ff6c9e4511daa6d5fd29152ab1ac43c0e5
[root@localhost Hadoop]# docker run -itd --name hadoop3  --net mynetwork  --ip 172.19.0.3 --add-host hadoop2:172.19.0.2 --add-host hadoop3:172.19.0.3 --add-host hadoop4:172.19.0.4 -d -P hadoop3 /bin/bash
8276aa51a9584ba23aab9cbcc069a157ea34f95cb21eba67189f1bc7347cca81
[root@localhost Hadoop]# docker run -itd --name hadoop4  --net mynetwork  --ip 172.19.0.4 --add-host hadoop2:172.19.0.2 --add-host hadoop3:172.19.0.3 --add-host hadoop4:172.19.0.4 -d -P hadoop4 /bin/bash
ea17f5a50d5a1c5e2effe26c84e93387440debb91316026a9c7f5dc3700cca56
[root@localhost Hadoop]#