Setting Up a Hadoop 2.7.1 Cluster on Linux (Three Machines as an Example)

1. Basic Environment

Before installing Hadoop on Linux, two pieces of software must be installed first:

1.1 Installation Notes

1. JDK 1.6 or later (this guide installs JDK 1.7). The JDK bundled with Red Hat is generally not used; remove it and install the version you need.

2. SSH (Secure Shell). MobaXterm_Personal is recommended as the client; it is feature-rich and easy to use.
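Note that the start-all.sh script used later drives every node over SSH, so the master should be able to log in to the other machines without a password. A minimal sketch, run on redHat1 once the hostnames from the next section are in place (it assumes the same user account exists on all three machines):

ssh-keygen -t rsa
ssh-copy-id redHat1
ssh-copy-id redHat2
ssh-copy-id redHat3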

2. Host Configuration

Since this Hadoop cluster consists of three machines, the hosts file on each machine must be updated. Open /etc/hosts and map each hostname to its IP address:

vim /etc/hosts

If you do not have sufficient permissions, switch to the root user.

Add the same host entries on all three machines (see the sketch below).

The server names can be changed to redHat1, redHat2, and redHat3 with the hostname command.

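A sketch of the entries; only the master's address 192.168.92.140 appears later in this guide, so the other two addresses are placeholder assumptions to adjust for your network:

192.168.92.140 redHat1
192.168.92.141 redHat2
192.168.92.142 redHat3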

3. Installing and Configuring Hadoop

3.1 Creating the Data Directories

For easier management, create directories on redHat1 for the HDFS NameNode data, DataNode data, and temporary files:

/data/hdfs/name

/data/hdfs/data

/data/hdfs/tmp
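They can all be created in one step (switch to root if permissions are insufficient):

mkdir -p /data/hdfs/name /data/hdfs/data /data/hdfs/tmp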

Then copy these directories to the same location on redHat2 and redHat3 with scp, as sketched below.
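A sketch of the copy, assuming SSH access to redHat2 and redHat3 is already in place:

scp -r /data/hdfs redHat2:/data
scp -r /data/hdfs redHat3:/data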

3.2 Downloading Hadoop

First, download Hadoop from the Apache website, choosing one of the recommended mirrors. This guide uses hadoop-2.7.1, downloaded to the /data directory on redHat1 with the following command:

wget http://archive.apache.org/dist/hadoop/core/hadoop-2.7.1/hadoop-2.7.1.tar.gz

Then extract hadoop-2.7.1.tar.gz into the /data directory:

tar -zxvf hadoop-2.7.1.tar.gz

3.3 Configuring Environment Variables

Back in the /data directory, configure the Hadoop environment variables:

vim /etc/profile

Add the following to /etc/profile.

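A minimal sketch of the additions, assuming Hadoop is unpacked under /data/hadoop-2.7.1 (the install path from section 3.2):

export HADOOP_HOME=/data/hadoop-2.7.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin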

To make the Hadoop environment variables take effect immediately, run:

source /etc/profile

Then run the hadoop command; if usage information is printed, the configuration has taken effect.

hadoop


3.4 Configuring Hadoop

Enter the hadoop-2.7.1 configuration directory:

cd /data/hadoop-2.7.1/etc/hadoop

Edit core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and the slaves file in turn.


3.4.1 Edit core-site.xml

vim core-site.xml

 

<configuration>
 <property>
   <name>hadoop.tmp.dir</name>
   <value>file:/data/hdfs/tmp</value>
   <description>A base for other temporary directories.</description>
 </property>
 <property>
   <name>io.file.buffer.size</name>
   <value>131072</value>
 </property>
 <property>
   <name>fs.defaultFS</name>
   <value>hdfs://redHat1:9000</value>
 </property>
 <property>
   <name>hadoop.proxyuser.root.hosts</name>
   <value>*</value>
 </property>
 <property>
   <name>hadoop.proxyuser.root.groups</name>
   <value>*</value>
 </property>
</configuration>

Note: the value of hadoop.tmp.dir must match the directory created earlier.

 

3.4.2 Edit hdfs-site.xml

vim hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 <!--
   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at
 
     http://www.apache.org/licenses/LICENSE-2.0
 
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License. See accompanying LICENSE file.
 -->
 
 <!-- Put site-specific property overrides in this file. -->
 
<configuration>
 <property>
   <name>dfs.replication</name>
   <value>2</value>
 </property>
 <property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/data/hdfs/name</value>
   <final>true</final>
 </property>
 <property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/data/hdfs/data</value>
   <final>true</final>
 </property>
 <property>
   <name>dfs.namenode.secondary.http-address</name>
   <value>redHat1:9001</value>
 </property>
 <property>
   <name>dfs.webhdfs.enabled</name>
   <value>true</value>
 </property>
 <property>
   <name>dfs.permissions.enabled</name>
   <value>false</value>
 </property>
</configuration>

Note: the values of dfs.namenode.name.dir and dfs.datanode.data.dir must match the directories created earlier.

 

3.4.3 Edit mapred-site.xml

Copy the template to create the XML file:

cp mapred-site.xml.template mapred-site.xml

vim  mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
 <property>
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
 </property>
</configuration>
 

3.4.4 Edit yarn-site.xml

vim  yarn-site.xml

 

<?xml version="1.0"?>
 <!--
   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at
 
     http://www.apache.org/licenses/LICENSE-2.0
 
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License. See accompanying LICENSE file.
 -->
<configuration>

<!-- Site specific YARN configuration properties -->
 <property>
   <name>yarn.resourcemanager.address</name>
   <value>redHat1:18040</value>
 </property>
 <property>
   <name>yarn.resourcemanager.scheduler.address</name>
   <value>redHat1:18030</value>
 </property>
 <property>
   <name>yarn.resourcemanager.webapp.address</name>
   <value>redHat1:18088</value>
 </property>
 <property>
   <name>yarn.resourcemanager.resource-tracker.address</name>
   <value>redHat1:18025</value>
 </property>
 <property>
   <name>yarn.resourcemanager.admin.address</name>
   <value>redHat1:18141</value>
 </property>
 <property>
   <!-- Hadoop 2.x requires the service name mapreduce_shuffle; dots are not valid in YARN service names -->
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
 </property>
 <property>
   <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
 </property>
</configuration>

3.4.5 Edit the slaves file

Delete the original localhost entry and replace it with the following:

vim /data/hadoop-2.7.1/etc/hadoop/slaves

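A sketch of the resulting file, assuming redHat2 and redHat3 act as the worker nodes (add redHat1 as well if the master should also run a DataNode):

redHat2
redHat3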

Finally, copy the entire hadoop-2.7.1 directory tree to the same location on redHat2 and redHat3 with scp:

scp -r /data/hadoop-2.7.1 redHat2:/data

scp -r /data/hadoop-2.7.1 redHat3:/data

4. Running Hadoop

First, format the NameNode:

hadoop namenode -format

 

Then start all of the daemons (the start scripts live in the sbin directory):

cd /data/hadoop-2.7.1/sbin
sh ./start-all.sh

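To confirm the daemons are up, jps can be run on each machine; with the layout above, redHat1 should roughly show NameNode, SecondaryNameNode, and ResourceManager, while redHat2 and redHat3 should show DataNode and NodeManager:

jps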

Check the cluster status:

/data/hadoop-2.7.1/bin/hdfs dfsadmin -report


Test YARN through the ResourceManager web UI (yarn.resourcemanager.webapp.address as configured above; 192.168.92.140 is redHat1's address):

http://192.168.92.140:18088/cluster/cluster


Check HDFS through the NameNode web UI:

http://192.168.92.140:50070/dfshealth.html#tab-overview


 

Key points: problems encountered while configuring and running Hadoop

1. JAVA_HOME not set

Startup fails with an error similar to:

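Error: JAVA_HOME is not set and could not be found.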

In that case, edit /data/hadoop-2.7.1/etc/hadoop/hadoop-env.sh and set the JAVA_HOME path, as sketched below.

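A sketch, assuming JDK 1.7 is installed under /usr/java/jdk1.7.0_79 (adjust to your actual installation path):

export JAVA_HOME=/usr/java/jdk1.7.0_79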

Write JAVA_HOME as an absolute path; do not keep the default form that tries to pick it up automatically.

2. Datanode denied communication with namenode

FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-336454126-127.0.0.1-1419216478581 (storage id DS-445205871-127.0.0.1-50010-1419216613930) service to /192.168.149.128:9000
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException): Datanode denied communication with namenode: DatanodeRegistration(0.0.0.0, storageID=DS-445205871-127.0.0.1-50010-1419216613930, infoPort=50075, ipcPort=50020, storageInfo=lv=-47;cid=CID-41993190-ade1-486c-8fe1-395c1d6f5739;nsid=1679060915;c=0)

 

Cause: the data files under the local dfs.data.dir are inconsistent with what the namenode has recorded, so the datanode is rejected by the namenode.

Solution:

1. Delete all files under the dfs.namenode.name.dir and dfs.datanode.data.dir directories.

2. Fix /etc/hosts:

 cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.149.128 localhost

3. Reformat the NameNode: bin/hadoop namenode -format

4. Restart the cluster.