Setting Up a Hadoop 2.6.0 Pseudo-Distributed Cluster

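Use Case

While learning Hadoop you will of course need a Hadoop cluster to experiment with. If you just want to try Hadoop out locally and do not have several servers at your disposal, a pseudo-distributed setup, in which every Hadoop daemon runs as a separate process on a single machine, is the best option.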

Steps

1. Install the JDK

1.1 Check whether OpenJDK is installed

 # java -version

openjdk version "1.8.0_65"
OpenJDK Runtime Environment (build 1.8.0_65-b17)
OpenJDK 64-Bit Server VM (build 25.65-b01, mixed mode)

1.2 List the installed OpenJDK packages

 # rpm -qa | grep java

java-1.7.0-openjdk-1.7.0.91-2.6.2.3.el7.x86_64
tzdata-java-2015g-1.el7.noarch
python-javapackages-3.4.1-11.el7.noarch
javapackages-tools-3.4.1-11.el7.noarch
java-1.8.0-openjdk-headless-1.8.0.65-3.b17.el7.x86_64
java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64
java-1.7.0-openjdk-headless-1.7.0.91-2.6.2.3.el7.x86_64

1.3 Remove the OpenJDK packages one by one

 # rpm -e --nodeps java-1.7.0-openjdk-1.7.0.91-2.6.2.3.el7.x86_64 
 # rpm -e --nodeps tzdata-java-2015g-1.el7.noarch 
 # rpm -e --nodeps java-1.8.0-openjdk-headless-1.8.0.65-3.b17.el7.x86_64 
 # rpm -e --nodeps java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64 
 # rpm -e --nodeps java-1.7.0-openjdk-headless-1.7.0.91-2.6.2.3.el7.x86_64
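Typing one rpm -e command per package is error-prone when several Java packages are installed. A minimal sketch that removes them in one loop, assuming an RPM-based system such as CentOS 7 and root privileges; the function name remove_openjdk and the package-name pattern are illustrative assumptions chosen to match the list in 1.2:

```shell
# Hypothetical helper: remove every installed OpenJDK package plus tzdata-java,
# like the manual rpm -e commands above. Assumes an RPM-based system and root.
remove_openjdk() {
    # Keep only names starting with java-<version>-openjdk or tzdata-java;
    # javapackages-tools and python-javapackages are left installed.
    rpm -qa | grep -E '^(java-[0-9.]+-openjdk|tzdata-java)' | while read -r pkg; do
        echo "removing $pkg"
        # --nodeps skips dependency checks, exactly as in the manual steps
        rpm -e --nodeps "$pkg"
    done
}
```

Run it as root with `remove_openjdk`, then check with `rpm -qa | grep java` that only the javapackages entries remain.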

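1.4 Download the JDK

Download the JDK as a .tar.gz archive, upload it to the Linux machine, and extract it under /opt (this guide uses JDK 1.7.0_79, which extracts to /opt/jdk1.7.0_79).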

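1.5 Configure the JDK environment variables

 # vim /etc/profile

Append the following lines; export is required so that programs started from the shell, such as the Hadoop scripts, inherit the variables:

export JAVA_HOME=/opt/jdk1.7.0_79
export JRE_HOME=/opt/jdk1.7.0_79/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH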

1.6 Apply the changes

 # source /etc/profile
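To confirm that the variables took effect, a small check can be run in the same shell. This is a sketch; check_java_env is a hypothetical helper that only verifies that JAVA_HOME is set, that it contains an executable bin/java, and that this java is the first one found on PATH:

```shell
# Hypothetical helper: verify the JDK variables in the current shell.
check_java_env() {
    if [ -z "$JAVA_HOME" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "no executable java under $JAVA_HOME/bin" >&2
        return 1
    fi
    # The java found first on PATH should be the one inside JAVA_HOME
    found=$(command -v java)
    if [ "$found" != "$JAVA_HOME/bin/java" ]; then
        echo "java on PATH is '$found', not $JAVA_HOME/bin/java" >&2
        return 1
    fi
    echo "OK: $found"
}
```

After `source /etc/profile`, running `check_java_env && java -version` should report the JDK under /opt rather than OpenJDK.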

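2. Passwordless SSH Login

2.1 By default, every SSH login to a node, including localhost, prompts for a password. That quickly becomes tedious, and the Hadoop start scripts also connect over SSH, so set up passwordless login.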

 # ssh localhost

The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is 7b:10:e3:b5:ea:7d:29:be:77:83:1c:c0:1d:85:de:ba.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
root@localhost's password:
Last login: Sat Apr  2 22:32:44 2016

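2.2 Configure passwordless login

 # cd ~/.ssh/    # if this directory does not exist, run ssh localhost once first
 # ssh-keygen -t rsa     # press Enter at every prompt
 # cat id_rsa.pub >> authorized_keys
 # chmod 600 ./authorized_keys    # restrict permissions so sshd accepts the key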
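The same steps can be scripted without any prompts. A minimal sketch, assuming OpenSSH's ssh-keygen; the function name setup_ssh_key and its optional directory argument (default ~/.ssh) are illustrative additions so the steps can be tried on a scratch directory first. `-N ''` sets an empty passphrase, which is what pressing Enter at the prompts does:

```shell
# Hypothetical helper: non-interactive passwordless-SSH setup.
setup_ssh_key() {
    ssh_dir="${1:-$HOME/.ssh}"
    mkdir -p "$ssh_dir"
    # sshd's StrictModes check (on by default) ignores keys if ~/.ssh or
    # authorized_keys is writable by group or others
    chmod 700 "$ssh_dir"
    if [ ! -f "$ssh_dir/id_rsa" ]; then
        # -N '' = empty passphrase, -f = key file, -q = quiet: no prompts
        ssh-keygen -t rsa -N '' -f "$ssh_dir/id_rsa" -q
    fi
    touch "$ssh_dir/authorized_keys"
    # Append the public key only if that exact line is not already present
    if ! grep -qxF "$(cat "$ssh_dir/id_rsa.pub")" "$ssh_dir/authorized_keys"; then
        cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
    fi
    chmod 600 "$ssh_dir/authorized_keys"
}
```

Calling `setup_ssh_key` with no argument configures ~/.ssh; running it twice does not duplicate the key.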

2.3 Log in again; no password is required

 # ssh localhost

Last login: Sat Apr  2 22:51:41 2016 from localhost

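3. Install Hadoop

3.1 Extract Hadoop to /opt

Download the Hadoop 2.6.0 release (hadoop-2.6.0.tar.gz; other versions can be downloaded the same way) and extract it under /opt, which creates /opt/hadoop-2.6.0.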

3.2 Configure the Hadoop environment variables

 # vim /etc/profile

export JAVA_HOME=/opt/jdk1.7.0_79
export HADOOP_HOME=/opt/hadoop-2.6.0
export HADOOP_PREFIX=/opt/hadoop-2.6.0
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

3.3 Apply the changes

 # source /etc/profile
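A quick way to confirm that the Hadoop variables are in effect in the current shell. This is a sketch; check_hadoop_env is a hypothetical helper that checks HADOOP_HOME and that both its bin directory (hadoop, hdfs) and its sbin directory (start-dfs.sh, start-yarn.sh) are on PATH:

```shell
# Hypothetical helper: check HADOOP_HOME and the bin/sbin PATH entries.
check_hadoop_env() {
    if [ -z "$HADOOP_HOME" ]; then
        echo "HADOOP_HOME is not set" >&2
        return 1
    fi
    for dir in bin sbin; do
        # Surround PATH with colons so the first and last entries match too
        case ":$PATH:" in
            *":$HADOOP_HOME/$dir:"*) ;;
            *) echo "$HADOOP_HOME/$dir is not on PATH" >&2; return 1 ;;
        esac
    done
    echo "OK: $HADOOP_HOME"
}
```

If `check_hadoop_env` succeeds, `hadoop version` should print the installed release.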

3.4 Edit hadoop-env.sh

 # cd /opt/hadoop-2.6.0   # enter the Hadoop directory, then add the JAVA_HOME path to hadoop-env.sh
 # vim etc/hadoop/hadoop-env.sh

export JAVA_HOME=/opt/jdk1.7.0_79

 # bin/hadoop  # run the hadoop command as a test; it should print its usage message
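Setting JAVA_HOME in hadoop-env.sh matters because the start scripts launch the daemons over SSH in non-interactive shells, which typically do not read /etc/profile, so the value exported there is not guaranteed to be visible. A sketch that makes the edit without opening vim; set_hadoop_java_home is a hypothetical name, and it assumes GNU sed and that the file already contains a line beginning with `export JAVA_HOME=` (the stock 2.6.0 file ships with `export JAVA_HOME=${JAVA_HOME}`):

```shell
# Hypothetical helper: set JAVA_HOME in hadoop-env.sh non-interactively.
set_hadoop_java_home() {
    env_file="$1"    # e.g. /opt/hadoop-2.6.0/etc/hadoop/hadoop-env.sh
    java_home="$2"   # e.g. /opt/jdk1.7.0_79
    cp "$env_file" "$env_file.bak"    # keep a backup of the original file
    # Rewrite the existing export line in place (GNU sed -i); "|" as the
    # delimiter avoids escaping the slashes in the path
    sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=$java_home|" "$env_file"
    grep '^export JAVA_HOME=' "$env_file"    # show the result
}
```

Usage: `set_hadoop_java_home /opt/hadoop-2.6.0/etc/hadoop/hadoop-env.sh /opt/jdk1.7.0_79`.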

3.5 Configure HDFS

3.5.1 Edit core-site.xml

 # vim /opt/hadoop-2.6.0/etc/hadoop/core-site.xml

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/opt/hadoop-2.6.0/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.208.110:9000</value>
    </property>
</configuration>

Replace 192.168.208.110 with this machine's own IP address, or use localhost.
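A sketch that writes the same core-site.xml with the NameNode host as a parameter, which avoids hand-editing the IP on each machine. The function name write_core_site and the default host localhost are assumptions; the function overwrites any existing core-site.xml in the given directory:

```shell
# Hypothetical helper: write core-site.xml with the NameNode host as a parameter.
# Overwrites $conf_dir/core-site.xml.
write_core_site() {
    conf_dir="$1"                 # e.g. /opt/hadoop-2.6.0/etc/hadoop
    nn_host="${2:-localhost}"     # e.g. 192.168.208.110; must resolve to this machine
    cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/opt/hadoop-2.6.0/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://${nn_host}:9000</value>
    </property>
</configuration>
EOF
}
```

Usage: `write_core_site /opt/hadoop-2.6.0/etc/hadoop 192.168.208.110`.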

3.5.2 Edit hdfs-site.xml

 # vim /opt/hadoop-2.6.0/etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/opt/hadoop-2.6.0/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/opt/hadoop-2.6.0/tmp/dfs/data</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
</configuration>
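After editing several of these XML files it helps to see the name/value pairs at a glance. A rough sketch using sed and paste; it is not an XML parser and assumes, as in the files above, that every `<name>` and `<value>` element sits on its own line and that each property has both (list_props is a hypothetical name). On a configured installation, `hdfs getconf -confKey <key>` is the authoritative way to read a single value:

```shell
# Hypothetical helper: print name=value for every property in a Hadoop XML file.
# sed prints the text of each <name> and <value> line in file order,
# then paste joins consecutive lines pairwise as name=value.
list_props() {
    sed -n -e 's|.*<name>\(.*\)</name>.*|\1|p' \
           -e 's|.*<value>\(.*\)</value>.*|\1|p' "$1" | paste -d= - -
}
```

For example, `list_props /opt/hadoop-2.6.0/etc/hadoop/hdfs-site.xml` should print `dfs.replication=1` as its first line.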

3.5.3 Format the NameNode

 # hdfs namenode -format

(output truncated)
16/04/02 22:54:15 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.

/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at bogon/221.192.153.42
************************************************************/
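Formatting is a one-time step: answering Y to the re-format prompt of a second `hdfs namenode -format` wipes the NameNode metadata and assigns a new clusterID, after which a DataNode that still holds the old data typically refuses to start with an "Incompatible clusterIDs" error. A sketch of a guard, assuming the local path of dfs.namenode.name.dir is passed in (format_once is a hypothetical name); a formatted storage directory contains current/VERSION:

```shell
# Hypothetical helper: format the NameNode only if it has never been formatted.
format_once() {
    name_dir="$1"   # local path from dfs.namenode.name.dir, e.g. /opt/hadoop-2.6.0/tmp/dfs/name
    # A formatted storage directory contains current/VERSION (it records the clusterID)
    if [ -f "$name_dir/current/VERSION" ]; then
        echo "already formatted: $name_dir, skipping"
        return 0
    fi
    hdfs namenode -format
}
```

Usage: `format_once /opt/hadoop-2.6.0/tmp/dfs/name`.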

3.5.4 Start HDFS

 # start-dfs.sh

Open http://localhost:50070 in a browser to view the NameNode web UI.
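Besides the web page, the JDK's jps tool lists the running Java processes; in a pseudo-distributed setup, start-dfs.sh should leave NameNode, DataNode and SecondaryNameNode running. A sketch that reports any that are missing (check_daemons is a hypothetical helper; jps must be on PATH, which it is when $JAVA_HOME/bin is):

```shell
# Hypothetical helper: check that the named Java daemons appear in jps output.
check_daemons() {
    running=$(jps)      # one "<pid> <main class>" line per JVM
    missing=0
    for daemon in "$@"; do
        # -w so that NameNode does not also match SecondaryNameNode
        if printf '%s\n' "$running" | grep -qw "$daemon"; then
            echo "running:     $daemon"
        else
            echo "not running: $daemon" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage after start-dfs.sh:
#   check_daemons NameNode DataNode SecondaryNameNode
```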


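3.5.5 A simple HDFS example

 # hdfs dfs -mkdir /user
 # hdfs dfs -mkdir /user/lei
 # hdfs dfs -put etc/hadoop input   # relative paths resolve to /user/<current user>; as root that is /user/root, so this fails if it does not exist:

     put: `input': No such file or directory

 # hdfs dfs -mkdir -p /user/root    # create the home directory manually, then upload again
 # hdfs dfs -put etc/hadoop input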
 # hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep input output 'dfs[a-z.]+'


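 # hadoop dfs -ls      # list the HDFS home directory; this deprecated form prints the warning below, hdfs dfs -ls gives the same listing without it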

DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Found 2 items
drwxr-xr-x   - root supergroup          0 2016-04-02 23:39 input
drwxr-xr-x   - root supergroup          0 2016-04-02 23:43 output
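MapReduce refuses to start a job whose output directory already exists (it fails with FileAlreadyExistsException), so rerunning the example requires removing output first. A sketch that wraps the rerun and prints the result; run_grep_example is a hypothetical name, and it assumes HADOOP_HOME is set as in 3.2 and that input is already in HDFS:

```shell
# Hypothetical helper: rerun the grep example and print its result.
run_grep_example() {
    examples_jar="$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar"
    # Delete the previous result; -f keeps this quiet if output does not exist yet
    hdfs dfs -rm -r -f output
    hadoop jar "$examples_jar" grep input output 'dfs[a-z.]+' || return 1
    # Quote the glob so HDFS, not the local shell, expands it
    hdfs dfs -cat 'output/*'
}
```

The final `-cat` prints each matched string with its count.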

3.6 Configure YARN

3.6.1 Configure mapred-site.xml

 # cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
 # vim /opt/hadoop-2.6.0/etc/hadoop/mapred-site.xml

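<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>192.168.208.110:10020</value>
    </property>
</configuration>

Setting mapreduce.framework.name to yarn makes MapReduce jobs run on YARN. Port 10020 is the default port of the MapReduce JobHistory server, configured by mapreduce.jobhistory.address; the MRv1 property mapred.job.tracker has no effect under YARN.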

3.6.2 Configure yarn-site.xml

 # vim /opt/hadoop-2.6.0/etc/hadoop/yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

3.6.3 Start YARN

 # start-yarn.sh

Open http://localhost:8088 in a browser to view the ResourceManager web UI.
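Both web UIs can also be checked from the command line. A sketch assuming curl is installed (check_web_ui is a hypothetical name); `-L` follows redirects, since a UI's root page may redirect to another page:

```shell
# Hypothetical helper: report whether a web UI answers with HTTP 200.
check_web_ui() {
    url="$1"
    # -s silences progress, -o discards the body, -w prints only the status code
    # (curl prints 000 when it cannot connect at all)
    code=$(curl -s -L -o /dev/null -w '%{http_code}' "$url")
    if [ "$code" = "200" ]; then
        echo "up:   $url"
    else
        echo "down: $url (HTTP $code)" >&2
        return 1
    fi
}

# Usage after start-dfs.sh and start-yarn.sh:
#   check_web_ui http://localhost:50070    # NameNode
#   check_web_ui http://localhost:8088     # ResourceManager
```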
