26. Pseudo-Distributed Operation (Single Node): Starting YARN and Running a MapReduce Program


Steps in this article:

	(1) Prepare one client machine
	(2) Install the JDK
	(3) Configure the JDK environment variables
	(4) Install Hadoop
	(5) Configure the Hadoop environment variables
	(6) Configure the Hadoop cluster to run on YARN
	(7) Start the cluster and test create, delete, and query operations
	(8) Run the wordcount example on YARN

1. Prepare one client machine

2. Install the JDK

Extract the JDK installation package.

3. Configure the JDK environment variables

Edit the /etc/profile file and add:

##JAVA_HOME
export JAVA_HOME=/opt/module/jdk8
export PATH=$PATH:$JAVA_HOME/bin
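
The change can be checked in the current shell before moving on. A minimal sketch (the /opt/module/jdk8 path is the one assumed above; on the real machine you would simply run `source /etc/profile` followed by `java -version`):

```shell
# Re-create the profile exports by hand, then confirm the JDK bin
# directory landed on PATH (jdk8 path assumed from the export above).
export JAVA_HOME=/opt/module/jdk8
export PATH=$PATH:$JAVA_HOME/bin
echo "$PATH" | grep -q "/opt/module/jdk8/bin" && echo "JAVA_HOME is on PATH"
```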

4. Install Hadoop

Extract the Hadoop installation package; if you are unsure how, see the previous article.

Path: /opt/module

[admin@hadoop12 module]$ tar -zxvf hadoop-2.7.2.tar.gz

5. Configure the Hadoop environment variables

[admin@hadoop12 hadoop-2.7.2]$ vim /etc/profile

Append at the end of the profile file:

##HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin		

Make the configuration take effect:

[admin@hadoop12 hadoop-2.7.2]$ source /etc/profile

Verify that Hadoop was installed successfully:

[admin@hadoop12 hadoop-2.7.2]$ hadoop
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings

Most commands print help when invoked w/o parameters.

6. Configure the Hadoop cluster

6.1 Configure hadoop-env.sh

Change the value of JAVA_HOME in hadoop-env.sh:

[admin@hadoop12 hadoop-2.7.2]$ echo $JAVA_HOME
/opt/module/jdk8
[admin@hadoop12 hadoop-2.7.2]$ vim etc/hadoop/hadoop-env.sh

Change the value to:

export JAVA_HOME=/opt/module/jdk8

6.2 Configure core-site.xml

Edit core-site.xml to declare the address of the NameNode and the directory where Hadoop stores the files it generates at runtime.

[admin@hadoop12 hadoop-2.7.2]$ vim etc/hadoop/core-site.xml

Add the following inside the <configuration></configuration> tags:

<!-- The address of the NameNode in HDFS -->
<property>
	<name>fs.defaultFS</name>
	<value>hdfs://hadoop12:9000</value>
</property>

<!-- The directory where Hadoop stores the files it generates at runtime (the DataNode data directory). If unset, it defaults to /tmp; remember to delete it before reformatting the NameNode -->
<property>
	<name>hadoop.tmp.dir</name>
	<value>/opt/module/hadoop-2.7.2/data/tmp</value>
</property>

6.3 Configure hdfs-site.xml

[admin@hadoop12 hadoop-2.7.2]$ vim etc/hadoop/hdfs-site.xml

Add the following inside the <configuration></configuration> tags:

<!-- The number of HDFS replicas -->
<property>
	<name>dfs.replication</name>
	<value>1</value>
</property>

The replication factor is set to 1 because a pseudo-distributed setup has only a single DataNode.

7. Configure the Hadoop cluster to run on YARN

7.1 Configure yarn-env.sh

Change the value of JAVA_HOME in yarn-env.sh:

[admin@hadoop12 hadoop-2.7.2]$ echo $JAVA_HOME
/opt/module/jdk8
[admin@hadoop12 hadoop-2.7.2]$ vim etc/hadoop/yarn-env.sh

Add the value:

7.2 Configure yarn-site.xml

Edit yarn-site.xml to declare how reducers fetch data and the address of the YARN ResourceManager:

[admin@hadoop12 hadoop-2.7.2]$ vim etc/hadoop/yarn-site.xml

Add the following inside the <configuration></configuration> tags:

<!-- How reducers fetch data -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>

<!-- The hostname of the YARN ResourceManager -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>hadoop13</value>
</property>

7.3 Configure mapred-env.sh

Change the value of JAVA_HOME in mapred-env.sh:

[admin@hadoop12 hadoop-2.7.2]$ echo $JAVA_HOME
/opt/module/jdk8
[admin@hadoop12 hadoop-2.7.2]$ vim etc/hadoop/mapred-env.sh

Add the value:

7.4 Rename mapred-site.xml.template to mapred-site.xml and configure it

[admin@hadoop12 hadoop]$ cd /opt/module/hadoop-2.7.2/etc/hadoop
[admin@hadoop12 hadoop]$ mv mapred-site.xml.template mapred-site.xml
[admin@hadoop12 hadoop]$ vim mapred-site.xml

Add the following inside the <configuration></configuration> tags:

<!-- Run MapReduce on YARN -->
<property>
	<name>mapreduce.framework.name</name>
	<value>yarn</value>
</property>

8. Start the cluster

(Format the NameNode the first time you start it only; do not keep reformatting it afterwards)
[admin@hadoop12 hadoop-2.7.2]$ bin/hdfs namenode -format

(a) Start the NameNode
[admin@hadoop12 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start namenode

(b) Start the DataNode
[admin@hadoop12 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start datanode

(c) Start the ResourceManager
[admin@hadoop12 hadoop-2.7.2]$ sbin/yarn-daemon.sh start resourcemanager

(d) Start the NodeManager
[admin@hadoop12 hadoop-2.7.2]$ sbin/yarn-daemon.sh start nodemanager
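
With all four daemons started, `jps` should list them. On the real node the check is `jps | grep -E 'NameNode|DataNode|ResourceManager|NodeManager'`; the sketch below runs the same grep against simulated `jps` output (illustrative PIDs, not real ones) so the counting step itself can be demonstrated:

```shell
# Simulated jps output (illustrative PIDs); on the node, use the real jps.
sample_jps='3461 NameNode
3608 DataNode
3841 ResourceManager
4102 NodeManager
4311 Jps'
# Count how many of the four expected daemon names appear.
count=$(echo "$sample_jps" | grep -cE 'NameNode|DataNode|ResourceManager|NodeManager')
echo "$count of 4 expected daemons present"
```

If any daemon is missing, check its log file under the logs/ directory of the Hadoop installation.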

9. Cluster operations

9.1 View YARN in the browser

	http://192.168.1.113:8088/cluster


9.2 Create an input directory on the HDFS file system

[admin@hadoop12 hadoop-2.7.2]$ hadoop fs -mkdir -p /user/admin/mapreduce/wordcount/input

9.3 Upload the test file to the file system

[admin@hadoop12 hadoop-2.7.2]$ vim wc.input
[admin@hadoop12 hadoop-2.7.2]$ hadoop fs -put wc.input /user/admin/mapreduce/wordcount/input/

9.4 Check that the file was uploaded correctly

[admin@hadoop12 hadoop-2.7.2]$ hadoop fs -ls /user/admin/mapreduce/wordcount/input/
[admin@hadoop12 hadoop-2.7.2]$ hadoop fs -cat /user/admin/mapreduce/wordcount/input/wc.input

9.5 Run the MapReduce program

[admin@hadoop12 hadoop-2.7.2]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/admin/mapreduce/wordcount/input /user/admin/mapreduce/wordcount/output
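Wordcount emits one "word TAB count" line per distinct word. The same counting can be sketched locally with standard shell tools (the two input lines here are an assumed sample, not the real wc.input):

```shell
# Split sample text into one word per line, then count each distinct word.
# Equivalent in spirit to the wordcount job's output, though uniq -c
# prints the count before the word rather than after it.
printf 'hadoop yarn\nhadoop mapreduce\n' | tr ' ' '\n' | sort | uniq -c
```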

9.6 View the output

From the command line:

[admin@hadoop12 hadoop-2.7.2]$ hadoop fs -cat /user/admin/mapreduce/wordcount/output/*

From the browser:

http://192.168.1.113:50070/explorer.html#/user/admin/mapreduce/wordcount/output


9.7 Stop the daemons

[admin@hadoop12 hadoop-2.7.2]$ sbin/yarn-daemon.sh stop nodemanager
[admin@hadoop12 hadoop-2.7.2]$ sbin/yarn-daemon.sh stop resourcemanager
[admin@hadoop12 hadoop-2.7.2]$ sbin/hadoop-daemon.sh stop datanode
[admin@hadoop12 hadoop-2.7.2]$ sbin/hadoop-daemon.sh stop namenode