26. Pseudo-Distributed Operation (Single Node): Starting YARN and Running a MapReduce Program
Article outline:
(1) Prepare one client machine
(2) Install the JDK
(3) Configure the JDK environment variables
(4) Install Hadoop
(5) Configure the Hadoop environment variables
(6) Configure the Hadoop cluster to run on YARN
(7) Start the cluster and test basic create/delete/list operations
(8) Run the wordcount example on YARN
1. Prepare one client machine
Omitted.
2. Install the JDK
Unpack the JDK archive; details omitted.
3. Configure the JDK environment variables
Edit /etc/profile and append:
##JAVA_HOME
export JAVA_HOME=/opt/module/jdk8
export PATH=$PATH:$JAVA_HOME/bin
export PATH=$PATH:$JAVA_HOME/sbin
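To confirm the exports take effect, reload the profile and check the PATH. A minimal sketch of that check, written against a scratch file so it can run anywhere (the JDK path is the one used above; /tmp/java_profile_check.sh is a hypothetical scratch name):

```shell
# Write the same exports to a scratch file, source it, and confirm
# the JDK bin directory landed on PATH.
cat > /tmp/java_profile_check.sh <<'EOF'
export JAVA_HOME=/opt/module/jdk8
export PATH=$PATH:$JAVA_HOME/bin
EOF
. /tmp/java_profile_check.sh
echo "$PATH" | grep -q '/opt/module/jdk8/bin' && echo 'JAVA_HOME on PATH'
```

On the real machine, `source /etc/profile` followed by `java -version` is the usual check.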
4. Install Hadoop
Unpack the Hadoop archive; see the previous article if you are unsure how.
Path: /opt/module
[[email protected] module]$ tar -zxvf hadoop-2.7.2.tar.gz
5. Configure the Hadoop environment variables
[[email protected] hadoop-2.7.2]$ vim /etc/profile
Append to the end of the profile file:
##HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
Reload the configuration:
[[email protected] hadoop-2.7.2]$ source /etc/profile
Verify that Hadoop is installed:
[[email protected] hadoop-2.7.2]$ hadoop
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings
Most commands print help when invoked w/o parameters.
6. Configure the Hadoop cluster
6.1 Configure hadoop-env.sh
Set the JAVA_HOME variable in hadoop-env.sh:
[[email protected] hadoop-2.7.2]$ echo $JAVA_HOME
/opt/module/jdk8
[[email protected] hadoop-2.7.2]$ vim etc/hadoop/hadoop-env.sh
Set the value:
export JAVA_HOME=/opt/module/jdk8
6.2 Configure core-site.xml
core-site.xml declares the NameNode address and the directory where Hadoop stores the files it generates at runtime.
[[email protected] hadoop-2.7.2]$ vim etc/hadoop/core-site.xml
Add the following inside the <configuration></configuration> tags:
<!-- NameNode address in HDFS -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop12:9000</value>
</property>
<!-- Storage directory for files Hadoop generates at runtime (DataNode data lives here). If unset it defaults to /tmp; remember to delete this directory before reformatting the NameNode -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-2.7.2/data/tmp</value>
</property>
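After the edit, the whole core-site.xml (omitting the stock license header) reduces to roughly the following; only the two property blocks are additions:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop12:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-2.7.2/data/tmp</value>
    </property>
</configuration>
```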
6.3 Configure hdfs-site.xml
[[email protected] hadoop-2.7.2]$ vim etc/hadoop/hdfs-site.xml
Add the following inside the <configuration></configuration> tags:
<!-- Number of HDFS replicas; 1 is sufficient on a single node -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
7. Configure the Hadoop cluster to run on YARN
7.1 Configure yarn-env.sh
Set the JAVA_HOME variable in yarn-env.sh:
[[email protected] hadoop-2.7.2]$ echo $JAVA_HOME
/opt/module/jdk8
[[email protected] hadoop-2.7.2]$ vim etc/hadoop/yarn-env.sh
Add:
export JAVA_HOME=/opt/module/jdk8
7.2 Configure yarn-site.xml
yarn-site.xml declares how reducers fetch data and the address of the YARN ResourceManager:
[[email protected] hadoop-2.7.2]$ vim etc/hadoop/yarn-site.xml
Add the following inside the <configuration></configuration> tags:
<!-- How reducers fetch data -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Address of the YARN ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop13</value>
</property>
7.3 Configure mapred-env.sh
Set the JAVA_HOME variable in mapred-env.sh:
[[email protected] hadoop-2.7.2]$ echo $JAVA_HOME
/opt/module/jdk8
[[email protected] hadoop-2.7.2]$ vim etc/hadoop/mapred-env.sh
Add:
export JAVA_HOME=/opt/module/jdk8
7.4 Rename mapred-site.xml.template to mapred-site.xml and configure it
[[email protected] hadoop]$ cd /opt/module/hadoop-2.7.2/etc/hadoop
[[email protected] hadoop]$ mv mapred-site.xml.template mapred-site.xml
[[email protected] hadoop]$ vim mapred-site.xml
Add the following inside the <configuration></configuration> tags:
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
8. Start the cluster
(Format the NameNode only before the first start; do not reformat on every start. If you must reformat, stop the daemons and delete the data directory first.)
[[email protected] hadoop-2.7.2]$ bin/hdfs namenode -format
(a) Start the NameNode
[[email protected] hadoop-2.7.2]$ sbin/hadoop-daemon.sh start namenode
(b) Start the DataNode
[[email protected] hadoop-2.7.2]$ sbin/hadoop-daemon.sh start datanode
(c) Start the ResourceManager
[[email protected] hadoop-2.7.2]$ sbin/yarn-daemon.sh start resourcemanager
(d) Start the NodeManager
[[email protected] hadoop-2.7.2]$ sbin/yarn-daemon.sh start nodemanager
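With all four daemons started, `jps` should list each of them. A sketch of that check against a made-up sample of `jps` output (the PIDs are hypothetical; on the real machine, feed `jps` itself through the loop):

```shell
# Sample jps output for a healthy single-node setup; real PIDs will differ.
jps_output='3456 NameNode
3521 DataNode
3688 ResourceManager
3754 NodeManager
3801 Jps'
# Each daemon name should appear as its own line.
for d in NameNode DataNode ResourceManager NodeManager; do
  echo "$jps_output" | grep -q " ${d}\$" && echo "$d running"
done
```

If any of the four names is missing, check the corresponding log file under logs/ before moving on.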
9. Operating the cluster
9.1 View YARN in the browser
http://192.168.1.113:8088/cluster
9.2 Create an input directory on the HDFS file system
[[email protected] hadoop-2.7.2]$ hadoop fs -mkdir -p /user/admin/mapreduce/wordcount/input
9.3 Upload the test file to the file system
[[email protected] hadoop-2.7.2]$ vim wc.input
[[email protected] hadoop-2.7.2]$ hadoop fs -put wc.input /user/admin/mapreduce/wordcount/input/
9.4 Verify that the file was uploaded correctly
[[email protected] hadoop-2.7.2]$ hadoop fs -ls /user/admin/mapreduce/wordcount/input/
[[email protected] hadoop-2.7.2]$ hadoop fs -cat /user/admin/mapreduce/wordcount/input/wc.input
9.5 Run the MapReduce program
[[email protected] hadoop-2.7.2]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/admin/mapreduce/wordcount/input /user/admin/mapreduce/wordcount/output
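What the example job computes can be previewed locally: wordcount splits each line on whitespace and emits word<TAB>count pairs sorted by word. A rough local equivalent with standard tools, run on a made-up two-line sample (not the tutorial's actual wc.input contents):

```shell
# Tokenize on whitespace, then count occurrences per word.
printf 'hadoop yarn\nhadoop mapreduce\n' > /tmp/wc.sample
tr -s ' \t' '\n' < /tmp/wc.sample | sort | uniq -c | awk '{print $2 "\t" $1}'
# Prints:
# hadoop	2
# mapreduce	1
# yarn	1
```

The real job writes the same word<TAB>count format into part-r-00000 under the output directory. Note that the output directory must not already exist, or the job fails immediately.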
9.6 View the output
From the command line:
[[email protected] hadoop-2.7.2]$ hadoop fs -cat /user/admin/mapreduce/wordcount/output/*
In the browser:
http://192.168.1.113:50070/explorer.html#/user/admin/mapreduce/wordcount/output
9.7 Stop the daemons
[[email protected] hadoop-2.7.2]$ sbin/yarn-daemon.sh stop nodemanager
[[email protected] hadoop-2.7.2]$ sbin/yarn-daemon.sh stop resourcemanager
[[email protected] hadoop-2.7.2]$ sbin/hadoop-daemon.sh stop datanode
[[email protected] hadoop-2.7.2]$ sbin/hadoop-daemon.sh stop namenode