2-10 Employment Course (2.0) - Oozie: 6. Running MR jobs through Oozie, and an approach for running Sqoop jobs

An approach for running Sqoop jobs (the current problem: Sqoop is installed only on node03, while Oozie dispatches each action to an arbitrary node in the cluster).
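A common way to solve this is Oozie's ssh action, which runs the command on a fixed host instead of whichever node YARN happens to pick. Below is a minimal sketch only: the host, Sqoop install path, and MySQL connection parameters are illustrative placeholders, and passwordless SSH from the Oozie server user to node03 is assumed. (The alternative is simply to install Sqoop on every NodeManager node.)

```xml
<!-- Hypothetical action: pin the Sqoop command to node03 via the ssh action -->
<action name="sqoop-on-node03">
    <ssh xmlns="uri:oozie:ssh-action:0.1">
        <host>root@node03</host>
        <command>/export/servers/sqoop/bin/sqoop</command>
        <args>list-databases</args>
        <args>--connect</args>
        <args>jdbc:mysql://node03:3306/</args>
        <args>--username</args>
        <args>root</args>
        <args>--password</args>
        <args>123456</args>
    </ssh>
    <ok to="end"/>
    <error to="fail"/>
</action>
```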

=====================================================

 

4.3 Scheduling an MR job with Oozie

Step 1: Prepare the input data for the MR job

Here we use Oozie to schedule the execution of a MapReduce program. The program can be one we wrote ourselves or one that ships with Hadoop; we use Hadoop's bundled wordcount example.

Prepare the following data and upload it to the /oozie/input path on HDFS:

hdfs dfs -mkdir -p /oozie/input

vim wordcount.txt

hello   world   hadoop

spark   hive    hadoop

Upload the data to the corresponding HDFS directory:

hdfs dfs -put wordcount.txt /oozie/input

 

Step 2: Run the official test example

yarn jar /export/servers/hadoop-2.6.0-cdh5.14.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.14.0.jar wordcount /oozie/input/ /oozie/output
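Before involving Oozie at all, it helps to know what output to expect. The same counts can be reproduced locally with a plain shell pipeline (no cluster needed):

```shell
# Local sanity check: the word counts the wordcount MR job should produce
# for the two-line wordcount.txt above (hadoop appears twice, the rest once).
printf 'hello world hadoop\nspark hive hadoop\n' \
  | tr -s ' \t' '\n' | sort | uniq -c | awk '{print $2, $1}'
```

When the MR job succeeds, the part files under /oozie/output should contain the same five word/count pairs.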

Step 3: Prepare the resources to be scheduled

Gather everything the job needs into one folder: the jar package, job.properties, and workflow.xml.

Copy the MR job template:

cd /export/servers/oozie-4.1.0-cdh5.14.0

cp -ra examples/apps/map-reduce/ oozie_works/

 

Delete the jar bundled in the template's lib directory:

cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce/lib

rm -rf oozie-examples-4.1.0-cdh5.14.0.jar

 

Copy our own jar package to the corresponding directory

From the deletion above we can see that the jars to be scheduled are stored under /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce/lib, so we simply place the jar we want to schedule into that same path:

cp /export/servers/hadoop-2.6.0-cdh5.14.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.14.0.jar /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce/lib/

 

Step 4: Modify the configuration files

Modify job.properties:

cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce

vim job.properties

nameNode=hdfs://node01:8020

jobTracker=node01:8032

queueName=default

examplesRoot=oozie_works

 

oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/map-reduce/workflow.xml

outputDir=/oozie/output

inputDir=/oozie/input
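One pitfall worth flagging here: the keys in job.properties must match the ${...} references in workflow.xml exactly, including case. ${inputDir} is not satisfied by a key spelled inputdir, and Oozie typically rejects the submission with an unresolved-variable error. A small self-contained check (sample files written to /tmp for illustration) catches this kind of mismatch:

```shell
# Flag any ${var} referenced in the workflow that job.properties does not
# define with exactly the same spelling (EL variable names are case-sensitive).
cat > /tmp/job.properties <<'EOF'
nameNode=hdfs://node01:8020
inputdir=/oozie/input
outputDir=/oozie/output
EOF
cat > /tmp/workflow.snippet <<'EOF'
<value>${nameNode}/${inputDir}</value>
<value>${nameNode}/${outputDir}</value>
EOF
missing=$(grep -o '\${[A-Za-z]*}' /tmp/workflow.snippet | tr -d '${}' | sort -u \
  | while read -r var; do
      grep -q "^${var}=" /tmp/job.properties || echo "undefined: ${var}"
    done)
echo "$missing"   # the lowercase "inputdir" does not satisfy ${inputDir}
```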

Modify workflow.xml:

cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce

vim workflow.xml

<?xml version="1.0" encoding="UTF-8"?>

<!--

  Licensed to the Apache Software Foundation (ASF) under one

  or more contributor license agreements.  See the NOTICE file

  distributed with this work for additional information

  regarding copyright ownership.  The ASF licenses this file

  to you under the Apache License, Version 2.0 (the

  "License"); you may not use this file except in compliance

  with the License.  You may obtain a copy of the License at

 

       http://www.apache.org/licenses/LICENSE-2.0

 

  Unless required by applicable law or agreed to in writing, software

  distributed under the License is distributed on an "AS IS" BASIS,

  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

  See the License for the specific language governing permissions and

  limitations under the License.

-->

<workflow-app xmlns="uri:oozie:workflow:0.5" name="map-reduce-wf">

    <start to="mr-node"/>

    <action name="mr-node">

        <map-reduce>

            <job-tracker>${jobTracker}</job-tracker>

            <name-node>${nameNode}</name-node>

            <prepare>

                <delete path="${nameNode}/${outputDir}"/>

            </prepare>

            <configuration>

                <property>

                    <name>mapred.job.queue.name</name>

                    <value>${queueName}</value>

                </property>

                <!-- 

                <property>

                    <name>mapred.mapper.class</name>

                    <value>org.apache.oozie.example.SampleMapper</value>

                </property>

                <property>

                    <name>mapred.reducer.class</name>

                    <value>org.apache.oozie.example.SampleReducer</value>

                </property>

                <property>

                    <name>mapred.map.tasks</name>

                    <value>1</value>

                </property>

                <property>

                    <name>mapred.input.dir</name>

                    <value>/user/${wf:user()}/${examplesRoot}/input-data/text</value>

                </property>

                <property>

                    <name>mapred.output.dir</name>

                    <value>/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}</value>

                </property>

                -->

               

                <!-- use the new MapReduce API for configuration -->

                <property>

                    <name>mapred.mapper.new-api</name>

                    <value>true</value>

                </property>

 

                <property>

                    <name>mapred.reducer.new-api</name>

                    <value>true</value>

                </property>

 

                <!-- output key class of the MR job -->

                <property>

                    <name>mapreduce.job.output.key.class</name>

                    <value>org.apache.hadoop.io.Text</value>

                </property>

 

                <!-- output value class of the MR job -->

                <property>

                    <name>mapreduce.job.output.value.class</name>

                    <value>org.apache.hadoop.io.IntWritable</value>

                </property>

 

                <!-- input path -->

                <property>

                    <name>mapred.input.dir</name>

                    <value>${nameNode}/${inputDir}</value>

                </property>

 

                <!-- output path -->

                <property>

                    <name>mapred.output.dir</name>

                    <value>${nameNode}/${outputDir}</value>

                </property>

 

                <!-- mapper class to run -->

                <property>

                    <name>mapreduce.job.map.class</name>

                    <value>org.apache.hadoop.examples.WordCount$TokenizerMapper</value>

                </property>

 

                <!-- reducer class to run -->

                <property>

                    <name>mapreduce.job.reduce.class</name>

                    <value>org.apache.hadoop.examples.WordCount$IntSumReducer</value>

                </property>

                <!-- number of map tasks -->

                <property>

                    <name>mapred.map.tasks</name>

                    <value>1</value>

                </property>

 

            </configuration>

        </map-reduce>

        <ok to="end"/>

        <error to="fail"/>

    </action>

    <kill name="fail">

        <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>

    </kill>

    <end name="end"/>

</workflow-app>

Step 5: Upload the job resources to the corresponding HDFS directory

cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works

hdfs dfs -put map-reduce/ /user/root/oozie_works/

 

Step 6: Run the scheduled job

Submit the job, then check its result through the Oozie web UI on port 11000:

cd /export/servers/oozie-4.1.0-cdh5.14.0

bin/oozie job -oozie http://node03:11000/oozie -config oozie_works/map-reduce/job.properties -run
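On success, the client prints a line of the form `job: <workflow-id>`; that id is what the web UI at http://node03:11000/oozie displays and what later CLI queries take. A small illustration of pulling the id out (the id shown below is a made-up placeholder):

```shell
# The submit command prints a line like the one below; the id can be captured
# and fed back into `oozie job -info` to poll the workflow's status.
run_output='job: 0000000-200101000000000-oozie-root-W'   # placeholder id
job_id="${run_output#job: }"
echo "$job_id"
# On the cluster you would then check progress with:
# bin/oozie job -oozie http://node03:11000/oozie -info "$job_id"
```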