Configuring the Hadoop 2.x hadoop-eclipse-plugin on Windows to Operate Hadoop Remotely

My Hadoop cluster runs on CentOS. Until now I developed MapReduce jobs locally in Eclipse on Windows, packaged them as jars, and shipped them to the CentOS cluster to run and test, which was not very efficient. So I decided to test directly from Windows through the hadoop-eclipse plugin instead.

I. Environment Setup

First, copy the Hadoop installation directory already deployed on the cluster to the local Windows machine; where exactly you put it is up to you. Then configure the environment variables: HADOOP_HOME, HADOOP_USER_NAME, the Path value, and so on. After that the plugin can be configured.
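To confirm the variables actually took effect, a throwaway class can print what the JVM sees (EnvCheck is just a hypothetical helper of mine, and D:\hadoop-2.7.5 is only an example path):

// EnvCheck.java - hypothetical sanity check for the environment setup
public class EnvCheck {
    public static void main(String[] args) {
        // Should print the local copy of the Hadoop directory, e.g. D:\hadoop-2.7.5
        System.out.println("HADOOP_HOME      = " + System.getenv("HADOOP_HOME"));
        // Should print a user with access to the HDFS directories on the cluster;
        // setting it helps avoid permission errors when writing to HDFS from Windows
        System.out.println("HADOOP_USER_NAME = " + System.getenv("HADOOP_USER_NAME"));
    }
}

Remember to restart Eclipse after changing environment variables, or it will keep the old values.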

Many articles online describe how to compile the plugin yourself, but after trying for a whole day I never got it to build. In the end I simply used the pre-built jar provided on GitHub, which works fine.

Open https://github.com/winghc/hadoop2x-eclipse-plugin and you will find plugins for three versions: 2.2.0, 2.4.0 and 2.6.0. My Hadoop version is 2.7.5, so I picked the closest one, 2.6.0.

Download it, drop it into the plugins directory under the Eclipse installation directory, and restart Eclipse.

After starting up, a DFS Locations node may or may not appear automatically in the Project Explorer. Either way, perform the following three steps:

① Window -> Perspective -> Open Perspective -> Other -> Map/Reduce

② Window -> Preferences -> Hadoop Map/Reduce, and point it at the local Hadoop directory

③ Window -> Show View -> MapReduce Tools -> Map/Reduce Locations

You should then see a yellow elephant icon in the Map/Reduce Locations view; right-click there to run New/Edit Hadoop location and fill in the connection settings.
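The exact values depend on your cluster. As a rough sketch (the host and DFS port below come from my cluster's fs.defaultFS, hdfs://192.168.89.128:9000, which also shows up in the job log later; the other fields are whatever matches your setup):

Location name:          any label you like
Map/Reduce(V2) Master:  Host = 192.168.89.128, Port = the cluster's MapReduce port
DFS Master:             Port = 9000 (must match fs.defaultFS in core-site.xml)
User name:              the same value as HADOOP_USER_NAME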

Once the location is configured, refresh DFS Locations and the directory structure stored in HDFS on the cluster appears.


From here on you can develop MapReduce locally and operate HDFS directly.
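As a minimal sketch of what operating HDFS from local code looks like (it assumes the same NameNode address as my cluster; the class name HdfsList is mine), the following lists the HDFS root directory:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// HdfsList.java - list the HDFS root directory from Windows
public class HdfsList {
    public static void main(String[] args) throws Exception {
        // Connect to the NameNode; the address is the cluster's fs.defaultFS
        FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.89.128:9000"), new Configuration());
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}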

II. WordCount Example

Before developing MapReduce, though, you need to download two binaries, hadoop.dll and winutils.exe, into the local hadoop/bin directory; without them, Hadoop cannot be operated from Windows at all.

The WordCount program below serves as the demo for remote operation:

  1. Create the MR project: File -> New -> Other -> Map/Reduce Project
  2. Create the main class: create a Package under src, and a WordCount.java class inside it
  3. Create a log4j.properties file under src; without it, the program reports errors when run
  4. Configure the run parameters

2. WordCount.java code

package remote.wordcount.test;

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
 
public class WordCount {
 
    // Mapper: split each input line into tokens and emit a (word, 1) pair per token
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();
 
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }
 
    // Reducer (also used as the combiner): sum the counts for each word
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();
 
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }
 
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        // Job.getInstance replaces the deprecated new Job(conf, ...) constructor
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

3. log4j.properties file

# Configure logging for testing: optionally with a log file
#log4j.rootLogger=debug,appender
log4j.rootLogger=info,appender
#log4j.rootLogger=error,appender
# Output to the console
log4j.appender.appender=org.apache.log4j.ConsoleAppender
# Use TTCCLayout (time, thread, category and nested diagnostic context)
log4j.appender.appender.layout=org.apache.log4j.TTCCLayout

4. Configure the run parameters and run

Right-click the project and choose Run As -> Run Configurations -> Java Application. With Java Application selected, click New launch configuration in the top-left corner and fill in the Main tab: enter any Name, use Search to locate the WordCount main class, then Apply and Run.
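The job also needs its two command-line arguments, the HDFS input file and a not-yet-existing output directory, entered on the Arguments tab of the same dialog. Judging from the paths that appear in my log output below, the Program arguments line was:

hdfs://192.168.89.128:9000/word.txt hdfs://192.168.89.128:9000/output1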

After the run starts, the console shows output like the following:

[main] INFO org.apache.hadoop.conf.Configuration.deprecation - session.id is deprecated. Instead, use dfs.metrics.session-id
[main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
[main] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
[main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
[main] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
[main] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_local1992754859_0001
[main] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://localhost:8080/
[main] INFO org.apache.hadoop.mapreduce.Job - Running job: job_local1992754859_0001
[Thread-4] INFO org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter set in config null
[Thread-4] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
[Thread-4] INFO org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
[Thread-4] INFO org.apache.hadoop.mapred.LocalJobRunner - Waiting for map tasks
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - Starting task: attempt_local1992754859_0001_m_000000_0
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.yarn.util.ProcfsBasedProcessTree - ProcfsBasedProcessTree currently is supported only on Linux.
[main] INFO org.apache.hadoop.mapreduce.Job - Job job_local1992754859_0001 running in uber mode : false
[main] INFO org.apache.hadoop.mapreduce.Job -  map 0% reduce 0%
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task -  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@...
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Processing split: hdfs://192.168.89.128:9000/word.txt:0+32
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - (EQUATOR) 0 kvi 26214396(104857584)
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - mapreduce.task.io.sort.mb: 100
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - soft limit at 83886080
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - bufstart = 0; bufvoid = 104857600
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - kvstart = 26214396; length = 6553600
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - 
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Spilling map output
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - bufstart = 0; bufend = 92; bufvoid = 104857600
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - kvstart = 26214396(104857584); kvend = 26214340(104857360); length = 57/6553600
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Finished spill 0
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Task:attempt_local1992754859_0001_m_000000_0 is done. And is in the process of committing
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - map
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local1992754859_0001_m_000000_0' done.
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Final Counters for attempt_local1992754859_0001_m_000000_0: Counters: 23
	File System Counters
		FILE: Number of bytes read=158
		FILE: Number of bytes written=290259
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=32
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=5
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=1
	Map-Reduce Framework
		Map input records=3
		Map output records=15
		Map output bytes=92
		Map output materialized bytes=55
		Input split bytes=100
		Combine input records=15
		Combine output records=6
		Spilled Records=6
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=0
		Total committed heap usage (bytes)=268435456
	File Input Format Counters 
		Bytes Read=32
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - Finishing task: attempt_local1992754859_0001_m_000000_0
[Thread-4] INFO org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
[Thread-4] INFO org.apache.hadoop.mapred.LocalJobRunner - Waiting for reduce tasks
[pool-6-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner - Starting task: attempt_local1992754859_0001_r_000000_0
[pool-6-thread-1] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
[pool-6-thread-1] INFO org.apache.hadoop.yarn.util.ProcfsBasedProcessTree - ProcfsBasedProcessTree currently is supported only on Linux.
[pool-6-thread-1] INFO org.apache.hadoop.mapred.Task -  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@...
[pool-6-thread-1] INFO org.apache.hadoop.mapred.ReduceTask - Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@...
[pool-6-thread-1] INFO org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - MergerManager: memoryLimit=334338464, maxSingleShuffleLimit=83584616, mergeThreshold=220663392, ioSortFactor=10, memToMemMergeOutputsThreshold=10
[EventFetcher for fetching Map Completion Events] INFO org.apache.hadoop.mapreduce.task.reduce.EventFetcher - attempt_local1992754859_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
[localfetcher#1] INFO org.apache.hadoop.mapreduce.task.reduce.LocalFetcher - localfetcher#1 about to shuffle output of map attempt_local1992754859_0001_m_000000_0 decomp: 51 len: 55 to MEMORY
[localfetcher#1] INFO org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput - Read 51 bytes from map-output for attempt_local1992754859_0001_m_000000_0
[localfetcher#1] INFO org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - closeInMemoryFile -> map-output of size: 51, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->51
[EventFetcher for fetching Map Completion Events] INFO org.apache.hadoop.mapreduce.task.reduce.EventFetcher - EventFetcher is interrupted.. Returning
[pool-6-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner - 1 / 1 copied.
[pool-6-thread-1] INFO org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
[pool-6-thread-1] INFO org.apache.hadoop.mapred.Merger - Merging 1 sorted segments
[pool-6-thread-1] INFO org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 1 segments left of total size: 47 bytes
[pool-6-thread-1] INFO org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - Merged 1 segments, 51 bytes to disk to satisfy reduce memory limit
[pool-6-thread-1] INFO org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - Merging 1 files, 55 bytes from disk
[pool-6-thread-1] INFO org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - Merging 0 segments, 0 bytes from memory into reduce
[pool-6-thread-1] INFO org.apache.hadoop.mapred.Merger - Merging 1 sorted segments
[pool-6-thread-1] INFO org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 1 segments left of total size: 47 bytes
[pool-6-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner - 1 / 1 copied.
[main] INFO org.apache.hadoop.mapreduce.Job -  map 100% reduce 0%
[pool-6-thread-1] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
[pool-6-thread-1] INFO org.apache.hadoop.mapred.Task - Task:attempt_local1992754859_0001_r_000000_0 is done. And is in the process of committing
[pool-6-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner - 1 / 1 copied.
[pool-6-thread-1] INFO org.apache.hadoop.mapred.Task - Task attempt_local1992754859_0001_r_000000_0 is allowed to commit now
[pool-6-thread-1] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local1992754859_0001_r_000000_0' to hdfs://192.168.89.128:9000/output1/_temporary/0/task_local1992754859_0001_r_000000
[pool-6-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
[pool-6-thread-1] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local1992754859_0001_r_000000_0' done.
[pool-6-thread-1] INFO org.apache.hadoop.mapred.Task - Final Counters for attempt_local1992754859_0001_r_000000_0: Counters: 29
	File System Counters
		FILE: Number of bytes read=300
		FILE: Number of bytes written=290314
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=32
		HDFS: Number of bytes written=25
		HDFS: Number of read operations=8
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=3
	Map-Reduce Framework
		Combine input records=0
		Combine output records=0
		Reduce input groups=6
		Reduce shuffle bytes=55
		Reduce input records=6
		Reduce output records=6
		Spilled Records=6
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=0
		Total committed heap usage (bytes)=268435456
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Output Format Counters 
		Bytes Written=25
[pool-6-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner - Finishing task: attempt_local1992754859_0001_r_000000_0
[Thread-4] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce task executor complete.
[main] INFO org.apache.hadoop.mapreduce.Job -  map 100% reduce 100%
[main] INFO org.apache.hadoop.mapreduce.Job - Job job_local1992754859_0001 completed successfully
[main] INFO org.apache.hadoop.mapreduce.Job - Counters: 35
	File System Counters
		FILE: Number of bytes read=458
		FILE: Number of bytes written=580573
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=64
		HDFS: Number of bytes written=25
		HDFS: Number of read operations=13
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=4
	Map-Reduce Framework
		Map input records=3
		Map output records=15
		Map output bytes=92
		Map output materialized bytes=55
		Input split bytes=100
		Combine input records=15
		Combine output records=6
		Reduce input groups=6
		Reduce shuffle bytes=55
		Reduce input records=6
		Reduce output records=6
		Spilled Records=12
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=0
		Total committed heap usage (bytes)=536870912
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=32
	File Output Format Counters 
		Bytes Written=25
Picked up _JAVA_OPTIONS: -Xmx512M

Finally, right-click the root directory under DFS Locations and refresh it to see the generated output directory and file.
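If you prefer code to clicking around, the result can also be read directly; here is a minimal sketch (ReadResult is my own name for it, and it assumes the job's default single reducer, whose output lands in part-r-00000):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// ReadResult.java - print the word counts written by the job above
public class ReadResult {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.89.128:9000"), new Configuration());
        // With a single reducer the whole result is in one part file
        Path part = new Path("/output1/part-r-00000");
        try (BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(part)))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // each line is "<word>\t<count>"
            }
        }
        fs.close();
    }
}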
