Configuring the hadoop-eclipse-plugin on Windows to Remotely Operate a Hadoop 2.x Cluster
My Hadoop cluster runs on CentOS. Previously I developed MapReduce jobs in Eclipse on Windows, packaged them into a jar, and shipped the jar to the CentOS cluster to run and test, which was not very efficient. So now I test directly from Windows through the hadoop-eclipse plugin.
I. Environment Setup
The first step is to copy the Hadoop installation directory already deployed on the cluster to the local Windows machine; where you put it is up to you. Then set the environment variables: HADOOP_HOME (for example D:\hadoop-2.7.5), HADOOP_USER_NAME (the user that owns your HDFS directories on the cluster, so the client is not rejected for permission reasons), and add %HADOOP_HOME%\bin to Path. After that the plugin can be configured.
Many articles online explain how to compile the plugin yourself, but after trying for a whole day I never got a successful build. In the end I used the pre-built jar published on GitHub, which worked fine.
Open https://github.com/winghc/hadoop2x-eclipse-plugin and you will see plugins built for three versions: 2.2.0, 2.4.0, and 2.6.0. My Hadoop version is 2.7.5, so I picked the closest match and downloaded the 2.6.0 build.
After downloading, simply drop the jar into the plugins directory under the Eclipse installation directory, then restart Eclipse.
After the restart, a DFS Locations node may or may not show up automatically in Project Explorer. Either way, perform the following three steps:
① Window -> Perspective -> Open Perspective -> Other -> Map/Reduce
② Window -> Preferences -> Hadoop Map/Reduce -> set the Hadoop installation directory
③ Window -> Show View -> MapReduce Tools -> Map/Reduce Locations
You should then see a yellow elephant icon in the Map/Reduce Locations view; right-click it to open New Hadoop location / Edit Hadoop location. Make sure the DFS Master host and port match the cluster's fs.defaultFS (hdfs://192.168.89.128:9000 in my setup, as the logs below show). Once configured, refresh DFS Locations and the directory structure stored in HDFS on the cluster appears:
From this point you can develop MapReduce locally and operate on HDFS directly.
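Before writing any MapReduce, you can verify the connection independently of the plugin with a minimal HDFS listing sketch like the one below. This is my own illustration, not part of the original steps: the NameNode address is the one from my cluster (replace it with yours), and HdfsListTest is an arbitrary class name.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsListTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // NameNode address of my cluster; replace with your own fs.defaultFS.
        conf.set("fs.defaultFS", "hdfs://192.168.89.128:9000");
        FileSystem fs = FileSystem.get(conf);
        // List the HDFS root, mirroring what the DFS Locations tree shows.
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath() + (status.isDirectory() ? "  (dir)" : ""));
        }
        fs.close();
    }
}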
II. WordCount Example
Before developing any MapReduce, however, you need to download two executables, hadoop.dll and winutils.exe, into the hadoop/bin directory; without these two files Hadoop cannot be operated from Windows.
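If Hadoop still cannot locate winutils.exe at run time even with HADOOP_HOME set (typically a "Could not locate executable ... winutils.exe" error), one common workaround, offered here as an assumption rather than a step from the original, is to set the hadoop.home.dir system property at the top of main(), since Hadoop consults that property before falling back to the HADOOP_HOME environment variable:

// Hypothetical local path; point it at wherever you copied the Hadoop directory.
System.setProperty("hadoop.home.dir", "D:\\hadoop-2.7.5");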
Below, the WordCount program serves as the demonstration of remote operation.
1. Create the project
- New MR project: File -> New -> Other -> Map/Reduce Project
- Create the main class: create a package under src, and a WordCount.java class inside that package
- Create a log4j.properties file under src; without it the program throws an error when run
- Configure the run parameters (covered in step 4 below)
2. WordCount.java
package remote.wordcount.test;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // Emits (word, 1) for every token in each input line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Sums the counts for each word; also used as the combiner.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        // Job.getInstance replaces the deprecated new Job(conf, name) constructor.
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
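One design point worth noting: IntSumReducer is registered both as the combiner and as the reducer. Because summing is associative and commutative, partial sums can safely be computed on the map side before the shuffle, which is why the counters below show Combine input records=15 collapsing to Combine output records=6.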
3. log4j.properties
# Configure logging for testing: optionally with log file
#log4j.rootLogger=debug,appender
log4j.rootLogger=info,appender
#log4j.rootLogger=error,appender
# Output to the console
log4j.appender.appender=org.apache.log4j.ConsoleAppender
# Use TTCCLayout
log4j.appender.appender.layout=org.apache.log4j.TTCCLayout
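With ConsoleAppender plus TTCCLayout, each log line carries the thread name in brackets, the level, and the logger category, which is the format you will see in the console output in the next step.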
4. Configure the parameters and run
Right-click the project and choose Run As -> Run Configurations. Select Java Application, click New launch configuration in the top-left corner, and fill in the Main tab: any Name will do, and use Search to locate WordCount as the main class. On the Arguments tab, pass the input and output paths as the two program arguments, in my case hdfs://192.168.89.128:9000/word.txt and hdfs://192.168.89.128:9000/output1 (the same paths that appear in the log below). Then click Apply and Run.
After the run starts, the console shows output like the following:
[main] INFO org.apache.hadoop.conf.Configuration.deprecation - session.id is deprecated. Instead, use dfs.metrics.session-id
[main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
[main] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
[main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
[main] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
[main] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_local1992754859_0001
[main] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://localhost:8080/
[main] INFO org.apache.hadoop.mapreduce.Job - Running job: job_local1992754859_0001
[Thread-4] INFO org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter set in config null
[Thread-4] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
[Thread-4] INFO org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
[Thread-4] INFO org.apache.hadoop.mapred.LocalJobRunner - Waiting for map tasks
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - Starting task: attempt_local1992754859_0001_m_000000_0
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.yarn.util.ProcfsBasedProcessTree - ProcfsBasedProcessTree currently is supported only on Linux.
[main] INFO org.apache.hadoop.mapreduce.Job - Job job_local1992754859_0001 running in uber mode : false
[main] INFO org.apache.hadoop.mapreduce.Job - map 0% reduce 0%
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorProcessTree : [email protected]
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Processing split: hdfs://192.168.89.128:9000/word.txt:0+32
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - (EQUATOR) 0 kvi 26214396(104857584)
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - mapreduce.task.io.sort.mb: 100
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - soft limit at 83886080
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - bufstart = 0; bufvoid = 104857600
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - kvstart = 26214396; length = 6553600
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner -
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Spilling map output
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - bufstart = 0; bufend = 92; bufvoid = 104857600
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - kvstart = 26214396(104857584); kvend = 26214340(104857360); length = 57/6553600
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Finished spill 0
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Task:attempt_local1992754859_0001_m_000000_0 is done. And is in the process of committing
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - map
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local1992754859_0001_m_000000_0' done.
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Final Counters for attempt_local1992754859_0001_m_000000_0: Counters: 23
    File System Counters
        FILE: Number of bytes read=158
        FILE: Number of bytes written=290259
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=32
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=5
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=1
    Map-Reduce Framework
        Map input records=3
        Map output records=15
        Map output bytes=92
        Map output materialized bytes=55
        Input split bytes=100
        Combine input records=15
        Combine output records=6
        Spilled Records=6
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=0
        Total committed heap usage (bytes)=268435456
    File Input Format Counters
        Bytes Read=32
[LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - Finishing task: attempt_local1992754859_0001_m_000000_0
[Thread-4] INFO org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
[Thread-4] INFO org.apache.hadoop.mapred.LocalJobRunner - Waiting for reduce tasks
[pool-6-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner - Starting task: attempt_local1992754859_0001_r_000000_0
[pool-6-thread-1] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
[pool-6-thread-1] INFO org.apache.hadoop.yarn.util.ProcfsBasedProcessTree - ProcfsBasedProcessTree currently is supported only on Linux.
[pool-6-thread-1] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorProcessTree : [email protected]
[pool-6-thread-1] INFO org.apache.hadoop.mapred.ReduceTask - Using ShuffleConsumerPlugin: [email protected]
[pool-6-thread-1] INFO org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - MergerManager: memoryLimit=334338464, maxSingleShuffleLimit=83584616, mergeThreshold=220663392, ioSortFactor=10, memToMemMergeOutputsThreshold=10
[EventFetcher for fetching Map Completion Events] INFO org.apache.hadoop.mapreduce.task.reduce.EventFetcher - attempt_local1992754859_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
[localfetcher#1] INFO org.apache.hadoop.mapreduce.task.reduce.LocalFetcher - localfetcher#1 about to shuffle output of map attempt_local1992754859_0001_m_000000_0 decomp: 51 len: 55 to MEMORY
[localfetcher#1] INFO org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput - Read 51 bytes from map-output for attempt_local1992754859_0001_m_000000_0
[localfetcher#1] INFO org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - closeInMemoryFile -> map-output of size: 51, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->51
[EventFetcher for fetching Map Completion Events] INFO org.apache.hadoop.mapreduce.task.reduce.EventFetcher - EventFetcher is interrupted.. Returning
[pool-6-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner - 1 / 1 copied.
[pool-6-thread-1] INFO org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
[pool-6-thread-1] INFO org.apache.hadoop.mapred.Merger - Merging 1 sorted segments
[pool-6-thread-1] INFO org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 1 segments left of total size: 47 bytes
[pool-6-thread-1] INFO org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - Merged 1 segments, 51 bytes to disk to satisfy reduce memory limit
[pool-6-thread-1] INFO org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - Merging 1 files, 55 bytes from disk
[pool-6-thread-1] INFO org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - Merging 0 segments, 0 bytes from memory into reduce
[pool-6-thread-1] INFO org.apache.hadoop.mapred.Merger - Merging 1 sorted segments
[pool-6-thread-1] INFO org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 1 segments left of total size: 47 bytes
[pool-6-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner - 1 / 1 copied.
[main] INFO org.apache.hadoop.mapreduce.Job - map 100% reduce 0%
[pool-6-thread-1] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
[pool-6-thread-1] INFO org.apache.hadoop.mapred.Task - Task:attempt_local1992754859_0001_r_000000_0 is done. And is in the process of committing
[pool-6-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner - 1 / 1 copied.
[pool-6-thread-1] INFO org.apache.hadoop.mapred.Task - Task attempt_local1992754859_0001_r_000000_0 is allowed to commit now
[pool-6-thread-1] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local1992754859_0001_r_000000_0' to hdfs://192.168.89.128:9000/output1/_temporary/0/task_local1992754859_0001_r_000000
[pool-6-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
[pool-6-thread-1] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local1992754859_0001_r_000000_0' done.
[pool-6-thread-1] INFO org.apache.hadoop.mapred.Task - Final Counters for attempt_local1992754859_0001_r_000000_0: Counters: 29
    File System Counters
        FILE: Number of bytes read=300
        FILE: Number of bytes written=290314
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=32
        HDFS: Number of bytes written=25
        HDFS: Number of read operations=8
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Map-Reduce Framework
        Combine input records=0
        Combine output records=0
        Reduce input groups=6
        Reduce shuffle bytes=55
        Reduce input records=6
        Reduce output records=6
        Spilled Records=6
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=0
        Total committed heap usage (bytes)=268435456
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Output Format Counters
        Bytes Written=25
[pool-6-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner - Finishing task: attempt_local1992754859_0001_r_000000_0
[Thread-4] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce task executor complete.
[main] INFO org.apache.hadoop.mapreduce.Job - map 100% reduce 100%
[main] INFO org.apache.hadoop.mapreduce.Job - Job job_local1992754859_0001 completed successfully
[main] INFO org.apache.hadoop.mapreduce.Job - Counters: 35
    File System Counters
        FILE: Number of bytes read=458
        FILE: Number of bytes written=580573
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=64
        HDFS: Number of bytes written=25
        HDFS: Number of read operations=13
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=4
    Map-Reduce Framework
        Map input records=3
        Map output records=15
        Map output bytes=92
        Map output materialized bytes=55
        Input split bytes=100
        Combine input records=15
        Combine output records=6
        Reduce input groups=6
        Reduce shuffle bytes=55
        Reduce input records=6
        Reduce output records=6
        Spilled Records=12
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=0
        Total committed heap usage (bytes)=536870912
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=32
    File Output Format Counters
        Bytes Written=25
Picked up _JAVA_OPTIONS: -Xmx512M
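A quick sanity check against the counters: the input file word.txt had 3 lines containing 15 words in total (Map input records=3, Map output records=15), 6 of them distinct (Reduce output records=6), and the combiner collapsed the 15 (word, 1) pairs into 6 partial sums before the shuffle (Combine input records=15, Combine output records=6).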
Then right-click the root directory under DFS Locations and refresh; the newly generated output directory and files appear: