Debugging a MapReduce program locally (local mode) with Eclipse on 64-bit Windows
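I. Environment setup
A Java environment, Eclipse, and hadoop-2.x (on Windows).
The Hadoop package I used: link: https://pan.baidu.com/s/1230HUG2HluDsP1FT-tXa-g password: sodt (all the files in this package have already been replaced)
First, download the 64-bit winutils.exe and hadoop.dll from the internet and copy them into the hadoop/bin directory. Then replace the native libraries under lib with their Windows builds, create the system environment variable HADOOP_HOME, and update Path (typically by adding %HADOOP_HOME%\bin).
At this point, the files in hadoop\bin:
The files in lib: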
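Before opening Eclipse, it can help to confirm that the setup above is actually visible to a Java process. The following is a minimal sketch, not part of Hadoop or of the original setup: the class name `HadoopHomeCheck` and its method are hypothetical, and it only checks for the two files named above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: verifies the Windows files described above are in place. */
final class HadoopHomeCheck {

    /** Returns the names of the required files that are missing from <hadoopHome>/bin. */
    static List<String> missingBinFiles(Path hadoopHome) {
        List<String> missing = new ArrayList<>();
        for (String name : new String[] {"winutils.exe", "hadoop.dll"}) {
            if (!Files.isRegularFile(hadoopHome.resolve("bin").resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // HADOOP_HOME is the environment variable created in the step above.
        String home = System.getenv("HADOOP_HOME");
        if (home == null || home.isEmpty()) {
            System.out.println("HADOOP_HOME is not set");
            return;
        }
        List<String> missing = missingBinFiles(Paths.get(home));
        System.out.println(missing.isEmpty()
                ? "bin looks complete: " + home
                : "missing from bin: " + missing);
    }
}
```

Eclipse reads environment variables at startup, so if the check reports that HADOOP_HOME is not set right after you created it, restarting Eclipse is worth trying first.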
II. The MapReduce job for this run
The example is a simple wordcount program. The code is as follows.
1. WordcountMapper:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordcountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Take one line of input and convert it to a String
        String line = value.toString();
        // Split the line into individual words
        String[] words = line.split(" ");
        // Iterate over the array and emit <word, 1>
        for (String word : words) {
            context.write(new Text(word), new IntWritable(1));
        }
    }
}
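One detail of the mapper worth seeing in isolation is how `split(" ")` behaves on lines with repeated spaces. The sketch below is plain Java with no Hadoop dependency; the class `SplitDemo` and the whitespace-based alternative are illustrative assumptions, not the mapper's actual code.

```java
import java.util.Arrays;

/** Hypothetical demo of the tokenization step used in WordcountMapper. */
final class SplitDemo {

    /** The same call the mapper makes: split on a single space. */
    static String[] splitOnSpace(String line) {
        return line.split(" ");
    }

    /** An alternative (assumption): trim, then split on runs of whitespace. */
    static String[] splitOnWhitespace(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String line = "hello  world"; // two spaces between the words
        // prints [hello, , world]: the empty token would be counted as a word
        System.out.println(Arrays.toString(splitOnSpace(line)));
        // prints [hello, world]
        System.out.println(Arrays.toString(splitOnWhitespace(line)));
    }
}
```

If the input file only ever has single spaces between words, the two variants give the same tokens, so the mapper as written is fine for that case.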
2. WordcountReducer:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordcountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    // key: the word; values: e.g. [1, 1], exposed as an Iterable<IntWritable>
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Define a counter
        int count = 0;
        // Iterate over all values in this key's group and add them to count
        for (IntWritable value : values) {
            count += value.get();
        }
        context.write(key, new IntWritable(count));
    }
}
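To make the data flow between the mapper and the reducer concrete, here is a small in-memory sketch in plain Java. It is a simplification and not how Hadoop is implemented: the class `LocalWordCount` is hypothetical, a `TreeMap` stands in for the shuffle and sort phase, and there is no serialization or spilling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory imitation of the wordcount data flow. */
final class LocalWordCount {

    /** Map + shuffle: emit (word, 1) per token, then group the 1s by word in key order. */
    static Map<String, List<Integer>> mapAndGroup(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }
        return grouped;
    }

    /** Reduce: sum the values of each group, mirroring the loop in WordcountReducer. */
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int count = 0;
            for (int value : entry.getValue()) {
                count += value;
            }
            counts.put(entry.getKey(), count);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a b a", "b c");
        // prints {a=2, b=2, c=1}
        System.out.println(reduce(mapAndGroup(lines)));
    }
}
```

The intermediate map returned by `mapAndGroup` corresponds to what the reducer sees: one key with the list of all values emitted for it, such as `a -> [1, 1]`.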
3. WordcountDriver:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordcountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Whether the job runs in local mode depends on whether this value is "local";
        // "local" is also the default
        conf.set("mapreduce.framework.name", "local");
        // Use the local file system instead of HDFS
        conf.set("fs.defaultFS", "file:///");
        Job job = Job.getInstance(conf);
        // Locate the job jar through the driver class
        job.setJarByClass(WordcountDriver.class);
        // Mapper and reducer classes used by this job
        job.setMapperClass(WordcountMapper.class);
        job.setReducerClass(WordcountReducer.class);
        // Key/value types of the mapper output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // Key/value types of the final output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Directory that holds the job's raw input files
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        // Directory for the job's output
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        boolean res = job.waitForCompletion(true);
        System.exit(res ? 0 : 1);
    }
}
Note: the MR program must set its run parameters to local mode:
conf.set("mapreduce.framework.name", "local");
conf.set("fs.defaultFS", "file:///");
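III. Debugging with Eclipse
Right-click main --> Debug Configurations, and enter the input and output paths as program arguments (the driver reads them as args[0] and args[1]).
Once the configuration is done, start the debug run.
Console output: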
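A common stumbling block when re-running the job from Eclipse is the output directory: Hadoop's FileOutputFormat refuses to start if the output path already exists. The sketch below is a hypothetical pre-flight check in plain Java that you could run on the same two arguments before launching the debug session. The class `ArgsPreflight` is not part of Hadoop or of the driver above, and it assumes both arguments are local file system paths.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical pre-flight check for the two program arguments of the driver. */
final class ArgsPreflight {

    /** Returns null when the arguments look usable, otherwise a short description of the problem. */
    static String problem(String[] args) {
        if (args.length < 2) {
            return "expected 2 arguments: <input> <output>";
        }
        Path input = Paths.get(args[0]);
        Path output = Paths.get(args[1]);
        if (!Files.exists(input)) {
            return "input path does not exist: " + input;
        }
        if (Files.exists(output)) {
            return "output directory already exists: " + output;
        }
        return null;
    }

    public static void main(String[] args) {
        String problem = problem(args);
        System.out.println(problem == null ? "arguments look fine" : problem);
    }
}
```

If the check reports an existing output directory, deleting it (or choosing a new output path) before the next run avoids the failure.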
2018-03-05 21:24:33,151 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1019)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2018-03-05 21:24:33,155 INFO [main] jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
2018-03-05 21:24:33,505 WARN [main] mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(150)) - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2018-03-05 21:24:33,507 WARN [main] mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(259)) - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2018-03-05 21:24:33,776 INFO [main] input.FileInputFormat (FileInputFormat.java:listStatus(281)) - Total input paths to process : 1
2018-03-05 21:24:33,837 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(396)) - number of splits:1
2018-03-05 21:24:33,934 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(479)) - Submitting tokens for job: job_local2074308044_0001
2018-03-05 21:24:33,966 WARN [main] conf.Configuration (Configuration.java:loadProperty(2368)) - file:/tmp/hadoop-Administrator/mapred/staging/hadoop2074308044/.staging/job_local2074308044_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2018-03-05 21:24:33,972 WARN [main] conf.Configuration (Configuration.java:loadProperty(2368)) - file:/tmp/hadoop-Administrator/mapred/staging/hadoop2074308044/.staging/job_local2074308044_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2018-03-05 21:24:34,125 WARN [main] conf.Configuration (Configuration.java:loadProperty(2368)) - file:/tmp/hadoop-Administrator/mapred/local/localRunner/hadoop/job_local2074308044_0001/job_local2074308044_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2018-03-05 21:24:34,131 WARN [main] conf.Configuration (Configuration.java:loadProperty(2368)) - file:/tmp/hadoop-Administrator/mapred/local/localRunner/hadoop/job_local2074308044_0001/job_local2074308044_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2018-03-05 21:24:34,138 INFO [main] mapreduce.Job (Job.java:submit(1289)) - The url to track the job: http://localhost:8080/
2018-03-05 21:24:34,139 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1334)) - Running job: job_local2074308044_0001
2018-03-05 21:24:34,140 INFO [Thread-4] mapred.LocalJobRunner (LocalJobRunner.java:createOutputCommitter(471)) - OutputCommitter set in config null
2018-03-05 21:24:34,152 INFO [Thread-4] mapred.LocalJobRunner (LocalJobRunner.java:createOutputCommitter(489)) - OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2018-03-05 21:24:34,203 INFO [Thread-4] mapred.LocalJobRunner (LocalJobRunner.java:runTasks(448)) - Waiting for map tasks
2018-03-05 21:24:34,203 INFO [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner (LocalJobRunner.java:run(224)) - Starting task: attempt_local2074308044_0001_m_000000_0
2018-03-05 21:24:34,242 INFO [LocalJobRunner Map Task Executor #0] util.ProcfsBasedProcessTree (ProcfsBasedProcessTree.java:isAvailable(181)) - ProcfsBasedProcessTree currently is supported only on Linux.
2018-03-05 21:24:34,585 INFO [LocalJobRunner Map Task Executor #0] mapred.Task (Task.java:initialize(587)) - Using ResourceCalculatorProcessTree : [email protected]
2018-03-05 21:24:34,590 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:runNewMapper(733)) - Processing split: file:/D:/flowsum/input/w.txt:0+47
2018-03-05 21:24:34,632 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:createSortingCollector(388)) - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2018-03-05 21:24:34,675 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:setEquator(1182)) - (EQUATOR) 0 kvi 26214396(104857584)
2018-03-05 21:24:34,675 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:init(975)) - mapreduce.task.io.sort.mb: 100
2018-03-05 21:24:34,675 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:init(976)) - soft limit at 83886080
2018-03-05 21:24:34,676 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:init(977)) - bufstart = 0; bufvoid = 104857600
2018-03-05 21:24:34,676 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:init(978)) - kvstart = 26214396; length = 6553600
2018-03-05 21:24:34,687 INFO [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) -
2018-03-05 21:24:34,687 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:flush(1437)) - Starting flush of map output
2018-03-05 21:24:34,687 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:flush(1455)) - Spilling map output
2018-03-05 21:24:34,687 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:flush(1456)) - bufstart = 0; bufend = 84; bufvoid = 104857600
2018-03-05 21:24:34,688 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:flush(1458)) - kvstart = 26214396(104857584); kvend = 26214360(104857440); length = 37/6553600
2018-03-05 21:24:34,717 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:sortAndSpill(1641)) - Finished spill 0
2018-03-05 21:24:34,731 INFO [LocalJobRunner Map Task Executor #0] mapred.Task (Task.java:done(1001)) - Task:attempt_local2074308044_0001_m_000000_0 is done. And is in the process of committing
2018-03-05 21:24:34,788 INFO [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - map
2018-03-05 21:24:34,788 INFO [LocalJobRunner Map Task Executor #0] mapred.Task (Task.java:sendDone(1121)) - Task 'attempt_local2074308044_0001_m_000000_0' done.
2018-03-05 21:24:34,788 INFO [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner (LocalJobRunner.java:run(249)) - Finishing task: attempt_local2074308044_0001_m_000000_0
2018-03-05 21:24:34,789 INFO [Thread-4] mapred.LocalJobRunner (LocalJobRunner.java:runTasks(456)) - map task executor complete.
2018-03-05 21:24:34,796 INFO [Thread-4] mapred.LocalJobRunner (LocalJobRunner.java:runTasks(448)) - Waiting for reduce tasks
2018-03-05 21:24:34,797 INFO [pool-3-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:run(302)) - Starting task: attempt_local2074308044_0001_r_000000_0
2018-03-05 21:24:34,808 INFO [pool-3-thread-1] util.ProcfsBasedProcessTree (ProcfsBasedProcessTree.java:isAvailable(181)) - ProcfsBasedProcessTree currently is supported only on Linux.
2018-03-05 21:24:35,142 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1355)) - Job job_local2074308044_0001 running in uber mode : false
2018-03-05 21:24:35,143 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1362)) - map 100% reduce 0%
2018-03-05 21:24:35,252 INFO [pool-3-thread-1] mapred.Task (Task.java:initialize(587)) - Using ResourceCalculatorProcessTree : [email protected]
2018-03-05 21:24:35,284 INFO [pool-3-thread-1] mapred.ReduceTask (ReduceTask.java:run(362)) - Using ShuffleConsumerPlugin: [email protected]
2018-03-05 21:24:35,333 INFO [pool-3-thread-1] reduce.MergeManagerImpl (MergeManagerImpl.java:<init>(193)) - MergerManager: memoryLimit=1321893888, maxSingleShuffleLimit=330473472, mergeThreshold=872449984, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2018-03-05 21:24:35,365 INFO [localfetcher#1] reduce.LocalFetcher (LocalFetcher.java:copyMapOutput(140)) - localfetcher#1 about to shuffle output of map attempt_local2074308044_0001_m_000000_0 decomp: 106 len: 110 to MEMORY
2018-03-05 21:24:35,368 INFO [EventFetcher for fetching Map Completion Events] reduce.EventFetcher (EventFetcher.java:run(61)) - attempt_local2074308044_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2018-03-05 21:24:40,808 INFO [communication thread] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - reduce > copy
2018-03-05 21:24:42,990 INFO [localfetcher#1] reduce.InMemoryMapOutput (InMemoryMapOutput.java:shuffle(100)) - Read 106 bytes from map-output for attempt_local2074308044_0001_m_000000_0
2018-03-05 21:24:43,808 INFO [communication thread] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - reduce > copy
2018-03-05 21:24:46,809 INFO [communication thread] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - reduce > copy
2018-03-05 21:24:48,472 INFO [localfetcher#1] reduce.MergeManagerImpl (MergeManagerImpl.java:closeInMemoryFile(307)) - closeInMemoryFile -> map-output of size: 106, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->106
2018-03-05 21:24:48,949 INFO [EventFetcher for fetching Map Completion Events] reduce.EventFetcher (EventFetcher.java:run(76)) - EventFetcher is interrupted.. Returning
2018-03-05 21:24:48,951 INFO [pool-3-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - 1 / 1 copied.
2018-03-05 21:24:48,951 INFO [pool-3-thread-1] reduce.MergeManagerImpl (MergeManagerImpl.java:finalMerge(667)) - finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
2018-03-05 21:24:48,995 INFO [pool-3-thread-1] mapred.Merger (Merger.java:merge(591)) - Merging 1 sorted segments
2018-03-05 21:24:48,995 INFO [pool-3-thread-1] mapred.Merger (Merger.java:merge(690)) - Down to the last merge-pass, with 1 segments left of total size: 99 bytes
2018-03-05 21:24:48,997 INFO [pool-3-thread-1] reduce.MergeManagerImpl (MergeManagerImpl.java:finalMerge(742)) - Merged 1 segments, 106 bytes to disk to satisfy reduce memory limit
2018-03-05 21:24:48,997 INFO [pool-3-thread-1] reduce.MergeManagerImpl (MergeManagerImpl.java:finalMerge(772)) - Merging 1 files, 110 bytes from disk
2018-03-05 21:24:48,998 INFO [pool-3-thread-1] reduce.MergeManagerImpl (MergeManagerImpl.java:finalMerge(787)) - Merging 0 segments, 0 bytes from memory into reduce
2018-03-05 21:24:48,998 INFO [pool-3-thread-1] mapred.Merger (Merger.java:merge(591)) - Merging 1 sorted segments
2018-03-05 21:24:48,999 INFO [pool-3-thread-1] mapred.Merger (Merger.java:merge(690)) - Down to the last merge-pass, with 1 segments left of total size: 99 bytes
2018-03-05 21:24:48,999 INFO [pool-3-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - 1 / 1 copied.
2018-03-05 21:24:49,045 INFO [pool-3-thread-1] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1019)) - mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2018-03-05 21:24:49,096 INFO [pool-3-thread-1] mapred.Task (Task.java:done(1001)) - Task:attempt_local2074308044_0001_r_000000_0 is done. And is in the process of committing
2018-03-05 21:24:49,097 INFO [pool-3-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - 1 / 1 copied.
2018-03-05 21:24:49,098 INFO [pool-3-thread-1] mapred.Task (Task.java:commit(1162)) - Task attempt_local2074308044_0001_r_000000_0 is allowed to commit now
2018-03-05 21:24:49,100 INFO [pool-3-thread-1] output.FileOutputCommitter (FileOutputCommitter.java:commitTask(439)) - Saved output of task 'attempt_local2074308044_0001_r_000000_0' to file:/D:/flowsum/output/_temporary/0/task_local2074308044_0001_r_000000
2018-03-05 21:24:49,102 INFO [pool-3-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - reduce > reduce
2018-03-05 21:24:49,103 INFO [pool-3-thread-1] mapred.Task (Task.java:sendDone(1121)) - Task 'attempt_local2074308044_0001_r_000000_0' done.
2018-03-05 21:24:49,103 INFO [pool-3-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:run(325)) - Finishing task: attempt_local2074308044_0001_r_000000_0
2018-03-05 21:24:49,103 INFO [Thread-4] mapred.LocalJobRunner (LocalJobRunner.java:runTasks(456)) - reduce task executor complete.
2018-03-05 21:24:49,145 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1362)) - map 100% reduce 100%
2018-03-05 21:24:49,145 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1373)) - Job job_local2074308044_0001 completed successfully
2018-03-05 21:24:49,154 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1380)) - Counters: 33
    File System Counters
        FILE: Number of bytes read=640
        FILE: Number of bytes written=473924
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
    Map-Reduce Framework
        Map input records=4
        Map output records=10
        Map output bytes=84
        Map output materialized bytes=110
        Input split bytes=93
        Combine input records=0
        Combine output records=0
        Reduce input groups=10
        Reduce shuffle bytes=110
        Reduce input records=10
        Reduce output records=10
        Spilled Records=20
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=0
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=464388096
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=47
    File Output Format Counters
        Bytes Written=76
The debug run succeeded. Let's take a look at the output directory:
OK!!
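To inspect the result programmatically rather than by opening the file, the following sketch reads the reducer output back into a map. It assumes the job's defaults: a single reducer writing a file named part-r-00000 in the output directory, with each line holding the word and its count separated by a tab. The class `OutputReader` is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical reader for the wordcount output file. */
final class OutputReader {

    /** Parses "word<TAB>count" lines into a sorted map; lines without a tab are skipped. */
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue;
            }
            counts.put(line.substring(0, tab), Integer.parseInt(line.substring(tab + 1).trim()));
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the output directory passed to the driver as its second argument
        Path part = Paths.get(args[0], "part-r-00000");
        List<String> lines = Files.readAllLines(part, StandardCharsets.UTF_8);
        parse(lines).forEach((word, count) -> System.out.println(word + " -> " + count));
    }
}
```

For the run above, the map should contain 10 entries, matching the "Reduce output records=10" counter in the console output.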