A Beginner's Guide to MapReduce Programming
Implementing WordCount
1. Writing the Map class
package com.hellohadoop;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}
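To see concretely what the map phase emits, here is a small standalone sketch with no Hadoop dependencies (the class and method names are illustrative, not part of the job). It applies the same StringTokenizer logic as TokenizerMapper and collects the (word, 1) pairs it would write to the context:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

public class MapSketch {
    // Simulate the map phase: emit a (word, 1) pair for every
    // whitespace-delimited token, exactly as TokenizerMapper does.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(new SimpleEntry<>(itr.nextToken(), 1));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> e : map("hello hadoop hello")) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
        // prints:
        // hello	1
        // hadoop	1
        // hello	1
    }
}
```

Note that the mapper does not sum anything: duplicate words each produce their own pair, and the framework's shuffle phase later groups them by key for the reducer.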
2. Writing the Reduce class
package com.hellohadoop;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
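The shuffle-and-sum behavior can also be sketched without Hadoop. The following standalone example (illustrative names, not part of the job) groups the mapper's output by key and accumulates the counts, mirroring the loop inside IntSumReducer:

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

public class ReduceSketch {
    // Simulate shuffle + reduce: group (word, 1) pairs by key and sum
    // them, as the framework's grouping plus IntSumReducer would.
    static TreeMap<String, Integer> countWords(List<String> words) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum); // the "sum += val.get()" step
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords(Arrays.asList("hello", "hadoop", "hello")));
        // prints {hadoop=1, hello=2}
    }
}
```

A TreeMap is used here because MapReduce also delivers keys to the reducer in sorted order, which is why WordCount output files are alphabetically sorted.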
3. Writing the driver class
package com.hellohadoop;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        // Job.getInstance replaces the deprecated new Job() and actually
        // passes conf to the job. Note: do NOT import
        // org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer here, or it
        // will shadow the IntSumReducer written above in com.hellohadoop.
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
4. Running the job
[Start the Hadoop cluster]
start-all.sh
[Note]
Hadoop requires the jar to be compiled with a compatible JDK version; the Hadoop version installed here requires compiling with JDK 1.7.
[Upload the input file to HDFS]
hdfs dfs -copyFromLocal /apps/hadoop/datainput/wordcount/news.txt /wordcount
[Run the job]
hadoop jar <jar path> <main class> <input path> <output path>
hadoop jar /apps/hadoop/myprograms/wordcount1.7.jar com.hellohadoop.WordCount /wordcount /output
[List the output directory]
hdfs dfs -ls /output3
[Download the output from HDFS]
hdfs dfs -get /output3 /home/my
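A quick way to sanity-check the job's result is to recompute the counts locally with standard shell tools and compare them against the part-r-00000 file (viewable with `hdfs dfs -cat`). This sketch assumes a local copy of news.txt; splitting on whitespace approximates StringTokenizer's behavior:

```shell
# Recompute the word counts locally for comparison with the job output.
# Assumes news.txt is available on the local filesystem.
tr -s '[:space:]' '\n' < news.txt | sort | uniq -c | sort -rn | head
```

The `uniq -c` column order (count before word) differs from the job's tab-separated "word count" lines, but the numbers for each word should match.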
Hadoop has started successfully when the following processes appear in the output of jps:
6314 ResourceManager
7295 Jps
6037 DataNode
6180 SecondaryNameNode
5930 NameNode
6527 NodeManager
5. Log of a successful run
[[email protected] my]# hadoop jar /apps/hadoop/myprograms/wordcount1.7.jar com.hellohadoop.WordCount /wordcount /output3
18/10/14 16:15:30 INFO client.RMProxy: Connecting to ResourceManager at /192.168.190.129:18040
18/10/14 16:15:31 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
18/10/14 16:15:33 INFO input.FileInputFormat: Total input paths to process : 1
18/10/14 16:15:33 INFO mapreduce.JobSubmitter: number of splits:1
18/10/14 16:15:34 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1539558785155_0001
18/10/14 16:15:35 INFO impl.YarnClientImpl: Submitted application application_1539558785155_0001
18/10/14 16:15:35 INFO mapreduce.Job: The url to track the job: http://192.168.190.129:18088/proxy/application_1539558785155_0001/
18/10/14 16:15:35 INFO mapreduce.Job: Running job: job_1539558785155_0001
18/10/14 16:16:00 INFO mapreduce.Job: Job job_1539558785155_0001 running in uber mode : false
18/10/14 16:16:00 INFO mapreduce.Job: map 0% reduce 0%
18/10/14 16:16:25 INFO mapreduce.Job: map 100% reduce 0%
18/10/14 16:16:51 INFO mapreduce.Job: map 100% reduce 100%
18/10/14 16:16:53 INFO mapreduce.Job: Job job_1539558785155_0001 completed successfully
18/10/14 16:16:53 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=6058
FILE: Number of bytes written=205347
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2999
HDFS: Number of bytes written=2630
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=27004
Total time spent by all reduces in occupied slots (ms)=17731
Total time spent by all map tasks (ms)=27004
Total time spent by all reduce tasks (ms)=17731
Total vcore-seconds taken by all map tasks=27004
Total vcore-seconds taken by all reduce tasks=17731
Total megabyte-seconds taken by all map tasks=27652096
Total megabyte-seconds taken by all reduce tasks=18156544
Map-Reduce Framework
Map input records=1
Map output records=529
Map output bytes=4994
Map output materialized bytes=6058
Input split bytes=111
Combine input records=0
Combine output records=0
Reduce input groups=318
Reduce shuffle bytes=6058
Reduce input records=529
Reduce output records=318
Spilled Records=1058
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=321
CPU time spent (ms)=2830
Physical memory (bytes) snapshot=291082240
Virtual memory (bytes) snapshot=1688584192
Total committed heap usage (bytes)=136122368
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=2888
File Output Format Counters
Bytes Written=2630
6. Screenshot of the successful result