A Beginner's Guide to MapReduce Programming
Implementing WordCount
1. Writing the Map class
package com.hellohadoop;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}
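To see concretely what the map phase emits, here is a small standalone sketch with no Hadoop dependencies (the class and method names are illustrative, not part of the job). It applies the same StringTokenizer logic as TokenizerMapper and collects the (word, 1) pairs it would write to the context:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

public class MapSketch {
    // Simulate the map phase: emit a (word, 1) pair for every
    // whitespace-delimited token, exactly as TokenizerMapper does.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(new SimpleEntry<>(itr.nextToken(), 1));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> e : map("hello hadoop hello")) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
        // prints:
        // hello	1
        // hadoop	1
        // hello	1
    }
}
```

Note that the mapper does not sum anything: duplicate words each produce their own pair, and the framework's shuffle phase later groups them by key for the reducer.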
2. Writing the Reduce class
package com.hellohadoop;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
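The shuffle-and-sum behavior can also be sketched without Hadoop. The following standalone example (illustrative names, not part of the job) groups the mapper's output by key and accumulates the counts, mirroring the loop inside IntSumReducer:

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

public class ReduceSketch {
    // Simulate shuffle + reduce: group (word, 1) pairs by key and sum
    // them, as the framework's grouping plus IntSumReducer would.
    static TreeMap<String, Integer> countWords(List<String> words) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum); // the "sum += val.get()" step
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords(Arrays.asList("hello", "hadoop", "hello")));
        // prints {hadoop=1, hello=2}
    }
}
```

A TreeMap is used here because MapReduce also delivers keys to the reducer in sorted order, which is why WordCount output files are alphabetically sorted.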
3. Writing the driver class
package com.hellohadoop;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        // Job.getInstance replaces the deprecated new Job() and actually
        // passes conf to the job. Note: do NOT import
        // org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer here, or it
        // will shadow the IntSumReducer written above in com.hellohadoop.
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
4. Running the job
[Start the Hadoop cluster]
start-all.sh
[Note]
Hadoop requires the jar to be compiled with a compatible JDK version; the Hadoop version installed here requires compiling with JDK 1.7.
[Upload the input file to HDFS]
hdfs dfs -copyFromLocal /apps/hadoop/datainput/wordcount/news.txt /wordcount
[Run the job]
hadoop jar <jar path> <main class> <input path> <output path>
hadoop jar /apps/hadoop/myprograms/wordcount1.7.jar com.hellohadoop.WordCount /wordcount /output
[List the output directory]
hdfs dfs -ls /output3
[Download the output from HDFS]
hdfs dfs -get /output3 /home/my
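A quick way to sanity-check the job's result is to recompute the counts locally with standard shell tools and compare them against the part-r-00000 file (viewable with `hdfs dfs -cat`). This sketch assumes a local copy of news.txt; splitting on whitespace approximates StringTokenizer's behavior:

```shell
# Recompute the word counts locally for comparison with the job output.
# Assumes news.txt is available on the local filesystem.
tr -s '[:space:]' '\n' < news.txt | sort | uniq -c | sort -rn | head
```

The `uniq -c` column order (count before word) differs from the job's tab-separated "word count" lines, but the numbers for each word should match.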
Hadoop has started successfully when the following processes appear in the output of jps:
6314 ResourceManager
7295 Jps
6037 DataNode
6180 SecondaryNameNode
5930 NameNode
6527 NodeManager
5. Log of a successful run
[[email protected] my]# hadoop jar /apps/hadoop/myprograms/wordcount1.7.jar com.hellohadoop.WordCount /wordcount /output3
18/10/14 16:15:30 INFO client.RMProxy: Connecting to ResourceManager at /192.168.190.129:18040
18/10/14 16:15:31 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
18/10/14 16:15:33 INFO input.FileInputFormat: Total input paths to process : 1
18/10/14 16:15:33 INFO mapreduce.JobSubmitter: number of splits:1
18/10/14 16:15:34 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1539558785155_0001
18/10/14 16:15:35 INFO impl.YarnClientImpl: Submitted application application_1539558785155_0001
18/10/14 16:15:35 INFO mapreduce.Job: The url to track the job: http://192.168.190.129:18088/proxy/application_1539558785155_0001/
18/10/14 16:15:35 INFO mapreduce.Job: Running job: job_1539558785155_0001
18/10/14 16:16:00 INFO mapreduce.Job: Job job_1539558785155_0001 running in uber mode : false
18/10/14 16:16:00 INFO mapreduce.Job: map 0% reduce 0%
18/10/14 16:16:25 INFO mapreduce.Job: map 100% reduce 0%
18/10/14 16:16:51 INFO mapreduce.Job: map 100% reduce 100%
18/10/14 16:16:53 INFO mapreduce.Job: Job job_1539558785155_0001 completed successfully
18/10/14 16:16:53 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=6058
FILE: Number of bytes written=205347
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2999
HDFS: Number of bytes written=2630
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=27004
Total time spent by all reduces in occupied slots (ms)=17731
Total time spent by all map tasks (ms)=27004
Total time spent by all reduce tasks (ms)=17731
Total vcore-seconds taken by all map tasks=27004
Total vcore-seconds taken by all reduce tasks=17731
Total megabyte-seconds taken by all map tasks=27652096
Total megabyte-seconds taken by all reduce tasks=18156544
Map-Reduce Framework
Map input records=1
Map output records=529
Map output bytes=4994
Map output materialized bytes=6058
Input split bytes=111
Combine input records=0
Combine output records=0
Reduce input groups=318
Reduce shuffle bytes=6058
Reduce input records=529
Reduce output records=318
Spilled Records=1058
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=321
CPU time spent (ms)=2830
Physical memory (bytes) snapshot=291082240
Virtual memory (bytes) snapshot=1688584192
Total committed heap usage (bytes)=136122368
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=2888
File Output Format Counters
Bytes Written=2630
6. Screenshot of the successful result