Hadoop7days-4: Implementing an Inverted Index with MapReduce

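Building an inverted index here means taking the words found across different files and counting how many times each word occurs in each file. The result should have the form

"hello", "a.txt->3,b.txt->2,c.txt->2"

Reaching this goal takes more than one mapper and reducer class. We can work backward from the desired result to decide what each mapper and reducer has to do. The resulting stages are: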

The first mapper's output is:
context.write("hello->a.txt","1");
context.write("hello->a.txt","1");
context.write("hello->a.txt","1");

After the shuffle this becomes:
<"hello->a.txt" , {1,1,1}>
------------------------------ first reducer
The reducer's input:
<"hello->a.txt" , {1,1,1}>

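The reducer's output should be (one line per input key, so the b.txt and c.txt lines come from the corresponding keys for those files):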
"hello","a.txt->3"
"hello","b.txt->2"
"hello","c.txt->2"
------------------------------ second mapper
The mapper's input should be:
"hello","a.txt->3"
"hello","b.txt->2"
"hello","c.txt->2"

The mapper's output should be:
context.write("hello","a.txt->3");
context.write("hello","b.txt->2");
context.write("hello","c.txt->2");
After the shuffle this becomes:

<"hello",{"a.txt->3","b.txt->2","c.txt->2"}>
----------------------------- final reducer
The reducer's input should be:
<"hello",{"a.txt->3","b.txt->2","c.txt->2"}>

The reducer's output:

context.write("hello","a.txt->3 b.txt->2 c.txt->2");
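The shuffle steps in the walkthrough above can be imitated in plain Java to see how repeated keys collapse into one key with a list of values. This is only a sketch that uses the standard library, not Hadoop itself; the class name ShuffleSketch and the use of String arrays for key/value pairs are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for the shuffle phase; this is not Hadoop code.
class ShuffleSketch {

    // Groups the values of (key, value) pairs by key, with keys in sorted
    // order, which is roughly how a reducer sees its input after the shuffle.
    static Map<String, List<String>> shuffle(List<String[]> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : pairs) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Hypothetical mapper output: three hits in a.txt, one in b.txt.
        List<String[]> mapOutput = new ArrayList<>();
        mapOutput.add(new String[] {"hello->a.txt", "1"});
        mapOutput.add(new String[] {"hello->a.txt", "1"});
        mapOutput.add(new String[] {"hello->a.txt", "1"});
        mapOutput.add(new String[] {"hello->b.txt", "1"});
        System.out.println(shuffle(mapOutput));
    }
}
```

The same grouping applies to the second shuffle: feeding it the pairs ("hello","a.txt->3"), ("hello","b.txt->2"), ("hello","c.txt->2") yields a single "hello" entry holding all three values.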

Now let's start on the design:

The first map should turn each file into pairs of the form "word->name", "1".
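A minimal sketch of that map step in plain Java, without the Hadoop API, might look like the following. The class name FirstMapSketch and splitting on whitespace are assumptions; in a real job this logic would sit inside a Mapper's map method, and the file name would typically be read from the input split (for example via FileSplit's getPath().getName()).

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for the first map step; this is not Hadoop code.
class FirstMapSketch {

    // Emits one ("word->fileName", "1") pair per whitespace-separated word.
    static List<String[]> map(String line, String fileName) {
        List<String[]> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) { // a blank line splits into one empty token
                out.add(new String[] {word + "->" + fileName, "1"});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input line taken from a file named a.txt.
        for (String[] pair : map("hello world hello", "a.txt")) {
            System.out.println(pair[0] + "\t" + pair[1]);
        }
    }
}
```

Putting the file name into the key is what lets the next stage count occurrences per word and per file at once.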


The first reducer should turn "word->name", "1" into the form "word", "name->count". We add a combiner and let the combiner do this work.
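The summing and re-keying described here can be sketched in plain Java as follows. The class name CombineSketch is hypothetical, and the code assumes that file names never contain "->", so the last "->" in the key separates the word from the file name.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for the combiner step; this is not Hadoop code.
class CombineSketch {

    // Turns ("word->fileName", {"1","1",...}) into ("word", "fileName->count").
    static String[] combine(String key, List<String> counts) {
        int sum = 0;
        for (String c : counts) {
            sum += Integer.parseInt(c);
        }
        int sep = key.lastIndexOf("->"); // assumes file names contain no "->"
        String word = key.substring(0, sep);
        String fileName = key.substring(sep + 2);
        return new String[] {word, fileName + "->" + sum};
    }

    public static void main(String[] args) {
        String[] out = combine("hello->a.txt", Arrays.asList("1", "1", "1"));
        System.out.println(out[0] + "\t" + out[1]);
    }
}
```

One thing to keep in mind: Hadoop treats a combiner as an optional optimization and may run it zero or more times, so a job that depends on the combiner to change the key is relying on it running exactly once per key. Moving this step into the reducer of a separate first job avoids that dependency.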


reducer:
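As a rough illustration of this last step, the reducer only has to join all "name->count" values that arrive for one word. The plain-Java sketch below (class name FinalReduceSketch is hypothetical, and no Hadoop types are used) joins them with a single space, matching the final context.write in the walkthrough.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for the final reduce step; this is not Hadoop code.
class FinalReduceSketch {

    // Joins every "fileName->count" value for one word into a single line.
    static String[] reduce(String word, List<String> postings) {
        return new String[] {word, String.join(" ", postings)};
    }

    public static void main(String[] args) {
        String[] out = reduce("hello", Arrays.asList("a.txt->3", "b.txt->2", "c.txt->2"));
        System.out.println(out[0] + "\t" + out[1]);
    }
}
```

In a real reducer the values arrive as an Iterable of Text objects, and their order is not guaranteed, so the files may appear in a different order from run to run.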
