Hadoop MapReduce shuffle

It is tricky to understand the shuffle principle in Hadoop. Let us learn it together.

If you find something wrong in this description, please tell me why :)



[Figure: Hadoop MapReduce shuffle]

The illustration shows a job with 4 map tasks and 3 reduce tasks.

 

1.

Why are there three flows in the illustration (the parts marked with digits)?

When I check the property "mapred.local.dir", its description says:
The local directory where MapReduce stores intermediate
data files. May be a comma-separated list of
directories on different devices in order to **spread disk i/o**.

Is that really the cause? I need to check the source code to verify this... TODO
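To make the "spread disk i/o" idea concrete, here is a minimal sketch of how a comma-separated directory list could be used to rotate intermediate files across disks. It is only an assumption for illustration: the class LocalDirRoundRobin, the method nextDir and the example paths are hypothetical, and this is not Hadoop's real allocation code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, NOT Hadoop's real allocator: it only illustrates how a
// comma-separated "mapred.local.dir" value can spread files over several disks.
final class LocalDirRoundRobin {
    private final List<String> dirs;
    private int next = 0;

    LocalDirRoundRobin(String commaSeparatedDirs) {
        // Split the property value and drop empty entries.
        this.dirs = Arrays.stream(commaSeparatedDirs.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
        if (dirs.isEmpty()) {
            throw new IllegalArgumentException("no local directories configured");
        }
    }

    // Returns the directory for the next intermediate file, cycling through the list.
    String nextDir() {
        String dir = dirs.get(next);
        next = (next + 1) % dirs.size();
        return dir;
    }

    public static void main(String[] args) {
        // Example paths are made up for this sketch.
        LocalDirRoundRobin picker =
                new LocalDirRoundRobin("/disk1/mapred/local,/disk2/mapred/local,/disk3/mapred/local");
        for (int spill = 0; spill < 4; spill++) {
            System.out.println("spill" + spill + ".out -> " + picker.nextDir());
        }
    }
}
```

With three directories, the fourth file wraps around to the first directory again, so writes are distributed evenly over the devices.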

 

 

2. You can see the merge operation before reduce, so there are in fact two sort phases in the shuffle.

Someone asks: why is it necessary to sort after the map, before transferring the data to the reduce? Performance! I think that if the map output is already sorted, the merge on the reduce side is much more efficient, since the shuffle uses a two-way merge ("二路归并") sorting algorithm.
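The reasoning above can be sketched with a plain two-way merge: when both inputs are already sorted, one linear pass is enough to produce a sorted result. This is a simplified illustration, not Hadoop's actual merge code; the class TwoWayMerge and the sample keys are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: merges two sorted runs, as the reduce side could do
// with sorted outputs fetched from two map tasks.
final class TwoWayMerge {
    // Merges two already-sorted lists in one linear pass.
    static List<String> merge(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>(left.size() + right.size());
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            // "<=" takes equal keys from the left run first (stable merge).
            if (left.get(i).compareTo(right.get(j)) <= 0) {
                out.add(left.get(i++));
            } else {
                out.add(right.get(j++));
            }
        }
        // Copy whatever remains in either run.
        while (i < left.size()) out.add(left.get(i++));
        while (j < right.size()) out.add(right.get(j++));
        return out;
    }

    public static void main(String[] args) {
        List<String> map1 = Arrays.asList("apple", "cat", "dog");
        List<String> map2 = Arrays.asList("bee", "cat", "emu");
        System.out.println(merge(map1, map2)); // prints [apple, bee, cat, cat, dog, emu]
    }
}
```

Each key is compared and copied once, so the merge costs O(n + m). If the map outputs were not sorted, the reduce side would have to sort everything from scratch.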