Hadoop Source Code Analysis: Understanding the Generic Classes in MapReduce

First, let's look at the following class:
public static class TokenizerMapper
        extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}

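When we write MapReduce programs, we routinely see the parameters <Object, Text, Text, IntWritable>. What do these parameters mean, and what do the angle brackets <> signify? Have we ever stopped to think about it?
In everyday code a class declaration rarely takes parameters. The parameters we usually see are method parameters, such as:
public void map(Object key, Text value, Context context)

This is easy to understand: key is of type Object and value is of type Text.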

What are generics

So what does public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> mean?

If we know about generic classes, we know that the angle brackets denote generics. For example:

/**
 * A custom generic class.
 *
 * @author tfq
 */
public class GenericArrayList<E> {

    Object[] objects = new Object[10];
    int index = 0;

    /**
     * Adds an element. The Object[] slot (a superclass reference) holds the E instance.
     *
     * @param o the element to add
     */
    public void add(E o) {
        // Double the backing array when it is full.
        if (index == objects.length) {
            Object[] newObjects = new Object[objects.length * 2];
            System.arraycopy(objects, 0, newObjects, 0, objects.length);
            objects = newObjects;
        }
        objects[index] = o;
        index++;
    }

    /**
     * Returns the number of elements stored so far.
     *
     * @return the element count
     */
    public int size() {
        return index;
    }

    public static void main(String[] args) {
        // Replace E with the class or type you want to use.
        GenericArrayList<String> geneArray = new GenericArrayList<String>();
        geneArray.add("a");
        System.out.println(geneArray.size());
    }
}

That is a bit complicated, so let's pull out just the essential parts:

public class GenericArrayList<E>{}

public static void main(String[] args) {
    // Replace E with the class or type you want to use.
    GenericArrayList<String> geneArray = new GenericArrayList<String>();
}

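When public class GenericArrayList<E> is instantiated this way, E is replaced with String.

What else can we do? We can just as easily instantiate it with a different type:
GenericArrayList<Integer> geneArray = new GenericArrayList<Integer>();
Note that the type arguments on both sides must match. Declaring the variable as GenericArrayList<String> and assigning it new GenericArrayList<Integer>() does not compile.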

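Does this make the convenience of generics clear?
For a given class:
When we do not know in advance whether it will hold integers or strings, we can use generics.
When it sometimes needs to hold strings and sometimes integers, we can use generics.
What are generics?
Generics are an abstraction over data types.
And what is a data type?
Integer, String, Short, and so on are all data types.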
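
To make "an abstraction over data types" more concrete, here is a minimal sketch. TypedList is a hypothetical variant of GenericArrayList that adds a get() method (not part of the original class), so that the effect of E on a return type is visible. It assumes Java 7 or later for the <> shorthand.

```java
import java.util.Arrays;

// Hypothetical variant of GenericArrayList<E>; get() is an addition for illustration.
class TypedList<E> {
    private Object[] items = new Object[10];
    private int size = 0;

    void add(E item) {
        if (size == items.length) {
            items = Arrays.copyOf(items, items.length * 2); // grow when full
        }
        items[size++] = item;
    }

    @SuppressWarnings("unchecked")
    E get(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("index " + i + ", size " + size);
        }
        return (E) items[i]; // safe here: only add(E) ever writes to the array
    }

    int size() {
        return size;
    }
}

class TypedListDemo {
    public static void main(String[] args) {
        TypedList<String> words = new TypedList<>();
        words.add("hadoop");
        String first = words.get(0); // no cast needed: E is String here

        TypedList<Integer> counts = new TypedList<>();
        counts.add(1);
        int one = counts.get(0);     // E is Integer here, auto-unboxed to int

        // words.add(42);            // would not compile: 42 is not a String
        System.out.println(first + " " + one); // prints: hadoop 1
    }
}
```

The same class body serves both element types, and the compiler rejects a mismatched element before the program ever runs.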

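What is a generic interface/class

Now we understand generics, but
<Object, Text, Text, IntWritable>
still cannot be explained this way: nothing is being instantiated here, and yet there are angle brackets. So what does it mean?
Here is a small example: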

// Definition of a generic interface
interface ISum<T> {
    public abstract void sum(T... t);
}

// Concrete classes that implement the generic interface
class IntSum implements ISum<Integer> {
    public void sum(Integer... t) {
        int s = 0;
        for (int e : t) {
            s += e;
        }
        System.out.println(s);
    }
}

class DoubleSum implements ISum<Double> {
    public void sum(Double... t) {
        double s = 0.0;
        for (double e : t) {
            s += e;
        }
        System.out.println(s);
    }
}

// Usage example
public class SumMain {
    public static void main(String[] args) {
        IntSum intSum = new IntSum();
        intSum.sum(1, 2, 3, 4, 5);
        intSum.sum(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

        DoubleSum doubleSum = new DoubleSum();
        doubleSum.sum(1.0, 1.5, 2.0, 2.5, 3.0);
        doubleSum.sum(1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0);
    }
}

interface ISum<T> {
}

// Concrete class that implements the generic interface
class IntSum implements ISum<Integer>

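Above we see an interface, ISum, and a class, IntSum, that implements it.

If you look closely, you will notice that the T in interface ISum<T> has been replaced with Integer. From this we can conclude:
when a class implements a generic interface (or extends a generic class), it can bind the type parameters to concrete types.

Good, now we understand.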
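
One way to check that the binding really is recorded on the implementing class is to ask the reflection API. The sketch below is only an illustration: ISum and IntSum are trimmed copies of the classes above, while BoundTypeDemo and firstInterfaceTypeArgument are hypothetical helper names introduced here.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Trimmed copy of the generic interface from the example above.
interface ISum<T> {
    void sum(T... values);
}

// Binds T to Integer, as in the example above.
class IntSum implements ISum<Integer> {
    @Override
    public void sum(Integer... values) {
        int s = 0;
        for (int v : values) {
            s += v;
        }
        System.out.println(s);
    }
}

// Hypothetical helper class, not part of the original article.
class BoundTypeDemo {
    // Returns the type argument that implClass supplies to its first generic interface.
    static Type firstInterfaceTypeArgument(Class<?> implClass) {
        ParameterizedType iface =
                (ParameterizedType) implClass.getGenericInterfaces()[0];
        return iface.getActualTypeArguments()[0];
    }

    public static void main(String[] args) {
        // prints: class java.lang.Integer
        System.out.println(firstInterfaceTypeArgument(IntSum.class));
    }
}
```

The helper assumes the class's first interface is generic; for a class whose first interface has no type arguments, the cast would fail.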

The generic Mapper class in MapReduce

public static class TokenizerMapper
        extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}

We see that TokenizerMapper extends Mapper and supplies concrete types, which suggests that Mapper is a generic class. Let's check the source code:
public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>

Sure enough, Mapper is a generic class.

Now let's look back at the TokenizerMapper class:
public static class TokenizerMapper
extends Mapper<Object, Text, Text, IntWritable>
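
Reading the two declarations side by side, KEYIN is bound to Object, VALUEIN to Text, KEYOUT to Text, and VALUEOUT to IntWritable. The sketch below illustrates how such a binding fixes the signatures of map() and write(). It deliberately does not use Hadoop: MiniMapper, MiniContext, and MiniTokenizerMapper are hypothetical stand-ins, with String and Integer in place of Text and IntWritable. In Hadoop itself, Context is a class nested inside Mapper rather than a separate top-level class as assumed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical stand-in for Hadoop's Context: collects the emitted key/value pairs.
class MiniContext<KEYOUT, VALUEOUT> {
    final List<String> output = new ArrayList<>();

    void write(KEYOUT key, VALUEOUT value) {
        output.add(key + "\t" + value);
    }
}

// Hypothetical stand-in for Hadoop's Mapper, reduced to its four type parameters.
abstract class MiniMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    abstract void map(KEYIN key, VALUEIN value, MiniContext<KEYOUT, VALUEOUT> context);
}

// Binding <Object, String, String, Integer> determines the types used in map().
class MiniTokenizerMapper extends MiniMapper<Object, String, String, Integer> {
    @Override
    void map(Object key, String value, MiniContext<String, Integer> context) {
        StringTokenizer itr = new StringTokenizer(value);
        while (itr.hasMoreTokens()) {
            context.write(itr.nextToken(), 1); // KEYOUT = String, VALUEOUT = Integer
        }
    }
}

class MiniMapperDemo {
    public static void main(String[] args) {
        MiniContext<String, Integer> context = new MiniContext<>();
        new MiniTokenizerMapper().map(0L, "hello hadoop hello", context);
        for (String line : context.output) {
            System.out.println(line); // one "word<TAB>1" line per token
        }
    }
}
```

The first two type arguments describe what map() receives, and the last two describe what it may pass to write(). Changing any of them in the extends clause forces the method signatures to change with it.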
