Hadoop with Phoenix: how to write Phoenix table objects to the HDFS file system

Problem description:

I have a MapReduce job that reads data from an HBase table using Phoenix. I want the output of this job to land in HDFS, and then be fed into another MapReduce job, where I will apply updates back to the HBase table. Here is what I have tried:

public class Job1Driver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        final Configuration jobConfiguration = super.getConf();
        final Job job1 = Job.getInstance(jobConfiguration,
                jobConfiguration.get("mapreduce.job.name"));
        final String selectQuery = "SELECT * FROM TABLE1 WHERE IS_SUMMARY_RECORD=false";

        job1.setJarByClass(Job1Driver.class);
        PhoenixMapReduceUtil.setInputCluster(job1, jobConfiguration.get("HBASE_URL"));
        PhoenixMapReduceUtil.setInput(job1, Table1Writable.class, "TABLE1", selectQuery);

        // "True".equals(...) avoids a NullPointerException when the property is unset
        if ("True".equals(jobConfiguration.get("IS_FROZEN_DATA_AVAILABLE"))) {
            MultipleInputs.addInputPath(job1, new Path(args[0]),
                    TextInputFormat.class, FrozenMapper.class);
        }
        MultipleInputs.addInputPath(job1, new Path(args[1]),
                PhoenixInputFormat.class, ActiveMapper.class);

        FileOutputFormat.setOutputPath(job1, new Path(args[2]));

        job1.setMapOutputKeyClass(Text.class);
        job1.setMapOutputValueClass(Table1Writable.class);

        job1.setOutputKeyClass(NullWritable.class);
        job1.setOutputValueClass(Table1Writable.class);

        job1.setReducerClass(Job1Reducer.class);

        boolean st = job1.waitForCompletion(true);
        return st ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        int exitCode = ToolRunner.run(conf, new Job1Driver(), args);
        System.exit(exitCode);
    }
}

When I run it, I get something like this in the output directory:

[email protected] 

With the Writable implementation I can write to HDFS from the map output, but the same does not work for the reducer output. Is there something obvious I am missing?
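The shape of the output file (`ClassName@hexHashCode`) suggests the value is being written via the default `Object.toString()`, which is what `TextOutputFormat` calls on each value. A minimal plain-Java illustration (no Hadoop; `Table1Record` is a hypothetical stand-in for the question's `Table1Writable`):

```java
// Stand-in for a custom Writable that does not override toString().
class Table1Record {
    final int id;
    Table1Record(int id) { this.id = id; }
}

public class ToStringDemo {
    public static void main(String[] args) {
        Table1Record rec = new Table1Record(42);
        // TextOutputFormat writes each value by calling toString(); without
        // an override this inherits Object.toString(), which produces
        // "ClassName@hexHashCode" - the same shape as the file contents above.
        System.out.println(rec.toString());
    }
}
```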

Are you using MapReduce because Phoenix queries don't scale? We tried benchmarking Phoenix at Splice Machine (open source), and we could not get it to scale for large queries/updates.

I think you need to set

job.setOutputFormatClass() 

Good luck...
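To expand on that: one plausible fix is to switch job1 away from the default `TextOutputFormat` (which stringifies values) to `SequenceFileOutputFormat`, which serializes Writables in binary form that a second job can read back. A sketch against the question's driver, assuming `Table1Writable` implements `Writable` and `job2` is the second job's `Job` instance:

```java
// In job1's run(): persist the Writable itself instead of its toString().
// SequenceFileOutputFormat serializes the key/value Writables via their
// write(DataOutput) methods rather than calling toString().
job1.setOutputFormatClass(SequenceFileOutputFormat.class);

// In the second job's driver: read the same binary records back.
job2.setInputFormatClass(SequenceFileInputFormat.class);
SequenceFileInputFormat.addInputPath(job2, new Path(args[2]));
// The second job's mapper then receives <NullWritable, Table1Writable> pairs.
```

The alternative, if the output must stay human-readable text, is to keep `TextOutputFormat` and override `toString()` in `Table1Writable` to emit the fields you need, then re-parse them in the second job's mapper.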