Hadoop with Phoenix: how to write Phoenix table objects to the HDFS file system
Problem description:
I have a MapReduce job that reads data from an HBase table using Phoenix. I want this job's output to go to HDFS, and then be fed into another MapReduce job, where I will apply updates to the HBase table. This is what I have tried.
public class Job1Driver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        final Configuration jobConfiguration = super.getConf();
        final Job job1 = Job.getInstance(jobConfiguration, jobConfiguration.get("mapreduce.job.name"));
        final String selectQuery = "SELECT * FROM TABLE1 WHERE IS_SUMMARY_RECORD=false";

        job1.setJarByClass(Job1Driver.class);
        PhoenixMapReduceUtil.setInputCluster(job1, jobConfiguration.get("HBASE_URL"));
        PhoenixMapReduceUtil.setInput(job1, Table1Writable.class, "TABLE1", selectQuery);

        if (jobConfiguration.get("IS_FROZEN_DATA_AVAILABLE").equals("True")) {
            MultipleInputs.addInputPath(job1, new Path(args[0]),
                    TextInputFormat.class, FrozenMapper.class);
        }
        MultipleInputs.addInputPath(job1, new Path(args[1]),
                PhoenixInputFormat.class, ActiveMapper.class);
        FileOutputFormat.setOutputPath(job1, new Path(args[2]));

        job1.setMapOutputKeyClass(Text.class);
        job1.setMapOutputValueClass(Table1Writable.class);
        job1.setOutputKeyClass(NullWritable.class);
        job1.setOutputValueClass(Table1Writable.class);
        job1.setReducerClass(Job1Reducer.class);

        boolean st = job1.waitForCompletion(true);
        return st ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        int exitCode = ToolRunner.run(conf, new Job1Driver(), args);
        System.exit(exitCode);
    }
}
When I run it, I get something like this in the output directory:
[email protected]
With the Writable implementation I can write to HDFS from the map output, but the same does not work for the reducer output. Is there something obvious that I am missing?
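Output of that shape is what Java's default `Object.toString()` produces: the class name, an `@`, and a hex hash code. Hadoop's default `TextOutputFormat` renders each record by calling `toString()` on the key and value, so a value class without a `toString()` override ends up written this way. A minimal plain-Java illustration, with no Hadoop dependency (`RawRecord` and `NiceRecord` are hypothetical stand-ins for a `Writable` value class):

```java
// Demonstrates why a class without a toString() override is rendered as
// "ClassName@hexhash", and how overriding toString() fixes the rendering.
public class ToStringDemo {
    // Stand-in for a value class that does NOT override toString()
    static class RawRecord {
        int id = 1;
    }

    // Same data, but with a toString() override
    static class NiceRecord {
        int id = 1;
        @Override
        public String toString() {
            return "NiceRecord{id=" + id + "}";
        }
    }

    public static void main(String[] args) {
        // Default toString() yields something like "ToStringDemo$RawRecord@1b6d3586"
        String raw = new RawRecord().toString();
        System.out.println(raw.contains("$RawRecord@"));

        // Overridden toString() yields the readable form
        System.out.println(new NiceRecord());
    }
}
```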
Answer
Are you using MapReduce because the Phoenix query cannot scale? We tried benchmarking Phoenix at Splice Machine (open source), and we could not get it to scale for large queries/updates.
I think you need to set
job.setOutputFormatClass()
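A sketch of what that could look like in the driver above. Which format to pick depends on the goal: since the output here is meant to feed a second MapReduce job, SequenceFileOutputFormat (which stores the Writable values in binary form) may be a better fit than text:

```java
// Driver fragment (not runnable on its own): choose the output format
// explicitly so the reducer output is serialized the way you expect.

// Text output calls toString() on key and value, so Table1Writable
// needs a meaningful toString() override to be human-readable.
job1.setOutputFormatClass(TextOutputFormat.class);

// Alternative: keep the Writables in binary form; the follow-up job can
// then read them back with SequenceFileInputFormat.
// job1.setOutputFormatClass(SequenceFileOutputFormat.class);
```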
Good luck...