MapReduceJoin Pitfalls
java.lang.NullPointerException
When I first ran the code with the directories pointing at HDFS, the job failed with a null pointer exception:
java.lang.NullPointerException
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
at org.apache.hadoop.util.Shell.run(Shell.java:478)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:738)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:831)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:814)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:664)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:452)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:309)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:133)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:148)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1325)
at com.ruozedata.bigdata.myself.MapJoin.JoinMapperDemo.run(JoinMapperDemo.java:123)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at com.ruozedata.bigdata.myself.MapJoin.JoinMapperDemo.main(JoinMapperDemo.java:142)
Looking at the error log, I found the following:
2019-04-26 21:55:25,194 [main] ERROR - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:378)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:393)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:386)
at org.apache.hadoop.util.GenericOptionsParser.preProcessForWindows(GenericOptionsParser.java:438)
at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:484)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:64)
at com.ruozedata.bigdata.myself.MapJoin.JoinMapperDemo.main(JoinMapperDemo.java:142)
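The log says that winutils.exe cannot be found under the Hadoop path. The null prefix in null\bin\winutils.exe shows that no Hadoop home directory is configured on this Windows machine.
- Solution:
- Download hadoop: link: https://pan.baidu.com/s/1gvYs8dsAT8RCCdCnksJAHQ extraction code: uyo2
- Download winutils: link: https://pan.baidu.com/s/1v91wukmR85IsPuCdpdzKtg extraction code: co89
- Extract the downloaded hadoop archive to a local Windows directory
- Put the downloaded winutils.exe into the hadoop-2.6.0-cdh5.7.0\bin directory
- Point the code at the hadoop directory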
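public static void main(String[] args) throws Exception {
if (args.length != 2) {
System.err.println( "Please input 2 params: input output" );
// exit with a non-zero status so that callers can detect the usage error
System.exit( 1 );
}
// set this before any Hadoop class is used: Shell reads it in its static initializer
System.setProperty( "hadoop.home.dir", "D:\\software\\hadoopapp\\hadoop-2.6.0-cdh5.7.0" );
......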
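Before submitting the job, it can help to confirm that winutils.exe really sits where Hadoop will look for it. The following is only a minimal sketch using the plain JDK: the class WinutilsCheck and its methods are hypothetical helpers, not part of Hadoop. It assumes that Hadoop takes its home directory from the hadoop.home.dir system property first and from the HADOOP_HOME environment variable second, and then expects bin\winutils.exe beneath it.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not a Hadoop class.
final class WinutilsCheck {

    /** Builds the expected location: <hadoopHome>/bin/winutils.exe. */
    static Path winutilsPath(String hadoopHome) {
        return Paths.get(hadoopHome, "bin", "winutils.exe");
    }

    /** Assumed lookup order: system property first, then environment variable; null if neither is set. */
    static String resolveHadoopHome() {
        String home = System.getProperty("hadoop.home.dir");
        return home != null ? home : System.getenv("HADOOP_HOME");
    }

    public static void main(String[] args) {
        String home = resolveHadoopHome();
        if (home == null) {
            System.err.println("Neither hadoop.home.dir nor HADOOP_HOME is set");
            return;
        }
        Path winutils = winutilsPath(home);
        String state = Files.isRegularFile(winutils) ? " found" : " is missing";
        System.out.println(winutils + state);
    }
}
```

Running this check with the same hadoop.home.dir value as the job shows quickly whether the archive was extracted to the directory the code points at.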
java.lang.IllegalArgumentException: Wrong FS: hdfs://hadoop614:9000/g6/hadoop/MapReduceJoin/output, expected: file:///
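- The error says that hdfs://hadoop614:9000/g6/hadoop/MapReduceJoin/output does not belong to the expected local file system (file:///). When I changed the cache file path and the input and output paths to local paths, I got
java.lang.IllegalArgumentException: Illegal character in opaque part at index 2: D:\ruozedata_workspace\g6_java\input\customer
instead; the fix for that error is described further down. Once it is fixed, local paths do work, but that is not what I wanted: I need to point the job at directories and files on HDFS.
- Solution: set the fs.defaultFS property (the one normally defined in Hadoop's core-site.xml) in the code, so that paths are resolved against HDFS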
@Override
public int run(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
// create the Hadoop configuration object
Configuration configuration = new Configuration();
configuration.set( "fs.defaultFS", "hdfs://hadoop614:9000" );
.....
- After setting it, I ran the job again: OK
- Note: if you later switch back to local paths, you need to comment out this line:
configuration.set( "fs.defaultFS", "hdfs://hadoop614:9000" );
Otherwise the following error occurs: java.lang.IllegalArgumentException: Pathname /D:/ruozedata_workspace/g6_java/output from D:/ruozedata_workspace/g6_java/output is not a valid DFS filename.
In other words, the path is not an HDFS directory.
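Both errors come down to a mismatch between the scheme of a path and the default file system. As a rough illustration, the sketch below compares the two using only java.net.URI. The class FsSchemeCheck is a hypothetical helper, not a Hadoop API. It assumes that a location without a scheme, or with a one-letter "scheme" such as the D in D:/..., means the local file system. It compares schemes only and ignores host and port.

```java
import java.net.URI;

// Hypothetical helper, not a Hadoop class.
final class FsSchemeCheck {

    /**
     * Returns the file system scheme implied by a location string.
     * Assumption: no scheme, or a one-letter scheme (a Windows drive like D:), means local.
     * Expects forward slashes; a backslash would make URI.create throw.
     */
    static String schemeOf(String location) {
        String scheme = URI.create(location).getScheme();
        if (scheme == null || scheme.length() == 1) {
            return "file";
        }
        return scheme;
    }

    /** True when the location uses the same scheme as the configured default file system. */
    static boolean matchesDefaultFs(String defaultFs, String location) {
        return schemeOf(defaultFs).equalsIgnoreCase(schemeOf(location));
    }

    public static void main(String[] args) {
        String defaultFs = "hdfs://hadoop614:9000";
        String[] locations = {
            "hdfs://hadoop614:9000/g6/hadoop/MapReduceJoin/output",
            "D:/ruozedata_workspace/g6_java/output"
        };
        for (String location : locations) {
            System.out.println(location + " matches " + defaultFs + ": "
                    + matchesDefaultFs(defaultFs, location));
        }
    }
}
```

A check like this could decide whether to call configuration.set( "fs.defaultFS", ... ) at all, instead of commenting the line in and out by hand.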
java.lang.IllegalArgumentException: Illegal character in opaque part at index 2: D:\ruozedata_workspace\g6_java\input\customer
- I got this error when I pointed input and output at the local file system, with the cache file declared as:
private static String cacheFile = "D:\\ruozedata_workspace\\g6_java\\input\\customer";
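- The log, however, did not print any ERROR message
- In the end I ran the job in DEBUG mode and stepped through it line by line until I found the cause
- It fails when the small file is added to the cache: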
job.addCacheFile( URI.create( cacheFile ) );
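URI.create cannot parse the string we pass in: a Windows path such as D:\... is not a valid URI, because D: is read as the URI scheme and the backslash at index 2 is not a legal URI character. I did not dig any further into the details at the time, but either of the following forms works (use one of them):
private static String cacheFile = "file:///D:/ruozedata_workspace/g6_java/input/customer";
private static String cacheFile = "/D:/ruozedata_workspace/g6_java/input/customer";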
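The difference between the failing and the working strings can be reproduced without Hadoop. Here is a small sketch using only java.net.URI. The class CacheFileUriDemo and its toFileUri method are hypothetical helpers made up for illustration. The conversion only swaps separators, so it assumes the path contains no spaces or other characters that would need escaping.

```java
import java.net.URI;

// Hypothetical helper, not a Hadoop class.
final class CacheFileUriDemo {

    /** Returns true if URI.create accepts the string, false if it throws. */
    static boolean parses(String candidate) {
        try {
            URI.create(candidate);
            return true;
        } catch (IllegalArgumentException e) {
            // URI.create wraps the URISyntaxException in an IllegalArgumentException
            return false;
        }
    }

    /** Turns a Windows path like D:\a\b into file:///D:/a/b by swapping the separators. */
    static String toFileUri(String windowsPath) {
        return "file:///" + windowsPath.replace('\\', '/');
    }

    public static void main(String[] args) {
        String raw = "D:\\ruozedata_workspace\\g6_java\\input\\customer";
        System.out.println("raw path parses:  " + parses(raw));
        String converted = toFileUri(raw);
        System.out.println("converted:        " + converted);
        System.out.println("converted parses: " + parses(converted));
        System.out.println("scheme / path:    " + URI.create(converted).getScheme()
                + " / " + URI.create(converted).getPath());
    }
}
```

With such a helper, the cache file could be registered as job.addCacheFile( URI.create( CacheFileUriDemo.toFileUri( cacheFile ) ) ) while keeping the original Windows-style string in the code.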