Spark job submission fails on Cloudera Manager

After configuring a workflow in Hue and submitting it, the following error was reported:

Log Type: stderr
Log Upload Time: Wed Aug 29 10:36:23 +0800 2018
Log Length: 1452
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/disk04/yarn/nm/filecache/16/sparkcase2-1.0-SNAPSHOT-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
	at org.apache.spark.deploy.SparkSubmitArguments.handle(SparkSubmitArguments.scala:409)
	at org.apache.spark.launcher.SparkSubmitOptionParser.parse(SparkSubmitOptionParser.java:163)
	at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:104)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 5 more
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
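The key line is the `NoClassDefFoundError` for `org.apache.hadoop.fs.FSDataInputStream`: the spark-submit launcher JVM started without the Hadoop jars on its classpath, which on a CDH cluster usually means the client configuration was never deployed to that node. A quick sanity check is to look for the gateway-deployed configuration directory before suspecting the job itself. A minimal sketch (the path matches this cluster's layout; the helper function name is my own, not CDH tooling):

```shell
# Report whether the Spark2 client configuration exists in a given directory.
# (Function name and default path are illustrative.)
check_spark_client_config() {
    conf_dir="$1"
    if [ -f "$conf_dir/spark-env.sh" ]; then
        echo "client config present"
    else
        echo "client config missing"
    fi
}

# On a node that holds the Spark2 Gateway role, the directory deployed by
# Cloudera Manager should contain spark-env.sh:
check_spark_client_config /etc/spark2/conf.cloudera.spark2_on_yarn
```

On a node without the gateway role this reports the config as missing, which is exactly the situation that produced the stack trace above.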

The role assignments in the cluster are shown in the figure below:


The figure shows the cluster after my change. Before the change, the tdxy-bigdata-[5-10] machines did not have the Spark2 Gateway role. When a Spark2 job was submitted, one of machines 5-10 would be chosen to run it; because that machine had no gateway role, Cloudera Manager could not deploy the client configuration to it, so the job could not find the Hadoop configuration. After adding the gateway role back and resubmitting the job, it succeeded.
The Spark configuration before the change:

[[email protected] spark2-conf]# find / -name spark-env.sh
/opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/etc/spark/conf.dist/spark-env.sh

After the change:

[[email protected] ccdeploy_spark2-conf_etcspark2conf.cloudera.spark2_on_yarn_-2119533512992985839]# find / -name spark-env.sh
/opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/etc/spark/conf.dist/spark-env.sh
/var/run/cloudera-scm-agent/process/ccdeploy_spark2-conf_etcspark2conf.cloudera.spark2_on_yarn_-2119533512992985839/aux/client/spark-env.sh
/var/run/cloudera-scm-agent/process/ccdeploy_spark2-conf_etcspark2conf.cloudera.spark2_on_yarn_-2119533512992985839/spark2-conf/spark-env.sh
/etc/spark2/conf.cloudera.spark2_on_yarn/spark-env.sh

The configuration under /var is generated when a Spark job is submitted; the configuration under /etc is the one Cloudera Manager deploys through the Spark gateway role.
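The gateway-deployed spark-env.sh is what puts the Hadoop configuration and jars on the spark-submit classpath. The exact contents vary by CDH release, so treat the following as an illustrative fragment rather than a verbatim copy from this cluster:

```shell
# Illustrative fragment of a gateway-deployed spark-env.sh:
# point Spark at the Hadoop client configuration...
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}
# ...and put the Hadoop jars on the launcher's classpath. If this is
# missing, spark-submit fails with NoClassDefFoundError on the first
# Hadoop class it touches (here, org.apache.hadoop.fs.FSDataInputStream).
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
```

This is why adding the Spark2 Gateway role fixed the problem: deploying the client configuration is what makes such a file appear under /etc/spark2 on each node.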