Spark ClassNotFoundException when run on yarn-cluster

My code:

import org.apache.spark.{SparkConf, SparkContext}

object Run extends App {
  val conf = new SparkConf().setMaster("yarn-cluster").setAppName("t666")
  // The SparkContext has to exist before addJar can be called on it.
  val sc = new SparkContext(conf)
  sc.addJar("hdfs://10.1.11.99:8020/user/spark/share/scalaj-http_2.10-2.3.0.jar")
  // This call runs on the driver, not inside a Spark task.
  val b = scalaj.http.Base64.encodeString("刘")
  val a = Array[String](b)
  sc.parallelize(a).saveAsTextFile("hdfs://10.1.11.99:8020/testdata/t2/")
}

My submit command is:

spark-submit --master yarn-cluster --class start.Run run.jar

The log on YARN shows:

16/11/04 13:50:01 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done
16/11/04 13:50:01 INFO spark.SparkContext: Added JAR hdfs://10.1.11.99:8020/user/spark/share/scalaj-http_2.10-2.3.0.jar at hdfs://10.1.11.99:8020/user/spark/share/scalaj-http_2.10-2.3.0.jar with timestamp 1478238601256
16/11/04 13:50:01 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark://YarnAM@192.168.3.49:53976)
16/11/04 13:50:01 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.NoClassDefFoundError: scalaj/http/Base64
java.lang.NoClassDefFoundError: scalaj/http/Base64
    at start.Run$delayedInit$body.apply(Run.scala:31)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:71)
    at scala.App$$anonfun$main$1.apply(App.scala:71)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
    at scala.App$class.main(App.scala:71)
    at start.Run$.main(Run.scala:9)
    at start.Run.main(Run.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
Caused by: java.lang.ClassNotFoundException: scalaj.http.Base64
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 15 more
16/11/04 13:50:01 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.NoClassDefFoundError: scalaj/http/Base64)
16/11/04 13:50:01 INFO client.RMProxy: Connecting to ResourceManager at slave3/192.168.3.48:8030
16/11/04 13:50:01 INFO yarn.YarnRMClient: Registering the ApplicationMaster
16/11/04 13:50:01 INFO yarn.ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
16/11/04 13:50:01 INFO spark.SparkContext: Invoking stop() from shutdown hook

The second line shows:

INFO spark.SparkContext: Added JAR hdfs://10.1.11.99:8020/user/spark/share/scalaj-http_2.10-2.3.0.jar at hdfs://10.1.11.99:8020/user/spark/share/scalaj-http_2.10-2.3.0.jar with timestamp 1478238601256

So the JAR file seems to have been added to my classpath already, yet I still get this exception and I can't explain it.

Any answer would help me a lot!

Answer1:

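I believe SparkContext.addJar only adds the JAR to the classpath of the workers (the executors), and not the driver. In your code, scalaj.http.Base64.encodeString is called in the driver, which in yarn-cluster mode runs inside the ApplicationMaster, as the stack trace shows. Try adding the JAR using the --jars option in the spark-submit command: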

spark-submit --master yarn \
  --deploy-mode cluster \
  --jars hdfs://10.1.11.99:8020/user/spark/share/scalaj-http_2.10-2.3.0.jar \
  --class start.Run run.jar
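
If you want to confirm that the class really is missing on the driver side, a small probe like the one below might help. It is only a sketch: ClasspathCheck and isLoadable are hypothetical names, not part of Spark, and it assumes the thread's context class loader is the one your driver code is loaded with.

```scala
import scala.util.Try

object ClasspathCheck {
  // Hypothetical helper: true if `className` can be found by the given class loader.
  // initialize = false, so the probe does not run the class's static initializers.
  def isLoadable(
      className: String,
      loader: ClassLoader = Thread.currentThread().getContextClassLoader
  ): Boolean =
    Try(Class.forName(className, false, loader)).isSuccess

  def main(args: Array[String]): Unit = {
    // This runs wherever main runs; in yarn-cluster mode that is the driver
    // inside the ApplicationMaster.
    Seq("java.lang.String", "scalaj.http.Base64").foreach { name =>
      println(s"$name loadable on driver: ${isLoadable(name)}")
    }
  }
}
```

Calling isLoadable("scalaj.http.Base64") at the top of the driver code, once with only addJar and once with --jars, should show whether the JAR reaches the driver's classpath.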
