
problem with private[spark] functions? #7

Closed
mathieu1 opened this issue Dec 23, 2014 · 7 comments

Comments

@mathieu1

I'm running into a problem while executing the standard Spark/GraphX example (https://spark.apache.org/docs/1.1.1/graphx-programming-guide.html#examples) in ISpark; see this notebook: http://nbviewer.ipython.org/gist/mathieu1/4c7bf1ae84514939a83f

Using Spark 1.1.1 with "local[2]" master and IPython Notebook 2.3, I get the following error:

org.apache.spark.SparkException: Job aborted due to stage failure: ClassNotFound with classloader: org.apache.spark.executor.ExecutorURLClassLoader@b6c3ef9
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
scala.Option.foreach(Option.scala:236)
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
akka.actor.ActorCell.invoke(ActorCell.scala:456)
akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
akka.dispatch.Mailbox.run(Mailbox.scala:219)
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

A similar error has already been brought up by @benjaminlaird (https://github.com/benjaminlaird), also using ISpark; see his much simpler code: https://gist.github.com/benjaminlaird/3e543a9a89fb499a3a14

I suppose this problem has to do with the ExecutorURLClassLoader class being private[spark] (see ExecutorURLClassLoader.scala).
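For readers unfamiliar with the modifier: `private[spark]` is Scala's package-qualified access, meaning the member is visible anywhere under the named package but nowhere outside it. A minimal sketch (the package and class names below are illustrative, not Spark's):

```scala
// Package-qualified access, the mechanism behind private[spark].
// All names here are made up for illustration.
package demo {
  // Visible anywhere inside package `demo`, invisible outside it.
  private[demo] class Helper {
    def greet: String = "visible inside demo"
  }

  object Inside {
    def use: String = (new Helper).greet // compiles: same package
  }
}

object Main extends App {
  // `new demo.Helper` here would not compile: Helper is private[demo].
  println(demo.Inside.use)
}
```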

Of course, all the code runs fine on the standard spark-shell. The same issue happens on the Spark backend for IScala (mattpap/IScala#21) from @hvanhovell

@tribbloid
Owner

Strange, this problem should be resolved:
https://issues.apache.org/jira/browse/SPARK-1199
I'll test on my laptop; in the meantime, please ensure that your Spark
binary has been upgraded to 1.1.1.
(Should be upgraded to 1.2.0 soon, after they fixed
https://issues.apache.org/jira/browse/SPARK-4923)

Yours Peng


@tribbloid
Owner

Confirmed, this is a bug caused by bypassing some steps in SparkImport (the class loaders in executors are different from the one in the master).

@tribbloid
Owner

Hi @mathieu1 ,

Thanks a lot for the prompt report. This is getting more and more interesting. First, I would like to confirm that you are not running on Scala 2.11? Apparently spark-repl sets up a ClassLoader server in that case to sync classes between driver and executors (Scala 2.10.4 doesn't have this feature).

The problem only happens when a new class is defined in the interpreter and its instances are collected from an executor. It doesn't matter where the class is instantiated, and there is no problem printing it locally.

Furthermore, it's always thrown by ExecutorURLClassLoader, which is only used in SparkSubmit. So maybe there is some secret hacking in spark-shell.sh which I didn't scrutinize.
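The executor-side symptom in the stack trace can be illustrated without Spark at all: deserialization fails the moment a class loader is asked for a class it was never given. A standalone sketch, where the class name is hypothetical (REPL-generated names actually look roughly like `$line3.$read$$iwC$$iwC$Circle`):

```scala
// Standalone illustration (no Spark): a class loader that never
// received the interpreter's generated classes cannot resolve them,
// which is what surfaces as ClassNotFound during deserialization.
import java.net.{URL, URLClassLoader}

object LoaderDemo {
  // Stand-in for an executor's loader. The `null` parent skips the
  // application classpath, roughly like a bare worker JVM would.
  def tryLoad(name: String): String = {
    val isolated = new URLClassLoader(Array.empty[URL], null)
    try { isolated.loadClass(name); "found" }
    catch { case _: ClassNotFoundException => "ClassNotFound" }
  }

  def main(args: Array[String]): Unit =
    // "ReplDefinedCircle" is a made-up REPL class name: nothing
    // serves it to this loader, so the lookup fails.
    println(tryLoad("ReplDefinedCircle"))
}
```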

I'll get back to you once I make progress; please keep me informed. Also, have you tried NFLabs' Zeppelin and IBM's Spark Kernel? Do they work?

@mathieu1
Author

mathieu1 commented Jan 8, 2015

Indeed, I compiled and ran everything on Scala 2.10.4.

Regarding your last question: I tried IBM's spark-kernel and it worked :) However, its dependency on an old version of ZeroMQ (version 2) made it nontrivial for me to set up.

I haven't managed to use NFlab's zeppelin at all so far.

@tribbloid
Owner

Aha, it looks like it's working on Zeppelin as well; I'll likely switch my backend to it in the future.

@tribbloid
Owner

I still get this error in 1.3.0; it looks like some serious hacking on the class loader will be needed.

I'll simply copy @benjaminlaird's test script here, in case it gets deleted by the original author.

case class Circle(rad: Float)
val rdd = sc.parallelize(1 to 10000).map(i => Circle(i.toFloat))
rdd.take(10)

tribbloid pushed a commit that referenced this issue Apr 12, 2015
…becomes private

visualization upgraded to be compatible with dataframe api
fix a bug: problem with private[spark] functions? #7, interpreter now use class server correctly
display dsl are moved into a new package
fix 2 bugs in display dataframe as table.
@tribbloid
Owner

OK, problem fixed. It turned out to be easier than I thought.
This issue will be closed after 3 days if there are no objections.
