ClassNotFoundException with class defined in notebook #13

Closed
aenoskov opened this issue Apr 20, 2015 · 3 comments

@aenoskov

Hi, and thank you for your work!
Spark 1.3.1, yarn-client.
To reproduce:

case class Test(x: Int)
sc.parallelize((0 to 99).map{Test(_)}).map{(_,1)}.countByKey()

org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 26.0 failed 4 times, most recent failure: Lost task 3.3 in stage 26.0 (TID 165): java.lang.ClassNotFoundException: $iwC$$iwC$Test
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:65)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
    at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
    at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:91)
    at org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:44)
    at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
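
For background on the wrapped class name in the error: the Scala REPL and notebook kernels compile every input cell inside nested wrapper objects, so a class declared interactively gets a binary name like $iwC$$iwC$Test instead of Test, and the executors fail with ClassNotFoundException when that wrapper class never reaches their classpath. Below is a minimal sketch of how the wrapping shows up, assuming it is run in a Spark 1.x REPL or notebook cell (the exact printed name varies by REPL):

// A cell like this is compiled inside nested wrapper objects by the REPL,
// so the runtime class name is not simply "Test".
case class Test(x: Int)

// In a REPL this prints a wrapped name such as "$iwC$$iwC$Test", the same
// name the executor failed to resolve in the trace above; in compiled code
// it would print the plain (package-qualified) class name.
println(classOf[Test].getName)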
@tribbloid
Owner

You haven't upgraded, right?

After you do so, you should get:

In [1]:

case class Test(x: Int)

sc.parallelize((0 to 99).map{Test(_)}).map{(_,1)}.countByKey()

Out[1]:

Map(Test(93) -> 1, Test(14) -> 1, Test(61) -> 1, Test(10) -> 1, Test(17) -> 1, Test(0) -> 1, Test(47) -> 1, Test(73) -> 1, Test(4) -> 1, Test(42) -> 1, Test(46) -> 1, Test(81) -> 1, Test(39) -> 1, Test(60) -> 1, Test(24) -> 1, Test(35) -> 1, Test(6) -> 1, Test(62) -> 1, Test(20) -> 1, Test(32) -> 1, Test(7) -> 1, Test(96) -> 1, Test(1) -> 1, Test(55) -> 1, Test(27) -> 1, Test(97) -> 1, Test(77) -> 1, Test(13) -> 1, Test(52) -> 1, Test(56) -> 1, Test(45) -> 1, Test(87) -> 1, Test(66) -> 1, Test(91) -> 1, Test(89) -> 1, Test(98) -> 1, Test(85) -> 1, Test(11) -> 1, Test(67) -> 1, Test(69) -> 1, Test(95) -> 1, Test(12) -> 1, Test(59) -> 1, Test(88) -> 1, Test(58) -> 1, Test(3) -> 1, Test(5) -> 1, Test(33) -> 1, Test(99) -> 1, Test(82) -> 1, Test(86) -> 1, Test(19) -> 1, Test(84) -> 1, Test(63) -> 1, Test(51) -> 1, Test(23) -> 1, Test(83) -> 1, Test(48) -> 1, Test(78) -> 1, Test(53) -> 1, Test(15) -> 1, Test(8) -> 1, Test(38) -> 1, Test(40) -> 1, Test(64) -> 1, Test(65) -> 1, Test(94) -> 1, Test(44) -> 1, Test(76) -> 1, Test(79) -> 1, Test(57) -> 1, Test(9) -> 1, Test(21) -> 1, Test(37) -> 1, Test(41) -> 1, Test(72) -> 1, Test(71) -> 1, Test(36) -> 1, Test(70) -> 1, Test(28) -> 1, Test(74) -> 1, Test(16) -> 1, Test(18) -> 1, Test(80) -> 1, Test(68) -> 1, Test(26) -> 1, Test(31) -> 1, Test(49) -> 1, Test(2) -> 1, Test(34) -> 1, Test(54) -> 1, Test(29) -> 1, Test(75) -> 1, Test(92) -> 1, Test(25) -> 1, Test(43) -> 1, Test(50) -> 1, Test(22) -> 1, Test(30) -> 1, Test(90) -> 1)

Thanks a lot for your reply!
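
A general workaround for this class of failure (a side note, not this project's fix, which per the follow-up below was a rebuild): define the case class in compiled code that ships to the executors, so its binary name is stable for every classloader. A minimal sketch; the package name repro and distributing the jar via spark-submit --jars (or sc.addJar) are assumptions for illustration:

// Compiled into a jar that is placed on the executor classpath, e.g. via
// spark-submit --jars. The binary name is then simply "repro.Test".
package repro

case class Test(x: Int)

The notebook cell would then use the compiled class:

sc.parallelize((0 to 99).map(repro.Test(_))).map((_, 1)).countByKey()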


@aenoskov
Author

Rebuilt everything, and it works perfectly.
Thanks!

@tribbloid
Owner

Thanks a lot! I'm not a frequent YARN user (I never tested it on YARN myself, I just prayed it would work, since it shares most of its components with spark-shell). Thank you so much for verifying its compatibility!
