Spark Build #24

Open
wittyResry opened this issue Oct 5, 2016 · 7 comments
wittyResry (Owner) commented Oct 5, 2016

  • Build Spark
  • Scala and mvn need to be added to PATH, along with JDK 1.8
  • Commands: ./build/mvn install -DskipTests and ./build/mvn -Pyarn -Phadoop-2.3 -DskipTests -Phive -Phive-thriftserver clean package. The build directory will download the tooling the build itself needs (mvn, scala, zinc, etc.), which prepares the environment configured below
    • BuildSuccess
  • On the first run you can see Spark download the dependencies shown below
    (screenshot)

$ ./build/mvn install -DskipTests
Configure the dependencies, using the ones under the build directory of the Spark git repository:
$ vi ~/.bash_profile
export MAVEN_HOME=/Users/resry/gitworkspace/spark/build/apache-maven-3.3.9
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
export JAVA6_HOME=$(/usr/libexec/java_home  -v "1.6*")
export JAVA8_HOME=$(/usr/libexec/java_home  -v "1.8*")
export JAVA_HOME=$JAVA8_HOME
export SPARK_HOME=/Users/resry/sparkTest/spark-2.0.0-bin-hadoop2.3
#export SCALA_HOME=/Users/resry/scala-2.10.4
export SCALA_HOME=/Users/resry/gitworkspace/spark/build/scala-2.11.8
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=.:$SPARK_HOME/bin:$SCALA_HOME/bin:$JAVA_HOME/bin:$MAVEN_HOME/bin:$HOME/bin:$PATH
After editing:
$ source ~/.bash_profile

$ echo $PATH
.:/Users/resry/gitworkspace/spark/build/scala-2.11.8/bin:/Library/Java/JavaVirtualMachines/jdk1.8.0_40.jdk/Contents/Home/bin:/Users/resry/gitworkspace/spark/build/apache-maven-3.3.9/bin:/home/dfs/apache-maven-3.2.3/bin:.:/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin:/Users/resry/alipay/apache-maven-2.2.1/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
  • Next, configure sbt
$  ~ which sbt
/Users/resry/bin/sbt
Copy the following two lines into /Users/resry/bin/sbt:
$  ~ cat /Users/resry/bin/sbt
SBT_OPTS="-Xms512M -Xmx1536M -Xss1M -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256M"
java $SBT_OPTS -jar /Users/resry/gitworkspace/spark/build/sbt-launch-0.13.11.jar "$@"
Add $HOME/bin to PATH:
$  ~ echo $PATH
.:/Users/resry/sparkTest/spark-2.0.0-bin-hadoop2.3/bin:/Users/resry/gitworkspace/spark/build/scala-2.11.8/bin:/Library/Java/JavaVirtualMachines/jdk1.8.0_40.jdk/Contents/Home/bin:/Users/resry/gitworkspace/spark/build/apache-maven-3.3.9/bin:/Users/resry/bin:/home/dfs/apache-maven-3.2.3/bin:.:/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin:/Users/resry/alipay/apache-maven-2.2.1/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
wittyResry commented Oct 5, 2016

Running Tests

  • Command: ./build/mvn -Pyarn -Phadoop-2.3 -DskipTests -Phive -Phive-thriftserver clean package
    • BuildSuccess
  • ./build/mvn -Pyarn -Phadoop-2.3 -Phive -Phive-thriftserver test
    • BuildFailure: the run fails in sql/core's StateStoreRDDSuite with chmod "No such file or directory" errors on temp paths that contain randomly generated non-ASCII names:
- versioning and immutability *** FAILED ***
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 2, localhost): org.apache.hadoop.util.Shell$ExitCodeException: chmod: /Users/resry/gitworkspace/spark/sql/core/target/tmp/StateStoreRDDSuite2889323466578328859/??鹤具??氏賂??-6bbf4b2c-a35f-418f-8944-ed2603511959/0/0/temp--7084769780949456207: No such file or directory

    at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
    at org.apache.hadoop.util.Shell.run(Shell.java:418)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:631)
    at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:468)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:907)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:888)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:785)
    at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$HDFSBackedStateStore.<init>(HDFSBackedStateStoreProvider.scala:90)
    at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.getStore(HDFSBackedStateStoreProvider.scala:215)
    at org.apache.spark.sql.execution.streaming.state.StateStore$.get(StateStore.scala:145)
    at org.apache.spark.sql.execution.streaming.state.StateStoreRDD.compute(StateStoreRDD.scala:61)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1452)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1440)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1439)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1439)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
  at scala.Option.foreach(Option.scala:257)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
  ...
  Cause: org.apache.hadoop.util.Shell$ExitCodeException: chmod: /Users/resry/gitworkspace/spark/sql/core/target/tmp/StateStoreRDDSuite2889323466578328859/??鹤具??氏賂??-6bbf4b2c-a35f-418f-8944-ed2603511959/0/0/temp--7084769780949456207: No such file or directory
  at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
  at org.apache.hadoop.util.Shell.run(Shell.java:418)
  at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
  at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
  at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
  at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:631)
  at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:468)
  at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
  at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:907)
  ...
- recovering from files *** FAILED ***
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 1.0 failed 1 times, most recent failure: Lost task 1.0 in stage 1.0 (TID 2, localhost): org.apache.hadoop.util.Shell$ExitCodeException: chmod: /Users/resry/gitworkspace/spark/sql/core/target/tmp/StateStoreRDDSuite2889323466578328859/吵??Ⅲ挹?痐??緔-8291895d-e5c0-4dd2-ba91-090191f58470/0/1/temp--4813019502269688640: No such file or directory

    at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
    at org.apache.hadoop.util.Shell.run(Shell.java:418)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:631)
    at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:468)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:907)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:888)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:785)
    at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$HDFSBackedStateStore.<init>(HDFSBackedStateStoreProvider.scala:90)
    at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.getStore(HDFSBackedStateStoreProvider.scala:215)
    at org.apache.spark.sql.execution.streaming.state.StateStore$.get(StateStore.scala:145)
    at org.apache.spark.sql.execution.streaming.state.StateStoreRDD.compute(StateStoreRDD.scala:61)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1452)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1440)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1439)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1439)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
  at scala.Option.foreach(Option.scala:257)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
  ...
  Cause: org.apache.hadoop.util.Shell$ExitCodeException: chmod: /Users/resry/gitworkspace/spark/sql/core/target/tmp/StateStoreRDDSuite2889323466578328859/吵??Ⅲ挹?痐??緔-8291895d-e5c0-4dd2-ba91-090191f58470/0/1/temp--4813019502269688640: No such file or directory
  at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
  at org.apache.hadoop.util.Shell.run(Shell.java:418)
  at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
  at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
  at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
  at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:631)
  at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:468)
  at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
  at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:907)
  ...
- usage with iterators - only gets and only puts *** FAILED ***
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 2, localhost): org.apache.hadoop.util.Shell$ExitCodeException: chmod: /Users/resry/gitworkspace/spark/sql/core/target/tmp/StateStoreRDDSuite2889323466578328859/????o?縅长??-529b4047-7de8-4d74-bfd7-9df02c4a0c95/0/0/temp--5374338674303412730: No such file or directory

    at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
    at org.apache.hadoop.util.Shell.run(Shell.java:418)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:631)
    at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:468)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:907)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:888)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:785)
    at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$HDFSBackedStateStore.<init>(HDFSBackedStateStoreProvider.scala:90)
    at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.getStore(HDFSBackedStateStoreProvider.scala:215)
    at org.apache.spark.sql.execution.streaming.state.StateStore$.get(StateStore.scala:145)
    at org.apache.spark.sql.execution.streaming.state.StateStoreRDD.compute(StateStoreRDD.scala:61)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1452)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1440)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1439)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1439)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
  at scala.Option.foreach(Option.scala:257)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
  ...
  Cause: org.apache.hadoop.util.Shell$ExitCodeException: chmod: /Users/resry/gitworkspace/spark/sql/core/target/tmp/StateStoreRDDSuite2889323466578328859/????o?縅长??-529b4047-7de8-4d74-bfd7-9df02c4a0c95/0/0/temp--5374338674303412730: No such file or directory
  at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
  at org.apache.hadoop.util.Shell.run(Shell.java:418)
  at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
  at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
  at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
  at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:631)
  at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:468)
  at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
  at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:907)
  ...
- preferred locations using StateStoreCoordinator *** FAILED ***
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 2, localhost): org.apache.hadoop.util.Shell$ExitCodeException: chmod: /Users/resry/gitworkspace/spark/sql/core/target/tmp/StateStoreRDDSuite2889323466578328859/覄??攂?鈜?陉∞赻-b22ebd9d-6fa5-4d95-b1a8-6cfbd504e8a4/0/0/temp-7355846794818621318: No such file or directory

    at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
    at org.apache.hadoop.util.Shell.run(Shell.java:418)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:631)
    at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:468)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:907)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:888)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:785)
    at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$HDFSBackedStateStore.<init>(HDFSBackedStateStoreProvider.scala:90)
    at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.getStore(HDFSBackedStateStoreProvider.scala:215)
    at org.apache.spark.sql.execution.streaming.state.StateStore$.get(StateStore.scala:145)
    at org.apache.spark.sql.execution.streaming.state.StateStoreRDD.compute(StateStoreRDD.scala:61)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1452)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1440)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1439)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1439)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
  at scala.Option.foreach(Option.scala:257)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
  ...
  Cause: org.apache.hadoop.util.Shell$ExitCodeException: chmod: /Users/resry/gitworkspace/spark/sql/core/target/tmp/StateStoreRDDSuite2889323466578328859/覄??攂?鈜?陉∞赻-b22ebd9d-6fa5-4d95-b1a8-6cfbd504e8a4/0/0/temp-7355846794818621318: No such file or directory
  at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
  at org.apache.hadoop.util.Shell.run(Shell.java:418)
  at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
  at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
  at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
  at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:631)
  at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:468)
  at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
  at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:907)
  ...
- distributed test *** FAILED ***
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 8, localhost): org.apache.hadoop.util.Shell$ExitCodeException: chmod: /Users/resry/gitworkspace/spark/sql/core/target/tmp/StateStoreRDDSuite2889323466578328859/??咳?ㄋ钐??喷?-c9efb979-4fd3-4a90-a1c2-38c0b5cd4138/0/0/temp--6706523959583811421: No such file or directory

    at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
    at org.apache.hadoop.util.Shell.run(Shell.java:418)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:631)
    at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:468)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:907)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:888)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:785)
    at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$HDFSBackedStateStore.<init>(HDFSBackedStateStoreProvider.scala:90)
    at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.getStore(HDFSBackedStateStoreProvider.scala:215)
    at org.apache.spark.sql.execution.streaming.state.StateStore$.get(StateStore.scala:145)
    at org.apache.spark.sql.execution.streaming.state.StateStoreRDD.compute(StateStoreRDD.scala:61)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1452)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1440)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1439)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1439)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
  at scala.Option.foreach(Option.scala:257)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
  ...
  Cause: org.apache.hadoop.util.Shell$ExitCodeException: chmod: /Users/resry/gitworkspace/spark/sql/core/target/tmp/StateStoreRDDSuite2889323466578328859/??咳?ㄋ钐??喷?-c9efb979-4fd3-4a90-a1c2-38c0b5cd4138/0/0/temp--6706523959583811421: No such file or directory
  at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
  at org.apache.hadoop.util.Shell.run(Shell.java:418)
  at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
  at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
  at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
  at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:631)
  at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:468)
  at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
  at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:907)
  ...
Run completed in 11 minutes, 2 seconds.
Total number of tests run: 2322
Suites: completed 157, aborted 0
Tests: succeeded 2307, failed 15, canceled 0, ignored 49, pending 0
*** 15 TESTS FAILED ***
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [  5.210 s]
[INFO] Spark Project Tags ................................. SUCCESS [  3.796 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 15.945 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 43.313 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  9.860 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [  9.415 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 11.531 s]
[INFO] Spark Project Core ................................. SUCCESS [17:31 min]
[INFO] Spark Project GraphX ............................... SUCCESS [01:04 min]
[INFO] Spark Project Streaming ............................ SUCCESS [04:42 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [07:44 min]
[INFO] Spark Project SQL .................................. FAILURE [13:51 min]
[INFO] Spark Project ML Local Library ..................... SKIPPED
[INFO] Spark Project ML Library ........................... SKIPPED
[INFO] Spark Project Tools ................................ SKIPPED
[INFO] Spark Project Hive ................................. SKIPPED
[INFO] Spark Project REPL ................................. SKIPPED
[INFO] Spark Project YARN Shuffle Service ................. SKIPPED
[INFO] Spark Project YARN ................................. SKIPPED
[INFO] Spark Project Hive Thrift Server ................... SKIPPED
[INFO] Spark Project Assembly ............................. SKIPPED
[INFO] Spark Project External Flume Sink .................. SKIPPED
[INFO] Spark Project External Flume ....................... SKIPPED
[INFO] Spark Project External Flume Assembly .............. SKIPPED
[INFO] Spark Integration for Kafka 0.8 .................... SKIPPED
[INFO] Spark Project Examples ............................. SKIPPED
[INFO] Spark Project External Kafka Assembly .............. SKIPPED
[INFO] Spark Integration for Kafka 0.10 ................... SKIPPED
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SKIPPED
[INFO] Spark Project Java 8 Tests ......................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 46:34 min
[INFO] Finished at: 2016-10-04T18:52:03+08:00
[INFO] Final Memory: 56M/739M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.scalatest:scalatest-maven-plugin:1.0:test (test) on project spark-sql_2.11: There are test failures -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :spark-sql_2.11

wittyResry commented:
Running Tests

The ScalaTest plugin also supports running only a specific Scala test suite as follows:

./build/mvn -P... -Dtest=none -DwildcardSuites=org.apache.spark.repl.ReplSuite test
./build/mvn -P... -Dtest=none -DwildcardSuites=org.apache.spark.repl.* test

or a Java test:

./build/mvn test -P... -DwildcardSuites=none -Dtest=org.apache.spark.streaming.JavaAPISuite

wittyResry commented:
Testing with SBT

sbt is actually just a jar; the packages it downloads end up in ~/.ivy2/cache.

wittyResry commented Oct 6, 2016

Debugging Spark

  • See the Zhihu answer for reference
  • Tick the hive and hive-thriftserver profiles in IDEA
  • Add the following to $HOME/.sbt/0.13/global.sbt:
import scala.Console.{BLUE, RESET, UNDERLINED}

shellPrompt := { state =>
  val projectId = Project.extract(state).currentProject.id
  s"$BLUE${UNDERLINED}sbt ($projectId)>$RESET "
}

Run in the spark directory:

$ ./build/sbt -Phive -Phive-thriftserver
sbt (spark)> project hive
sbt (hive)> 
Set the remote-debug options for listen mode:
sbt (hive)> set javaOptions in Test += "-agentlib:jdwp=transport=dt_socket,server=n,suspend=n,address=localhost:5005"
Finally, launch our test case:
sbt (hive)> testOnly *.HiveQuerySuite -- -t foo

Configuring listen mode

Debugger mode: Listen. The debugger acts as the server: it starts first and listens on the specified port, waiting for the target program to connect to it.
Host / Port: the address and port the debugger listens on, localhost:5005 by default.
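
For contrast, here is a minimal sketch of the opposite Attach setup (my own addition, not part of the original notes): with server=y the forked test JVM itself listens on port 5005, and with suspend=y it waits until a debugger attaches, so in IDEA you would use Debugger mode: Attach instead of Listen.

sbt (hive)> set javaOptions in Test += "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005"
sbt (hive)> testOnly *.HiveQuerySuite -- -t foo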

Fixing Spark errors

A successful debugging session

wittyResry commented Oct 6, 2016

Using SBT to run a specific test case in Spark and debugging it remotely from IDEA.

  • First enter SBT to run the tests
  • In the Spark directory, run ./build/sbt -Phive -Phive-thriftserver
sbt (spark)> project core
  • Set listen mode, then start debugging in IDEA with Debugger mode: Listen, and set a breakpoint in AccumulatorSuite.
  • Run the statements below in the SBT console to enter the debug session
sbt (core)> set javaOptions in Test += "-agentlib:jdwp=transport=dt_socket,server=n,suspend=n,address=localhost:5005"
sbt (core)> testOnly *.AccumulatorSuite -- -t "accumulator serialization"
  • If the error below appears, it means IDEA's listening socket on port 5005 is not open; just start IDEA's debug listener (open the debug port) first.
[debug] javaOptions: List(-Xmx3g, -Djava.io.tmpdir=/Users/resry/spark/target/tmp, -Dspark.test.home=/Users/resry/spark, -Dspark.testing=1, -Dspark.port.maxRetries=100, -Dspark.master.rest.enabled=false, -Dspark.ui.enabled=false, -Dspark.ui.showConsoleProgress=false, -Dspark.unsafe.exceptionOnMemoryLeak=true, -Dsun.io.serialization.extendedDebugInfo=false, -Dderby.system.durability=test, -ea, -Xmx3g, -Xss4096k, -XX:PermSize=128M, -XX:MaxNewSize=256m, -XX:MaxPermSize=1g, -agentlib:jdwp=transport=dt_socket,server=n,suspend=n,address=localhost:5005)
[debug] Forking tests - parallelism = false
[error] Could not accept connection from test agent: class java.net.SocketException: Socket closed
java.net.SocketException: Socket closed
    at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:404)
    at java.net.ServerSocket.implAccept(ServerSocket.java:545)
    at java.net.ServerSocket.accept(ServerSocket.java:513)
    at sbt.ForkTests$$anonfun$mainTestTask$1$Acceptor$2$.run(ForkTests.scala:46)
    at java.lang.Thread.run(Thread.java:745)
  • On success it looks like the screenshot below
    (screenshot)

wittyResry commented Oct 6, 2016

Fixing local build problems in IDEA

  • First run ./build/mvn -Pyarn -Phadoop-2.3 -DskipTests -Phive -Phive-thriftserver clean package to generate the required target files, since two plugins depend on target. The fixes for build failures in the IDE are detailed below.
  • target -> scala-2.11 -> src_managed -> main -> compiled_avro: right-click compiled_avro, choose the Sources option, and mark it as a Sources directory. Note: keep the other directories red (excluded), as in the screenshot below.
    (screenshot: sparksources)
  • If org.apache.spark.sql.catalyst.parser.SqlBaseParser cannot be resolved, another generated directory has to be marked as a Sources directory as well
    (screenshot: spark-catalyst-sources)
  • If compiling hive-thriftserver fails, its generated source code has to be added; the directory is under src/gen/java
    (screenshot: sparkgensourcesinclude)
  • Maven version selection
    (screenshot)
  • If a GBK encoding I/O problem appears, add the parameter -encoding utf-8 to the compiler options (an sbt-side equivalent is sketched after this list)
    (screenshot)
  • Select UTF-8 as the file encoding
    (screenshot: sparkfileencoding)
  • After fixing the two issues above, the build succeeds
    (screenshot)
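
For reference, a sketch of the sbt-side equivalent of the -encoding utf-8 compiler setting above (my own assumption, not something the original notes configure); these two settings could go in $HOME/.sbt/0.13/global.sbt next to the prompt setting from the earlier comment:

// Force UTF-8 for both Scala and Java sources, mirroring the "-encoding utf-8"
// flag added to IDEA's compiler options above (sketch only, not from the notes).
scalacOptions ++= Seq("-encoding", "utf-8")
javacOptions ++= Seq("-encoding", "utf-8")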

wittyResry reopened this Oct 7, 2016

wittyResry commented Oct 13, 2016

Getting the examples to run locally

  • Select the Profiles so that the build succeeds
    (screenshot)
  • Once the build succeeds, pick an example to run, e.g. org.apache.spark.examples.JavaWordCount. Note that you need to pass the JVM option -Dspark.master=local and give /etc/passwd as the program argument, as in the screenshot below (a minimal Scala equivalent is sketched at the end of this comment).
    (screenshot)
  • When you run it, classes will not be found; you need to manually change every provided scope in the examples pom to compile, and also change all scopes to compile under File -> Project Structure... -> Modules -> Dependencies, after which it runs normally (tip: change the pom first, since most entries then sync to Compile automatically and save repeated work; the few remaining scopes have to be changed individually in the IDEA settings):
diff --git a/examples/pom.xml b/examples/pom.xml
index d222794..ad966e1 100644
--- a/examples/pom.xml
+++ b/examples/pom.xml
@@ -35,10 +35,10 @@
     <sbt.project.name>examples</sbt.project.name>
     <build.testJarPhase>none</build.testJarPhase>
     <build.copyDependenciesPhase>package</build.copyDependenciesPhase>
-    <flume.deps.scope>provided</flume.deps.scope>
-    <hadoop.deps.scope>provided</hadoop.deps.scope>
-    <hive.deps.scope>provided</hive.deps.scope>
-    <parquet.deps.scope>provided</parquet.deps.scope>
+    <flume.deps.scope>compile</flume.deps.scope>
+    <hadoop.deps.scope>compile</hadoop.deps.scope>
+    <hive.deps.scope>compile</hive.deps.scope>
+    <parquet.deps.scope>compile</parquet.deps.scope>
   </properties>

   <dependencies>
@@ -47,54 +47,54 @@
       <groupId>org.spark-project.spark</groupId>
       <artifactId>unused</artifactId>
       <version>1.0.0</version>
-      <scope>provided</scope>
+      <scope>compile</scope>
     </dependency>
     <dependency>
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-core_${scala.binary.version}</artifactId>
       <version>${project.version}</version>
-      <scope>provided</scope>
+      <scope>compile</scope>
     </dependency>
     <dependency>
       <groupId>org.apache.spark</groupId>

(screenshot)

  • Run result
    (screenshot)
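
For comparison, a minimal self-contained Scala sketch of what the JavaWordCount run above does, with the local master set in code instead of via -Dspark.master=local. This is my own illustration: the object name LocalWordCount is made up, and it assumes spark-core and spark-sql are on the compile classpath (e.g. after the scope changes above).

import org.apache.spark.sql.SparkSession

object LocalWordCount {
  def main(args: Array[String]): Unit = {
    // Default to /etc/passwd, matching the program argument used above.
    val input = if (args.nonEmpty) args(0) else "/etc/passwd"

    // "local" runs everything in-process, so no cluster or -Dspark.master is needed.
    val spark = SparkSession.builder()
      .appName("LocalWordCount")
      .master("local")
      .getOrCreate()

    // Split each line on whitespace and count the occurrences of every word.
    val counts = spark.read.textFile(input).rdd
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.collect().foreach { case (word, count) => println(s"$word: $count") }
    spark.stop()
  }
}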
