Description
I'm trying to implement a variant normalization function. I'm calling it within a dataframe like this:
.withColumn('normalizationResult',
            F.when((F.length(F.col('ss_other_allele')) > 1) &
                   ((F.length(F.col('trim_ref')) > 0) |
                    (F.length(F.col('trim_alt')) > 0)),
                   glow.normalize_variant("contigName", "start", "end",
                                          "referenceAllele", "alternateAlleles", ref_path))
             .otherwise(None))
I'm preparing "contigName", "start", "end", "referenceAllele", "alternateAlleles"
field before the call, and I've checked there is no any NULL values in any of the fields.
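For reference, the NULL check was roughly this (a sketch; df stands for the prepared dataframe):

import pyspark.sql.functions as F

# Count NULLs in each of the five input columns; every count came back as 0
input_cols = ['contigName', 'start', 'end', 'referenceAllele', 'alternateAlleles']
df.select([F.sum(F.col(c).isNull().cast('int')).alias(c) for c in input_cols]).show()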
When the Spark action runs, I get this error:
23/10/12 00:33:15 ERROR TaskContextImpl: Error in TaskCompletionListener
java.lang.NullPointerException: null
at io.projectglow.sql.expressions.NormalizeVariantExpr$.$anonfun$doVariantNormalization$1(NormalizeVariantExpr.scala:55) ~[io.projectglow_glow-spark3_2.12-1.2.1.jar:1.2.1]
at io.projectglow.sql.expressions.NormalizeVariantExpr$.$anonfun$doVariantNormalization$1$adapted(NormalizeVariantExpr.scala:54) ~[io.projectglow_glow-spark3_2.12-1.2.1.jar:1.2.1]
at org.apache.spark.TaskContext$$anon$1.onTaskCompletion(TaskContext.scala:132) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContextImpl.$anonfun$invokeTaskCompletionListeners$1(TaskContextImpl.scala:144) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContextImpl.$anonfun$invokeTaskCompletionListeners$1$adapted(TaskContextImpl.scala:144) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:199) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContextImpl.invokeTaskCompletionListeners(TaskContextImpl.scala:144) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:137) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:180) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.scheduler.Task.run(Task.scala:141) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_382]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_382]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_382]
23/10/12 00:33:15 ERROR Executor: Exception in task 3.0 in stage 14.0 (TID 88)
org.apache.spark.util.TaskCompletionListenerException: null
at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:254) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContextImpl.invokeTaskCompletionListeners(TaskContextImpl.scala:144) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:137) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:180) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.scheduler.Task.run(Task.scala:141) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_382]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_382]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_382]
Suppressed: java.lang.NullPointerException
at io.projectglow.sql.expressions.NormalizeVariantExpr$.$anonfun$doVariantNormalization$1(NormalizeVariantExpr.scala:55) ~[io.projectglow_glow-spark3_2.12-1.2.1.jar:1.2.1]
at io.projectglow.sql.expressions.NormalizeVariantExpr$.$anonfun$doVariantNormalization$1$adapted(NormalizeVariantExpr.scala:54) ~[io.projectglow_glow-spark3_2.12-1.2.1.jar:1.2.1]
at org.apache.spark.TaskContext$$anon$1.onTaskCompletion(TaskContext.scala:132) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContextImpl.$anonfun$invokeTaskCompletionListeners$1(TaskContextImpl.scala:144) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContextImpl.$anonfun$invokeTaskCompletionListeners$1$adapted(TaskContextImpl.scala:144) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:199) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContextImpl.invokeTaskCompletionListeners(TaskContextImpl.scala:144) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:137) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:180) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.scheduler.Task.run(Task.scala:141) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_382]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_382]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_382]
I've tried running just this part of the dataframe manually from a pyspark session (roughly the check sketched below), and there were no errors. But when I run the whole pipeline with all the joins, it fails on exactly this step across multiple containers. The executor stats are in the screenshot below:
[executor stats screenshot]
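That manual check was roughly this (a sketch; the 10,000-row sample size is arbitrary):

# Isolate this step on a sample of the prepared dataframe and force evaluation
sample = df.select('contigName', 'start', 'end', 'referenceAllele', 'alternateAlleles',
                   'ss_other_allele', 'trim_ref', 'trim_alt').limit(10000)
result = sample.withColumn('normalizationResult',
                           F.when((F.length(F.col('ss_other_allele')) > 1) &
                                  ((F.length(F.col('trim_ref')) > 0) |
                                   (F.length(F.col('trim_alt')) > 0)),
                                  glow.normalize_variant('contigName', 'start', 'end',
                                                         'referenceAllele', 'alternateAlleles', ref_path))
                            .otherwise(None))
result.count()  # completes without errors when run in isolation like this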
I'm running this on Spark 3.4.1 with 6G executors and a 3G driver.
From the stack trace, it looks like the NullPointerException is thrown inside the task-completion listener that Glow's NormalizeVariantExpr registers, rather than in the normalization itself.
Can you help me with this, please?
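In case it helps narrow things down: one workaround I'm considering (an untested sketch, not something I've confirmed) is to call normalize_variant unconditionally and apply the condition afterwards, so the Glow expression is evaluated on every row; 'nr_raw' is just a hypothetical intermediate column name:

df = (df
      .withColumn('nr_raw',  # hypothetical intermediate column, dropped below
                  glow.normalize_variant('contigName', 'start', 'end',
                                         'referenceAllele', 'alternateAlleles', ref_path))
      .withColumn('normalizationResult',
                  F.when((F.length(F.col('ss_other_allele')) > 1) &
                         ((F.length(F.col('trim_ref')) > 0) |
                          (F.length(F.col('trim_alt')) > 0)),
                         F.col('nr_raw'))
                   .otherwise(F.lit(None)))
      .drop('nr_raw'))

Would that be a reasonable way around this, or should the conditional call work as written?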