
Question:
The following code raises a NullPointerException, even though Option(x._1.F2).isDefined && Option(x._2.F2).isDefined is there to guard against null values:
case class Cols(F1: String, F2: BigDecimal, F3: Int, F4: Date, ...)

def readTable(): Dataset[Cols] = {
  import sqlContext.sparkSession.implicits._
  sqlContext.read.format("jdbc").options(Map(
    "driver" -> "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "url" -> jdbcSqlConn,
    "dbtable" -> s"..."
  )).load()
    .select("F1", "F2", "F3", "F4")
    .as[Cols]
}
import org.apache.spark.sql.{functions => func}
val j = readTable().joinWith(readTable(), func.lit(true))
j.filter(x =>
  (if (Option(x._1.F2).isDefined && Option(x._2.F2).isDefined
       && (x._1.F2 - x._2.F2 < 1)) 1 else 0) // line 51
  + ..... > 100)
I tried !(x._1.F2 == null || x._2.F2 == null) and it still throws the exception.
The exception is
java.lang.NullPointerException
  at scala.math.BigDecimal.$minus(BigDecimal.scala:563)
  at MappingPoint$$anonfun$compare$1.apply(MappingPoint.scala:51)
  at MappingPoint$$anonfun$compare$1.apply(MappingPoint.scala:44)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
  at org.apache.spark.scheduler.Task.run(Task.scala:108)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  at java.lang.Thread.run(Unknown Source)

Update:
I tried the following expression and execution still hits the line x._1.F2 - x._2.F2. Is there a way to check whether a BigDecimal is null?
(if (!(Option(x._1.F2).isDefined && Option(x._2.F2).isDefined
       && x._1.F2 != null && x._2.F2 != null)) 0
 else (if (x._1.F2 - x._2.F2 < 1) 1 else 0))
Update 2: The exception is gone after I wrapped the subtraction in math.abs((l.F2 - r.F2).toDouble). Why?
Answer1: Try adding this to your if statement:

&& x._1.F2 != null && x._2.F2 != null

I've had a similar issue in Java, and that's what worked for me.
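In context, the filter would look something like this (a sketch using the j from the question; the elided terms of the original sum are left out):

j.filter { x =>
  // Plain null checks on both sides before the subtraction, per the suggestion above.
  x._1.F2 != null && x._2.F2 != null && (x._1.F2 - x._2.F2 < 1)
}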
Answer2: Looking at the source code for BigDecimal, on line 563: https://github.com/scala/scala/blob/v2.11.8/src/library/scala/math/BigDecimal.scala#L563

It may be possible that x._1.F2.bigDecimal or x._2.F2.bigDecimal is null, though I'm not really sure how that would happen, given that the constructor checks for that. But maybe check for null there and see if that solves the problem?
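A sketch of that extra check (assuming the scala.math.BigDecimal wrapper can indeed be non-null while its underlying java.math.BigDecimal is null; hasValue is a hypothetical helper):

// If the wrapper is non-null but wraps a null java.math.BigDecimal,
// Option(...).isDefined alone will not catch it, so check both levels.
def hasValue(d: BigDecimal): Boolean = d != null && d.bigDecimal != null

j.filter(x => hasValue(x._1.F2) && hasValue(x._2.F2) && (x._1.F2 - x._2.F2 < 1))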
BTW, you should really avoid all the ._1, ._2 accesses. You should be able to do something like:

val (l: Cols, r: Cols) = x

to extract the tuple values.
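For example, destructuring inside the closure, combined with the working expression from Update 2 (a sketch):

j.filter { x =>
  // Destructure the join tuple once instead of repeating ._1/._2 everywhere.
  val (l, r) = x
  l.F2 != null && r.F2 != null &&
    math.abs((l.F2 - r.F2).toDouble) < 1
}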