How can I access computed metrics for each fold in a CrossValidatorModel


How can I get the computed metrics for each fold from a CrossValidatorModel in spark.ml? I know I can get the average metrics using model.avgMetrics, but is it possible to get the raw results for each fold, e.g. to look at the variance of the results?

I am using Spark 2.0.0.


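Judging from the <a href="https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala" rel="nofollow">Spark code here</a>, CrossValidator only adds each fold's metric into a running sum (metrics(i) += metric), so the per-fold values are not kept on the fitted model.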

For the folds, you can do the iteration yourself like this:

    // Excerpt from CrossValidator.fit; schema, est, eval, epm, numModels and
    // metrics are defined earlier in that method.
    val splits = MLUtils.kFold(dataset.toDF.rdd, $(numFolds), $(seed))
    // The k-fold loop starts here. For each fold, one model is fitted per
    // entry in the param grid.
    splits.zipWithIndex.foreach { case ((training, validation), splitIndex) =>
      val trainingDataset = sparkSession.createDataFrame(training, schema).cache()
      val validationDataset = sparkSession.createDataFrame(validation, schema).cache()
      val models = est.fit(trainingDataset, epm).asInstanceOf[Seq[Model[_]]]
      trainingDataset.unpersist()
      var i = 0
      while (i < numModels) {
        val metric = eval.evaluate(models(i).transform(validationDataset, epm(i)))
        logDebug(s"Got metric $metric for model trained with ${epm(i)}.")
        // Per-fold values are summed here, which is why only averages survive.
        metrics(i) += metric
        i += 1
      }
      validationDataset.unpersist()
    }

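This is in Scala, but the idea carries over to the other APIs: split the data into folds yourself, fit on each training split, and record the metric for each fold instead of summing it.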
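To keep the per-fold values rather than their sum, the same loop can be run outside of CrossValidator and made to return one row of metrics per fold. The following is only a sketch under some assumptions: the function names perFoldMetrics and varianceByParamMap are made up for illustration, the arguments mirror the estimator, evaluator and param grid you would otherwise hand to CrossValidator, and it has not been checked against every Spark version.

```scala
import org.apache.spark.ml.{Estimator, Model}
import org.apache.spark.ml.evaluation.Evaluator
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper: returns metrics indexed as result(fold)(paramMapIndex).
def perFoldMetrics(
    spark: SparkSession,
    dataset: DataFrame,
    est: Estimator[_],
    eval: Evaluator,
    epm: Array[ParamMap],
    numFolds: Int,
    seed: Long): Array[Array[Double]] = {
  val schema = dataset.schema
  // Same splitting utility that CrossValidator uses internally.
  val splits = MLUtils.kFold(dataset.rdd, numFolds, seed)
  splits.map { case (training, validation) =>
    val trainingDataset = spark.createDataFrame(training, schema).cache()
    val validationDataset = spark.createDataFrame(validation, schema).cache()
    // One fitted model per ParamMap in the grid.
    val models = est.fit(trainingDataset, epm).asInstanceOf[Seq[Model[_]]]
    trainingDataset.unpersist()
    // Keep each metric instead of adding it to a running sum.
    val foldMetrics = epm.indices.map { i =>
      eval.evaluate(models(i).transform(validationDataset, epm(i)))
    }.toArray
    validationDataset.unpersist()
    foldMetrics
  }
}

// Hypothetical helper: population variance across folds, one value per ParamMap.
def varianceByParamMap(metrics: Array[Array[Double]]): Array[Double] =
  metrics.transpose.map { xs =>
    val mean = xs.sum / xs.length
    xs.map(x => math.pow(x - mean, 2)).sum / xs.length
  }
```

Averaging each column of the result should give values comparable to model.avgMetrics when the same seed and number of folds are used, while the individual rows give the raw results for each fold.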

Take a look at <a href="https://stackoverflow.com/questions/38874546/spark-crossvalidatormodel-access-other-models-than-the-bestmodel" rel="nofollow">this answer</a> that outlines results per fold. Hope this helps.


