PMML model export - spark.mllib

spark.mllib supported models

spark.mllib supports model export to Predictive Model Markup Language (PMML).

The table below outlines the spark.mllib models that can be exported to PMML and their equivalent PMML model.

`spark.mllib` modelPMML model
KMeansModelClusteringModel
LinearRegressionModelRegressionModel (functionName="regression")
RidgeRegressionModelRegressionModel (functionName="regression")
LassoModelRegressionModel (functionName="regression")
SVMModelRegressionModel (functionName="classification" normalizationMethod="none")
Binary LogisticRegressionModelRegressionModel (functionName="classification" normalizationMethod="logit")

Examples

To export a supported model (see table above) to PMML, simply call model.toPMML.

Refer to the KMeans Scala docs and Vectors Scala docs for details on the API.

Here a complete example of building a KMeansModel and print it out in PMML format:

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Load and parse the data
val data = sc.textFile("data/mllib/kmeans_data.txt")
val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()

// Cluster the data into two classes using KMeans
val numClusters = 2
val numIterations = 20
val clusters = KMeans.train(parsedData, numClusters, numIterations)

// Export to PMML
println("PMML Model:\n" + clusters.toPMML)

As well as exporting the PMML model to a String (model.toPMML as in the example above), you can export the PMML model to other formats:

// Export the model to a String in PMML format
clusters.toPMML

// Export the model to a local file in PMML format
clusters.toPMML("/tmp/kmeans.xml")

// Export the model to a directory on a distributed file system in PMML format
clusters.toPMML(sc,"/tmp/kmeans")

// Export the model to the OutputStream in PMML format
clusters.toPMML(System.out)

For unsupported models, either you will not find a .toPMML method or an IllegalArgumentException will be thrown.