MLlib: RDD-based API

This page documents sections of the MLlib guide for the RDD-based API (the spark.mllib package). Please see the MLlib Main Guide for the DataFrame-based API (the spark.ml package), which is now the primary API for MLlib.

- Data types
- Basic statistics
- Classification and regression
- Collaborative filtering
  - alternating least squares (ALS)
- Clustering
- Dimensionality reduction
  - singular value decomposition (SVD)
  - principal component analysis (PCA)
- Feature extraction and transformation
- Frequent pattern mining
- Evaluation metrics
- PMML model export
- Optimization (developer)
  - stochastic gradient descent
  - limited-memory BFGS (L-BFGS)