org.apache.spark.ml.regression

Class AFTAggregator

• Object
• org.apache.spark.ml.regression.AFTAggregator
• All Implemented Interfaces:
java.io.Serializable

public class AFTAggregator
extends Object
implements scala.Serializable
AFTAggregator computes the gradient and loss of the accelerated failure time (AFT) loss function used in AFT survival regression. Samples may be given as sparse or dense vectors and are processed in an online fashion.

The loss function and likelihood function under the AFT model are based on: Lawless, J. F., Statistical Models and Methods for Lifetime Data, New York: John Wiley & Sons, Inc. 2003.

Two AFTAggregators can be merged to give a summary of the loss and gradient of the corresponding joint dataset.

Given the values of the covariates x^{'} and the random lifetimes t_{i} of subjects i = 1, ..., n, with possible right-censoring, the likelihood function under the AFT model is


L(\beta,\sigma)=\prod_{i=1}^n[\frac{1}{\sigma}f_{0}
(\frac{\log{t_{i}}-x^{'}\beta}{\sigma})]^{\delta_{i}}S_{0}
(\frac{\log{t_{i}}-x^{'}\beta}{\sigma})^{1-\delta_{i}}

Where \delta_{i} indicates whether the event has occurred for subject i, i.e. whether the observation is uncensored (\delta_{i}=1) or censored (\delta_{i}=0). Using \epsilon_{i}=\frac{\log{t_{i}}-x^{'}\beta}{\sigma}, the log-likelihood function assumes the form

\iota(\beta,\sigma)=\sum_{i=1}^{n}[-\delta_{i}\log\sigma+
\delta_{i}\log{f_{0}}(\epsilon_{i})+(1-\delta_{i})\log{S_{0}(\epsilon_{i})}]

Where S_{0}(\epsilon_{i}) is the baseline survivor function, and f_{0}(\epsilon_{i}) is the corresponding density function.

The most commonly used log-linear survival regression method is based on the Weibull distribution of the survival time. A Weibull distribution for the lifetime corresponds to an extreme value distribution for the log of the lifetime, and the S_{0}(\epsilon) function is


S_{0}(\epsilon_{i})=\exp(-e^{\epsilon_{i}})

the f_{0}(\epsilon_{i}) function is

f_{0}(\epsilon_{i})=e^{\epsilon_{i}}\exp(-e^{\epsilon_{i}})
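
As an intermediate step, taking logarithms of the two Weibull baseline functions gives the terms that are substituted into the general log-likelihood:

\log{f_{0}(\epsilon_{i})}=\epsilon_{i}-e^{\epsilon_{i}},\quad
\log{S_{0}(\epsilon_{i})}=-e^{\epsilon_{i}}

so that subject i contributes

-\delta_{i}\log\sigma+\delta_{i}(\epsilon_{i}-e^{\epsilon_{i}})-(1-\delta_{i})e^{\epsilon_{i}}
=-\delta_{i}\log\sigma+\delta_{i}\epsilon_{i}-e^{\epsilon_{i}}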

The log-likelihood function for a Weibull distribution of the lifetime is

\iota(\beta,\sigma)=
-\sum_{i=1}^n[\delta_{i}\log\sigma-\delta_{i}\epsilon_{i}+e^{\epsilon_{i}}]
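
The Weibull formula above can be evaluated directly. The following is a minimal sketch, not the Spark implementation: the class WeibullAftLossSketch and its method names are hypothetical, plain double[] arrays stand in for Spark vectors, and any intercept is assumed to be folded into beta through a constant feature. It returns the bracketed sum, which is the negative of \iota(\beta,\sigma).

```java
/** Hypothetical helper, not part of Spark: evaluates the negated Weibull AFT log-likelihood. */
final class WeibullAftLossSketch {

    /** epsilon_i = (log t_i - x_i' beta) / sigma. */
    static double epsilon(double t, double[] x, double[] beta, double sigma) {
        double margin = 0.0;
        for (int j = 0; j < x.length; j++) {
            margin += x[j] * beta[j];
        }
        return (Math.log(t) - margin) / sigma;
    }

    /**
     * Contribution of one subject to -iota:
     * delta * log(sigma) - delta * epsilon + exp(epsilon).
     * delta is 1.0 for an observed event and 0.0 for a right-censored one.
     */
    static double pointLoss(double t, double delta, double[] x, double[] beta, double sigma) {
        double eps = epsilon(t, x, beta, sigma);
        return delta * Math.log(sigma) - delta * eps + Math.exp(eps);
    }

    /** -iota(beta, sigma): the sum of the per-subject contributions. */
    static double loss(double[] t, double[] delta, double[][] x, double[] beta, double sigma) {
        double sum = 0.0;
        for (int i = 0; i < t.length; i++) {
            sum += pointLoss(t[i], delta[i], x[i], beta, sigma);
        }
        return sum;
    }
}
```

For a censored subject (delta = 0.0) only the exp(epsilon) term remains, which matches -\log{S_{0}(\epsilon_{i})}.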

Because minimizing the negative log-likelihood is equivalent to maximum likelihood estimation (and to maximum a posteriori estimation under a uniform prior), the loss function we optimize is -\iota(\beta,\sigma). The gradient functions for \beta and \log\sigma respectively are

\frac{\partial (-\iota)}{\partial \beta}=
\sum_{i=1}^{n}[\delta_{i}-e^{\epsilon_{i}}]\frac{x_{i}}{\sigma}


\frac{\partial (-\iota)}{\partial (\log\sigma)}=
\sum_{i=1}^{n}[\delta_{i}+(\delta_{i}-e^{\epsilon_{i}})\epsilon_{i}]
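
Both gradients follow from the chain rule with \partial\epsilon_{i}/\partial\beta=-x_{i}/\sigma and \partial\epsilon_{i}/\partial(\log\sigma)=-\epsilon_{i}. The per-subject terms can be sketched as below. This is an illustration under stated assumptions rather than the Spark code: the class WeibullAftGradientSketch is hypothetical, the scale is parameterized by log(sigma), any intercept is assumed to be folded into beta through a constant feature, and placing the log(sigma) derivative at index 0 of the result is a layout chosen only for this example.

```java
/** Hypothetical helper, not part of Spark: per-subject loss and gradient of -iota. */
final class WeibullAftGradientSketch {

    /** epsilon = (log t - x' beta) / sigma, with sigma = exp(logSigma). */
    static double epsilon(double t, double[] x, double[] beta, double logSigma) {
        double margin = 0.0;
        for (int j = 0; j < x.length; j++) {
            margin += x[j] * beta[j];
        }
        return (Math.log(t) - margin) / Math.exp(logSigma);
    }

    /** One subject's contribution to -iota: delta * log(sigma) - delta * epsilon + exp(epsilon). */
    static double pointLoss(double t, double delta, double[] x, double[] beta, double logSigma) {
        double eps = epsilon(t, x, beta, logSigma);
        return delta * logSigma - delta * eps + Math.exp(eps);
    }

    /**
     * One subject's contribution to the gradient of -iota.
     * Index 0 holds the derivative with respect to log(sigma);
     * index j + 1 holds the derivative with respect to beta[j].
     */
    static double[] pointGradient(double t, double delta, double[] x, double[] beta, double logSigma) {
        double sigma = Math.exp(logSigma);
        double eps = epsilon(t, x, beta, logSigma);
        double multiplier = delta - Math.exp(eps); // shared factor (delta - e^epsilon)
        double[] grad = new double[x.length + 1];
        grad[0] = delta + multiplier * eps;
        for (int j = 0; j < x.length; j++) {
            grad[j + 1] = multiplier * x[j] / sigma;
        }
        return grad;
    }
}
```

A central finite difference of pointLoss in each coordinate is a quick way to check such a gradient numerically.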

param: parameters The parameter vector, made up of three parts: the log of the scale parameter, the intercept, and the regression coefficients corresponding to the features.
param: fitIntercept Whether to fit an intercept term.
param: featuresStd The standard deviation values of the features.
• Constructor Summary

Constructors
Constructor and Description
AFTAggregator(breeze.linalg.DenseVector<Object> parameters, boolean fitIntercept, double[] featuresStd)
• Method Summary

Methods
Modifier and Type Method and Description
AFTAggregator add(org.apache.spark.ml.regression.AFTPoint data)
Add a new training data point to this AFTAggregator, and update the loss and gradient of the objective function.
long count()
breeze.linalg.DenseVector<Object> gradient()
double loss()
AFTAggregator merge(AFTAggregator other)
Merge another AFTAggregator, and update the loss and gradient of the objective function.
• Methods inherited from class Object

equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
• Constructor Detail

• AFTAggregator

public AFTAggregator(breeze.linalg.DenseVector<Object> parameters,
boolean fitIntercept,
double[] featuresStd)
• Method Detail

• count

public long count()
• loss

public double loss()
• gradient

public breeze.linalg.DenseVector<Object> gradient()
• add

public AFTAggregator add(org.apache.spark.ml.regression.AFTPoint data)
Add a new training data point to this AFTAggregator, and update the loss and gradient of the objective function.

Parameters:
data - The AFTPoint representation for one data point to be added into this aggregator.
Returns:
This AFTAggregator object.
• merge

public AFTAggregator merge(AFTAggregator other)
Merge another AFTAggregator, and update the loss and gradient of the objective function. Note that the merge is done in place, so this object is modified.

Parameters:
other - The other AFTAggregator to be merged.
Returns:
This AFTAggregator object.
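
The add and merge contract described above (accumulate in place, return this) can be illustrated with a small stand-alone accumulator. This is a sketch under assumptions, not the AFTAggregator source: the class LossGradientAccumulatorSketch and its methods are hypothetical, it takes already-computed per-point loss and gradient values instead of an AFTPoint, and it exposes raw sums because this page does not say how loss() and gradient() normalize their results.

```java
/** Hypothetical illustration of an in-place, mergeable loss/gradient accumulator. */
final class LossGradientAccumulatorSketch {
    private long count = 0L;
    private double lossSum = 0.0;
    private final double[] gradientSum;

    LossGradientAccumulatorSketch(int dimension) {
        this.gradientSum = new double[dimension];
    }

    /** Adds one point's loss and gradient contribution; returns this for chaining. */
    LossGradientAccumulatorSketch add(double pointLoss, double[] pointGradient) {
        if (pointGradient.length != gradientSum.length) {
            throw new IllegalArgumentException("gradient dimension mismatch");
        }
        lossSum += pointLoss;
        for (int j = 0; j < gradientSum.length; j++) {
            gradientSum[j] += pointGradient[j];
        }
        count++;
        return this;
    }

    /** Merges other into this object in place; other is left unchanged. */
    LossGradientAccumulatorSketch merge(LossGradientAccumulatorSketch other) {
        if (other.gradientSum.length != gradientSum.length) {
            throw new IllegalArgumentException("gradient dimension mismatch");
        }
        if (other.count != 0L) {
            count += other.count;
            lossSum += other.lossSum;
            for (int j = 0; j < gradientSum.length; j++) {
                gradientSum[j] += other.gradientSum[j];
            }
        }
        return this;
    }

    long count() { return count; }

    double lossSum() { return lossSum; }

    /** Returns a copy so callers cannot mutate the internal state. */
    double[] gradientSum() { return gradientSum.clone(); }
}
```

Because sums are associative, partial accumulators built on separate partitions of a dataset can be merged in any grouping to summarize the joint dataset, up to floating-point rounding.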