public class AFTAggregator
extends Object
implements scala.Serializable
The loss function and likelihood function under the AFT model based on: Lawless, J. F., Statistical Models and Methods for Lifetime Data, New York: John Wiley & Sons, Inc. 2003.
Two AFTAggregator instances can be merged to summarize the loss and gradient of the corresponding joint dataset.
Given the values of the covariates x^{'}, for random lifetime t_{i} of subjects i = 1, ..., n, with possible right-censoring, the likelihood function under the AFT model is given as
L(\beta,\sigma)=\prod_{i=1}^n[\frac{1}{\sigma}f_{0}
(\frac{\log{t_{i}}-x^{'}\beta}{\sigma})]^{\delta_{i}}S_{0}
(\frac{\log{t_{i}}-x^{'}\beta}{\sigma})^{1-\delta_{i}}
Where \delta_{i} is the event indicator: \delta_{i}=1 if the event has occurred (the observation is uncensored) and \delta_{i}=0 if the observation is censored.
Using \epsilon_{i}=\frac{\log{t_{i}}-x^{'}\beta}{\sigma}, the log-likelihood function
assumes the form
\iota(\beta,\sigma)=\sum_{i=1}^{n}[-\delta_{i}\log\sigma+
\delta_{i}\log{f_{0}}(\epsilon_{i})+(1-\delta_{i})\log{S_{0}(\epsilon_{i})}]
Where S_{0}(\epsilon_{i}) is the baseline survivor function,
and f_{0}(\epsilon_{i}) is the corresponding density function.
The most commonly used log-linear survival regression method is based on the Weibull distribution of the survival time. A Weibull distribution for the lifetime corresponds to an extreme value distribution for the log of the lifetime, and the S_{0}(\epsilon) function is
S_{0}(\epsilon_{i})=\exp(-e^{\epsilon_{i}})
the f_{0}(\epsilon_{i}) function is
f_{0}(\epsilon_{i})=e^{\epsilon_{i}}\exp(-e^{\epsilon_{i}})
The log-likelihood function for a Weibull-distributed lifetime is
\iota(\beta,\sigma)=
-\sum_{i=1}^n[\delta_{i}\log\sigma-\delta_{i}\epsilon_{i}+e^{\epsilon_{i}}]
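The step from the general log-likelihood to the Weibull form can be spelled out by substituting the two baseline functions. The derivation below is a worked sketch that uses only the symbols already defined above.

```latex
\begin{aligned}
\log f_{0}(\epsilon_{i}) &= \epsilon_{i}-e^{\epsilon_{i}}, \qquad
\log S_{0}(\epsilon_{i}) = -e^{\epsilon_{i}} \\
\iota(\beta,\sigma)
  &= \sum_{i=1}^{n}\left[-\delta_{i}\log\sigma
     +\delta_{i}\left(\epsilon_{i}-e^{\epsilon_{i}}\right)
     -(1-\delta_{i})e^{\epsilon_{i}}\right] \\
  &= \sum_{i=1}^{n}\left[-\delta_{i}\log\sigma
     +\delta_{i}\epsilon_{i}-e^{\epsilon_{i}}\right] \\
  &= -\sum_{i=1}^{n}\left[\delta_{i}\log\sigma
     -\delta_{i}\epsilon_{i}+e^{\epsilon_{i}}\right]
\end{aligned}
```

The e^{\epsilon_{i}} terms from the censored and uncensored cases combine, so every subject contributes e^{\epsilon_{i}} regardless of \delta_{i}.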
Because minimizing the negative log-likelihood is equivalent to maximum likelihood estimation (and to maximum a posteriori estimation under a uniform prior),
the loss function we optimize is -\iota(\beta,\sigma).
The gradient functions for \beta and \log\sigma respectively are
\frac{\partial (-\iota)}{\partial \beta}=
\sum_{i=1}^{n}[\delta_{i}-e^{\epsilon_{i}}]\frac{x_{i}}{\sigma}
\frac{\partial (-\iota)}{\partial (\log\sigma)}=
\sum_{i=1}^{n}[\delta_{i}+(\delta_{i}-e^{\epsilon_{i}})\epsilon_{i}]
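Both gradients follow from the chain rule with \partial\epsilon_{i}/\partial\beta = -x_{i}/\sigma and \partial\epsilon_{i}/\partial(\log\sigma) = -\epsilon_{i}. The following is a minimal, standalone Java sketch that evaluates the Weibull loss and these gradients and compares them against central finite differences. It does not use the AFTAggregator class itself. The class name `AftLossSketch`, its method names, and the parameter layout `{log sigma, intercept, beta_1, ..., beta_p}` are assumptions made for illustration. The sketch always fits an intercept and applies no feature standardization.

```java
// Hypothetical illustration class; not part of Spark.
final class AftLossSketch {
    private AftLossSketch() {}

    /**
     * Returns {loss, d/d(log sigma), d/d(intercept), d/d(beta_1), ...}
     * for params = {log sigma, intercept, beta_1, ..., beta_p} (assumed layout).
     */
    static double[] lossAndGradient(double[] params, double[][] x, double[] t, double[] delta) {
        double logSigma = params[0];
        double sigma = Math.exp(logSigma);
        double[] out = new double[params.length + 1];
        for (int i = 0; i < t.length; i++) {
            // Linear predictor: intercept + x_i' beta.
            double linear = params[1];
            for (int j = 0; j < x[i].length; j++) {
                linear += params[2 + j] * x[i][j];
            }
            double eps = (Math.log(t[i]) - linear) / sigma;
            double expEps = Math.exp(eps);
            // Loss term: delta * log(sigma) - delta * eps + e^eps.
            out[0] += delta[i] * logSigma - delta[i] * eps + expEps;
            // Gradient with respect to log(sigma).
            out[1] += delta[i] + (delta[i] - expEps) * eps;
            // Gradient with respect to the intercept (x = 1) and each coefficient.
            double scaled = (delta[i] - expEps) / sigma;
            out[2] += scaled;
            for (int j = 0; j < x[i].length; j++) {
                out[3 + j] += scaled * x[i][j];
            }
        }
        return out;
    }

    /** Largest absolute gap between the analytic gradient and a central finite difference. */
    static double maxGradientError(double[] params, double[][] x, double[] t,
                                   double[] delta, double h) {
        double[] analytic = lossAndGradient(params, x, t, delta);
        double worst = 0.0;
        for (int k = 0; k < params.length; k++) {
            double[] up = params.clone();
            double[] down = params.clone();
            up[k] += h;
            down[k] -= h;
            double numeric = (lossAndGradient(up, x, t, delta)[0]
                    - lossAndGradient(down, x, t, delta)[0]) / (2.0 * h);
            worst = Math.max(worst, Math.abs(numeric - analytic[k + 1]));
        }
        return worst;
    }

    public static void main(String[] args) {
        // Made-up toy data, for illustration only.
        double[] params = {0.1, 0.5, -0.3, 0.2};
        double[][] x = {{1.0, 2.0}, {0.5, -1.0}, {2.0, 0.0}};
        double[] t = {2.0, 5.0, 1.5};
        double[] delta = {1.0, 0.0, 1.0};
        System.out.println("loss = " + lossAndGradient(params, x, t, delta)[0]);
        System.out.println("max gradient error = "
                + maxGradientError(params, x, t, delta, 1e-6));
    }
}
```

A small finite-difference gap indicates that the analytic expressions agree with the loss, which is a convenient sanity check when re-deriving the formulas.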
param: parameters The parameters, in three parts: the log of the scale parameter, the intercept, and
the regression coefficients corresponding to the features.
param: fitIntercept Whether to fit an intercept term.
param: featuresStd The standard deviation values of the features.
Constructor and Description |
---|
AFTAggregator(breeze.linalg.DenseVector<Object> parameters, boolean fitIntercept, double[] featuresStd) |
Modifier and Type | Method and Description |
---|---|
AFTAggregator | add(org.apache.spark.ml.regression.AFTPoint data): Add a new training data point to this AFTAggregator, and update the loss and gradient of the objective function. |
long | count() |
breeze.linalg.DenseVector<Object> | gradient() |
double | loss() |
AFTAggregator | merge(AFTAggregator other): Merge another AFTAggregator, and update the loss and gradient of the objective function. |
public AFTAggregator(breeze.linalg.DenseVector<Object> parameters, boolean fitIntercept, double[] featuresStd)
public long count()
public double loss()
public breeze.linalg.DenseVector<Object> gradient()
public AFTAggregator add(org.apache.spark.ml.regression.AFTPoint data)
data - The AFTPoint representation for one data point to be added into this aggregator.
public AFTAggregator merge(AFTAggregator other)
Note: this object will be modified.
other - The other AFTAggregator to be merged.
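The add/merge pattern described here can be illustrated with a small standalone sketch: each partition accumulates its own sums, and merging adds another aggregator's sums into this one in place. The class `AftAggregatorSketch` below is hypothetical and is not the Spark implementation. It takes plain arrays instead of AFTPoint and breeze vectors, assumes the parameter layout `{log sigma, intercept, beta_1, ..., beta_p}`, ignores fitIntercept and featuresStd, and reports raw sums for the loss and gradient.

```java
// Hypothetical illustration class; not the Spark AFTAggregator.
final class AftAggregatorSketch {
    private final double[] params;       // {log sigma, intercept, beta_1, ..., beta_p} (assumed)
    private final double[] gradientSum;  // same layout as params
    private double lossSum = 0.0;
    private long count = 0L;

    AftAggregatorSketch(double[] params) {
        this.params = params.clone();
        this.gradientSum = new double[params.length];
    }

    /** Adds one observation (features x, lifetime t, event indicator delta) in place. */
    AftAggregatorSketch add(double[] x, double t, double delta) {
        double sigma = Math.exp(params[0]);
        double linear = params[1];
        for (int j = 0; j < x.length; j++) {
            linear += params[2 + j] * x[j];
        }
        double eps = (Math.log(t) - linear) / sigma;
        double expEps = Math.exp(eps);
        lossSum += delta * params[0] - delta * eps + expEps;
        gradientSum[0] += delta + (delta - expEps) * eps;
        double scaled = (delta - expEps) / sigma;
        gradientSum[1] += scaled;
        for (int j = 0; j < x.length; j++) {
            gradientSum[2 + j] += scaled * x[j];
        }
        count++;
        return this;
    }

    /** Merges another aggregator into this one; only this object is modified. */
    AftAggregatorSketch merge(AftAggregatorSketch other) {
        if (other.count == 0L) {
            return this; // nothing to merge
        }
        lossSum += other.lossSum;
        for (int k = 0; k < gradientSum.length; k++) {
            gradientSum[k] += other.gradientSum[k];
        }
        count += other.count;
        return this;
    }

    long count() { return count; }

    double loss() { return lossSum; }

    double[] gradient() { return gradientSum.clone(); }

    public static void main(String[] args) {
        double[] params = {0.0, 0.0, 0.0};
        // Two "partitions" built separately, then merged into the first.
        AftAggregatorSketch left = new AftAggregatorSketch(params)
                .add(new double[]{2.0}, 1.0, 1.0)
                .add(new double[]{3.0}, 1.0, 0.0);
        AftAggregatorSketch right = new AftAggregatorSketch(params)
                .add(new double[]{4.0}, 1.0, 1.0);
        left.merge(right);
        System.out.println("count = " + left.count() + ", loss = " + left.loss());
    }
}
```

Because the loss and gradient are plain sums over observations, merging per-partition aggregators gives the same totals as a single pass over the joint dataset, up to floating-point rounding.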