Support vector methods for survival analysis: a comparison between ranking and regression approaches
Introduction
Survival studies arise in many areas. Although they are best known from medicine, and in particular from cancer studies, they also occur in economics (e.g. predicting the bankruptcy of factories), mechanics (e.g. failure of airplanes, breakdown of engines), electronics (e.g. lifetime of electrical components), the social sciences (e.g. estimating the time from marriage to divorce) and many other fields. Depending on the question under study, one is interested either in risk groups (which group of patients or components is more likely to experience the event?) or in time predictions (before which time should the engine be replaced to reduce the risk of failure?).
The survival literature describes different models to answer these questions. Many common methods, including the proportional hazards model (Cox model) and the log-odds model, are transformation models (TMs) [1], [2], [3], [4], [5], [6]. These models build a prognostic index from the covariates and, in a second step, link this index to the observed event times by means of a monotonic transformation function. TMs for survival analysis mainly focus on the first step. The standard Cox model [7], for example, avoids the second step by assuming that the hazard (the instantaneous risk of observing the event now, given that it has not occurred before) is proportional to an unspecified baseline hazard. Other models assume a fixed transformation function h. The accelerated failure time model is one example: it assumes that the transformation function h(y), with y the outcome under study, is the logarithm, whereas the proportional odds model takes h(y) = logit(y) [6].
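For reference, the transformation-model view can be written compactly as below. This is a standard textbook formulation added for illustration, not reproduced from the paper: $u(x) = w^\top x$ is the prognostic index for a linear model, and the baseline hazard $\lambda_0$, the cumulative baseline hazard $\Lambda_0$ and the extreme-value error distribution in the last line are assumptions introduced here.

```latex
\begin{aligned}
&\text{Transformation model:} && h(y) = u(x) + \epsilon, \quad u(x) = w^\top x \\
&\text{Accelerated failure time (fixed } h = \log\text{):} && \log y = w^\top x + \epsilon \\
&\text{Cox model (hazard form):} && \lambda(t \mid x) = \lambda_0(t)\,\exp(w^\top x) \\
&\text{Cox model (TM form, } h = \log \Lambda_0 \text{ unknown):} && \log \Lambda_0(y) = -w^\top x + \epsilon, \quad P(\epsilon > s) = \exp(-e^{s})
\end{aligned}
```

The last two lines describe the same model: substituting the error distribution gives $S(t \mid x) = \exp(-\Lambda_0(t)\, e^{w^\top x})$. The minus sign reflects that in the Cox model a larger $w^\top x$ means a higher hazard and hence shorter survival, whereas in the first two lines a larger index means a longer time.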
Survival models based on support vector machines (SVMs) [8] incorporate non-linearities automatically, and with non-additive kernels they also capture interactions between covariates automatically. These methods take an approach that differs from the standard statistical one: SVM-based models do not assume a true underlying function whose parameters need to be estimated. Instead, they minimize the empirical risk of misranking two instances with respect to their failure times [9], so the survival problem is reformulated as a ranking problem. To reduce the computational load, [10] proposed a simplified version that compares each observation only with its closest neighbor instead of with all other observations. A more theoretical framework was provided in [8]; we refer to the survival model proposed in that work as model 1. In this work, we investigate whether adding regression constraints improves performance. To this end, model 1 is compared with model 2, which includes both ranking and regression constraints. The proposed model is also compared with survival methods that include only ranking constraints (see [9], [10], [11]) and only regression constraints (see [12], [13]). Table 1 gives an overview of the models handled in this work, their constraints, the number of tuning parameters in the case of a linear kernel, and how the ranking constraints are defined.
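A minimal Python sketch of the ranking idea, assuming that a larger prognostic index u means a longer expected survival time. A pair (j, i) is treated as comparable when y_j < y_i and the event of j was observed (δ_j = 1); if j were censored, the order of the true event times would be unknown. `nearest_neighbor_pairs` is our assumed reading of the simplification in [10] (each observation is paired only with the closest earlier observed event), and all function names are hypothetical. SVM methods minimize a regularized convex surrogate (a hinge loss) of this misranking count, not the count itself.

```python
import numpy as np

def all_comparable_pairs(y, delta):
    """All (j, i) with y[j] < y[i] and an observed event for j."""
    n = len(y)
    return [(j, i) for j in range(n) if delta[j] == 1
            for i in range(n) if y[i] > y[j]]

def nearest_neighbor_pairs(y, delta):
    """Pair each observation with the closest earlier observed event (assumed reading of [10])."""
    order = np.argsort(y, kind="stable")  # visit observations by increasing time
    pairs = []
    last_event = None  # index of the most recent observed event so far
    for i in order:
        if last_event is not None and y[i] > y[last_event]:
            pairs.append((last_event, int(i)))
        if delta[i] == 1:
            last_event = int(i)
    return pairs

def misranking_risk(u, pairs):
    """Fraction of pairs in which the later failure does not get a strictly larger u."""
    if not pairs:
        return 0.0
    return sum(u[i] <= u[j] for j, i in pairs) / len(pairs)

y = [1.0, 2.0, 3.0, 4.0]
delta = [1, 0, 1, 1]          # observation 1 is right censored
u = [0.1, 0.5, 0.6, 0.4]      # hypothetical prognostic indices
full = all_comparable_pairs(y, delta)
near = nearest_neighbor_pairs(y, delta)
print(len(full), misranking_risk(u, full))
print(near, misranking_risk(u, near))
```

The full set of comparable pairs grows quadratically with the number of observations, while the nearest-neighbor set has at most n - 1 pairs, which is the source of the computational saving.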
This paper is organized as follows. Section 2 gives an overview of transformation models in survival analysis. Section 3 summarizes existing SVM-based survival methods and then introduces a new model proposed by the authors. Section 4 compares the SVM-based survival models on 8 datasets. In addition to the methods mentioned above, the experiments include the Cox model as a reference.
The following notation is used throughout the text. $\mathcal{D} = \{(x_i, y_i, \delta_i)\}_{i=1}^{n}$ denotes the set of observations, where $x_i$ is a d-dimensional covariate vector, $y_i$ is the corresponding survival time and $\delta_i$ indicates whether an event was observed ($\delta_i = 1$) or the observation was right censored ($\delta_i = 0$). For notational convenience, the observations in $\mathcal{D}$ are assumed to be sorted such that for any two observations $(x_i, y_i, \delta_i)$ and $(x_j, y_j, \delta_j)$ with $j < i$, it holds that $y_j < y_i$.
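The notation can be mirrored by a small helper that stores the set of observations as NumPy arrays sorted by survival time. This is a sketch under stated assumptions: `make_dataset` is a hypothetical name, and rejecting tied survival times is our own choice to enforce the strict ordering $y_j < y_i$ for $j < i$ assumed in the text; real data with ties would need an explicit tie-handling rule.

```python
import numpy as np

def make_dataset(X, y, delta):
    """Return (X, y, delta) sorted by increasing survival time y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta, dtype=int)
    if X.ndim != 2 or X.shape[0] != y.shape[0] or y.shape != delta.shape:
        raise ValueError("X must be n x d; y and delta must have length n")
    if not np.isin(delta, (0, 1)).all():
        raise ValueError("delta must be 1 (event) or 0 (right censored)")
    order = np.argsort(y, kind="stable")
    X, y, delta = X[order], y[order], delta[order]
    # Assumed convention: j < i must imply y_j < y_i, so tied times are rejected.
    if np.any(np.diff(y) <= 0):
        raise ValueError("tied survival times violate the strict ordering")
    return X, y, delta

Xs, ys, ds = make_dataset([[0.0], [1.0], [2.0]], [3.0, 1.0, 2.0], [0, 1, 1])
print(ys, ds, Xs.ravel())
```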
Transformation models
A TM models a possibly unknown transformation of the outcome, rather than the outcome itself, as a function of the covariates. TMs were originally introduced for regression problems in which the assumptions of normally distributed errors and constant variance were not satisfied. A standard regression model, for example, models the outcome y as a linear combination of the covariates, $y = w^\top x + \epsilon$, where w is a coefficient vector and ϵ is the error variable. In cases where y is not
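To illustrate modelling a transformed outcome, the sketch below fixes h = log (as in the accelerated failure time model) and fits log(y) = w^T x + ϵ by ordinary least squares with `numpy.linalg.lstsq`. It is a deliberately simple toy under strong assumptions: it uses only uncensored observations and ignores censoring entirely, so it is not a survival method in itself; `fit_log_linear` is a hypothetical name.

```python
import numpy as np

def fit_log_linear(X, y):
    """Least-squares fit of log(y) = X w + eps on uncensored data."""
    X = np.asarray(X, dtype=float)
    h = np.log(np.asarray(y, dtype=float))  # fixed transformation h(y) = log(y)
    w, *_ = np.linalg.lstsq(X, h, rcond=None)
    return w

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
w_true = np.array([0.5, -0.3])
y = np.exp(X @ w_true)        # noise-free toy survival times
print(fit_log_linear(X, y))   # recovers w_true up to rounding
```

Because the toy times are generated without noise and X has full column rank, the fit on the log scale recovers the generating coefficients; a fit on y itself would not, since y is not linear in x.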
Kernel-based survival models
This section starts with a brief discussion of existing survival models based on SVMs. In a second subsection, a new method is proposed. Since the output of this type of survival model cannot, in general, be interpreted as a failure time, we denote it as the prognostic index u(x) rather than as the prediction of the model. For the Cox model this corresponds to the linear predictor $u(x) = w^\top x$.
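In kernel methods the prognostic index is usually written as a kernel expansion over the training points, u(x) = Σ_i α_i k(x_i, x). The sketch below assumes that form, with hypothetical coefficients `alpha` (in practice they would come from solving the SVM optimization problem) and an RBF kernel with an assumed bandwidth `sigma`. With the linear kernel k(a, b) = a^T b the expansion collapses to u(x) = w^T x with w = Σ_i α_i x_i, which is why a linear kernel yields a linear prognostic index.

```python
import numpy as np

def linear_kernel(A, B):
    """k(a, b) = a^T b for all rows a of A and b of B."""
    return A @ B.T

def rbf_kernel(A, B, sigma=1.0):
    """k(a, b) = exp(-||a - b||^2 / (2 sigma^2)); sigma is an assumed bandwidth."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def prognostic_index(alpha, X_train, X_new, kernel=linear_kernel):
    """u(x) = sum_i alpha_i k(x_i, x) for every row x of X_new."""
    X_train = np.asarray(X_train, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    return kernel(X_new, X_train) @ np.asarray(alpha, dtype=float)

X_train = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, 1.0]])
alpha = np.array([0.5, -1.0, 0.25])   # hypothetical expansion coefficients
X_new = np.array([[1.0, 1.0], [2.0, 0.0]])
w = X_train.T @ alpha                 # equivalent weights for the linear kernel
print(prognostic_index(alpha, X_train, X_new), X_new @ w)
print(prognostic_index(alpha, X_train, X_new, kernel=rbf_kernel))
```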
Experiments
This section compares the performance of the discussed methods on 5 clinical and 3 high-dimensional datasets. A description of the data and of the performance measures is given first. Next, the results on real and on artificial data are discussed.
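The snippet does not list the performance measures. A common choice for ranking-based survival models, assumed here, is Harrell's concordance index: the fraction of comparable pairs (y_j < y_i with δ_j = 1) in which the observation that fails later receives the larger prognostic index, with ties in u counted as one half. The sketch assumes that a larger u means longer survival; for a risk score such as the Cox linear predictor one would pass -u instead. When u has no ties, it equals one minus the empirical misranking risk over all comparable pairs.

```python
import numpy as np

def concordance_index(y, delta, u):
    """Harrell-style C-index; a larger u is assumed to mean longer survival."""
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta, dtype=int)
    u = np.asarray(u, dtype=float)
    num, den = 0.0, 0
    for j in range(len(y)):
        if delta[j] != 1:
            continue  # a censored j cannot be the earlier member of a pair
        for i in range(len(y)):
            if y[i] > y[j]:  # comparable pair: j failed first
                den += 1
                if u[i] > u[j]:
                    num += 1.0
                elif u[i] == u[j]:
                    num += 0.5
    return num / den if den else float("nan")

y = [1.0, 2.0, 3.0, 4.0]
delta = [1, 0, 1, 1]
print(concordance_index(y, delta, [0.1, 0.5, 0.6, 0.4]))
print(concordance_index(y, delta, [0.1, 0.5, 0.6, 0.6]))
```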
Conclusions
This work compared different methods for survival analysis based on support vector machines. Three approaches were discussed: (i) the ranking approach, (ii) the regression approach and (iii) the combined approach. On theoretical grounds, the first and third approaches are preferred, since they can be linked to well-known statistical models for survival analysis. However, the experiments revealed that the ranking approach performs significantly worse than the other two approaches.
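The three approaches can be contrasted through a simplified objective for a linear model u(x) = w^T x: a ranking hinge loss over comparable pairs, a regression loss that treats censored observations one-sidedly (a right-censored time is only a lower bound, so only predictions below it are penalized), and a norm penalty. This is a sketch in the spirit of the ranking, regression and combined models, not the authors' exact formulation: the weights `gamma` and `mu`, the unit margin and the absolute-error regression loss are assumptions. Setting `mu = 0` gives a ranking-only objective, `gamma = 0` a regression-only one, and both positive the combined approach.

```python
import numpy as np

def survival_objective(w, X, y, delta, pairs, gamma, mu):
    """Regularized ranking + censored-regression loss for u(x) = w^T x.

    pairs: comparable (j, i) index pairs with y[j] < y[i] and delta[j] == 1.
    gamma, mu: hypothetical weights of the ranking and regression terms.
    """
    w = np.asarray(w, dtype=float)
    u = np.asarray(X, dtype=float) @ w
    # Ranking term: hinge loss asking u[i] - u[j] >= 1 for each comparable pair.
    rank = sum(max(0.0, 1.0 - (u[i] - u[j])) for j, i in pairs)
    # Regression term: events are penalized in both directions; for censored
    # observations the true time is >= y, so only underprediction is penalized.
    reg = 0.0
    for ui, yi, di in zip(u, y, delta):
        reg += abs(yi - ui) if di == 1 else max(0.0, yi - ui)
    return 0.5 * float(w @ w) + gamma * rank + mu * reg

X = [[1.0], [2.0], [3.0]]
y = [1.0, 2.0, 4.0]
delta = [1, 1, 0]                 # the last observation is right censored
pairs = [(0, 1), (0, 2), (1, 2)]  # all comparable pairs for this toy data
w = [0.5]
for gamma, mu in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]:
    print(gamma, mu, survival_objective(w, X, y, delta, pairs, gamma, mu))
```

On this toy example the ranking term is 1.0 and the regression term 4.0, most of which comes from the censored point being predicted well below its censoring time.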
Acknowledgments
This research is supported by Research Council KUL: GOA AMBioRICS, GOA MANET, CoE EF/05/006, IDO 05/010, IOF KP06/11, IOF SCORES4CHEM, several PhD, postdoc and fellow grants; Flemish Government: FWO: PhD and postdoc grants, IBBT, G.0407.02, G.0360.05, G.0519.06, G.0321.06, G.0341.07 and projects G.0452.04, G.0499.04, G.0211.05, G.0226.06, G.0302.07; IWT: PhD Grants, McKnow-E, Eureka-Flite; Belgian Federal Science Policy Office: IUAP P6/04; EU: FP6-2002 LIFESCIHEALTH 503094, IST 2004-27214,
References (29)
Likelihood methods and nonparametric tests. Journal of the American Statistical Association (1978).
Partial likelihood in transformation models with censored data. Scandinavian Journal of Statistics (1988).
On a correspondence between models in binary regression and survival analysis. International Statistical Review (1990).
Analysis of transformation models with censored data. Biometrika (1995).
Predicting survival probabilities with semiparametric transformation models. Journal of the American Statistical Association (1997).
The statistical analysis of failure time data (2002).
Regression models and life-tables (with discussion). Journal of the Royal Statistical Society, Series B (1972).
Learning transformation models for ranking and survival analysis. Journal of Machine Learning Research (2011).
Support vector machines for survival analysis.
Survival SVM: a practical scalable algorithm.
Sparse kernel methods for high-dimensional survival data. Bioinformatics.
A support vector approach to censored targets.
Support vector regression for censored data (SVRc): a novel tool for survival analysis.
Survival and event history analysis.
Wiley reference series in biostatistics. Chapter: Parametric models in survival analysis.