Summary tables for R survival objects with stargazer or xtable

Last year, I was hired to do some statistical analysis for a group of surgeons. As research tends to go, our initial paper submission was rejected and several revisions were suggested. A particularly annoying problem came up, which was how to create nice tables from survival objects - summaries, not regression tables. The survival package overloads summary() rendering xtable and stargazer unable to build a beautiful table from the custom summary function.

I can illustrate the effect here using data from IDRE:

library(survival)
hmohiv<-read.table(
   "http://www.ats.ucla.edu/stat/r/examples/asa/hmohiv.csv", 
   sep=",",
   header = TRUE)
attach(hmohiv)
hmo.surv <- survfit(Surv(time, censor)~ 1)

Calling summary on the survival fit will print out a good deal of information for each time point recorded in your data.

normsum <- summary(hmo.surv)
print(normsum)
## Call: survfit(formula = Surv(time, censor) ~ 1)
## 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     1    100      15   0.8500  0.0357       0.7828        0.923
##     2     83       5   0.7988  0.0402       0.7237        0.882
##     3     73      10   0.6894  0.0473       0.6026        0.789
##     4     61       4   0.6442  0.0493       0.5544        0.748
##     5     56       7   0.5636  0.0517       0.4709        0.675
##     6     49       2   0.5406  0.0521       0.4476        0.653
##     7     46       6   0.4701  0.0526       0.3775        0.586
##     8     39       4   0.4219  0.0525       0.3306        0.538
##     9     35       3   0.3857  0.0520       0.2962        0.502
##    10     32       3   0.3496  0.0511       0.2625        0.466
##    11     28       3   0.3121  0.0500       0.2280        0.427
##    12     25       2   0.2872  0.0490       0.2055        0.401
##    13     21       1   0.2735  0.0486       0.1931        0.387
##    14     20       1   0.2598  0.0480       0.1809        0.373
##    15     19       2   0.2325  0.0467       0.1568        0.345
##    22     16       1   0.2179  0.0460       0.1441        0.330
##    30     14       1   0.2024  0.0453       0.1305        0.314
##    31     13       1   0.1868  0.0444       0.1173        0.298
##    32     12       1   0.1712  0.0433       0.1043        0.281
##    34     11       1   0.1557  0.0421       0.0916        0.264
##    35     10       1   0.1401  0.0407       0.0793        0.247
##    36      9       1   0.1245  0.0390       0.0674        0.230
##    43      8       1   0.1090  0.0371       0.0559        0.212
##    53      7       1   0.0934  0.0349       0.0449        0.194
##    54      6       1   0.0778  0.0324       0.0344        0.176
##    57      4       1   0.0584  0.0296       0.0216        0.157
##    58      3       1   0.0389  0.0253       0.0109        0.139

The survival summary takes additional options, namely a vector specifying the times at which you want to summarize your set.

survsum <- summary(hmo.surv, times = c(1,10,25,58))
print(survsum)
## Call: survfit(formula = Surv(time, censor) ~ 1)
## 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     1    100      15   0.8500  0.0357       0.7828        0.923
##    10     32      44   0.3496  0.0511       0.2625        0.466
##    25     14      10   0.2179  0.0460       0.1441        0.330
##    58      3      11   0.0389  0.0253       0.0109        0.139

Stargazer doesn’t like it (and neither does xtable - it won’t even let me compile the Rmd document).

stargazer(survsum)
## 
## % Error: Unrecognized object type.

So, I made a convenience function that turns your survfit object into a data frame that can be passed to a table generator. Feel free to modify it to create the columns that you need.

#' Create a nice data frame from a survival model for automatically generating a table. 
#'
#' @param survobj \code{\link[survival]{survfit}} object
#' @param timevec a vector of times for survival summary
#' @param scale a number that scales the timevec, e.g., if your data is recorded in days and you want months displayed, use scale = 30
#' @return a data frame with labeled columns 
#' @author Aaron Gonzales
#' \url{http://binaryaaron.github.io/}
#' @export

make_surv_table <- function(survobj, 
                            timevec,
                            scale = 1,
                            col_names = NULL
                            ){
   # vector of times for summary points, defaults to six month intervals
   if(is.null(timevec)){
      print('timevec invalid; defaulting to six-month intervals by days')
      timevec <- c(0, 180, 360, 540, 720)
   }
   # here, scale converts the numbers into months
   survsum <- summary(survobj, timevec, scale)
   # make a data frame out of the summary object
   df <- data.frame(survsum$time, 
              survsum$surv, 
              1 - survsum$surv, 
              survsum$std.err, 
              cumsum(survsum$n.event),
              survsum$n.risk 
              )
   # give that data frame names
   if(is.null(col_names)){
      colnames(df) <- c('Time',
                        'Survival',
                        'Deaths',
                        'Survival SE', 
                        'Cummulative Events',
                        'Number remaining')
   } else{
      colnames(df) <- col_names
   }
   # return the frame
   return(df)
}

And here it is in action:

summary_days <- c(0, 10, 25, 58)

stargazer(make_surv_table(hmo.surv, summary_days), 
         summary = FALSE,
         type = c('html'),
         title = 'Summary of HMOHIV survival fit',
         rownames = FALSE)
Summary of HMOHIV survival fit
Time Survival Deaths Survival SE Cummulative Events Number remaining
0 1 0 0 0 100
10 0.350 0.650 0.051 59 32
25 0.218 0.782 0.046 69 14
58 0.039 0.961 0.025 80 3

The HTML format leaves a hair to be desired, but they look great in latex. Obviously, change the type = 'html' to latex or text as needed.

Hopefully someone else can make use of this as well.