Cox 比例ハザードモデルの overall goodness-of-fit

May & Hosmer (1998) の方法

ロジスティック回帰モデルの Hosmer–Lemeshow 検定に近い発想の goodness-of-fit test です。データを $G$ 個のグループに分割し、各グループ内で観測値と期待値との乖離を検定します。ただし、ロジスティック回帰モデルのように全体の要約はできません。そのため、全体の要約を行う場合は、グループ変数を共変量として追加したモデルの部分尤度比検定などを行う必要があります。また、May & Hosmer (2004) シミュレーションから、経験的に良いグループ数

\[ G = [\max\{2, \min\{10, (\text{No. of events})/40\}\}], \]

が与えられています ([] はガウス記号)。

例として、SAS Help の Example 64.3 Modeling with Categorical Predictors のデータ VALung を使って、May & Hosmer (1998) の方法と全体を要約した部分尤度比検定を行ってみます。

ページ下に作成したマクロ (%mhoverallfit) を記載しました。このマクロを読み込んで、変数名などを適切に指定して実行すると、各種検定結果が出力されます。


  %mhoverallfit(
    out         = ofit,
    data        = VALung,
    class       = Prior(ref='no') Cell(ref='large') Therapy(ref='standard'),
    time        = Time,
    censor      = Status,
    censorvalue = 0,
    covariates  = Kps Duration Age Cell Prior|Therapy
  );

出力です。


  group   cutPoint   count   observed   expected       z         p       chisq    df    pchisq

    1     -2.58523     46       40       41.5125   -0.23475   0.81440    .         .    .
    2     -1.75900     46       44       45.5327   -0.22714   0.82032    .         .    .
    3     -0.06271     45       44       40.9548    0.47584   0.63419    .         .    .
    .       .           .        .         .         .         .        1.60106    2   0.44909

group = 1, 2, 3 の p が各グループに対する結果です。 pchisq が全体を要約した部分尤度比検定の結果です。

$R^2$ の類似指標

回帰分析でいう $R^2$ の類似指標も提案されています ^[4–6]。

Nagelkerke (1991)

\[ R_{N}^2=1- \left\{ \exp\left[ -\frac{2}{n}(l_{\hat{\mathbf{\beta}}}-l_0) \right ] \right \}. \]

O'Quigley, Xu, Stare (2005)

\[ R_{OXS}^2=1- \left\{ \exp\left[ -\frac{2}{m}(l_{\hat{\mathbf{\beta}}}-l_0) \right ] \right \}. \]

Royston (2006)

\[ R_{R}^2=\frac{R_{OXS}^2}{R_{OXS}^2+\frac{\pi^2}{6}(1-R_{OXS}^2)}. \]

ただし、$n$ はサンプルサイズ、$m$ はイベント数、$l_0$ は共変量なしの最大対数部分尤度、$l_{\beta}$ は共変量ありの最大対数部分尤度。

例として、上と同様のデータ VALung に対して上の指標を計算してみます。今度は、ページ下部に示したマクロ (%survR2) を使います。形式は %mhoverallfit と全く同様です。


  %survR2(
    out         = R2,
    data        = VALung,
    class       = Prior(ref='no') Cell(ref='large') Therapy(ref='standard'),
    time        = Time,
    censor      = Status,
    censorvalue = 0,
    covariates  = Kps Duration Age Cell Prior|Therapy
  );

出力です。


    chisq        Rn        Roxs        Rr

  65.6323    0.38064    0.40115    0.28939

Rn, Roxs, Rr が各指標に対応しています。

%mhoverallfit マクロ


  /***********************************
  %mhoverallfit macro

  Version: 1.0-1
  Published: 2012-12-07
  Depends: SAS/STAT 9.2 and probably later
  Author: Kengo Nagashima
  License: MIT
  ***********************************/

  %macro mhoverallfit(out, data, class, time, censor, censorvalue, covariates);
  ods results off;
  ods exclude all;
  proc phreg data = &data.;
    %if "&class." ^= "" %then class &class.;;
    model &time.*&censor.(&censorvalue.) = &covariates.;
    output out = &out._tmp xbeta = xb resmart = martres;
    ods output CensoredSummary = CS;
    ods output FitStatistics = L1;
  run;
  ods select all;
  ods results on;
  proc sort data = &out._tmp out = &out._tmp;
    by xb;
  data _null_;
    set CS;
    call symput("G", int(max(2, min(10, Event / 40))));
    call symput("N", Total);
  data &out._tmp;
    set &out._tmp;
    if _n_ <= (&N. / &G.) + 1 then group = 1;
    %do i = 2 %to &G. - 1;
    else if (&i. - 1)*(&N. / &G.) < _n_ <= &i.*(&N. / &G.) + 1 then group = &i.;
    %end;
    else group = &G.;
  data &out.;
    set &out._tmp;
    by group;
    retain cutPoint count observed expected z p 0;
    count    = count + 1;
    if &censor. ^= &censorvalue. then do;
      observed = observed + 1;
      expected = expected + (1 - martres);
    end;
    else do;
      expected = expected - martres;
    end;
    if last.group then do;
      cutPoint = xb;
      z = (observed - expected)/sqrt(expected);
      p = 2*cdf("Normal", -abs(z));
      output;
      count = 0;
      observed = 0;
      expected = 0;
    end;
    keep group cutPoint count observed expected z p;
  ods results off;
  ods exclude all;
  proc phreg data = &out._tmp;
    %if "&class." ^= "" %then class &class.;;
    class group;
    model &time.*&censor.(&censorvalue.) = group &covariates.;
    ods output FitStatistics = L2;
  run;
  ods select all;
  ods results on;
  data L2;
    merge
      L1(obs = 1 rename = (WithCovariates = L1))
      L2(obs = 1 rename = (WithCovariates = L2));
    chisq = L1 - L2;
    df = &G. - 1;
    pchisq = 1 - cdf("chisq", chisq, df);
    keep chisq df pchisq;
  data &out.;
    set &out. L2;
  proc print;
  run;
  proc datasets lib = work;
    delete CS L1 L2 &out._tmp;
  run; quit;
  %mend mhoverallfit;

%survR2 マクロ


  /***********************************
  %survR2 macro

  Version: 1.0-1
  Published: 2012-12-07
  Depends: SAS/STAT 9.2 and probably later
  Author: Kengo Nagashima
  License: MIT
  ***********************************/

  %macro survR2(out, data, class, time, censor, censorvalue, covariates);
  ods results off;
  ods exclude all;
  proc phreg data = &data.;
    %if "&class." ^= "" %then class &class.;;
    model &time.*&censor.(&censorvalue.) = &covariates.;
    ods output CensoredSummary = CS;
    ods output FitStatistics = &out.;
  run;
  ods select all;
  ods results on;
  data &out.;
    set &out.(obs = 1);
    if _n_ = 1 then set CS;
    chisq = WithoutCovariates - WithCovariates;
    Rn   = 1 - exp(-chisq / Total);
    Roxs = 1 - exp(-chisq / Event);
    Rr   = Roxs / (Roxs + constant("PI")**2/6 * (1 - Roxs));
    keep chisq Rn Roxs Rr;
  proc print; run;
  proc datasets lib = work;
    delete CS L0;
  run; quit;
  %mend survR2;

参考文献

Hosmer DW, Lemeshow S, May S. Applied survival analysis: regression modeling of time to event data, 2nd edition. New York: Wiley, 2008.
May S, Hosmer DW. A simplified method of calculating an overall goodness-of-fit test for the Cox proportional hazards model. Lifetime data analysis 1998;4(2):109–120.
May S, Hosmer DW. A cautionary note on the use of the Grønnesby and Borgan goodness-of-fit test for the Cox proportional hazards model. Lifetime data analysis 2004;10(3):283–291.
Nagelkerke NJD. A note on a general definition of the coefficient of determination. Biometrika 1991;78(3):691–692.
O'Quigley J, Xu R, Stare J. Explained randomness in proportional hazards models. Statistics in medicine 2005;24(3):479–489.
Royston P. Explained variation for survival models. The Stata Journal 2006;6(1):83–96.

履歴

2012/12/07 マクロの修正
2012/11/28 公開