Zarathu
Outcome | Default | Repeated Measure | Survey |
---|---|---|---|
Continuous | Linear regression | GEE | Survey GLM |
Event | GLM (logistic) | GEE | Survey GLM |
Time & Event | Cox | Marginal Cox | Survey Cox |
0, 1, 2, 3 (rare event) | GLM (Poisson) | GEE | Survey GLM |
Continuous
\[Y = \beta_0 + \beta_1 X + \epsilon\] Estimate \(\beta_0\) and \(\beta_1\) by minimizing the sum of squared errors
\(Y\) must be continuous & normally distributed
\(X\) can be continuous or categorical
If continuous: same as correlation analysis
If binary: same as t-test with equal variance
Use the Bivariate Correlation function to test linear association between two continuous variables.
There is a weak negative, yet statistically significant, correlation between age and nodes.
This regression estimates how age affects nodes (number of affected lymph nodes).
Go to Analyze → Regression → Linear…
Set nodes as the Dependent variable and age as the Independent(s) variable.
R = .093, R² = .009: Very weak linear relationship
Unstandardized coefficient for age = -0.028, p < .001
As age increases by 1 year, the number of nodes is expected to decrease by 0.028. The relationship is statistically significant but weak.
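The same analysis can be sketched in R. This is a minimal sketch, assuming the colon dataset from the survival package (it contains the same age, nodes, time, status, sex, and rx variables); restricting to death records (etype == 2) is an assumption, since the slides do not say which rows are used.

```r
library(survival)                    # provides the colon dataset
dat <- subset(colon, etype == 2)     # one (death) record per patient -- assumption

cor.test(~ age + nodes, data = dat)  # bivariate correlation between age and nodes
fit <- lm(nodes ~ age, data = dat)   # simple linear regression: nodes predicted by age
summary(fit)                         # slope for age and R-squared as reported above
```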
This reverses the regression direction, predicting age from nodes.
Go to Analyze → Regression → Linear…
Set age as the Dependent variable and nodes as the Independent(s) variable.
Although both models are statistically significant (p < .001), the effect size (R² ≈ 0.009) is very small. This indicates that only about 0.9% of the variance in either variable is explained by the other.
Linear regression and the t-test are equivalent when comparing 2 groups
To compare the variable time between male and female groups using a t-test in SPSS:
We begin with a t-test because it is a simple, standard method for comparing group means when the predictor has only two levels. It provides the same result as regression but is more intuitive in this context.
This analysis fits a linear regression model where time is predicted by sex.
Go to Analyze → Regression → Linear…
Set time as the Dependent variable and sex as the Independent(s) variable.
The regression result is consistent with the t-test:
There is no statistically significant association between sex and time.
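A minimal R sketch of the same comparison (same colon-data assumption as above); with equal variances assumed, the t-test and the one-predictor regression give the same p-value.

```r
library(survival)
dat <- subset(colon, etype == 2)

t.test(time ~ sex, data = dat, var.equal = TRUE)  # compare mean time between the two sex groups
summary(lm(time ~ sex, data = dat))               # same p-value for the sex coefficient
```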
The variable rx includes three treatment groups:
- 1 = Lev
- 2 = Lev+5FU
- 3 = Obs (reference)
In SPSS, linear regression does not automatically treat string variables as categorical. To include a categorical variable like rx in a regression model, we must first convert it to numeric.

Use Automatic Recode if your original rx is a string variable. This assigns numeric values starting from 1: 1 = Lev, 2 = Lev+5FU, 3 = Obs.

If rx is already a numeric variable, SPSS automatically chooses the lowest numeric value as the reference category.
Now that rx_new is numeric, SPSS will automatically create dummy variables when it is used in regression, BUT it will use the lowest number as the reference group (e.g., 1 = Lev). To change the reference group (e.g., set 3 = Obs), go to Categorical… and select it manually.
Go to Transform → Compute Variable… To include a 3-group variable in regression, create two dummy variables:
- rx_lev = 1 if rx_new == 1, else 0
- rx_lev5fu = 1 if rx_new == 2, else 0

The reference group (the rows where all dummy variables are 0) is Obs, which is not coded explicitly.
Use Analyze → Regression → Linear
- Dependent: time
- Independent(s): rx_lev, rx_lev5fu

The constant is the mean for the Obs group. Coefficients show the difference from Obs:
- rx_lev = difference between Lev and Obs
- rx_lev5fu = difference between Lev+5FU and Obs

Only rx_lev5fu shows a significant difference (p < .001)
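The same dummy coding and regression can be sketched in R (colon-data assumption as above; rx is already a factor there, so the manual dummies are shown only to mirror the SPSS steps).

```r
library(survival)
dat <- subset(colon, etype == 2)

dat$rx_lev    <- as.integer(dat$rx == "Lev")      # 1 if Lev, else 0
dat$rx_lev5fu <- as.integer(dat$rx == "Lev+5FU")  # 1 if Lev+5FU, else 0

fit <- lm(time ~ rx_lev + rx_lev5fu, data = dat)
summary(fit)  # intercept = mean time for Obs; coefficients = differences from Obs
```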
Use Analyze → Compare Means → One-Way ANOVA to test whether any of the rx_new groups differ overall.
- Dependent = time
- Factor = rx_new

This gives a single overall p-value for the group comparison.
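The equivalent overall test in R (one-way ANOVA on the treatment factor):

```r
library(survival)
dat <- subset(colon, etype == 2)

anova(lm(time ~ rx, data = dat))  # one overall F-test across the three treatment groups
```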
\[Y = \beta_0 + \beta_1 X_{1} + \beta_2 X_{2} + \cdots + \epsilon\]
In practice, it is common to present results both before and after adjustment.
Unadjusted: analyze each variable one at a time.
- Go to Analyze → Regression → Linear
- Set time as Dependent
- Add one predictor (e.g., sex) to Independent(s)
Adjusted: control for multiple variables.
- Set time as Dependent
- Add all predictors (e.g., sex, age, rx) to Independent(s)
This allows comparison of crude vs. adjusted effects.
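A minimal R sketch of the crude vs. adjusted comparison (same colon-data assumption as above):

```r
library(survival)
dat <- subset(colon, etype == 2)

summary(lm(time ~ sex, data = dat))             # unadjusted (crude) effect of sex
summary(lm(time ~ sex + age + rx, data = dat))  # effect of sex adjusted for age and rx
```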
Used for Binary Outcomes: 0/1
\[ P(Y = 1) = \frac{\exp{(X)}}{1 + \exp{(X)}}\]
\[ \begin{aligned} P(Y = 1) &= \frac{\exp{(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots)}}{1 + \exp{(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots)}} \\\\ \ln(\frac{p}{1-p}) &= \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots \end{aligned} \]
Interpretation of \(\beta_1\): When adjusting for \(X_2\), \(X_3\), etc., a one-unit increase in \(X_1\) results in an increase of \(\beta_1\) in \(\ln\left(\frac{p}{1 - p}\right)\) (the log-odds of the outcome).
\(\frac{p}{1-p}\) increases by a factor of \(\exp(\beta_1)\). In other words, the odds ratio = \(\exp(\beta_1)\).
Go to Analyze → Regression → Binary Logistic…
- Move status to the Dependent box; move sex, age, and rx_new into the Covariates box.
- Click Categorical…, select rx_new, and move it to Categorical Covariates.
- Set Reference Category to Last (e.g., Obs).

Exp(B) gives odds ratios
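The corresponding logistic model in R (same colon-data assumption); exponentiating the coefficients gives the odds ratios reported as Exp(B) in SPSS.

```r
library(survival)
dat <- subset(colon, etype == 2)

fit <- glm(status ~ sex + age + rx, family = binomial, data = dat)
summary(fit)
exp(coef(fit))  # odds ratios; the reference level for rx is Obs
```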
Time & Event
Most data are right-censored: each individual either died on day XX or is only known to have survived up to day XX.
In survival analysis, Table 1 typically presents baseline characteristics.
Go to Analyze → Survival → Kaplan-Meier…
Set time as Time, status(1) as Status, rx_new as Factor
Click Options → check the desired statistics and plots
Click Compare Factor… → check the desired test
Click OK to see the results
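A minimal sketch of the same Kaplan-Meier analysis in R (colon-data assumption as above):

```r
library(survival)
dat <- subset(colon, etype == 2)

km <- survfit(Surv(time, status) ~ rx, data = dat)  # Kaplan-Meier estimate per treatment group
summary(km)                                         # survival table at each event time
plot(km, col = 1:3, xlab = "Days", ylab = "Survival probability")
```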
Sort by time in ascending order
\[ \begin{aligned} P(t) &= \frac{\text{Survived at } t}{\text{At risk at } t} \quad \text{(Interval survival)} \\\\ S(t) &= S(t-1) \times P(t) \end{aligned} \]
Compute the observed and expected number of events in each interval, then combine the results across intervals into a single chi-squared test.
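The corresponding log-rank test in R:

```r
library(survival)
dat <- subset(colon, etype == 2)

survdiff(Surv(time, status) ~ rx, data = dat)  # observed vs. expected events combined into one chi-squared test
```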
Hazard function: \(h(t)\)
Cox model: evaluates the Hazard Ratio (HR)
\[ \begin{aligned} h(t) &= \exp({\beta_0(t) + \beta_1 X_1 + \beta_2 X_2 + \cdots}) \\\\ &= h_0(t) \exp({\beta_1 X_1 + \beta_2 X_2 + \cdots}) \end{aligned} \] When \(X_1\) increases by 1, \(h(t)\) increases by a factor of \(\exp(\beta_1)\). In other words:
\[\text{HR} = \exp{(\beta_1)}\]
As in Kaplan-Meier, the statistics are calculated interval by interval.
Time-independent HR: time is captured only by \(h_0(t)\).
Model is simple: HR remains constant over time
Time-dependent Cox models are also possible
\(h_0(t)\) is not estimated, which simplifies computation
This is why Cox is called a semi-parametric method
But it’s a limitation when building prediction models — you need to estimate \(h_0(t)\) separately
Go to Analyze → Survival → Cox Regression…
- Set Time = time, Status = status
- Move sex, age, rx_new to Covariates
- Click Categorical…, add rx_new, set Reference Category to Last, click Change

Exp(B) = hazard ratio
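The equivalent Cox model in R (colon-data assumption as above); exp(coef) corresponds to Exp(B), the hazard ratio.

```r
library(survival)
dat <- subset(colon, etype == 2)

fit <- coxph(Surv(time, status) ~ sex + age + rx, data = dat)
summary(fit)  # exp(coef) column = hazard ratios; reference level for rx is Obs
```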
Proportional Hazards Assumption
- Assumes a consistent trend: survival curves should not cross
- No formal test for the assumption is strictly required; it can be checked visually
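The assumption can also be checked with Schoenfeld residuals in R (a sketch refitting the same model as in the previous block):

```r
library(survival)
dat <- subset(colon, etype == 2)
fit <- coxph(Surv(time, status) ~ sex + age + rx, data = dat)

zph <- cox.zph(fit)  # Schoenfeld residual test, per covariate and global
zph                  # small p-values suggest non-proportional hazards
plot(zph)            # roughly flat smoothed curves support the assumption
```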
Example (R output): a Cox model in which the effect of karno is allowed to differ across time periods, i.e., a time-dependent coefficient:

Call:
coxph(formula = Surv(tstart, time, status) ~ trt + prior + karno:strata(tgroup),
data = vet2)
n= 225, number of events= 128
coef exp(coef) se(coef) z Pr(>|z|)
trt -0.011025 0.989035 0.189062 -0.058 0.953
prior -0.006107 0.993912 0.020355 -0.300 0.764
karno:strata(tgroup)tgroup=1 -0.048755 0.952414 0.006222 -7.836 4.64e-15 ***
karno:strata(tgroup)tgroup=2 0.008050 1.008083 0.012823 0.628 0.530
karno:strata(tgroup)tgroup=3 -0.008349 0.991686 0.014620 -0.571 0.568
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
trt 0.9890 1.011 0.6828 1.4327
prior 0.9939 1.006 0.9550 1.0344
karno:strata(tgroup)tgroup=1 0.9524 1.050 0.9409 0.9641
karno:strata(tgroup)tgroup=2 1.0081 0.992 0.9831 1.0337
karno:strata(tgroup)tgroup=3 0.9917 1.008 0.9637 1.0205
Concordance= 0.725 (se = 0.024 )
Likelihood ratio test= 63.04 on 5 df, p=3e-12
Wald test = 63.7 on 5 df, p=2e-12
Score (logrank) test = 71.33 on 5 df, p=5e-14
In survival analysis, all covariates should be measured before the index date; values that arise during follow-up (e.g., follow-up labs, medication changes) are time-dependent covariates.
To handle time-dependent covariates, a Cox model that accounts for them is required
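The output above matches the veteran lung cancer example from the survival package vignette on time-dependent covariates; below is a sketch of how such a split dataset can be built (the 90- and 180-day cut points are an assumption based on that vignette):

```r
library(survival)

# Split each subject's follow-up into time intervals so the karno effect
# can take a different coefficient in each period (tgroup)
vet2 <- survSplit(Surv(time, status) ~ ., data = veteran,
                  cut = c(90, 180), episode = "tgroup", id = "id")

coxph(Surv(tstart, time, status) ~ trt + prior + karno:strata(tgroup), data = vet2)
```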
Outcome | Default | Repeated Measure | Survey |
---|---|---|---|
Continuous | Linear regression | GEE | Survey GLM |
Event | GLM (logistic) | GEE | Survey GLM |
Time & Event | Cox | Marginal Cox | Survey Cox |
0, 1, 2, 3 (rare event) | GLM (Poisson) | GEE | Survey GLM |