vignettes/time_to_event_mediator.Rmd
time_to_event_mediator.RmdWe follow the framework of Arce Domingo-Relloso et al.
(2024), who extend the mediational g-formula to accommodate longitudinal
mediators and competing events in a single causal framework. In studies
where both the mediator and competing risks evolve over time,
traditional mediation estimators may fail to correctly identify
path-specific effects. The approach implemented in
cmest_pathcomprisk overcomes this challenge by modeling the
structure—exposure, mediator history, competing event history, and
survival process—and then evaluating the interventions through the
g-formula.
This example illustrates how to use pathcomprisk to
estimate path-specific causal effects in settings where:
The method follows the framework of Arce Domingo-Relloso et
al. (2024), which extends the mediational g-formula to accommodate
longitudinal mediators and competing events simultaneously. When both
the mediator and the competing event process evolve over time, standard
mediation estimators may not correctly separate direct and indirect
pathways. The approach implemented in cmest_pathcomprisk
overcomes this limitation by modeling the full joint longitudinal
structure—exposure, mediator history, competing event history, and
survival process—and applying the g-formula to evaluate counterfactual
interventions.
Under this framework, the total effect of an exposure on a time-to-event outcome is decomposed into four interpretable components.
Direct effect (DE)
The portion of the exposure effect not operating through either the
mediator trajectory or the competing event process. This effect is
defined by modifying the exposure while holding both mediator and
competing event histories at levels that would occur under a reference
exposure value.
Indirect effect through the mediator (IEM)
The part of the effect transmitted through the entire history of the
time-varying mediator, while fixing the competing event process to its
reference distribution. This represents how changes in the mediator
pathway contribute to the exposure–outcome association.
Indirect effect through the competing event (IED)
The portion of the effect transmitted through the competing-risk
process. Because a competing event can alter the risk set for the
primary outcome, this pathway measures how differences in the occurrence
or timing of the competing event contribute to the exposure
effect.
Total effect (TE)
The overall effect of the exposure on the outcome when the mediator
pathway, the competing event pathway, and any direct mechanisms are all
allowed to vary naturally.
Under the identification assumptions presented in Arce
Domingo-Relloso et al. (2024), each path-specific effect can be
expressed using the longitudinal mediational g-formula. The
cmest_pathcomprisk function implements these estimands by
specifying regression models for the exposure, mediator, competing event
process, and time-to-event outcome; generating counterfactual
trajectories under fixed exposure values; and computing the
corresponding g-formula risk contrasts.
In the remainder of this example, we construct a simplified dataset
with a mediator measured at three time points, a competing event
indicator also measured at three time points, and a time-to-event
outcome. We then demonstrate how to estimate the four path-specific
effects using pathcomprisk.
We consider a baseline exposure , a vector of baseline covariates , a longitudinal mediator , a competing event process and a time-to-event outcome observed over discrete follow-up intervals . Let denote the survival indicator up to each interval, where if the individual is alive and event-free at time .
We define counterfactual outcomes under possibly contrary-to-fact interventions on the exposure, the mediator history, and the survival process, following the path-specific framework of Domingo-Relloso et al. (2024). For two exposure levels (reference) and $a^\*$ (higher level), the quantities of interest at time are effects on the risk difference scale:
Direct effect (DE): effect of changing from to $a^\*$ while holding both the mediator and the survival process at the levels they would take under exposure : $$ DE_{k+1}(a^\*, a) = \mathbb{E}\big\{Y_{k+1}(a^\*, \bar M_k(a,\bar S_{k-1}(a)), \bar S_k(a,\bar M_k(a)))\big\} - \mathbb{E}\big\{Y_{k+1}(a, \bar M_k(a,\bar S_{k-1}(a)), \bar S_k(a,\bar M_k(a)))\big\}. $$
Indirect effect through the longitudinal mediator (IEM): effect that operates by changing the entire mediator trajectory from what it would be under to what it would be under $A=a^\*$, while forcing the survival process to evolve as under : $$ IEM_{k+1}(a^\*, a) = \mathbb{E}\big\{Y_{k+1}(a^\*, \bar M_k(a^\*,\bar S_{k-1}(a)), \bar S_k(a,\bar M_k(a^\*)))\big\} - \mathbb{E}\big\{Y_{k+1}(a^\*, \bar M_k(a,\bar S_{k-1}(a)), \bar S_k(a,\bar M_k(a)))\big\}. $$
Indirect effect through the competing event (IED): effect that operates by changing the survival and competing risk process from what it would be under to what it would be under $A=a^\*$, while keeping the mediator trajectory at the level it would take under $A=a^\*$: $$ IED_{k+1}(a^\*, a) = \mathbb{E}\big\{Y_{k+1}(a^\*, \bar M_k(a^\*,\bar S_{k-1}(a^\*)), \bar S_k(a^\*,\bar M_k(a^\*)))\big\} - \mathbb{E}\big\{Y_{k+1}(a^\*, \bar M_k(a^\*,\bar S_{k-1}(a)), \bar S_k(a,\bar M_k(a^\*)))\big\}. $$
Total effect (TE): overall effect of changing the exposure from to $a^\*$, allowing both the mediator and the competing risk process to evolve under their natural levels for each exposure: $$ TE_{k+1}(a^\*, a) = \mathbb{E}\big\{Y_{k+1}(a^\*, \bar M_k(a^\*,\bar S_{k-1}(a^\*)), \bar S_k(a^\*,\bar M_k(a^\*)))\big\} - \mathbb{E}\big\{Y_{k+1}(a, \bar M_k(a,\bar S_{k-1}(a)), \bar S_k(a,\bar M_k(a)))\big\}. $$
By construction, the three path-specific effects sum to the total effect: $$ TE_{k+1}(a^\*, a) = DE_{k+1}(a^\*, a) + IEM_{k+1}(a^\*, a) + IED_{k+1}(a^\*, a). $$
In practice, the function cmest_pathcomprisk() reports
these effects at a user-specified time horizon $t^\*$ (for example, 10 years of follow-up),
expressed as differences in cumulative risk (or in cumulative incidence
derived from the additive hazards model). It also returns the proportion
of the total effect explained by the paths through the mediator and
competing risk, $$
Q(t^\*) = \frac{IEM(t^\*) + IED(t^\*)}{TE(t^\*)} \times 100,
$$ interpreted as the percentage of the total effect operating
through the mediator trajectory (including the nested pathway through
the competing event).
Under standard identifiability conditions for longitudinal mediation with competing risks (exchangeability, positivity, consistency and no interference), the counterfactual risks above are identified by the mediational g-formula. For generic arguments indicating the exposure level carried through the mediator, survival and outcome components, the g-formula identifies as a functional of the observed data distribution. The path-specific effects can then be written as contrasts of , for example $$ DE_{k+1}(a^\*, a) = \phi(a,a,a^\*) - \phi(a,a,a), \quad IEM_{k+1}(a^\*, a) = \phi(a^\*,a,a^\*) - \phi(a,a,a^\*), $$ $$ IED_{k+1}(a^\*, a) = \phi(a^\*,a^\*,a^\*) - \phi(a^\*,a,a^\*), \quad TE_{k+1}(a^\*, a) = \phi(a^\*,a^\*,a^\*) - \phi(a,a,a). $$
The function pathcomprisk() implements a parametric
mediational g-formula algorithm tailored to time-varying mediators and
competing risks:
Choose two exposure levels and $a^\*$ (for example, the 25th and 75th percentiles of the exposure distribution) and a time horizon $t^\*$.
Fit sequential regression models for the longitudinal mediator and the competing event at each visit , typically using linear or logistic models of the form where denotes the observed history up to visit .
Transform the data into counting-process format and fit an
additive hazards model for the time-to-event outcome including the full
mediator history, exposure and baseline covariates, for example using
the timereg package:
Using the fitted models, simulate counterfactual trajectories of , and under each of the required intervention regimes:
Average the simulated outcomes over individuals to obtain empirical estimates of the counterfactual risks at $t^\*$ for each regime. Plug these estimates into the contrasts above to obtain $\widehat{DE}(t^\*)$, $\widehat{IEM}(t^\*)$, $\widehat{IED}(t^\*)$ and $\widehat{TE}(t^\*)$, and compute $$ \widehat{Q}(t^\*) = \frac{\widehat{IEM}(t^\*) + \widehat{IED}(t^\*)}{\widehat{TE}(t^\*)} \times 100. $$
Optionally, repeat the entire procedure over bootstrap resamples to obtain confidence intervals for each effect and for the mediated proportion $Q(t^\*)$.
This algorithm mirrors the multistate mediator framework implemented
in cmest_multistate, but extends it to the path-specific
decomposition with a longitudinal mediator nested within the competing
risk process, as described in Domingo-Relloso et al. (2024).
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
cmdag(
outcome = "Y",
exposure = "A",
mediator = c("M1", "M2", "M3"), # mediator measured at 3 time points
basec = "L0", # baseline confounder
postc = c("D1", "D2", "D3"), # competing risk indicator at 3 time points
node = TRUE,
text_col = "white"
)
##
## Attaching package: 'ggdag'
## The following object is masked from 'package:stats':
##
## filter
library(dagitty)
library(ggplot2)
library(tibble)
## 1. Manually specify coordinates (ggdag manual style)
coords <- tribble(
~name, ~x, ~y,
"L0", 0, 0,
"A", 1, 0,
"M1", 2, 0,
"M2", 3, 0,
"M3", 4, 0,
"Y", 5, 0,
"D1", 2, -1,
"D2", 3, -1,
"D3", 4, -1
)
## 2. Define the DAG structure
dag_longi <- dagify(
# outcome
Y ~ A + L0 + M1 + M2 + M3 + D1 + D2 + D3,
# mediator process
M1 ~ A + L0,
M2 ~ A + L0 + M1,
M3 ~ A + L0 + M2,
# competing risk process
D1 ~ A + L0,
D2 ~ A + L0 + D1 + M1,
D3 ~ A + L0 + D2 + M2,
# exposure model
A ~ L0,
exposure = "A",
outcome = "Y",
coords = coords
)
## 4. Plot following the ggdag manual style
ggplot(dag_longi, aes(x = x, y = y, xend = xend, yend = yend)) +
geom_dag_edges() +
geom_dag_node() +
geom_dag_text(aes(label = name), vjust = -0.8) +
theme_dag()
pathcomprisk function
To easily implement this estimation procedure, this feature is
available via the pathcomprisk() function. It automates the
mediational g-formula for time-varying mediators and competing risks on
the hazard scale.
Here is a basic example of how to use the pathcomprisk()
function for time-to-event mediation analysis with competing risks:
library(CMAverse)
library(survival)
library(timereg)
library(data.table)
# Fit mediator models (at each time point)
M1 <- lm(M_1 ~ E + C, data = mydata)
M2 <- lm(M_2 ~ E + M_1 + C, data = mydata)
mreg_list <- list(M1, M2)
# Fit competing risk models
L1 <- glm(D1 ~ E + M_1 + C, data = mydata, family = binomial())
L2 <- glm(D2 ~ E + M_2 + C, data = mydata, family = binomial())
D_list <- list(L1, L2)
# Fit outcome model (Aalen additive hazards model)
Y <- timereg::aalen(
Surv(tstart, tstop, event) ~ const(E) + const(M) + const(C),
data = df_counted,
resample.iid = 1
)
# Define exposure levels
a <- 1 # Active exposure value
astar <- 0 # Reference exposure value
# Estimate path-specific effects
results <- pathcomprisk(
D = D_list, # Competing risk models
mreg = mreg_list, # Mediator models
mvar = c("M_1", "M_2"), # Mediator names
yreg = Y, # Outcome model
avar = "E", # Exposure name
a = a, # Active exposure
astar = astar, # Reference exposure
data = mydata,
time_points = c("1", "2"), # Time points for mediators
peryr = 100000, # Person-years divisor
nboot = 1000, # Bootstrap samples
refit = TRUE # Exact match bootstrapping
)
# View results
summary(results)We simulate the data using a simulated dataset created by Yuchen
Zhang. The exact matching mode utilizes an inner bootstrap loop via the
refit = TRUE logic to adequately capture variability.
library(CMAverse) # or library(cmcrhazard) / library(cmmed)
library(survival)
library(timereg)
library(data.table)
load("simData.Y10D10.Population.RData")
set.seed(2)
# Subset the data for demonstration
simData <- simData[sample(1:nrow(simData), size = 10000, replace = FALSE), ]
simData <- as.data.table(simData)
simData[, M_1 := M1]
simData[, M_2 := M2]
# Counting-process helpers the outcome model needs
simData[, `:=`(tstart = 0, tstop = time_to_event,
time.since.first.exam_1 = 0,
time.since.first.exam_2 = time_to_event)]
# Unique ID for tmerge / reshape
simData[, id_unique := .I]
simData <- as.data.frame(simData)
# ---- Mediator & Competing risks models (fit on the baseline sample as priors) ----
M1 <- lm(M_1 ~ E + C, data = simData)
M2 <- lm(M_2 ~ E + M_1 + C, data = simData)
mreg_list <- list(M1, M2)
L1 <- glm(D1 ~ E + M_1 + C, data = simData, family = binomial())
L2 <- glm(D2 ~ E + M_2 + C, data = simData, family = binomial())
D_list <- list(L1, L2)
# ---- Transformation to counting-process data for the outcome model ----
df <- survival::tmerge(
data1 = simData,
data2 = simData,
id = id_unique,
endpt = event(time_to_event, eventHappened)
)
df_tv <- reshape(
as.data.frame(simData),
direction = "long",
varying = list(M = c("M_1", "M_2"), time = c("time.since.first.exam_1", "time.since.first.exam_2")),
v.names = c("M", "time.since.first.exam"),
times = c(1, 2),
idvar = "id_unique"
)
df_tv <- df_tv[order(df_tv$id_unique, df_tv$time), ]
df <- survival::tmerge(df, df_tv, id = id_unique, M = tdc(time.since.first.exam, M))
Y <- timereg::aalen(
Surv(tstart, tstop, endpt) ~ const(E) + const(M) + const(C),
data = df, resample.iid = 1, n.sim = 0
)
# ---- Call the unified analysis function ----
a <- as.numeric(quantile(simData$E, 0.25, na.rm = TRUE))
a_star <- as.numeric(quantile(simData$E, 0.75, na.rm = TRUE))
results <- cmest_pathcomprisk(
D = D_list,
mreg = mreg_list,
mvar = c("M_1", "M_2"),
yreg = Y,
avar = "E",
a = a,
astar = a_star,
data = simData,
time_points = c("1", "2"),
peryr = 100000,
nboot = 1000,
refit = TRUE, # Use Exact Match bootstrapping mode
yreg_time = "time_to_event",
yreg_event = "eventHappened"
)We can directly inspect the calculated effects:
results$DE
[1] "116.04 (50.61, 184.85)"
$IEM
[1] "31.52 (-26.96, 88.81)"
$IED
[1] "-0.36 (-0.67, 0)"
$TE
[1] "147.6 (100.05, 194.07)"
$Q
[1] 21.35
The exposure had a large direct effect on the outcome (DE = 116.0), with more modest and uncertain mediation through the longitudinal mediator (IEM = 31.5) and a negligible competing-risk pathway (IED = -0.36). The total effect (TE = 147.6) reflects that most of the exposure’s influence operates directly rather than through mediated biological or competing-risk pathways.