We follow the framework of Arce Domingo-Relloso et al. (2024), who extend the mediational g-formula to accommodate longitudinal mediators and competing events in a single causal framework. In studies where both the mediator and competing risks evolve over time, traditional mediation estimators may fail to correctly identify path-specific effects. The approach implemented in cmest_pathcomprisk overcomes this challenge by modeling the structure—exposure, mediator history, competing event history, and survival process—and then evaluating the interventions through the g-formula.

This example illustrates how to use pathcomprisk to estimate path-specific causal effects in settings where:

  • the mediator is time-varying,
  • the outcome is a time-to-event variable, and
  • competing risks may occur prior to the event of interest.

The method follows the framework of Arce Domingo-Relloso et al. (2024), which extends the mediational g-formula to accommodate longitudinal mediators and competing events simultaneously. When both the mediator and the competing event process evolve over time, standard mediation estimators may not correctly separate direct and indirect pathways. The approach implemented in cmest_pathcomprisk overcomes this limitation by modeling the full joint longitudinal structure—exposure, mediator history, competing event history, and survival process—and applying the g-formula to evaluate counterfactual interventions.

Path-specific effects

Under this framework, the total effect of an exposure on a time-to-event outcome is decomposed into four interpretable components.

  1. Direct effect (DE)
    The portion of the exposure effect not operating through either the mediator trajectory or the competing event process. This effect is defined by modifying the exposure while holding both mediator and competing event histories at levels that would occur under a reference exposure value.

  2. Indirect effect through the mediator (IEM)
    The part of the effect transmitted through the entire history of the time-varying mediator, while fixing the competing event process to its reference distribution. This represents how changes in the mediator pathway contribute to the exposure–outcome association.

  3. Indirect effect through the competing event (IED)
    The portion of the effect transmitted through the competing-risk process. Because a competing event can alter the risk set for the primary outcome, this pathway measures how differences in the occurrence or timing of the competing event contribute to the exposure effect.

  4. Total effect (TE)
    The overall effect of the exposure on the outcome when the mediator pathway, the competing event pathway, and any direct mechanisms are all allowed to vary naturally.

Under the identification assumptions presented in Arce Domingo-Relloso et al. (2024), each path-specific effect can be expressed using the longitudinal mediational g-formula. The cmest_pathcomprisk function implements these estimands by specifying regression models for the exposure, mediator, competing event process, and time-to-event outcome; generating counterfactual trajectories under fixed exposure values; and computing the corresponding g-formula risk contrasts.

In the remainder of this example, we construct a simplified dataset with a mediator measured at three time points, a competing event indicator also measured at three time points, and a time-to-event outcome. We then demonstrate how to estimate the four path-specific effects using pathcomprisk.

Estimands

We consider a baseline exposure AA, a vector of baseline covariates L0L_0, a longitudinal mediator MK=(M1,,MK)\bar M_K = (M_1,\dots,M_K), a competing event process DK=(D1,,DK)\bar D_K = (D_1,\dots,D_K) and a time-to-event outcome YY observed over discrete follow-up intervals k=1,,Kk=1,\dots,K. Let SK=(S1,,SK)\bar S_K = (S_1,\dots,S_K) denote the survival indicator up to each interval, where Sk=1S_k = 1 if the individual is alive and event-free at time kk.

We define counterfactual outcomes under possibly contrary-to-fact interventions on the exposure, the mediator history, and the survival process, following the path-specific framework of Domingo-Relloso et al. (2024). For two exposure levels aa (reference) and $a^\*$ (higher level), the quantities of interest at time k+1k+1 are effects on the risk difference scale:

  • Direct effect (DE): effect of changing AA from aa to $a^\*$ while holding both the mediator and the survival process at the levels they would take under exposure aa: $$ DE_{k+1}(a^\*, a) = \mathbb{E}\big\{Y_{k+1}(a^\*, \bar M_k(a,\bar S_{k-1}(a)), \bar S_k(a,\bar M_k(a)))\big\} - \mathbb{E}\big\{Y_{k+1}(a, \bar M_k(a,\bar S_{k-1}(a)), \bar S_k(a,\bar M_k(a)))\big\}. $$

  • Indirect effect through the longitudinal mediator (IEM): effect that operates by changing the entire mediator trajectory from what it would be under A=aA=a to what it would be under $A=a^\*$, while forcing the survival process to evolve as under A=aA=a: $$ IEM_{k+1}(a^\*, a) = \mathbb{E}\big\{Y_{k+1}(a^\*, \bar M_k(a^\*,\bar S_{k-1}(a)), \bar S_k(a,\bar M_k(a^\*)))\big\} - \mathbb{E}\big\{Y_{k+1}(a^\*, \bar M_k(a,\bar S_{k-1}(a)), \bar S_k(a,\bar M_k(a)))\big\}. $$

  • Indirect effect through the competing event (IED): effect that operates by changing the survival and competing risk process from what it would be under A=aA=a to what it would be under $A=a^\*$, while keeping the mediator trajectory at the level it would take under $A=a^\*$: $$ IED_{k+1}(a^\*, a) = \mathbb{E}\big\{Y_{k+1}(a^\*, \bar M_k(a^\*,\bar S_{k-1}(a^\*)), \bar S_k(a^\*,\bar M_k(a^\*)))\big\} - \mathbb{E}\big\{Y_{k+1}(a^\*, \bar M_k(a^\*,\bar S_{k-1}(a)), \bar S_k(a,\bar M_k(a^\*)))\big\}. $$

  • Total effect (TE): overall effect of changing the exposure from aa to $a^\*$, allowing both the mediator and the competing risk process to evolve under their natural levels for each exposure: $$ TE_{k+1}(a^\*, a) = \mathbb{E}\big\{Y_{k+1}(a^\*, \bar M_k(a^\*,\bar S_{k-1}(a^\*)), \bar S_k(a^\*,\bar M_k(a^\*)))\big\} - \mathbb{E}\big\{Y_{k+1}(a, \bar M_k(a,\bar S_{k-1}(a)), \bar S_k(a,\bar M_k(a)))\big\}. $$

By construction, the three path-specific effects sum to the total effect: $$ TE_{k+1}(a^\*, a) = DE_{k+1}(a^\*, a) + IEM_{k+1}(a^\*, a) + IED_{k+1}(a^\*, a). $$

In practice, the function cmest_pathcomprisk() reports these effects at a user-specified time horizon $t^\*$ (for example, 10 years of follow-up), expressed as differences in cumulative risk (or in cumulative incidence derived from the additive hazards model). It also returns the proportion of the total effect explained by the paths through the mediator and competing risk, $$ Q(t^\*) = \frac{IEM(t^\*) + IED(t^\*)}{TE(t^\*)} \times 100, $$ interpreted as the percentage of the total effect operating through the mediator trajectory (including the nested pathway through the competing event).

Estimation

Under standard identifiability conditions for longitudinal mediation with competing risks (exchangeability, positivity, consistency and no interference), the counterfactual risks above are identified by the mediational g-formula. For generic arguments (aM,aS,aY)(a_M, a_S, a_Y) indicating the exposure level carried through the mediator, survival and outcome components, the g-formula identifies ϕ(aM,aS,aY)=𝔼{Yk+1(aY,Mk(aM,Sk1(aS)),Sk(aS,Mk(aM)))} \phi(a_M, a_S, a_Y) = \mathbb{E}\{Y_{k+1}(a_Y, \bar M_k(a_M,\bar S_{k-1}(a_S)), \bar S_k(a_S,\bar M_k(a_M)))\} as a functional of the observed data distribution. The path-specific effects can then be written as contrasts of ϕ\phi, for example $$ DE_{k+1}(a^\*, a) = \phi(a,a,a^\*) - \phi(a,a,a), \quad IEM_{k+1}(a^\*, a) = \phi(a^\*,a,a^\*) - \phi(a,a,a^\*), $$ $$ IED_{k+1}(a^\*, a) = \phi(a^\*,a^\*,a^\*) - \phi(a^\*,a,a^\*), \quad TE_{k+1}(a^\*, a) = \phi(a^\*,a^\*,a^\*) - \phi(a,a,a). $$

The function pathcomprisk() implements a parametric mediational g-formula algorithm tailored to time-varying mediators and competing risks:

  1. Choose two exposure levels aa and $a^\*$ (for example, the 25th and 75th percentiles of the exposure distribution) and a time horizon $t^\*$.

  2. Fit sequential regression models for the longitudinal mediator and the competing event at each visit k=1,,Kk = 1,\dots,K, typically using linear or logistic models of the form Mkk1fM(MkMk1,A,L0),DkkfD(DkMk,A,L0), M_k \mid \mathcal{H}_{k-1} \sim f_M(M_k \mid M_{k-1}, A, L_0), \quad D_k \mid \mathcal{H}_k \sim f_D(D_k \mid M_k, A, L_0), where k1\mathcal{H}_{k-1} denotes the observed history up to visit k1k-1.

  3. Transform the data into counting-process format and fit an additive hazards model for the time-to-event outcome including the full mediator history, exposure and baseline covariates, for example using the timereg package: λY(tA,M(t),L0)=β0(t)+βA(t)A+βM(t)M(t)+βL(t)L0. \lambda_Y(t \mid A, \bar M(t), L_0) = \beta_0(t) + \beta_A(t) A + \beta_M(t)^\top \bar M(t) + \beta_L(t)^\top L_0.

  4. Using the fitted models, simulate counterfactual trajectories of MK\bar M_K, DK\bar D_K and SK\bar S_K under each of the required intervention regimes:

    • mediator and survival under exposure aa,
    • mediator under $a^\*$ with survival under aa,
    • mediator and survival under $a^\*$, and, if needed, hybrid histories that mix these components. For each regime, predict the counterfactual outcome process Y(t)Y(t) up to $t^\*$, either on the rate difference scale (via the additive hazards coefficients) or on the survival probability scale (via the model-based survival function).
  5. Average the simulated outcomes over individuals to obtain empirical estimates of the counterfactual risks at $t^\*$ for each regime. Plug these estimates into the contrasts above to obtain $\widehat{DE}(t^\*)$, $\widehat{IEM}(t^\*)$, $\widehat{IED}(t^\*)$ and $\widehat{TE}(t^\*)$, and compute $$ \widehat{Q}(t^\*) = \frac{\widehat{IEM}(t^\*) + \widehat{IED}(t^\*)}{\widehat{TE}(t^\*)} \times 100. $$

  6. Optionally, repeat the entire procedure over bootstrap resamples to obtain confidence intervals for each effect and for the mediated proportion $Q(t^\*)$.

This algorithm mirrors the multistate mediator framework implemented in cmest_multistate, but extends it to the path-specific decomposition with a longitudinal mediator nested within the competing risk process, as described in Domingo-Relloso et al. (2024).

DAG representation


## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
cmdag(
  outcome  = "Y",
  exposure = "A",
  mediator = c("M1", "M2", "M3"),     # mediator measured at 3 time points
  basec    = "L0",                    # baseline confounder
  postc    = c("D1", "D2", "D3"),     # competing risk indicator at 3 time points
  node     = TRUE,
  text_col = "white"
)

## 
## Attaching package: 'ggdag'
## The following object is masked from 'package:stats':
## 
##     filter
library(dagitty)
library(ggplot2)
library(tibble)

## 1. Manually specify coordinates (ggdag manual style)
coords <- tribble(
  ~name, ~x, ~y,
  "L0",  0,   0,
  "A",   1,   0,
  "M1",  2,   0,
  "M2",  3,   0,
  "M3",  4,   0,
  "Y",   5,   0,
  "D1",  2,  -1,
  "D2",  3,  -1,
  "D3",  4,  -1
)

## 2. Define the DAG structure
dag_longi <- dagify(
  # outcome
  Y  ~ A + L0 + M1 + M2 + M3 + D1 + D2 + D3,

  # mediator process
  M1 ~ A + L0,
  M2 ~ A + L0 + M1,
  M3 ~ A + L0 + M2,

  # competing risk process
  D1 ~ A + L0,
  D2 ~ A + L0 + D1 + M1,
  D3 ~ A + L0 + D2 + M2,

  # exposure model
  A  ~ L0,

  exposure = "A",
  outcome  = "Y",
  coords   = coords
)



## 4. Plot following the ggdag manual style
ggplot(dag_longi, aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_dag_edges() +
  geom_dag_node() +
  geom_dag_text(aes(label = name), vjust = -0.8) +
  theme_dag()

pathcomprisk function

To easily implement this estimation procedure, this feature is available via the pathcomprisk() function. It automates the mediational g-formula for time-varying mediators and competing risks on the hazard scale.

Basic Usage Example

Here is a basic example of how to use the pathcomprisk() function for time-to-event mediation analysis with competing risks:

library(CMAverse)
library(survival)
library(timereg)
library(data.table)

# Fit mediator models (at each time point)
M1 <- lm(M_1 ~ E + C, data = mydata)
M2 <- lm(M_2 ~ E + M_1 + C, data = mydata)
mreg_list <- list(M1, M2)

# Fit competing risk models
L1 <- glm(D1 ~ E + M_1 + C, data = mydata, family = binomial())
L2 <- glm(D2 ~ E + M_2 + C, data = mydata, family = binomial())
D_list <- list(L1, L2)

# Fit outcome model (Aalen additive hazards model)
Y <- timereg::aalen(
  Surv(tstart, tstop, event) ~ const(E) + const(M) + const(C),
  data = df_counted,
  resample.iid = 1
)

# Define exposure levels
a <- 1        # Active exposure value
astar <- 0    # Reference exposure value

# Estimate path-specific effects
results <- pathcomprisk(
  D = D_list,                    # Competing risk models
  mreg = mreg_list,              # Mediator models
  mvar = c("M_1", "M_2"),       # Mediator names
  yreg = Y,                      # Outcome model
  avar = "E",                    # Exposure name
  a = a,                         # Active exposure
  astar = astar,                 # Reference exposure
  data = mydata,
  time_points = c("1", "2"),    # Time points for mediators
  peryr = 100000,                # Person-years divisor
  nboot = 1000,                  # Bootstrap samples
  refit = TRUE                   # Exact match bootstrapping
)

# View results
summary(results)

Simulated data example

We simulate the data using a simulated dataset created by Yuchen Zhang. The exact matching mode utilizes an inner bootstrap loop via the refit = TRUE logic to adequately capture variability.

library(CMAverse) # or library(cmcrhazard) / library(cmmed)
library(survival)
library(timereg)
library(data.table)

load("simData.Y10D10.Population.RData")

set.seed(2)  
# Subset the data for demonstration
simData <- simData[sample(1:nrow(simData), size = 10000, replace = FALSE), ]
simData <- as.data.table(simData)
simData[, M_1 := M1]
simData[, M_2 := M2]

# Counting-process helpers the outcome model needs
simData[, `:=`(tstart = 0, tstop = time_to_event,
               time.since.first.exam_1 = 0,
               time.since.first.exam_2 = time_to_event)]

# Unique ID for tmerge / reshape
simData[, id_unique := .I]
simData <- as.data.frame(simData)

# ---- Mediator & Competing risks models (fit on the baseline sample as priors) ----
M1 <- lm(M_1 ~ E + C, data = simData)
M2 <- lm(M_2 ~ E + M_1 + C, data = simData)
mreg_list <- list(M1, M2)

L1 <- glm(D1 ~ E + M_1 + C, data = simData, family = binomial())
L2 <- glm(D2 ~ E + M_2 + C, data = simData, family = binomial())
D_list <- list(L1, L2)

# ---- Transformation to counting-process data for the outcome model ----
df <- survival::tmerge(
  data1 = simData,
  data2 = simData,
  id    = id_unique,
  endpt = event(time_to_event, eventHappened)
)

df_tv <- reshape(
  as.data.frame(simData),
  direction = "long",
  varying   = list(M = c("M_1", "M_2"), time = c("time.since.first.exam_1", "time.since.first.exam_2")),
  v.names = c("M", "time.since.first.exam"),
  times   = c(1, 2),
  idvar   = "id_unique"
)
df_tv <- df_tv[order(df_tv$id_unique, df_tv$time), ]
df <- survival::tmerge(df, df_tv, id = id_unique, M = tdc(time.since.first.exam, M))

Y <- timereg::aalen(
  Surv(tstart, tstop, endpt) ~ const(E) + const(M) + const(C),
  data = df, resample.iid = 1, n.sim = 0
)

# ---- Call the unified analysis function ----
a <- as.numeric(quantile(simData$E, 0.25, na.rm = TRUE))
a_star <- as.numeric(quantile(simData$E, 0.75, na.rm = TRUE))

results <- cmest_pathcomprisk(
  D = D_list, 
  mreg = mreg_list, 
  mvar = c("M_1", "M_2"), 
  yreg = Y, 
  avar = "E", 
  a = a,
  astar = a_star,
  data = simData,
  time_points = c("1", "2"), 
  peryr = 100000, 
  nboot = 1000,
  refit = TRUE,                 # Use Exact Match bootstrapping mode 
  yreg_time = "time_to_event",  
  yreg_event = "eventHappened"
)

We can directly inspect the calculated effects:

results
$DE
[1] "116.04 (50.61, 184.85)"

$IEM
[1] "31.52 (-26.96, 88.81)"

$IED
[1] "-0.36 (-0.67, 0)"

$TE
[1] "147.6 (100.05, 194.07)"

$Q
[1] 21.35 

The exposure had a large direct effect on the outcome (DE = 116.0), with more modest and uncertain mediation through the longitudinal mediator (IEM = 31.5) and a negligible competing-risk pathway (IED = -0.36). The total effect (TE = 147.6) reflects that most of the exposure’s influence operates directly rather than through mediated biological or competing-risk pathways.