Causal Mediation Analysis with the Regression-Based Approach

cmest_rb implements the regression-based approach of Valeri & VanderWeele (2013) and VanderWeele & Vansteelandt (2014) for causal mediation analysis with a single exposure, a single outcome, and one or more mediators.

cmest_rb(
  data = NULL,
  outcome = NULL,
  event = NULL,
  exposure = NULL,
  mediator = NULL,
  EMint = NULL,
  basec = NULL,
  yreg = NULL,
  mreg = NULL,
  estimation = "imputation",
  inference = "bootstrap",
  astar = NULL,
  a = NULL,
  mval = NULL,
  basecval = NULL,
  yval = NULL,
  nboot = 200,
  boot.ci.type = "per",
  casecontrol = FALSE,
  yrare = NULL,
  yprevalence = NULL,
  multimp = FALSE,
  args_mice = NULL
)

Arguments

data: a data frame (or object coercible by as.data.frame to a data frame) containing the variables in the model.
outcome: variable name of the outcome.
event: (required when yreg is coxph, aft_exp, or aft_weibull) variable name of the event.
exposure: variable name of the exposure.
mediator: a vector of variable name(s) of mediator(s).
EMint: a logical value. TRUE indicates there is exposure-mediator interaction in yreg.
basec: a vector of variable names of confounders. See Details.
yreg: outcome regression model. See Details.
mreg: a list of mediator regression models following the order in mediator. See Details.
estimation: estimation method. paramfunc and imputation are implemented (the first 4 letters are sufficient). Default is imputation. See Details.
inference: inference method. delta and bootstrap are implemented (the first 4 letters are sufficient). Default is bootstrap. See Details.
astar: the control value of the exposure.
a: the treatment value of the exposure.
mval: a list of values at which each mediator is controlled to calculate the cde, following the order in mediator.
basecval: (used when estimation is paramfunc and EMint is TRUE) a list of values on which each confounder is conditioned, following the order in basec. If NULL, the mean of each confounder is used.
yval: (used when the outcome is categorical) the level of the outcome at which causal effects on the risk ratio scale are estimated. If NULL, the last level is used.
nboot: (used when inference is bootstrap) the number of bootstraps applied. Default is 200.
boot.ci.type: (used when inference is bootstrap) the type of bootstrap confidence interval. If per, percentile bootstrap confidence intervals are estimated; if bca, bias-corrected and accelerated (BCa) bootstrap confidence intervals are estimated. Default is per.
casecontrol: a logical value. TRUE indicates a case-control study in which the first level of the outcome is treated as the control and the second level is treated as the case. Default is FALSE.
yrare: (used when casecontrol is TRUE) a logical value. TRUE indicates the case is rare.
yprevalence: (used when casecontrol is TRUE) the prevalence of the case.
multimp: a logical value. If TRUE, conduct multiple imputations using the mice function. Default is FALSE.
args_mice: a list of additional arguments passed to the mice function. See mice for details.
x: an object of class cmest.
object: an object of class cmest.
digits: minimal number of significant digits. See print.default.

Value

An object of classes cmest and cmest_rb is returned:

call: the function call,
data: the data frame,
methods: a list of methods used which may include estimation, inference, nboot, boot.ci.type, casecontrol, yrare, and yprevalence,
variables: a list of variables used which may include outcome, event, exposure, mediator, EMint, and basec,
ref: reference values used which may include astar, a, mval, basecval and yval,
reg.input: a list of regressions input,
reg.output: a list of regressions output. If multimp is TRUE, reg.output contains the regression models fitted on each imputed dataset,
multimp: a list of arguments used for multiple imputation,
effect.pe: point estimates of causal effects,
effect.se: standard errors of causal effects,
effect.ci.low: lower limits of the 95% confidence intervals of causal effects,
effect.ci.high: upper limits of the 95% confidence intervals of causal effects,
effect.pval: p-values of causal effects,

...

Details

Assumptions of the regression-based approach

There is no unmeasured exposure-outcome confounding: given basec, exposure is independent of outcome.
There is no unmeasured mediator-outcome confounding: given exposure and basec, mediator is independent of outcome.
There is no unmeasured exposure-mediator confounding: given basec, exposure is independent of mediator.
There is no mediator-outcome confounder affected by the exposure: there is no variable in basec affected by exposure.
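
In counterfactual notation, the four assumptions above are commonly written as follows. This is a sketch in the notation used by VanderWeele & Vansteelandt (2014), not text taken from the package. The symbols are assumptions introduced here: $Y_{am}$ is the potential outcome under exposure level $a$ and mediator value $m$, $M_{a}$ is the potential mediator under exposure level $a$, $A$ and $M$ are the observed exposure and mediator, and $C$ stands for the variables in basec.

```latex
\begin{align*}
Y_{am} &\perp A \mid C              && \text{(no unmeasured exposure-outcome confounding)} \\
Y_{am} &\perp M \mid \{A, C\}       && \text{(no unmeasured mediator-outcome confounding)} \\
M_{a}  &\perp A \mid C              && \text{(no unmeasured exposure-mediator confounding)} \\
Y_{am} &\perp M_{a^{*}} \mid C      && \text{(no mediator-outcome confounder affected by exposure)}
\end{align*}
```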

Regression models

Each regression model in yreg and mreg can be specified by a fitted regression object or the character name of a regression model.

The Character Name of a Regression Model:

linear: linear regression fitted by glm with family = gaussian()
logistic: logistic regression fitted by glm with family = binomial()
loglinear: loglinear regression fitted by glm with family = poisson()
poisson: poisson regression fitted by glm with family = poisson()
quasipoisson: quasipoisson regression fitted by glm with family = quasipoisson()
negbin: negative binomial regression fitted by glm.nb
multinomial: multinomial regression fitted by multinom
ordinal: ordered logistic regression fitted by polr
coxph: cox proportional hazard model fitted by coxph
aft_exp: accelerated failure time model fitted by survreg with dist = "exponential"
aft_weibull: accelerated failure time model fitted by survreg with dist = "weibull"

coxph, aft_exp and aft_weibull are currently not implemented for mreg.

If EMint is TRUE and yreg is specified by the character name of a regression model, yreg is fitted with the interaction between the exposure and each mediator.
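
As a rough illustration of what the character names stand for, the base R sketch below fits by hand the kind of models that yreg = "linear" and mreg = list("logistic") describe, with and without an exposure-mediator product term. It does not call CMAverse. The data are simulated, the variable names (Y, A, M, C1) are hypothetical, and the exact formula the package builds internally is an assumption here.

```r
# Simulated data with hypothetical variable names; this is not cma2020.
set.seed(1)
n <- 500
dat <- data.frame(C1 = rnorm(n), A = rbinom(n, 1, 0.5))
dat$M <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$A + 0.3 * dat$C1))
dat$Y <- 1 + 0.5 * dat$A + 0.7 * dat$M + 0.4 * dat$A * dat$M +
  0.2 * dat$C1 + rnorm(n)

# Outcome model with an exposure-mediator interaction (the EMint = TRUE case).
yfit_int <- glm(Y ~ A * M + C1, family = gaussian(), data = dat)

# Outcome model without the interaction (the EMint = FALSE case).
yfit_noint <- glm(Y ~ A + M + C1, family = gaussian(), data = dat)

# Mediator model: logistic regression of the mediator on exposure and confounder.
mfit <- glm(M ~ A + C1, family = binomial(), data = dat)

# The interaction model carries one extra coefficient, named "A:M".
names(coef(yfit_int))
names(coef(yfit_noint))
```

Objects fitted this way are of the kind that can also be supplied directly as fitted regression objects.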

A Fitted Regression Object:

Regression objects can be fitted by lm, glm, glm.nb, gam, multinom, polr, coxph and survreg.
Regression objects fitted by coxph and survreg are currently not supported for mreg.
yreg should regress outcome on exposure, mediator and basec.
For p = 1, ..., k, mreg[[p]] should regress mediator[p] on exposure and basec, where k is the number of mediators.
yreg cannot include mediator-mediator interactions when there are multiple mediators (VanderWeele & Vansteelandt, 2014).

Estimation Methods

paramfunc: (only available for a single mediator) closed-form parameter function estimation by Valeri & VanderWeele (2013). Each causal effect is estimated by a closed-form formula of regression coefficients.
imputation: direct counterfactual imputation estimation by Imai et al. (2010). Each causal effect is estimated by imputing counterfactuals directly.

To use paramfunc, yreg and mreg must be specified by the character name of a regression model. yreg can be chosen from linear, logistic, loglinear, poisson, quasipoisson, negbin, coxph, aft_exp and aft_weibull. mreg can be chosen from linear, logistic and multinomial.

To use paramfunc with yreg = "logistic" or yreg = "coxph", the outcome must be rare.
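
To make "a closed-form formula of regression coefficients" concrete, the sketch below works through the simplest case, a linear outcome model with an exposure-mediator interaction and a linear mediator model, using the formulas given by Valeri & VanderWeele (2013). It uses base R only and simulated data with hypothetical variable names. It is an illustration of the idea, not the package's implementation, and the confounder value cval is set to the sample mean as an assumption.

```r
# Simulated data with hypothetical variable names; this is not cma2020.
set.seed(2)
n <- 2000
dat <- data.frame(C1 = rnorm(n), A = rbinom(n, 1, 0.5))
dat$M <- 0.5 + 1.0 * dat$A + 0.3 * dat$C1 + rnorm(n)
dat$Y <- 1 + 0.5 * dat$A + 0.7 * dat$M + 0.4 * dat$A * dat$M +
  0.2 * dat$C1 + rnorm(n)

# Outcome and mediator regressions.
yfit <- lm(Y ~ A * M + C1, data = dat)
mfit <- lm(M ~ A + C1, data = dat)
theta <- coef(yfit)
beta <- coef(mfit)

# Reference values: control and treatment exposure levels, the mediator value
# for the cde, and the confounder value the effects are conditional on.
astar <- 0
a <- 1
mval <- 0
cval <- mean(dat$C1)

# Expected mediator under the control exposure level, at C1 = cval.
m_astar <- beta[["(Intercept)"]] + beta[["A"]] * astar + beta[["C1"]] * cval

# Closed-form effects on the difference scale.
cde  <- (theta[["A"]] + theta[["A:M"]] * mval) * (a - astar)
pnde <- (theta[["A"]] + theta[["A:M"]] * m_astar) * (a - astar)
tnie <- (theta[["M"]] + theta[["A:M"]] * a) * beta[["A"]] * (a - astar)
te   <- pnde + tnie
pm   <- tnie / te

round(c(cde = cde, pnde = pnde, tnie = tnie, te = te, pm = pm), 3)
```

Because every effect is a smooth function of the regression coefficients, standard errors can be obtained by the delta method, which is why delta inference is tied to this estimation method.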

Inference Methods

delta: (only available when estimation = "paramfunc") inferences about causal effects are obtained by the delta method.
bootstrap: inferences about causal effects are obtained by bootstrapping.
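
The sketch below shows, in base R, how imputation-type point estimates can be paired with percentile bootstrap intervals. It is a simplified illustration under stated assumptions, not the package's code: the data are simulated, the variable names are hypothetical, impute_effects is a hypothetical helper, the outcome model has no exposure-mediator interaction, and each counterfactual mediator is drawn only once per subject.

```r
# Simulated data with hypothetical variable names; this is not cma2020.
set.seed(3)
n <- 1000
dat <- data.frame(C1 = rnorm(n), A = rbinom(n, 1, 0.5))
dat$M <- rbinom(n, 1, plogis(-0.5 + 1.0 * dat$A + 0.3 * dat$C1))
dat$Y <- 1 + 0.5 * dat$A + 0.8 * dat$M + 0.2 * dat$C1 + rnorm(n)

# Hypothetical helper: estimate pnde, tnie and te by imputing counterfactuals.
impute_effects <- function(d, astar = 0, a = 1) {
  yfit <- lm(Y ~ A + M + C1, data = d)
  mfit <- glm(M ~ A + C1, family = binomial(), data = d)
  # Draw a counterfactual mediator for every subject with exposure set to aval.
  draw_m <- function(aval) {
    p <- predict(mfit, newdata = transform(d, A = aval), type = "response")
    rbinom(nrow(d), 1, p)
  }
  m_astar <- draw_m(astar)
  m_a <- draw_m(a)
  # Mean predicted outcome with exposure set to aval and mediator set to mvec.
  ey <- function(aval, mvec) {
    mean(predict(yfit, newdata = transform(d, A = aval, M = mvec)))
  }
  pnde <- ey(a, m_astar) - ey(astar, m_astar)
  tnie <- ey(a, m_a) - ey(a, m_astar)
  c(pnde = pnde, tnie = tnie, te = pnde + tnie)
}

# Point estimates on the original data.
est <- impute_effects(dat)

# Nonparametric bootstrap: resample rows, re-estimate, take percentile limits.
nboot <- 200
boot <- replicate(nboot, impute_effects(dat[sample(nrow(dat), replace = TRUE), ]))
ci <- apply(boot, 1, quantile, probs = c(0.025, 0.975))

est
ci
```

A larger nboot reduces Monte Carlo error in the interval limits at the cost of run time.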

Estimated Causal Effects

For a continuous outcome, causal effects on the difference scale are estimated. For a categorical, count or survival outcome, causal effects on the ratio scale are estimated. Depending on the outcome type, the ratio can be risk ratio for a categorical outcome, rate ratio for a count outcome, hazard ratio for a survival outcome fitted by coxph, mean survival ratio for a survival outcome fitted by survreg, etc.

When EMint is FALSE, two-way decomposition (Valeri & VanderWeele, 2013) is conducted, i.e.,

for a continuous outcome: cde (controlled direct effect), pnde (pure natural direct effect), tnde (total natural direct effect), pnie (pure natural indirect effect), tnie (total natural indirect effect), te (total effect), and pm (proportion mediated) are estimated.
for a categorical, count or survival outcome: Rcde (cde ratio), Rpnde (pnde ratio), Rtnde (tnde ratio), Rpnie (pnie ratio), Rtnie (tnie ratio), Rte (te ratio), and pm are estimated.

When EMint is TRUE, an additional four-way decomposition (VanderWeele, 2014) is conducted, i.e.,

for a continuous outcome: intref (reference interaction), intmed (mediated interaction), cde(prop) (proportion cde), intref(prop) (proportion intref), intmed(prop) (proportion intmed), pnie(prop) (proportion pnie), int (proportion attributable to interaction), and pe (proportion eliminated) are estimated.
for a categorical, count or survival outcome: ERcde (excess ratio due to cde), ERintref (excess ratio due to intref), ERintmed (excess ratio due to intmed), ERpnie (excess ratio due to pnie), ERcde(prop) (proportion ERcde), ERintref(prop) (proportion ERintref), ERintmed(prop) (proportion ERintmed), ERpnie(prop) (proportion ERpnie), int, and pe are estimated.

When EMint is TRUE and estimation is paramfunc, causal effects conditional on basecval are estimated. Otherwise, marginal causal effects are estimated.
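
The quantities listed above are tied together by the standard identities below, stated here as in Valeri & VanderWeele (2013) and VanderWeele (2014). This is a summary sketch: the uppercase symbols mirror the effect names in this section, $m$ is the mediator value given in mval, and it is an assumption that the package computes its proportions exactly in this form.

```latex
\begin{align*}
TE   &= PNDE + TNIE = TNDE + PNIE \\
TE   &= CDE(m) + INT_{ref}(m) + INT_{med} + PNIE \\
PNDE &= CDE(m) + INT_{ref}(m), \qquad TNIE = PNIE + INT_{med} \\
pm   &= \frac{TNIE}{TE}, \qquad
int   = \frac{INT_{ref}(m) + INT_{med}}{TE}, \qquad
pe    = \frac{TE - CDE(m)}{TE} \\
R^{TE} &= R^{PNDE} \times R^{TNIE} \\
R^{TE} - 1 &= ER^{CDE}(m) + ER^{INTref}(m) + ER^{INTmed} + ER^{PNIE}
\end{align*}
```

The first four lines apply on the difference scale (continuous outcome) and the last two on the ratio scale (categorical, count or survival outcome).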

References

Valeri L, VanderWeele TJ (2013). Mediation analysis allowing for exposure-mediator interactions and causal interpretation: theoretical assumptions and implementation with SAS and SPSS macros. Psychological Methods. 18(2): 137-150.

VanderWeele TJ, Vansteelandt S (2014). Mediation analysis with multiple mediators. Epidemiologic Methods. 2(1): 95-115.

VanderWeele TJ (2014). A unification of mediation and interaction: a 4-way decomposition. Epidemiology. 25(5): 749-761.

Imai K, Keele L, Tingley D (2010). A general approach to causal mediation analysis. Psychological Methods. 15(4): 309-334.

Schomaker M, Heumann C (2018). Bootstrap inference when using multiple imputation. Statistics in Medicine. 37(14): 2252-2266.

Efron B (1987). Better bootstrap confidence intervals. Journal of the American Statistical Association. 82(397): 171-185.

Examples


if (FALSE) { # \dontrun{
library(CMAverse)

# single-mediator case without exposure-mediator interaction
exp1 <- cmest_rb(data = cma2020, outcome = "contY",
                 exposure = "A", mediator = "M1", basec = c("C1", "C2"),
                 EMint = FALSE, yreg = "linear", mreg = list("logistic"),
                 estimation = "paramfunc", inference = "delta",
                 astar = 0, a = 1, mval = list(1))
summary(exp1)

# single-mediator case with exposure-mediator interaction
exp2 <- cmest_rb(data = cma2020, outcome = "contY",
                 exposure = "A", mediator = "M2", basec = c("C1", "C2"),
                 EMint = TRUE, yreg = "linear", mreg = list("multinomial"),
                 estimation = "paramfunc", inference = "delta",
                 astar = 0, a = 1, mval = list("M2_0"))
summary(exp2)

# multiple-mediators case
exp3 <- cmest_rb(data = cma2020, outcome = "contY",
                 exposure = "A", mediator = c("M1", "M2"), EMint = TRUE,
                 basec = c("C1", "C2"),
                 yreg = "linear", mreg = list("logistic", "multinomial"),
                 estimation = "imputation", inference = "bootstrap",
                 astar = 0, a = 1, mval = list(0, "M2_0"),
                 nboot = 100, boot.ci.type = "bca")
summary(exp3)

# specify regression models by fitted regression objects
exp4 <- cmest_rb(data = cma2020, outcome = "contY",
                 exposure = "A", mediator = c("M1", "M2"), EMint = TRUE,
                 basec = c("C1", "C2"),
                 yreg = lm(contY ~ A + M1 + M2 + C1 + C2, data = cma2020),
                 mreg = list(glm(M1 ~ A + C1 + C2, data = cma2020, family = binomial()),
                             nnet::multinom(M2 ~ A + C1 + C2, data = cma2020)),
                 estimation = "imputation", inference = "bootstrap",
                 astar = 0, a = 1, mval = list(0, "M2_0"),
                 nboot = 100, boot.ci.type = "bca")
summary(exp4)

} # }