Chapter 3 Applications

3.1 Application I: Individual-level data

The dataset used in this example is a subset of data obtained from the example datasets provided by MetaboAnalyst in the statistical meta-analysis section (Xia and Wishart, 2009). These data are from a study investigating blood-based biomarkers of adenocarcinoma and comprise 14 metabolites (N = 14) measured across 172 samples from two independent studies (K = 2). We will use `MetaHD’ for combining summary estimates obtained from these multiple metabolomic studies.

This real dataset is available in the `MetaHD’ package and can be loaded using the following command.

# Read the data
example_data  <-  realdata

As explained in Section 3.1 `MetaHDInput’ function can then be used to calculate the treatment effect sizes and their associated variances and within-study covariance matrices as shown below.

input_data <- MetaHDInput(example_data)

Y <- input_data$Y
Slist <- input_data$Slist

Y
##               met1      met2       met3      met4        met5         met6
## study1 -0.08509765 0.0693374 0.21205246 0.1555140 -0.07131115 -0.010729262
## study2 -0.19103975 0.1985993 0.05266387 0.3042082 -0.08507508 -0.002200277
##               met7        met8        met9      met10      met11       met12
## study1 -0.06351313  0.07239534 -0.02854481 -0.1874413 0.07670815 -0.04170964
## study2 -0.02427915 -0.08497286 -0.10315078 -0.1196099 0.07105905  0.21213759
##            met13       met14
## study1 0.3378766 -0.20944830
## study2 0.2136121 -0.03345043
head(Slist[[1]])
##               met1         met2          met3         met4         met5
## met1  0.0194438538 0.0005017922 -0.0025219163 -0.002766591  0.004992537
## met2  0.0005017922 0.0161291933  0.0042058671  0.019843118  0.002535913
## met3 -0.0025219163 0.0042058671  0.0078052190  0.007030307 -0.001285968
## met4 -0.0027665912 0.0198431182  0.0070303073  0.074406890 -0.003247293
## met5  0.0049925375 0.0025359133 -0.0012859676 -0.003247293  0.007725915
## met6  0.0036952818 0.0035899796  0.0007582538  0.004244171  0.003155352
##              met6          met7          met8        met9        met10
## met1 0.0036952818  0.0036649961  0.0014738763 0.002466527  0.006057970
## met2 0.0035899796  0.0023004763  0.0009291760 0.003908964  0.001622693
## met3 0.0007582538 -0.0002967142  0.0010746462 0.001839553 -0.002939468
## met4 0.0042441714  0.0033677211  0.0053704235 0.004464203 -0.001121570
## met5 0.0031553517  0.0026034856 -0.0006582206 0.002925312  0.005845677
## met6 0.0035383608  0.0025416742  0.0003542152 0.003045991  0.003098343
##            met11        met12        met13        met14
## met1 0.003742686 0.0058389366 -0.017656900  0.006189749
## met2 0.006128897 0.0057661483 -0.015584915  0.002615291
## met3 0.002551313 0.0003375944  0.001663327 -0.001850528
## met4 0.005545238 0.0033374745 -0.012707585  0.001019416
## met5 0.004305489 0.0060072612 -0.020351263  0.004988612
## met6 0.003080440 0.0039539767 -0.012411644  0.003099160

Now, we can carry out the multivariate meta-analysis using MetaHD as follows.

# Multivariate random-effects meta analysis
model <- MetaHD(Y = Y, 
                Slist = Slist, 
                method = "multi")
model$estimate
##  [1] -0.1294875230  0.1390541472  0.1187040998  0.2031757519 -0.0406585464
##  [6] -0.0001290338 -0.0338842144 -0.0052315949 -0.0474366927 -0.1240326912
## [11]  0.0946724585  0.1105743431  0.1909300556 -0.1084060240
model$pVal
##  [1] 0.10014867 0.08615324 0.09690759 0.25810338 0.36261181 0.99461969
##  [7] 0.28079034 0.92795765 0.41512986 0.01378248 0.03066823 0.35119400
## [13] 0.01358336 0.12513848

When both within and between study correlations are set to zero, MetaHD reduces to the usual random effects model analysis of individual outcomes (i.e. univariate meta-analyses). Additionally, when the between-study variances are also set to zero, then MetaHD reduces to a fixed-effects model univariate meta-analysis. We can perform univariate random-effects and univariate fixed-effects meta-analysis as follows:

K <- nrow(Y)
N <- ncol(Y)
## Create a K × N matrix of within-study variances
Smat <- matrix(0, nrow = K, ncol = N)
for (i in 1:K) {
  Smat[i, ] <- diag(Slist[[i]])
}

# Univariate random-effects meta-analysis
REmodel <- MetaHD(Y = Y, 
                  Slist = Smat, 
                  method = "REM")
REmodel$estimate
##  [1] -0.152812325  0.137090057  0.125447557  0.236485102 -0.079158213
##  [6] -0.003400283 -0.041614291 -0.005823009 -0.067093550 -0.145743652
## [11]  0.073429398  0.097637046  0.224769893 -0.115441462
REmodel$pVal
##  [1] 0.068095503 0.117624883 0.114092805 0.198923845 0.169579668 0.878875331
##  [7] 0.281592194 0.941005451 0.297222174 0.012441248 0.161809469 0.439540239
## [13] 0.009502417 0.188537212
# Univariate fixed-effects meta-analysis
FEmodel <- MetaHD(Y = Y, 
                  Slist = Smat, 
                  method = "FEM")
FEmodel$estimate
##  [1] -0.152812327  0.137090057  0.119263841  0.236485102 -0.079158213
##  [6] -0.003400281 -0.041614290 -0.004927685 -0.067093550 -0.145743651
## [11]  0.073429398  0.139309196  0.224769883 -0.102735148
FEmodel$pVal
##  [1] 0.068095484 0.117624861 0.036764299 0.198923825 0.169579632 0.878875296
##  [7] 0.281592165 0.914730005 0.297222122 0.012441239 0.161809425 0.011347289
## [13] 0.009502382 0.034970100

When missing data are available, we can use MetaHD to handle missing values by setting the argument “impute.na = TRUE”. This replaces the missing effect sizes and variances by zero and a large constant value respectively, so that the missing outcome is allocated a lower weight in the meta-analysis.

3.2 Application II: Summary-level data

In this example, we use a simulated dataset as described in Liyanage et al. (2024). We have prepared separate data frames for effect sizes and within-study covariance matrices in the simdata.1 dataset, where K = 12 (no. of studies) and N = 30 (no. of outcomes).

Now, we can carry out the meta-analysis using MetaHD as follows.

Y <- simdata.1$Y 
Slist <- simdata.1$Slist

# Multivariate random-effects meta analysis
model_fastMetaHD <- MetaHD(Y = Y,
                           Slist = Slist,
                           method = "multi")
model_fastMetaHD$estimate
##  [1] -0.1661335 -1.9809593 -3.7329305 -0.3676187 -1.2805349  2.2047143
##  [7] -1.8351657  1.9742259 -0.7673506 -0.9405747  0.6333586 -0.8693423
## [13]  1.0744949  1.3249918 -1.2437328  1.4046179  1.1997847  0.8963671
## [19]  1.7543779  1.1155858 -0.8517850 -0.6183473  2.4147259  0.5123058
## [25] -0.8787377 -3.6860782 -1.2630573 -1.7269780  3.0223912  5.5121450

If within-study correlations are not available, use only the variances and enter these values in a K x N matrix:

K <- nrow(Y)
N <- ncol(Y)
Smat <- matrix(0, nrow = K, ncol = N)
for (i in 1:K) {
  Smat[i, ] <- diag(Slist[[i]])
}

model_est.wscor<- MetaHD(Y = Y,
                         Slist = Smat,
                         method = "multi",
                         est.wscor = TRUE)
model_est.wscor$estimate
##  [1] -0.07611975 -1.86284770 -3.65256960 -0.17757953 -1.23106654  2.31543281
##  [7] -1.72883345  2.08240604 -0.66048927 -0.84341837  0.75958054 -0.75718192
## [13]  1.19626776  1.42655963 -1.12929204  1.49558714  1.26777701  0.94828614
## [19]  1.78974887  1.22870385 -0.77606707 -0.51324789  2.48715772  0.60812053
## [25] -0.76160149 -3.49213078 -1.12631340 -1.58589106  3.09407333  5.59067908

Implementing univariate random-effects and fixed-effects meta-analysis using MetaHD:

# Univariate random-effects meta-analysis
model_random <- MetaHD(Y = Y, 
                       Slist = Smat, 
                       method = "REM")
model_random$estimate
##  [1]  0.01698189 -1.70802518 -3.51511837 -0.03225311 -1.15230990  2.41607988
##  [7] -1.61460969  2.19082837 -0.49260199 -0.79088482  0.90238451 -0.75906486
## [13]  1.26660939  1.61034022 -1.07672311  1.60752377  1.37279438  1.03478444
## [19]  1.88133989  1.29619146 -0.67308106 -0.38921814  2.65499223  0.66754678
## [25] -0.66945960 -3.49897667 -0.95282762 -1.52839939  3.20064280  5.68589161
# Univariate fixed-effects meta-analysis
model_fixed <- MetaHD(Y = Y, 
                      Slist = Smat, 
                      method = "FEM")
model_fixed$estimate
##  [1]  0.01698185 -1.70802519 -3.51511838 -0.03225310 -1.15482644  2.41607988
##  [7] -1.61460970  2.10129564 -0.49260199 -0.79088504  0.85325058 -0.76041187
## [13]  1.26660937  1.61034017 -1.07672316  1.60752371  1.36330932  1.03478442
## [19]  1.85911034  1.29619127 -0.67308106 -0.38921813  2.65499223  0.61682697
## [25] -0.69067483 -3.49897668 -0.95282762 -1.52839940  3.20064276  5.47152372

The results obtained from different meta-analysis approaches are compared below.

true_values <- c(-0.1675687 , -1.9658875 ,-3.7501346, -0.3722893, -1.2669714,  2.1815949, -1.8274545,  2.0032794, -0.7985332 ,-0.9362461 , 0.6539242 ,-0.8254938,  1.1240729,  1.3267165, -1.2057946,1.3967554,  1.1916929 , 0.9041837,  1.7934879,  1.1444330, -0.8233060, -0.5886543,  2.4371479,  0.4882229, -0.8903039, -3.6956073, -1.2576506, -1.7221614,  3.0298406,  5.4704779)

# Create a data frame of the differences
df <- data.frame(
          Model = factor(rep(c("fastMetaHD", "fastMetaHD_est.wscor", "REM", "FEM"),each = N),
                         levels=c("fastMetaHD", "fastMetaHD_est.wscor", "REM", "FEM")),
          Difference = c(true_values - model_fastMetaHD$estimate,
                         true_values - model_est.wscor$estimate,
                         true_values - model_random$estimate,
                         true_values - model_fixed$estimate))

# Create the boxplots
ggplot(df, aes(x = Model, y = Difference, fill = Model)) +
  geom_boxplot() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  scale_fill_manual(values = c("blue", "darkgreen","red","purple")) +
  labs(title = "Differences between true and estimated values for \nfastMetaHD (without divide-and-conquer), REM, and FEM",
       x = "Model",
       y = "Difference (True value - Estiamted values)")+
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5))

When N is large, users can implement fastMetaHD via the two-step divide-and-conquer approach. For illustrative purposes, we use the same example dataset above to demonstrate the implementation of fastMetaHD using the divide-and-conquer approach within MetaHD.

# Multivariate random-effects meta analysis via divide-and-conquer approach
model_fastMetaHD.DC <- MetaHD(Y = Y,
                              Slist = Slist,
                              method = "multi",
                              useDivideConquer = TRUE)
model_fastMetaHD.DC$estimate
##  [1] -0.1578428 -1.9352834 -3.7598564 -0.3375427 -1.2563954  2.2000801
##  [7] -1.8272919  2.0054146 -0.7308405 -0.9120364  0.6522972 -0.8197314
## [13]  1.1002493  1.2870452 -1.2030986  1.3970863  1.2458113  0.8949060
## [19]  1.7686248  1.1297233 -0.8696852 -0.5973989  2.4082564  0.5099695
## [25] -0.8434881 -3.6713428 -1.2027013 -1.6841568  3.0539740  5.5333904
# Multivariate random-effects meta-analysis via divide-and-conquer approach when within-study correlations are unavailable
model.DC_est.wscor <- MetaHD(Y = Y,
                             Slist = Smat,
                             method = "multi",
                             useDivideConquer = TRUE,
                             est.wscor = TRUE)
model.DC_est.wscor$estimate
##  [1] -0.0605143 -1.8330212 -3.6727761 -0.1625469 -1.2307935  2.2921464
##  [7] -1.7475588  2.1067749 -0.6624231 -0.8485452  0.7721486 -0.7472549
## [13]  1.1507732  1.4154186 -1.1329312  1.5198437  1.2748999  0.9525194
## [19]  1.8061356  1.2429361 -0.7641483 -0.5054921  2.4908075  0.5390861
## [25] -0.7555769 -3.5559144 -1.1954681 -1.5784445  3.1005675  5.5662196
# Create a data frame of the differences
df <- data.frame(
          Model = factor(rep(c("fastMetaHD", "fastMetaHD_est.wscor","REM", "FEM"),each = N),
                         levels=c("fastMetaHD", "fastMetaHD_est.wscor", "REM", "FEM")),
          Difference = c(true_values - model_fastMetaHD.DC$estimate,
                         true_values - model.DC_est.wscor$estimate,
                         true_values - model_random$estimate,
                         true_values - model_fixed$estimate))

# Create the boxplots
ggplot(df, aes(x = Model, y = Difference, fill = Model)) +
  geom_boxplot() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  scale_fill_manual(values = c("blue", "darkgreen", "red", "purple")) +
  labs(title = "Differences between true and estimated values for \nfastMetaHD (with divide-and-conquer), REM, and FEM",
       x = "Model",
       y = "Difference (True value - Estiamted values)")+
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5))