Calculates the correlation matrix for the numeric variables in a data frame, together with descriptives (mean and standard deviation). The result is ready to be turned into a formatted table with report_cor_table().

cor_matrix(
  data,
  var_names = NULL,
  missing = c("pairwise", "listwise", "fiml"),
  conf_level = 0.95,
  method = c("pearson", "spearman", "kendall"),
  adjust = "none",
  bootstrap = NULL,
  seed = NULL
)

Source

Adapted from http://www.sthda.com/english/wiki/elegant-correlation-table-using-xtable-r-package

Arguments

data

Data frame. Only numeric variables are included in the correlation matrix.

var_names

A named character vector with new variable names, or a tibble as provided by get_rename_tribbles(). If NULL, the variables are not renamed. If names are provided, only the variables included here are retained. This is most helpful when the results are passed to a print function such as report_cor_table().

missing

How should missing data be dealt with? Options are "pairwise" deletion, "listwise" deletion or "fiml" for full information maximum likelihood estimation of the correlation table. Note that if you use "fiml", this will also be applied to the estimation of means and standard deviations.

conf_level

Confidence level to use for confidence intervals. Defaults to 0.95.

method

Correlation method. "pearson" is the default; the alternatives, which are passed to cor, are "spearman" and "kendall". The latter two are much slower, particularly for large data sets.

adjust

What adjustment for multiple tests should be used? One of "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr" or "none". See p.adjust for details on why "holm" is preferable to "bonferroni".

bootstrap

When using FIML estimation (with missing = "fiml"), significance tests and confidence intervals can be bootstrapped. If you want to do that, pass the number of desired bootstrap resamples (e.g., 5000) to this parameter, but beware that this can take a while.

seed

Pass an integer to set the seed for bootstrapping and thus make the results reproducible.
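The interplay of bootstrap, conf_level and seed can be illustrated with a minimal base R sketch of a percentile bootstrap for a single Pearson correlation. This is not the package's FIML bootstrap: boot_cor_ci() is a hypothetical helper introduced only for illustration, and the use of mtcars is an assumption. It merely shows why fixing the seed makes resampled intervals reproducible.

```r
# Percentile bootstrap for one Pearson correlation, base R only.
# boot_cor_ci() is a hypothetical helper, not part of the package.
boot_cor_ci <- function(x, y, n_boot = 1000, conf_level = 0.95, seed = NULL) {
  # Fixing the seed makes the sequence of resamples reproducible
  if (!is.null(seed)) set.seed(seed)
  n <- length(x)
  boot_r <- replicate(n_boot, {
    idx <- sample.int(n, n, replace = TRUE)  # resample cases with replacement
    cor(x[idx], y[idx])
  })
  alpha <- 1 - conf_level
  # Lower and upper percentile bounds of the bootstrap distribution
  quantile(boot_r, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

ci_a <- boot_cor_ci(mtcars$mpg, mtcars$wt, seed = 42)
ci_b <- boot_cor_ci(mtcars$mpg, mtcars$wt, seed = 42)
same_interval <- identical(ci_a, ci_b)  # same seed, same resamples, same interval
```

Without a seed, two calls would generally return slightly different intervals, and the run time grows roughly linearly with the number of resamples.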

Value

A list including the correlation matrix, p-values, standard errors, t-values, pairwise numbers of observations, confidence intervals, descriptives and (if var_names was provided) a tibble with old and new variable names.
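To make the relationship between these quantities concrete, the following base R sketch derives a standard error, t-value, p-value and confidence interval for one Pearson correlation from r and the pairwise n, and then applies a multiple-testing adjustment to some p-values. The formulas are the conventional ones (the same as those used by stats::cor.test); whether this function computes its output in exactly this way is an assumption, and the data and the p_raw values are purely illustrative.

```r
x <- mtcars$mpg
y <- mtcars$wt

n <- sum(complete.cases(x, y))                  # pairwise number of observations
r <- cor(x, y, use = "pairwise.complete.obs")   # Pearson correlation

se      <- sqrt((1 - r^2) / (n - 2))            # standard error of r
t_value <- r / se                               # t statistic with n - 2 df
p_value <- 2 * pt(-abs(t_value), df = n - 2)    # two-sided p-value

# Confidence interval via the Fisher z transformation
conf_level <- 0.95
z_crit <- qnorm(1 - (1 - conf_level) / 2)
ci <- tanh(atanh(r) + c(-1, 1) * z_crit / sqrt(n - 3))

# Cross-check against base R
ref <- cor.test(x, y, conf.level = conf_level)
matches_t  <- isTRUE(all.equal(unname(ref$statistic), t_value))
matches_p  <- isTRUE(all.equal(ref$p.value, p_value))
matches_ci <- isTRUE(all.equal(as.numeric(ref$conf.int), ci))

# Multiple-testing adjustment, as selected via `adjust` (illustrative p-values)
p_raw  <- c(0.001, 0.012, 0.03, 0.04)
p_holm <- p.adjust(p_raw, method = "holm")
p_bonf <- p.adjust(p_raw, method = "bonferroni")
holm_never_larger <- all(p_holm <= p_bonf)      # Holm is never more conservative
```

The last line reflects the reason p.adjust gives for preferring "holm": it controls the same error rate as "bonferroni" while never producing larger adjusted p-values.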

Details

When missing is set to "fiml", this function uses the lavaan package to estimate the correlations (and descriptives) with a full-information maximum likelihood algorithm. This estimates the covariances separately for each pattern of missingness and then combines them based on the frequency of each pattern. It takes longer than pairwise deletion, but should yield more precise estimates in the presence of missing data. To date, FIML correlation matrices can also be obtained with psych::corFiml(), which does not report significance tests or confidence intervals, or with lavaan::lavCor(). This function likewise uses lavaan for the estimation, but it facilitates bootstrapping and returns the results in a more accessible format.
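For contrast with FIML, the two deletion-based options can be sketched in base R. The example below is only an illustration under assumed data (a subset of mtcars with artificially introduced missing values) and uses stats::cor directly rather than this function. It shows that pairwise deletion keeps a different number of cases for each pair of variables, whereas listwise deletion uses the same reduced sample throughout. FIML itself requires lavaan and is not shown here.

```r
# Illustrative data: three numeric variables with artificial missingness
df <- mtcars[, c("mpg", "wt", "hp")]
df$wt[1:5]  <- NA
df$hp[6:10] <- NA

# Pairwise deletion: each correlation uses all cases complete on that pair
r_pairwise <- cor(df, use = "pairwise.complete.obs")

# Listwise deletion: only cases complete on all variables are used
r_listwise <- cor(df, use = "complete.obs")

# Number of observations behind each estimate
obs <- 1 * (!is.na(as.matrix(df)))       # 1 = observed, 0 = missing
n_pairwise <- crossprod(obs)             # pairwise n for every pair of variables
n_listwise <- sum(complete.cases(df))    # a single n for the whole matrix
```

Here the mpg and wt correlation rests on 27 cases under pairwise deletion but on only 22 under listwise deletion, which is the kind of information loss that FIML is intended to avoid.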

References

For evidence on the utility of the FIML estimator, see Enders, C. K. (2001). The performance of the full information maximum likelihood estimator in multiple regression models with missing data. Educational and Psychological Measurement, 61(5), 713-740.