This runs lm() after standardizing all continuous variables, while leaving factors and character variables intact.
lm_std(
  formula,
  data = NULL,
  weights = NULL,
  weighted_standardize = "auto",
  ...
)
formula
an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under ‘Details’.
data
an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which lm is called.
weights
an optional vector of weights to be used in the fitting process. Should be NULL or a numeric vector. If non-NULL, weighted least squares is used with weights weights (that is, minimizing sum(w*e^2)); otherwise ordinary least squares is used. See also ‘Details’.
weighted_standardize
How to standardize numeric variables when weights are present. Can be one of "auto" (the default), TRUE, or FALSE. If "auto", weighted standardization is performed if a weights argument is provided. See ‘Details’.
...
Additional arguments passed to lm(). Note that subset cannot be used here and should be applied to data beforehand.
An lm object with standardized coefficients.
When weights are supplied, this function by default standardizes variables using weighted means and weighted standard deviations. This ensures that the resulting coefficients can be interpreted as the estimated change in Y, in standard deviations, for a one standard deviation change in X. If you require coefficients based on unweighted sample standard deviations, set weighted_standardize = FALSE. This is fairly common practice, but the results should not be interpreted as population estimates.
Note that standardization is applied to the raw numeric variables from the data frame. Any in-formula transformations (e.g., log(x), I(x^2)) or interactions (x1 * x2) are applied by lm() after the base variables (x, x1, x2) have been standardized. This is important for the interpretation of the resulting coefficients.
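The order of operations can be illustrated with base R alone. The sketch below assumes that scale() is an adequate stand-in for the unweighted standardization step; it is not the package's code. It shows that a quadratic term written in the formula is the square of the z-score, not a standardized version of the squared raw variable.

```r
# Stand-in for the standardization step (assumption: unweighted z-scores)
z <- as.data.frame(scale(mtcars[c("mpg", "disp")]))

# The in-formula transformation is evaluated on the standardized column
fit <- lm(mpg ~ disp + I(disp^2), data = z)
quad_term <- unname(model.matrix(fit)[, "I(disp^2)"])

# The model's quadratic column matches the squared z-score ...
all.equal(quad_term, z$disp^2)

# ... and differs from standardizing the squared raw variable
isTRUE(all.equal(quad_term, as.numeric(scale(mtcars$disp^2))))
```

One consequence is that log(x) is evaluated on z-scores, some of which are necessarily negative, so such terms yield NaN values. If a log-transformed predictor is wanted, one option is to create the transformed column in the data frame before fitting.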
In the model call, the weights variable will always be internally renamed to .weights. This may be relevant if you update the model object later.
See Fox, J. (2015). Applied Regression Analysis and Generalized Linear Models. Sage.
# Basic usage with mtcars dataset
lm_std(mpg ~ cyl + disp, data = mtcars)
#> Warning: The following numeric variables have fewer than three distinct values: cyl.
#> Consider converting them to factors as standardizing them is typically not
#> recommended.
#>
#> Call:
#> stats::lm(formula = mpg ~ cyl + disp, data = data)
#>
#> Coefficients:
#> (Intercept)          cyl         disp
#>  -2.460e-17   -4.703e-01   -4.233e-01
#>
# Using weights
mtcars$wt_sample <- runif(nrow(mtcars), 1, 3)
lm_std(mpg ~ cyl + disp, data = mtcars, weights = wt_sample)
#> Warning: The following numeric variables have fewer than three distinct values: cyl.
#> Consider converting them to factors as standardizing them is typically not
#> recommended.
#> Standardizing variables using weighted means and standard deviations.
#>
#> Call:
#> stats::lm(formula = mpg ~ cyl + disp, data = data, weights = .weights)
#>
#> Coefficients:
#> (Intercept)          cyl         disp
#>   1.618e-17   -5.802e-01   -3.002e-01
#>
# Handling variables with few levels
mtcars$am_factor <- factor(mtcars$am)
lm_std(mpg ~ cyl + disp + am_factor, data = mtcars)
#> Warning: The following numeric variables have fewer than three distinct values: cyl.
#> Consider converting them to factors as standardizing them is typically not
#> recommended.
#>
#> Call:
#> stats::lm(formula = mpg ~ cyl + disp + am_factor, data = data)
#>
#> Coefficients:
#> (Intercept)          cyl         disp   am_factor1
#>     -0.1300      -0.4795      -0.3206       0.3200
#>