This runs lm() after standardizing all continuous (numeric) variables, while leaving factor and character variables intact.

lm_std(
  formula,
  data = NULL,
  weights = NULL,
  weighted_standardize = "auto",
  ...
)

Arguments

formula

an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under ‘Details’ in the lm() documentation.

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which lm_std is called.

weights

an optional vector of weights to be used in the fitting process. Should be NULL or a numeric vector. If non-NULL, weighted least squares is used with weights weights (that is, minimizing sum(w*e^2)); otherwise ordinary least squares is used. See also ‘Details’ in the lm() documentation.

weighted_standardize

How to standardize numeric variables when weights are present. Can be one of "auto" (the default), TRUE, or FALSE. If "auto", weighted standardization is performed if a weights argument is provided. See ‘Standardization Method’.

...

Additional arguments passed to lm(). Note that subset cannot be used here and should be applied to data beforehand.

Value

An lm object with standardized coefficients.
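
As a rough illustration of what "standardized coefficients" means in the unweighted case, the same slopes can be obtained in base R either by fitting lm() on z-scored variables or by rescaling the raw slopes by sd(x) / sd(y). This is a sketch using only base and stats functions; it is not the internals of lm_std(), and the names z, fit_raw, fit_std and rescaled are made up for the example.

```r
# Sketch only: base R equivalent of unweighted standardization.
# z, fit_raw, fit_std and rescaled are illustrative names, not package objects.
vars <- c("mpg", "cyl", "disp")
z <- as.data.frame(scale(mtcars[vars]))   # centre, then divide by the sample SD

fit_raw <- lm(mpg ~ cyl + disp, data = mtcars)
fit_std <- lm(mpg ~ cyl + disp, data = z)

# Rescale the raw slopes: b * sd(x) / sd(y)
rescaled <- coef(fit_raw)[-1] *
  sapply(mtcars[c("cyl", "disp")], sd) / sd(mtcars$mpg)

# The two routes agree up to floating-point error
all.equal(unname(coef(fit_std)[-1]), unname(rescaled))
```

Because every variable is centred, the intercept of the standardized fit is zero up to floating-point error.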

Standardization Method

When weights are supplied, this function by default standardizes variables using weighted means and standard deviations. The resulting coefficients can then be interpreted as the estimated change in Y, in standard deviations, for a one standard deviation change in X. If you require coefficients based on unweighted sample standard deviations, set weighted_standardize = FALSE. Unweighted standardization is fairly common practice, but the results should not be interpreted as population estimates.
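
The difference between the two options can be sketched in base R. The weighted standard deviation below uses one common definition (weighted sum of squared deviations divided by the sum of the weights); that formula is an assumption, since the exact estimator used by lm_std() is not stated here, and weighted_z() is a hypothetical helper written for this example.

```r
# Sketch only: one common definition of a weighted z-score.
# weighted_z() is a hypothetical helper, not an lm_std() internal.
weighted_z <- function(x, w) {
  m <- weighted.mean(x, w)
  s <- sqrt(sum(w * (x - m)^2) / sum(w))  # weighted SD, no small-sample correction
  (x - m) / s
}

set.seed(1)
w <- runif(nrow(mtcars), 1, 3)

z_w <- weighted_z(mtcars$disp, w)        # weighted standardization
z_u <- as.numeric(scale(mtcars$disp))    # unweighted, for comparison

weighted.mean(z_w, w)       # zero up to floating-point error
sum(w * z_w^2) / sum(w)     # one up to floating-point error
```

Under this definition the weighted z-scores have weighted mean 0 and weighted variance 1, whereas z_u has those properties only with respect to the unweighted sample.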

Transformations and Interactions

Note that standardization is applied to the raw numeric variables from the data frame. Any in-formula transformations (e.g., log(x), I(x^2)) or interactions (x1 * x2) are applied by lm() after the base variables (x, x1, x2) have been standardized. This is important for the interpretation of the resulting coefficients.
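
The ordering described above can be mimicked in base R to see its consequences. This sketch standardizes the base variables by hand and then lets lm() apply the in-formula transformation; it does not call lm_std(), and d and fit are illustrative names.

```r
# Sketch only: base R, mimicking "standardize first, transform second".
d <- mtcars
d$disp <- as.numeric(scale(d$disp))   # standardized base variable
d$mpg  <- as.numeric(scale(d$mpg))

# I(disp^2) is now the square of the z-score, not the z-score of disp^2.
fit <- lm(mpg ~ disp + I(disp^2), data = d)
coef(fit)

# Standardized variables take negative values, so log() of one would be
# NaN for every below-mean observation.
any(d$disp < 0)
```

One practical consequence is that transformations defined only for positive values, such as log(x), may need to be applied to the data before standardization rather than inside the formula.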

Internal Details

In the model call, the weights variable will always be internally renamed to .weights. This may be relevant if you update the model object later.

References

See Fox, J. (2015). Applied Regression Analysis and Generalized Linear Models. Sage.

Examples

# Basic usage with mtcars dataset
lm_std(mpg ~ cyl + disp, data = mtcars)
#> Warning: The following numeric variables have fewer than three distinct values: cyl.
#> Consider converting them to factors as standardizing them is typically not
#> recommended.
#> 
#> Call:
#> stats::lm(formula = mpg ~ cyl + disp, data = data)
#> 
#> Coefficients:
#> (Intercept)          cyl         disp  
#>  -2.460e-17   -4.703e-01   -4.233e-01  
#> 

# Using weights
mtcars$wt_sample <- runif(nrow(mtcars), 1, 3)
lm_std(mpg ~ cyl + disp, data = mtcars, weights = wt_sample)
#> Warning: The following numeric variables have fewer than three distinct values: cyl.
#> Consider converting them to factors as standardizing them is typically not
#> recommended.
#> Standardizing variables using weighted means and standard deviations.
#> 
#> Call:
#> stats::lm(formula = mpg ~ cyl + disp, data = data, weights = .weights)
#> 
#> Coefficients:
#> (Intercept)          cyl         disp  
#>   1.618e-17   -5.802e-01   -3.002e-01  
#> 

# Handling variables with few levels
mtcars$am_factor <- factor(mtcars$am)
lm_std(mpg ~ cyl + disp + am_factor, data = mtcars)
#> Warning: The following numeric variables have fewer than three distinct values: cyl.
#> Consider converting them to factors as standardizing them is typically not
#> recommended.
#> 
#> Call:
#> stats::lm(formula = mpg ~ cyl + disp + am_factor, data = data)
#> 
#> Coefficients:
#> (Intercept)          cyl         disp   am_factor1  
#>     -0.1300      -0.4795      -0.3206       0.3200  
#>