sim.anova.Rd
For teaching basic statistics, it is useful to be able to generate examples suitable for analysis of variance or simple linear models. sim.anova will generate the design matrix of three independent variables (IV1, IV2, IV3) with an arbitrary number of levels and effect sizes for each main effect and interaction. IVs can be either continuous or categorical and can have linear or quadratic effects. Either a single dependent variable or multiple (within subject) dependent variables are generated according to the specified model. The repeated measures are assumed to be tau equivalent with a specified reliability.
sim.anova(es1 = 0, es2 = 0, es3 = 0, es12 = 0, es13 = 0,
es23 = 0, es123 = 0, es11=0,es22=0, es33=0,n = 2,n1 = 2, n2 = 2, n3 = 2,
within=NULL,r=.8,factors=TRUE,center = TRUE,std=TRUE)
Effect size of IV1
Effect size of IV2
Effect size of IV3
Effect size of the IV1 x IV2 interaction
Effect size of the IV1 x IV3 interaction
Effect size of the IV2 x IV3 interaction
Effect size of the IV1 x IV2 * IV3 interaction
Effect size of the quadratric term of IV1
Effect size of the quadratric term of IV2
Effect size of the quadratric term of IV3
Sample size per cell (if all variables are categorical) or (if at least one variable is continuous), the total sample size
Number of levels of IV1 (0) if continuous
Number of levels of IV2
Number of levels of IV3
if not NULL, then within should be a vector of the means of any repeated measures.
the correlation between the repeated measures (if they exist). This can be thought of as the reliablility of the measures.
report the IVs as factors rather than numeric
center=TRUE provides orthogonal contrasts, center=FALSE adds the minimum value + 1 to all contrasts
Standardize the effect sizes by standardizing the IVs
A simple simulation for teaching about ANOVA, regression and reliability. A variety of demonstrations of the relation between anova and lm can be shown.
The default is to produce categorical IVs (factors). For more than two levels of an IV, this will show the difference between the linear model and anova in terms of the comparisons made.
The within vector can be used to add congenerically equivalent dependent variables. These will have intercorrelations (reliabilities) of r and means as specified as values of within.
To demonstrate the effect of centered versus non-centering, make factors = center=FALSE. The default is to center the IVs. By not centering them, the lower order effects will be incorrect given the higher order interaction terms.
y.df is a data.frame of the 3 IV values as well as the DV values.
Independent variables 1 ... 3
If there is a single dependent variable
If within is specified, then the n within subject dependent variables
The general set of simulation functions in the psych package sim
set.seed(42)
data.df <- sim.anova(es1=1,es2=-.5,es13=1) # two main effect and one interaction
psych::describe(data.df)
#> vars n mean sd median trimmed mad min max range skew kurtosis se
#> IV1* 1 16 1.50 0.52 1.50 1.50 0.74 1.00 2.00 1.00 0.00 -2.12 0.13
#> IV2* 2 16 1.50 0.52 1.50 1.50 0.74 1.00 2.00 1.00 0.00 -2.12 0.13
#> IV3* 3 16 1.50 0.52 1.50 1.50 0.74 1.00 2.00 1.00 0.00 -2.12 0.13
#> DV 4 16 0.49 1.66 0.62 0.59 1.96 -2.81 2.47 5.28 -0.59 -0.91 0.42
pairs.panels(data.df) #show how the design variables are orthogonal
#
data.df <- char2numeric(data.df,flag=FALSE)
summary(lm(DV~IV1*IV2*IV3,data=data.df))
#>
#> Call:
#> lm(formula = DV ~ IV1 * IV2 * IV3, data = data.df)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.8966 -0.3917 0.0000 0.3917 0.8966
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 24.073 6.488 3.710 0.00596 **
#> IV1 -13.992 4.104 -3.410 0.00923 **
#> IV2 -9.858 4.104 -2.402 0.04303 *
#> IV3 -15.515 4.104 -3.781 0.00538 **
#> IV1:IV2 5.987 2.595 2.307 0.04995 *
#> IV1:IV3 9.411 2.595 3.626 0.00672 **
#> IV2:IV3 5.395 2.595 2.079 0.07128 .
#> IV1:IV2:IV3 -3.352 1.641 -2.042 0.07540 .
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 0.8207 on 8 degrees of freedom
#> Multiple R-squared: 0.8702, Adjusted R-squared: 0.7567
#> F-statistic: 7.664 on 7 and 8 DF, p-value: 0.005065
#>
summary(aov(DV~IV1*IV2*IV3,data=data.df))
#> Df Sum Sq Mean Sq F value Pr(>F)
#> IV1 1 9.749 9.749 14.473 0.005204 **
#> IV2 1 0.433 0.433 0.642 0.445963
#> IV3 1 2.890 2.890 4.290 0.072097 .
#> IV1:IV2 1 0.918 0.918 1.362 0.276751
#> IV1:IV3 1 19.202 19.202 28.507 0.000695 ***
#> IV2:IV3 1 0.134 0.134 0.199 0.667572
#> IV1:IV2:IV3 1 2.810 2.810 4.171 0.075397 .
#> Residuals 8 5.389 0.674
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
lmCor(DV~IV1*IV2*IV3,data=data.df, std=FALSE)
#> Call: lmCor(y = DV ~ IV1 * IV2 * IV3, data = data.df, std = FALSE)
#>
#> Multiple Regression from raw data
#>
#> DV = DV
#> slope se t p lower.ci upper.ci VIF Vy.x
#> (Intercept) 0.49 0.21 2.41 0.04300 0.02 0.97 1 0.00
#> IV1 1.56 0.41 3.80 0.00520 0.61 2.51 1 0.23
#> IV2 -0.33 0.41 -0.80 0.45000 -1.28 0.62 1 0.01
#> IV3 -0.85 0.41 -2.07 0.07200 -1.80 0.10 1 0.07
#> IV1*IV2 0.96 0.82 1.17 0.28000 -0.93 2.85 1 0.02
#> IV1*IV3 4.38 0.82 5.34 0.00069 2.49 6.27 1 0.46
#> IV2*IV3 0.37 0.82 0.45 0.67000 -1.53 2.26 1 0.00
#> IV1*IV2*IV3 -3.35 1.64 -2.04 0.07500 -7.14 0.43 1 0.07
#>
#> Residual Standard Error = 0.82 with 8 degrees of freedom
#>
#> Multiple Regression
#> R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p
#> DV 0.93 0.87 0.28 0.08 0.76 0.03 7.66 7 8 0.00506
set.seed(42)
#demonstrate the effect of not centering the data on the regression
data.df <- sim.anova(es1=1,es2=.5,es13=1,center=FALSE) #
psych::describe(data.df)
#> vars n mean sd median trimmed mad min max range skew kurtosis se
#> IV1* 1 16 1.50 0.52 1.5 1.50 0.74 1.00 2.00 1.0 0.00 -2.12 0.13
#> IV2* 2 16 1.50 0.52 1.5 1.50 0.74 1.00 2.00 1.0 0.00 -2.12 0.13
#> IV3* 3 16 1.50 0.52 1.5 1.50 0.74 1.00 2.00 1.0 0.00 -2.12 0.13
#> DV 4 16 0.49 1.85 1.0 0.62 1.63 -3.78 3.03 6.8 -0.67 -0.45 0.46
#
#this one is incorrect, because the IVs are not centered
data.df <- char2numeric(data.df,flag=FALSE)
summary(lm(DV~IV1*IV2*IV3,data=data.df))
#>
#> Call:
#> lm(formula = DV ~ IV1 * IV2 * IV3, data = data.df)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.8966 -0.3917 0.0000 0.3917 0.8966
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 21.168 6.488 3.262 0.01149 *
#> IV1 -13.992 4.104 -3.410 0.00923 **
#> IV2 -7.921 4.104 -1.930 0.08968 .
#> IV3 -15.515 4.104 -3.781 0.00538 **
#> IV1:IV2 5.987 2.595 2.307 0.04995 *
#> IV1:IV3 9.411 2.595 3.626 0.00672 **
#> IV2:IV3 5.395 2.595 2.079 0.07128 .
#> IV1:IV2:IV3 -3.352 1.641 -2.042 0.07540 .
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 0.8207 on 8 degrees of freedom
#> Multiple R-squared: 0.8952, Adjusted R-squared: 0.8035
#> F-statistic: 9.764 on 7 and 8 DF, p-value: 0.002274
#>
data.df <- char2numeric(data.df,flag=FALSE)
summary(aov(DV~IV1*IV2*IV3,data=data.df)) #compare with the lm model
#> Df Sum Sq Mean Sq F value Pr(>F)
#> IV1 1 9.749 9.749 14.473 0.005204 **
#> IV2 1 10.337 10.337 15.346 0.004435 **
#> IV3 1 2.890 2.890 4.290 0.072097 .
#> IV1:IV2 1 0.918 0.918 1.362 0.276751
#> IV1:IV3 1 19.202 19.202 28.507 0.000695 ***
#> IV2:IV3 1 0.134 0.134 0.199 0.667572
#> IV1:IV2:IV3 1 2.810 2.810 4.171 0.075397 .
#> Residuals 8 5.389 0.674
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#but lmCor by default zero centers which works
lmCor(DV~IV1*IV2*IV3,data=data.df)
#> Call: lmCor(y = DV ~ IV1 * IV2 * IV3, data = data.df)
#>
#> Multiple Regression from raw data
#>
#> DV = DV
#> slope se t p lower.ci upper.ci VIF Vy.x
#> (Intercept) 0.00 0.11 0.00 1.00000 -0.26 0.26 1 0.00
#> IV1 0.44 0.11 3.80 0.00520 0.17 0.70 1 0.19
#> IV2 0.45 0.11 3.92 0.00440 0.18 0.71 1 0.20
#> IV3 -0.24 0.11 -2.07 0.07200 -0.50 0.03 1 0.06
#> IV1*IV2 0.13 0.11 1.17 0.28000 -0.13 0.40 1 0.02
#> IV1*IV3 0.61 0.11 5.34 0.00069 0.35 0.87 1 0.37
#> IV2*IV3 0.05 0.11 0.45 0.67000 -0.21 0.31 1 0.00
#> IV1*IV2*IV3 -0.23 0.11 -2.04 0.07500 -0.50 0.03 1 0.05
#>
#> Residual Standard Error = 0.44 with 8 degrees of freedom
#>
#> Multiple Regression
#> R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p
#> DV 0.95 0.9 0.46 0.21 0.8 0.02 9.76 7 8 0.00227
#now examine multiple levels and quadratic terms
set.seed(42)
data.df <- sim.anova(es1=1,es13=1,n2=3,n3=4,es22=1)
summary(lm(DV~IV1*IV2*IV3,data=data.df))
#>
#> Call:
#> lm(formula = DV ~ IV1 * IV2 * IV3, data = data.df)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -2.0018 -0.3397 0.0000 0.3397 2.0018
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 3.4260 0.7844 4.367 0.000207 ***
#> IV11 -2.7790 1.1093 -2.505 0.019434 *
#> IV20 -3.0489 1.1093 -2.748 0.011190 *
#> IV21 -1.2009 1.1093 -1.083 0.289778
#> IV3-1 -1.5254 1.1093 -1.375 0.181825
#> IV31 -4.4713 1.1093 -4.031 0.000488 ***
#> IV33 -5.1016 1.1093 -4.599 0.000115 ***
#> IV11:IV20 1.5126 1.5689 0.964 0.344596
#> IV11:IV21 1.3254 1.5689 0.845 0.406549
#> IV11:IV3-1 3.2038 1.5689 2.042 0.052272 .
#> IV11:IV31 6.1556 1.5689 3.924 0.000639 ***
#> IV11:IV33 8.5233 1.5689 5.433 1.4e-05 ***
#> IV20:IV3-1 2.1234 1.5689 1.353 0.188512
#> IV21:IV3-1 1.1223 1.5689 0.715 0.481280
#> IV20:IV31 1.3930 1.5689 0.888 0.383389
#> IV21:IV31 2.2484 1.5689 1.433 0.164711
#> IV20:IV33 1.5838 1.5689 1.010 0.322782
#> IV21:IV33 1.5504 1.5689 0.988 0.332899
#> IV11:IV20:IV3-1 -2.6968 2.2187 -1.215 0.236008
#> IV11:IV21:IV3-1 -1.2671 2.2187 -0.571 0.573237
#> IV11:IV20:IV31 -0.4246 2.2187 -0.191 0.849846
#> IV11:IV21:IV31 -3.3169 2.2187 -1.495 0.147959
#> IV11:IV20:IV33 -2.4872 2.2187 -1.121 0.273368
#> IV11:IV21:IV33 -0.6422 2.2187 -0.289 0.774713
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 1.109 on 24 degrees of freedom
#> Multiple R-squared: 0.8651, Adjusted R-squared: 0.7359
#> F-statistic: 6.693 on 23 and 24 DF, p-value: 8.495e-06
#>
summary(aov(DV~IV1*IV2*IV3,data=data.df))
#> Df Sum Sq Mean Sq F value Pr(>F)
#> IV1 1 36.11 36.11 29.346 1.45e-05 ***
#> IV2 2 32.23 16.12 13.096 0.000143 ***
#> IV3 3 10.76 3.59 2.914 0.054955 .
#> IV1:IV2 2 0.03 0.01 0.011 0.988728
#> IV1:IV3 3 98.02 32.67 26.549 8.49e-08 ***
#> IV2:IV3 6 3.71 0.62 0.502 0.800543
#> IV1:IV2:IV3 6 8.58 1.43 1.162 0.358682
#> Residuals 24 29.54 1.23
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
pairs.panels(data.df)
#
data.df <- sim.anova(es1=1,es2=-.5,within=c(-1,0,1),n=10)
pairs.panels(data.df)