Simulate a 3 way balanced ANOVA or linear model, with or without repeated measures.

For teaching basic statistics, it is useful to be able to generate examples suitable for analysis of variance or simple linear models. sim.anova will generate the design matrix of three independent variables (IV1, IV2, IV3) with an arbitrary number of levels and effect sizes for each main effect and interaction. IVs can be either continuous or categorical and can have linear or quadratic effects. Either a single dependent variable or multiple (within subject) dependent variables are generated according to the specified model. The repeated measures are assumed to be tau equivalent with a specified reliability.

sim.anova(es1 = 0, es2 = 0, es3 = 0, es12 = 0, es13 = 0,
    es23 = 0, es123 = 0, es11=0,es22=0, es33=0,n = 2,n1 = 2, n2 = 2, n3 = 2, 
    within=NULL,r=.8,factors=TRUE,center = TRUE,std=TRUE)

Arguments

es1: Effect size of IV1
es2: Effect size of IV2
es3: Effect size of IV3
es12: Effect size of the IV1 x IV2 interaction
es13: Effect size of the IV1 x IV3 interaction
es23: Effect size of the IV2 x IV3 interaction
es123: Effect size of the IV1 x IV2 * IV3 interaction
es11: Effect size of the quadratric term of IV1
es22: Effect size of the quadratric term of IV2
es33: Effect size of the quadratric term of IV3
n: Sample size per cell (if all variables are categorical) or (if at least one variable is continuous), the total sample size
n1: Number of levels of IV1 (0) if continuous
n2: Number of levels of IV2
n3: Number of levels of IV3
within: if not NULL, then within should be a vector of the means of any repeated measures.
r: the correlation between the repeated measures (if they exist). This can be thought of as the reliablility of the measures.
factors: report the IVs as factors rather than numeric
center: center=TRUE provides orthogonal contrasts, center=FALSE adds the minimum value + 1 to all contrasts
std: Standardize the effect sizes by standardizing the IVs

Details

A simple simulation for teaching about ANOVA, regression and reliability. A variety of demonstrations of the relation between anova and lm can be shown.

The default is to produce categorical IVs (factors). For more than two levels of an IV, this will show the difference between the linear model and anova in terms of the comparisons made.

The within vector can be used to add congenerically equivalent dependent variables. These will have intercorrelations (reliabilities) of r and means as specified as values of within.

To demonstrate the effect of centered versus non-centering, make factors = center=FALSE. The default is to center the IVs. By not centering them, the lower order effects will be incorrect given the higher order interaction terms.

Value

y.df is a data.frame of the 3 IV values as well as the DV values.

IV1 ... IV3: Independent variables 1 ... 3
DV: If there is a single dependent variable
DV.1 ... DV.n: If within is specified, then the n within subject dependent variables

Author

William Revelle

Examples

set.seed(42)
data.df <- sim.anova(es1=1,es2=-.5,es13=1)  # two main effect and one interaction
psych::describe(data.df)
#>      vars  n mean   sd median trimmed  mad   min  max range  skew kurtosis   se
#> IV1*    1 16 1.50 0.52   1.50    1.50 0.74  1.00 2.00  1.00  0.00    -2.12 0.13
#> IV2*    2 16 1.50 0.52   1.50    1.50 0.74  1.00 2.00  1.00  0.00    -2.12 0.13
#> IV3*    3 16 1.50 0.52   1.50    1.50 0.74  1.00 2.00  1.00  0.00    -2.12 0.13
#> DV      4 16 0.49 1.66   0.62    0.59 1.96 -2.81 2.47  5.28 -0.59    -0.91 0.42
pairs.panels(data.df)   #show how the design variables are orthogonal

#
data.df <- char2numeric(data.df,flag=FALSE)

summary(lm(DV~IV1*IV2*IV3,data=data.df))
#> 
#> Call:
#> lm(formula = DV ~ IV1 * IV2 * IV3, data = data.df)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -0.8966 -0.3917  0.0000  0.3917  0.8966 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)   
#> (Intercept)   24.073      6.488   3.710  0.00596 **
#> IV1          -13.992      4.104  -3.410  0.00923 **
#> IV2           -9.858      4.104  -2.402  0.04303 * 
#> IV3          -15.515      4.104  -3.781  0.00538 **
#> IV1:IV2        5.987      2.595   2.307  0.04995 * 
#> IV1:IV3        9.411      2.595   3.626  0.00672 **
#> IV2:IV3        5.395      2.595   2.079  0.07128 . 
#> IV1:IV2:IV3   -3.352      1.641  -2.042  0.07540 . 
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 0.8207 on 8 degrees of freedom
#> Multiple R-squared:  0.8702,	Adjusted R-squared:  0.7567 
#> F-statistic: 7.664 on 7 and 8 DF,  p-value: 0.005065
#> 

summary(aov(DV~IV1*IV2*IV3,data=data.df))
#>             Df Sum Sq Mean Sq F value   Pr(>F)    
#> IV1          1  9.749   9.749  14.473 0.005204 ** 
#> IV2          1  0.433   0.433   0.642 0.445963    
#> IV3          1  2.890   2.890   4.290 0.072097 .  
#> IV1:IV2      1  0.918   0.918   1.362 0.276751    
#> IV1:IV3      1 19.202  19.202  28.507 0.000695 ***
#> IV2:IV3      1  0.134   0.134   0.199 0.667572    
#> IV1:IV2:IV3  1  2.810   2.810   4.171 0.075397 .  
#> Residuals    8  5.389   0.674                     
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
lmCor(DV~IV1*IV2*IV3,data=data.df, std=FALSE)

#> Call: lmCor(y = DV ~ IV1 * IV2 * IV3, data = data.df, std = FALSE)
#> 
#> Multiple Regression from raw data 
#> 
#>  DV =  DV 
#>             slope   se     t       p lower.ci upper.ci VIF Vy.x
#> (Intercept)  0.49 0.21  2.41 0.04300     0.02     0.97   1 0.00
#> IV1          1.56 0.41  3.80 0.00520     0.61     2.51   1 0.23
#> IV2         -0.33 0.41 -0.80 0.45000    -1.28     0.62   1 0.01
#> IV3         -0.85 0.41 -2.07 0.07200    -1.80     0.10   1 0.07
#> IV1*IV2      0.96 0.82  1.17 0.28000    -0.93     2.85   1 0.02
#> IV1*IV3      4.38 0.82  5.34 0.00069     2.49     6.27   1 0.46
#> IV2*IV3      0.37 0.82  0.45 0.67000    -1.53     2.26   1 0.00
#> IV1*IV2*IV3 -3.35 1.64 -2.04 0.07500    -7.14     0.43   1 0.07
#> 
#> Residual Standard Error =  0.82  with  8  degrees of freedom
#> 
#>  Multiple Regression
#>       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2       p
#> DV 0.93 0.87 0.28 0.08        0.76     0.03      7.66   7   8 0.00506
set.seed(42)
 #demonstrate the effect of not centering the data on the regression
data.df <- sim.anova(es1=1,es2=.5,es13=1,center=FALSE)  #
psych::describe(data.df)
#>      vars  n mean   sd median trimmed  mad   min  max range  skew kurtosis   se
#> IV1*    1 16 1.50 0.52    1.5    1.50 0.74  1.00 2.00   1.0  0.00    -2.12 0.13
#> IV2*    2 16 1.50 0.52    1.5    1.50 0.74  1.00 2.00   1.0  0.00    -2.12 0.13
#> IV3*    3 16 1.50 0.52    1.5    1.50 0.74  1.00 2.00   1.0  0.00    -2.12 0.13
#> DV      4 16 0.49 1.85    1.0    0.62 1.63 -3.78 3.03   6.8 -0.67    -0.45 0.46
#
#this one is incorrect, because the IVs are not centered
data.df <- char2numeric(data.df,flag=FALSE)
summary(lm(DV~IV1*IV2*IV3,data=data.df)) 
#> 
#> Call:
#> lm(formula = DV ~ IV1 * IV2 * IV3, data = data.df)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -0.8966 -0.3917  0.0000  0.3917  0.8966 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)   
#> (Intercept)   21.168      6.488   3.262  0.01149 * 
#> IV1          -13.992      4.104  -3.410  0.00923 **
#> IV2           -7.921      4.104  -1.930  0.08968 . 
#> IV3          -15.515      4.104  -3.781  0.00538 **
#> IV1:IV2        5.987      2.595   2.307  0.04995 * 
#> IV1:IV3        9.411      2.595   3.626  0.00672 **
#> IV2:IV3        5.395      2.595   2.079  0.07128 . 
#> IV1:IV2:IV3   -3.352      1.641  -2.042  0.07540 . 
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 0.8207 on 8 degrees of freedom
#> Multiple R-squared:  0.8952,	Adjusted R-squared:  0.8035 
#> F-statistic: 9.764 on 7 and 8 DF,  p-value: 0.002274
#> 
data.df <- char2numeric(data.df,flag=FALSE)

summary(aov(DV~IV1*IV2*IV3,data=data.df)) #compare with the lm model
#>             Df Sum Sq Mean Sq F value   Pr(>F)    
#> IV1          1  9.749   9.749  14.473 0.005204 ** 
#> IV2          1 10.337  10.337  15.346 0.004435 ** 
#> IV3          1  2.890   2.890   4.290 0.072097 .  
#> IV1:IV2      1  0.918   0.918   1.362 0.276751    
#> IV1:IV3      1 19.202  19.202  28.507 0.000695 ***
#> IV2:IV3      1  0.134   0.134   0.199 0.667572    
#> IV1:IV2:IV3  1  2.810   2.810   4.171 0.075397 .  
#> Residuals    8  5.389   0.674                     
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#but lmCor by default zero centers which works
lmCor(DV~IV1*IV2*IV3,data=data.df)

#> Call: lmCor(y = DV ~ IV1 * IV2 * IV3, data = data.df)
#> 
#> Multiple Regression from raw data 
#> 
#>  DV =  DV 
#>             slope   se     t       p lower.ci upper.ci VIF Vy.x
#> (Intercept)  0.00 0.11  0.00 1.00000    -0.26     0.26   1 0.00
#> IV1          0.44 0.11  3.80 0.00520     0.17     0.70   1 0.19
#> IV2          0.45 0.11  3.92 0.00440     0.18     0.71   1 0.20
#> IV3         -0.24 0.11 -2.07 0.07200    -0.50     0.03   1 0.06
#> IV1*IV2      0.13 0.11  1.17 0.28000    -0.13     0.40   1 0.02
#> IV1*IV3      0.61 0.11  5.34 0.00069     0.35     0.87   1 0.37
#> IV2*IV3      0.05 0.11  0.45 0.67000    -0.21     0.31   1 0.00
#> IV1*IV2*IV3 -0.23 0.11 -2.04 0.07500    -0.50     0.03   1 0.05
#> 
#> Residual Standard Error =  0.44  with  8  degrees of freedom
#> 
#>  Multiple Regression
#>       R  R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2       p
#> DV 0.95 0.9 0.46 0.21         0.8     0.02      9.76   7   8 0.00227
#now examine multiple levels and quadratic terms
set.seed(42)
data.df <- sim.anova(es1=1,es13=1,n2=3,n3=4,es22=1)
summary(lm(DV~IV1*IV2*IV3,data=data.df))
#> 
#> Call:
#> lm(formula = DV ~ IV1 * IV2 * IV3, data = data.df)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -2.0018 -0.3397  0.0000  0.3397  2.0018 
#> 
#> Coefficients:
#>                 Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)       3.4260     0.7844   4.367 0.000207 ***
#> IV11             -2.7790     1.1093  -2.505 0.019434 *  
#> IV20             -3.0489     1.1093  -2.748 0.011190 *  
#> IV21             -1.2009     1.1093  -1.083 0.289778    
#> IV3-1            -1.5254     1.1093  -1.375 0.181825    
#> IV31             -4.4713     1.1093  -4.031 0.000488 ***
#> IV33             -5.1016     1.1093  -4.599 0.000115 ***
#> IV11:IV20         1.5126     1.5689   0.964 0.344596    
#> IV11:IV21         1.3254     1.5689   0.845 0.406549    
#> IV11:IV3-1        3.2038     1.5689   2.042 0.052272 .  
#> IV11:IV31         6.1556     1.5689   3.924 0.000639 ***
#> IV11:IV33         8.5233     1.5689   5.433  1.4e-05 ***
#> IV20:IV3-1        2.1234     1.5689   1.353 0.188512    
#> IV21:IV3-1        1.1223     1.5689   0.715 0.481280    
#> IV20:IV31         1.3930     1.5689   0.888 0.383389    
#> IV21:IV31         2.2484     1.5689   1.433 0.164711    
#> IV20:IV33         1.5838     1.5689   1.010 0.322782    
#> IV21:IV33         1.5504     1.5689   0.988 0.332899    
#> IV11:IV20:IV3-1  -2.6968     2.2187  -1.215 0.236008    
#> IV11:IV21:IV3-1  -1.2671     2.2187  -0.571 0.573237    
#> IV11:IV20:IV31   -0.4246     2.2187  -0.191 0.849846    
#> IV11:IV21:IV31   -3.3169     2.2187  -1.495 0.147959    
#> IV11:IV20:IV33   -2.4872     2.2187  -1.121 0.273368    
#> IV11:IV21:IV33   -0.6422     2.2187  -0.289 0.774713    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 1.109 on 24 degrees of freedom
#> Multiple R-squared:  0.8651,	Adjusted R-squared:  0.7359 
#> F-statistic: 6.693 on 23 and 24 DF,  p-value: 8.495e-06
#> 
summary(aov(DV~IV1*IV2*IV3,data=data.df))
#>             Df Sum Sq Mean Sq F value   Pr(>F)    
#> IV1          1  36.11   36.11  29.346 1.45e-05 ***
#> IV2          2  32.23   16.12  13.096 0.000143 ***
#> IV3          3  10.76    3.59   2.914 0.054955 .  
#> IV1:IV2      2   0.03    0.01   0.011 0.988728    
#> IV1:IV3      3  98.02   32.67  26.549 8.49e-08 ***
#> IV2:IV3      6   3.71    0.62   0.502 0.800543    
#> IV1:IV2:IV3  6   8.58    1.43   1.162 0.358682    
#> Residuals   24  29.54    1.23                     
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
pairs.panels(data.df)

#
data.df <- sim.anova(es1=1,es2=-.5,within=c(-1,0,1),n=10)
pairs.panels(data.df)