Prediction function for factor analysis, principal components (pca), bestScales

Finds predicted factor or component scores for data set B from a factor analysis or principal components analysis (pca) of data set A. Predicted factor scores use the weights matrix that was used to find the estimated factor scores; predicted component scores use the loadings matrix. Scores are standardized with respect to either the prediction sample or the original data. Predicted scores from a bestScales model are based upon the statistics from the original sample.

# S3 method for class 'psych'
predict(object, data, old.data, options=NULL, missing=FALSE, impute="none", ...)

Arguments

object: the result of a factor analysis, principal components analysis (pca) or bestScales of data set A
data: Data set B, with the same number of variables as data set A.
old.data: If specified, data set B will be standardized using the means and standard deviations of the old data. This is probably the preferred option. It is done automatically if object is from bestScales.
options: scoring options for bestScales objects ("best.keys","weights","optimal.keys","optimal.weights")
missing: If missing=FALSE, cases with missing data are given NA scores; otherwise their scores are based upon the weights x the non-missing data.
impute: Should missing cases be replaced by "means" or "medians", or treated as missing ("none" is the default)?
...: More options to pass to predictions

Value

Predicted factor/component/criteria scores. If predicting from either fa or pca, the scores are based upon standardized items, where the standardization is either that of the original data (old.data) or that of the prediction set. The latter case can lead to confusion if only a small number of predicted scores are found.
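The difference between the two standardizations can be illustrated with a minimal base R sketch. This is not the psych implementation: the matrices A and B are simulated and the weights w are hypothetical, chosen only to show why standardizing a small prediction set against itself can mislead.

```r
set.seed(1)
A <- matrix(rnorm(200 * 3), ncol = 3)            # derivation sample (data set A)
B <- matrix(rnorm(5 * 3, mean = 2), ncol = 3)    # small prediction sample, shifted upward
w <- c(0.5, 0.3, 0.2)                            # hypothetical weights, for illustration only

# Standardize B with the means and sds of A (analogous to supplying old.data)
z_old <- scale(B, center = colMeans(A), scale = apply(A, 2, sd))

# Standardize B with its own means and sds (analogous to omitting old.data)
z_self <- scale(B)

score_old  <- as.vector(z_old %*% w)
score_self <- as.vector(z_self %*% w)

mean(score_self)  # zero up to rounding error, by construction
mean(score_old)   # keeps the upward shift of B relative to A
```

With self-standardization the five scores must average zero, so any real difference between the two samples is removed; standardizing with the derivation sample preserves it.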

If the object is from bestScales, unit-weighted scales are found (by default) using the best.keys, and the predicted scores are then put into the metric of the means and standard deviations of the derivation sample. Other scoring key options may be specified using the "options" parameter. Possible values are "best.keys", "weights", "optimal.keys", "optimal.weights". See bestScales for details.

By default, predicted scores are found by the matrix product of the standardized data with the factor or regression weights. If missing is TRUE, then the predicted scores are the mean of the standardized data x weights for those data points that are not NA.
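The two scoring rules described above can be sketched in base R. This is one reading of the description, not the package's code: the standardized data z and the single column of weights w are hypothetical values picked so the arithmetic is easy to follow.

```r
# Two cases, three standardized items; the second case is missing item 2
z <- matrix(c(1.0, -0.5,  0.2,
              0.3,   NA, -1.1), nrow = 2, byrow = TRUE)
w <- matrix(c(0.6, 0.3, 0.1), ncol = 1)  # hypothetical weights for one factor

# Default rule: plain matrix product; any NA in a row yields an NA score
s_default <- z %*% w

# Rule described for missing = TRUE: average the item-by-weight products
# over the items that are not NA
products  <- sweep(z, 2, as.vector(w), `*`)
s_missing <- rowMeans(products, na.rm = TRUE)

s_default
s_missing
```

Because the second rule takes a mean rather than a sum, its scores are on a different scale from the matrix product, so the two sets of scores are not directly comparable in magnitude.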

Author

William Revelle

Note

Thanks to Reinhold Hatzinger for the suggestion and request and to Sarah McDougald for the bestScales prediction.

Examples

set.seed(42)
x <- sim.item(12,500)
f2 <- fa(x[1:250,],2,scores="regression")  # a two factor solution
p2 <- principal(x[1:250,],2,scores=TRUE)  # a two component solution
round(cor(f2$scores,p2$scores),2)  #correlate the components and factors from the A set
#>      RC1  RC2
#> MR1 1.00 0.07
#> MR2 0.05 1.00
#find the predicted scores (the B set)
pf2 <- predict(f2,x[251:500,],x[1:250,])  #use the original data for standardization values
pp2 <- predict(p2,x[251:500,],x[1:250,])  #standardized based upon the first set
round(cor(pf2,pp2),2)   #find the correlations in the B set
#>       RC1   RC2
#> MR1  1.00 -0.02
#> MR2 -0.02  1.00
#test how well these predicted scores match the factor scores from the second set
fp2 <- fa(x[251:500,],2,scores=TRUE)
round(cor(fp2$scores,pf2),2)
#>       MR1   MR2
#> MR1  0.01 -0.98
#> MR2 -0.98  0.04

pf2.n <- predict(f2,x[251:500,])  #Standardized based upon the new data set
round(cor(fp2$scores,pf2.n))   
#>     MR1 MR2
#> MR1   0  -1
#> MR2  -1   0
#predict factors of set two from factors of set 1, factor order is arbitrary
#note that the signs of the factors in the second set are arbitrary
# \donttest{
#predictions from bestScales
#the derivation sample
bs <- bestScales(bfi[1:1400,], cs(gender,education,age),folds=10,p.keyed=.5) 
#> Number of iterations set to the number of folds =  10
pred <- predict(bs,bfi[1401:2800,]) #The prediction sample
cor2(pred,bfi[1401:2800,26:28] ) #the validity of the prediction
#>           gender education  age
#> gender      0.27     -0.01 0.07
#> education   0.05      0.16 0.12
#> age         0.10      0.06 0.22
summary(bs) #compare with bestScales cross validations
#> 
#> Call = bestScales(x = bfi[1:1400, ], criteria = cs(gender, education, 
#>     age), folds = 10, p.keyed = 0.5)
#>           derivation.mean derivation.sd validation.m validation.sd final.valid
#> gender               0.33        0.0239         0.29         0.091        0.31
#> education            0.17        0.0094         0.17         0.077        0.17
#> age                  0.25        0.0108         0.24         0.121        0.25
#>           final.wtd N.wtd
#> gender         0.31    10
#> education      0.19    10
#> age            0.25    10
# }