GP_fit.Rd
For an (n x d) design matrix, X
,
and the corresponding (n x 1) simulator output Y
,
this function fits the GP model and returns the parameter estimates.
The optimization routine assumes that
the inputs are scaled to the unit hypercube \([0,1]^d\).
the (n x d
) design matrix
the (n x 1
) vector of simulator outputs.
a vector of parameters used in the search for optimal beta (search grid size, percent, number of clusters). See `Details'.
a parameter used in computing the nugget. See `Details'.
logical, if TRUE
, will provide information on the
optim
runs
the maximum number of iterations within optim
,
defaults to 100
a list of parameters for the specifing the correlation to be
used. See corr_matrix
.
a matrix of potentially likely starting values for
correlation hyperparameters for the optim
runs, i.e., initial
guess of the d-vector beta
an object of class GP
containing parameter estimates
beta
and sig2
, nugget parameter delta
, the data
(X
and Y
), and a specification of the correlation structure
used.
This function fits the following GP model,
\(y(x) = \mu + Z(x)\),
\(x \in [0,1]^{d}\), where \(Z(x)\) is
a GP with mean 0, \(Var(Z(x_i)) = \sigma^2\), and
\(Cov(Z(x_i),Z(x_j)) = \sigma^2R_{ij}\). Entries in covariance matrix R are determined by
corr
and parameterized by beta
, a d
-vector of
parameters. For computational stability \(R^{-1}\) is replaced with
\(R_{\delta_{lb}}^{-1}\), where \(R_{\delta{lb}} = R + \delta_{lb}I\)
and \(\delta_{lb}\) is the nugget parameter described in Ranjan et al.
(2011).
The parameter estimate beta
is obtained by minimizing
the deviance using a multi-start gradient based search (L-BFGS-B)
algorithm. The starting points are selected using the k-means
clustering algorithm on a large maximin LHD for values of
beta
, after discarding beta
vectors
with high deviance. The control
parameter determines the
quality of the starting points of the L-BFGS-B algoritm.
control
is a vector of three tunable parameters used
in the deviance optimization algorithm. The default values
correspond to choosing 2*d clusters (using k-means clustering
algorithm) based on 80*d best points (smallest deviance,
refer to GP_deviance
) from a 200*d - point
random maximin LHD in beta
. One can change these values
to balance the trade-off between computational cost and robustness
of likelihood optimization (or prediction accuracy).
For details see MacDonald et al. (2015).
The nug_thres
parameter is outlined in Ranjan et al. (2011) and is
used in finding the lower bound on the nugget
(\(\delta_{lb}\)).
MacDonald, K.B., Ranjan, P. and Chipman, H. (2015). GPfit: An R
Package for Fitting a Gaussian Process Model to Deterministic Simulator
Outputs. Journal of Statistical Software, 64(12), 1-23.
http://www.jstatsoft.org/v64/i12/
Ranjan, P., Haynes, R., and Karsten, R. (2011). A Computationally Stable Approach to Gaussian Process Interpolation of Deterministic Computer Simulation Data, Technometrics, 53(4), 366 - 378.
plot
for plotting in 1 and 2 dimensions; predict
for predicting the response and error surfaces; optim
for information on the L-BFGS-B procedure; GP_deviance
for computing the deviance.
## 1D Example 1
n = 5
d = 1
computer_simulator <- function(x){
x = 2 * x + 0.5
y = sin(10 * pi * x) / (2 * x) + (x - 1)^4
return(y)
}
set.seed(3)
library(lhs)
x = maximinLHS(n, d)
y = computer_simulator(x)
GPmodel = GP_fit(x, y)
print(GPmodel)
#>
#> Number Of Observations: n = 5
#> Input Dimensions: d = 1
#>
#> Correlation: Exponential (power = 1.95)
#> Correlation Parameters:
#> beta_hat
#> [1] 0.6433793
#>
#> sigma^2_hat: [1] 7.262407
#>
#> delta_lb(beta_hat): [1] 0
#>
#> nugget threshold parameter: 20
#>
## 1D Example 2
n = 7
d = 1
computer_simulator <- function(x) {
y <- log(x + 0.1) + sin(5 * pi * x)
return(y)
}
set.seed(1)
library(lhs)
x = maximinLHS(n, d)
y = computer_simulator(x)
GPmodel = GP_fit(x, y)
print(GPmodel, digits = 4)
#>
#> Number Of Observations: n = 7
#> Input Dimensions: d = 1
#>
#> Correlation: Exponential (power = 1.95)
#> Correlation Parameters:
#> beta_hat
#> [1] 1.698
#>
#> sigma^2_hat: [1] 0.8583
#>
#> delta_lb(beta_hat): [1] 0
#>
#> nugget threshold parameter: 20
#>
## 2D Example: GoldPrice Function
computer_simulator <- function(x) {
x1 = 4 * x[, 1] - 2
x2 = 4 * x[, 2] - 2
t1 = 1 + (x1 + x2 + 1)^2 * (19 - 14 * x1 + 3 * x1^2 - 14 * x2 +
6 * x1 *x2 + 3 * x2^2)
t2 = 30 + (2 * x1 - 3 * x2)^2 * (18 - 32 * x1 + 12 * x1^2 + 48 * x2 -
36 * x1 * x2 + 27 * x2^2)
y = t1 * t2
return(y)
}
n = 30
d = 2
set.seed(1)
library(lhs)
x = maximinLHS(n, d)
y = computer_simulator(x)
GPmodel = GP_fit(x, y)
#> Warning: NaNs produced
#> Error in GP_deviance(beta = row, X = X, Y = Y, nug_thres = nug_thres, corr = corr): Infinite values of the Deviance Function,
#> unable to find optimum parameters
print(GPmodel)
#>
#> Number Of Observations: n = 7
#> Input Dimensions: d = 1
#>
#> Correlation: Exponential (power = 1.95)
#> Correlation Parameters:
#> beta_hat
#> [1] 1.69817
#>
#> sigma^2_hat: [1] 0.858342
#>
#> delta_lb(beta_hat): [1] 0
#>
#> nugget threshold parameter: 20
#>