Vector Cubic Smoothing Spline

Fits a vector cubic smoothing spline.

vsmooth.spline(x, y, w = NULL, df = rep(5, M), spar = NULL,
               i.constraint = diag(M),
               x.constraint = diag(M),
               constraints = list("(Intercepts)" = i.constraint,
                                  x = x.constraint),
               all.knots = FALSE, var.arg = FALSE, scale.w = TRUE,
               nk = NULL, control.spar = list())

Arguments

x: A vector, matrix or a list. If a list, the x component is used. If a matrix, the first column is used. x may also be a complex vector, in which case the real part is used, and the imaginary part is used for the response. In this help file, n is the number of unique values of x.
y: A vector, matrix or a list. If a list, the y component is used. If a matrix, all but the first column are used. In this help file, M is the number of columns of y if there are no constraints on the functions.
w: The weight matrices or the number of observations. If weight matrices are supplied, w must be an n-row matrix with the elements in matrix-band form (see iam). If w is a vector, its elements are the numbers of observations. By default, each weight matrix is the M by M identity matrix, represented by matrix(1, n, M).
df: Numerical vector containing the degrees of freedom for each component function (smooth). If necessary, the vector is recycled to have length equal to the number of component functions to be estimated (M if there are no constraints), which equals the number of columns of the x-constraint matrix. A value of 2 means a linear fit, and each element of df should lie between 2 and n. The larger the values of df the more wiggly the smooths.
spar: Numerical vector containing the non-negative smoothing parameters for each component function (smooth). If necessary, the vector is recycled to have length equal to the number of component functions to be estimated (M if there are no constraints), which equals the number of columns of the x-constraint matrix. A value of zero means the smooth goes through the data and hence is wiggly. A value of Inf may be assigned, meaning the smooth will be linear. By default, the NULL value of spar means df is used to determine the smoothing parameters.
all.knots: Logical. If TRUE then each distinct value of x will be a knot. By default, only a subset of the unique values of x are used; typically, the number of knots is O(n^0.25) for n large, but if n <= 40 then all the unique values of x are used.
i.constraint: An M-row constraint matrix for the intercepts. It must be of full column rank. By default, the constraint matrix for the intercepts is the M by M identity matrix, meaning no constraints.
x.constraint: An M-row constraint matrix for x. It must be of full column rank. By default, the constraint matrix for x is the M by M identity matrix, meaning no constraints.
constraints: An alternative to specifying i.constraint and x.constraint, this is a list with two components corresponding to the intercept and x respectively. Each must be an M-row constraint matrix with full column rank.
var.arg: Logical: return the pointwise variances of the fit? Currently, this corresponds only to the nonlinear part of the fit, and may be wrong.
scale.w: Logical. By default, the weights w are scaled so that the diagonal elements have mean 1.
nk: Number of knots. If used, this argument overrides all.knots, and must lie between 6 and n+2 inclusive.
control.spar: See smooth.spline.
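
The matrix-band form required for w can be easier to see with a small example. The sketch below uses base R only and assumes the layout documented under iam: the diagonal elements come first, followed by each successive off-diagonal band. The helper to_band() is hypothetical and is not part of VGAM.

```r
# Hypothetical helper (not part of VGAM): pack a symmetric M x M matrix
# into a vector of length M * (M + 1) / 2, diagonal first, then each
# successive off-diagonal band (assumed to match the layout used by iam).
to_band <- function(W) {
  M <- nrow(W)
  out <- numeric(0)
  for (band in 0:(M - 1)) {
    rows <- seq_len(M - band)
    out <- c(out, W[cbind(rows, rows + band)])
  }
  out
}

W <- matrix(c(4.0, 1.0, 0.5,
              1.0, 3.0, 0.2,
              0.5, 0.2, 2.0), 3, 3, byrow = TRUE)
band <- to_band(W)  # order: (1,1), (2,2), (3,3), (1,2), (2,3), (1,3)

# An n-row weight matrix that repeats the same W at every observation
n <- 5
w <- matrix(band, nrow = n, ncol = length(band), byrow = TRUE)
dim(w)
```

Each row of w holds one weight matrix, so observation-specific weights would simply use a different row for each observation.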

Details

The algorithm implemented is detailed in Yee (2000). It involves decomposing the component functions into a linear and nonlinear part, and using B-splines. The cost of the computation is O(n M^3).
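
To give a feel for the linear/nonlinear split mentioned above, here is a univariate sketch using base R's smooth.spline. It is only an illustration of the idea under that assumption, not the algorithm of Yee (2000) and not vsmooth.spline's internals: the fitted smooth is projected onto an intercept and slope, and what remains is treated as the nonlinear part.

```r
set.seed(1)
x <- seq(0, 1, length.out = 50)
y <- sin(2 * pi * x) + rnorm(50, sd = 0.2)

fit  <- smooth.spline(x, y, df = 5)
fhat <- predict(fit, x)$y

# Linear part: least-squares projection of the smooth onto span{1, x}
lin <- unname(fitted(lm(fhat ~ x)))

# Nonlinear part: whatever the straight line cannot capture
nonlin <- fhat - lin

# The nonlinear part is orthogonal to both the constant and x
c(sum(nonlin), sum(nonlin * x))
```

Both printed sums should be zero up to rounding error, because the nonlinear part is the residual of a least-squares fit on an intercept and x.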

The argument spar contains scaled smoothing parameters.

Value

An object of class "vsmooth.spline" (see vsmooth.spline-class).

References

Yee, T. W. (2000). Vector Splines and Other Vector Smoothers. Pages 529–534. In: Bethlehem, J. G. and van der Heijde, P. G. M. Proceedings in Computational Statistics COMPSTAT 2000. Heidelberg: Physica-Verlag.

Author

Thomas W. Yee

Note

This function is quite similar to smooth.spline but offers less functionality. For example, cross validation is not implemented here. For M = 1, the results will generally differ, mainly because the knots are selected differently.
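
How much knot selection alone can matter is easy to check without VGAM. The sketch below uses only base R's smooth.spline, fitting the same data once with its default knot subset and once with all.knots = TRUE. It does not reproduce vsmooth.spline's own knot rule; the simulated data and object names are illustrative assumptions.

```r
set.seed(123)
n <- 200
x <- seq(0, 1, length.out = n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.2)

fit_sub <- smooth.spline(x, y, df = 5)                    # default knot subset
fit_all <- smooth.spline(x, y, df = 5, all.knots = TRUE)  # knot at each unique x

# Number of B-spline coefficients used by each fit
c(subset = fit_sub$fit$nk, all = fit_all$fit$nk)

# Largest pointwise difference between the two fitted curves
max(abs(predict(fit_sub, x)$y - predict(fit_all, x)$y))
```

The two fits target the same degrees of freedom, so any difference in the fitted curves comes from the knot sequence.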

The vector cubic smoothing spline which s() represents is computationally demanding for large \(M\). The cost is approximately \(O(n M^3)\) where \(n\) is the number of unique abscissae.

Yet to be done: return the unscaled smoothing parameters.

WARNING

See vgam for information about an important bug.

Examples

nn <- 20; x <- 2 + 5*(nn:1)/nn
x[2:4] <- x[5:7]  # Allow duplication
y1 <- sin(x) + rnorm(nn, sd = 0.13)
y2 <- cos(x) + rnorm(nn, sd = 0.13)
y3 <- 1 + sin(x) + rnorm(nn, sd = 0.13)  # For constraints
y <- cbind(y1, y2, y3)
ww <- cbind(rep(3, nn), 4, (1:nn)/nn)

(fit <- vsmooth.spline(x, y, w = ww, df = 5))
#> Call:
#> vsmooth.spline(x = x, y = y, w = ww, df = 5)
#> 
#> Smoothing Parameter (Spar): 0.5534466, 0.5534466, 0.5359578 
#> 
#> Equivalent Degrees of Freedom (Df): 4.999474, 4.999474, 4.999402 
if (FALSE) { # \dontrun{
plot(fit)  # The 1st & 3rd functions don't differ by a constant
} # }

mat <- matrix(c(1,0,1, 0,1,0), 3, 2)
(fit2 <- vsmooth.spline(x, y, w = ww, df = 5, i.constr = mat,
                        x.constr = mat))
#> Call:
#> vsmooth.spline(x = x, y = y, w = ww, df = 5, i.constraint = mat, 
#>     x.constraint = mat)
#> 
#> Smoothing Parameter (Spar): 0.5535428, 0.5534466 
#> 
#> Equivalent Degrees of Freedom (Df): 4.999470, 4.999474 
#> 
#> Constraint matrices:
#> $`(Intercepts)`
#>      [,1] [,2]
#> [1,]    1    0
#> [2,]    0    1
#> [3,]    1    0
#> 
#> $x
#>      [,1] [,2]
#> [1,]    1    0
#> [2,]    0    1
#> [3,]    1    0
#> 
# The 1st and 3rd functions do differ by a constant:
mycols <- c("orange", "blue", "orange")
if (FALSE)  plot(fit2, lcol = mycols, pcol = mycols, las = 1)  # \dontrun{}

p <- predict(fit, x = model.matrix(fit, type = "lm"), deriv = 0)
max(abs(depvar(fit) - with(p, y)))  # Should be 0
#> [1] 6.033285e-11

par(mfrow = c(3, 1))
ux <- seq(1, 8, len = 100)
for (dd in 1:3) {
  pp <- predict(fit, x = ux, deriv = dd)
  if (FALSE) { # \dontrun{
    with(pp, matplot(x, y, type = "l", main = paste("deriv =", dd),
                     lwd = 2, ylab = "", cex.axis = 1.5,
                     cex.lab = 1.5, cex.main = 1.5))
  } # }
}
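
The constraint matrix mat used for fit2 can be inspected with base R alone. The sketch below checks the full-column-rank requirement and shows how two underlying functions are mapped onto three components. The values in g are made up for illustration, and the mapping "components = mat %*% g" is stated here as an assumption about how constraint matrices act.

```r
mat <- matrix(c(1, 0, 1,  0, 1, 0), 3, 2)

# Full column rank: the rank must equal the number of columns
full_rank <- qr(mat)$rank == ncol(mat)
full_rank

# Hypothetical values of the two estimated functions at a single x
g <- c(0.7, -0.3)

# Components 1 and 3 share the first function; component 2 uses the second
comp <- drop(mat %*% g)
comp
```

This matches the printed output for fit2, which reports two smoothing parameters rather than three.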