steamUse.RdThe monthly use of steam (Steam) in a factory may be
modeled and described as function of the
operating days per month (Operating.Days) and
mean outside temperature per month (Temperature).
data("steamUse", package="robustbase")A data frame with 25 observations on the following 9 variables.
Steam:regression response \(Y\), the poinds of steam used monthly.
fattyAcid:pounds of Real Fatty Acid in storage per month.
glycerine:pounds of crude glycerine made.
wind:average wind velocity in miles per hour (a numeric vector).
days:an integer vector with number of days of that month, i.e., in \(28..31\).
op.days:the number of operating days for the given month (integer).
freeze.d:the number of days below 32 degrees Fahrenheit (\(= 0\)°C (C=Celsius) \(=\) freezing temperature of water).
temperature:a numeric vector of average outside temperature in Fahrenheit (F).
startups:the number of startups (of production in that month).
Nor further information is given in Draper and Smith, about the place and exacts years of the measurements, though some educated guesses should be possible, see the examples.
Data from Draper and Smith, 1st ed, 1966; appendix A.
A version of this has been used in teaching at SfS ETH Zurich, since at least 1996, https://stat.ethz.ch/Teaching/Datasets/NDK/dsteam.dat
The package aprean3 contains all data sets from the 3rd
edition of Draper and Smith (1998), and this data set with variable
names x1 .. x10 (x9 being wind^2, hence extraneous).
Draper and Smith (1981) Applied Regression Analysis (2nd ed., p. 615 ff)
if (FALSE) { # \dontrun{
if(require("aprean3")) { # show how 'steamUse' is related to 'dsa01a'
stm <- dsa01a
names(stm) <- c("Steam", "fattyAcid", "glycerine", "wind",
"days", "op.days", "freeze.d",
"temperature", "wind.2", "startups")
## prove that wind.2 is wind^2, "traditionally" rounded to 1 digit:
stopifnot(all.equal(floor(0.5 + 10*stm[,"wind"]^2)/10,
stm[,"wind.2"], tol = 1e-14))
## hence drop it
steamUse <- stm[, names(stm) != "wind.2"]
}
} # }
data(steamUse)
str(steamUse)
#> 'data.frame': 25 obs. of 9 variables:
#> $ Steam : num 10.98 11.13 12.51 8.4 9.27 ...
#> $ fattyAcid : num 5.2 5.12 6.19 3.89 6.28 5.76 3.45 6.57 5.69 6.14 ...
#> $ glycerine : num 0.61 0.64 0.78 0.49 0.84 0.74 0.42 0.87 0.75 0.76 ...
#> $ wind : num 7.4 8 7.4 7.5 5.5 8.9 4.1 4.1 4.1 4.5 ...
#> $ days : int 31 29 31 30 31 30 31 31 30 31 ...
#> $ op.days : int 20 20 23 20 21 22 11 23 21 20 ...
#> $ freeze.d : int 22 25 17 22 0 0 0 0 0 0 ...
#> $ temperature: num 35.3 29.7 30.8 58.8 61.4 71.3 74.4 76.7 70.7 57.5 ...
#> $ startups : int 4 5 4 4 5 4 2 5 4 5 ...
## Looking at this,
cbind(M=rep_len(month.abb, 25), steamUse[,5:8, drop=FALSE])
#> M days op.days freeze.d temperature
#> 1 Jan 31 20 22 35.3
#> 2 Feb 29 20 25 29.7
#> 3 Mar 31 23 17 30.8
#> 4 Apr 30 20 22 58.8
#> 5 May 31 21 0 61.4
#> 6 Jun 30 22 0 71.3
#> 7 Jul 31 11 0 74.4
#> 8 Aug 31 23 0 76.7
#> 9 Sep 30 21 0 70.7
#> 10 Oct 31 20 0 57.5
#> 11 Nov 30 20 11 46.4
#> 12 Dec 31 21 12 28.9
#> 13 Jan 31 21 25 28.1
#> 14 Feb 28 19 18 39.1
#> 15 Mar 31 23 5 46.8
#> 16 Apr 30 20 7 48.5
#> 17 May 31 22 0 59.3
#> 18 Jun 30 22 0 70.0
#> 19 Jul 31 11 0 70.0
#> 20 Aug 31 23 0 74.5
#> 21 Sep 30 20 0 72.1
#> 22 Oct 31 21 1 58.1
#> 23 Nov 30 20 14 44.6
#> 24 Dec 31 20 22 33.4
#> 25 Jan 31 22 28 28.6
## one will conjecture that these were 25 months, Jan--Jan in a row,
## starting in a leap year (perhaps 1960 ?).
plot(steamUse)
summary(fm1 <- lmrob(Steam ~ temperature + op.days, data=steamUse))
#>
#> Call:
#> lmrob(formula = Steam ~ temperature + op.days, data = steamUse)
#> \--> method = "MM"
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.2968 -0.2180 0.1365 0.4614 3.8812
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 3.087049 1.419653 2.175 0.0407 *
#> temperature -0.082858 0.006305 -13.142 6.80e-12 ***
#> op.days 0.514717 0.071214 7.228 3.05e-07 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Robust residual standard error: 0.5841
#> Multiple R-squared: 0.8696, Adjusted R-squared: 0.8577
#> Convergence in 9 IRWLS iterations
#>
#> Robustness weights:
#> 2 observations c(7,19) are outliers with |weight| = 0 ( < 0.004);
#> one weight is ~= 1. The remaining 22 ones are summarized as
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.6013 0.9133 0.9772 0.9313 0.9881 0.9987
#> Algorithmic parameters:
#> tuning.chi bb tuning.psi refine.tol
#> 1.548e+00 5.000e-01 4.685e+00 1.000e-07
#> rel.tol scale.tol solve.tol zero.tol
#> 1.000e-07 1.000e-10 1.000e-07 1.000e-10
#> eps.outlier eps.x warn.limit.reject warn.limit.meanrw
#> 4.000e-03 1.395e-10 5.000e-01 5.000e-01
#> nResample max.it best.r.s k.fast.s k.max
#> 500 50 2 1 200
#> maxit.scale trace.lev mts compute.rd fast.s.large.n
#> 200 0 1000 0 2000
#> psi subsampling cov
#> "bisquare" "nonsingular" ".vcov.avar1"
#> compute.outlier.stats
#> "SM"
#> seed : int(0)
## diagnoses 2 outliers: month of July, maybe company-wide summer vacations
## KS2014 alone seems not robust enough:
summary(fm.14 <- lmrob(Steam ~ temperature + op.days, data=steamUse,
setting="KS2014"))
#>
#> Call:
#> lmrob(formula = Steam ~ temperature + op.days, data = steamUse, setting = "KS2014")
#> \--> method = "SMDM"
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.62052 -0.46568 0.09035 0.47131 0.90895
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 9.243976 1.104246 8.371 2.76e-08 ***
#> temperature -0.073052 0.008028 -9.099 6.53e-09 ***
#> op.days 0.200307 0.045753 4.378 0.00024 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Robust residual standard error: 0.6705
#> Multiple R-squared: 0.8499, Adjusted R-squared: 0.8363
#> Convergence in 6 IRWLS iterations
#>
#> Robustness weights:
#> 19 weights are ~= 1. The remaining 6 ones are
#> 3 11 12 17 22 23
#> 0.9477 0.5665 0.9671 0.9875 0.9938 0.8497
#> Algorithmic parameters:
#> tuning.chi1 tuning.chi2 tuning.chi3 tuning.chi4
#> -5.000e-01 1.500e+00 NA 5.000e-01
#> bb tuning.psi1 tuning.psi2 tuning.psi3
#> 5.000e-01 -5.000e-01 1.500e+00 9.500e-01
#> tuning.psi4 refine.tol rel.tol scale.tol
#> NA 1.000e-07 1.000e-07 1.000e-10
#> solve.tol zero.tol eps.outlier eps.x
#> 1.000e-07 1.000e-10 4.000e-03 1.395e-10
#> warn.limit.reject warn.limit.meanrw
#> 5.000e-01 5.000e-01
#> nResample max.it best.r.s k.fast.s k.max
#> 1000 500 20 2 2000
#> maxit.scale trace.lev mts compute.rd numpoints
#> 200 0 1000 0 10
#> fast.s.large.n
#> 2000
#> setting psi subsampling
#> "KS2014" "lqq" "nonsingular"
#> cov compute.outlier.stats
#> ".vcov.w" "SMDM"
#> seed : int(0)
pairs(Steam ~ temperature+op.days, steamUse)