RepeatedMedian.Rd
Computes the repeated median slope proposed by Siegel (1982) using the algorithm by Matousek et. al (1998). The algorithm runs in an expected \(O(n (log n)^2)\) time, which is typically significantly faster than the \(O(n^2)\) computational cost of the naive algorithm, and requires \(O(n)\) storage.
RepeatedMedian(x, y, alpha = NULL, beta = NULL, verbose = TRUE)
A vector of predictor values.
A vector of response values.
Determines the outer order statistic, which is equal to \([alpha*n]\), where \(n\) denotes the sample size. Defaults to NULL
, which corresponds
with the (upper) median.
Determines the inner order statistic, which is equal to \([beta*(n-1)]\), where \(n\) denotes the sample size. Defaults to NULL
, which corresponds
with the (upper) median.
Whether or not to print out the progress of the algorithm. Defaults to TRUE
.
Given two input vectors x
and y
of length \(n\), the repeated median is computed as \(med_i med_j (y_i - y_j)/(x_i-x_j)\). The default "outer” median is the \(\lfloor (n + 2) / 2 \rfloor\) largest element in the ordered median slopes. The inner median, which for each observation is calculated as the median of the slopes connected to this observation, is the \(\lfloor (n +1) / 2 \rfloor\) largest element in the ordered slopes. By changing alpha
and beta
, other repeated order statistics of the slopes can be computed.
A list with elements:
The estimate of the intercept.
The Theil-Sen estimate of the slope.
Siegel, A. F. (1982). Robust regression using repeated medians. Biometrika, 69(1), 242-244.
Matousek, J., Mount, D. M., & Netanyahu, N. S. (1998). Efficient randomized algorithms for the repeated median line estimator. Algorithmica, 20(2), 136-150.
Raymaekers (2023). "The R Journal: robslopes: Efficient Computation of the (Repeated) Median Slope", The R Journal. (link to open access pdf)
# We compare the implemented algorithm against a naive brute-force approach.
bruteForceRM <- function(x, y) {
n <- length(x)
medind1 <- floor((n+2) / 2)
medind2 <- floor((n+1) / 2)
temp <- t(sapply(1:n, function(z) sort(apply(cbind(x, y), 1 ,
function(k) (k[2] - y[z]) /
(k[1] - x[z])))))
RMslope <- sort(temp[, medind2])[medind1]
RMintercept <- sort(y - x * RMslope)[medind1]
return(list(intercept = RMintercept, slope = RMslope))
}
n = 100
set.seed(2)
x = rnorm(n)
y = x + rnorm(n)
t0 <- proc.time()
RM.fast <- RepeatedMedian(x, y, NULL, NULL, FALSE)
t1 <- proc.time()
t1 - t0
#> user system elapsed
#> 0.001 0.000 0.001
t0 <- proc.time()
RM.naive <- bruteForceRM(x, y)
t1 <- proc.time()
t1 - t0
#> user system elapsed
#> 0.053 0.000 0.052
RM.fast$slope - RM.naive$slope
#> [1] -1.110223e-16