Theil-Sen slope and intercept estimator.

Computes the Theil-Sen median slope estimator by Theil (1950) and Sen (1968). The implemented algorithm was proposed by Dillencourt et. al (1992) and runs in an expected \(O(n log n)\) time while requiring \(O(n)\) storage.

TheilSen(x, y, alpha = NULL, verbose = TRUE)

Arguments

x: A vector of predictor values.
y: A vector of response values.
alpha: Determines the order statistic of the target slope, which is equal to \([alpha*n*(n-1)]\), where \(n\) denotes the sample size. Defaults to NULL, which corresponds with the (upper) median.
verbose: Whether or not to print out the progress of the algorithm. Defaults to TRUE.

Details

Given two input vectors x and y of length \(n\), the Theil-Sen estimator is computed as \(med_{ij} (y_i - y_j)/(x_i-x_j)\). By default, the median in this experssion is the upper median, defined as \(\lfloor (n +2) / 2 \rfloor\). By changing alpha, other order statistics of the slopes can be computed.

Value

A list with elements:

intecept: The estimate of the intercept.
slope: The Theil-Sen estimate of the slope.

References

Theil, H. (1950), A rank-invariant method of linear and polynomial regression analysis (Parts 1-3), Ned. Akad. Wetensch. Proc. Ser. A, 53, 386-392, 521-525, 1397-1412.

Sen, P. K. (1968). Estimates of the regression coefficient based on Kendall's tau. Journal of the American statistical association, 63(324), 1379-1389.

Dillencourt, M. B., Mount, D. M., & Netanyahu, N. S. (1992). A randomized algorithm for slope selection. International Journal of Computational Geometry & Applications, 2(01), 1-27.

Raymaekers (2023). "The R Journal: robslopes: Efficient Computation of the (Repeated) Median Slope", The R Journal. (link to open access pdf)

Author

Jakob Raymaekers

Examples

# We compare the implemented algorithm against a naive brute-force approach.

bruteForceTS <- function(x, y) {
  
  n <- length(x)
  medind1 <- floor(((n * (n - 1)) / 2 + 2) / 2)
  medind2 <- floor((n + 2) / 2)
  temp <-  t(sapply(1:n, function(z)  apply(cbind(x, y), 1 ,
                                                  function(k) (k[2] - y[z]) /
                                                    (k[1] - x[z]))))
  TSslope <- sort(as.vector(temp[lower.tri(temp)]))[medind1]
  TSintercept <- sort(y - x * TSslope)[medind2]
  return(list(intercept = TSintercept, slope = TSslope))
}


n = 100
set.seed(2)
x = rnorm(n)
y = x + rnorm(n)

t0 <- proc.time()
TS.fast <- TheilSen(x, y, NULL, FALSE)
t1 <- proc.time()
t1 - t0
#>    user  system elapsed 
#>   0.001   0.000   0.002 

t0 <- proc.time()
TS.naive <- bruteForceTS(x, y)
t1 <- proc.time()
t1 - t0
#>    user  system elapsed 
#>   0.051   0.000   0.050 

TS.fast$slope - TS.naive$slope
#> [1] 0