Lags a variable using a formula

Lags a variable using panel id + time identifiers in a formula.

# S3 method for class 'formula'
lag(
  x,
  k = 1,
  data,
  time.step = NULL,
  fill = NA,
  duplicate.method = c("none", "first"),
  ...
)

lag_fml(
  x,
  k = 1,
  data,
  time.step = NULL,
  fill = NA,
  duplicate.method = c("none", "first"),
  ...
)

Arguments

x: A formula of the type var ~ id + time where var is the variable to be lagged, id is a variable representing the panel id, and time is the time variable of the panel.
k: An integer giving the number of lags. Default is 1. For leads, just use a negative number.
data: Optional, the data.frame in which to evaluate the formula. If not provided, variables will be fetched in the current environment.
time.step: The method used to compute the lags. Default is NULL (set automatically). Can be equal to "unitary", "consecutive", "within.consecutive", or to a number. If "unitary", the greatest common divisor of the differences between consecutive time periods is used as the step (typically 1 if the time variable represents years). This method applies only to integer (or convertible to integer) variables. If "consecutive", the time variable can be of any type: two successive time periods represent a lag of 1. If "within.consecutive", then within a given id, two successive time periods represent a lag of 1. Finally, if the time variable is numeric, you can provide your own numeric time step.
fill: Scalar. How to fill the observations without defined lead/lag values. Default is NA.
duplicate.method: If several observations have the same id and time values, then the notion of lag is not defined for them. If duplicate.method = "none" (default) and duplicate values are found, this leads to an error. You can use duplicate.method = "first" so that the first occurrence of identical id/time observations will be used as lag.
...: Not currently used.
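
To make the notion of a panel lag concrete, the following is a minimal base-R sketch, not fixest's implementation. The helper names gcd2, unitary_step and lag_by_match are hypothetical, and computing the step from the pooled distinct time values is an assumption. It derives a "unitary" step as the greatest common divisor of the gaps between time periods, then looks up, for each (id, time), the observation at (id, time - k * step).

```r
# Hypothetical helpers for illustration only; not part of fixest.

# Greatest common divisor of two non-negative whole numbers (Euclid).
gcd2 <- function(a, b) if (b == 0) a else gcd2(b, a %% b)

# "unitary" step: GCD of the gaps between sorted distinct time values.
# Assumption: the gaps are taken over the pooled time values of all ids.
unitary_step <- function(time) {
  gaps <- diff(sort(unique(time)))
  Reduce(gcd2, gaps)
}

# Lag x by k steps: match each (id, time) to (id, time - k * step).
lag_by_match <- function(x, id, time, k = 1,
                         step = unitary_step(time), fill = NA) {
  key    <- paste(id, time)
  target <- paste(id, time - k * step)
  idx <- match(target, key)   # first matching row, NA if none
  idx[is.na(time)] <- NA      # rows with a missing time have no lag
  res <- x[idx]
  res[is.na(idx)] <- fill     # fill only where no lagged row exists
  res
}

base <- data.frame(id = rep(1:2, each = 4),
                   time = c(1, 2, 3, 4, 1, 4, 6, 9), x = 1:8)

unitary_step(base$time)                              # 1
lag_by_match(base$x, base$id, base$time, k = 1)      # NA 1 2 3 NA NA NA NA
lag_by_match(base$x, base$id, base$time, k = -1)     # 2 3 4 NA NA NA NA NA
lag_by_match(base$x, base$id, base$time, 2, fill = 0) # 0 0 1 2 0 0 6 0
```

Because match() returns the first matching row, this sketch behaves like duplicate.method = "first" when (id, time) pairs repeat; it never raises the duplicate error described above.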

Value

It returns a vector of the same type and length as the variable to be lagged in the formula.

Functions

lag_fml(): Lags a variable using a formula syntax

Author

Laurent Berge

Examples

# simple example with an unbalanced panel
base = data.frame(id = rep(1:2, each = 4),
                  time = c(1, 2, 3, 4, 1, 4, 6, 9), x = 1:8)

base$lag1 = lag(x~id+time,  1, base) # lag 1
base$lead1 = lag(x~id+time, -1, base) # lead 1
base$lag2_fill0 = lag(x~id+time, 2, base, fill = 0)
# with time.step = "consecutive"
base$lag1_consecutive = lag(x~id+time, 1, base, time.step = "consecutive")
#   => works for indiv. 2 because 9 (resp. 6) is consecutive to 6 (resp. 4)
base$lag1_within.consecutive = lag(x~id+time, 1, base, time.step = "within")
#   => now, within each indiv., two successive time periods count as one lag
#      ("within" is matched to "within.consecutive")

print(base)
#>   id time x lag1 lead1 lag2_fill0 lag1_consecutive lag1_within.consecutive
#> 1  1    1 1   NA     2          0               NA                      NA
#> 2  1    2 2    1     3          0                1                       1
#> 3  1    3 3    2     4          1                2                       2
#> 4  1    4 4    3    NA          2                3                       3
#> 5  2    1 5   NA    NA          0               NA                      NA
#> 6  2    4 6   NA    NA          0               NA                       5
#> 7  2    6 7   NA    NA          6                6                       6
#> 8  2    9 8   NA    NA          0                7                       7

# Argument time.step = "consecutive" is
# mostly useful when the time variable is not a number:
# e.g. c("1991q1", "1991q2", "1991q3") etc

# with duplicates
base_dup = data.frame(id = rep(1:2, each = 4),
                      time = c(1, 1, 1, 2, 1, 2, 2, 3), x = 1:8)

# Error because of duplicate values for (id, time)
try(lag(x~id+time, 1, base_dup))
#> Error : in lag.formula(x ~ id + time, 1, base_dup): 
#> The panel identifiers contain duplicate values: this is not allowed since
#> lag/leads are not defined for them. For example (id, time) = (1, 1) appears
#> three times. Please provide data without duplicates -- or you can also use
#> duplicate.method = 'first' (see Details).


# Error is bypassed, lag corresponds to first occurrence of (id, time)
lag(x~id+time, 1, base_dup, duplicate.method = "first")
#> [1] NA NA NA  1 NA  5  5  6


# Playing with time steps
base = data.frame(id = rep(1:2, each = 4),
                  time = c(1, 2, 3, 4, 1, 4, 6, 9), x = 1:8)

# time step: 0.5 (k = 2 steps of 0.5 is here equivalent to a lag of 1)
lag(x~id+time, 2, base, time.step = 0.5)
#> [1] NA  1  2  3 NA NA NA NA

# Error: wrong time step
try(lag(x~id+time, 2, base, time.step = 7))
#> Error : in lag.formula(x ~ id + time, 2, base, time.step = 7): 
#> If 'time.step' is a number, then it must be an exact divisor of all the
#> difference between two consecutive time periods. This is currently not the
#> case: 7 is not a divisor of 1 (the difference btw the time periods 2 and 1).

# Adding NAs + unsorted time values within IDs
base = data.frame(id = rep(1:2, each = 4),
                  time = c(4, NA, 3, 1, 2, NA, 1, 3), x = 1:8)

base$lag1 = lag(x~id+time, 1, base)
base$lag1_within = lag(x~id+time, 1, base, time.step = "w")
base_bis = base[order(base$id, base$time),]

print(base_bis)
#>   id time x lag1 lag1_within
#> 4  1    1 4   NA          NA
#> 3  1    3 3   NA           4
#> 1  1    4 1    3           3
#> 2  1   NA 2   NA          NA
#> 7  2    1 7   NA          NA
#> 5  2    2 5    7           7
#> 8  2    3 8    5           5
#> 6  2   NA 6   NA          NA

# You can create variables without specifying the data within data.table:
if(require("data.table")){
  base = data.table(id = rep(1:2, each = 3), year = 1990 + rep(1:3, 2), x = 1:6)
  base[, x.l1 := lag(x~id+year, 1)]
}
#>       id  year     x  x.l1
#>    <int> <num> <int> <int>
#> 1:     1  1991     1    NA
#> 2:     1  1992     2     1
#> 3:     1  1993     3     2
#> 4:     2  1991     4    NA
#> 5:     2  1992     5     4
#> 6:     2  1993     6     5
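
The remark above about non-numeric time variables can be tried with a short sketch such as the following. The data set base_q and its values are made up for illustration, and the example assumes the fixest package is installed. It uses lag_fml(), which the Usage section lists with the same arguments as the formula method of lag(), so the call does not depend on which package's lag() is currently visible (dplyr::lag, for instance, masks stats::lag when dplyr is attached).

```r
library(fixest)

# Illustrative quarterly panel: the time variable is a character string,
# so "unitary" steps do not apply and "consecutive" is requested explicitly.
quarters <- c("1991q1", "1991q2", "1991q3", "1991q4")
base_q <- data.frame(id = rep(1:2, each = 4),
                     time = rep(quarters, 2),
                     x = 1:8)

# Two successive quarters represent a lag of 1
base_q$lag1  <- lag_fml(x ~ id + time,  1, base_q, time.step = "consecutive")
base_q$lead1 <- lag_fml(x ~ id + time, -1, base_q, time.step = "consecutive")

print(base_q)
```

The quarter labels are chosen so that their alphabetical order coincides with their chronological order; labels without that property (for example "q1-1991") would likely need to be converted to an ordered type first.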
