Constructs a fixest panel data base out of a data.frame. This allows the use of leads and lags
in fixest estimations and, if the data.frame is also a data.table::data.table, the creation
of new variables from leads and lags.
panel(data, panel.id, time.step = NULL, duplicate.method = "none")
data
A data.frame.
panel.id
The panel identifiers. Can be either: i) a one-sided formula
(e.g. panel.id = ~id+time), ii) a character vector of length 2
(e.g. panel.id = c('id', 'time')), or iii) a character scalar naming the two variables
separated by a comma (e.g. panel.id = 'id,time'). Note that you can combine variables
with ^ only inside formulas (see the dedicated section in feols).
time.step
The method to compute the lags. The default is NULL (which means it is set
automatically). Can be equal to: "unitary", "consecutive", "within.consecutive",
or to a number. If "unitary", then the greatest common divisor of the differences between
consecutive time periods is used (typically, if the time variable represents years, it will be 1).
This method applies only to integer (or convertible to integer) variables.
If "consecutive", then the time variable can be of any type: two successive
time periods represent a lag of 1. If "within.consecutive", then within a given id,
two successive time periods represent a lag of 1. Finally, if the time variable is numeric,
you can provide your own numeric time step.
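The three named time steps can be compared on a toy example. The following is a minimal base-R sketch of the ideas described above, not fixest's actual implementation; the helper gcd2 and the index vectors are hypothetical names introduced for illustration only.

```r
# Illustration only: gcd2() and the index vectors below are hypothetical,
# they are not part of fixest.
gcd2 = function(a, b) if (b == 0) a else gcd2(b, a %% b)

id   = c(1, 1, 1, 2, 2)
time = c(2000, 2002, 2006, 2002, 2008)
periods = sort(unique(time))          # 2000 2002 2006 2008

# "unitary": the step is the GCD of the gaps between successive periods
step_unitary = Reduce(gcd2, diff(periods))          # gaps 2, 4, 2 give 2
idx_unitary  = (time - min(time)) / step_unitary    # 0 1 3 1 4

# "consecutive": successive periods of the whole sample are one lag apart
idx_consecutive = match(time, periods)              # 1 2 3 2 4

# "within.consecutive": successive periods of each id are one lag apart
idx_within = ave(time, id, FUN = function(t) match(t, sort(unique(t))))
# 1 2 3 1 2
```

In this sketch, a lag of 1 points to the observation of the same id whose index is one lower. For id 2, the 2008 observation has the 2002 observation as its lag only with the "within.consecutive" indexing. With the two other indexings its lag would be the 2006 period, which id 2 does not have.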
duplicate.method
If several observations have the same id and time values,
then the notion of lag is not defined for them. If duplicate.method = "none" (default)
and duplicate values are found, this leads to an error. You can use
duplicate.method = "first" so that the first occurrence of identical id/time
observations is the one used as the lag.
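Before setting up a panel, it can help to check whether the data contains duplicated id/time pairs. The following base-R sketch uses made-up toy data (the data frame df and its columns are assumptions for illustration). It only shows which rows count as first occurrences, not what panel() itself returns.

```r
# Illustration only: toy data, not taken from fixest.
df = data.frame(id   = c(1, 1, 1, 2, 2),
                time = c(1, 2, 2, 1, 2),
                x    = c(10, 20, 25, 30, 40))

# TRUE for every row whose id/time pair already appeared in an earlier row
is_dup = duplicated(df[, c("id", "time")])
has_dup = any(is_dup)           # TRUE: id 1 has two rows at time 2

# First occurrence of each id/time pair
df_first = df[!is_dup, ]
```

If has_dup is TRUE, the default duplicate.method = "none" would be expected to stop with an error, as described above.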
It returns a data base identical to the one given in input, but with an additional attribute: “panel_info”. This attribute contains vectors used to efficiently create lags/leads of the data. When the data is subset, some bookkeeping is performed on the attribute “panel_info”.
This function allows you to use leads and lags in a fixest estimation without having to
provide the argument panel.id. It also offers more options on how to set the panel
(with the additional arguments 'time.step' and 'duplicate.method').
When the initial data set was also a data.table, not all operations are supported and some may
dissolve the fixest_panel. This is the case when creating subselections of the initial data
with additional attributes (e.g. pdt[x>0, .(x, y, z)] would dissolve the fixest_panel,
meaning only a data.table would be the result of the call).
If the initial data set was also a data.table, then you can create new variables from lags
and leads using the functions l and f. See the example.
data(base_did)
# Setting a data set as a panel...
pdat = panel(base_did, ~id+period)
# ...then using the functions l and f
est1 = feols(y~l(x1, 0:1), pdat)
#> NOTE: 108 observations removed because of NA values (RHS: 108).
est2 = feols(f(y)~l(x1, -1:1), pdat)
#> NOTE: 216 observations removed because of NA values (LHS: 108, RHS: 216).
est3 = feols(l(y)~l(x1, 0:3), pdat)
#> NOTE: 324 observations removed because of NA values (LHS: 108, RHS: 324).
etable(est1, est2, est3, order = c("f", "^x"), drop="Int")
#>                               est1               est2               est3
#> Dependent Var.:                  y               f(y)               l(y)
#>
#> Constant         2.235*** (0.1577)  2.464*** (0.1697)  2.196*** (0.1750)
#> l(x1,0)         0.9948*** (0.0532)    0.0081 (0.0584)   -0.0534 (0.0599)
#> l(x1,1)            0.0410 (0.0540)    0.0157 (0.0585) 0.9871*** (0.0613)
#> l(x1,-1)                           0.9940*** (0.0579)
#> l(x1,2)                                                  0.0220 (0.0607)
#> l(x1,3)                                                  0.0102 (0.0598)
#> _______________ __________________ __________________ __________________
#> S.E. type                      IID                IID                IID
#> Observations                   972                864                756
#> R2                         0.26558            0.25697            0.25875
#> Adj. R2                    0.26406            0.25438            0.25480
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# or using the argument panel.id
feols(f(y)~l(x1, -1:1), base_did, panel.id = ~id+period)
#> NOTE: 216 observations removed because of NA values (LHS: 108, RHS: 216).
#> OLS estimation, Dep. Var.: f(y)
#> Observations: 864
#> Standard-errors: IID
#>             Estimate Std. Error   t value  Pr(>|t|)
#> (Intercept) 2.464313   0.169710 14.520756 < 2.2e-16 ***
#> l(x1, -1)   0.994018   0.057861 17.179278 < 2.2e-16 ***
#> l(x1, 0)    0.008072   0.058400  0.138217   0.89010
#> l(x1, 1)    0.015693   0.058540  0.268068   0.78871
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 4.97418 Adj. R2: 0.254377
# You can use panel.id in various ways:
pdat = panel(base_did, ~id+period)
# is identical to:
pdat = panel(base_did, c("id", "period"))
# and also to:
pdat = panel(base_did, "id,period")
# l() and f() can also be used within a data.table:
if(require("data.table")){
  pdat_dt = panel(as.data.table(base_did), ~id+period)
  # Now since pdat_dt is also a data.table
  # you can create lags/leads directly
  pdat_dt[, x1_l1 := l(x1)]
  pdat_dt[, c("x1_l1_fill0", "y_f2") := .(l(x1, fill = 0), f(y, 2))]
}