These are methods for dplyr's group_by() and ungroup() generics.
Grouping is translated to either the keyby or the by argument of [.data.table, depending on the value of the arrange argument.
In group_by(), variables or computations to group by.
Computations are always done on the ungrouped data frame.
To perform computations on the grouped data, you need to use a separate mutate() step before the group_by().
Computations are not allowed in nest_by().
In ungroup(), variables to remove from the grouping.
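The behaviour described above can be sketched as follows. This is a minimal illustration, not part of the original examples: the variable name above_avg and the object names are hypothetical, and it assumes both dplyr and dtplyr are installed.

```r
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)

dt <- lazy_dt(mtcars)

# Computation inside group_by(): mean(mpg) is the overall mean, because
# the expression is evaluated on the ungrouped data
by_overall <- dt %>%
  group_by(above_avg = mpg > mean(mpg))

# To compare each car with the mean of its own cyl group, compute the
# variable in a grouped mutate() first, then group by the result
by_within <- dt %>%
  group_by(cyl) %>%
  mutate(above_avg = mpg > mean(mpg)) %>%
  group_by(cyl, above_avg)

# ungroup() with a variable drops only that variable from the grouping
by_cyl <- by_within %>%
  ungroup(above_avg)

group_vars(by_overall)
group_vars(by_within)
group_vars(by_cyl)
```

group_vars() is used here only to inspect the grouping without evaluating the pipeline.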
When FALSE, the default, group_by() will override existing groups. To add to the existing groups, use .add = TRUE.
This argument was previously called add, but that prevented creating a new grouping variable called add, and conflicted with our naming conventions.
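As a brief sketch of the difference between the default and .add = TRUE (the object names replaced and extended are illustrative, and the snippet assumes dplyr and dtplyr are installed):

```r
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)

dt <- lazy_dt(mtcars)

# Default (.add = FALSE): the second group_by() replaces the first grouping
replaced <- dt %>%
  group_by(cyl) %>%
  group_by(am)

# .add = TRUE: the second group_by() extends the existing grouping
extended <- dt %>%
  group_by(cyl) %>%
  group_by(am, .add = TRUE)

group_vars(replaced)
group_vars(extended)
```

Comparing the two group_vars() results shows whether the earlier grouping by cyl survived the second call.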
If TRUE, will automatically arrange the output of subsequent grouped operations by group. If FALSE, output order will be left unchanged. In the generated data.table code this switches between using the keyby (TRUE) and by (FALSE) arguments.
A tbl()
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)  # provides lazy_dt()
dt <- lazy_dt(mtcars)
# group_by() is usually translated to `keyby` so that the groups
# are ordered in the output
dt %>%
  group_by(cyl) %>%
  summarise(mpg = mean(mpg))
#> Source: local data table [3 x 2]
#> Call: `_DT15`[, .(mpg = mean(mpg)), keyby = .(cyl)]
#>
#>     cyl   mpg
#>   <dbl> <dbl>
#> 1     4  26.7
#> 2     6  19.7
#> 3     8  15.1
#>
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
# use `arrange = FALSE` to instead use `by` so the original order
# of groups is preserved
dt %>%
  group_by(cyl, arrange = FALSE) %>%
  summarise(mpg = mean(mpg))
#> Source: local data table [3 x 2]
#> Call: `_DT15`[, .(mpg = mean(mpg)), by = .(cyl)]
#>
#>     cyl   mpg
#>   <dbl> <dbl>
#> 1     6  19.7
#> 2     4  26.7
#> 3     8  15.1
#>
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results