These are methods for dplyr's group_by() and ungroup() generics.
Grouping is translated to either the keyby or the by argument of
[.data.table, depending on the value of the arrange argument.
`...`: In group_by(), variables or computations to group by.
Computations are always done on the ungrouped data frame.
To perform computations on the grouped data, you need to use
a separate mutate() step before the group_by().
Computations are not allowed in nest_by().
In ungroup(), variables to remove from the grouping.
`.add`: When FALSE, the default, group_by() will
override existing groups. To add to the existing groups, use
.add = TRUE.
This argument was previously called add, but that prevented
creating a new grouping variable called add, and it conflicted with
our naming conventions.
`arrange`: If TRUE, the output of subsequent grouped operations is
automatically arranged by group. If FALSE, the output order is
left unchanged. In the generated data.table code, this switches between
using the keyby (TRUE) and by (FALSE) arguments.
`x`: A tbl().
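A short sketch of how `.add` and ungroup() change the grouping of a lazy data table. It assumes the dtplyr package is installed and that dplyr's group_vars() generic, which reports the current grouping variables, works on lazy_dt() objects; the object names `replaced` and `added` are illustrative only.

```r
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)

dt <- lazy_dt(mtcars)

# Default .add = FALSE: the second group_by() replaces the `cyl` grouping
replaced <- dt %>% group_by(cyl) %>% group_by(gear)
group_vars(replaced)

# .add = TRUE: `gear` is appended to the existing `cyl` grouping
added <- dt %>% group_by(cyl) %>% group_by(gear, .add = TRUE)
group_vars(added)

# ungroup() without arguments removes all grouping variables
group_vars(ungroup(added))

# ungroup() with variables removes only those variables from the grouping
# (assumes a dtplyr version whose ungroup() method accepts variables)
group_vars(ungroup(added, gear))
```

Because all of these steps are lazy, only the recorded grouping changes; no data.table code runs until the result is collected.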
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)
dt <- lazy_dt(mtcars)
# group_by() is usually translated to `keyby` so that the groups
# are ordered in the output
dt %>%
  group_by(cyl) %>%
  summarise(mpg = mean(mpg))
#> Source: local data table [3 x 2]
#> Call: `_DT15`[, .(mpg = mean(mpg)), keyby = .(cyl)]
#>
#>     cyl   mpg
#>   <dbl> <dbl>
#> 1     4  26.7
#> 2     6  19.7
#> 3     8  15.1
#>
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
# use `arrange = FALSE` to instead use `by` so the original order
# of groups is preserved
dt %>%
  group_by(cyl, arrange = FALSE) %>%
  summarise(mpg = mean(mpg))
#> Source: local data table [3 x 2]
#> Call: `_DT15`[, .(mpg = mean(mpg)), by = .(cyl)]
#>
#>     cyl   mpg
#>   <dbl> <dbl>
#> 1     6  19.7
#> 2     4  26.7
#> 3     8  15.1
#>
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
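group_by() also accepts computations, which are evaluated on the ungrouped data before grouping. The sketch below, again assuming dtplyr is installed, groups by a computed logical column; the name `big` is made up for illustration. show_query() prints the generated data.table call without running it, and as_tibble() evaluates the lazy query and collects the result.

```r
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)

dt <- lazy_dt(mtcars)

# `big = cyl > 4` is computed on the ungrouped data and then used as the
# grouping variable; with the default arrange = TRUE the groups are sorted
by_size <- dt %>%
  group_by(big = cyl > 4) %>%
  summarise(n = n())

# Print the generated data.table call without evaluating it
show_query(by_size)

# Evaluate the lazy query and collect the result as a tibble
res <- as_tibble(by_size)
print(res)
```

Since mtcars has 11 four-cylinder cars and 21 cars with six or eight cylinders, `res` should have one row per value of `big`, with FALSE sorted first.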