These are methods for the dplyr generics dplyr::slice_min(), dplyr::slice_max(), and
dplyr::slice_sample(). They are translated to SQL using dplyr::filter() and
window functions (ROWNUMBER, MIN_RANK, or CUME_DIST depending on
arguments). slice(), slice_head(), and slice_tail() are not supported
since database tables have no intrinsic order.
If data is grouped, the operation will be performed on each group so that
(e.g.) slice_min(db, x, n = 3) will select the three rows with the smallest
value of x in each group.
# S3 method for class 'tbl_lazy'
slice_min(
.data,
order_by,
...,
n,
prop,
by = NULL,
with_ties = TRUE,
na_rm = TRUE
)
# S3 method for class 'tbl_lazy'
slice_max(
.data,
order_by,
...,
n,
by = NULL,
prop,
with_ties = TRUE,
na_rm = TRUE
)
# S3 method for class 'tbl_lazy'
slice_sample(.data, ..., n, prop, by = NULL, weight_by = NULL, replace = FALSE)A lazy data frame backed by a database query.
Variable or function of variables to order by.
Not used.
Provide either n, the number of rows, or prop, the
proportion of rows to select. If neither are supplied, n = 1 will be
used.
If n is greater than the number of rows in the group (or prop > 1),
the result will be silently truncated to the group size. If the proportion
of a group size is not an integer, it is rounded down.
<tidy-select> Optionally, a selection of columns to
group by for just this operation, functioning as an alternative to group_by(). For
details and examples, see ?dplyr_by.
Should ties be kept together? The default, TRUE, may
return more rows than you request. Use FALSE to ignore ties, and return
the first n rows.
Should missing values in order_by be removed from the result?
If FALSE, NA values are sorted to the end (like in arrange()), so
they will only be included if there are insufficient non-missing values to
reach n/prop.
Not supported for database backends.
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(x = 1:3, y = c(1, 1, 2))
db %>% slice_min(x) %>% show_query()
#> <SQL>
#> SELECT `x`, `y`
#> FROM (
#> SELECT `dbplyr_1N7gYBNlcp`.*, RANK() OVER (ORDER BY `x`) AS `col01`
#> FROM `dbplyr_1N7gYBNlcp`
#> ) AS `q01`
#> WHERE (`col01` <= 1)
db %>% slice_max(x) %>% show_query()
#> <SQL>
#> SELECT `x`, `y`
#> FROM (
#> SELECT `dbplyr_1N7gYBNlcp`.*, RANK() OVER (ORDER BY `x` DESC) AS `col01`
#> FROM `dbplyr_1N7gYBNlcp`
#> ) AS `q01`
#> WHERE (`col01` <= 1)
db %>% slice_sample() %>% show_query()
#> <SQL>
#> SELECT `x`, `y`
#> FROM (
#> SELECT
#> `dbplyr_1N7gYBNlcp`.*,
#> ROW_NUMBER() OVER (ORDER BY (0.5 + RANDOM() / 18446744073709551616.0)) AS `col01`
#> FROM `dbplyr_1N7gYBNlcp`
#> ) AS `q01`
#> WHERE (`col01` <= 1)
db %>% group_by(y) %>% slice_min(x) %>% show_query()
#> <SQL>
#> SELECT `x`, `y`
#> FROM (
#> SELECT
#> `dbplyr_1N7gYBNlcp`.*,
#> RANK() OVER (PARTITION BY `y` ORDER BY `x`) AS `col01`
#> FROM `dbplyr_1N7gYBNlcp`
#> ) AS `q01`
#> WHERE (`col01` <= 1)
# By default, ties are includes so you may get more rows
# than you expect
db %>% slice_min(y, n = 1)
#> # Source: SQL [?? x 2]
#> # Database: sqlite 3.50.4 [:memory:]
#> x y
#> <int> <dbl>
#> 1 1 1
#> 2 2 1
db %>% slice_min(y, n = 1, with_ties = FALSE)
#> # Source: SQL [?? x 2]
#> # Database: sqlite 3.50.4 [:memory:]
#> x y
#> <int> <dbl>
#> 1 1 1
# Non-integer group sizes are rounded down
db %>% slice_min(x, prop = 0.5)
#> # Source: SQL [?? x 2]
#> # Database: sqlite 3.50.4 [:memory:]
#> x y
#> <int> <dbl>
#> 1 1 1