Memory and CPU Usage Information for Parallel R Code

This function is a wrapper around the system command ps that can be used to benchmark (peak) memory and CPU usage of parallel R code. By taking snapshots of the memory usage of R processes at a regular interval, the function dynamically builds up a profile of their usage of system resources.

syrup(expr, interval = 0.5, peak = FALSE, env = caller_env())

Arguments

expr: An expression.
interval: The interval at which to take snapshots of resource usage. In practice, there's some overhead on top of each of these intervals, so the actual time between snapshots will be somewhat longer than interval.
peak: Whether to return rows for only the "peak" memory usage, interpreted as the id with the maximum rss sum. Defaults to FALSE, but it may be helpful to set peak = TRUE for potentially very long-running processes so that the tibble doesn't grow too large.
env: The environment to evaluate expr in.
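The `peak` rule ("the id with the maximum rss sum") can be illustrated on a small, made-up snapshot table. This is a minimal sketch, not syrup's internal code: it assumes a plain data frame with numeric `id` and `rss` columns (rss in bytes), and the helper name `keep_peak()` is hypothetical.

```r
# Hypothetical helper illustrating the `peak = TRUE` rule: keep only the rows
# belonging to the snapshot `id` whose summed `rss` is largest.
keep_peak <- function(snapshots) {
  # total resident set size per snapshot id (named by id)
  totals <- tapply(snapshots$rss, snapshots$id, sum)
  # id of the snapshot with the largest total (first one on ties)
  peak_id <- as.numeric(names(totals)[which.max(totals)])
  snapshots[snapshots$id == peak_id, , drop = FALSE]
}

# made-up data: two snapshots of the same three processes, rss in bytes
snapshots <- data.frame(
  id  = c(1, 1, 1, 2, 2, 2),
  pid = c(101L, 102L, 103L, 101L, 102L, 103L),
  rss = c(100e6, 250e6, 80e6, 120e6, 260e6, 90e6)
)

keep_peak(snapshots)
```

Here snapshot 1 sums to 430 MB and snapshot 2 to 470 MB, so only the three rows with id 2 are kept.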

Value

A tibble with columns id and time and a number of columns from ps::ps() output describing memory and CPU usage. Notable columns include the process ID pid, the parent process ID ppid, the percent CPU usage pct_cpu, and the resident set size rss (a measure of memory usage).
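Because each row describes one process at one snapshot, summing rss within each id gives a memory-over-time profile. The sketch below is illustrative only: `snapshots` is a made-up stand-in for syrup() output, and rss is assumed to be plain numeric bytes (syrup prints it as a byte-size class, so a conversion with as.numeric() is assumed to be needed on real output).

```r
# Sketch: summed resident memory per snapshot, i.e. a memory-over-time profile.
# `snapshots` stands in for syrup() output; values are made up and `rss` is
# assumed to be plain numeric bytes.
snapshots <- data.frame(
  id  = c(1, 1, 2, 2, 3, 3),
  pid = c(101L, 102L, 101L, 102L, 101L, 102L),
  rss = c(100e6, 200e6, 150e6, 300e6, 120e6, 210e6)
)

# one row per snapshot id, with rss summed across all processes
profile <- aggregate(rss ~ id, data = snapshots, FUN = sum)
# express totals in megabytes and as change relative to the first snapshot
profile$rss_mb <- profile$rss / 1e6
profile$change_mb <- profile$rss_mb - profile$rss_mb[1]
profile
```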

Details

While much of the package's documentation assumes that the supplied expression will be distributed across CPU cores, nothing about this package requires that the expression provided to syrup() be run in parallel. In other words, syrup() will work just fine with "normal," sequentially-run R code (as in the examples). That said, there are many better, more fine-grained tools for the job in the case of sequential R code, such as Rprofmem(), the profmem package, the bench package, and packages in the R-prof GitHub organization.

Loosely, the function works by:

Setting up another R process (call it sesh) that queries system information using ps::ps() at a regular interval,
Evaluating the supplied expression,
Reading the queried system information back into the main process from sesh,
Closing sesh, and then
Returning the queried system information.

Note that information on the R process sesh is filtered out from the results automatically.
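The steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not syrup's actual implementation: it assumes the callr and ps packages are installed, uses a temporary "sentinel" file (an assumed mechanism) to tell the background session when to stop, keeps only a few ps::ps() columns (CPU percentages are omitted), and the function name `snapshot_while()` is hypothetical.

```r
# Hypothetical sketch of the snapshot-while-evaluating pattern; assumes the
# callr and ps packages are installed. Not syrup's own code.
snapshot_while <- function(expr, interval = 0.5, env = parent.frame()) {
  # sentinel file: the background session keeps sampling while it exists
  keep_going <- tempfile()
  file.create(keep_going)

  # 1. start another R process ("sesh") that polls ps::ps() at each interval
  sesh <- callr::r_bg(
    function(interval, keep_going) {
      snapshots <- list()
      id <- 1L
      while (file.exists(keep_going)) {
        snap <- as.data.frame(ps::ps()[, c("pid", "ppid", "name", "rss", "vms")])
        snap$id <- id
        snap$time <- Sys.time()
        snapshots[[id]] <- snap
        id <- id + 1L
        Sys.sleep(interval)
      }
      do.call(rbind, snapshots)
    },
    args = list(interval = interval, keep_going = keep_going)
  )

  # 2. evaluate the supplied expression in the requested environment
  eval(substitute(expr), env)

  # 3. signal sesh to stop, then read its snapshots back into this process
  file.remove(keep_going)
  sesh$wait()
  res <- sesh$get_result()
  if (is.null(res)) return(res)  # no snapshot was taken in time

  # 4. drop rows describing sesh itself
  res[res$pid != sesh$get_pid(), , drop = FALSE]
}
```

For example, `snapshot_while(Sys.sleep(3), interval = 0.2)` should return one row per process per snapshot. Because the background session takes a moment to start, very short expressions may finish before any snapshot is taken, which is why the sketch returns NULL in that case.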

Examples

# pass any expression to syrup. first, sequentially:
res_syrup <- syrup({res_output <- Sys.sleep(1)})

res_syrup
#> # A tibble: 132 × 8
#>       id time                    pid    ppid name       pct_cpu     rss      vms
#>    <dbl> <dttm>                <int>   <int> <chr>        <dbl> <bch:b> <bch:by>
#>  1     1 2025-07-09 13:57:31 2825985 2818763 R               NA 191.5MB 785.22MB
#>  2     1 2025-07-09 13:57:31 2825787 2825448 R               NA 257.8MB 925.91MB
#>  3     1 2025-07-09 13:57:31 2825448 2825447 R               NA  82.1MB 665.31MB
#>  4     1 2025-07-09 13:57:31 2818763 2814503 R               NA 292.9MB   1.21GB
#>  5     1 2025-07-09 13:57:31 2814503 2814502 R               NA 119.2MB 702.88MB
#>  6     1 2025-07-09 13:57:31 2788560 2788479 rsession        NA 219.8MB   1.38GB
#>  7     1 2025-07-09 13:57:31 2788479 2788477 rsession-…      NA     3MB    4.4MB
#>  8     1 2025-07-09 13:57:31 2766449 2766360 R               NA  58.1MB    563MB
#>  9     1 2025-07-09 13:57:31 2766448 2766360 R               NA  58.2MB    563MB
#> 10     1 2025-07-09 13:57:31 2766442 2766360 R               NA  99.5MB 678.14MB
#> # ℹ 122 more rows

# to snapshot memory and CPU information more (or less) often, set `interval`
syrup(Sys.sleep(1), interval = .01)
#> # A tibble: 198 × 8
#>       id time                    pid    ppid name       pct_cpu     rss      vms
#>    <dbl> <dttm>                <int>   <int> <chr>        <dbl> <bch:b> <bch:by>
#>  1     1 2025-07-09 13:57:33 2825985 2818763 R               NA   207MB 926.83MB
#>  2     1 2025-07-09 13:57:33 2825787 2825448 R               NA   267MB 935.19MB
#>  3     1 2025-07-09 13:57:33 2825448 2825447 R               NA    83MB 666.22MB
#>  4     1 2025-07-09 13:57:33 2818763 2814503 R               NA 292.9MB   1.21GB
#>  5     1 2025-07-09 13:57:33 2814503 2814502 R               NA 119.2MB 702.88MB
#>  6     1 2025-07-09 13:57:33 2788560 2788479 rsession        NA 219.8MB   1.38GB
#>  7     1 2025-07-09 13:57:33 2788479 2788477 rsession-…      NA     3MB    4.4MB
#>  8     1 2025-07-09 13:57:33 2766449 2766360 R               NA  58.1MB    563MB
#>  9     1 2025-07-09 13:57:33 2766448 2766360 R               NA  58.2MB    563MB
#> 10     1 2025-07-09 13:57:33 2766442 2766360 R               NA  99.5MB 678.14MB
#> # ℹ 188 more rows

# use `peak = TRUE` to return only the snapshot with
# the highest memory usage (as `sum(rss)`)
syrup(Sys.sleep(1), interval = .01, peak = TRUE)
#> # A tibble: 65 × 8
#>       id time                    pid    ppid name       pct_cpu     rss      vms
#>    <dbl> <dttm>                <int>   <int> <chr>        <dbl> <bch:b> <bch:by>
#>  1     2 2025-07-09 13:57:35 2825787 2825448 R               NA 267.8MB  936.1MB
#>  2     2 2025-07-09 13:57:35 2825448 2825447 R               NA  83.9MB 667.12MB
#>  3     2 2025-07-09 13:57:35 2818763 2814503 R               NA   312MB   1.24GB
#>  4     2 2025-07-09 13:57:35 2814503 2814502 R               NA 119.2MB 702.88MB
#>  5     2 2025-07-09 13:57:35 2788560 2788479 rsession        NA 219.8MB   1.38GB
#>  6     2 2025-07-09 13:57:35 2788479 2788477 rsession-…      NA     3MB    4.4MB
#>  7     2 2025-07-09 13:57:35 2766449 2766360 R               NA  58.1MB    563MB
#>  8     2 2025-07-09 13:57:35 2766448 2766360 R               NA  58.2MB    563MB
#>  9     2 2025-07-09 13:57:35 2766442 2766360 R               NA  99.5MB 678.14MB
#> 10     2 2025-07-09 13:57:35 2766441 2766360 R               NA  83.5MB 661.89MB
#> # ℹ 55 more rows

# results from syrup are more---or maybe only---useful when
# computations are evaluated in parallel. see package README
# for an example.