R/io-.R, R/io-csv.R, R/io-parquet.R
df_from_file.Rd
df_from_file() uses arbitrary table functions to read data. See https://duckdb.org/docs/data/overview for documentation of the available functions and their options. To read multiple files with the same schema, pass a wildcard or a character vector to the path argument.
duckplyr_df_from_file() is a thin wrapper around df_from_file() that calls as_duckplyr_df() on the output.
These functions ingest data from a file using a table function. The results are transparently converted to a data frame, but the data is only read when the resulting data frame is actually accessed.
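As a complement to the wildcard form, the following is a minimal sketch of passing a character vector to the path argument. It assumes the duckplyr version documented here is installed; the file names, column names, and the object name `combined` are hypothetical and chosen only for illustration.

```r
library(duckplyr)

# Two hypothetical CSV files that share the same schema
path_a <- tempfile("sketch_a_", fileext = ".csv")
path_b <- tempfile("sketch_b_", fileext = ".csv")
write.csv(data.frame(id = 1:2, label = c("x", "y")), path_a, row.names = FALSE)
write.csv(data.frame(id = 3:4, label = c("z", "w")), path_b, row.names = FALSE)

# Pass both paths as a character vector instead of a glob pattern
combined <- df_from_csv(c(path_a, path_b))

# Accessing a column triggers the actual read of both files
ids <- combined$id
length(ids)

# Clean up only after the data has been materialized
unlink(c(path_a, path_b))
```

Because the data is read only on access, the files should still exist when the first column is touched; the sketch therefore removes them last.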
df_from_csv() reads a CSV file using the read_csv_auto() table function.
duckplyr_df_from_csv() is a thin wrapper around df_from_csv() that calls as_duckplyr_df() on the output.
df_from_parquet() reads a Parquet file using the read_parquet() table function.
duckplyr_df_from_parquet() is a thin wrapper around df_from_parquet() that calls as_duckplyr_df() on the output.
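The "thin wrapper" relationship described above can be sketched as follows. This is an illustration under the assumption that the wrapper does nothing beyond calling as_duckplyr_df(); the file and object names (`path_w`, `via_wrapper`, `by_hand`) are hypothetical.

```r
library(duckplyr)

# Hypothetical single-column CSV file
path_w <- tempfile("sketch_wrap_", fileext = ".csv")
write.csv(data.frame(a = 1:3), path_w, row.names = FALSE)

# Using the wrapper directly
via_wrapper <- duckplyr_df_from_csv(path_w)

# Spelling out the two steps the wrapper is described as performing
by_hand <- as_duckplyr_df(df_from_csv(path_w))

# Both results should carry the duckplyr_df class
inherits(via_wrapper, "duckplyr_df")
inherits(by_hand, "duckplyr_df")

# Materialize before removing the file
a_wrapper <- via_wrapper$a
a_by_hand <- by_hand$a
unlink(path_w)
```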
df_to_parquet() writes a data frame to a Parquet file via DuckDB. If the data frame is a duckplyr_df, the materialization occurs outside of R. An existing file will be overwritten. This function requires duckdb >= 0.10.0.
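Since df_to_parquet() depends on a minimum duckdb version, a caller may want to check the installed version up front. The helper below is a hypothetical sketch, not part of duckplyr; the name `check_duckdb_for_parquet` and its argument are assumptions made for this example, and it relies only on base R's packageVersion().

```r
# Hypothetical helper (not part of duckplyr): fail early if the installed
# duckdb package is older than the version df_to_parquet() requires.
check_duckdb_for_parquet <- function(min_version = "0.10.0") {
  installed <- utils::packageVersion("duckdb")
  if (installed < min_version) {
    stop(
      "df_to_parquet() requires duckdb >= ", min_version,
      ", but version ", as.character(installed), " is installed."
    )
  }
  # Return the installed version invisibly so callers can log it
  invisible(installed)
}

installed_version <- check_duckdb_for_parquet()
```

Calling the helper before df_to_parquet() turns a late failure into an explicit message about the version requirement.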
df_from_file(path, table_function, ..., options = list(), class = NULL)
duckplyr_df_from_file(
  path,
  table_function,
  ...,
  options = list(),
  class = NULL
)
df_from_csv(path, ..., options = list(), class = NULL)
duckplyr_df_from_csv(path, ..., options = list(), class = NULL)
df_from_parquet(path, ..., options = list(), class = NULL)
duckplyr_df_from_parquet(path, ..., options = list(), class = NULL)
df_to_parquet(data, path)
path
Path to files; glob patterns * and ? are supported.
table_function
The name of a table-valued DuckDB function such as "read_parquet", "read_csv", "read_csv_auto" or "read_json".
...
These dots are for future extensions and must be empty.
options
Arguments to the DuckDB function indicated by table_function.
class
The class of the output. By default, a tibble is created. The returned object will always be a data frame. Use class = "data.frame" or class = character() to create a plain data frame.
data
A data frame to be written to disk.
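To illustrate how the options and class arguments combine, here is a small sketch that reads a semicolon-delimited file into a plain data frame. The file contents and the names `path_semi` and `plain` are hypothetical; `delim` is the same DuckDB option used in the examples on this page.

```r
library(duckplyr)

# Hypothetical semicolon-delimited file
path_semi <- tempfile("sketch_semi_", fileext = ".csv")
writeLines(c("id;label", "1;x", "2;y"), path_semi)

# options are forwarded to the DuckDB table function;
# class = "data.frame" requests a plain data frame instead of a tibble
plain <- df_from_file(
  path_semi,
  "read_csv",
  options = list(delim = ";"),
  class = "data.frame"
)

# The result is a data frame but not a tibble
inherits(plain, "data.frame")
inherits(plain, "tbl_df")

# Materialize before removing the file
labels <- plain$label
unlink(path_semi)
```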
A data frame for df_from_file(), or a duckplyr_df for duckplyr_df_from_file(), extended by the provided class.
# Create simple CSV file
path <- tempfile("duckplyr_test_", fileext = ".csv")
write.csv(data.frame(a = 1:3, b = letters[4:6]), path, row.names = FALSE)
# Reading is immediate
df <- df_from_csv(path)
# Materialization only upon access
names(df)
#> [1] "a" "b"
df$a
#> materializing:
#> ---------------------
#> --- Relation Tree ---
#> ---------------------
#> read_csv_auto(/tmp/RtmpGQUiUT/duckplyr_test_2f8bc31163c2a9.csv)
#>
#> ---------------------
#> -- Result Columns --
#> ---------------------
#> - a (BIGINT)
#> - b (VARCHAR)
#>
#> [1] 1 2 3
# Return as tibble, specify column types:
df_from_file(
  path,
  "read_csv",
  options = list(delim = ",", types = list(c("DOUBLE", "VARCHAR"))),
  class = class(tibble())
)
#> materializing:
#> ---------------------
#> --- Relation Tree ---
#> ---------------------
#> read_csv(/tmp/RtmpGQUiUT/duckplyr_test_2f8bc31163c2a9.csv)
#>
#> ---------------------
#> -- Result Columns --
#> ---------------------
#> - a (DOUBLE)
#> - b (VARCHAR)
#>
#> # A tibble: 3 × 2
#> a b
#> <dbl> <chr>
#> 1 1 d
#> 2 2 e
#> 3 3 f
# Read multiple files at once
path2 <- tempfile("duckplyr_test_", fileext = ".csv")
write.csv(data.frame(a = 4:6, b = letters[7:9]), path2, row.names = FALSE)
duckplyr_df_from_csv(file.path(tempdir(), "duckplyr_test_*.csv"))
#> materializing:
#> ---------------------
#> --- Relation Tree ---
#> ---------------------
#> read_csv_auto(/tmp/RtmpGQUiUT/duckplyr_test_*.csv)
#>
#> ---------------------
#> -- Result Columns --
#> ---------------------
#> - a (BIGINT)
#> - b (VARCHAR)
#>
#> # A tibble: 6 × 2
#> a b
#> <dbl> <chr>
#> 1 1 d
#> 2 2 e
#> 3 3 f
#> 4 4 g
#> 5 5 h
#> 6 6 i
unlink(c(path, path2))
# Write a Parquet file:
path_parquet <- tempfile(fileext = ".parquet")
df_to_parquet(df, path_parquet)
# With a duckplyr_df, the materialization occurs outside of R:
df %>%
  as_duckplyr_df() %>%
  mutate(b = a + 1) %>%
  df_to_parquet(path_parquet)
duckplyr_df_from_parquet(path_parquet)
#> materializing:
#> ---------------------
#> --- Relation Tree ---
#> ---------------------
#> read_parquet(/tmp/RtmpGQUiUT/file2f8bc36a978387.parquet)
#>
#> ---------------------
#> -- Result Columns --
#> ---------------------
#> - a (DOUBLE)
#> - b (DOUBLE)
#>
#> # A tibble: 3 × 2
#> a b
#> <dbl> <dbl>
#> 1 1 2
#> 2 2 3
#> 3 3 4
unlink(path_parquet)