Basic Functions

recipes-package recipes

recipes: A package for computing and preprocessing design matrices.

recipe()

Create a recipe for preprocessing data

formula(<recipe>)

Create a formula from a prepared recipe

print(<recipe>)

Print a recipe

summary(<recipe>)

Summarize a recipe

prep()

Estimate a preprocessing recipe

bake()

Apply a trained preprocessing recipe

juice()

Extract transformed training set

selections selection

Methods for selecting variables in step functions

has_role() has_type() all_outcomes() all_predictors() all_date() all_date_predictors() all_datetime() all_datetime_predictors() all_double() all_double_predictors() all_factor() all_factor_predictors() all_integer() all_integer_predictors() all_logical() all_logical_predictors() all_nominal() all_nominal_predictors() all_numeric() all_numeric_predictors() all_ordered() all_ordered_predictors() all_string() all_string_predictors() all_unordered() all_unordered_predictors() current_info()

Role selection

add_role() update_role() remove_role()

Manually alter roles

update_role_requirements()

Update role-specific requirements

get_case_weights() averages() medians() variances() correlations() covariances() pca_wts() are_weights_used()

Helpers for steps with case weights

case_weights

Using case weights with recipes
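
The basic functions above are typically used in a define, estimate, apply sequence. The following is a minimal sketch of that workflow, assuming the recipes package is installed; the `mtcars` data set and the 24/8 row split are illustrative choices, not part of the package documentation.

```r
library(recipes)

# Illustrative split of base R's mtcars data (an assumption for this sketch).
train <- mtcars[1:24, ]
test  <- mtcars[25:32, ]

# recipe(): declare roles. mpg is the outcome, all other columns are predictors.
rec <- recipe(mpg ~ ., data = train)

# Add one step; all_numeric_predictors() is one of the selectors listed above.
rec <- step_normalize(rec, all_numeric_predictors())

# prep(): estimate the step's statistics (means and standard deviations)
# from the training set only.
trained <- prep(rec, training = train)

# bake(): apply the trained steps. new_data = NULL returns the processed
# training set, which is what juice() also extracts.
train_baked <- bake(trained, new_data = NULL)
test_baked  <- bake(trained, new_data = test)

# summary(): one row per variable with its type and role.
summary(trained)
```

Because the statistics are estimated in `prep()` and only applied in `bake()`, the test rows are scaled with the training means and standard deviations rather than their own.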

Step Functions - Imputation

step_impute_bag() imp_vars()

Impute via bagged trees

step_impute_knn()

Impute via k-nearest neighbors

step_impute_linear()

Impute numeric variables via a linear model

step_impute_lower()

Impute numeric data below the threshold of measurement

step_impute_mean()

Impute numeric data using the mean

step_impute_median()

Impute numeric data using the median

step_impute_mode()

Impute nominal data using the most common value

step_impute_roll()

Impute numeric data using a rolling window statistic

step_unknown()

Assign missing categories to "unknown"
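
As a rough illustration of two of the imputation steps listed above, the sketch below fills a numeric column with its median and a factor column with its most common level. The data frame `df` and its column names are hypothetical and exist only for this example.

```r
library(recipes)

# Hypothetical toy data with one missing numeric and one missing factor value.
df <- data.frame(
  y = c(1, 2, 3, 4, 5),
  x = c(10, NA, 30, 40, 20),
  g = factor(c("a", NA, "b", "a", "a"))
)

rec <- recipe(y ~ ., data = df)
# Numeric predictors: replace NA with the training-set median.
rec <- step_impute_median(rec, all_numeric_predictors())
# Nominal predictors: replace NA with the most common training-set level.
rec <- step_impute_mode(rec, all_nominal_predictors())

imputed <- bake(prep(rec, training = df), new_data = NULL)
imputed
```

The imputation values are learned during `prep()`, so the same median and mode would be reused when baking new data.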

Step Functions - Individual Transformations

step_BoxCox()

Box-Cox transformation for strictly positive data

step_bs()

B-spline basis functions

step_harmonic()

Add sin and cos terms for harmonic analysis

step_hyperbolic()

Hyperbolic transformations

step_inverse()

Inverse transformation

step_invlogit()

Inverse logit transformation

step_log()

Logarithmic transformation

step_logit()

Logit transformation

step_mutate()

Add new variables using dplyr

step_ns()

Natural spline basis functions

step_poly()

Orthogonal polynomial basis functions

step_poly_bernstein()

Generalized Bernstein polynomial basis

step_relu()

Apply (smoothed) rectified linear transformation

step_spline_b()

Basis splines

step_spline_convex()

Convex splines

step_spline_monotone()

Monotone splines

step_spline_natural()

Natural splines

step_spline_nonnegative()

Non-negative splines

step_sqrt()

Square root transformation

step_YeoJohnson()

Yeo-Johnson transformation

Step Functions - Discretization

step_discretize()

Discretize numeric variables

discretize() predict(<discretize>)

Discretize numeric variables

step_cut()

Cut a numeric variable into a factor

Step Functions - Dummy Variables and Encodings

step_bin2factor()

Create a factor from a dummy variable

step_count()

Create counts of patterns using regular expressions

step_dummy()

Create traditional dummy variables

step_dummy_extract()

Extract patterns from nominal data

step_dummy_multi_choice()

Handle levels in multiple predictors together

step_factor2string()

Convert factors to strings

step_indicate_na()

Create missing data column indicators

step_integer()

Convert values to predefined integers

step_novel()

Simple value assignments for novel factor levels

step_num2factor()

Convert numbers to factors

step_ordinalscore()

Convert ordinal factors to numeric scores

step_other()

Collapse infrequent categorical levels

step_percentile()

Percentile transformation

step_regex()

Detect a regular expression

step_relevel()

Relevel factors to a desired level

step_string2factor()

Convert strings to factors

step_unknown()

Assign missing categories to "unknown"

step_unorder()

Convert ordered factors to unordered factors
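
To make the encoding steps more concrete, here is a small sketch of `step_dummy()` with its default settings. The data frame and the `colour` column are hypothetical; the sketch assumes R's default treatment contrasts, under which the first factor level serves as the reference and gets no indicator column.

```r
library(recipes)

# Hypothetical toy data; factor levels sort alphabetically: blue, green, red.
df <- data.frame(
  y = c(1, 2, 3, 4),
  colour = factor(c("red", "green", "blue", "red"))
)

rec <- recipe(y ~ ., data = df)
# Replace each nominal predictor with 0/1 indicator columns.
rec <- step_dummy(rec, all_nominal_predictors())

baked <- bake(prep(rec, training = df), new_data = NULL)

# The original colour column is replaced by colour_green and colour_red;
# "blue" is the reference level.
names(baked)
```

Steps such as `step_novel()`, `step_unknown()`, and `step_other()` are usually placed before `step_dummy()` so that unseen, missing, or rare levels are handled before the indicators are created.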

Step Functions - Date and Datetime

step_date()

Date feature generator

step_time()

Time feature generator

step_holiday()

Holiday feature generator

Step Functions - Interactions

step_interact()

Create interaction variables

Step Functions - Normalization

step_center()

Centering numeric data

step_normalize()

Center and scale numeric data

step_range()

Scaling numeric data to a specific range

step_scale()

Scaling numeric data

Step Functions - Multivariate Transformations

step_classdist()

Distances to class centroids

step_classdist_shrunken()

Compute shrunken centroid distances for classification models

step_depth()

Data depths

step_geodist()

Distance between two locations

step_ica()

ICA signal extraction

step_isomap()

Isomap embedding

step_kpca()

Kernel PCA signal extraction

step_kpca_poly()

Polynomial kernel PCA signal extraction

step_kpca_rbf()

Radial basis function kernel PCA signal extraction

step_mutate_at()

Mutate multiple columns using dplyr

step_nnmf()

Non-negative matrix factorization signal extraction

step_nnmf_sparse()

Non-negative matrix factorization signal extraction with lasso penalization

step_pca()

PCA signal extraction

step_pls()

Partial least squares feature extraction

step_ratio() denom_vars()

Ratio variable creation

step_spatialsign()

Spatial sign preprocessing

Step Functions - Filters

step_corr()

High correlation filter

step_filter_missing()

Missing value column filter

step_lincomb()

Linear combination filter

step_nzv()

Near-zero variance filter

step_rm()

General variable filter

step_select()

Select variables using dplyr

step_zv()

Zero variance filter

Step Functions - Row Operations

step_arrange()

Sort rows using dplyr

step_filter()

Filter rows using dplyr

step_lag()

Create a lagged predictor

step_naomit()

Remove observations with missing values

step_impute_roll()

Impute numeric data using a rolling window statistic

step_sample()

Sample rows using dplyr

step_shuffle()

Shuffle variables

step_slice()

Filter rows by position using dplyr

Step Functions - Others

step_intercept()

Add intercept (or constant) column

step_profile()

Create a profiling version of a data set

step_rename()

Rename variables by name using dplyr

step_rename_at()

Rename multiple columns using dplyr

step_window()

Moving window functions

Check Functions

check_class()

Check variable class

check_cols()

Check if all columns are present

check_missing()

Check for missing values

check_new_values()

Check for new values

check_range()

Check range consistency

Developer Functions

developer_functions

Developer functions for creating recipes steps

add_step() add_check()

Add a new operation to the current recipe

detect_step()

Detect if a particular step or check is used in a recipe

fully_trained()

Check to see if a recipe is trained/prepared

.get_data_types()

Get types for use in recipes

names0() dummy_names() dummy_extract_names()

Naming tools

prepper()

Wrapper function for preparing recipes within resampling

recipes_argument_select()

Evaluate a selection with tidyselect semantics for arguments

recipes_eval_select()

Evaluate a selection with tidyselect semantics specific to recipes

recipes_extension_check()

Check that steps have all S3 methods

recipes_ptype()

Prototype of recipe object

recipes_ptype_validate()

Validate prototype of recipe object

recipes_names_predictors() recipes_names_outcomes()

Role indicators

sparse_data

Using sparse data with recipes

update(<step>)

Update a recipe step

Tidy Methods

tidy(<step_BoxCox>) tidy(<step_YeoJohnson>) tidy(<step_arrange>) tidy(<step_bin2factor>) tidy(<step_bs>) tidy(<step_center>) tidy(<check_class>) tidy(<step_classdist>) tidy(<step_classdist_shrunken>) tidy(<check_cols>) tidy(<step_corr>) tidy(<step_count>) tidy(<step_cut>) tidy(<step_date>) tidy(<step_depth>) tidy(<step_discretize>) tidy(<step_dummy>) tidy(<step_dummy_extract>) tidy(<step_dummy_multi_choice>) tidy(<step_factor2string>) tidy(<step_filter>) tidy(<step_filter_missing>) tidy(<step_geodist>) tidy(<step_harmonic>) tidy(<step_holiday>) tidy(<step_hyperbolic>) tidy(<step_ica>) tidy(<step_impute_bag>) tidy(<step_impute_knn>) tidy(<step_impute_linear>) tidy(<step_impute_lower>) tidy(<step_impute_mean>) tidy(<step_impute_median>) tidy(<step_impute_mode>) tidy(<step_impute_roll>) tidy(<step_indicate_na>) tidy(<step_integer>) tidy(<step_interact>) tidy(<step_intercept>) tidy(<step_inverse>) tidy(<step_invlogit>) tidy(<step_isomap>) tidy(<step_kpca>) tidy(<step_kpca_poly>) tidy(<step_kpca_rbf>) tidy(<step_lag>) tidy(<step_lincomb>) tidy(<step_log>) tidy(<step_logit>) tidy(<check_missing>) tidy(<step_mutate>) tidy(<step_mutate_at>) tidy(<step_naomit>) tidy(<check_new_values>) tidy(<step_nnmf>) tidy(<step_nnmf_sparse>) tidy(<step_normalize>) tidy(<step_novel>) tidy(<step_ns>) tidy(<step_num2factor>) tidy(<step_nzv>) tidy(<step_ordinalscore>) tidy(<step_other>) tidy(<step_pca>) tidy(<step_percentile>) tidy(<step_pls>) tidy(<step_poly>) tidy(<step_poly_bernstein>) tidy(<step_profile>) tidy(<step_range>) tidy(<check_range>) tidy(<step_ratio>) tidy(<step_regex>) tidy(<step_relevel>) tidy(<step_relu>) tidy(<step_rename>) tidy(<step_rename_at>) tidy(<step_rm>) tidy(<step_sample>) tidy(<step_scale>) tidy(<step_select>) tidy(<step_shuffle>) tidy(<step_slice>) tidy(<step_spatialsign>) tidy(<step_spline_b>) tidy(<step_spline_convex>) tidy(<step_spline_monotone>) tidy(<step_spline_natural>) tidy(<step_spline_nonnegative>) tidy(<step_sqrt>) tidy(<step_string2factor>) 
tidy(<recipe>) tidy(<step>) tidy(<check>) tidy(<step_time>) tidy(<step_unknown>) tidy(<step_unorder>) tidy(<step_window>) tidy(<step_zv>)

Tidy the result of a recipe
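
The tidy methods can be used to inspect what a recipe contains and what each step estimated. A minimal sketch, assuming the recipes package is installed and using `mtcars` purely for illustration:

```r
library(recipes)

rec <- recipe(mpg ~ ., data = mtcars)
# Center two predictors; the column choice is arbitrary for this sketch.
rec <- step_center(rec, disp, hp)

trained <- prep(rec, training = mtcars)

# Without further arguments, tidy() gives one row per step or check.
steps <- tidy(trained)

# With number = 1, tidy() returns the details of the first step; for a
# trained step_center() these are the estimated means, one row per column.
centers <- tidy(trained, number = 1)
centers
```

Calling `tidy()` on an untrained recipe also works, but the step-level results then show the selectors rather than estimated values.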