Spark Sessions

Manage Spark Connections
Read Spark Configuration
Download and install various versions of Spark
View Entries in the Spark Log
Open the Spark web interface
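
For example, a minimal session lifecycle, assuming a local installation (the version string is illustrative):

```r
library(sparklyr)

# Download and install a local copy of Spark
spark_install(version = "3.5")

# Optional configuration overrides, then connect
config <- spark_config()
config$spark.executor.memory <- "2g"
sc <- spark_connect(master = "local", config = config)

spark_log(sc, n = 20)  # view recent entries in the Spark log
spark_web(sc)          # open the Spark web interface

spark_disconnect(sc)
```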

Spark Data

Read file(s) into a Spark DataFrame using a custom reader
Read Apache Avro data into a Spark DataFrame
Read binary data into a Spark DataFrame
Read a CSV file into a Spark DataFrame
Read from Delta Lake into a Spark DataFrame
Read image data into a Spark DataFrame
Read from a JDBC connection into a Spark DataFrame
Read a JSON file into a Spark DataFrame
Read a libsvm file into a Spark DataFrame
Read an ORC file into a Spark DataFrame
Read a Parquet file into a Spark DataFrame
Read from a generic source into a Spark DataFrame
Read from a Spark table into a Spark DataFrame
Read a text file into a Spark DataFrame
Write a Spark DataFrame to a file using a custom writer
Serialize a Spark DataFrame into Apache Avro format
Write a Spark DataFrame to a CSV file
Write a Spark DataFrame into Delta Lake
Write a Spark DataFrame into a JDBC table
Write a Spark DataFrame to a JSON file
Write a Spark DataFrame to an ORC file
Write a Spark DataFrame to a Parquet file
Write a Spark DataFrame to RDS files
Write a Spark DataFrame into a generic source
Write a Spark DataFrame into a Spark table
Write a Spark DataFrame to a text file
Insert a Spark DataFrame into a Spark table
Save a Spark DataFrame as a Spark table
Collect Spark data serialized in RDS format into R
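
The readers take a connection, a table name, and a path; the writers take a Spark DataFrame and a path. A minimal round trip (paths are illustrative):

```r
# Read a CSV file into a Spark DataFrame
flights <- spark_read_csv(sc, name = "flights", path = "data/flights.csv",
                          header = TRUE, infer_schema = TRUE)

# Write it back out as Parquet, then read the Parquet files again
spark_write_parquet(flights, path = "data/flights_parquet", mode = "overwrite")
flights2 <- spark_read_parquet(sc, name = "flights2", path = "data/flights_parquet")
```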

Spark Tables

Show the database list
Cache a Spark table
Use a specific database
Uncache a Spark table
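
A short sketch of the table helpers, assuming a table named "flights" is already registered:

```r
src_databases(sc)             # show the database list
tbl_change_db(sc, "default")  # use a specific database
tbl_cache(sc, "flights")      # cache the table in memory
tbl_uncache(sc, "flights")    # release the cached data
```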

Spark DataFrames

dplyr wrappers for Apache Spark higher order functions
Save / Load a Spark DataFrame
Spark ML – Transform, fit, and predict methods (sdf_ interface)
Create a DataFrame along an object
Bind multiple Spark DataFrames by row and column
Broadcast hint
Checkpoint a Spark DataFrame
Coalesce a Spark DataFrame
Collect a Spark DataFrame into R
Copy an Object into Spark
Cross Tabulation
Debug Info for a Spark DataFrame
Compute summary statistics for columns of a data frame
Support for Dimension Operations
Invoke distinct on a Spark DataFrame
Remove duplicates from a Spark DataFrame
Create a Spark DataFrame containing all combinations of inputs
Convert column(s) from Avro format
Check whether a Spark DataFrame is streaming
Return the last index of a Spark DataFrame
Create a DataFrame of a given length
Get the number of partitions of a Spark DataFrame
Compute the number of records within each partition of a Spark DataFrame
Persist a Spark DataFrame
Pivot a Spark DataFrame
Project features onto principal components
Compute (Approximate) Quantiles with a Spark DataFrame
Partition a Spark DataFrame
Generate random samples from a Beta distribution
Generate random samples from a binomial distribution
Generate random samples from a Cauchy distribution
Generate random samples from a chi-squared distribution
Read a Column from a Spark DataFrame
Register a Spark DataFrame
Repartition a Spark DataFrame
Model Residuals
Generate random samples from an exponential distribution
Generate random samples from a Gamma distribution
Generate random samples from a geometric distribution
Generate random samples from a hypergeometric distribution
Generate random samples from a log-normal distribution
Generate random samples from the standard normal distribution
Generate random samples from a Poisson distribution
Generate random samples from a t-distribution
Generate random samples from the uniform distribution U(0, 1)
Generate random samples from a Weibull distribution
Randomly Sample Rows from a Spark DataFrame
Read the Schema of a Spark DataFrame
Separate a Vector Column into Scalar Columns
Create a DataFrame over a numeric range
Sort a Spark DataFrame
Create a Spark DataFrame from SQL
Convert column(s) to Avro format
Unnest longer
Unnest wider
Perform Weighted Random Sampling on a Spark DataFrame
Add a Sequential ID Column to a Spark DataFrame
Add a Unique ID Column to a Spark DataFrame
Apply an Aggregate Function to an Array Column
Sort an array using a custom comparator
Determine Whether Some Element Exists in an Array Column
Filter an Array Column
Check whether all elements in an array satisfy a predicate
Filter a map
Merge two maps into one
Transform an Array Column
Transform the keys of a map
Transform the values of a map
Combine two Array Columns
Transform a subset of columns in a Spark DataFrame
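
Most of these helpers belong to sparklyr's `sdf_*` family and take a Spark DataFrame as their first argument. A few common ones:

```r
df <- sdf_len(sc, 10)                     # DataFrame of a given length
df <- sdf_with_unique_id(df, id = "uid")  # add a unique ID column
sdf_nrow(df)                              # dimension operations
sdf_num_partitions(df)                    # number of partitions
sdf_describe(df, cols = "id")             # summary statistics

# Partition into training and test sets
splits <- sdf_random_split(df, training = 0.8, test = 0.2, seed = 1099)
splits$training
```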

Spark ML - Regression

Spark ML – Linear Regression
Spark ML – Survival Regression
Spark ML – Isotonic Regression
Spark ML – Generalized Linear Regression
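
The regression fitters share the R formula interface. A linear regression sketch:

```r
mtcars_tbl <- copy_to(sc, mtcars, overwrite = TRUE)

fit <- ml_linear_regression(mtcars_tbl, mpg ~ wt + hp)
summary(fit)
ml_predict(fit, mtcars_tbl)  # score a dataset
```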

Spark ML - Classification

Spark ML – Naive Bayes
Spark ML – OneVsRest
Spark ML – Logistic Regression
Spark ML – Multilayer Perceptron
Spark ML – LinearSVC
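
Classification follows the same pattern; continuing with `mtcars_tbl` from the regression sketch (`am` is a 0/1 column):

```r
lr_fit <- ml_logistic_regression(mtcars_tbl, am ~ wt + hp)
ml_predict(lr_fit, mtcars_tbl)
```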

Spark ML - Tree

Spark ML – Decision Trees
Spark ML – Gradient Boosted Trees
Spark ML – Random Forest
Spark ML - Feature Importance for Tree Models
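
Tree models add feature importances; for example, with a random forest:

```r
rf_fit <- ml_random_forest(mtcars_tbl, am ~ wt + hp, type = "classification")
ml_feature_importances(rf_fit)
```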

Spark ML - Clustering

Spark ML – K-Means Clustering
Evaluate a K-means clustering
Spark ML – Bisecting K-Means Clustering
Spark ML – Gaussian Mixture Clustering
Spark ML – Power Iteration Clustering
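
A K-means sketch (note that `copy_to()` replaces the dots in `iris` column names with underscores):

```r
iris_tbl <- copy_to(sc, iris, overwrite = TRUE)

km <- ml_kmeans(iris_tbl, ~ Petal_Length + Petal_Width, k = 3)
km$centers
ml_predict(km, iris_tbl)  # cluster assignments
```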

Spark ML - Text

Spark ML – Latent Dirichlet Allocation
Chi-square hypothesis testing for categorical data
Default stop words
Frequent Pattern Mining – FPGrowth
Frequent Pattern Mining – PrefixSpan
Feature Transformation – CountVectorizer (Estimator)
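
For instance, the default stop-word list used by the StopWordsRemover transformer:

```r
head(ml_default_stop_words(sc, language = "english"))
```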

Spark ML - Recommendations

Spark ML – ALS
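
ALS expects user, item, and rating columns, named through the formula. A toy sketch (the data are illustrative):

```r
ratings <- data.frame(user   = c(0L, 0L, 1L, 1L, 2L),
                      item   = c(0L, 1L, 1L, 2L, 0L),
                      rating = c(4, 2, 3, 4, 5))
ratings_tbl <- copy_to(sc, ratings, overwrite = TRUE)

als <- ml_als(ratings_tbl, rating ~ user + item)
ml_predict(als, ratings_tbl)
```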

Spark ML - Hyper-parameter tuning

Spark ML – Tuning
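
Tuning wraps an estimator (often a pipeline) in a cross validator over a parameter grid. A sketch, reusing `mtcars_tbl`:

```r
pipeline <- ml_pipeline(sc) %>%
  ft_r_formula(mpg ~ wt + hp) %>%
  ml_linear_regression()

cv <- ml_cross_validator(
  sc,
  estimator = pipeline,
  estimator_param_maps = list(
    linear_regression = list(elastic_net_param = c(0, 0.5))
  ),
  evaluator = ml_regression_evaluator(sc),
  num_folds = 3
)

cv_model <- ml_fit(cv, mtcars_tbl)
ml_validation_metrics(cv_model)  # metrics per parameter combination
```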

Spark ML - Evaluation

Extract metrics from a fitted table
Evaluate the Model on a Validation Set
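
For example, evaluating the logistic regression fit `lr_fit` from the classification sketch:

```r
ml_evaluate(lr_fit, mtcars_tbl)  # returns an evaluation summary object
```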

Spark ML - Evaluators

Spark ML - Clustering Evaluator
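
Evaluators are constructed from a connection and applied to scored data; a clustering sketch, continuing the K-means example:

```r
evaluator <- ml_clustering_evaluator(sc, metric_name = "silhouette")
ml_evaluate(evaluator, ml_predict(km, iris_tbl))
```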

Spark ML - Operations

Extract data associated with a Spark ML model
Wrap a Spark ML JVM object
Compute a correlation matrix
Spark ML – Transform, fit, and predict methods (ml_ interface)
Feature Transformation – StringIndexer (Estimator)
Spark ML – Model Persistence
Spark ML – ML Params
Standardize Formula Input for `ml_model`
Spark ML – Extraction of summary metrics
Constructors for `ml_model` Objects
Spark ML – UID
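
Two common operations, correlation and model persistence (the save path is illustrative):

```r
ml_corr(mtcars_tbl, columns = c("mpg", "wt", "hp"))  # correlation matrix

fit <- ml_logistic_regression(mtcars_tbl, am ~ wt + hp)
ml_save(fit, "models/lr_fit", overwrite = TRUE)
fit2 <- ml_load(sc, "models/lr_fit")
```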

Spark Pipelines

Spark ML – Pipelines
Spark ML – Pipeline stage extraction
Add a Stage to a Pipeline
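
A pipeline chains transformers and an estimator into a single object that can be fitted, inspected stage by stage, and reused:

```r
pipeline <- ml_pipeline(sc) %>%
  ft_r_formula(mpg ~ wt + hp) %>%
  ml_linear_regression()

fitted <- ml_fit(pipeline, mtcars_tbl)
ml_stage(fitted, 2)               # extract a pipeline stage
ml_transform(fitted, mtcars_tbl)  # apply the fitted pipeline
```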

Spark Feature Transformers

Feature Transformation – Binarizer (Transformer)
Feature Transformation – Bucketizer (Transformer)
Feature Transformation – ChiSqSelector (Estimator)
Feature Transformation – CountVectorizer (Estimator)
Feature Transformation – Discrete Cosine Transform (DCT) (Transformer)
Feature Transformation – ElementwiseProduct (Transformer)
Feature Transformation – FeatureHasher (Transformer)
Feature Transformation – HashingTF (Transformer)
Feature Transformation – IDF (Estimator)
Feature Transformation – Imputer (Estimator)
Feature Transformation – IndexToString (Transformer)
Feature Transformation – Interaction (Transformer)
Feature Transformation – LSH (Estimator)
Utility functions for LSH models
Feature Transformation – MaxAbsScaler (Estimator)
Feature Transformation – MinMaxScaler (Estimator)
Feature Transformation – NGram (Transformer)
Feature Transformation – Normalizer (Transformer)
Feature Transformation – OneHotEncoder (Transformer)
Feature Transformation – OneHotEncoderEstimator (Estimator)
Feature Transformation – PCA (Estimator)
Feature Transformation – PolynomialExpansion (Transformer)
Feature Transformation – QuantileDiscretizer (Estimator)
Feature Transformation – RFormula (Estimator)
Feature Transformation – RegexTokenizer (Transformer)
Feature Transformation – RobustScaler (Estimator)
Feature Transformation – StandardScaler (Estimator)
Feature Transformation – StopWordsRemover (Transformer)
Feature Transformation – StringIndexer (Estimator)
Feature Transformation – Tokenizer (Transformer)
Feature Transformation – VectorAssembler (Transformer)
Feature Transformation – VectorIndexer (Estimator)
Feature Transformation – VectorSlicer (Transformer)
Feature Transformation – Word2Vec (Estimator)
Feature Transformation – SQLTransformer
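
Each `ft_*` function takes input and output column names and can be applied directly to a DataFrame or added to a pipeline. A tokenizer sketch:

```r
sentences <- copy_to(sc, data.frame(text = c("spark is fast", "r is fun")),
                     overwrite = TRUE)

sentences %>%
  ft_tokenizer(input_col = "text", output_col = "words") %>%
  ft_stop_words_remover(input_col = "words", output_col = "filtered")
```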

Extensions

Constructors for `ml_model` Objects
Compile Scala sources into a Java Archive (jar)
Read configuration values for a connection
Download the default Scala compilers
Discover the Scala Compiler
Access the Spark API
Runtime configuration interface for Hive
Invoke a Method on a JVM Object
Invoke a Java function
Instantiate a Java array with a specific element type
Instantiate a Java float type
Instantiate an Array[Float]
Register a Package that Implements a Spark Extension
Define a Spark Compilation Specification
Default Compilation Specification for Spark Extensions
Runtime configuration interface for the Spark Context
Retrieve a Spark DataFrame
Define a Spark dependency
Set the SPARK_HOME environment variable
Retrieve a Spark JVM Object Reference
Get the Spark Version Associated with a Spark Connection
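
Extensions talk to the JVM directly through the invoke family:

```r
# Construct a JVM object, then call a method on it
billionaire <- invoke_new(sc, "java.math.BigInteger", "1000000000")
invoke(billionaire, "longValue")

# Call a static Java method
invoke_static(sc, "java.lang.Math", "hypot", 3, 4)

# Access the Spark API: the underlying SparkContext
spark_context(sc) %>% invoke("version")
```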

Distributed Computing

Apply an R Function in Spark
Create Bundle for Spark Apply
Log Writer for Spark Apply
Register a Parallel Backend
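
`spark_apply()` ships an R function to each partition of a Spark DataFrame:

```r
sdf_len(sc, 10, repartition = 2) %>%
  spark_apply(function(df) {
    # df is a plain R data frame holding one partition
    df$id_squared <- df$id^2
    df
  })
```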

Livy

Create a Spark Configuration for Livy
Start Livy
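
Connecting through a Livy server rather than a local shell (URL and credentials are illustrative):

```r
cfg <- livy_config(username = "user", password = "pass")
sc_livy <- spark_connect(master = "http://localhost:8998",
                         method = "livy", config = cfg)
spark_disconnect(sc_livy)
```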

Streaming

Find Stream
Generate Test Stream
Spark Stream's Identifier
Apply lag function to columns of a Spark Streaming DataFrame
Spark Stream's Name
Read files created by the stream
Render Stream
Stream Statistics
Stop a Spark Stream
Spark Stream Continuous Trigger
Spark Stream Interval Trigger
View Stream
Watermark Stream
Write files to the stream
Write Memory Stream
Write Stream to Table
Reactive Spark reader
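
Streams are defined as a read, an optional transformation, and a write; the job runs until stopped (directories are illustrative):

```r
stream <- stream_read_csv(sc, "source-dir") %>%
  stream_write_csv("dest-dir")

stream_view(stream)  # monitor stream statistics
stream_stop(stream)
```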

dplyr integration

Copy an R Data Frame to Spark
Distinct
Filter
Full join
Inner join
Join Spark tbls
Left join
Mutate
Right join
Select
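
Standard dplyr verbs are translated to Spark SQL and executed lazily; `collect()` brings results back into R:

```r
library(dplyr)

mtcars_tbl <- copy_to(sc, mtcars, overwrite = TRUE)

mtcars_tbl %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE)) %>%
  arrange(cyl) %>%
  collect()
```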

tidyr integration

Pivot longer
Pivot wider
Fill
Replace Missing Values in Objects
Nest
Replace NA
Separate
Unite
Unnest
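
The supported tidyr verbs work on Spark DataFrames much as they do on local data frames; a sketch, assuming the installed sparklyr version provides the tidyr integration:

```r
library(tidyr)

kv <- copy_to(sc, data.frame(id = 1:2, x = c(1, 2), y = c(3, 4)),
              overwrite = TRUE)

kv %>% pivot_longer(cols = c(x, y), names_to = "key", values_to = "value")
```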

tidymodels integration

Tidying methods for Spark ML ALS
Tidying methods for Spark ML linear models
Tidying methods for Spark ML Isotonic Regression
Tidying methods for Spark ML LDA models
Tidying methods for Spark ML linear SVC
Tidying methods for Spark ML Logistic Regression
Tidying methods for Spark ML MLP
Tidying methods for Spark ML Naive Bayes
Tidying methods for Spark ML Principal Component Analysis
Tidying methods for Spark ML Survival Regression
Tidying methods for Spark ML tree models
Tidying methods for Spark ML unsupervised models
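
The tidiers follow the broom conventions (`tidy()`, `glance()`, `augment()`):

```r
mtcars_tbl <- copy_to(sc, mtcars, overwrite = TRUE)
fit <- ml_linear_regression(mtcars_tbl, mpg ~ wt + hp)

broom::tidy(fit)     # one row per model term
broom::glance(fit)   # one-row model summary
broom::augment(fit)  # predictions appended to the training data
```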

Spark Operations

Retrieve the Spark connection's SQL catalog implementation property
Check whether the connection is open
A Shiny app that can be used to construct a spark_connect statement
Runtime configuration interface for the Spark Session
Set/Get Spark checkpoint directory
Manage Spark Connections
Generate a Table Name from Expression
Download and install various versions of Spark
Get the Spark Version Associated with a Spark Installation
Return a data frame of available Spark versions that can be installed
Kubernetes Configuration
Retrieve Available Settings
Find Spark Connection
Fallback to Spark Dependency
Create Spark Extension
Read from a Spark table into a Spark DataFrame
List all sparklyr-*.jar files that have been built
Create a Spark Configuration
Retrieve the Spark Connection Associated with an R Object
Retrieve or set the status of Spark AQE
Retrieve or set the advisory size of the shuffle partition
Retrieve or set the auto broadcast join threshold
Retrieve or set the initial number of shuffle partitions before coalescing
Retrieve or set the minimum number of shuffle partitions after coalescing
Retrieve or set whether coalescing contiguous shuffle partitions is enabled
spark_connection class
spark_jobj class
Return the port number of a `sparklyr` backend
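
A few of the connection-level helpers:

```r
connection_is_open(sc)       # check whether the connection is open
spark_version(sc)            # Spark version of the connection
spark_available_versions()   # versions that can be installed
spark_session_config(sc)     # runtime configuration of the Spark session
```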

Other

Generate random samples from some distribution
Enforce Specific Structure for R Objects
Random string generation
Infix operator for composing a lambda expression
Subsetting operator for Spark DataFrames
Generic Call Interface
Function that negotiates the connection with the Spark back-end
Set of functions to provide integration with the RStudio IDE
Lets the package know whether it should test a particular functionality
Surface the last error from Spark captured by the internal `spark_error` function