Generator method for creating a single-column Spark dataframe comprising i.i.d. samples from a log-normal distribution.

Usage

sdf_rlnorm(
  sc,
  n,
  meanlog = 0,
  sdlog = 1,
  num_partitions = NULL,
  seed = NULL,
  output_col = "x"
)
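
A minimal sketch of a call matching the signature above. It assumes the sparklyr and dplyr packages are installed and that a local Spark installation is available for `spark_connect(master = "local")`; the seed, partition count, and column name `"value"` are arbitrary choices for illustration.

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark installation is available to connect to.
sc <- spark_connect(master = "local")

# Draw 1000 log-normal samples into a single-column Spark dataframe.
samples <- sdf_rlnorm(
  sc,
  n = 1000,
  meanlog = 0,
  sdlog = 1,
  num_partitions = 4,
  seed = 42,
  output_col = "value"
)

n_rows <- sdf_nrow(samples)      # number of rows held in Spark
col_names <- colnames(samples)   # name of the single output column
local_df <- collect(samples)     # bring the samples into local R memory

spark_disconnect(sc)
```

Collecting is reasonable here because the sample is small; for large `n` it is usually better to keep the data in Spark and summarise it there.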

Arguments

sc

A Spark connection.

n

Sample size (number of rows in the resulting Spark dataframe). No default is given in the usage above, so it must be supplied.

meanlog

The mean of the normally distributed natural logarithm of this distribution (default: 0).

sdlog

The standard deviation of the normally distributed natural logarithm of this distribution (default: 1).

num_partitions

Number of partitions in the resulting Spark dataframe (default: default parallelism of the Spark cluster).

seed

Random seed (default: a random long integer).

output_col

Name of the output column containing sample values (default: "x").
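
The `meanlog` and `sdlog` arguments describe the logarithm of the samples, not the samples themselves. The sketch below illustrates that parameterization locally with base R's `stats::rlnorm()`, which uses the same argument names; it does not touch Spark, and it assumes `sdf_rlnorm()` follows the same convention, as the argument descriptions above suggest. The parameter values and sample size are arbitrary.

```r
set.seed(1)

meanlog <- 0.5
sdlog <- 0.25

# Local stand-in for the Spark generator: base R log-normal sampling.
x <- rlnorm(1e5, meanlog = meanlog, sdlog = sdlog)

# The log of the samples is approximately normal with the given parameters.
log_mean <- mean(log(x))
log_sd <- sd(log(x))

# The mean of the samples themselves is exp(meanlog + sdlog^2 / 2),
# not exp(meanlog); exp(meanlog) is the median.
sample_mean <- mean(x)
theoretical_mean <- exp(meanlog + sdlog^2 / 2)
sample_median <- median(x)
theoretical_median <- exp(meanlog)
```

Comparing `log_mean` with `meanlog` and `sample_mean` with `theoretical_mean` shows why passing the desired mean of the samples directly as `meanlog` would give a differently centred distribution.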