summaryP.Rd
summaryP produces a tall and thin data frame containing numerators (freq) and denominators (denom) after stratifying the data by a series of variables. A special capability to group a series of related yes/no variables is included through the use of the ynbind function, for which the user specifies a final argument label used to label the panel created for that group of related variables.
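As a minimal sketch of the grouping described above, the following builds a small made-up data frame (the names toy, fever, cough and site are hypothetical) and combines two yes/no variables into one panel labeled 'Symptoms'. It assumes Hmisc is installed; the proportion column is derived here for illustration and is not something summaryP is documented to return.

```r
library(Hmisc)

set.seed(3)
n <- 60
# Hypothetical toy data: two yes/no indicators and one stratification variable
toy <- data.frame(
  fever = sample(c('N', 'Y'), n, TRUE),
  cough = sample(c('N', 'Y'), n, TRUE),
  site  = sample(c('East', 'West'), n, TRUE)
)

# Group the two indicators into a single block labeled 'Symptoms'
sy <- summaryP(ynbind(fever, cough, label = 'Symptoms') ~ site, data = toy)

# Derive proportions from the numerator (freq) and denominator (denom) columns
sy$prop <- sy$freq / sy$denom
print(as.data.frame(sy))
```

Each row of the result corresponds to one category level within one stratum, which is what makes the data frame tall and thin.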
If options(grType='plotly') is not in effect, the plot method for summaryP displays proportions as a multi-panel dot chart using the lattice package's dotplot function with a special panel function. Numerators and denominators of proportions are also included as text, in the same colors as used by an optional groups variable. The formula argument used in the dotplot call is constructed automatically, but the user can easily reorder the variables by specifying formula, with elements named val (category levels), var (classification variable name), freq (calculated result) plus the overall cross-classification variables excluding groups. If options(grType='plotly') is in effect, the plot method makes an entirely different display using Hmisc::dotchartpl with plotly. When marginVal is specified and data has been run through addMarginal, a stratification variable causes the more finely stratified estimates to be shown slightly below the lines, with smaller and translucent symbols. The marginal summaries are shown as the main estimates and the user can turn off display of the stratified estimates, or view their details with hover text.
The ggplot method for summaryP does not draw numerators and denominators, but the chart is more compact than the one from the plot method because ggplot2 does not repeat category names the same way as lattice does. Variable names that are too long to fit in panel strips are renamed (1), (2), etc. and an attribute "fnvar" is added to the result; this attribute is a character string defining the abbreviations, useful in a figure caption. The ggplot2 object has labels for points plotted, used by plotly::ggplotly as hover text (see example).
The latex method produces one or more LaTeX tabular environments containing a table representation of the result, with optional side-by-side display if groups is specified. Multiple tabular environments result from the presence of non-group stratification factors.
summaryP(formula, data = NULL, subset = NULL,
na.action = na.retain, sort=TRUE,
asna = c("unknown", "unspecified"), ...)
# S3 method for class 'summaryP'
plot(x, formula=NULL, groups=NULL,
marginVal=NULL, marginLabel=marginVal,
refgroup=NULL, exclude1=TRUE, xlim = c(-.05, 1.05),
text.at=NULL, cex.values = 0.5,
key = list(columns = length(groupslevels), x = 0.75,
y = -0.04, cex = 0.9,
col = lattice::trellis.par.get('superpose.symbol')$col,
corner=c(0,1)),
outerlabels=TRUE, autoarrange=TRUE,
col=colorspace::rainbow_hcl, ...)
# S3 method for class 'summaryP'
ggplot(data, mapping, groups=NULL, exclude1=TRUE,
xlim=c(0, 1), col=NULL, shape=NULL, size=function(n) n ^ (1/4),
sizerange=NULL, abblen=5, autoarrange=TRUE, addlayer=NULL,
..., environment)
# S3 method for class 'summaryP'
latex(object, groups=NULL, exclude1=TRUE, file='', round=3,
size=NULL, append=TRUE, ...)
formula
a formula with the variables for whose levels proportions are computed on the left hand side, and major classification variables on the right. The formula needs to include any variable later used as groups, as the data summarization does not distinguish between superpositioning and paneling. For the plot method, formula can provide an override to the default formula for dotplot().
data
an optional data frame. For ggplot.summaryP, data is the result of summaryP.
subset
an optional subsetting expression or vector
na.action
function specifying how to handle NAs. The default is to keep all NAs in the analysis frame.
sort
set to FALSE to not sort category levels in descending order of global proportions
asna
character vector specifying level names to consider the same as NA. Set asna=NULL to not consider any.
x
an object produced by summaryP
groups
a character string containing the name of a superpositioning variable for obtaining further stratification within a horizontal line in the dot chart.
marginVal
if options(grType='plotly') is in effect and the data given to summaryP were run through addMarginal, specifies the category name that represents marginal summaries (usually "All").
marginLabel
specifies a different character string to use than the value of marginVal. For example, if marginal proportions were computed over all regions, one may specify marginVal="All", marginLabel="All Regions". marginLabel is only used for formatting graphical output.
refgroup
used when making a plotly chart with a two-level groups variable. It causes the half-width confidence interval for the difference in two proportions to be shown, and the actual confidence limits and the difference to be added to hover text. See dotchartpl for more details.
exclude1
By default, the ggplot, plot, and latex methods for summaryP remove redundant entries from tables for variables with only two levels. For example, if you print the proportion of females, you don't need to print the proportion of males. To override this, set exclude1=FALSE.
xlim
x-axis limits. Default is c(-.05, 1.05) for the plot method and c(0, 1) for the ggplot method.
text.at
specify to leave unused space to the right of each panel to prevent numerators and denominators from touching data points. text.at is the upper limit for scaling panels' x-axes but tick marks are only labeled up to max(xlim).
cex.values
character size to use for plotting numerators and denominators
key
a list to pass to the auto.key argument of dotplot. To place a key above the entire chart use auto.key=list(columns=2) for example.
outerlabels
by default if there are two conditioning variables besides groups, the latticeExtra package's useOuterStrips function is used to put strip labels in the margins, usually resulting in a much prettier chart. Set to FALSE to prevent usage of useOuterStrips.
autoarrange
If TRUE, the formula is re-arranged so that if there are two conditioning (paneling) variables, the variable with the most levels is taken as the vertical condition.
col
a vector of colors to use to override defaults in ggplot. When options(grType='plotly') is in effect, see dotchartpl.
shape
a vector of plotting symbols to override ggplot defaults
mapping, environment
not used; needed because of rules for generics
size
for ggplot, a function that transforms denominators into metrics used for the size aesthetic. Default is the fourth root function so that the area of symbols is proportional to the square root of sample size. Specify NULL to not vary point sizes. size=sqrt is a reasonable alternative. Set size to an integer to categorize the denominators into size quantile groups using cut2. Unless size is an integer, the legend for sizes uses the minimum and maximum denominators and 6-tiles using quantile(..., type=1) so that actually occurring sample sizes are used as labels. size is overridden to NULL if the range in denominators is less than 10 or the ratio of the maximum to the minimum is less than 1.2. For latex, size is an optional font size such as "small".
sizerange
a 2-vector specifying the range argument to the ggplot2 scale_size_... function, which is the range of sizes allowed for the points according to the denominator. The default is sizerange=c(.7, 3.25) but the lower limit is increased according to the ratio of maximum to minimum sample sizes.
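The size rules described above can be restated in base R. This is only a sketch: size_fun copies the default from the usage section, the denominators are made up, and ignore_size is a hypothetical helper that paraphrases the documented override rule rather than reproducing Hmisc's internal code.

```r
# Default size transform from the ggplot method's usage: fourth root of n
size_fun <- function(n) n ^ (1/4)

denoms <- c(16, 81, 256, 625)   # hypothetical denominators
lin <- size_fun(denoms)         # approximately 2, 3, 4, 5

# Treating the transformed value as a linear symbol dimension (as the
# description implies), the implied area grows with the square root of n
area <- lin ^ 2
stopifnot(isTRUE(all.equal(area, sqrt(denoms))))

# Hypothetical helper paraphrasing the documented rule: varying point size
# is dropped when denominators span less than 10 or max/min is below 1.2
ignore_size <- function(denom) {
  r <- range(denom)
  diff(r) < 10 || r[2] / r[1] < 1.2
}

ignore_size(c(100, 105))   # TRUE: range is 5
ignore_size(c(100, 115))   # TRUE: ratio is 1.15
ignore_size(c(10, 100))    # FALSE: range 90, ratio 10
```

The fourth root compresses large differences in sample size, so a stratum with 100 times the denominator is drawn only about 3.2 times larger in linear terms.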
abblen
labels of variables having only one level and having their name longer than abblen characters are abbreviated and documented in fnvar (described elsewhere here). The default abblen=5 is good for labels plotted vertically. If labels are rotated using theme, a better value would be 12.
...
used only for plotly graphics; these arguments are passed to dotchartpl
object
an object produced by summaryP
file
file name, defaults to writing to console
round
number of digits to the right of the decimal place for proportions
append
set to FALSE to start output over
addlayer
a ggplot layer to add to the plot object
summaryP produces a data frame of class "summaryP". The plot method produces a lattice object of class "trellis". The latex method produces an object of class "latex" with an additional attribute ngrouplevels specifying the number of levels of any groups variable and an attribute nstrata specifying the number of strata.
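The return classes described above can be checked informally. The sketch below assumes Hmisc, lattice and colorspace are installed and that options(grType='plotly') is not in effect; the data frame toy and its columns are made up for illustration.

```r
library(Hmisc)

set.seed(4)
# Hypothetical toy data, not taken from the examples on this page
toy <- data.frame(
  race = sample(c('Asian', 'Black/AA', 'White'), 80, TRUE),
  arm  = sample(c('A', 'B'), 80, TRUE)
)

s <- summaryP(race ~ arm, data = toy)
class(s)    # per the description above, should include "summaryP"

# Default lattice display; the returned object can be stored and printed later
p <- plot(s, groups = 'arm')
class(p)    # per the description above, should be "trellis"
```

Because the plot method returns a trellis object rather than drawing as a side effect only, the chart can be saved in a variable and printed or modified later.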
n <- 100
f <- function(na=FALSE) {
  x <- sample(c('N', 'Y'), n, TRUE)
  if(na) x[runif(n) < .1] <- NA   # set roughly 10% of values to NA
  x
}
set.seed(1)
d <- data.frame(x1=f(), x2=f(), x3=f(), x4=f(), x5=f(), x6=f(), x7=f(TRUE),
age=rnorm(n, 50, 10),
race=sample(c('Asian', 'Black/AA', 'White'), n, TRUE),
sex=sample(c('Female', 'Male'), n, TRUE),
treat=sample(c('A', 'B'), n, TRUE),
region=sample(c('North America','Europe'), n, TRUE))
d <- upData(d, labels=c(x1='MI', x2='Stroke', x3='AKI', x4='Migraines',
x5='Pregnant', x6='Other event', x7='MD withdrawal',
race='Race', sex='Sex'))
#> Input object size: 13016 bytes; 12 variables 100 observations
#> New object size: 17800 bytes; 12 variables 100 observations
dasna <- subset(d, region=='North America')
with(dasna, table(race, treat))
#> treat
#> race A B
#> Asian 8 10
#> Black/AA 13 6
#> White 4 10
s <- summaryP(race + sex + ynbind(x1, x2, x3, x4, x5, x6, x7, label='Exclusions') ~
region + treat, data=d)
# add exclude1=FALSE below to include female category
plot(s, groups='treat')
require(ggplot2)
ggplot(s, groups='treat')
plot(s, val ~ freq | region * var, groups='treat', outerlabels=FALSE)
# Much better looking if omit outerlabels=FALSE; see output at
# https://hbiostat.org/R/Hmisc/summaryFuns.pdf
# See more examples under bpplotM
## For plotly interactive graphic that does not handle variable size
## panels well:
## require(plotly)
## g <- ggplot(s, groups='treat')
## ggplotly(g, tooltip='text')
## For nice plotly interactive graphic:
## options(grType='plotly')
## s <- summaryP(race + sex + ynbind(x1, x2, x3, x4, x5, x6, x7,
##                                   label='Exclusions') ~
##               treat, data=subset(d, region=='Europe'))
##
## plot(s, groups='treat', refgroup='A') # refgroup='A' does B-A differences
# Make a chart where there is a block of variables that
# are only analyzed for males. Keep redundant sex in block for demo.
# Leave extra space for numerators, denominators
sb <- summaryP(race + sex +
pBlock(race, sex, label='Race: Males', subset=sex=='Male') ~
region, data=d)
plot(sb, text.at=1.3)
plot(sb, groups='region', layout=c(1,3), key=list(space='top'),
text.at=1.15)
ggplot(sb, groups='region')
if (FALSE) { # \dontrun{
plot(s, groups='treat')
# plot(s, groups='treat', outerlabels=FALSE) for standard lattice output
plot(s, groups='region', key=list(columns=2, space='bottom'))
require(ggplot2)
colorFacet(ggplot(s))
plot(summaryP(race + sex ~ region, data=d), exclude1=FALSE, col='green')
require(lattice)
# Make your own plot using data frame created by summaryP
useOuterStrips(dotplot(val ~ freq | region * var, groups=treat, data=s,
        xlim=c(0,1), scales=list(y='free', rot=0), xlab='Fraction',
        panel=function(x, y, subscripts, ...) {
          denom <- s$denom[subscripts]
          x <- x / denom
          panel.dotplot(x=x, y=y, subscripts=subscripts, ...) }))
# Show marginal summary for all regions combined
s <- summaryP(race + sex ~ region, data=addMarginal(d, region))
plot(s, groups='region', key=list(space='top'), layout=c(1,2))
# Show marginal summaries for both race and sex
s <- summaryP(ynbind(x1, x2, x3, x4, label='Exclusions', sort=FALSE) ~
race + sex, data=addMarginal(d, race, sex))
plot(s, val ~ freq | sex*race)
} # }