
fndistinct is a generic function that (column-wise) computes the number of distinct values in x, (optionally) grouped by g. It is significantly faster than length(unique(x)). The TRA argument can further be used to transform x using its (grouped) distinct value count.
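For reference, the semantics described above can be reproduced in base R. The sketch below covers an ungrouped count, a count grouped with `tapply()`, and a column-wise count on a data frame with `vapply()`. Missing values are dropped first, which mirrors the default `na.rm = TRUE`. It is a slow baseline for checking results, not how fndistinct is implemented. The helper `count_distinct` and the toy objects `x`, `g` and `df` are hypothetical names chosen for illustration.

```r
# Hypothetical reference helper: number of distinct non-missing values
count_distinct <- function(v) length(unique(v[!is.na(v)]))

# Vector, optionally grouped by g
x <- c(1, 2, 2, NA, 3, 4, 5)
g <- c("a", "a", "a", "a", "b", "b", "b")
count_distinct(x)                        # 5 (ungrouped)
grouped <- tapply(x, g, count_distinct)  # a: 2, b: 3 (one count per group)

# Data frame: one count per column, returned as a named integer vector
df <- data.frame(
  num = c(1, 1, NA, 2),
  chr = c("x", "y", "z", NA),
  lgl = c(TRUE, TRUE, TRUE, TRUE)
)
by_column <- vapply(df, count_distinct, integer(1))  # num = 2, chr = 3, lgl = 1
```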

Usage

fndistinct(x, ...)

# Default S3 method
fndistinct(x, g = NULL, TRA = NULL, na.rm = .op[["na.rm"]],
           use.g.names = TRUE, nthreads = .op[["nthreads"]], ...)

# S3 method for class 'matrix'
fndistinct(x, g = NULL, TRA = NULL, na.rm = .op[["na.rm"]],
           use.g.names = TRUE, drop = TRUE, nthreads = .op[["nthreads"]], ...)

# S3 method for class 'data.frame'
fndistinct(x, g = NULL, TRA = NULL, na.rm = .op[["na.rm"]],
           use.g.names = TRUE, drop = TRUE, nthreads = .op[["nthreads"]], ...)

# S3 method for class 'grouped_df'
fndistinct(x, TRA = NULL, na.rm = .op[["na.rm"]],
           use.g.names = FALSE, keep.group_vars = TRUE, nthreads = .op[["nthreads"]], ...)

Arguments

x

a vector, matrix, data frame or grouped data frame (class 'grouped_df').

g

a factor, GRP object, atomic vector (internally converted to factor) or a list of vectors / factors (internally converted to a GRP object) used to group x.

TRA

an integer or quoted operator indicating the transformation to perform: 0 - "na" | 1 - "fill" | 2 - "replace" | 3 - "-" | 4 - "-+" | 5 - "/" | 6 - "%" | 7 - "+" | 8 - "*" | 9 - "%%" | 10 - "-%%". See TRA.

na.rm

logical. TRUE: skip missing values in x (faster computation). FALSE: also count missing values as distinct values (NA and NaN are counted separately, see Details).

use.g.names

logical. Make group names and add them to the result as names (default method) or row-names (matrix and data frame methods). No row-names are generated for data.tables.

nthreads

integer. The number of threads to utilize. Parallelism is across groups for grouped computations and at the column-level otherwise.

drop

matrix and data.frame methods: logical. TRUE drops dimensions and returns an atomic vector if g = NULL and TRA = NULL.

keep.group_vars

grouped_df method: Logical. FALSE removes grouping variables after computation.

...

arguments to be passed to or from other methods. If TRA is used, passing set = TRUE will transform data by reference and return the result invisibly.
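The TRA argument described above broadcasts the (grouped) distinct count back to the shape of x. The sketch below approximates three of the options in base R with `ave()`. It assumes the semantics from collapse's TRA help page: "fill" overwrites every element, including missing ones, with the statistic; "replace" does the same but keeps missing values missing; "/" divides x by the statistic. This is not collapse's code. It also does not cover `set = TRUE`, which modifies data by reference. The helper `count_distinct` and the objects `x` and `g` are hypothetical.

```r
# Hypothetical helper: distinct non-missing values (default na.rm = TRUE)
count_distinct <- function(v) length(unique(v[!is.na(v)]))

x <- c(1, 2, 2, NA, 3, 4, 5)
g <- c("a", "a", "a", "a", "b", "b", "b")

# Broadcast each group's count to all of its rows; ave() recycles the
# scalar returned by FUN. Rows where x is NA get the count too (like "fill").
filled <- ave(x, g, FUN = count_distinct)

# Like "replace": same counts, but positions that are NA in x stay NA
replaced <- ifelse(is.na(x), NA, filled)

# Like "/": each value divided by its group's distinct count
divided <- x / filled
```

Here `filled` is `c(2, 2, 2, 2, 3, 3, 3)`, since group "a" has two distinct non-missing values and group "b" has three.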

Details

fndistinct implements a fast C-level hashing algorithm, inspired by the kit package, to find the number of distinct values.
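The general idea behind hash-based distinct counting can be shown with a small open-addressing hash set that uses linear probing, written in plain R for integer vectors. Many choices here are illustrative assumptions and not taken from collapse's or kit's C code: the identity-modulo hash function, the power-of-two table sizing, the restriction to integer input, and the function name `ndistinct_hash`. An R-level loop like this is also far slower than either package.

```r
# Hypothetical sketch: count distinct values of an integer vector with an
# open-addressing hash set (linear probing)
ndistinct_hash <- function(x, na.rm = TRUE) {
  stopifnot(is.integer(x))
  n <- length(x)
  # Power-of-two table with at least 2n slots keeps the load factor <= 0.5,
  # so probing always finds a free slot
  M <- as.integer(2^ceiling(log2(max(2 * n, 2))))
  slots <- integer(M)   # stored keys
  used <- logical(M)    # TRUE where a slot holds a key
  count <- 0L
  seen_na <- FALSE
  for (v in x) {
    if (is.na(v)) {
      # NA is counted at most once, and only when na.rm = FALSE
      if (!na.rm && !seen_na) {
        seen_na <- TRUE
        count <- count + 1L
      }
      next
    }
    h <- v %% M + 1L                 # simple hash: value modulo table size (1-based)
    while (used[h] && slots[h] != v) {
      h <- h %% M + 1L               # linear probing, wrapping around the table
    }
    if (!used[h]) {                  # empty slot reached: v is new
      used[h] <- TRUE
      slots[h] <- v
      count <- count + 1L
    }
  }
  count
}
```

For example, `ndistinct_hash(c(3L, 1L, 3L, NA, 7L))` counts 3 distinct values, and with `na.rm = FALSE` it counts 4. Each value is looked up only once, which is why hashing avoids the sort that an order-based approach would need.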

If na.rm = TRUE (the default), missing values are skipped, which yields substantial performance gains on data with many missing values. If na.rm = FALSE, missing values are treated like any other value and inserted into the hash map. Thus, with the former a numeric vector c(1.25,NaN,3.56,NA) has a distinct value count of 2, whereas the latter returns a distinct value count of 4.
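The two counts in this example can be checked in base R. `is.na()` is TRUE for both NA and NaN, so dropping missing values removes both. `unique()`, however, keeps NA and NaN as two separate values. This only confirms the documented numbers and says nothing about fndistinct's internals.

```r
x <- c(1.25, NaN, 3.56, NA)

# na.rm = TRUE analogue: NA and NaN are both dropped
length(unique(x[!is.na(x)]))  # 2

# na.rm = FALSE analogue: NA and NaN count as two distinct values
length(unique(x))             # 4
```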

fndistinct preserves all attributes of non-classed vectors / columns, and only the 'label' attribute (if available) of classed vectors / columns (e.g. dates or factors). When applied to data frames and matrices, the row-names are adjusted as necessary.

Value

Integer. The number of distinct values in x, grouped by g, or (if TRA is used) x transformed by its distinct value count, grouped by g.

Examples

## default vector method
fndistinct(airquality$Solar.R)                   # Simple distinct value count
#> [1] 117
fndistinct(airquality$Solar.R, airquality$Month) # Grouped distinct value count
#>  5  6  7  8  9 
#> 27 28 29 27 27 

## data.frame method
fndistinct(airquality)
#>   Ozone Solar.R    Wind    Temp   Month     Day 
#>      67     117      31      40       5      31 
fndistinct(airquality, airquality$Month)
#>   Ozone Solar.R Wind Temp Month Day
#> 5    21      27   18   18     1  31
#> 6     9      28   16   19     1  30
#> 7    24      29   17   14     1  31
#> 8    24      27   18   19     1  31
#> 9    21      27   19   20     1  30
fndistinct(wlddev)                               # Works with data of all types!
#> country   iso3c    date    year  decade  region  income    OECD   PCGDP  LIFEEX 
#>     216     216      61      61       7       7       4       2    9470   10548 
#>    GINI     ODA     POP 
#>     368    7832   12877 
head(fndistinct(wlddev, wlddev$iso3c))
#>     country iso3c date year decade region income OECD PCGDP LIFEEX GINI ODA POP
#> ABW       1     1   61   61      7      1      1    1    32     60    0  20  60
#> AFG       1     1   61   61      7      1      1    1    18     60    0  60  60
#> AGO       1     1   61   61      7      1      1    1    40     59    3  58  60
#> ALB       1     1   61   61      7      1      1    1    40     59    9  32  60
#> AND       1     1   61   61      7      1      1    1    50      0    0   0  60
#> ARE       1     1   61   61      7      1      1    1    45     60    2  45  60

## matrix method
aqm <- qM(airquality)
fndistinct(aqm)                                  # Also works for character or logical matrices
#>   Ozone Solar.R    Wind    Temp   Month     Day 
#>      67     117      31      40       5      31 
fndistinct(aqm, airquality$Month)
#>   Ozone Solar.R Wind Temp Month Day
#> 5    21      27   18   18     1  31
#> 6     9      28   16   19     1  30
#> 7    24      29   17   14     1  31
#> 8    24      27   18   19     1  31
#> 9    21      27   19   20     1  30

## method for grouped data frames - created with dplyr::group_by or fgroup_by
airquality |> fgroup_by(Month) |> fndistinct()
#>   Month Ozone Solar.R Wind Temp Day
#> 1     5    21      27   18   18  31
#> 2     6     9      28   16   19  30
#> 3     7    24      29   17   14  31
#> 4     8    24      27   18   19  31
#> 5     9    21      27   19   20  30
wlddev |> fgroup_by(country) |>
  fselect(PCGDP, LIFEEX, GINI, ODA) |> fndistinct()
#>                            country PCGDP LIFEEX GINI ODA
#> 1                      Afghanistan    18     60    0  60
#> 2                          Albania    40     59    9  32
#> 3                          Algeria    60     60    3  60
#> 4                   American Samoa    17      0    0   0
#> 5                          Andorra    50      0    0   0
#> 6                           Angola    40     59    3  58
#> 7              Antigua and Barbuda    43     60    0  47
#> 8                        Argentina    60     60   29  60
#> 9                          Armenia    30     59   20  29
#> 10                           Aruba    32     60    0  20
#> 11                       Australia    60     59    9   0
#> 12                         Austria    60     60   16   0
#> 13                      Azerbaijan    30     60    5  29
#> 14                    Bahamas, The    60     59    0  41
#> 15                         Bahrain    40     60    0  45
#> 16                      Bangladesh    60     60    9  49
#> 17                        Barbados    46     60    0  45
#> 18                         Belarus    30     59   19  29
#> 19                         Belgium    60     60   16   0
#> 20                          Belize    60     59    6  60
#> 21                           Benin    60     60    3  60
#> 22                         Bermuda    60     23    0  34
#> 23                          Bhutan    40     60    4  56
#> 24                         Bolivia    60     60   21  60
#> 25          Bosnia and Herzegovina    26     60    4  30
#> 26                        Botswana    60     60    5  60
#> 27                          Brazil    60     60   29  60
#> 28          British Virgin Islands     0      0    0  38
#> 29               Brunei Darussalam    46     60    0  40
#> 30                        Bulgaria    40     60   12  15
#> 31                    Burkina Faso    60     60    5  60
#> 32                         Burundi    60     59    4  60
#> 33                      Cabo Verde    40     60    3  49
#> 34                        Cambodia    27     60    0  60
#> 35                        Cameroon    60     60    4  60
#> 36                          Canada    60     57   13   0
#> 37                  Cayman Islands    13      1    0  30
#> 38        Central African Republic    60     60    2  60
#> 39                            Chad    60     60    2  60
#> 40                 Channel Islands     0     60    0   0
#> 41                           Chile    60     60   13  57
#> 42                           China    60     60   13  41
#> 43                        Colombia    60     60   18  60
#> 44                         Comoros    40     60    2  54
#> 45                Congo, Dem. Rep.    60     60    2  60
#> 46                     Congo, Rep.    60     60    2  60
#> 47                      Costa Rica    60     60   23  60
#> 48                   Cote d'Ivoire    60     60   10  60
#> 49                         Croatia    25     60   10  19
#> 50                            Cuba    49     60    0  60
#> 51                         Curacao     0     11    0   0
#> 52                          Cyprus    45     60   14  45
#> 53                  Czech Republic    30     60   16  15
#> 54                         Denmark    60     60   16   0
#> 55                        Djibouti     1     59    4  54
#> 56                        Dominica    43      5    0  47
#> 57              Dominican Republic    60     60   20  60
#> 58                         Ecuador    60     60   18  60
#> 59                Egypt, Arab Rep.    60     60    8  60
#> 60                     El Salvador    55     60   25  60
#> 61               Equatorial Guinea    40     60    0  47
#> 62                         Eritrea    20     60    0  44
#> 63                         Estonia    27     60   15  14
#> 64                        Eswatini    50     60    4  60
#> 65                        Ethiopia    39     60    5  60
#> 66                   Faroe Islands     1     36    0   0
#> 67                            Fiji    60     60    3  59
#> 68                         Finland    60     60   15   0
#> 69                          France    60     59   18   0
#> 70                French Polynesia     0     60    0  39
#> 71                           Gabon    60     60    2  60
#> 72                     Gambia, The    54     60    4  60
#> 73                         Georgia    55     60   23  29
#> 74                         Germany    50     60   16   0
#> 75                           Ghana    60     60    7  60
#> 76                       Gibraltar     0      0    0  44
#> 77                          Greece    60     59   16   0
#> 78                       Greenland    49     40    0   0
#> 79                         Grenada    43     60    0  47
#> 80                            Guam    17     60    0   0
#> 81                       Guatemala    60     60    5  60
#> 82                          Guinea    34     60    5  60
#> 83                   Guinea-Bissau    50     60    3  52
#> 84                          Guyana    60     60    1  59
#> 85                           Haiti    60     60    1  60
#> 86                        Honduras    60     60   23  60
#> 87            Hong Kong SAR, China    59     59    0  44
#> 88                         Hungary    29     59   15  15
#> 89                         Iceland    60     59   12   0
#> 90                           India    60     60    6  60
#> 91                       Indonesia    60     60   24  60
#> 92              Iran, Islamic Rep.    60     60   12  60
#> 93                            Iraq    52     60    2  60
#> 94                         Ireland    50     60   17   0
#> 95                     Isle of Man    35      2    0   0
#> 96                          Israel    60     56   11  45
#> 97                           Italy    60     60   17   0
#> 98                         Jamaica    54     59    7  59
#> 99                           Japan    60     60    3   0
#> 100                         Jordan    44     60    7  60
#> 101                     Kazakhstan    30     59   17  29
#> 102                          Kenya    60     60    5  60
#> 103                       Kiribati    50     60    1  60
#> 104                    Korea, Rep.    60     59    6  45
#> 105                         Kosovo    20     38   10  11
#> 106                         Kuwait    25     60    0  44
#> 107                Kyrgyz Republic    34     60   19  28
#> 108                        Lao PDR    36     60    6  60
#> 109                         Latvia    25     60   13  14
#> 110                        Lebanon    32     60    1  60
#> 111                        Lesotho    60     60    4  60
#> 112                        Liberia    20     60    3  60
#> 113                          Libya    21     60    0  60
#> 114                  Liechtenstein     1     25    0   0
#> 115                      Lithuania    25     59   13  14
#> 116                     Luxembourg    60     60   18   0
#> 117               Macao SAR, China    38     60    0  35
#> 118                     Madagascar    60     60    8  60
#> 119                         Malawi    60     60    4  60
#> 120                       Malaysia    60     60   12  60
#> 121                       Maldives    25     60    3  59
#> 122                           Mali    53     60    4  60
#> 123                          Malta    50     59    8  45
#> 124               Marshall Islands    38      3    0  29
#> 125                     Mauritania    59     60    7  59
#> 126                      Mauritius    44     58    3  60
#> 127                         Mexico    60     60   13  60
#> 128          Micronesia, Fed. Sts.    33     60    2  29
#> 129                        Moldova    25     60   22  28
#> 130                         Monaco    49      0    0   0
#> 131                       Mongolia    39     60   10  44
#> 132                     Montenegro    23     60    4  17
#> 133                        Morocco    54     60    5  60
#> 134                     Mozambique    40     60    4  58
#> 135                        Myanmar    60     60    2  60
#> 136                        Namibia    40     60    3  37
#> 137                          Nauru    16      0    1  39
#> 138                          Nepal    60     60    3  60
#> 139                    Netherlands    60     60   16   0
#> 140                  New Caledonia     0     59    0  39
#> 141                    New Zealand    50     57    0   0
#> 142                      Nicaragua    60     60    6  60
#> 143                          Niger    60     60    6  60
#> 144                        Nigeria    60     60    6  60
#> 145                North Macedonia    30     60   10  27
#> 146       Northern Mariana Islands    17      0    0  41
#> 147                         Norway    60     60   17   0
#> 148                           Oman    55     60    0  51
#> 149                       Pakistan    60     60   12  60
#> 150                          Palau    20      4    0  27
#> 151                         Panama    60     60   25  60
#> 152               Papua New Guinea    60     60    2  57
#> 153                       Paraguay    60     60   16  60
#> 154                           Peru    60     60   23  60
#> 155                    Philippines    60     60    7  60
#> 156                         Poland    30     59   14  15
#> 157                       Portugal    60     59   16   0
#> 158                    Puerto Rico    60     60    0   0
#> 159                          Qatar    20     60    0  38
#> 160                        Romania    30     58   11  15
#> 161             Russian Federation    31     60   24  15
#> 162                         Rwanda    60     60    5  60
#> 163                          Samoa    38     60    3  59
#> 164                     San Marino    22      1    0   0
#> 165          Sao Tome and Principe    19     60    3  50
#> 166                   Saudi Arabia    52     60    0  49
#> 167                        Senegal    60     60    5  60
#> 168                         Serbia    25     22    5  26
#> 169                     Seychelles    60     40    2  58
#> 170                   Sierra Leone    60     60    3  60
#> 171                      Singapore    60     60    0  45
#> 172      Sint Maarten (Dutch part)     0      7    0   0
#> 173                Slovak Republic    28     58   12  15
#> 174                       Slovenia    30     58   10  12
#> 175                Solomon Islands    40     60    2  60
#> 176                        Somalia     0     60    1  60
#> 177                   South Africa    60     60    6  27
#> 178                    South Sudan     8     60    2   9
#> 179                          Spain    60     60   19   0
#> 180                      Sri Lanka    59     60    8  59
#> 181            St. Kitts and Nevis    43      5    0  41
#> 182                      St. Lucia    43     60    2  47
#> 183       St. Martin (French part)     0     38    0   0
#> 184 St. Vincent and the Grenadines    60     60    0  47
#> 185                          Sudan    60     60    2  60
#> 186                       Suriname    60     60    1  60
#> 187                         Sweden    60     60   21   0
#> 188                    Switzerland    50     60   14   0
#> 189           Syrian Arab Republic     0     60    2  60
#> 190                     Tajikistan    35     60    6  28
#> 191                       Tanzania    32     60    5  60
#> 192                       Thailand    60     60   25  60
#> 193                    Timor-Leste    20     60    3  42
#> 194                           Togo    60     60    3  60
#> 195                          Tonga    39     60    3  60
#> 196            Trinidad and Tobago    60     60    2  50
#> 197                        Tunisia    55     60    7  60
#> 198                         Turkey    60     60   14  60
#> 199                   Turkmenistan    32     60    1  28
#> 200       Turks and Caicos Islands     1      0    0  34
#> 201                         Tuvalu    30      0    1  44
#> 202                         Uganda    38     60    9  60
#> 203                        Ukraine    33     60   18  30
#> 204           United Arab Emirates    45     60    2  45
#> 205                 United Kingdom    60     57   21   0
#> 206                  United States    60     55   20   0
#> 207                        Uruguay    60     60   18  58
#> 208                     Uzbekistan    33     60    4  28
#> 209                        Vanuatu    41     60    1  60
#> 210                  Venezuela, RB    55     60   11  60
#> 211                        Vietnam    36     60    9  60
#> 212          Virgin Islands (U.S.)    16     59    0   0
#> 213             West Bank and Gaza    26     30    7  27
#> 214                    Yemen, Rep.    30     60    3  60
#> 215                         Zambia    60     60    9  60
#> 216                       Zimbabwe    60     60    3  58