Fast estimation of allele and genotype frequencies under Hardy-Weinberg equilibrium

Alleles are assumed to be numbered from 1 upwards with no missing labels. Thus, if the largest value in either allele1 or allele2 is K, we assume that there are at least K possible alleles. Genotypes are sorted such that the smallest allele comes first, i.e., 2x1 -> 1x2, while 2x3 stays 2x3.
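
The genotype ordering described above can be sketched in base R. This is only an illustration: sort_genotype is a hypothetical helper name and is not part of the package, and the function's internal implementation may differ.

```r
# Hypothetical helper (not part of the package): put the smaller allele
# label first in every genotype, so that 2x1 becomes 1x2.
sort_genotype <- function(allele1, allele2) {
  data.frame(first  = pmin(allele1, allele2),   # element-wise minimum
             second = pmax(allele1, allele2))   # element-wise maximum
}

# Three genotypes: 2x1, 2x3 and 1x1
g <- sort_genotype(c(2L, 2L, 1L), c(1L, 3L, 1L))
g
```

Only the first genotype is swapped here; 2x3 and 1x1 are already in sorted order.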

hwe_frequencies(allele1, allele2, min_alleles = 0L)

Arguments

allele1: An integer vector (starting with values 1 upwards) of first alleles
allele2: An integer vector (starting with values 1 upwards) of second alleles
min_alleles: The minimum number of unique alleles assumed to be available

Value

A list with five components: allele_freq, the estimated allele frequencies; genotype_freq, the estimated genotype frequencies (under the HWE assumption); obs_genotype, the observed frequency (count) of each genotype; available_genotypes, the number of available genotypes used for the estimation; and unique_alleles, the number of unique alleles (matches the length of allele_freq).
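
As a rough illustration of what allele_freq and genotype_freq contain, the sketch below computes the same quantities in plain R. It is not the package's implementation: hwe_sketch is a hypothetical name, and it assumes no missing values and ignores min_alleles. Under HWE the expected frequency is p_i^2 for a homozygote ixi and 2 p_i p_j for a heterozygote ixj, and the genotypes are listed in the order 1x1, 1x2, ..., 1xK, 2x2, ..., KxK, which matches the example output below.

```r
# Hypothetical sketch (not the package's code); assumes complete data.
hwe_sketch <- function(allele1, allele2) {
  K <- max(allele1, allele2)               # largest allele label seen
  # Pool both alleles of every individual and count labels 1..K
  p <- tabulate(c(allele1, allele2), nbins = K) / (2 * length(allele1))
  # p_i * p_j for all pairs; double the off-diagonal (heterozygote) cells
  expected <- outer(p, p) * (2 - diag(K))
  # Read the upper triangle row by row: 1x1, 1x2, ..., 1xK, 2x2, ...
  list(allele_freq   = p,
       genotype_freq = t(expected)[lower.tri(expected, diag = TRUE)])
}

# Four individuals with genotypes 1x1, 1x2, 2x2 and 2x1
res <- hwe_sketch(c(1L, 1L, 2L, 2L), c(1L, 2L, 2L, 1L))
res
```

In this small example both alleles have frequency 0.5, so the expected genotype frequencies for 1x1, 1x2 and 2x2 are 0.25, 0.5 and 0.25.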

Author

Claus Ekstrom <claus@rprimer.dk>

Examples

al1 <- sample(1:5, size=1000, replace=TRUE, prob=c(.4, .2, .2, .1, .1))
al2 <- sample(1:5, size=1000, replace=TRUE, prob=c(.4, .2, .2, .1, .1))
hwe_frequencies(al1, al2)
#> $allele_freq
#> [1] 0.4145 0.2020 0.1835 0.1020 0.0980
#> 
#> $genotype_freq
#>  [1] 0.17181025 0.16745800 0.15212150 0.08455800 0.08124200 0.04080400
#>  [7] 0.07413400 0.04120800 0.03959200 0.03367225 0.03743400 0.03596600
#> [13] 0.01040400 0.01999200 0.00960400
#> 
#> $obs_genotype
#>  [1] 164 176 151  89  85  37  82  40  32  32  36  34   9  21  12
#> 
#> $available_genotypes
#> [1] 1000
#> 
#> $unique_alleles
#> [1] 5
#>