Fast estimation of allele and genotype frequencies under Hardy-Weinberg equilibrium

Alleles are assumed to be numbered from 1 upwards with no missing labels. Thus, if the largest value in allele1 or allele2 is K, at least K possible alleles are assumed. Genotypes are sorted so that the smaller allele comes first, i.e., 2x1 -> 1x2, while 2x3 stays 2x3.
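The sorting rule and the resulting genotype order can be sketched in base R as follows. Both helpers, `sort_genotypes()` and `genotype_index()`, are hypothetical illustrations and not part of the package. The order 1x1, 1x2, ..., 1xK, 2x2, ..., KxK is inferred from the example output further down, not taken from the package source.

```r
# Hypothetical helper (not part of the package): put the smaller allele first,
# so 2x1 becomes 1x2 while 2x3 is left unchanged
sort_genotypes <- function(allele1, allele2) {
  cbind(first = pmin(allele1, allele2), second = pmax(allele1, allele2))
}

# Hypothetical helper: position of genotype i x j (with i <= j) when all
# genotypes are listed as 1x1, 1x2, ..., 1xK, 2x2, ..., KxK
genotype_index <- function(i, j, K) {
  # genotypes in rows 1..(i-1), plus the offset of j within row i
  (i - 1) * K - (i - 1) * (i - 2) / 2 + (j - i + 1)
}

sort_genotypes(c(2, 2), c(1, 3))
genotype_index(2, 3, K = 5)
```

With K = 5 alleles there are K(K + 1)/2 = 15 possible genotypes, which matches the length of `genotype_freq` in the example.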

hwe_frequencies(allele1, allele2, min_alleles = 0L)

Arguments

allele1: An integer vector of the first allele of each genotype (labels starting at 1)
allele2: An integer vector of the second allele of each genotype (labels starting at 1)
min_alleles: The minimum number of unique alleles to assume are available

Value

A list with five elements:
allele_freq: The estimated allele frequencies
genotype_freq: The estimated genotype frequencies under the Hardy-Weinberg equilibrium assumption
obs_genotype: The observed counts of each genotype
available_genotypes: The number of available genotypes used for the estimation
unique_alleles: The number of unique alleles (equal to the length of allele_freq)
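A minimal base-R sketch of how such estimates can be computed: allele frequencies are allele counts divided by 2n, homozygote frequencies are p_i^2, and heterozygote frequencies are 2 p_i p_j. The function name `hwe_frequencies_sketch()` is hypothetical; it is not the package's compiled implementation. Dropping pairs with a missing allele and taking the number of alleles as the maximum of the largest label and `min_alleles` are assumptions.

```r
# Hypothetical re-implementation for illustration only
hwe_frequencies_sketch <- function(allele1, allele2, min_alleles = 0L) {
  # Assumption: genotypes with a missing allele are dropped
  keep <- !is.na(allele1) & !is.na(allele2)
  # Sort each genotype so that the smaller allele comes first
  a1 <- pmin(allele1[keep], allele2[keep])
  a2 <- pmax(allele1[keep], allele2[keep])
  n <- length(a1)
  # Assumption: number of alleles is the largest label, but at least min_alleles
  K <- max(a1, a2, min_alleles)
  # Allele frequencies: each genotype carries two alleles, hence 2n
  p <- tabulate(c(a1, a2), nbins = K) / (2 * n)
  # All genotypes i x j with i <= j, listed as 1x1, 1x2, ..., KxK
  i <- rep(seq_len(K), times = K:1)
  j <- unlist(lapply(seq_len(K), function(r) r:K))
  # Expected frequencies under HWE
  expected <- ifelse(i == j, p[i]^2, 2 * p[i] * p[j])
  # Observed counts in the same genotype order
  pos <- match(paste(a1, a2), paste(i, j))
  observed <- tabulate(pos, nbins = length(i))
  list(allele_freq = p, genotype_freq = expected, obs_genotype = observed,
       available_genotypes = n, unique_alleles = K)
}

hwe_frequencies_sketch(c(1, 1, 2, 2), c(1, 2, 1, 3))
```

Because the allele frequencies sum to one, the expected genotype frequencies also sum to one, since they are the terms of (p_1 + ... + p_K)^2.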

Author

Claus Ekstrom <claus@rprimer.dk>

Examples

al1 <- sample(1:5, size=1000, replace=TRUE, prob=c(.4, .2, .2, .1, .1))
al2 <- sample(1:5, size=1000, replace=TRUE, prob=c(.4, .2, .2, .1, .1))
hwe_frequencies(al1, al2)
#> $allele_freq
#> [1] 0.4150 0.2030 0.1835 0.1015 0.0970
#> 
#> $genotype_freq
#>  [1] 0.17222500 0.16849000 0.15230500 0.08424500 0.08051000 0.04120900
#>  [7] 0.07450100 0.04120900 0.03938200 0.03367225 0.03725050 0.03559900
#> [13] 0.01030225 0.01969100 0.00940900
#> 
#> $obs_genotype
#>  [1] 162 178 154  89  85  37  83  39  32  31  35  33   9  22  11
#> 
#> $available_genotypes
#> [1] 1000
#> 
#> $unique_alleles
#> [1] 5
#>
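As a consistency check, the printed `allele_freq` and `genotype_freq` can be recomputed from the printed `obs_genotype` counts alone. The sketch below assumes the genotype order 1x1, 1x2, ..., 1x5, 2x2, ..., 5x5. It should reproduce the printed estimates up to rounding.

```r
# Observed genotype counts copied from the printed output above
obs <- c(162, 178, 154, 89, 85, 37, 83, 39, 32, 31, 35, 33, 9, 22, 11)
K <- 5
i <- rep(seq_len(K), times = K:1)                 # first allele of each genotype
j <- unlist(lapply(seq_len(K), function(r) r:K))  # second allele of each genotype
n <- sum(obs)                                     # number of genotypes
# Each genotype contributes one copy of allele i and one copy of allele j
p <- tabulate(c(rep(i, obs), rep(j, obs)), nbins = K) / (2 * n)
# Expected genotype frequencies under Hardy-Weinberg equilibrium
expected <- ifelse(i == j, p[i]^2, 2 * p[i] * p[j])
p
expected
```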