[Stable]

To understand the HyperGeometric distribution, consider a set of \(r\) objects, of which \(m\) are of type I and \(n\) are of type II, so that \(r = m + n\). A sample of size \(k\) (\(k<r\)) is drawn at random without replacement. The random variable \(X\) is the number of type I objects observed in this sample.
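
The urn description above can be illustrated with a small simulation. This is a minimal sketch using only base R. The urn sizes, the seed, and the helper `draw_x()` are hypothetical choices made for illustration, and `dhyper()` from the stats package is used as the reference because it takes the same `(m, n, k)` arguments.

```r
# Hypothetical urn: m type I objects and n type II objects.
set.seed(1)
m <- 5; n <- 3; k <- 4
urn <- c(rep("I", m), rep("II", n))

# One realisation of X: count the type I objects in a sample of size k
# drawn without replacement.
draw_x <- function() sum(sample(urn, size = k, replace = FALSE) == "I")

# Empirical frequencies of X over many repeated samples.
x_sim <- replicate(20000, draw_x())
emp <- as.numeric(table(factor(x_sim, levels = 0:k))) / length(x_sim)

# Exact probabilities for comparison.
exact <- dhyper(0:k, m, n, k)
round(rbind(empirical = emp, exact = exact), 3)
```

With these values the sample can contain at most three type II objects, so \(X = 0\) never occurs, in both the simulation and the exact probabilities.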

dist_hypergeometric(m, n, k)

Arguments

m

The number of type I elements available.

n

The number of type II elements available.

k

The size of the sample taken.

Details

We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_hypergeometric.html

In the following, let \(X\) be a HyperGeometric random variable with success probability \(p = m/(m+n)\).

Support: \(x \in \{\max(0, k-n), \dots, \min(k,m)\}\)

Mean: \(\frac{km}{m+n} = kp\)

Variance: \(\frac{kmn(m+n-k)}{(m+n)^2 (m+n-1)} = kp(1-p)\left(1 - \frac{k-1}{m+n-1}\right)\)
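
As a quick check of the mean and variance expressions, the sketch below compares the closed forms with sums over the support. It assumes only base R, and the parameter values are taken from the first distribution in the Examples section.

```r
m <- 500; n <- 50; k <- 100
p <- m / (m + n)

# Closed forms as stated above.
mean_formula <- k * p
var_formula  <- k * p * (1 - p) * (1 - (k - 1) / (m + n - 1))

# The same quantities obtained by summing over the support
# with base R's dhyper().
x   <- max(0, k - n):min(k, m)
pmf <- dhyper(x, m, n, k)
mean_sum <- sum(x * pmf)
var_sum  <- sum((x - mean_sum)^2 * pmf)

c(formula = mean_formula, sum = mean_sum)
c(formula = var_formula,  sum = var_sum)
```

Both approaches should agree to within floating-point error.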

Probability mass function (p.m.f.):

$$ P(X = x) = \frac{{m \choose x}{n \choose k-x}}{{m+n \choose k}} $$

Cumulative distribution function (c.d.f.):

$$ P(X \le x) = \sum_{i = \max(0, k-n)}^{\lfloor x \rfloor} \frac{{m \choose i}{n \choose k-i}}{{m+n \choose k}} $$
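
The p.m.f. and c.d.f. expressions can be written out directly with `choose()`. The following is a minimal sketch in base R. The helper names `pmf_manual()` and `cdf_manual()` and the parameter values are hypothetical, and the results are compared against `dhyper()` and `phyper()` rather than against the package's own methods.

```r
m <- 10; n <- 7; k <- 8

# p.m.f. built from binomial coefficients.
pmf_manual <- function(x) {
  choose(m, x) * choose(n, k - x) / choose(m + n, k)
}

# c.d.f. as a partial sum of the p.m.f. from the lower end of the
# support up to floor(x), capped at the upper end of the support.
cdf_manual <- function(x) {
  lower <- max(0, k - n)
  upper <- min(floor(x), min(k, m))
  if (upper < lower) return(0)
  sum(pmf_manual(lower:upper))
}

x <- max(0, k - n):min(k, m)
all.equal(pmf_manual(x), dhyper(x, m, n, k))
all.equal(cdf_manual(4.5), phyper(4.5, m, n, k))
```

For large arguments the binomial coefficients can overflow or lose precision, so this form is meant for illustration rather than for numerical work.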

Moment generating function (m.g.f.):

$$ E(e^{tX}) = \frac{{n \choose k}}{{m+n \choose k}}{}_2F_1(-k, -m; n-k+1; e^t) $$

where \(_2F_1\) is the hypergeometric function.

Skewness:

$$ \frac{(n-m)(m+n-1)^{1/2}(m+n-2k)}{[kmn(m+n-k)]^{1/2}(m+n-2)} $$
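
Skewness can also be computed numerically from the p.m.f. as the third standardised moment, which gives a check that does not rely on the closed form. This is a sketch in base R only. It works in double precision and uses the parameters of the first distribution in the Examples section.

```r
m <- 500; n <- 50; k <- 100

# Support and probabilities from base R's dhyper().
x   <- max(0, k - n):min(k, m)
pmf <- dhyper(x, m, n, k)

# Mean, standard deviation, and third standardised moment.
mu    <- sum(x * pmf)
sigma <- sqrt(sum((x - mu)^2 * pmf))
skew  <- sum(((x - mu) / sigma)^3 * pmf)
skew
```

The result is negative here because type I objects are the large majority (\(m > n\)), so the count of type I objects has its longer tail on the left.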

Examples

dist <- dist_hypergeometric(m = rep(500, 3), n = c(50, 60, 70), k = c(100, 200, 300))

dist
#> <distribution[3]>
#> [1] Hypergeometric(500, 50, 100) Hypergeometric(500, 60, 200)
#> [3] Hypergeometric(500, 70, 300)
mean(dist)
#> [1]  90.90909 178.57143 263.15789
variance(dist)
#> [1]  6.77415 12.32157 15.33526
skewness(dist)
#> Warning: NAs produced by integer overflow
#> Warning: NAs produced by integer overflow
#> [1] -0.2007751         NA         NA
kurtosis(dist)
#> Warning: NAs produced by integer overflow
#> Warning: NAs produced by integer overflow
#> [1] 2.965375e-15           NA           NA

generate(dist, 10)
#> [[1]]
#>  [1] 89 91 90 93 89 89 93 93 93 89
#> 
#> [[2]]
#>  [1] 179 175 178 180 178 172 175 180 185 180
#> 
#> [[3]]
#>  [1] 262 271 260 267 265 260 260 264 261 259
#> 

density(dist, 2)
#> [1] 0 0 0
density(dist, 2, log = TRUE)
#> [1] -Inf -Inf -Inf

cdf(dist, 4)
#> [1] 0 0 0

quantile(dist, 0.7)
#> [1]  92 180 265