To understand the hypergeometric distribution, consider a set of $$r$$ objects, of which $$m$$ are of type I and $$n$$ are of type II, so that $$r = m + n$$. A sample of size $$k$$ ($$k<r$$) is drawn at random without replacement. The random variable $$X$$ is the number of type I elements observed in this sample.

dist_hypergeometric(m, n, k)

Arguments

m

The number of type I elements available.

n

The number of type II elements available.

k

The size of the sample taken.

Details

We recommend reading this documentation on https://pkg.mitchelloharawild.com/distributional/, where the math will render nicely.

In the following, let $$X$$ be a hypergeometric random variable with success probability $$p = m/(m+n)$$.

Support: $$x \in \{\max(0, k-n), \dots, \min(k, m)\}$$

Mean: $$\frac{km}{n+m} = kp$$

Variance: $$\frac{kmn(n+m-k)}{(n+m)^2 (n+m-1)} = kp(1-p)\Big(1 - \frac{k-1}{m+n-1}\Big)$$
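
As a quick sanity check, the closed-form mean and variance can be compared with moments computed directly from the p.m.f. The sketch below uses base R's `dhyper()`, which takes the same `m`, `n`, `k` arguments. Using it here is an assumption made for illustration and not a statement about how `dist_hypergeometric()` is implemented. The parameter values are those of the first distribution in the Examples section.

```r
# Parameters of the first distribution in the Examples section
m <- 500; n <- 50; k <- 100
p <- m / (m + n)

# Closed-form mean and variance from the formulas above
mean_formula <- k * p
var_formula  <- k * p * (1 - p) * (1 - (k - 1) / (m + n - 1))

# The same moments computed directly from the p.m.f. over the support
x   <- max(0, k - n):min(k, m)
pmf <- dhyper(x, m, n, k)
mean_pmf <- sum(x * pmf)
var_pmf  <- sum((x - mean_pmf)^2 * pmf)

# Both comparisons should report TRUE (up to floating-point tolerance)
all.equal(mean_formula, mean_pmf)
all.equal(var_formula, var_pmf)
```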

Probability mass function (p.m.f):

$$P(X = x) = \frac{{m \choose x}{n \choose k-x}}{{m+n \choose k}}$$
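
The p.m.f. and the support can be illustrated with a small case that is easy to check by hand. The sketch below evaluates the binomial-coefficient formula with `choose()` and compares it with base R's `dhyper()`. The values `m = 5`, `n = 3`, `k = 4` are arbitrary illustrative choices, not taken from the package.

```r
# Small illustrative population: 5 type I items, 3 type II items, sample of 4
m <- 5; n <- 3; k <- 4

# Support: at least k - n type I items must be drawn, at most min(k, m)
x <- max(0, k - n):min(k, m)

# p.m.f. evaluated from the binomial-coefficient formula
pmf_formula <- choose(m, x) * choose(n, k - x) / choose(m + n, k)

# The same probabilities from base R's hypergeometric density
pmf_base <- dhyper(x, m, n, k)

data.frame(x = x, formula = pmf_formula, dhyper = pmf_base)
sum(pmf_formula)  # probabilities over the support sum to 1
```

Outside the support the probability is zero, which is why `density(dist, 2)` in the Examples section returns 0 for all three distributions.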

Cumulative distribution function (c.d.f):

$$P(X \le x) = \sum_{i = \max(0, k-n)}^{\lfloor x \rfloor} P(X = i) \approx \Phi\Big(\frac{x - kp}{\sqrt{kp(1-p)}}\Big)$$
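
To see how the normal approximation above relates to the exact c.d.f., the sketch below computes both for one set of parameters. It relies on base R's `phyper()`, `dhyper()` and `pnorm()`. The evaluation point `x = 92` is an illustrative choice (it is the 0.7 quantile reported for the first distribution in the Examples section).

```r
m <- 500; n <- 50; k <- 100
p <- m / (m + n)
x <- 92

# Exact c.d.f.: sum of the p.m.f. from the lower end of the support up to x
lower     <- max(0, k - n)
cdf_sum   <- sum(dhyper(lower:x, m, n, k))
cdf_exact <- phyper(x, m, n, k)

# Normal approximation with mean k * p and variance k * p * (1 - p)
cdf_approx <- pnorm((x - k * p) / sqrt(k * p * (1 - p)))

c(sum = cdf_sum, exact = cdf_exact, approx = cdf_approx)
```

Note that the approximation uses the variance $$kp(1-p)$$, without the factor $$1 - \frac{k-1}{m+n-1}$$ that appears in the exact variance, so the two values can differ noticeably when $$k$$ is a large fraction of $$m+n$$.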

Examples

dist <- dist_hypergeometric(m = rep(500, 3), n = c(50, 60, 70), k = c(100, 200, 300))

dist
#> <distribution[3]>
#> [1] Hypergeometric(500, 50, 100) Hypergeometric(500, 60, 200)
#> [3] Hypergeometric(500, 70, 300)
mean(dist)
#> [1]  90.90909 178.57143 263.15789
variance(dist)
#> [1]  6.77415 12.32157 15.33526
skewness(dist)
#> Warning: NAs produced by integer overflow
#> Warning: NAs produced by integer overflow
#> [1] -0.2007751         NA         NA
kurtosis(dist)
#> Warning: NAs produced by integer overflow
#> Warning: NAs produced by integer overflow
#> [1] 2.965375e-15           NA           NA

generate(dist, 10)
#> [[1]]
#>  [1] 88 91 91 90 92 93 92 88 89 92
#>
#> [[2]]
#>  [1] 184 176 184 180 178 173 179 172 179 183
#>
#> [[3]]
#>  [1] 263 267 266 260 261 266 266 261 267 259
#>

density(dist, 2)
#> [1] 0 0 0
density(dist, 2, log = TRUE)
#> [1] -Inf -Inf -Inf

cdf(dist, 4)
#> [1] 0 0 0

quantile(dist, 0.7)
#> [1]  92 180 265
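
The `NA` values returned by `skewness(dist)` above come with integer-overflow warnings, which suggests that an intermediate product exceeds the integer range. As a hedged workaround, the standard closed-form skewness of the hypergeometric distribution can be evaluated in double precision. The helper `hyper_skewness()` below is hypothetical and is not part of the package.

```r
# Hypothetical helper (not part of distributional): closed-form skewness
# of the hypergeometric distribution, evaluated in double precision.
hyper_skewness <- function(m, n, k) {
  # Coerce to double so that products such as k * m * n * (N - k)
  # are not computed in integer arithmetic
  m <- as.numeric(m)
  n <- as.numeric(n)
  k <- as.numeric(k)
  N <- m + n
  ((N - 2 * m) * sqrt(N - 1) * (N - 2 * k)) /
    (sqrt(k * m * n * (N - k)) * (N - 2))
}

# Same parameters as `dist` in the examples above
skew <- hyper_skewness(m = rep(500, 3), n = c(50, 60, 70), k = c(100, 200, 300))
skew
```

The first element should agree with the value the package reports for the first distribution. The other two are finite values in place of the `NA` results.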