[Stable]

Inflated distributions add extra probability mass at a specific value, most commonly zero (zero-inflation). They are useful for modeling data with more observations at a particular value than the base distribution would predict. Common applications include zero-inflated Poisson and negative binomial models for count data with many zeros.

dist_inflated(dist, prob, x = 0)

Arguments

dist

The distribution(s) to inflate.

prob

The added probability of observing x.

x

The value to inflate. The default of x = 0 is for zero-inflation.

Details

We recommend reading this documentation on pkgdown, which renders math nicely: https://pkg.mitchelloharawild.com/distributional/reference/dist_inflated.html

In the following, let \(Y\) be an inflated random variable based on a base distribution \(X\), with inflation value x = \(c\) and inflation probability prob = \(p\).

Support: Same as the base distribution, but with additional probability mass at \(c\).

Mean: (when x is numeric)

$$ E(Y) = p \cdot c + (1-p) \cdot E(X) $$

Variance: (when x = 0)

$$ \text{Var}(Y) = (1-p) \cdot \text{Var}(X) + p(1-p) \cdot [E(X)]^2 $$

For non-zero inflation values, the variance is not computed in closed form.
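As a sketch of where the zero-inflation variance formula comes from (a derivation added for illustration, using only the mean formula above with \(c = 0\)), start from the second moment of \(Y\):

$$ E(Y^2) = p \cdot 0^2 + (1-p) \cdot E(X^2) = (1-p) \cdot \left( \text{Var}(X) + [E(X)]^2 \right) $$

Subtracting the squared mean \([E(Y)]^2 = (1-p)^2 \cdot [E(X)]^2\) gives:

$$ \text{Var}(Y) = (1-p) \cdot \left( \text{Var}(X) + [E(X)]^2 \right) - (1-p)^2 \cdot [E(X)]^2 = (1-p) \cdot \text{Var}(X) + p(1-p) \cdot [E(X)]^2 $$

For example, with a Poisson base distribution with \(\lambda = 2\) (so \(E(X) = \text{Var}(X) = 2\)) and \(p = 0.3\), this gives \(0.7 \cdot 2 + 0.3 \cdot 0.7 \cdot 4 = 2.24\).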

Probability mass/density function (p.m.f./p.d.f.):

For discrete distributions: $$ f_Y(y) = \begin{cases} p + (1-p) \cdot f_X(c) & \text{if } y = c \\ (1-p) \cdot f_X(y) & \text{if } y \neq c \end{cases} $$

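For continuous distributions: $$ f_Y(y) = \begin{cases} p & \text{if } y = c \\ (1-p) \cdot f_X(y) & \text{if } y \neq c \end{cases} $$ Here the value at \(y = c\) is a point mass (a probability), while the values at \(y \neq c\) are densities.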
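The discrete case can be checked numerically. The following is a minimal base-R sketch, assuming a Poisson base distribution with `lambda = 2`, `prob = 0.3` and `x = 0`; the names `inflated_pmf` and `c0` are illustrative and are not part of the package, and this is not how the package computes these quantities internally.

```r
# Illustrative parameters (assumptions for this sketch)
p      <- 0.3   # inflation probability (prob)
c0     <- 0     # inflation value (x)
lambda <- 2     # rate of the base Poisson distribution

# Inflated p.m.f.: scale the base p.m.f. by (1 - p), then add p at y == c0
inflated_pmf <- function(y) {
  (1 - p) * dpois(y, lambda) + p * (y == c0)
}

y   <- 0:100            # truncated support; the Poisson(2) tail beyond 100 is negligible
pmf <- inflated_pmf(y)

total  <- sum(pmf)                   # total probability, should be 1
mean_y <- sum(y * pmf)               # compare with p * c0 + (1 - p) * lambda
var_y  <- sum(y^2 * pmf) - mean_y^2  # compare with (1 - p) * lambda + p * (1 - p) * lambda^2

c(total = total, mean = mean_y, variance = var_y)
```

Up to floating-point error, the mean and variance computed this way should agree with the closed-form expressions for the mean and the zero-inflated variance given above.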

Cumulative distribution function (c.d.f.):

$$ F_Y(q) = \begin{cases} (1-p) \cdot F_X(q) & \text{if } q < c \\ p + (1-p) \cdot F_X(q) & \text{if } q \geq c \end{cases} $$
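The piecewise c.d.f. above can be written compactly in base R. This is a hedged sketch for a Poisson base distribution with `lambda = 2`, `prob = 0.3` and `x = 0`; `inflated_cdf` and `c0` are hypothetical names chosen for illustration, not package functions.

```r
# Illustrative parameters (assumptions for this sketch)
p      <- 0.3   # inflation probability (prob)
c0     <- 0     # inflation value (x)
lambda <- 2     # rate of the base Poisson distribution

# Inflated c.d.f.: (1 - p) * F_X(q), plus a jump of size p once q reaches c0
inflated_cdf <- function(q) {
  (1 - p) * ppois(q, lambda) + p * (q >= c0)
}

# Cross-check against the cumulative sum of the inflated p.m.f. on 0, 1, 2
pmf_0_to_2 <- (1 - p) * dpois(0:2, lambda) + p * (0:2 == c0)

inflated_cdf(2)
all.equal(inflated_cdf(2), sum(pmf_0_to_2))
```

The indicator `q >= c0` implements both branches of the formula at once: it contributes nothing below the inflation point and adds `p` from the inflation point onwards.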

Quantile function:

The quantile function is computed numerically by inverting the inflated c.d.f., accounting for the jump in probability at the inflation point.
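To illustrate the idea of inverting the inflated c.d.f., here is a simple base-R sketch for a discrete base distribution. It assumes a Poisson base with `lambda = 2`, `prob = 0.3` and `x = 0`, and uses a plain search over a finite grid of integers. The helper names (`inflated_cdf`, `inflated_quantile`, `upper`) are hypothetical, and the package's own numerical method may differ.

```r
# Illustrative parameters (assumptions for this sketch)
p      <- 0.3   # inflation probability (prob)
c0     <- 0     # inflation value (x)
lambda <- 2     # rate of the base Poisson distribution

inflated_cdf <- function(q) {
  (1 - p) * ppois(q, lambda) + p * (q >= c0)
}

# Smallest integer y in 0:upper whose inflated c.d.f. reaches `prob`
inflated_quantile <- function(prob, upper = 1000L) {
  y <- 0:upper
  y[which(inflated_cdf(y) >= prob)[1]]
}

inflated_quantile(0.5)   # 1
inflated_quantile(0.35)  # 0: the jump at zero lifts F_Y(0) to about 0.395
qpois(0.35, lambda)      # 1: the base distribution alone has F_X(0) of about 0.135
```

The last two lines show the effect of the jump at the inflation point: the same probability level maps to a different quantile once the extra mass at zero is included.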

Examples

# Zero-inflated Poisson
dist <- dist_inflated(dist_poisson(lambda = 2), prob = 0.3, x = 0)

dist
#> <distribution[1]>
#> [1] 0+Pois(2)
mean(dist)
#> [1] 1.4
variance(dist)
#> [1] 2.24

generate(dist, 10)
#> [[1]]
#>  [1] 4 4 0 1 4 0 3 0 1 3
#> 

density(dist, 0)
#> [1] 0.3947347
density(dist, 1)
#> [1] 0.1894694

cdf(dist, 2)
#> [1] 0.7736735

quantile(dist, 0.5)
#> [1] 1