[Maturing]

A mixture distribution combines multiple component distributions with specified weights. Because its density (or mass function) is a weighted sum of the component densities, it can model complex, multimodal data using simpler distributions as building blocks.

dist_mixture(..., weights = numeric())

Arguments

...

Distributions to be used in the mixture. Can be any distributional objects.

weights

A numeric vector of non-negative weights that sum to 1. Its length must match the number of distributions passed via the ... argument. Each weight \(w_i\) represents the probability that a random draw comes from the \(i\)-th component distribution.
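The interpretation of the weights as component probabilities can be illustrated with a two-stage simulation in base R: first pick a component index with probability \(w_i\), then draw from that component. This is only a sketch of the idea for a two-component normal mixture; the variable names are illustrative and it is not necessarily how generate() is implemented.

```r
set.seed(1)

# Illustrative two-component normal mixture (names are hypothetical)
w     <- c(0.3, 0.7)  # weights, non-negative and summing to 1
mu    <- c(0, 5)      # component means
sigma <- c(1, 2)      # component standard deviations

n <- 1e5

# Stage 1: choose a component for each draw with probability w_i
component <- sample(seq_along(w), size = n, replace = TRUE, prob = w)

# Stage 2: draw from the chosen component
draws <- rnorm(n, mean = mu[component], sd = sigma[component])

# The share of draws from component 2 should be close to w[2] = 0.7,
# and the sample mean close to sum(w * mu) = 3.5
mean(component == 2)
mean(draws)
```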

Details

In the following, let \(X\) be a mixture random variable composed of \(K\) component distributions \(F_1, F_2, \ldots, F_K\) with corresponding weights \(w_1, w_2, \ldots, w_K\) where \(\sum_{i=1}^K w_i = 1\) and \(w_i \geq 0\) for all \(i\).

Support: The union of the supports of all component distributions.

Mean:

For univariate mixtures: $$ E(X) = \sum_{i=1}^K w_i \mu_i $$

where \(\mu_i\) is the mean of the \(i\)-th component distribution.

For multivariate mixtures: $$ E(\mathbf{X}) = \sum_{i=1}^K w_i \boldsymbol{\mu}_i $$

where \(\boldsymbol{\mu}_i\) is the mean vector of the \(i\)-th component distribution.

Variance:

For univariate mixtures: $$ \text{Var}(X) = \sum_{i=1}^K w_i (\mu_i^2 + \sigma_i^2) - \left(\sum_{i=1}^K w_i \mu_i\right)^2 $$

where \(\sigma_i^2\) is the variance of the \(i\)-th component distribution.
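As a quick check of the mean and variance formulas, the following base R sketch evaluates them for a two-component normal mixture with weights 0.3 and 0.7, means 0 and 5, and standard deviations 1 and 2. The variable names are illustrative and not part of the package. The last step computes the variance in the equivalent "within plus between" form, \(\sum_i w_i \sigma_i^2 + \sum_i w_i (\mu_i - E(X))^2\), which gives the same value.

```r
# Illustrative two-component normal mixture (names are hypothetical)
w     <- c(0.3, 0.7)  # weights
mu    <- c(0, 5)      # component means
sigma <- c(1, 2)      # component standard deviations

# Mean: weighted sum of the component means
mix_mean <- sum(w * mu)
mix_mean
#> [1] 3.5

# Variance: weighted second moments minus the squared mixture mean
mix_var <- sum(w * (mu^2 + sigma^2)) - mix_mean^2
mix_var
#> [1] 8.35

# Equivalent form: within-component variance plus between-component variance
mix_var_alt <- sum(w * sigma^2) + sum(w * (mu - mix_mean)^2)
all.equal(mix_var, mix_var_alt)
#> [1] TRUE
```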

Covariance:

For multivariate mixtures: $$ \text{Cov}(\mathbf{X}) = \sum_{i=1}^K w_i \left[ (\boldsymbol{\mu}_i - \bar{\boldsymbol{\mu}})(\boldsymbol{\mu}_i - \bar{\boldsymbol{\mu}})^T + \boldsymbol{\Sigma}_i \right] $$

where \(\bar{\boldsymbol{\mu}} = \sum_{i=1}^K w_i \boldsymbol{\mu}_i\) is the overall mean vector and \(\boldsymbol{\Sigma}_i\) is the covariance matrix of the \(i\)-th component distribution.
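The covariance formula can be evaluated directly with base R matrix operations. The sketch below uses a made-up bivariate mixture of two components; the weights, mean vectors, and covariance matrices are assumptions chosen only for illustration.

```r
# Hypothetical bivariate mixture with two components
w     <- c(0.4, 0.6)
mu    <- list(c(0, 0), c(3, 1))
Sigma <- list(
  diag(2),
  matrix(c(2, 0.5, 0.5, 1), nrow = 2)
)

# Overall mean vector: weighted sum of the component mean vectors
mu_bar <- Reduce(`+`, Map(function(wi, mi) wi * mi, w, mu))

# Covariance: weighted sum of (between-component outer product + Sigma_i)
mix_cov <- Reduce(`+`, Map(function(wi, mi, Si) {
  d <- mi - mu_bar
  wi * (tcrossprod(d) + Si)  # tcrossprod(d) is the outer product d %*% t(d)
}, w, mu, Sigma))

mu_bar
mix_cov
```

The diagonal of the result can be cross-checked against the univariate variance formula applied to each coordinate separately.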

Probability density/mass function (p.d.f./p.m.f.):

$$ f(x) = \sum_{i=1}^K w_i f_i(x) $$

where \(f_i(x)\) is the density or mass function of the \(i\)-th component distribution.

Cumulative distribution function (c.d.f.):

For univariate mixtures: $$ F(x) = \sum_{i=1}^K w_i F_i(x) $$

where \(F_i(x)\) is the c.d.f. of the \(i\)-th component distribution.

For multivariate mixtures, the c.d.f. is approximated numerically.
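For the univariate case, both the density and the c.d.f. are plain weighted sums, so they can be reproduced with base R. The sketch below does this for a mixture of N(0, 1) and N(5, 2^2) with weights 0.3 and 0.7; the helper names mix_density() and mix_cdf() are hypothetical and accept a single value of x.

```r
# Illustrative two-component normal mixture (names are hypothetical)
w     <- c(0.3, 0.7)
mu    <- c(0, 5)
sigma <- c(1, 2)

# Density: weighted sum of the component densities at a single point x
mix_density <- function(x) sum(w * dnorm(x, mean = mu, sd = sigma))

# c.d.f.: weighted sum of the component c.d.f.s at a single point x
mix_cdf <- function(x) sum(w * pnorm(x, mean = mu, sd = sigma))

mix_density(2)
#> [1] 0.06152845
mix_cdf(2)
#> [1] 0.33994
```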

Quantile function:

For univariate mixtures, the quantile function has no closed form and is computed numerically by inverting the c.d.f. using root-finding (stats::uniroot()).

For multivariate mixtures, quantiles are not yet implemented.
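To make the univariate inversion concrete, here is a minimal sketch of finding a quantile by root-finding on the mixture c.d.f. with stats::uniroot(). The helper names and the choice of search interval are assumptions made for illustration and may differ from what the package does internally. The interval used here relies on the fact that, for continuous components, the mixture quantile lies between the smallest and largest component quantiles at the same probability.

```r
# Illustrative two-component normal mixture (names are hypothetical)
w     <- c(0.3, 0.7)
mu    <- c(0, 5)
sigma <- c(1, 2)

# Mixture c.d.f. at a single point x
mix_cdf <- function(x) sum(w * pnorm(x, mean = mu, sd = sigma))

# Quantile for a single probability p in (0, 1), found by solving F(x) = p
mix_quantile <- function(p) {
  # Bracket the root with the range of the component quantiles.
  # This assumes the component quantiles differ, so the interval is non-empty.
  bracket <- range(qnorm(p, mean = mu, sd = sigma))
  uniroot(function(x) mix_cdf(x) - p, interval = bracket, tol = 1e-10)$root
}

q <- mix_quantile(0.5)
q

# Round trip: the c.d.f. at the computed quantile should be close to 0.5
mix_cdf(q)
```

Up to numerical tolerance, the median computed this way should agree with a value of about 3.868 for this mixture.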

Examples

# Univariate mixture of two normal distributions
dist <- dist_mixture(dist_normal(0, 1), dist_normal(5, 2), weights = c(0.3, 0.7))
dist
#> <distribution[1]>
#> [1] mixture(0.3*N(0, 1), 0.7*N(5, 4))

mean(dist)
#> [1] 3.5
variance(dist)
#> [1] 8.35

density(dist, 2)
#> [1] 0.06152845
cdf(dist, 2)
#> [1] 0.33994
quantile(dist, 0.5)
#> [1] 3.868233

generate(dist, 10)
#> [[1]]
#>  [1] 4.6444025 4.3243424 8.7675729 0.8420943 4.7689489 2.8547610 5.3473468
#>  [8] 6.2028003 5.1936014 0.8386283
#>