A mixture distribution combines multiple component distributions with specified weights. The resulting distribution can model complex, multimodal data because its density (or mass function) is a weighted sum of the densities of simpler distributions.
dist_mixture(..., weights = numeric())
...: Distributions to be used in the mixture. Can be any distributional objects.
weights: A numeric vector of non-negative weights that sum to 1. The length must match the number of distributions passed to .... Each weight \(w_i\) represents the probability that a random draw comes from the \(i\)-th component distribution.
In the following, let \(X\) be a mixture random variable composed of \(K\) component distributions \(F_1, F_2, \ldots, F_K\) with corresponding weights \(w_1, w_2, \ldots, w_K\) where \(\sum_{i=1}^K w_i = 1\) and \(w_i \geq 0\) for all \(i\).
Support: The union of the supports of all component distributions.
Mean:
For univariate mixtures: $$ E(X) = \sum_{i=1}^K w_i \mu_i $$
where \(\mu_i\) is the mean of the \(i\)-th component distribution.
For multivariate mixtures: $$ E(\mathbf{X}) = \sum_{i=1}^K w_i \boldsymbol{\mu}_i $$
where \(\boldsymbol{\mu}_i\) is the mean vector of the \(i\)-th component distribution.
Variance:
For univariate mixtures: $$ \text{Var}(X) = \sum_{i=1}^K w_i (\mu_i^2 + \sigma_i^2) - \left(\sum_{i=1}^K w_i \mu_i\right)^2 $$
where \(\sigma_i^2\) is the variance of the \(i\)-th component distribution.
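The variance formula is \(E(X^2) - E(X)^2\), where \(E(X^2) = \sum_{i=1}^K w_i (\mu_i^2 + \sigma_i^2)\) is the weighted sum of the component second moments. As a minimal sketch in base R, the mean and variance can be computed by hand from the component moments. The variable names (w, mu, sigma, mix_mean, mix_var) are illustrative and not part of the package; the numbers are those of the two-normal mixture in this page's example.

```r
# Illustrative names only; component moments are supplied directly.
w     <- c(0.3, 0.7)  # mixture weights (non-negative, sum to 1)
mu    <- c(0, 5)      # component means
sigma <- c(1, 2)      # component standard deviations

# Mean: weighted sum of the component means
mix_mean <- sum(w * mu)

# Second moment: weighted sum of the component second moments
mix_second_moment <- sum(w * (mu^2 + sigma^2))

# Variance: second moment minus squared mean
mix_var <- mix_second_moment - mix_mean^2

mix_mean  # 3.5
mix_var   # 8.35
```

The mixture variance (8.35) exceeds the weighted average of the component variances (3.1) because the spread between the component means also contributes.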
Covariance:
For multivariate mixtures: $$ \text{Cov}(\mathbf{X}) = \sum_{i=1}^K w_i \left[ (\boldsymbol{\mu}_i - \bar{\boldsymbol{\mu}})(\boldsymbol{\mu}_i - \bar{\boldsymbol{\mu}})^T + \boldsymbol{\Sigma}_i \right] $$
where \(\bar{\boldsymbol{\mu}} = \sum_{i=1}^K w_i \boldsymbol{\mu}_i\) is the overall mean vector and \(\boldsymbol{\Sigma}_i\) is the covariance matrix of the \(i\)-th component distribution.
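The covariance formula can be evaluated directly with base R matrix operations. The sketch below uses two made-up bivariate components (the weights, mean vectors, and covariance matrices are assumptions chosen for illustration, not package defaults) and does not call the package.

```r
# Hypothetical two-component bivariate mixture
w     <- c(0.4, 0.6)
mu    <- list(c(0, 0), c(3, 1))
Sigma <- list(diag(2), matrix(c(2, 0.5, 0.5, 1), nrow = 2))

# Overall mean vector: weighted sum of the component mean vectors
mu_bar <- Reduce(`+`, Map(function(wi, mi) wi * mi, w, mu))

# Each term is w_i * [ (mu_i - mu_bar)(mu_i - mu_bar)^T + Sigma_i ]
terms <- Map(function(wi, mi, Si) {
  d <- mi - mu_bar
  wi * (outer(d, d) + Si)
}, w, mu, Sigma)

mix_cov <- Reduce(`+`, terms)
mix_cov
```

The first part of each term captures the spread between component means and the second the spread within each component, which mirrors the univariate variance formula.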
Probability density/mass function (p.d.f./p.m.f.):
$$ f(x) = \sum_{i=1}^K w_i f_i(x) $$
where \(f_i(x)\) is the density or mass function of the \(i\)-th component distribution.
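For a concrete check, the weighted-sum density can be written out in base R for a 0.3/0.7 mixture of N(0, 1) and a normal with mean 5 and standard deviation 2. The helper name mix_density is hypothetical and only illustrates the formula.

```r
w <- c(0.3, 0.7)

# Hypothetical helper: weighted sum of the two component densities
mix_density <- function(x) {
  w[1] * dnorm(x, mean = 0, sd = 1) + w[2] * dnorm(x, mean = 5, sd = 2)
}

mix_density(2)  # approximately 0.0615

# Because the weights sum to 1, the mixture density integrates to 1
total <- integrate(mix_density, lower = -Inf, upper = Inf)$value
total
```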
Cumulative distribution function (c.d.f.):
For univariate mixtures: $$ F(x) = \sum_{i=1}^K w_i F_i(x) $$
where \(F_i(x)\) is the c.d.f. of the \(i\)-th component distribution.
For multivariate mixtures, the c.d.f. is approximated numerically.
Quantile function:
For univariate mixtures, the quantile function has no closed form
and is computed numerically by inverting the c.d.f. using root-finding
(stats::uniroot()).
For multivariate mixtures, quantiles are not yet implemented.
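The inversion idea for the univariate case can be sketched in base R as follows. This is an illustration of root-finding on the mixture c.d.f. with stats::uniroot(), not the package's internal code; the helper names (mix_cdf, mix_quantile) and the search interval of -50 to 50 are assumptions made for this example.

```r
# Mixture c.d.f.: weighted sum of the component c.d.f.s
mix_cdf <- function(x) {
  0.3 * pnorm(x, mean = 0, sd = 1) + 0.7 * pnorm(x, mean = 5, sd = 2)
}

# Hypothetical helper: solve mix_cdf(x) = p for x.
# The bracket [lower, upper] is assumed wide enough to contain the quantile.
mix_quantile <- function(p, lower = -50, upper = 50) {
  uniroot(function(x) mix_cdf(x) - p,
          lower = lower, upper = upper, tol = 1e-9)$root
}

mix_cdf(2)                # approximately 0.33994
q50 <- mix_quantile(0.5)  # approximately 3.868
mix_cdf(q50)              # approximately 0.5
```

Root-finding works here because the mixture c.d.f. is continuous and non-decreasing, so mix_cdf(x) - p changes sign across the bracket for any p strictly between 0 and 1.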
# Univariate mixture of two normal distributions
dist <- dist_mixture(dist_normal(0, 1), dist_normal(5, 2), weights = c(0.3, 0.7))
dist
#> <distribution[1]>
#> [1] mixture(0.3*N(0, 1), 0.7*N(5, 4))
mean(dist)
#> [1] 3.5
variance(dist)
#> [1] 8.35
density(dist, 2)
#> [1] 0.06152845
cdf(dist, 2)
#> [1] 0.33994
quantile(dist, 0.5)
#> [1] 3.868233
generate(dist, 10)
#> [[1]]
#> [1] 4.6444025 4.3243424 8.7675729 0.8420943 4.7689489 2.8547610 5.3473468
#> [8] 6.2028003 5.1936014 0.8386283
#>
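The interpretation of each weight as the probability that a draw comes from a given component suggests a two-step sampling scheme: pick a component, then draw from it. The base R sketch below illustrates that idea for the same two-normal mixture. It is not a description of how generate() is implemented, and the variable names are illustrative.

```r
set.seed(1)
n     <- 1e5
w     <- c(0.3, 0.7)
means <- c(0, 5)
sds   <- c(1, 2)

# Step 1: choose a component label for each draw with probabilities w
k <- sample(seq_along(w), size = n, replace = TRUE, prob = w)

# Step 2: draw each value from its chosen component
x <- rnorm(n, mean = means[k], sd = sds[k])

mean(k == 2)  # close to the weight 0.7
mean(x)       # close to the mixture mean 3.5
var(x)        # close to the mixture variance 8.35
```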