[Stable]

The sampling distribution represents an empirical distribution based on observed samples. It is useful for bootstrapping, for representing posterior distributions from Markov chain Monte Carlo (MCMC) algorithms, and for working with any empirical data whose parametric form is unknown. Unlike parametric distributions, the sampling distribution makes no assumptions about the underlying data-generating process; instead, it uses the sample itself to estimate distributional properties. It supports both univariate and multivariate samples.

dist_sample(x)

Arguments

x

A list of sampled values, with one element per distribution. For univariate distributions, each element should be a numeric vector. For multivariate distributions, each element should be a matrix whose columns represent variables and whose rows represent observations.

Details

We recommend reading this documentation on pkgdown, which renders math nicely: https://pkg.mitchelloharawild.com/distributional/reference/dist_sample.html

In the following, let \(X\) be a random variable with sample \(x_1, x_2, \ldots, x_n\) of size \(n\).

Support: The observed range of the sample, \([\min_i x_i, \max_i x_i]\) in the univariate case.

Mean (univariate):

$$ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i $$

Mean (multivariate): Computed independently for each variable.

Variance (univariate):

$$ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 $$
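As a quick check of the two formulas above, base R's `mean()` and `stats::var()` compute exactly \(\bar{x}\) and \(s^2\), with `var()` using the \(n - 1\) denominator. This sketch uses only base R and a hypothetical toy sample `x`; it does not call distributional.

```r
# Hypothetical toy sample (not taken from the examples below)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Sample mean: (1/n) * sum(x_i)
x_bar <- sum(x) / n

# Sample variance with the n - 1 (Bessel) denominator
s2 <- sum((x - x_bar)^2) / (n - 1)

# Base R's mean() and var() agree with the formulas
stopifnot(all.equal(x_bar, mean(x)))
stopifnot(all.equal(s2, var(x)))
c(mean = x_bar, variance = s2)  # 5 and 32/7
```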

Covariance (multivariate): The sample covariance matrix.
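For reference, the standard sample mean vector and covariance matrix of the observations \(\mathbf{x}_1, \ldots, \mathbf{x}_n\) (the rows of the sample matrix) are written out below. This assumes the covariance uses the same \(n - 1\) denominator as the univariate variance, as `stats::cov()` does; the diagonal entries of \(S\) are then the per-variable variances \(s^2\).

$$ \bar{\mathbf{x}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i, \qquad S = \frac{1}{n-1} \sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^\top $$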

Skewness (univariate):

$$ g_1 = \frac{\sqrt{n} \sum_{i=1}^{n} (x_i - \bar{x})^3}{\left(\sum_{i=1}^{n} (x_i - \bar{x})^2\right)^{3/2}} \left(1 - \frac{1}{n}\right)^{3/2} $$
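The factor \((1 - 1/n)^{3/2}\) rescales the moment-based skewness so that the result equals \(m_3 / s^3\), where \(m_3 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3\) and \(s = \sqrt{s^2}\) is the sample standard deviation defined above. The base R sketch below checks this identity numerically on a hypothetical toy sample; the helper name `skew_g1` is illustrative and not part of distributional.

```r
# Hypothetical helper: the skewness formula exactly as written above
skew_g1 <- function(x) {
  n <- length(x)
  d <- x - mean(x)
  sqrt(n) * sum(d^3) / sum(d^2)^(3 / 2) * (1 - 1 / n)^(3 / 2)
}

# Hypothetical right-skewed toy sample
x <- c(1, 2, 2, 3, 3, 3, 4, 10)

# Third central moment (1/n denominator) and sample sd (n - 1 denominator)
m3 <- mean((x - mean(x))^3)
s <- sd(x)

g1 <- skew_g1(x)
stopifnot(all.equal(g1, m3 / s^3))
g1  # positive, because of the long right tail
```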

Probability density function: Approximated numerically using kernel density estimation.
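A minimal sketch of this kind of approximation in base R: `stats::density()` computes a Gaussian kernel density estimate on a regular grid, and `stats::approx()` linearly interpolates that estimate at the requested points. The bandwidth rule (the `stats::density()` default), returning 0 outside the estimation grid, and the helper name `kde_at` are assumptions for illustration; distributional's own implementation may differ in these details.

```r
# Hypothetical helper: evaluate a kernel density estimate of sample x at points `at`
kde_at <- function(x, at) {
  # Gaussian KDE on a regular grid (stats::density defaults)
  d <- stats::density(x)
  # Linear interpolation between grid points; 0 outside the grid
  stats::approx(d$x, d$y, xout = at, yleft = 0, yright = 0)$y
}

set.seed(1)
x <- rnorm(100)
dens <- kde_at(x, c(0, 1, 50))
dens  # positive at 0 and 1; 0 at 50, far outside the data
```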

Cumulative distribution function (univariate):

$$ F(q) = \frac{1}{n} \sum_{i=1}^{n} I(x_i \leq q) $$

where \(I(\cdot)\) is the indicator function.
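This step function is also available in base R as `stats::ecdf()`. The sketch below evaluates the formula directly, as the proportion of sample values at or below each `q`, and checks it against `ecdf()`; the helper name `ecdf_at` and the toy sample are hypothetical.

```r
x <- c(3, 1, 4, 1, 5, 9, 2, 6)

# Hypothetical helper: F(q) = (1/n) * sum(I(x_i <= q)), evaluated for each q
ecdf_at <- function(x, q) vapply(q, function(qq) mean(x <= qq), numeric(1))

q <- c(0, 1, 3.5, 9)
Fq <- ecdf_at(x, q)
Fq  # 0.00 0.25 0.50 1.00
stopifnot(all.equal(Fq, stats::ecdf(x)(q)))
```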

Cumulative distribution function (multivariate):

$$ F(\mathbf{q}) = \frac{1}{n} \sum_{i=1}^{n} I(\mathbf{x}_i \leq \mathbf{q}) $$

where the inequality is applied element-wise.
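Applied element-wise, the inequality means an observation counts toward \(F(\mathbf{q})\) only if every one of its coordinates is at or below the corresponding coordinate of \(\mathbf{q}\). A base R sketch of that rule follows; the helper name `mv_ecdf_at` and the toy data are hypothetical.

```r
# Hypothetical helper: joint empirical CDF of the rows of matrix x at point q
mv_ecdf_at <- function(x, q) {
  # Logical matrix: TRUE where x[i, j] <= q[j]
  below <- sweep(x, 2, q, FUN = "<=")
  # A row counts only when all of its coordinates are at or below q
  mean(rowSums(below) == ncol(x))
}

x <- rbind(c(0, 0), c(1, 1), c(2, 0), c(0, 2))
mv_ecdf_at(x, c(1, 1))  # 0.5: only rows (0, 0) and (1, 1) qualify
```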

Quantile function (univariate): The sample quantile, computed using the specified quantile type (see stats::quantile()).
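The quantile type matters most for small samples. As an illustration with base R's `stats::quantile()` on a hypothetical five-point sample: type 7 (the `stats::quantile()` default) interpolates linearly between order statistics, while type 1 inverts the empirical CDF and always returns an observed value.

```r
x <- c(1, 2, 3, 4, 10)

# Type 7: h = (n - 1) * p + 1 = 2.6, so interpolate 60% of the way from x(2) to x(3)
q7 <- stats::quantile(x, 0.4, type = 7, names = FALSE)

# Type 1: smallest observed value with F(x) >= p; here F(2) = 2/5 = 0.4
q1 <- stats::quantile(x, 0.4, type = 1, names = FALSE)

c(type7 = q7, type1 = q1)  # type7 = 2.6, type1 = 2
```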

Quantile function (multivariate): Marginal quantiles are computed independently for each variable.
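Because marginal quantiles are computed one variable at a time, they can be sketched in base R by applying `stats::quantile()` to each column of the sample matrix. The resulting vector need not be an observed row of the sample. The toy matrix below is hypothetical.

```r
x <- cbind(x = c(1, 5, 3, 2, 4), y = c(50, 10, 40, 20, 30))

# Marginal 40% quantile of each column (type 7, the stats::quantile default)
mq <- apply(x, 2, stats::quantile, probs = 0.4, names = FALSE)
mq  # x = 2.6, y = 26; the point (2.6, 26) is not a row of x
```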

Random generation: Bootstrap sampling with replacement from the empirical sample.
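A minimal base R sketch of bootstrap generation: draw `n` values uniformly with replacement from the sample. For a multivariate sample, this sketch resamples whole rows so that each draw keeps all variables from one observation together, which is what sampling from the empirical joint distribution requires; whether distributional does exactly this internally is an assumption here. The helper name `boot_draw` is hypothetical.

```r
# Hypothetical helper: n bootstrap draws from a vector or from the rows of a matrix
boot_draw <- function(x, n) {
  if (is.matrix(x)) {
    # Resample row indices so each draw is a complete observation
    x[sample.int(nrow(x), n, replace = TRUE), , drop = FALSE]
  } else {
    # Resample by index; sample(x, ...) misbehaves when length(x) == 1
    x[sample.int(length(x), n, replace = TRUE)]
  }
}

set.seed(42)
uni <- boot_draw(c(1.5, 2.5, 4), 10)
multi <- boot_draw(cbind(x = c(0, 1, 2), y = c(10, 11, 12)), 5)
uni    # every value is one of 1.5, 2.5, 4
multi  # every row is one of (0, 10), (1, 11), (2, 12)
```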

Examples

# Univariate numeric samples
dist <- dist_sample(x = list(rnorm(100), rnorm(100, 10)))

dist
#> <distribution[2]>
#> [1] sample[100] sample[100]
mean(dist)
#> [1]  0.04102104 10.01781389
variance(dist)
#> [1] 1.0150855 0.7951853
skewness(dist)
#> [1] -0.02619728  0.16954650
generate(dist, 10)
#> [[1]]
#>  [1]  0.73367431 -0.15330647 -0.93636236  0.33369602 -1.35048647  0.03397584
#>  [7]  0.73367431  0.61625003 -0.46928925 -0.18927626
#> 
#> [[2]]
#>  [1]  9.706107 10.201138  9.709342 10.634419  8.989626 10.334429  7.865506
#>  [8]  9.425060  9.926029  9.733883
#> 

density(dist, 1)
#> [1] 0.2422892 0.0000000

# Multivariate numeric samples
dist <- dist_sample(x = list(cbind(rnorm(100), rnorm(100, 10))))
dimnames(dist) <- c("x", "y")

dist
#> <distribution[1]>
#> [1] sample[100]
mean(dist)
#>               x        y
#> [1,] -0.1650622 10.12963
variance(dist)
#>                x           y
#> [1,]  0.96573399 -0.09456499
#> [2,] -0.09456499  0.79800852
generate(dist, 10)
#> [[1]]
#>                x         y
#>  [1,] -0.1220865  9.046134
#>  [2,]  0.4200073 10.908468
#>  [3,] -0.7543786  9.650031
#>  [4,] -1.5528671 11.984095
#>  [5,]  0.2383209 10.418310
#>  [6,]  0.0749853 11.117084
#>  [7,] -0.4290761  8.801444
#>  [8,] -2.1155753 11.666273
#>  [9,]  0.6734014  9.745097
#> [10,] -0.9425496  9.889441
#> 
quantile(dist, 0.4) # Returns the marginal quantiles
#>               x        y
#> [1,] -0.3804629 9.830477
cdf(dist, matrix(c(0.3, 9), nrow = 1)) # CDF evaluated at x = 0.3, y = 9
#> [1] 0.395