[Stable]

The sampling distribution represents an empirical distribution based on observed samples. It is useful for bootstrapping, representing posterior distributions from Markov Chain Monte Carlo (MCMC) algorithms, or working with any empirical data where the parametric form is unknown. Unlike parametric distributions, the sampling distribution makes no assumptions about the underlying data-generating process and instead uses the sample itself to estimate distributional properties. The distribution can handle both univariate and multivariate samples.

dist_sample(x)

Arguments

x

A list of sampled values, where each element of the list defines one distribution. For univariate distributions, each element should be a numeric vector. For multivariate distributions, each element should be a matrix in which columns represent variables and rows represent observations.

Details

We recommend reading this documentation on pkgdown, which renders the math nicely: https://pkg.mitchelloharawild.com/distributional/reference/dist_sample.html

In the following, let \(X\) be a random variable with sample \(x_1, x_2, \ldots, x_n\) of size \(n\).

Support: The observed range of the sample.

Mean (univariate):

$$ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i $$

Mean (multivariate): The sample mean, computed independently for each variable.

Variance (univariate):

$$ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 $$

Covariance (multivariate): The sample covariance matrix.
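The mean, variance, and covariance definitions above correspond to base R's mean(), var(), colMeans(), and cov(). The following is a minimal sketch using only base R, not the package's internals; the data values and variable names are illustrative assumptions.

```r
# Illustrative data (assumed values, not taken from the package)
x <- c(2, 4, 4, 5, 7)
n <- length(x)

# Univariate mean and variance, written out as in the formulas above
x_bar <- sum(x) / n
s2 <- sum((x - x_bar)^2) / (n - 1)

# Both agree with the base R helpers
all.equal(x_bar, mean(x))
#> [1] TRUE
all.equal(s2, var(x))
#> [1] TRUE

# Multivariate case: rows are observations, columns are variables
m <- cbind(x = x, y = c(1, 3, 2, 5, 4))
col_means <- colMeans(m)  # mean of each variable
cov_mat <- cov(m)         # sample covariance matrix (n - 1 denominator)
```

The diagonal of the covariance matrix holds the per-variable sample variances, so the univariate formula is the one-column special case.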

Skewness (univariate):

$$ g_1 = \frac{\sqrt{n} \sum_{i=1}^{n} (x_i - \bar{x})^3}{\left(\sum_{i=1}^{n} (x_i - \bar{x})^2\right)^{3/2}} \left(1 - \frac{1}{n}\right)^{3/2} $$
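The skewness formula above can be transcribed term by term. This is a sketch in base R; `sample_skewness` is a hypothetical helper name introduced here for illustration and is not part of the package.

```r
# Hypothetical helper: direct transcription of the skewness formula above
sample_skewness <- function(x) {
  n <- length(x)
  d <- x - mean(x)  # deviations from the sample mean
  sqrt(n) * sum(d^3) / sum(d^2)^(3 / 2) * (1 - 1 / n)^(3 / 2)
}

sample_skewness(c(1, 2, 3, 4, 5))   # symmetric sample, so the result is 0
sample_skewness(c(1, 1, 1, 2, 10))  # long right tail, so the result is positive
```

Algebraically, the expression equals the third central moment (with a 1/n denominator) divided by the cube of the sample standard deviation \(s\), where \(s^2\) uses the \(n - 1\) denominator.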

Probability density function: Approximated numerically using kernel density estimation.
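As a rough illustration of kernel density estimation for a sample, the sketch below uses base R's stats::density() and interpolates the estimate at a query point. The kernel, bandwidth, grid, and interpolation step are assumptions made for this example and may differ from what the package does internally.

```r
set.seed(1)
x <- rnorm(100)  # illustrative sample

# Kernel density estimate on a grid (default Gaussian kernel and bandwidth)
kde <- density(x)

# Linearly interpolate the estimate at the point of interest
at <- 1
d_at <- approx(kde$x, kde$y, xout = at)$y
d_at
```

Because the estimate depends on the bandwidth, two reasonable choices can give visibly different density values for the same sample.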

Cumulative distribution function (univariate):

$$ F(q) = \frac{1}{n} \sum_{i=1}^{n} I(x_i \leq q) $$

where \(I(\cdot)\) is the indicator function.
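The empirical CDF is the proportion of observations at or below \(q\). A minimal base R sketch with illustrative values:

```r
x <- c(3, 1, 4, 1, 5, 9, 2, 6)
q <- 4

# Proportion of observations less than or equal to q
F_q <- mean(x <= q)
F_q
#> [1] 0.625

# Base R's ecdf() builds the same step function
ecdf(x)(q)
#> [1] 0.625
```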

Cumulative distribution function (multivariate):

$$ F(\mathbf{q}) = \frac{1}{n} \sum_{i=1}^{n} I(\mathbf{x}_i \leq \mathbf{q}) $$

where the inequality is applied element-wise.
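One reading of this formula is the joint empirical CDF, in which an observation contributes only when every one of its components is at or below the matching component of \(\mathbf{q}\). The sketch below implements that reading in base R with illustrative values; it transcribes the formula and is not a description of the package's internal computation.

```r
# Rows are observations, columns are variables (illustrative values)
x <- rbind(c(0.1, 9.5), c(0.5, 8.0), c(-1.0, 8.5), c(0.2, 10.0))
q <- c(0.3, 9)

# Compare each column with the matching component of q
below <- sweep(x, 2, q, FUN = "<=")

# An observation counts only if all of its components satisfy the inequality
F_q <- mean(apply(below, 1, all))
F_q
#> [1] 0.25
```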

Quantile function (univariate): The sample quantile, computed using the specified quantile type (see stats::quantile()).

Quantile function (multivariate): Marginal quantiles are computed independently for each variable.
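To illustrate how the quantile type changes the result, and what computing marginal quantiles column by column looks like, here is a base R sketch using stats::quantile() directly. The data and the use of apply() are assumptions for illustration only.

```r
x <- 1:10

# Type 7 is the default in stats::quantile(); other types interpolate differently
quantile(x, probs = 0.4, type = 7, names = FALSE)
#> [1] 4.6
quantile(x, probs = 0.4, type = 1, names = FALSE)
#> [1] 4

# Marginal quantiles: apply the same calculation to each column of a matrix
m <- cbind(x = 1:10, y = seq(10, 100, by = 10))
marginal <- apply(m, 2, quantile, probs = 0.4, names = FALSE)
marginal  # one quantile per variable
```

Marginal quantiles ignore any dependence between the variables, so the resulting vector need not correspond to an observed row.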

Random generation: Bootstrap sampling with replacement from the empirical sample.
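Bootstrap sampling with replacement can be sketched in base R as follows. Resampling whole rows in the multivariate case is an assumption made here so that each draw keeps its variables together; the data values are illustrative.

```r
set.seed(42)
x <- c(2.1, 3.4, 1.9, 5.0, 4.2)

# Univariate: draw observed values with replacement
draws <- sample(x, size = 10, replace = TRUE)

# Multivariate: resample row indices, then keep each selected row intact
m <- cbind(x = x, y = x^2)
rows <- sample(nrow(m), size = 10, replace = TRUE)
m_draws <- m[rows, , drop = FALSE]
```

Every generated value is one of the original observations, so draws never fall outside the observed range of the sample.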

Examples

# Univariate numeric samples
dist <- dist_sample(x = list(rnorm(100), rnorm(100, 10)))

dist
#> <distribution[2]>
#> [1] sample[100] sample[100]
mean(dist)
#> [1]  0.01345586 10.13720518
variance(dist)
#> [1] 0.8294004 0.7518262
skewness(dist)
#> [1] -0.3366961  0.1095195
generate(dist, 10)
#> [[1]]
#>  [1]  0.5878485 -0.2718239  0.8241558 -0.3137043  0.4600684  1.3781358
#>  [7]  0.7144907 -0.8420588  0.5264135  0.8136760
#> 
#> [[2]]
#>  [1]  9.299931 10.378420  9.833146 10.166176  9.706893  8.950290 11.299751
#>  [8] 11.860932  8.599088 11.445015
#> 

density(dist, 1)
#> [1] 0.2921889 0.0000000

# Multivariate numeric samples
dist <- dist_sample(x = list(cbind(rnorm(100), rnorm(100, 10))))
dimnames(dist) <- c("x", "y")

dist
#> <distribution[1]>
#> [1] sample[100]
mean(dist)
#>               x        y
#> [1,] -0.1797562 10.01306
variance(dist)
#>               x          y
#> [1,] 0.80303845 0.01232998
#> [2,] 0.01232998 0.93228142
generate(dist, 10)
#> [[1]]
#>                  x         y
#>  [1,] -0.594179494 10.537981
#>  [2,] -0.613394724  8.629803
#>  [3,]  0.515621994  9.747819
#>  [4,]  0.890362697 10.119178
#>  [5,] -0.001044417 10.569577
#>  [6,] -0.069811729 10.403368
#>  [7,] -0.067205397  8.859516
#>  [8,] -0.822696777  9.876365
#>  [9,] -1.037365350 11.626562
#> [10,] -1.566192019  7.816245
#> 
quantile(dist, 0.4) # Returns the marginal quantiles
#>               x        y
#> [1,] -0.3278864 9.741993
cdf(dist, matrix(c(0.3,9), nrow = 1))
#> [1] 0.435