The Normal distribution is ubiquitous in statistics, partly because of the central limit theorem, which states that suitably standardized sums of i.i.d. random variables with finite variance converge in distribution to a Normal. Linear transformations of Normal random variables are also Normal. If you are taking an intro stats course, you'll likely use the Normal distribution for Z-tests and in simple linear regression. Under regularity conditions, maximum likelihood estimators are asymptotically Normal. The Normal distribution is also called the Gaussian distribution.
dist_normal(mu = 0, sigma = 1, mean = mu, sd = sigma)
mu, mean: The mean (location parameter) of the distribution. Can be any real number.
sigma, sd: The standard deviation (scale parameter) of the distribution. Can be any positive number. If you want a Normal distribution with variance \(\sigma^2\), pass its square root \(\sigma\); supplying the variance where the standard deviation is expected is a common source of errors.
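As a minimal sketch of the variance-versus-standard-deviation point above, assuming the distributional package is installed, the following builds a Normal distribution with variance 4 by passing the square root as sigma. The name target_var is hypothetical and used only for illustration.

```r
library(distributional)

# Hypothetical target variance, for illustration only
target_var <- 4

# dist_normal() expects the standard deviation, so take the square root
d <- dist_normal(mu = 0, sigma = sqrt(target_var))

# variance() should recover the target variance
variance(d)
```

Passing sigma = target_var here would instead give a distribution with variance 16.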
We recommend reading this documentation on https://pkg.mitchelloharawild.com/distributional/, where the math will render nicely.
In the following, let \(X\) be a Normal random variable with mean mu = \(\mu\) and standard deviation sigma = \(\sigma\).
Support: \(\mathbb{R}\), the set of all real numbers
Mean: \(\mu\)
Variance: \(\sigma^2\)
Probability density function (p.d.f.):
$$ f(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-(x - \mu)^2 / (2 \sigma^2)} $$
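To make the density formula concrete, here is a small sketch in base R that writes it out by hand and compares it with the built-in dnorm(). The helper name normal_pdf is hypothetical and not part of any package.

```r
# Density formula written out directly; normal_pdf is a hypothetical helper
normal_pdf <- function(x, mu = 0, sigma = 1) {
  1 / sqrt(2 * pi * sigma^2) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x <- c(-1, 0, 2, 4.5)
manual  <- normal_pdf(x, mu = 1, sigma = 3)
builtin <- dnorm(x, mean = 1, sd = 3)

# The two should agree up to floating-point error
all.equal(manual, builtin)
```

Note the parentheses around 2 * sigma^2 in the exponent: the whole term divides the squared deviation.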
Cumulative distribution function (c.d.f.):
The cumulative distribution function has the form
$$ F(t) = \int_{-\infty}^t \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-(x - \mu)^2 / (2 \sigma^2)} dx $$
but this integral has no closed-form solution and must be approximated numerically. The c.d.f. of a standard Normal is closely related to, but not the same as, the "error function" \(\mathrm{erf}\): \(\Phi(t) = \frac{1}{2} \left[ 1 + \mathrm{erf}(t / \sqrt{2}) \right]\), where \(\Phi(t)\) denotes the c.d.f. of a standard Normal evaluated at \(t\). Z-tables list the value of \(\Phi(t)\) for various \(t\).
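Because the c.d.f. has to be computed numerically, it can help to see one way of doing so. The sketch below uses base R's integrate() on the density and compares the result with pnorm(); the parameter values and variable names are arbitrary choices for illustration.

```r
mu <- 1
sigma <- 3
upper_limit <- 4

# Numerically integrate the density from -Inf up to upper_limit
numeric_cdf <- integrate(dnorm, lower = -Inf, upper = upper_limit,
                         mean = mu, sd = sigma)$value

# pnorm() evaluates the same c.d.f. directly
builtin_cdf <- pnorm(upper_limit, mean = mu, sd = sigma)

# Standardizing reduces any Normal c.d.f. to the standard Normal one
standardized <- pnorm((upper_limit - mu) / sigma)

c(numeric_cdf, builtin_cdf, standardized)
```

The three values should agree to within the tolerance of the numerical integration.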
Moment generating function (m.g.f.):
$$ E(e^{tX}) = e^{\mu t + \sigma^2 t^2 / 2} $$
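As a rough numerical check of the m.g.f. formula, the following base R sketch compares the closed form with a direct numerical evaluation of \(E(e^{tX})\). The parameter values are arbitrary, and the variable s stands in for the argument \(t\) of the formula.

```r
mu <- 1
sigma <- 3
s <- 0.2  # plays the role of t in the m.g.f. formula

# Closed-form m.g.f.
closed_form <- exp(mu * s + sigma^2 * s^2 / 2)

# E(exp(s * X)) computed by integrating against the Normal density
integrand <- function(x) exp(s * x) * dnorm(x, mean = mu, sd = sigma)
numeric_mgf <- integrate(integrand, lower = -Inf, upper = Inf)$value

c(closed_form, numeric_mgf)
```

The two values should match up to the accuracy of integrate().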
dist <- dist_normal(mu = 1:5, sigma = 3)
dist
#> <distribution[5]>
#> [1] N(1, 9) N(2, 9) N(3, 9) N(4, 9) N(5, 9)
mean(dist)
#> [1] 1 2 3 4 5
variance(dist)
#> [1] 9 9 9 9 9
skewness(dist)
#> [1] 0 0 0 0 0
kurtosis(dist)
#> [1] 0 0 0 0 0
generate(dist, 10)
#> [[1]]
#> [1] 0.9031498 -1.1569428 -2.3496840 -1.3408097 -4.3308756 -0.2835046
#> [7] -5.0930810 9.2522943 5.5410092 1.1019668
#>
#> [[2]]
#> [1] 4.2223784 -1.8124245 1.5093073 0.1671635 5.7556526 2.7509584
#> [7] -3.1167450 -0.5662394 1.5652951 1.0266591
#>
#> [[3]]
#> [1] 2.4823053 -0.7081888 -2.7069126 2.7164879 3.0976674 4.3838703
#> [7] 7.1442009 1.7505712 5.0428280 1.7568809
#>
#> [[4]]
#> [1] 2.44496347 1.94794081 1.34305420 4.14771120 4.55668365 2.17402663
#> [7] 1.80669144 12.14543264 -0.01816112 2.06195425
#>
#> [[5]]
#> [1] 2.2026362 2.6927392 6.1147393 6.0662983 2.0480045 5.6441888 4.7597448
#> [8] 0.7328914 8.3244469 8.2354329
#>
density(dist, 2)
#> [1] 0.12579441 0.13298076 0.12579441 0.10648267 0.08065691
density(dist, 2, log = TRUE)
#> [1] -2.073106 -2.017551 -2.073106 -2.239773 -2.517551
cdf(dist, 4)
#> [1] 0.8413447 0.7475075 0.6305587 0.5000000 0.3694413
quantile(dist, 0.7)
#> [1] 2.573202 3.573202 4.573202 5.573202 6.573202