Summary of Statistical Distributions#

Useful functions#

Power function#

Real functions of the form \(f(x)=x^{a}\)

Exponential function#

\(f(x) = a^{x}\), for example, \(f(x) = e^x\).

Logarithm function#

\(\log_b y = x\) means \(b^x = y\); the logarithm is the inverse of the exponential function.

Gamma function#

\(\Gamma(\alpha) = \int^{\infty}_0 t^{\alpha - 1} e^{-t} dt\)
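As a quick numerical check (my addition, assuming NumPy and SciPy are available), the integral definition above can be compared with a library implementation of the gamma function:

```python
# My addition: numerically integrate the definition of Gamma(alpha) and
# compare with the library implementation.
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

alpha = 3.5
value, _ = quad(lambda t: t**(alpha - 1) * np.exp(-t), 0, np.inf)
print(value, gamma(alpha))   # both ≈ 3.3234
print(gamma(5))              # 24.0, since Gamma(n) = (n - 1)! for positive integers
```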

Univariate distribution relationships#

Chart for univariate distribution relationship

Relationship between discrete and continuous distributions#

Table 1. Relationship abbreviation#

| Discrete | Continuous | Shorthand |
| --- | --- | --- |
| Binomial | Poisson | BP |
| Negative Binomial | Gamma | NG |
| Geometric | Exponential | GE |

Continuous distributions#

Exponential distribution#

The probability density function (pdf) of an exponential distribution is

\[ f(x;\lambda) = \begin{cases} \lambda e^{-\lambda x}, & x \geq 0 \\ 0, & x < 0 \end{cases} \]

Or,

\[ f(x;\beta) = \begin{cases} \frac{1}{\beta} e^{-x/\beta}, & x \geq 0 \\ 0, & x < 0 \end{cases} \]

where \(\lambda\) is the rate parameter and \(\beta = 1/\lambda\) is the scale parameter.
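A minimal sketch (my addition, assuming SciPy is available; note that scipy.stats.expon is parameterised by the scale \(\beta = 1/\lambda\)) checking that the two parameterisations give the same density:

```python
# My addition: the rate (lambda) and scale (beta = 1/lambda) forms of the
# exponential density agree; scipy.stats.expon is parameterised by scale.
import numpy as np
from scipy.stats import expon

lam = 2.0                # rate parameter (illustrative value)
beta = 1 / lam           # scale parameter
x = np.linspace(0, 5, 11)

pdf_rate = lam * np.exp(-lam * x)         # lambda * exp(-lambda * x)
pdf_scale = expon(scale=beta).pdf(x)      # (1/beta) * exp(-x / beta)
print(np.allclose(pdf_rate, pdf_scale))   # True
```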

Gamma distribution#

Substituting \(t = \beta x\) in the gamma-function integral gives \(\Gamma(\alpha) = \beta^{\alpha} \int^{\infty}_0 x^{\alpha - 1} e^{-\beta x} dx\), so the probability density function is

\[ f(x;\alpha,\beta) = \begin{cases} \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}, & x \geq 0 \\ 0, & x < 0 \end{cases} \]

where \(\alpha\) is the shape parameter and \(\beta\) is the rate parameter.
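A similar sketch (again my addition, assuming SciPy; scipy.stats.gamma uses a shape/scale convention, so the rate \(\beta\) maps to scale \(1/\beta\)) comparing the formula above with the library density:

```python
# My addition: compare the gamma density above (shape alpha, rate beta) with
# scipy.stats.gamma, which uses shape and scale = 1/beta.
import numpy as np
from scipy.special import gamma as gamma_fn
from scipy.stats import gamma

alpha, beta = 2.5, 1.5                     # illustrative shape and rate
x = np.linspace(0.1, 8.0, 50)

pdf_formula = beta**alpha / gamma_fn(alpha) * x**(alpha - 1) * np.exp(-beta * x)
pdf_scipy = gamma(a=alpha, scale=1 / beta).pdf(x)
print(np.allclose(pdf_formula, pdf_scipy))  # True
```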

Normal distribution#

The derivation of the normal distribution (Ref: Tim)#

Suppose I throw a dart into a dartboard. I aim at the centre of the board (0,0) but I’m not all that good with darts so the dart lands in a random position (X,Y) which has a joint density function \(f:\mathbb R^2\to\mathbb R^+\). Let’s make two assumptions about the way I play darts.

  • The density is rotationally invariant so the distribution of where my dart lands only depends on the distance of the dart to the centre.

  • The random variables \(X\) and \(Y\) are independent: how much I miss left and right makes no difference to the distribution of how much I miss up and down.

So by the first assumption and Pythagoras, I must be able to express the density as

\[f(x,y) = g(x^2 + y^2)\]

Now, as the random variables \(X\) and \(Y\) are independent and identically distributed, I must be able to express

\[f(x,y) \propto f(x,0) f(0,y)\]

Combining these assumptions we get that for every pair (x,y), we have

\[g(x^2 + y^2) \propto g(x^2) g(y^2)\]

This means that \(g\) must be an exponential function

\[g(t) = Ae^{-Bt}\]

So \(A\) will be some normalising constant, and \(B\) reflects the units I’m measuring in (since \(t = x^2 + y^2\) is a squared distance, if I measure distances in cm then \(B\) will be 100 times as big as if I measured in mm). \(B\) must be positive, because the exponent already carries the minus sign and the density should be a decreasing function of distance (I’m not that bad at darts).

So to work out \(A\), I need to integrate \(f(\cdot,\cdot)\) over \(\mathbb{R}^2\). A quick change to polar coordinates gives

\[\iint_{\mathbb{R}^2} f(x,y)\,dxdy = 2\pi \int^{\infty}_{0} r\,g(r^2)\,dr = 2\pi A\int^{\infty}_{0} r e^{-Br^2}\,dr = \frac{\pi A}{B}\]

Setting this equal to 1 gives \(A = \frac{B}{\pi}\). It’s convenient to express \(B\) in terms of the standard deviation, so we set \(B = \frac{1}{2\sigma^2}\), which gives \(A = \frac{1}{2\pi\sigma^2}\).

So if I set \(\tilde{f}(x) = \frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{x^2}{2\sigma^2}}\) then \(f(x,y) = \tilde{f}(x)\,\tilde{f}(y)\).

The \(e\) comes from the fact I wanted my \(X\) and \(Y\) coordinates to be independent and the \(\pi\) comes from the fact that I wanted rotational invariance so I’m integrating over a circle.
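A short numerical sketch (my addition, with an arbitrary illustrative \(\sigma\)) confirming that \(A = B/\pi\) with \(B = 1/(2\sigma^2)\) normalises the density and that it factors as \(\tilde{f}(x)\tilde{f}(y)\):

```python
# My addition: check that A = B / pi makes f(x, y) = A * exp(-B * (x^2 + y^2))
# integrate to 1 over the plane, and that f factors into the 1-D normal densities.
import numpy as np
from scipy import integrate

sigma = 1.3                      # arbitrary spread, just for the check
B = 1 / (2 * sigma**2)
A = B / np.pi

# Integrate in polar coordinates: int_0^inf A * exp(-B r^2) * 2*pi*r dr
total, _ = integrate.quad(lambda r: A * np.exp(-B * r**2) * 2 * np.pi * r, 0, np.inf)
print(total)                     # ≈ 1.0

f_tilde = lambda x: np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
x, y = 0.7, -1.2
print(np.isclose(A * np.exp(-B * (x**2 + y**2)), f_tilde(x) * f_tilde(y)))  # True
```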

The interesting thing happens if I throw two darts. Suppose I throw my first dart aiming at (0,0) and it lands at \((X_1, Y_1)\). I then aim my next dart at the first dart, so this one lands at \((X_2, Y_2)\) with \(X_2 = X_1 + X\) and \(Y_2 = Y_1 + Y\), where \((X, Y)\) is an independent error with the same distribution.

So the position of the second dart is the sum of two errors. But the sum is still rotationally invariant and the variables \(X_2\) and \(Y_2\) are still independent, so \((X_2, Y_2)\) satisfies my two assumptions.

That means that when I add independent normal distributions together I get another normal distribution.

It’s this property that makes the normal distribution so useful: if I take the average of a very long sequence of random variables, I should get something with the same shape no matter how long the sequence is, because taking a sequence twice as long is like adding two sequences together.
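A small simulation sketch (my addition; sample size and seed are arbitrary) of the closure property: the sum of two independent normals is again normal, with the variances adding:

```python
# My addition: simulate the sum of two independent normals and check that it
# matches a normal whose variance is the sum of the two variances.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x1 = rng.normal(0, 1.0, size=100_000)
x2 = rng.normal(0, 2.0, size=100_000)
s = x1 + x2

print(s.std())                                        # ≈ sqrt(1 + 4) ≈ 2.236
print(stats.kstest(s, 'norm', args=(0, np.sqrt(5))))  # large p-value: consistent
```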

\(\chi^2\) distribution#

The \(\chi^2\) distribution is a special case of the gamma distribution: a \(\chi^2\) random variable with \(n\) degrees of freedom follows a gamma distribution with shape parameter \(\alpha = n/2\) and rate parameter \(\beta = 1/2\), namely \(\chi^2_n \sim Gamma(n/2, 1/2)\), giving the density function

\[ f(x;n) = \begin{cases} \frac{2^{-n/2}}{\Gamma(n/2)} x^{n/2 - 1} e^{-x/2}, & x \geq 0 \\ 0, & x < 0 \end{cases} \]
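A one-line check (my addition, assuming SciPy; in its shape/scale convention the rate \(\beta = 1/2\) corresponds to scale 2) that the \(\chi^2_n\) density coincides with the \(Gamma(n/2, 1/2)\) density:

```python
# My addition: the chi-square density with n degrees of freedom equals the
# Gamma(n/2, rate 1/2) density, i.e. shape n/2 and scale 2 in scipy's convention.
import numpy as np
from scipy.stats import chi2, gamma

n = 7                                   # illustrative degrees of freedom
x = np.linspace(0.1, 20.0, 100)
print(np.allclose(chi2(df=n).pdf(x), gamma(a=n / 2, scale=2).pdf(x)))  # True
```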

The relationship between normal and \(\chi^2\) distribution#

First, we prove that \(\Gamma(1/2) = \sqrt{\pi}\) (Ref: The Gamma Function).

\[ \begin{align} \Gamma(1/2)^2 &= \left(\int_0^{\infty}t^{-1/2}e^{-t}\,dt\right)^2 \\ &= \left(2\int_0^{\infty}e^{-x^2}\,dx\right)^2 && \text{let $t = x^2$} \\ &= \left(\int_{-\infty}^{\infty}e^{-x^2}\,dx\right)^2 && \text{$e^{-x^2}$ is even, so extend (0,$\infty$) to (-$\infty$,$\infty$)} \\ &= \left(\int_{-\infty}^{\infty}e^{-x^2}\,dx\right)\left(\int_{-\infty}^{\infty}e^{-y^2}\,dy\right) \\ &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{-x^2}e^{-y^2}\,dxdy \\ &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{-(x^2+y^2)}\,dxdy \\ &= \int_{0}^{2\pi}\int_{0}^{\infty}e^{-r^2}\,r\,dr\,d\theta && \text{let $x= r \cdot \cos(\theta)$ and $y= r \cdot \sin(\theta)$} \\ &= \frac12\int_{0}^{2\pi}\int_{0}^{\infty}e^{-u}\,du\,d\theta && \text{let $u = r^2$} \\ &= \frac12\int_{0}^{2\pi}\left.-e^{-u}\right|_0^{\infty}\,d\theta \\ &= \frac12\int_{0}^{2\pi}\,d\theta \\ &= \pi \end{align} \]
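A quick numerical confirmation (my addition) of the two identities used above, \(\Gamma(1/2) = \sqrt{\pi}\) and \(\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}\):

```python
# My addition: check Gamma(1/2) = sqrt(pi) and the Gaussian integral numerically.
import numpy as np
from math import gamma, sqrt, pi
from scipy.integrate import quad

print(gamma(0.5), sqrt(pi))                                # both ≈ 1.7724539
gauss, _ = quad(lambda x: np.exp(-x**2), -np.inf, np.inf)
print(gauss**2, pi)                                        # both ≈ 3.14159
```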

Second, we find the pdf of \(X^2\) (Ref: Normal to Chi).

If \(X \sim \mathcal{N}(0,1)\), then the pdf of \(X\) is

\[ \varphi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \]

Let \(f\) be the pdf of \(X^2\). Then,

\[ \begin{align} f(x) & = \frac{d}{dx} \Pr(X^2 \le x) \\ &= \frac{d}{dx} \Pr(-\sqrt{x}\le X\le\sqrt{x}) \\ & = \frac{d}{dx} \frac{1}{\sqrt{2\pi}} \int_{-\sqrt{x}}^{\sqrt{x}} e^{-u^2/2} \;du \\ & = \frac{2}{\sqrt{2\pi}}\frac{d}{dx} \int_0^{\sqrt{x}} e^{-u^2/2} \;du \\ & = \frac{2}{\sqrt{2\pi}} e^{-\sqrt{x}^2/2} \frac{d}{dx} \sqrt{x} = \frac{2}{\sqrt{2\pi}} e^{-x/2} \frac{1}{2\sqrt{x}} = \frac{e^{-x/2}}{\sqrt{2\pi x}} \\ & = \frac{2^{-1/2}}{\Gamma{(1/2)}} x^{\frac12 - 1}e^{-x/2} \end{align} \]

which is the \(Gamma(1/2, 1/2)\) density, so \(X^2 \sim Gamma(1/2, 1/2)\), i.e. \(X^2 \sim \chi^2_1\).
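A simulation sketch (my addition; sample size and seed are arbitrary) that squaring a standard normal sample is consistent with a \(\chi^2_1\) distribution:

```python
# My addition: the square of a standard normal sample is consistent with a
# chi-square distribution with 1 degree of freedom.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)
print(stats.kstest(x**2, 'chi2', args=(1,)))   # large p-value: consistent with chi2(1)
```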

Noncentral \(\chi^2\) distribution#

Wishart distribution#

Discrete distributions#

Bernoulli distribution \(X \sim Bernoulli(p)\)#

The notation \(X \sim Bernoulli(p)\) indicates that the random variable \(X\) has the Bernoulli distribution with parameter \(p\), where \(0 < p < 1\). A Bernoulli random variable \(X\) with success probability \(p\) has probability mass function

\[ f(x) = \begin{cases} p, & x = 1 \\ 1 - p, & x = 0 \end{cases} \]

where \(0 < p < 1\). The Bernoulli distribution is associated with the notion of a Bernoulli trial, which is an experiment with two outcomes, generically referred to as success (\(x = 1\)) and failure (\(x = 0\)).

Binomial distribution \(X \sim Binomial(n,p)\)#

The binomial distribution models the number of successes in \(n\) mutually independent Bernoulli trials, each with probability of success \(p\). The random variable \(X \sim Binomial(n,p)\) has probability mass function

\[ f(x) = \binom{n}{x} p^{x} (1-p)^{n-x}, \quad x = 0, 1, \ldots, n \]
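A small sketch (my addition, assuming SciPy) relating the two distributions above: Bernoulli is Binomial with \(n = 1\), and a Binomial(n, p) count behaves like the sum of \(n\) independent Bernoulli(p) trials:

```python
# My addition: Bernoulli(p) is Binomial(1, p), and a Binomial(n, p) count is
# the sum of n independent Bernoulli(p) trials.
import numpy as np
from scipy.stats import bernoulli, binom

n, p = 10, 0.3                                   # illustrative values
print(np.allclose(bernoulli(p).pmf([0, 1]), binom(1, p).pmf([0, 1])))  # True

rng = np.random.default_rng(2)
trials = bernoulli(p).rvs(size=(100_000, n), random_state=rng)
counts = trials.sum(axis=1)                      # one Binomial(n, p) draw per row
print(counts.mean(), n * p)                      # empirical mean ≈ np
```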

Poisson distribution \(X \sim Poisson(\mu)\)#

A Poisson random variable \(X\) with mean parameter \(\mu > 0\) has probability mass function

\[ f(x) = \frac{\mu^{x} e^{-\mu}}{x!}, \quad x = 0, 1, 2, \ldots \]

The Poisson distribution can be used to approximate the binomial distribution when \(n\) is large and \(p\) is small, with \(\mu = np\). It can also be used to model the number of events in an interval for a process that evolves randomly over space or time; applications include the number of typographical errors in a book and the number of customers arriving in an hour.
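A sketch of the approximation (my addition; the values of \(n\) and \(p\) are illustrative) comparing Binomial(n, p) with Poisson(np) for large \(n\) and small \(p\):

```python
# My addition: for large n and small p, Binomial(n, p) pmf values are close to
# Poisson(np) pmf values.
import numpy as np
from scipy.stats import binom, poisson

n, p = 1000, 0.003                              # illustrative values
k = np.arange(0, 12)
print(np.max(np.abs(binom(n, p).pmf(k) - poisson(n * p).pmf(k))))  # small difference
```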

P-value distribution#

Under the null hypothesis, p-values are uniformly distributed on \((0, 1)\); this can be checked by plotting the ordered statistics of the observed p-values against uniform quantiles in a QQ plot.
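An illustrative sketch (my addition; it simulates p-values from one-sample t-tests on null data rather than using any particular dataset) of the uniform QQ check described above:

```python
# My addition: simulate p-values under the null (one-sample t-tests on data whose
# mean really is 0) and compare their ordered statistics with uniform quantiles.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
m = 5_000
data = rng.normal(0, 1, size=(m, 30))
pvals = stats.ttest_1samp(data, popmean=0, axis=1).pvalue

ordered = np.sort(pvals)                       # ordered statistics of the p-values
expected = np.arange(1, m + 1) / (m + 1)       # uniform quantiles for the QQ plot
print(np.max(np.abs(ordered - expected)))      # small if p-values are ~ Uniform(0, 1)
```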