Summary of Statistical Distributions#
Useful functions#
Power function#
Real functions of the form \(f(x)=x^{a}\), where \(a\) is a constant.
Exponential function#
\(f(x) = a^{x}\) for a constant \(a > 0\); for example, \(f(x) = e^x\).
Logarithm function#
\(\log_b y = x\) if and only if \(b^x = y\); for example, \(\log_2 8 = 3\).
Gamma function#
\(\Gamma(\alpha) = \int^{\infty}_0 t^{\alpha - 1} e^{-t} dt\)
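As a quick numerical check (not from the original notes; SciPy is assumed available), the integral can be evaluated numerically and compared against `scipy.special.gamma` for a few arbitrary values of \(\alpha\):

```python
import numpy as np
from scipy import integrate, special

# Evaluate Gamma(alpha) = int_0^inf t^(alpha-1) e^(-t) dt numerically
# and compare with scipy's closed-form implementation.
for alpha in [0.5, 1.0, 2.5, 5.0]:
    value, _ = integrate.quad(lambda t: t**(alpha - 1) * np.exp(-t), 0, np.inf)
    print(f"alpha={alpha}: quad={value:.6f}  scipy={special.gamma(alpha):.6f}")
```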
Univariate distribution relationships#
Relationship between discrete and continuous distributions#
Table 1. Relationship abbreviations#
| Discrete | Continuous | Shorthand |
| --- | --- | --- |
| Binomial | Poisson | BP |
| Negative Binomial | Gamma | NG |
| Geometric | Exponential | GE |
Continuous distributions#
Exponential distribution#
The probability density function (pdf) of an exponential distribution is
\(f(x) = \lambda e^{-\lambda x}, \quad x \geq 0.\)
Or,
\(f(x) = \frac{1}{\beta} e^{-x/\beta}, \quad x \geq 0,\)
where \(\lambda\) is the rate parameter and \(\beta = 1/\lambda\) is the scale parameter.
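A minimal sketch checking that the two parameterizations agree, assuming SciPy's convention that `scipy.stats.expon` is parameterized by the scale \(\beta = 1/\lambda\) (the parameter values here are arbitrary):

```python
import numpy as np
from scipy import stats

lam = 2.0                                    # rate parameter lambda
beta = 1.0 / lam                             # scale parameter beta = 1/lambda
x = np.linspace(0.1, 3.0, 5)

pdf_rate = lam * np.exp(-lam * x)            # f(x) = lambda e^{-lambda x}
pdf_scale = (1 / beta) * np.exp(-x / beta)   # f(x) = (1/beta) e^{-x/beta}

assert np.allclose(pdf_rate, pdf_scale)
assert np.allclose(pdf_rate, stats.expon.pdf(x, scale=beta))
```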
Gamma distribution#
Let \(t = \beta x\) in the gamma function, so that
\(\Gamma(\alpha) = \int^{\infty}_0 (\beta x)^{\alpha - 1} e^{-\beta x} \beta \, dx = \beta^{\alpha} \int^{\infty}_0 x^{\alpha - 1} e^{-\beta x} dx.\)
Dividing both sides by \(\Gamma(\alpha)\) shows that the integrand integrates to one, so the probability density function is
\(f(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}, \quad x > 0,\)
where \(\alpha\) is the shape parameter and \(\beta\) is the rate parameter.
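A similar sketch checking the derived density, noting SciPy's convention that `scipy.stats.gamma` uses shape \(a = \alpha\) and scale \(= 1/\beta\) (parameter values arbitrary):

```python
import numpy as np
from scipy import special, stats

alpha, beta = 3.0, 2.0               # shape and rate parameters
x = np.linspace(0.1, 5.0, 5)

# f(x) = beta^alpha / Gamma(alpha) * x^(alpha - 1) * e^(-beta x)
pdf_manual = beta**alpha / special.gamma(alpha) * x**(alpha - 1) * np.exp(-beta * x)

assert np.allclose(pdf_manual, stats.gamma.pdf(x, a=alpha, scale=1 / beta))
```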
Normal distribution#
The derivation of the normal distribution (Ref: Tim)#
Suppose I throw a dart into a dartboard. I aim at the centre of the board (0,0) but I’m not all that good with darts so the dart lands in a random position (X,Y) which has a joint density function \(f:\mathbb R^2\to\mathbb R^+\). Let’s make two assumptions about the way I play darts.
1. The density is rotationally invariant, so the distribution of where my dart lands depends only on the distance of the dart to the centre.
2. The random variables \(X\) and \(Y\) are independent; how much I miss left and right makes no difference to the distribution of how much I miss up and down.
So by assumption one and Pythagoras I must be able to express the density as
\(f(x,y) = g(x^2 + y^2)\)
for some function \(g\).
Now, as the random variables \(X\) and \(Y\) are independent and identically distributed, I must be able to express
\(f(x,y) = h(x) h(y)\)
for some function \(h\).
Combining these assumptions we get that for every pair \((x,y)\),
\(g(x^2 + y^2) = h(x) h(y).\)
Setting \(y = 0\) gives \(h(x) h(0) = g(x^2)\), and hence \(g(x^2 + y^2)\, g(0) = g(x^2)\, g(y^2)\).
This means that \(g\) must be an exponential function,
\(g(t) = A e^{Bt}.\)
So \(A\) will be some normalising constant, and \(B\) reflects the units I'm measuring in (since \(B\) multiplies a squared distance, if I measure the distance in cm then \(B\) will be 100 times as big as if I measured in mm). \(B\) must be negative because the density should be a decreasing function of distance (I'm not that bad at darts).
So to work out \(A\), I need to integrate \(f(\cdot,\cdot)\) over \(\mathbb{R}^2\). A quick change to polar coordinates gives
\(1 = \int_{\mathbb{R}^2} A e^{B(x^2+y^2)} \, dx \, dy = A \int^{2\pi}_0 \int^{\infty}_0 e^{Br^2} r \, dr \, d\theta = -\frac{\pi A}{B}.\)
So we should set \(A = -\frac{B}{\pi}\). It's convenient to choose \(B\) in terms of the standard deviation, so we set \(B = -\frac{1}{2\sigma^2}\) and hence \(A = \frac{1}{2\pi\sigma^2}\).
So if I set \(\tilde{f}(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{x^2}{2\sigma^2}}\), then \(f(x,y) = \tilde{f}(x)\, \tilde{f}(y)\).
The \(e\) comes from the fact that I wanted my \(X\) and \(Y\) coordinates to be independent, and the \(\pi\) comes from the fact that I wanted rotational invariance, so I'm integrating over a circle.
The interesting thing happens if I throw two darts. Suppose I throw my first dart aiming at \((0,0)\), and it lands at \((X_1, Y_1)\); I aim my next dart at the first dart, so this one lands at \((X_2, Y_2)\) with \(X_2 = X_1 + X\) and \(Y_2 = Y_1 + Y\).
So the position of the second dart is the sum of the two errors. But the sum is still rotationally invariant and the variables \(X_2\) and \(Y_2\) are still independent, so \((X_2, Y_2)\) satisfies my two assumptions.
That means that when I add independent normal distributions together I get another normal distribution.
It’s this property that makes the normal distribution so useful: if I take the average of a very long sequence of random variables, I should get something that’s the same shape no matter how long my sequence is, and taking a sequence twice as long is like adding the two sequences together.
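A quick simulation sketch of this closure property (the seed, sample size, and standard deviations below are arbitrary choices, not from the notes): the sum of two independent normals should again be normal, with variance equal to the sum of the variances.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x1 = rng.normal(0.0, 1.0, size=100_000)   # first error, sigma = 1.0
x2 = rng.normal(0.0, 1.5, size=100_000)   # second, independent error, sigma = 1.5
total = x1 + x2                           # sum of independent normals

# Theory: total ~ N(0, 1^2 + 1.5^2), so its std should be sqrt(3.25) ~ 1.803
print(total.std())

# A normality test should typically not reject (expect a large p-value)
print(stats.normaltest(total).pvalue)
```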
Other useful materials#
\(\chi^2\) distribution#
The \(\chi^2\) distribution is a special case of the gamma distribution: a \(\chi^2\) random variable with \(n\) degrees of freedom follows a gamma distribution with shape parameter \(\alpha = n/2\) and rate parameter \(\beta = 1/2\), namely \(\chi^2 \sim Gamma(n/2, 1/2)\), giving the density function
\(f(x) = \frac{(1/2)^{n/2}}{\Gamma(n/2)} x^{n/2 - 1} e^{-x/2}, \quad x > 0.\)
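The identity can be checked directly against SciPy, assuming its convention that `scipy.stats.gamma` takes scale \(= 1/\beta = 2\):

```python
import numpy as np
from scipy import stats

n = 5                                 # degrees of freedom (arbitrary)
x = np.linspace(0.1, 10.0, 7)

# chi-square with n dof == Gamma(shape = n/2, rate = 1/2), i.e. scale = 2
assert np.allclose(stats.chi2.pdf(x, df=n),
                   stats.gamma.pdf(x, a=n / 2, scale=2))
```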
The relationship between the normal and \(\chi^2\) distributions#
Firstly, we need to prove \(\Gamma(1/2) = \sqrt{\pi}\) (Ref: The Gamma Function).
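One standard way to see this (a sketch, not necessarily the argument in the cited reference) is the substitution \(t = u^2\), so \(dt = 2u \, du\), together with the Gaussian integral \(\int^{\infty}_0 e^{-u^2} du = \frac{\sqrt{\pi}}{2}\):
\(\Gamma(1/2) = \int^{\infty}_0 t^{-1/2} e^{-t} dt = \int^{\infty}_0 \frac{1}{u} e^{-u^2} \, 2u \, du = 2 \int^{\infty}_0 e^{-u^2} du = \sqrt{\pi}.\)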
Secondly, we need to find the pdf of \(X^2\) (Ref: Normal to Chi).
If \(X \sim \mathcal{N}(0,1)\), then the pdf of \(X\) is
\(\phi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}.\)
Let \(f\) be the pdf of \(X^2\). Then, for \(y > 0\),
\(P(X^2 \leq y) = P(-\sqrt{y} \leq X \leq \sqrt{y}) = 2\Phi(\sqrt{y}) - 1,\)
and differentiating with respect to \(y\) gives
\(f(y) = \phi(\sqrt{y})\, y^{-1/2} = \frac{1}{\sqrt{2\pi}} y^{-1/2} e^{-y/2} = \frac{(1/2)^{1/2}}{\Gamma(1/2)} y^{1/2 - 1} e^{-y/2},\)
which is the \(Gamma(1/2, 1/2)\) density, i.e. \(X^2 \sim \chi^2_1\).
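A simulation sketch of the result (seed and sample size arbitrary): squaring standard normal draws should produce \(\chi^2_1\) samples.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)          # X ~ N(0, 1)

# X^2 should follow a chi-square distribution with 1 degree of freedom;
# a Kolmogorov-Smirnov test should typically not reject (large p-value).
result = stats.kstest(x**2, stats.chi2(df=1).cdf)
print(result.pvalue)
```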
Noncentral \(\chi^2\) distribution#
Wishart distribution#
Discrete distributions#
Bernoulli distribution \(X \sim Bernoulli(p)\)#
The notation \(X \sim Bernoulli(p)\) indicates that the random variable \(X\) has the Bernoulli distribution with parameter \(p\), where \(0 < p < 1\). A Bernoulli random variable \(X\) with success probability \(p\) has probability mass function
\(f(x) = p^x (1-p)^{1-x}, \quad x \in \{0, 1\}.\)
The Bernoulli distribution is associated with the notion of a *Bernoulli trial*, which is an experiment with two outcomes, generically referred to as *success* (\(x = 1\)) and *failure* (\(x = 0\)).
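A minimal check of the pmf against `scipy.stats.bernoulli` (the value of \(p\) is arbitrary):

```python
import numpy as np
from scipy import stats

p = 0.3                                   # success probability (arbitrary)
for x in (0, 1):
    manual = p**x * (1 - p)**(1 - x)      # f(x) = p^x (1-p)^(1-x)
    assert np.isclose(manual, stats.bernoulli.pmf(x, p))
```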
Binomial distribution \(X \sim Binomial(n,p)\)#
The binomial distribution models the number of successes in \(n\) mutually independent Bernoulli trials, each with probability of success \(p\). The random variable \(X \sim Binomial(n,p)\) has probability mass function
\(f(x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, \ldots, n.\)
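The same kind of check for the binomial pmf, using `math.comb` for \(\binom{n}{x}\) (parameter values arbitrary):

```python
from math import comb

import numpy as np
from scipy import stats

n, p = 10, 0.3
x = np.arange(n + 1)

# f(x) = C(n, x) p^x (1-p)^(n-x)
manual = [comb(n, k) * p**k * (1 - p)**(n - k) for k in x]
assert np.allclose(manual, stats.binom.pmf(x, n, p))
```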
Poisson distribution#
A Poisson random variable \(X\) with mean parameter \(\mu\) has probability mass function
\(f(x) = \frac{e^{-\mu} \mu^x}{x!}, \quad x = 0, 1, 2, \ldots.\)
The Poisson distribution can be used to approximate the binomial distribution when \(n\) is large and \(p\) is small. It can also be used to model the number of events in an interval associated with a process that evolves randomly over space or time; applications include the number of typographical errors in a book and the number of customers arriving in an hour.
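A sketch of the approximation claim (the choices \(n = 1000\), \(p = 0.002\) are arbitrary): for large \(n\) and small \(p\), \(Binomial(n, p)\) is close to \(Poisson(np)\).

```python
import numpy as np
from scipy import stats

n, p = 1000, 0.002                    # large n, small p
mu = n * p                            # matching Poisson mean
x = np.arange(11)

# Pointwise pmf difference should be tiny (on the order of 1e-4 here)
diff = np.abs(stats.binom.pmf(x, n, p) - stats.poisson.pmf(x, mu))
print(diff.max())
```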