Regression framework#
Expectation and Variance#
Discrete random variable#
Let \(x_1,x_2,\dots, x_n\) be \(n\) independent observations with mean \(\mu\) and variance \(\sigma^2\). We define the expectation \(\mu\) as
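For a discrete random variable, the standard definition (written out here for completeness) is

\[
\mu = E(X) = \sum_{i} x_i \, P(X = x_i).
\]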
We define the variance as the expectation of the squared deviation of a random variable from its population mean or sample mean. The expression for the variance can be expanded as follows:
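Using the definition and linearity of expectation, the familiar expansion is

\[
\sigma^2 = E\big[(X-\mu)^2\big] = E(X^2) - 2\mu\,E(X) + \mu^2 = E(X^2) - \mu^2 .
\]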
When we have a finite sample \(x_1,x_2,\dots,x_N\), the sample mean \(\bar{x}\) and sample variance \(s^2\) are
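The usual formulas (restated here) are

\[
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2 .
\]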
Now we want to show that the sample mean and sample variance are unbiased estimators.
Firstly, for the sample mean:
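A standard argument, sketched here, uses linearity of expectation:

\[
E(\bar{x}) = E\!\left(\frac{1}{N}\sum_{i=1}^{N} x_i\right)
= \frac{1}{N}\sum_{i=1}^{N} E(x_i)
= \frac{1}{N}\cdot N\mu = \mu .
\]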
Secondly, for the sample variance:
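One common derivation, sketched here, splits the sum of squared deviations around \(\mu\):

\[
E\!\left[\sum_{i=1}^{N}(x_i-\bar{x})^2\right]
= E\!\left[\sum_{i=1}^{N}(x_i-\mu)^2 - N(\bar{x}-\mu)^2\right]
= N\sigma^2 - N\cdot\frac{\sigma^2}{N} = (N-1)\sigma^2 ,
\]

so \(E(s^2) = \sigma^2\).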
Distribution of the sample variance#
In the case that \(Y_i\) are independent observations from a normal distribution, Cochran’s theorem shows that \(S^2\) follows a scaled chi-squared distribution.
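Concretely, the result being referenced is

\[
\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1},
\qquad
S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2 .
\]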
Proof:
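A sketch of the standard argument:

\[
\sum_{i=1}^{n}\frac{(Y_i-\mu)^2}{\sigma^2}
= \sum_{i=1}^{n}\frac{(Y_i-\bar{Y})^2}{\sigma^2} + \frac{n(\bar{Y}-\mu)^2}{\sigma^2}.
\]

The left-hand side is a sum of \(n\) squared independent standard normals, so it is \(\chi^2_{n}\); the second term on the right is \(\chi^2_{1}\). Cochran's theorem gives independence of the two terms on the right, so the first term, \((n-1)S^2/\sigma^2\), follows a \(\chi^2_{n-1}\) distribution.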
Centering and standardizing#
Math background:#
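In matrix form (matching the R demo below), with \(X\) an \(N\times P\) data matrix and \(\mathbf{1}\) the \(N\times 1\) vector of ones,

\[
\bar{x}^{\top} = \frac{1}{N}\mathbf{1}^{\top}X, \qquad
X_c = \Big(I_N - \frac{1}{N}\mathbf{1}\mathbf{1}^{\top}\Big)X, \qquad
X_s = X_c\,\mathrm{diag}(s_1,\dots,s_P)^{-1},
\]

where \(s_j\) is the sample standard deviation of column \(j\).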
R demo#
# R code demo for centering and standardizing:
N <- 10
P <- 3
X <- matrix(round(runif(N * P, 20, 60)), nrow = N, ncol = P)
onesN <- matrix(rep(1, N), N, 1)              # N x 1 vector of ones
unitN <- diag(rep(1, N))                      # N x N identity matrix
matMean <- 1/N * t(onesN) %*% X               # 1 x P row vector of column means
matCenter <- X - onesN %*% matMean            # centered columns
# or matCenter <- (unitN - 1/N * onesN %*% t(onesN)) %*% X
matSd <- sqrt(diag(t(matCenter) %*% matCenter / (N - 1)))   # column standard deviations
matScale <- matCenter %*% diag(1/matSd)       # centered and scaled columns
matScale; matMean; matSd
# matScale is numerically equivalent to scale(X)
scale(X)
Simple Linear Regression#
Relationship between dependent and independent variables considering variable type#
| Indep var \ Dep var | Continuous | Discrete |
|---|---|---|
| Continuous | OLS Regression | Logistic Regression |
| Discrete | T-Test, ANOVA | Categorical Data Analysis |
Model:#
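The simple linear regression model, written out here for reference, is

\[
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0,\sigma^2) \ \text{i.i.d.}
\]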
OLS estimation:#
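Minimizing \(\sum_i (y_i - \beta_0 - \beta_1 x_i)^2\) gives the usual closed-form estimates:

\[
\hat{\beta}_1 = \frac{\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i}(x_i-\bar{x})^2}, \qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}.
\]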
Coefficient of Determination (\(R^2\))#
The coefficient of determination (\(R^2\)) measures how well a statistical model predicts an outcome, which is represented by the model's dependent variable. More technically, \(R^2\) is a measure of goodness of fit: it is the proportion of variance in the dependent variable that is explained by the model.
In simple linear regression, \(R^2\) also equals the squared Pearson correlation coefficient between the observed values and the fitted values.
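A small R check of this equivalence (a sketch with simulated data; the variable names here are illustrative, not from the original notes):

set.seed(42)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)
fit <- lm(y ~ x)
summary(fit)$r.squared        # R^2 reported by lm
cor(y, fitted(fit))^2         # squared Pearson correlation of observed vs fitted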
Logistic linear regression#
Some questions about linear regression#
Why use a link function? Because the dependent variable is binary (0/1), we cannot model it directly with a linear predictor; the link function maps the linear predictor onto the (0, 1) probability scale. Derivative of the logistic function:
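For a binary response \(y_i \in \{0,1\}\) with \(p_i = P(y_i = 1)\), the logit link, its inverse, and its derivative (written out here) are

\[
\mathrm{logit}(p_i) = \log\frac{p_i}{1-p_i} = x_i^{\top}\beta, \qquad
p_i = \frac{1}{1+e^{-x_i^{\top}\beta}}, \qquad
\frac{\partial p_i}{\partial (x_i^{\top}\beta)} = p_i(1-p_i).
\]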
Multiple linear regression#
Model:#
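In matrix notation, the model is

\[
\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad
\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2 I_n),
\]

with \(\mathbf{y}\) of dimension \(n\times 1\), \(X\) of dimension \(n\times p\), and \(\boldsymbol{\beta}\) of dimension \(p\times 1\).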
OLS estimation:#
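Minimizing \((\mathbf{y}-X\boldsymbol{\beta})^{\top}(\mathbf{y}-X\boldsymbol{\beta})\) yields

\[
\hat{\boldsymbol{\beta}} = (X^{\top}X)^{-1}X^{\top}\mathbf{y}, \qquad
\mathrm{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2 (X^{\top}X)^{-1}.
\]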
Relationship between \(\beta\) and \(\sigma_{x,y}\) and \(\sigma^2_{x}\)#
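For a single predictor this relationship (stated here for reference) is

\[
\beta_1 = \frac{\mathrm{Cov}(x,y)}{\mathrm{Var}(x)} = \frac{\sigma_{x,y}}{\sigma^2_{x}},
\]

and in the multiple-predictor case \(\boldsymbol{\beta} = \Sigma_{xx}^{-1}\,\sigma_{xy}\), where \(\Sigma_{xx}\) is the covariance matrix of the predictors.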
Generalized linear regression#
Derivation#
The general procedures:
General exponential family format
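A common parameterization is

\[
f(y;\theta,\phi) = \exp\!\left\{\frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi)\right\},
\qquad E(y) = b'(\theta), \qquad \mathrm{Var}(y) = b''(\theta)\,a(\phi).
\]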
Some important attributes of log-likelihood
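Under the usual regularity conditions, the score \(S(\theta)=\partial \ell(\theta)/\partial \theta\) satisfies

\[
E[S(\theta)] = 0, \qquad
E\!\left[\frac{\partial^2 \ell(\theta)}{\partial \theta^2}\right] = -E\big[S(\theta)^2\big] = -I(\theta).
\]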
The scalar form of Taylor series
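Expanding the score around a current estimate \(\theta^{(0)}\) to first order,

\[
\frac{\partial \ell(\theta)}{\partial \theta} \approx
\frac{\partial \ell(\theta^{(0)})}{\partial \theta}
+ (\theta - \theta^{(0)})\,\frac{\partial^2 \ell(\theta^{(0)})}{\partial \theta^2}.
\]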
Set \(\partial \ell (\theta) / \partial \theta = 0\) and rearranging terms yields:
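The resulting scalar Newton-Raphson step, reconstructed here, is

\[
\theta = \theta^{(0)} - \left[\frac{\partial^2 \ell(\theta^{(0)})}{\partial \theta^2}\right]^{-1}
\frac{\partial \ell(\theta^{(0)})}{\partial \theta}.
\]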
The basic matrix form of Newton-Raphson algorithm:
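With score vector \(S(\theta)\) and Hessian \(H(\theta)\), the update is

\[
\theta^{(t+1)} = \theta^{(t)} - H\big(\theta^{(t)}\big)^{-1} S\big(\theta^{(t)}\big).
\]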
Replacing the Hessian matrix with its expectation, the negative information matrix (i.e. \(E[H(\theta)]= -Var[S(\theta)]= -I(\theta)\)), we get the Fisher scoring algorithm:
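Written out, the Fisher scoring update becomes

\[
\theta^{(t+1)} = \theta^{(t)} + I\big(\theta^{(t)}\big)^{-1} S\big(\theta^{(t)}\big).
\]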
Now estimate the coefficients \(\beta\). Scalar form:
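By the chain rule \(\partial \ell_i/\partial \beta_j = (\partial \ell_i/\partial \theta_i)(\partial \theta_i/\partial \mu_i)(\partial \mu_i/\partial \eta_i)(\partial \eta_i/\partial \beta_j)\), which for the exponential family above reduces to

\[
\frac{\partial \ell}{\partial \beta_j}
= \sum_{i=1}^{n} \frac{(y_i - \mu_i)}{\mathrm{Var}(y_i)}\,\frac{\partial \mu_i}{\partial \eta_i}\,x_{ij}.
\]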
Some results:
Matrix form
where \(y\) is the \(n\times1\) vector of observations, \(\ell(\theta)\) is the \(n\times 1\) vector of log-likelihood values associated with the observations, \(V = diag[Var(y_{i})]\) is the \(n \times n\) variance matrix of the observations, \(D=diag[\partial \eta_{i} / \partial \mu_{i}]\) is the \(n \times n\) matrix of derivatives, and \(\mu\) is the \(n \times 1\) mean vector. Letting \(W=(DVD)^{-1}\), we get:
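Reconstructing the expressions in this notation (a standard GLM result, noting that the diagonal matrices commute),

\[
S(\beta) = X^{\top} W D\,(y - \mu), \qquad
I(\beta) = X^{\top} W X .
\]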
Pseudo-likelihood for GLM. Using the Fisher scoring equation yields \(\beta = \tilde{\beta} +(X^{'}\tilde{W}X)^{-1}X^{'}\tilde{W}\tilde{D}(y-\tilde{\mu})\), where \(\tilde{W}\), \(\tilde{D}\), and \(\tilde{\mu}\) are evaluated at \(\tilde{\beta}\). So the GLM estimating equations are:
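In the notation of the next line, these are the weighted least-squares normal equations

\[
X^{\top}\tilde{W}X\,\hat{\beta} = X^{\top}\tilde{W}\,y^{*}.
\]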
where \(y^{*} = X\tilde{\beta} + \tilde{D}(y-\tilde{\mu}) = \tilde{\eta} + \tilde{D}(y-\tilde{\mu})\), and \(y^{*}\) is called the pseudo-variable.
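A minimal IRLS sketch in R for logistic regression, illustrating the pseudo-variable update above (simulated data; the variable names are illustrative, not from the original notes):

set.seed(1)
n <- 200
X <- cbind(1, rnorm(n))                  # design matrix with intercept
beta_true <- c(-0.5, 1)
p <- 1 / (1 + exp(-X %*% beta_true))
y <- rbinom(n, 1, p)
beta <- rep(0, ncol(X))
for (iter in 1:25) {
  eta <- X %*% beta
  mu  <- 1 / (1 + exp(-eta))             # inverse logit link
  W   <- as.vector(mu * (1 - mu))        # (DVD)^{-1} reduces to mu(1-mu) for the logit link
  ystar <- eta + (y - mu) / W            # pseudo-variable y*
  beta_new <- solve(t(X) %*% (W * X), t(X) %*% (W * ystar))
  if (max(abs(beta_new - beta)) < 1e-8) break
  beta <- beta_new
}
cbind(IRLS = as.vector(beta), glm = coef(glm(y ~ X[, 2], family = binomial)))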
Mixed Linear Model#
where \(\mathbf{e} \sim MVN(0,\mathbf{R})\) and \(\mathbf{u} \sim MVN(0,\mathbf{G})\), and \(\mathbf{e}\) and \(\mathbf{u}\) are independent.
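For reference, the model being described is presumably the standard linear mixed model

\[
\mathbf{y} = X\boldsymbol{\beta} + Z\mathbf{u} + \mathbf{e},
\]

with \(X\) and \(Z\) the design matrices for the fixed and random effects, respectively.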
Firstly, we have the probability density functions
Next, we consider \(\mathbf{y}\) given \(\mathbf{u}\) (it is convenient to work with the conditional distribution):
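Conditioning on the random effects gives

\[
\mathbf{y}\mid\mathbf{u} \sim MVN(X\boldsymbol{\beta} + Z\mathbf{u},\ \mathbf{R}).
\]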
So the joint distribution is
Now consider the log of the density,
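Up to an additive constant (sketched here), the log of the joint density \(f(\mathbf{y},\mathbf{u}) = f(\mathbf{y}\mid\mathbf{u})\,f(\mathbf{u})\) is

\[
\ell = -\tfrac{1}{2}\Big[(\mathbf{y} - X\boldsymbol{\beta} - Z\mathbf{u})^{\top}\mathbf{R}^{-1}(\mathbf{y} - X\boldsymbol{\beta} - Z\mathbf{u}) + \mathbf{u}^{\top}\mathbf{G}^{-1}\mathbf{u}\Big] + \text{const}.
\]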
Taking the derivatives of \(\ell\) with respect to \(\boldsymbol{\beta}\) and \(\boldsymbol{u}\) and setting them to zero yields
Henderson's mixed-model equations:
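In the standard notation these are

\[
\begin{bmatrix}
X^{\top}\mathbf{R}^{-1}X & X^{\top}\mathbf{R}^{-1}Z \\
Z^{\top}\mathbf{R}^{-1}X & Z^{\top}\mathbf{R}^{-1}Z + \mathbf{G}^{-1}
\end{bmatrix}
\begin{bmatrix}
\hat{\boldsymbol{\beta}} \\ \hat{\mathbf{u}}
\end{bmatrix}
=
\begin{bmatrix}
X^{\top}\mathbf{R}^{-1}\mathbf{y} \\ Z^{\top}\mathbf{R}^{-1}\mathbf{y}
\end{bmatrix}.
\]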