Concepts

Some key concepts

PIP (Posterior Inclusion Probability)

The PIP for a variable \(X_i\) is the proportion of MCMC samples in which \(X_i\) was included in the model. Mathematically, the PIP for \(X_i\) is calculated as

\[PIP(X_i) = \frac{N_i}{N}\]

where \(N\) is the number of MCMC iterations and \(N_i\) is the number of iterations in which \(X_i\) was included.
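The ratio above can be sketched in a few lines of Python. This is only an illustration: the function name `posterior_inclusion_probabilities` and the layout of `inclusion_samples` (one row of 0/1 indicators per MCMC iteration) are assumptions made for the example, not part of any particular sampler's output format.

```python
def posterior_inclusion_probabilities(inclusion_samples):
    """Return PIP(X_i) = N_i / N for every variable.

    inclusion_samples is a hypothetical layout: a list of N rows, one per
    MCMC iteration, where entry i is 1 if X_i was in the model and 0 otherwise.
    """
    n_iterations = len(inclusion_samples)  # N
    if n_iterations == 0:
        raise ValueError("need at least one MCMC iteration")
    n_variables = len(inclusion_samples[0])
    counts = [0] * n_variables  # N_i for each variable
    for row in inclusion_samples:
        for i, included in enumerate(row):
            if included:
                counts[i] += 1
    return [count / n_iterations for count in counts]


# Toy example: 4 iterations, 3 variables.
samples = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
]
pips = posterior_inclusion_probabilities(samples)
print(pips)  # [1.0, 0.25, 0.25]
```

In this toy run the first variable is included in every iteration, so its PIP is 1, while the other two are each included once in four iterations.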

Centered and scaled genotype matrix W

The additive genomic relationship matrix \(\mathbf{G}\) (VanRaden PM. 2008. J Dairy Sci. 91:4414-4423) is constructed from all genetic markers as \(\mathbf{G}=\mathbf{W}\mathbf{W}^{\intercal}/m\), where \(\mathbf{W}\) is the centered and scaled genotype matrix and \(m\) is the total number of markers. Each column vector of \(\mathbf{W}\) is calculated as \(\mathbf{w}_i = (\mathbf{m}_i -2p_i)/\sqrt{2p_i(1-p_i)}\), where \(p_i\) is the minor allele frequency of the \(i\)-th genetic marker and \(\mathbf{m}_i\) is the \(i\)-th column vector of the allele count matrix \(\mathbf{M}\), which contains the genotypes coded as 0, 1 or 2, counting the number of copies of the minor allele.

How to center and scale the genotypes:

We assume that the frequency of allele \(a\) is \(1 - p_i\) and the frequency of the other allele \(A\) (the counted allele) is \(p_i\). According to the Hardy-Weinberg principle, the genotypes \(aa\), \(Aa\) and \(AA\) (coded as 0, 1 and 2) follow this distribution:

\[ \begin{array}{l|ccc} \text{Marker} & 0 & 1 & 2 \\ \hline \text{Frequency} & (1-p_i)^2 & 2p_i(1-p_i) & p_i^2 \end{array} \]

The mean of the genotype is

\[ \begin{aligned} E(\mathbf{m}_i) = 0 * (1-p_i)^2 + 1 * 2p_i(1-p_i) + 2 * p_i^2 = 2p_i \end{aligned} \]

The variance of the genotype is

\[ \begin{aligned} Var(\mathbf{m}_i) &= E(\mathbf{m}_i^2) - E(\mathbf{m}_i)^2 \\ &=\big [ 0 * (1-p_i)^2 + 1 * 2p_i(1-p_i) + 4 * p_i^2 \big ] - \\ &\quad \big [ 0 * (1-p_i)^2 + 1 * 2p_i(1-p_i) + 2 * p_i^2 \big ]^2 \\ &= 2p_i(1-p_i) + 4 * p_i^2 - (2p_i)^2 \\ &= 2p_i(1-p_i) \end{aligned} \]
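The mean and variance derived above can be checked with exact rational arithmetic. The snippet below is a small sanity check rather than part of any analysis pipeline; the helper name `hwe_moments` and the example frequency \(p_i = 3/10\) are chosen only for illustration.

```python
from fractions import Fraction


def hwe_moments(p):
    """Mean and variance of a 0/1/2 genotype under Hardy-Weinberg proportions."""
    # Genotype code -> frequency, as in the table above.
    freqs = {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}
    mean = sum(g * f for g, f in freqs.items())               # E(m_i)
    second_moment = sum(g * g * f for g, f in freqs.items())  # E(m_i^2)
    return mean, second_moment - mean ** 2


p = Fraction(3, 10)  # illustrative allele frequency
mean, var = hwe_moments(p)
print(mean, var)  # 3/5 21/50
```

With \(p_i = 3/10\) the mean is \(2p_i = 3/5\) and the variance is \(2p_i(1-p_i) = 21/50\), matching the closed-form results.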

So after centering and scaling the genotypes, we get

\[ \mathbf{w}_i = (\mathbf{m}_i -2p_i)/\sqrt{2p_i(1-p_i)} \]
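As a minimal sketch of how this formula and \(\mathbf{G}=\mathbf{W}\mathbf{W}^{\intercal}/m\) fit together, the following pure-Python example builds \(\mathbf{W}\) and \(\mathbf{G}\) from a small allele count matrix. Several things here are assumptions made for illustration: the function names, the toy matrix `M`, and in particular the choice to estimate \(p_i\) from the sample as the column sum divided by twice the number of individuals. Monomorphic markers are rejected because their scaling factor would be zero.

```python
import math


def center_and_scale(M):
    """Return W from an allele count matrix M (rows: individuals, columns: markers)."""
    n = len(M)
    n_markers = len(M[0])
    W = [[0.0] * n_markers for _ in range(n)]
    for j in range(n_markers):
        column = [M[i][j] for i in range(n)]
        # Assumption: p_i is estimated from the sample itself.
        p = sum(column) / (2 * n)
        sd = math.sqrt(2 * p * (1 - p))
        if sd == 0:
            raise ValueError(f"marker {j} is monomorphic and cannot be scaled")
        for i in range(n):
            W[i][j] = (M[i][j] - 2 * p) / sd
    return W


def genomic_relationship(W):
    """Return G = W W^T / m, where m is the number of markers."""
    n = len(W)
    m = len(W[0])
    return [
        [sum(W[a][k] * W[b][k] for k in range(m)) / m for b in range(n)]
        for a in range(n)
    ]


# Toy allele counts: 4 individuals, 3 markers, coded 0/1/2.
M = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [1, 2, 1],
]
W = center_and_scale(M)
G = genomic_relationship(W)
```

Because \(p_i\) is estimated from the same sample, every column of `W` sums to zero, and `G` is symmetric by construction.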