molecular QTLs#
molQTL summary statistics#
Options#
- --eqtl-flist flistfile#
- --add-gene-n geneSampleSizeFile#
- --keep-gene#
Convertion between SMR BESD format and gz format#
Convert plist format to besd format
BayesOmics64 --eqtl-flist flistFile \
--make-besd --out mybesd
Since BESD format can save effect size and se only, a compressed gz format has been designed to incorporate more useful information
psuedo-1kg-chr22-causal-model.q.query.gz
GeneID GeneChr GeneLength GenePhyPos SNPID SNPChr SNPPhyPos A1 A2 A1Freq BETA SE Pvalue N
ENSG00000070413 22 2000000 19066881.0 rs2073776 22 19024651 C T 0.499 0.0015 0.0044 0.72 100000
ENSG00000070413 22 2000000 19066881.0 rs6623 22 19025381 C T 0.499 0.0016 0.0044 0.70 100000
ENSG00000070413 22 2000000 19066881.0 rs10160 22 19026025 G A 0.498 -0.003 0.0044 0.44 100000
......
So we can convert BESD format to query.gz format if additional gene sample size is needed
BayesOmics64 --beqtl-summary mybesd \
--make-query --out myquery
Please see SMR flist to besd for details.
Extract a single genes basd on gene input file
BayesOmics64 --beqtl-summary mybesd \
--keep-one-gene geneID \
--make-query --out myquery
Add gene sample size
There are two ways to add sample size into analysis:
when all SNPs within the same gene have the same sample size,
GeneID N
ENSG00000182902 10000
ENSG00000183628 10000
ENSG00000100346 10000
......
we suggest that adding sample size information into BESD format directly by the folloing command:
BayesOmics64 --beqtl-summary mybesd \
--add-gene-n geneSampleSizeFile\
--make-query \
--out myquery
when SNPs with the same gene may have different sample size (usually occur in meta analysis, like eQTLs from eQTLGen project),
GeneID N
ENSG00000182902 10000
ENSG00000183628 10000
ENSG00000100346 10000
......
we suggest that adding these sample size into gz format by the following command:
BayesOmics64 --beqtl-summary mybesd \
--add-gene-n geneSampleSizeFile\
--make-query \
--out myquery
Tip
A) If you’re adding gene sample size to the analysis, the original sample size in the BESD or gz file will be replaced and gene-snp pairs with NA sample size will be deleted automatically. B) If per gene sample size is missing, you can also add total sample size to besd format file by using SMR software, and we will use total sample size for each gene. C) If both total and per gene sample size are missing, we will use estimated gene sample size in the analysis (add this function later).
Warning
Since the LD matrix is the low-rank LD, it’s important to ensure that the gene and SNP set in the geneSampleSizeFile are identical to those in the BESD or gz file. If they’re not identical, you’ll need to perform QC first to ensure they match first.
Merge multiple besd file into one*
BayesOmics64 --beqtl-list mybesdListFile.txt --merge-besd --out mergeBesd
Input file format mybesdListFile.txt (full paths can be specified if the besd files are in different directories)
test_1
test_2
test_3
......
test_22
QC for molQTLs#
It will very convenient to do manipulations on molQTL datasets using R, as the following example:
library(tidyverse)
library(data.table)
eqtl = fread(paste0("example.query.gz"))
####################################
cisWin = 1000000
dat = eqtl %>%
group_by(GeneChr) %>%
mutate(
cisStart = pmax(0, GenePhyPos - cisWin),
cisEnd = GenePhyPos + cisWin
) %>%
filter(GeneChr == SNPChr) %>% # Ensure SNP and gene are on the same chromosome
filter(SNPPhyPos >= cisStart & SNPPhyPos <= cisEnd) %>%
filter(abs(BETA) > 1e-6) %>%
select(all_of(colnames(eqtl))) %>%
ungroup() %>% arrange(GeneChr,GenePhyPos, GeneID, SNPPhyPos)