PredictDB databases are stored in simple sqlite files. You can programmatically query them via python, R, perl, etc (using appropriate libraries). Below is an example on how to query the database in R.
Each has two tables the extra and the weights tables. - extra table contains the list of available genes and some prediction performance information (for elastic net, not for the mashr models), - weights table contains the weights for predicting the gene expression levels (or other mediating/molecular traits)
Code
## install.packages("RSQLite")library("RSQLite")sqlite <-dbDriver("SQLite")dbname <-"assets/en_Adipose_Subcutaneous.db"## add full path if db file not in current directory## connect to dbdb =dbConnect(sqlite,dbname)## list tablesdbListTables(db)
gene rsid varID ref_allele eff_allele
1 ENSG00000261456.5 rs11252127 chr10_52147_C_T_b38 C T
2 ENSG00000261456.5 rs11252546 chr10_58487_T_C_b38 T C
3 ENSG00000261456.5 rs11591988 chr10_80130_C_T_b38 C T
4 ENSG00000261456.5 rs4495823 chr10_97603_G_A_b38 G A
5 ENSG00000261456.5 rs11253478 chr10_98907_C_T_b38 C T
6 ENSG00000261456.5 rs7901397 chr10_102757_T_C_b38 T C
7 ENSG00000261456.5 rs7476951 chr10_137211_T_C_b38 T C
8 ENSG00000261456.5 rs3123247 chr10_264285_C_T_b38 C T
9 ENSG00000261456.5 rs4880567 chr10_267364_T_C_b38 T C
10 ENSG00000261456.5 rs4881392 chr10_519179_A_G_b38 A G
weight
1 0.052252706
2 -0.033544959
3 0.014296499
4 -0.030826218
5 0.013036311
6 -0.118567569
7 0.007865262
8 0.004556608
9 0.030123310
10 0.057842737
Code
## how many genes are available for given tissue?## dbname should be the name of the sqlite database for the tissuequery('select count(*) from extra')
count(*)
1 8650
Code
## select genes with R2>0.01 (this is cor>0.1)## ths won't work for the latest MASHR-based GTEx V8 models## this only works for models where R2 is included. #high.h2.genes <- query('select * from extra where R2 > 0.01')
---title: Querying PredictDB sqlite databasesauthor: Haky Imdate: '2021-04-27'slug: querying-predictdb-sqlite-databasescategories: - how_totags: []---PredictDB databases are stored in simple sqlite files. You can programmatically query them via python, R, perl, etc (using appropriate libraries). Below is an example on how to query the database in R.Each has two tables the extra and the weights tables. - extra table contains the list of available genes and some prediction performance information (for elastic net, not for the mashr models), - weights table contains the weights for predicting the gene expression levels (or other mediating/molecular traits)```{r}## install.packages("RSQLite")library("RSQLite")sqlite <-dbDriver("SQLite")dbname <-"assets/en_Adipose_Subcutaneous.db"## add full path if db file not in current directory## connect to dbdb =dbConnect(sqlite,dbname)## list tablesdbListTables(db)dbListFields(db, "weights")dbListFields(db, "extra")## convenience query functionquery <-function(...) dbGetQuery(db, ...)## example queriesquery('select count(*) from weights')query('select * from weights where gene = "GATA6" ')query('select * from weights limit 10')## how many genes are available for given tissue?## dbname should be the name of the sqlite database for the tissuequery('select count(*) from extra')## select genes with R2>0.01 (this is cor>0.1)## ths won't work for the latest MASHR-based GTEx V8 models## this only works for models where R2 is included. #high.h2.genes <- query('select * from extra where R2 > 0.01')```