Posts

Querying PredictDB sqlite databases

2021-04-27 Haky Im
PredictDB databases are stored in simple SQLite files. You can query them programmatically from Python, R, Perl, etc. (using the appropriate libraries). Below is an example of how to query the database in R. Each database has two tables, extra and weights. - extra table contains the list of available … Read more →
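For readers who prefer Python over R, a minimal sketch of the same idea follows, using only the standard `sqlite3` module. The table names `extra` and `weights` come from the post; the helper names `list_tables` and `head` are hypothetical, and no column names are assumed.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in an open SQLite connection."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


def head(conn, table, n=5):
    """Return (column_names, first n rows) of a table.

    The table name is checked against the database's own table list
    before being placed in the query, since SQLite cannot bind
    identifiers as parameters.
    """
    if table not in list_tables(conn):
        raise ValueError(f"no such table: {table}")
    cur = conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (n,))
    columns = [desc[0] for desc in cur.description]
    return columns, cur.fetchall()
```

With a downloaded model file (the file name here is a placeholder), usage would look like `conn = sqlite3.connect("model.db")`, then `head(conn, "extra")` and `head(conn, "weights")`.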

Bionimbus PDC

2021-03-31 Sabrina Mi
Host bionimbus
    HostName bionimbus-pdc.opensciencedatacloud.org
    User username
    IdentityFile ~/.ssh/id_rsa
    ForwardAgent yes

Host bionimbusvm
    HostName instanceIPaddress
    User ubuntu
    IdentityFile ~/.ssh/id_rsa
    ProxyCommand ssh -q -A bionimbus -W %h:%p
    ForwardAgent yes

Load your key into the ssh-agent by … Read more →

SPrediXcan Harmonization Errors

2021-01-07 Sabrina Mi
The error message "INFO - 0 % of model's snps used" can typically be traced to inconsistencies between variant IDs in prediction models and input GWAS files. Our GTEx v8 mashr and elastic net models are defined in hg38. For GWAS defined in hg19, we recommend first processing the GWAS with our … Read more →
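One quick way to check for this kind of mismatch before running anything is to measure how many model variant IDs appear in the GWAS file at all. The sketch below is a diagnostic under that assumption, not the S-PrediXcan implementation; the function name `snp_overlap` is hypothetical.

```python
def snp_overlap(model_ids, gwas_ids):
    """Return (fraction of model variant IDs found in the GWAS, missing IDs).

    Both arguments are iterables of variant ID strings. IDs are compared
    as exact strings, so differences in build or naming scheme show up
    as a fraction of 0.
    """
    model = set(model_ids)
    if not model:
        raise ValueError("model_ids is empty")
    used = model & set(gwas_ids)
    return len(used) / len(model), sorted(model - used)
```

A fraction near zero with a long list of missing IDs suggests the two files use different ID conventions or genome builds rather than genuinely disjoint variant sets.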

PsychENCODE Models

2020-12-21 Sabrina Mi
Gandal et al. analyzed autism spectrum disorder, schizophrenia, and bipolar disorder across multiple levels of transcriptomic organization (gene expression, local splicing, transcript isoform expression, and coexpression networks for both protein-coding and noncoding genes) to produce a quantitative, … Read more →

fn_compare_expression_matrices

2020-12-03 Haky Im
Want to write a function that takes in two data frames with expression data and returns […] vector of correlations of matched genes […] summary of the correlation vector (min, 1Q, median, 3Q, max) […] returns a list of aligned data frames […] plots correlation for a sample of … Read more →
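A rough sketch of the non-plotting parts of such a function is shown below in plain Python. It is an illustration only: instead of data frames, each expression matrix is assumed to be a dict mapping gene to a dict of sample to value, every gene in a matrix is assumed to cover the same samples, and the names `pearson` and `compare_expression` are hypothetical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length lists (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def compare_expression(expr_a, expr_b):
    """Align two gene -> {sample: value} dicts and correlate matched genes."""
    genes = sorted(set(expr_a) & set(expr_b))
    if not genes:
        raise ValueError("no genes in common")
    # Assumes every gene in a matrix has the same set of samples.
    samples = sorted(set(expr_a[genes[0]]) & set(expr_b[genes[0]]))
    aligned_a = {g: [expr_a[g][s] for s in samples] for g in genes}
    aligned_b = {g: [expr_b[g][s] for s in samples] for g in genes}
    cors = {g: pearson(aligned_a[g], aligned_b[g]) for g in genes}

    # Five-number summary over the finite correlations.
    finite = sorted(v for v in cors.values() if not math.isnan(v))
    summary = None
    if len(finite) >= 2:
        q1, med, q3 = statistics.quantiles(finite, n=4, method="inclusive")
        summary = {"min": finite[0], "1Q": q1, "median": med,
                   "3Q": q3, "max": finite[-1]}
    return {"correlations": cors, "summary": summary,
            "samples": samples, "aligned": (aligned_a, aligned_b)}
```

The plotting step is left out here; the returned aligned values could be passed to any plotting library for a sample of genes.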

PrediXcan 0% variant mapping issue

2020-12-01 Haky Im
Many users had difficulties matching the genotype variant ID to the prediction model variant ID. Here is one example added to the PrediXcan tutorial where the matching was failing because the on-the-fly option did not take into account that in the GTEx v8 vcf file, chromosomes are named as chr# … Read more →
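The chromosome naming issue can be illustrated with a small normalization helper. This is a sketch, not the PrediXcan on-the-fly mapping itself: the function names are hypothetical, and the `chr#_pos_ref_alt_b38` ID layout is an assumption about how GTEx v8 model variant IDs are written.

```python
def set_chr_prefix(chrom, use_prefix=True):
    """Return the chromosome name with or without a leading 'chr'."""
    chrom = str(chrom).strip()
    bare = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return "chr" + bare if use_prefix else bare


def variant_id(chrom, pos, ref, alt, build="b38"):
    """Build an ID such as chr1_13550_G_A_b38 (assumed GTEx v8 style)."""
    return "_".join([set_chr_prefix(chrom), str(pos), ref, alt, build])
```

Applying the same normalization to both the genotype file and the model before comparing IDs avoids mismatches that come only from a `1` versus `chr1` difference.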

How to interpret a p-value of 0

2020-12-01 Haky Im
A p-value of zero should be interpreted as an extremely small positive value. S-PrediXcan and PrediXcan report the z-score as well as the p-value. You can calculate the p-value corresponding to the z-score using the formula below. For example, a z-score of 30 will give you a (natural) log p-value … Read more →
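As an illustration of working on the log scale, the sketch below computes the natural log of a p-value from a z-score in plain Python. It assumes a two-sided test against a standard normal, and for large |z| it swaps in the standard asymptotic tail expansion rather than an exact evaluation; the function name `log_p_from_z` and the switch point of 30 are choices made for this example.

```python
import math


def log_p_from_z(z):
    """Natural log of the two-sided normal p-value for a z-score.

    For moderate |z| the p-value is erfc(|z| / sqrt(2)), which is exactly
    2 * (upper normal tail). For large |z| that value underflows toward 0,
    so the log is computed from the asymptotic series of the normal tail:
        log P(Z > z) ~ -z^2/2 - log(z) - log(2*pi)/2
                       + log(1 - 1/z^2 + 3/z^4 - 15/z^6)
    """
    a = abs(z)
    if a < 30:
        return math.log(math.erfc(a / math.sqrt(2)))
    series = 1 - 1 / a**2 + 3 / a**4 - 15 / a**6
    log_tail = -a * a / 2 - math.log(a) - 0.5 * math.log(2 * math.pi) + math.log(series)
    return math.log(2) + log_tail
```

Because the result stays on the log scale, very large z-scores can still be ranked and compared even when the p-value itself prints as 0.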

CRI Gardner upgrade news

2020-11-30 Haky Im
Operating System Upgrade - The operating system will be upgraded from Red Hat Linux 6.7 to 7.6. This will provide a kernel that will allow for a more modern software ecosystem. For example, software such as tensorflow will not run on Red Hat 6. […] GPFS Upgrade - GPFS storage clients will be … Read more →

How to convert GTEx v8 model to hg19 based on UK Biobank SNP set mapping

2020-11-23 Sabrina Mi
/gpfs/data/im-lab/nas40t2/Data/References/mappings/UKB2GTEx_mapping.txt.gz contains information for variants in UK Biobank genotypes. The columns are variant, chromosome, position, non_effect_allele, effect_allele, rsid, zscore, and panel_variant_id. Example row: 1:692794:CA:C chr1 757414 CA C 1:692794_CA_C 1 NA … Read more →
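A minimal sketch of loading such a mapping file into a lookup table is given below. It assumes the file is whitespace-delimited with a header row containing the column names listed above, and that `NA` marks a missing value; the function name `load_mapping` is hypothetical.

```python
import gzip


def load_mapping(path, key="variant", value="panel_variant_id"):
    """Read a gzipped, whitespace-delimited table into a {key: value} dict.

    Rows with the wrong number of fields, or with 'NA' in the value
    column, are skipped.
    """
    mapping = {}
    with gzip.open(path, "rt") as fh:
        header = fh.readline().split()
        k, v = header.index(key), header.index(value)
        for line in fh:
            fields = line.split()
            if len(fields) != len(header) or fields[v] == "NA":
                continue
            mapping[fields[k]] = fields[v]
    return mapping
```

Other column pairs can be selected through the `key` and `value` arguments, for example to map from `rsid` instead of `variant`.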

Working with Large Files

2020-11-23 Sabrina Mi
When working with large datasets, only the files containing code should be pushed to GitHub repositories, not the data itself. The raw data inputs and analysis outputs should either be kept in a local directory that is never committed or, as a best practice, stored in Box (download). Once … Read more →