Installing/Running Tensorqtl on CRI

Author

Natasha

Published

November 17, 2020

Note: You can install tensorqtl with pip install. However, there seems to be a bug that makes tensorqtl incompatible with pandas-plink 2.2.2. If you install tensorqtl with pip, you have to downgrade pandas-plink to 2.1.x so that it is compatible with pandas 1.1.x.
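To see which versions are installed before deciding whether to downgrade, a small check like the following may help. It is only a sketch: the helper names `installed_version` and `needs_downgrade` are hypothetical, and the 2.2 cutoff simply encodes the observation in the note above rather than a documented compatibility rule.

```python
import re
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def needs_downgrade(pandas_plink_version):
    """Assumption: pandas-plink 2.2 and later trigger the incompatibility noted above."""
    match = re.match(r"(\d+)\.(\d+)", pandas_plink_version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {pandas_plink_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (2, 2)


# Report what is installed in the active environment.
for package in ("pandas", "pandas-plink", "tensorqtl"):
    print(package, installed_version(package))

plink_version = installed_version("pandas-plink")
if plink_version is not None and needs_downgrade(plink_version):
    print("Consider downgrading pandas-plink to 2.1.x")
```

Packages that are not installed are reported as `None`, so the check can be run in any environment without failing.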

I installed tensorqtl straight from the GitHub repo: https://github.com/broadinstitute/tensorqtl

  1. Activate the environment
Code
conda activate /gpfs/data/im-lab/nas40t2/bin/envs/tensorqtl
  2. Clone the repo into your own directory in the labshare
Code
git clone git@github.com:broadinstitute/tensorqtl.git
cd tensorqtl
  3. Install tensorqtl and its requirements
Code
pip install -r install/requirements.txt
  4. Run Tensorqtl

Requirements: tensorqtl requires a genotype file, a phenotype file, and a covariate file. The genotype files must be in plink format. The phenotype file must be a .bed.gz file that follows the UCSC BED format (see http://fastqtl.sourceforge.net). Finally, the covariate file must be a .txt file laid out as covariates x samples (one row per covariate, one column per sample).

Also make sure that both the phenotype and covariate files include a header row of column names. The covariate file additionally needs row names.

It’s also helpful to look at the example data provided by the repo.
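As a rough illustration of the layouts described above, the sketch below writes a toy phenotype file and a toy covariate file using only the standard library. Everything in it is made up for illustration: the file names, sample IDs, gene IDs, values, and the exact header labels (`#chr`, `start`, `end`, `phenotype_id`, `ID`) are assumptions, so check them against the repo's example data before relying on them. In a real run the sample IDs would need to match those in the genotype files.

```python
import csv
import gzip

samples = ["SAMPLE_1", "SAMPLE_2", "SAMPLE_3"]  # made-up sample IDs

# Phenotype BED: a header row, then chr, start, end, phenotype ID,
# followed by one column per sample.
phenotype_rows = [
    ["#chr", "start", "end", "phenotype_id"] + samples,
    ["chr1", 11868, 11869, "GENE_A", 0.12, -1.30, 0.45],
    ["chr1", 29553, 29554, "GENE_B", 1.02, 0.33, -0.71],
]
with gzip.open("toy_phenotypes.bed.gz", "wt", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(phenotype_rows)

# Covariates: a header row of sample IDs, then one row per covariate
# (covariates x samples), with the covariate name as the row name.
covariate_rows = [
    ["ID"] + samples,
    ["PC1", 0.01, -0.02, 0.03],
    ["sex", 1, 2, 1],
]
with open("toy_covariates.txt", "w", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(covariate_rows)
```

Both files are tab-delimited. The phenotype file has column names but no row names, while the covariate file has both.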

This is the command I used to run tensorqtl. I ran a trans-QTL analysis, so the start and end positions of the genes did not matter.

Code
python3 -m tensorqtl \
    /gpfs/data/im-lab/nas40t2/Data/GTEx/V8/genotype/plink_files/GTEX_tensorqtl \
    /gpfs/data/im-lab/nas40t2/natasha/tensorqtl/pheno-tensorqtl.bed.gz \
    /gpfs/data/im-lab/nas40t2/natasha/GTEX_tensorqtl \
    --covariates /gpfs/data/im-lab/nas40t2/natasha/tensorqtl/covariates.txt \
    --mode trans
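If you launch runs from a Python script rather than typing the command by hand, the same call can be assembled programmatically. This is a minimal sketch: `build_tensorqtl_command` is a hypothetical helper, the `/path/to/...` values are placeholders, and only the arguments used in the command above (three positional arguments, `--covariates`, and `--mode`) are covered.

```python
import shlex


def build_tensorqtl_command(genotype_prefix, phenotype_bed, output_prefix,
                            covariates=None, mode="trans"):
    """Assemble the argument list for `python3 -m tensorqtl` (hypothetical helper)."""
    command = ["python3", "-m", "tensorqtl",
               genotype_prefix, phenotype_bed, output_prefix]
    if covariates is not None:
        command += ["--covariates", covariates]
    command += ["--mode", mode]
    return command


command = build_tensorqtl_command(
    "/path/to/plink_prefix",        # placeholder: plink genotype prefix
    "/path/to/phenotypes.bed.gz",   # placeholder: phenotype BED file
    "/path/to/output_prefix",       # placeholder: prefix for the output files
    covariates="/path/to/covariates.txt",
)

# Print the command for inspection before running it.
print(shlex.join(command))
# To launch the run: import subprocess; subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with long paths, and printing the joined command first makes it easy to confirm the paths before starting a long job.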

The command above generates a parquet file at the location given by the output prefix (the third positional argument). The file can be read using pandas.

Code
import pandas as pd
df = pd.read_parquet("<path_to_filename>")
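Once the results are loaded, a common next step is to pull out the strongest associations. The sketch below assumes the table has a p-value column named `pval` alongside `variant_id` and `phenotype_id`; these column names are an assumption, so check `df.columns` on your own output first. The helper `top_associations`, the threshold, and the toy values are all made up for illustration.

```python
import pandas as pd


def top_associations(df, pval_column="pval", threshold=1e-5, n=10):
    """Return the n rows with the smallest p-values below the threshold."""
    hits = df[df[pval_column] < threshold]
    return hits.sort_values(pval_column).head(n)


# Toy table standing in for pd.read_parquet("<path_to_filename>"); values are made up.
toy = pd.DataFrame({
    "variant_id": ["v1", "v2", "v3"],
    "phenotype_id": ["GENE_A", "GENE_B", "GENE_A"],
    "pval": [1e-8, 0.2, 1e-6],
})
print(top_associations(toy))
```

With real output, replace `toy` with the `df` read from the parquet file and adjust `pval_column` if the column is named differently.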