Biobank Japan Data in CRI

Author

Sabrina Mi

Published

January 2, 2022

BBJ data directory: \gpfs/data/im-lab/nas40t2/Data/BBJ

I first downloaded and decrypted Biobank Japan data (instructions), then organized into subdirectories BBJ-genotypes-decrypted and BBJ-phenotypes-decrypted, in their original form.

Phenotypes

BBJ phenotypes file: gpfs/data/im-lab/nas40t2/Data/BBJ/BBJ-phenotypes.csv

This CSV combines all phenotype data in the BBJ-phenotypes-decrypted subdirectory into one file. The original BBJ phenotype data in BBJ-phenotypes-decrypted, was bulky and used dataset IDs instead of phenotype names. The file BBJ-phenotype-list.txt contains all the phenotypes and their folder names (download)

I created the combined phenotype file with the following script:

python3 process-phenotypes.py --BBJ_folder /Users/sabrinami/BBJ/BBJ-phenotypes \
--phenotype_mapping /Users/sabrinami/Github/analysis-sabrina/BBJ-data-processing/BBJ-phenotype-list.txt \
--output /Users/sabrinami/Github/analysis-sabrina/BBJ-data-processing/BBJ-phenotypes.csv

Genotypes

BBJ genotypes folder: gpfs/data/im-lab/nas40t2/Data/BBJ/BBJ-genotypes-decrypted