Downloading and Decrypting dbGaP Data
how_to
This page is about downloading and decrypting data from NCBI dbGaP. I will split up the instructions into Downloading and Decrypting
Downloading
To download, one must be approved in the dbGaP controlled data access system, receive an email that the data is ready, and follow the given instructions.
- Get approved to download dbGaP data. I did this through eRA Commons.
- Download IBM’s Aspera CLI: download is available here. Because I was downloading to CRI, I selected the
Linux x86_64
version. - Run the downloaded file. You might have to run
chmod
to make it executable. It should create a directory.aspera/
in your home directory. - Receive an email about a new dataset available for download. Click on the email’s link to see your personal dbGaP Authorized Access: Downloads page. Then click Download, and there will be a pop-up giving you a command to run Aspera to do the download:
"%ASPERA_CONNECT_DIR%\bin\ascp" -QTr -l 300M -k 1 -i "%ASPERA_CONNECT_DIR%\etc\asperaweb_id_dsa.openssh" -W <some_long_key> <some_dbGaP_server_address> .
- Change
"%ASPERA_CONNECT_DIR%\bin\ascp"
toaspera
. If you’re running on CRI, the command is already in the PATH - Change
"%ASPERA_CONNECT_DIR%\etc\asperaweb_id_dsa.openssh"
to~/.aspera/etc/asperaweb_id_dsa.openssh
. I’m not entirely sure if specifying a private key is necessary, but this is what I did. - Run the command in the directory you want the data to go, or edit the
.
at the end of the command. Wondering where you should put the data? Read all of the section on decrypting, because the data needs to be inside a certain directory to be decrypted.
Decrypting
I decrypted using NCBI’s SRA Toolkit.
- Because I was working on CRI, I downloaded the Linux version from here.
- The project key (necessary for decryption) can be found on the My Projects tab of the dbGaP data portal. Download the key and move it to CRI.
- To install and configure the SRA Toolkit, I used this guide which was very helpful.
- The guide also walks through how to configure the toolkit, which seems like a lot because I ended up using just one command,
vdb-decrypt
, but it was necessary. - When in the interactive program
vdb-config -i
, you need to do three things: set a home workspace, import the project key (the one you downloaded earlier), and set a project directory for the project corresponding to the key. - I have the home workspace in my directory inside our CRI lab share, and the project directory inside the home workspace.
71954
and71955
are two data downloads corresponding with this project.
ncbi
`-- dbGaP-11644
|-- files
| |-- 71954
| `-- 71955
|-- nannot
|-- refseq
|-- sra
`-- wgs
- To be decrypted, the data must be inside the project directory that you chose, so in this case, all of the data must be inside
dbGaP-11644
. - When the project directory is set up and the data is inside, you can use the
vdb-decrypt
command to decrypt the data (SRA data is decrypted differently, and there’s a lot of documentation on the NCBI website about how to deal with that data.)
Reuse
© HakyImLab and Listed Authors - CC BY 4.0 for Text and figures - MIT for code