# IntroStatGen R Studio Servers

For the one-day seminar, we ran a hands-on lab, so we set up RStudio Servers for the students. The servers needed pre-loaded data, terminal access, pre-compiled binaries for torus and fastenloc, and the correct Python, R, and Linux environments to run all of our analyses. Here's a guide to how we set up those servers.

## Using Google Cloud Compute Engine

To set everything up, we had a basic workflow:

1. Create a new VM.
2. Configure the VM as an RStudio server with everything installed, downloaded, etc.
3. Take a snapshot of this VM.
4. Spin up a bunch of new VMs from this snapshot.
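Steps 1 and 3 can also be done from the command line. The sketch below is a minimal version, not the exact commands we ran. It assumes a hypothetical base VM named `rstudio-base` booted from Ubuntu 18.04 LTS (adjust the image family to the release you want) and reuses the project, zone, and machine type from the script further down. By default `gcloud compute instances create` names the boot disk after the instance, which is why the snapshot step passes the VM name as the disk name.

```shell
#!/usr/bin/env bash
# Sketch of workflow steps 1 and 3. "rstudio-base" is a hypothetical VM name;
# the Ubuntu image family is an assumption.
PROJECT="introstatgen"
ZONE="us-central1-a"

# Step 1: create the base VM that will be configured by hand in step 2.
create_base_vm() {
  gcloud compute instances create "$1" \
    --project="$PROJECT" --zone="$ZONE" \
    --machine-type=custom-1-6656 \
    --image-family=ubuntu-1804-lts --image-project=ubuntu-os-cloud \
    --boot-disk-size=50GB --tags=http-server
}

# Step 3: snapshot the base VM's boot disk (named after the VM by default).
snapshot_base_vm() {
  gcloud compute disks snapshot "$1" \
    --project="$PROJECT" --zone="$ZONE" \
    --snapshot-names="$2"
}

# Usage:
#   create_base_vm rstudio-base
#   # ...configure the VM (step 2)...
#   snapshot_base_vm rstudio-base rstudio-final-2020-06-12
```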

Most of the time (and therefore most of this document) went into step 2: installing, compiling, configuring, uploading, and fixing permissions. You can skip all of that if you start from the most recent snapshot, called `rstudio-final-2020-06-12`.

To spin up multiple VMs, we used Google's command line tools (`gcloud`). Most actions in the Google Cloud Console can be replicated on the command line: just before creating a VM or snapshot in the Console, you can find a link that shows the equivalent command. Here is what we used to spin up an array of servers. The script `gcloud_init.sh` creates a disk from the snapshot and then a VM that boots from that disk:

```shell
$ cat gcloud_init.sh
gcloud compute --project "introstatgen" disks create "qgt-${1}" \
  --size "50" --zone "us-central1-a" \
  --source-snapshot "rstudio-final-2020-06-12" --type "pd-standard"

gcloud beta compute --project=introstatgen instances create "qgt-${1}" \
  --zone=us-central1-a --machine-type=custom-1-6656 --subnet=default \
  --network-tier=PREMIUM --maintenance-policy=MIGRATE \
  --service-account=10877517008-compute@developer.gserviceaccount.com \
  --scopes=https://www.googleapis.com/auth/devstorage.read_only,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring.write,https://www.googleapis.com/auth/servicecontrol,https://www.googleapis.com/auth/service.management.readonly,https://www.googleapis.com/auth/trace.append \
  --tags=http-server \
  --disk=name=qgt-${1},device-name=qgt-${1},mode=rw,boot=yes,auto-delete=yes \
  --reservation-affinity=any

$ for i in {0..35} ; do bash gcloud_init.sh "$i" ; done
```


This created 36 servers, each serving RStudio on port 8787. Each server had 50 GB of disk space, 6.5 GB of RAM, and 1 vCPU (the `custom-1-6656` machine type means 1 vCPU and 6656 MB of memory). When we used Google Cloud's smallest VM (which only had ~3 GB of RAM), S-MultiXcan failed with a memory error.
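Port 8787 also has to be reachable from outside. Unless a rule was added for it, the `http-server` tag is normally matched only by the Console's `default-allow-http` rule, which opens port 80. The sketch below, assuming your project has no rule for 8787 yet, opens that port for tagged VMs (the rule name `allow-rstudio` is hypothetical) and prints one RStudio URL per `qgt-*` server, which is handy for handing out addresses.

```shell
#!/usr/bin/env bash
PROJECT="introstatgen"

# Open RStudio's port 8787 on every VM tagged http-server.
# "allow-rstudio" is a hypothetical rule name.
open_rstudio_port() {
  gcloud compute firewall-rules create allow-rstudio \
    --project="$PROJECT" --network=default \
    --allow=tcp:8787 --target-tags=http-server
}

# Print "NAME http://EXTERNAL_IP:8787" for every qgt-* VM.
# --format=value(...) emits tab-separated columns, which `read` splits.
list_rstudio_urls() {
  gcloud compute instances list --project="$PROJECT" \
    --filter="name~^qgt-" \
    --format="value(name,networkInterfaces[0].accessConfigs[0].natIP)" |
  while read -r name ip; do
    printf '%s http://%s:8787\n' "$name" "$ip"
  done
}
```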

## R Studio Server

This tutorial was pretty helpful. One thing that took some figuring out was the command under its "Install R on your VM" heading: it downloaded a version of R that was incompatible with the Ubuntu version running on the VM. This page had useful information, as well as a directory of R builds for each Ubuntu version.
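The way around the version mismatch is to point `apt` at the CRAN repository built for the Ubuntu release actually running on the VM. A minimal sketch, assuming R 4.0 (CRAN's `-cran40` repositories, current in mid-2020) and the signing key CRAN documented at the time; `apt-key` is deprecated on newer Ubuntu releases, so check the CRAN Ubuntu instructions for the current key and repository suffix.

```shell
#!/usr/bin/env bash
# Build the CRAN apt source line for a given Ubuntu codename (e.g. "bionic").
cran_repo_line() {
  printf 'deb https://cloud.r-project.org/bin/linux/ubuntu %s-cran40/\n' "$1"
}

# Install R from the CRAN repository that matches this VM's Ubuntu release.
install_r() {
  local codename
  codename=$(lsb_release -cs)   # e.g. bionic for 18.04, focal for 20.04
  sudo apt-key adv --keyserver keyserver.ubuntu.com \
    --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
  cran_repo_line "$codename" | sudo tee /etc/apt/sources.list.d/cran.list
  sudo apt-get update
  sudo apt-get install -y r-base r-base-dev
}
```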

The download and installation process for RStudio Server is well documented on the RStudio website. RStudio Server started automatically after installation, and it also started automatically on the VM clones, so I didn't need a startup script.
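The reason no startup script is needed is that the RStudio Server package registers a systemd service that starts at boot. A quick check for a fresh clone, assuming the Ubuntu package's service name `rstudio-server`:

```shell
#!/usr/bin/env bash
# Succeed if RStudio Server is enabled at boot and currently running.
rstudio_ok() {
  systemctl is-enabled --quiet rstudio-server &&
    systemctl is-active --quiet rstudio-server
}

# Usage on a fresh clone:
#   rstudio_ok && echo "RStudio Server is up" || sudo rstudio-server restart
#   curl -sI http://localhost:8787 | head -n 1
```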

## Add a Student User

Students were asked to log in to the RStudio server as the user `student` with a password we gave them. This corresponds to a user account on the VM, which can be created with `sudo adduser student`. `adduser` prompts for a password, and these credentials can then be used to log in to the RStudio server.
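If you would rather script this than answer `adduser`'s prompts, a non-interactive sketch is below. The password argument is a placeholder; choose your own.

```shell
#!/usr/bin/env bash
# Create a login user without prompts, then set its password.
create_lab_user() {
  local user="$1" password="$2"
  # --gecos "" skips the name/room/phone prompts;
  # --disabled-password skips the interactive password prompt.
  sudo adduser --disabled-password --gecos "" "$user"
  # chpasswd reads "user:password" lines from stdin.
  printf '%s:%s\n' "$user" "$password" | sudo chpasswd
}

# Usage: create_lab_user student 'CHANGE-ME'
```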

NOTE: because this user does not have sudo privileges, everything in the user's home directory `/home/student/` needs to be readable and writable (preferably owned) by `student`. This can be done with a combination of `sudo chown student ...` and `sudo -iu student`; the latter logs in as the `student` user.
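A small sketch for finding and fixing ownership problems. It assumes the user has a private group of the same name, which is what `adduser` creates by default on Ubuntu.

```shell
#!/usr/bin/env bash
# List anything under a directory that is NOT owned by the given user.
not_owned_by() {
  find "$1" ! -user "$2"
}

# Hand the whole home directory back to the user (user:group of the same name).
fix_home_ownership() {
  sudo chown -R "$1:$1" "/home/$1"
}

# Usage:
#   not_owned_by /home/student student   # should print nothing
#   fix_home_ownership student
```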

## Add Python Environment

We used Anaconda. The installation was performed while logged in as the `student` user; doing it as my own user and then changing permissions was a nightmare. The installer script can be downloaded with something like `curl -O https://repo.anaconda.com/archive/Anaconda3-2019.03-Linux-x86_64.sh`. Check the Anaconda archive for the latest version.
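One way to do the whole install as `student` without an interactive session is to run the installer in batch mode. A sketch, assuming the installer version from the text: `-b` accepts the license silently and `-p` sets the install prefix, and because batch mode does not edit `.bashrc`, `conda init bash` is run afterwards.

```shell
#!/usr/bin/env bash
# Installer version taken from the text; swap in a newer one if needed.
INSTALLER="Anaconda3-2019.03-Linux-x86_64.sh"

# Download and install Anaconda into the target user's home, running as that user.
# Inside `bash -c`, $0 is the first argument after the script (the installer name).
install_anaconda_as() {
  local user="$1"
  sudo -u "$user" -H bash -c \
    'cd "$HOME" && curl -O "https://repo.anaconda.com/archive/$0" && bash "$0" -b -p "$HOME/anaconda3" && "$HOME/anaconda3/bin/conda" init bash' \
    "$INSTALLER"
}

# Usage: install_anaconda_as student
```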

A conda environment was defined using a YAML environment file.
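For readers who have not used environment files, here is the general shape. The environment name and package list below are hypothetical placeholders, not the course's actual environment.

```shell
#!/usr/bin/env bash
# Write a minimal conda environment file.
# Name and packages are HYPOTHETICAL placeholders.
write_env_file() {
  cat > "$1" <<'EOF'
name: introstatgen
channels:
  - conda-forge
dependencies:
  - python=3.7
  - numpy
  - pandas
  - scipy
EOF
}

# Usage, as the student user:
#   write_env_file environment.yml
#   conda env create -f environment.yml
#   conda activate introstatgen
```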

## Add fastenloc and torus

Fastenloc and torus compile fairly easily on Ubuntu, though you may need to install a few libraries with `apt`. Make sure to compile static versions: these binaries end up in `/home/student/bin/`, and the `student` user may not have the permissions needed to find dynamically linked libraries.
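How to get a static build depends on each project's Makefile (typically by adding `-static` to the link flags). To confirm the result before copying it over, a heuristic check like the one below can help. It relies on glibc's `ldd`, which reports "not a dynamic executable" or "statically linked" for binaries without shared-library dependencies; note it would also accept an executable script.

```shell
#!/usr/bin/env bash
# Heuristic: succeed only if the file is executable and ldd finds no
# shared-library dependencies.
is_static_binary() {
  [ -x "$1" ] || return 1
  ldd "$1" 2>&1 | grep -Eq 'not a dynamic executable|statically linked'
}

# Usage after building:
#   is_static_binary ./torus && is_static_binary ./fastenloc && echo "both static"
```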

After compiling the static versions, move them to `/home/student/bin/`, change their owner to `student`, and make them executable by `student`. It is also a good idea to add `/home/student/bin` to the `PATH` variable automatically, which can be done by editing `/home/student/.bashrc`.
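Both steps in a short sketch: `install` copies, sets the owner, and sets the mode in one go, and the `PATH` line is only appended if it is not already in the file, so rerunning it is harmless.

```shell
#!/usr/bin/env bash
# Copy binaries into /home/student/bin, owned by and executable for student.
install_student_bins() {
  sudo install -d -o student -g student /home/student/bin
  sudo install -o student -g student -m 755 "$@" /home/student/bin/
}

# Append a PATH entry for ~/bin to an rc file, but only once.
add_bin_to_path() {
  local rc="$1"
  local line='export PATH="$HOME/bin:$PATH"'
  grep -qxF "$line" "$rc" 2>/dev/null || printf '%s\n' "$line" >> "$rc"
}

# Usage:
#   install_student_bins ./torus ./fastenloc
#   # then, as the student user (sudo -iu student):
#   add_bin_to_path ~/.bashrc
```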

## Add Data

We used Box to gather and store data for this version of the class, and I didn't find a good way to add or update data from Box on the VM. I ended up downloading from Box to my machine and then `scp`-ing it to the VM, which meant that every time the data in Box changed, I had to re-upload or manually update the data on the VM. Not pretty. I hope there is some way to download from Box directly; a download script could then be added to the VM creation script, so that each new VM gets a fresh copy of the Box repository.
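Until there is a direct Box download, the manual upload can at least be scripted for all servers at once. A sketch using `gcloud compute scp` and `gcloud compute ssh`, assuming the VM names and zone from the spin-up script; files land in your own home directory first and are then copied into `/home/student/` and handed to `student`.

```shell
#!/usr/bin/env bash
ZONE="us-central1-a"

# Copy a local data directory to one VM and give it to the student user.
push_data() {
  local vm="$1" dir="$2" name
  name=$(basename "$dir")
  gcloud compute scp --recurse --zone="$ZONE" "$dir" "${vm}:~/"
  # The quoted ~ is expanded by the remote shell, not locally.
  gcloud compute ssh "$vm" --zone="$ZONE" --command \
    "sudo cp -r ~/$name /home/student/ && sudo chown -R student:student /home/student/$name"
}

# Usage: refresh the data on all 36 servers after the Box copy changes.
#   for i in $(seq 0 35); do push_data "qgt-$i" ./data; done
```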

The data should end up in `/home/student/` and should be owned, readable, and writable by `student`.

## Add Lab Documents

Log in as the `student` user and clone the lab documents repository into the student's home directory.
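Cloning as `student` (rather than as yourself) means the files are owned by `student` from the start. A one-function sketch; the repository URL in the usage line is a placeholder.

```shell
#!/usr/bin/env bash
# Clone a repository into /home/student as the student user.
clone_as_student() {
  sudo -u student -H git -C /home/student clone "$1"
}

# Usage (placeholder URL for the lab documents repository):
#   clone_as_student https://github.com/OWNER/REPO.git
```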

## Make the Server Image Publicly Available

In addition to the snapshot, there is a publicly available image named `intro-stat-gen-rstudio-server-2020-06-16`. It was made public with the following command (suggested by this page):

```shell
gcloud compute images add-iam-policy-binding intro-stat-gen-rstudio-server-2020-06-16 \
    --member='allAuthenticatedUsers' \
    --role='roles/compute.imageUser'
```
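To use the public image from your own project, point `--image-project` at the project that owns it. A sketch, assuming the image lives in the `introstatgen` project used throughout this post and reusing the machine type from above; the VM name is yours to choose.

```shell
#!/usr/bin/env bash
# Assumption: the image is owned by the introstatgen project.
IMAGE="intro-stat-gen-rstudio-server-2020-06-16"
IMAGE_PROJECT="introstatgen"

# Boot a VM in your current gcloud project from the public image.
create_from_public_image() {
  gcloud compute instances create "$1" \
    --zone=us-central1-a --machine-type=custom-1-6656 \
    --image="$IMAGE" --image-project="$IMAGE_PROJECT" \
    --tags=http-server
}

# Usage: create_from_public_image my-rstudio
```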


### Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The source code is licensed under MIT.


### Citation

For attribution, please cite this work as

Owen Melia (2020). IntroStatGen R Studio Servers using Google Cloud. ImLab Notes. /post/2020/06/16/introstatgen-r-studio-servers-using-google-cloud/

BibTeX citation

```bibtex
@misc{
  title = "IntroStatGen R Studio Servers using Google Cloud",
  author = "Owen Melia",
  year = "2020",
  journal = "ImLab Notes",
  note = "/post/2020/06/16/introstatgen-r-studio-servers-using-google-cloud/"
}
```