query alpha-missense

Author

Haky Im

Published

October 23, 2023

Code
suppressMessages(library(tidyverse))
suppressMessages(library(glue))

PRE = "/Users/haekyungim/Library/CloudStorage/Box-Box/LargeFiles/imlab-data/data-Github/web-data"
SLUG="query-alpha-missense" ## copy the slug from the header
bDATE='2023-10-23' ## copy the date from the blog's header here
DATA = glue("{PRE}/{bDATE}-{SLUG}")
if(!file.exists(DATA)) system(glue::glue("mkdir {DATA}"))
WORK=DATA
##system(glue("open {DATA}")) ## this will open the folder 

1. Create an SQLite Database

First, ensure SQLite3 is installed. If it’s not, you can install it using Homebrew on macOS:

brew install sqlite3

Now, let’s create an SQLite database from the TSV file:

## this needs uncompressed file
sqlite3 AlphaMissense.db <<EOF
CREATE TABLE mutations (uniprot_id TEXT, protein_variant TEXT, am_pathogenicity REAL, am_class TEXT);
.mode tabs
.import AlphaMissense_aa_substitutions.tsv.gz mutations
EOF
Code
##install.packages("readr")
library(RSQLite)
library(readr)

# Connect to the SQLite database
con <- dbConnect(SQLite(), glue("{DATA}/test-alpha-missense-query/AlphaMissense.db"))

# Read the gzipped TSV file
# data <- read_tsv(glue("{DATA}/AlphaMissense_aa_substitutions.tsv.gz"),skip=3)
# head(data)

# Write the data into the SQLite database
# tic=Sys.time()
# dbWriteTable(con, "mutations", data)
# toc=Sys.time()
# print(toc-tic)


dbExecute(con, "VACUUM")


# Close the connection
dbDisconnect(con)

Note: The provided command assumes that the TSV file doesn’t contain header lines. If it does, you should create a version of the TSV file without them, or delete the imported header row from the SQLite table afterwards.

2. Create a Shiny App

Make sure you have R, Shiny, and the required packages installed:

Code
install.packages("shiny")
install.packages("RSQLite")
install.packages("DT")

Here’s a simple Shiny app:

Code
library(shiny)
library(RSQLite)
library(DT)

# Connect to the SQLite database
con <- dbConnect(SQLite(), "AlphaMissense.db")

ui <- fluidPage(
    titlePanel("AlphaMissense Query"),
    
    sidebarLayout(
        sidebarPanel(
            textInput("uniprot_id", "UniProt ID", ""),
            textInput("protein_variant", "Protein Variant", ""),
            actionButton("query_button", "Query")
        ),
        
        mainPanel(
            DTOutput("results_table")
        )
    )
)

server <- function(input, output) {
    
    observeEvent(input$query_button, {
        query <- "SELECT * FROM mutations WHERE 1=1 "
        
        if (input$uniprot_id != "") {
            query <- paste0(query, "AND uniprot_id = '", input$uniprot_id, "' ")
        }
        
        if (input$protein_variant != "") {
            query <- paste0(query, "AND protein_variant = '", input$protein_variant, "'")
        }
        
        results <- dbGetQuery(con, query)
        output$results_table <- renderDT({results})
    })
}

shinyApp(ui = ui, server = server)

This app provides two input fields: one for the uniprot_id and another for the protein_variant. If the user fills out the uniprot_id and clicks “Query”, all matching mutations for that UniProt ID will be displayed. If they also provide a protein variant, the app will display only the matching mutation.

To run this app, save the code into a file, say app.R, then from the R console (or RStudio), navigate to the directory containing both the SQLite database (AlphaMissense.db) and the app (app.R), and run:

Code
library(shiny)
runApp(glue("{DATA}/app.R"))
Code
install.packages("bigrquery")
library(bigrquery)
Code
runApp(glue("~/Downloads/alphamissense-query/app.R"))
Code
library(shiny)
runApp(glue("~/Downloads/alphamissense-query-hugo/app.R"))

The error suggests that the deployed Shiny app is not able to authenticate with Google BigQuery because it does not have the required credentials. When you run the Shiny app locally on your machine, it might be using your personal Google credentials. However, when the app is deployed on shinyapps.io, it does not have access to these credentials.

To resolve this:

  1. Service Account:
    • You’ll need to create a service account in the Google Cloud Console that has access to the BigQuery dataset.
    • Download the JSON key for this service account.
  2. Include Service Account in Shiny App:
    • Save the JSON key in the same directory as your Shiny app.
    • In your Shiny app, before making any BigQuery calls, authenticate using this service account.
    library(bigrquery)
    
    # Authenticate using Service Account
    bq_auth(path = "path_to_service_account.json")
  3. Deploy the Shiny App:
    • Make sure you deploy both the Shiny app files (app.R, etc.) and the service account JSON key to shinyapps.io.
  4. Permissions:
    • Ensure the service account has the necessary permissions on BigQuery to read the data.
  5. Billing:
    • If your Google Cloud Project uses billing (as BigQuery is a paid service), make sure the service account has permissions to incur charges.
  6. Avoid Hardcoding Paths:
    • Instead of hardcoding the path to the service account file, you might want to use a more dynamic approach, for instance:
    # Authenticate using Service Account
    service_account_path <- Sys.getenv("SERVICE_ACCOUNT_PATH", "default_service_account.json")
    bq_auth(path = service_account_path)
    You can then set the SERVICE_ACCOUNT_PATH environment variable appropriately, either locally or on shinyapps.io.

After making these adjustments, try redeploying your Shiny app.

Code
library(biomaRt)
# Choose the ENSEMBL mart and dataset for human genes
ensembl <- useMart("ensembl", dataset="hsapiens_gene_ensembl")

# Fetch the required attributes: HUGO name and UniProt ID
mapping <- getBM(attributes=c('hgnc_symbol', 'uniprotswissprot'), mart=ensembl)

# Filter out entries with empty UniProt IDs or HUGO names
mapping <- mapping[mapping$hgnc_symbol != '' & mapping$uniprotswissprot != '', ]

write.csv(mapping, "hugo_to_uniprot_mapping.csv", row.names=FALSE)