---
title: query alpha-missense
date: 2023-10-23
author: Haky Im
editor_options:
  chunk_output_type: console
---

```{r eval=FALSE}
suppressMessages(library(tidyverse))
suppressMessages(library(glue))

PRE = "/Users/haekyungim/Library/CloudStorage/Box-Box/LargeFiles/imlab-data/data-Github/web-data"
SLUG = "query-alpha-missense"  ## copy the slug from the header
bDATE = '2023-10-23'           ## copy the date from the blog's header here
DATA = glue("{PRE}/{bDATE}-{SLUG}")
if(!file.exists(DATA)) system(glue::glue("mkdir {DATA}"))
WORK = DATA
## system(glue("open {DATA}")) ## this will open the folder
```
### 1. Create an SQLite Database
First, ensure SQLite3 is installed. If it’s not, you can install it using Homebrew on macOS:
```bash
brew install sqlite3
```
Now, let’s create an SQLite database from the TSV file:
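```bash
## this needs the uncompressed file: sqlite3's .import cannot read .gz directly
gunzip -k AlphaMissense_aa_substitutions.tsv.gz

sqlite3 AlphaMissense.db <<EOF
CREATE TABLE mutations (uniprot_id TEXT, protein_variant TEXT, am_pathogenicity REAL, am_class TEXT);
.mode tabs
.import AlphaMissense_aa_substitutions.tsv mutations
EOF
```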
Alternatively, the table can be loaded from R with `RSQLite` (the load itself is left commented out below); the `VACUUM` at the end compacts the database file:

```{r eval=FALSE}
## install.packages("readr")
library(RSQLite)
library(readr)

# Connect to the SQLite database
con <- dbConnect(SQLite(), glue("{DATA}/test-alpha-missense-query/AlphaMissense.db"))

# Read the gzipped TSV file (the first lines are a license header)
# data <- read_tsv(glue("{DATA}/AlphaMissense_aa_substitutions.tsv.gz"), skip = 3)
# head(data)

# Write the data into the SQLite database
# tic = Sys.time()
# dbWriteTable(con, "mutations", data)
# toc = Sys.time()
# print(toc - tic)

# Reclaim unused space and defragment the database file
dbExecute(con, "VACUUM")

# Close the connection
dbDisconnect(con)
```
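One addition that is not part of the original recipe: the full amino-acid substitution table is large, and the app below filters on `uniprot_id` and `protein_variant`, so an index on those columns should speed up lookups substantially. A minimal sketch (the index name is arbitrary):

```{r eval=FALSE}
library(RSQLite)

# Index the columns used in the app's WHERE clauses so queries avoid a full table scan
con <- dbConnect(SQLite(), glue("{DATA}/test-alpha-missense-query/AlphaMissense.db"))
dbExecute(con, "CREATE INDEX IF NOT EXISTS idx_uniprot_variant ON mutations (uniprot_id, protein_variant)")
dbDisconnect(con)
```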
Note: the `.import` command above assumes that the TSV file doesn't contain header lines. If it does, create a version of the TSV file without them, or delete the imported header row from the SQLite table afterwards.
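For example, if the header row was imported as data, its `uniprot_id` field holds the literal column name, so something like this (an illustration, assuming the schema above) removes it:

```bash
sqlite3 AlphaMissense.db "DELETE FROM mutations WHERE uniprot_id = 'uniprot_id';"
```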
### 2. Create a Shiny App
Make sure you have R, Shiny, and the required packages installed:
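```{r eval=FALSE}
install.packages("shiny")
install.packages("RSQLite")
install.packages("DT")
```

Here's a simple Shiny app:

```{r eval=FALSE}
library(shiny)
library(RSQLite)
library(DT)

# Connect to the SQLite database
con <- dbConnect(SQLite(), "AlphaMissense.db")

ui <- fluidPage(
  titlePanel("AlphaMissense Query"),
  sidebarLayout(
    sidebarPanel(
      textInput("uniprot_id", "UniProt ID", ""),
      textInput("protein_variant", "Protein Variant", ""),
      actionButton("query_button", "Query")
    ),
    mainPanel(
      DTOutput("results_table")
    )
  )
)

server <- function(input, output) {
  observeEvent(input$query_button, {
    # Build the query from the non-empty inputs
    # (for a public-facing app, prefer parameterized queries over paste0
    # to avoid SQL injection)
    query <- "SELECT * FROM mutations WHERE 1=1 "
    if (input$uniprot_id != "") {
      query <- paste0(query, "AND uniprot_id = '", input$uniprot_id, "' ")
    }
    if (input$protein_variant != "") {
      query <- paste0(query, "AND protein_variant = '", input$protein_variant, "'")
    }
    results <- dbGetQuery(con, query)
    output$results_table <- renderDT({ results })
  })
}

shinyApp(ui = ui, server = server)
```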
This app provides two input fields: one for the `uniprot_id` and another for the `protein_variant`. If the user fills out the `uniprot_id` and clicks "Query", all matching mutations for that UniProt ID will be displayed. If they also provide a protein variant, the app will display only the matching mutation.
To run this app, save the code into a file, say `app.R`, then from the R console (or RStudio), navigate to the directory containing both the SQLite database (`AlphaMissense.db`) and the app (`app.R`), and run:
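```{r eval=FALSE}
library(shiny)
runApp(glue("{DATA}/app.R"))
```

A later variant of the app queries Google BigQuery and needs the `bigrquery` package; the variants below were run locally the same way:

```{r eval=FALSE}
install.packages("bigrquery")
library(bigrquery)
```

```{r eval=FALSE}
runApp(glue("~/Downloads/alphamissense-query/app.R"))
```

```{r eval=FALSE}
library(shiny)
runApp(glue("~/Downloads/alphamissense-query-hugo/app.R"))
```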
The error suggests that the deployed Shiny app is not able to authenticate with Google BigQuery because it does not have the required credentials. When you run the Shiny app locally on your machine, it might be using your personal Google credentials. However, when the app is deployed on `shinyapps.io`, it does not have access to these credentials.
To resolve this:

1. **Service Account**:
   - You'll need to create a service account in the Google Cloud Console that has access to the BigQuery dataset.
   - Download the JSON key for this service account.

2. **Include Service Account in Shiny App**:
   - Save the JSON key in the same directory as your Shiny app.
   - In your Shiny app, before making any BigQuery calls, authenticate using this service account.

```R
library(bigrquery)

# Authenticate using the service account key
bq_auth(path = "path_to_service_account.json")
```

3. **Deploy the Shiny App**:
   - Make sure you deploy both the Shiny app files (`app.R`, etc.) and the service account JSON key to `shinyapps.io`.

4. **Permissions**:
   - Ensure the service account has the necessary permissions on BigQuery to read the data.

5. **Billing**:
   - If your Google Cloud Project uses billing (as BigQuery is a paid service), make sure the service account has permissions to incur charges.

6. **Avoid Hardcoding Paths**:
   - Instead of hardcoding the path to the service account file, you might want to use a more dynamic approach, for instance:

```R
# Fall back to a bundled default key if the environment variable is not set
service_account_path <- Sys.getenv("SERVICE_ACCOUNT_PATH", "default_service_account.json")
bq_auth(path = service_account_path)
```
You can then set the SERVICE_ACCOUNT_PATH environment variable appropriately, either locally or on shinyapps.io.
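Locally you can set it with `Sys.setenv()`; on `shinyapps.io`, which (as far as I know) has no settings panel for environment variables, a common workaround is to deploy an `.Renviron` file alongside `app.R`. A sketch, with a hypothetical key filename:

```{r eval=FALSE}
# Locally, before launching the app:
Sys.setenv(SERVICE_ACCOUNT_PATH = "service_account.json")
```

And in `.Renviron`, deployed in the same directory as `app.R`:

```
SERVICE_ACCOUNT_PATH=service_account.json
```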
After making these adjustments, try redeploying your Shiny app.
Finally, build a HUGO symbol to UniProt ID mapping with biomaRt (presumably what the `alphamissense-query-hugo` variant of the app uses):

```{r eval=FALSE}
library(biomaRt)

# Choose the ENSEMBL mart and dataset for human genes
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

# Fetch the required attributes: HUGO name and UniProt ID
mapping <- getBM(attributes = c('hgnc_symbol', 'uniprotswissprot'), mart = ensembl)

# Filter out entries with empty UniProt IDs or HUGO names
mapping <- mapping[mapping$hgnc_symbol != '' & mapping$uniprotswissprot != '', ]

write.csv(mapping, "hugo_to_uniprot_mapping.csv", row.names = FALSE)
```
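To sketch how this mapping plugs into the SQLite queries above (the gene symbol `TP53` is just an example; the column names are the ones `getBM()` returns):

```{r eval=FALSE}
library(RSQLite)

# Translate a HUGO symbol to its UniProt ID via the mapping file
mapping <- read.csv("hugo_to_uniprot_mapping.csv")
uid <- mapping$uniprotswissprot[mapping$hgnc_symbol == "TP53"][1]

# Query the AlphaMissense table by UniProt ID
# (parameterized, which also avoids the quoting issues of paste0)
con <- dbConnect(SQLite(), "AlphaMissense.db")
results <- dbGetQuery(con, "SELECT * FROM mutations WHERE uniprot_id = ?", params = list(uid))
dbDisconnect(con)
head(results)
```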