How to use AlphaGenome to make predictions

how_to

Author

Temi

Published

July 3, 2025

Sources

There is a more comprehensive set of tutorials [here](https://www.alphagenomedocs.com/colabs/quick_start.html) that covers many usage options.

Introduction

This notebook describes how to make predictions with AlphaGenome. First, you need to obtain an API key; the process is detailed here.

Software environment

If you are not on a Colab notebook, you can install the required packages using the following commands:

conda create -p ./alphagenome-env python=3.11
conda activate ./alphagenome-env
pip install alphagenome

Other installation options are described here.

Making predictions

# import the necessary libraries

from alphagenome.data import genome
from alphagenome.models import dna_client
from alphagenome.visualization import plot_components

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# paste your API key here; avoid committing a real key to a public notebook
API_KEY = '<< insert your API key here >>'
# create the model client
alphagenome_model = dna_client.create(API_KEY)
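Instead of pasting the key into the notebook, you could read it from an environment variable. The following is a minimal sketch; the variable name ALPHAGENOME_API_KEY and the helper load_api_key are hypothetical choices for illustration, not something the AlphaGenome package defines.

```python
import os


def load_api_key(env_var: str = "ALPHAGENOME_API_KEY") -> str:
    """Return the API key stored in the environment variable `env_var`.

    The default variable name is a hypothetical choice for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your API key first."
        )
    return key


# Usage (not run here because it needs a real key and network access):
# API_KEY = load_api_key()
# alphagenome_model = dna_client.create(API_KEY)
```

This keeps the key out of the notebook file itself, so sharing or publishing the notebook does not expose it.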

Make predictions on a DNA sequence or list of DNA sequences

# AlphaGenome typically takes a DNA sequence of length ~1 Mb
# you can get the exact number by running
print(f"AlphaGenome typically takes in a sequence length of {dna_client.SEQUENCE_LENGTH_1MB}")

# however, the model can also predict on shorter sequences, specifically of lengths [2048, 16384, 131072, 524288]

AlphaGenome typically takes in a sequence length of 1048576

# you can use this code to create a random sequence of DNA
def create_random_dna_sequence(length):
    out = np.random.choice(['A', 'T', 'C', 'G'], size=length)
    return ''.join(out)
# e.g. 
create_random_dna_sequence(10)

'GCCTAGTCAC'
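If you want the same random sequence on every run (for example, to compare predictions across sessions), a seeded generator helps. This is a small sketch using NumPy's Generator API; the function name create_seeded_dna_sequence is made up for this example.

```python
import numpy as np


def create_seeded_dna_sequence(length: int, seed: int = 0) -> str:
    """Return a random DNA string of `length` bases, reproducible for a given seed."""
    rng = np.random.default_rng(seed)  # local generator; global NumPy state is untouched
    bases = rng.choice(["A", "C", "G", "T"], size=length)
    return "".join(bases)


seq_a = create_seeded_dna_sequence(2048, seed=42)
seq_b = create_seeded_dna_sequence(2048, seed=42)
print(seq_a == seq_b, len(seq_a))
```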

# first create the sequence you want to predict on
# ideally, you would obtain this from a reference genome

input_sequence = create_random_dna_sequence(dna_client.SEQUENCE_LENGTH_1MB)
print(f"Input sequence length: {len(input_sequence)}")

Input sequence length: 1048576

# you can use this to check the supported sequence lengths
dna_client.SUPPORTED_SEQUENCE_LENGTHS

{'SEQUENCE_LENGTH_2KB': 2048,
 'SEQUENCE_LENGTH_16KB': 16384,
 'SEQUENCE_LENGTH_100KB': 131072,
 'SEQUENCE_LENGTH_500KB': 524288,
 'SEQUENCE_LENGTH_1MB': 1048576}
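If your sequence does not match one of these lengths, one option is to pad it up to the next supported length. The sketch below is plain Python and does not call AlphaGenome; the helper name pad_to_supported_length is hypothetical, and padding with 'N' is an assumption about how you might prepare the input (how the model treats 'N' bases is not covered in this notebook).

```python
SUPPORTED_LENGTHS = (2048, 16384, 131072, 524288, 1048576)  # values listed above


def pad_to_supported_length(sequence: str, pad_char: str = "N") -> str:
    """Center-pad `sequence` with `pad_char` up to the smallest supported length."""
    n = len(sequence)
    targets = [length for length in SUPPORTED_LENGTHS if length >= n]
    if not targets:
        raise ValueError(
            f"Sequence of length {n} exceeds the maximum of {SUPPORTED_LENGTHS[-1]}."
        )
    target = targets[0]
    left = (target - n) // 2      # padding added before the sequence
    right = target - n - left     # remaining padding goes after it
    return pad_char * left + sequence + pad_char * right


padded = pad_to_supported_length("ACGT" * 500)  # 2000 bases in
print(len(padded))  # 2048
```

Centering the original sequence keeps equal context on both sides, which seems a reasonable default when the region of interest is in the middle.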

# now you can make predictions on the sequence
# the important arguments are
# 1. the sequence you want to predict on
# 2. the outputs you want to obtain
output = alphagenome_model.predict_sequence(
    sequence=input_sequence,
    requested_outputs=[
        dna_client.OutputType.RNA_SEQ,
        dna_client.OutputType.CHIP_TF,
        dna_client.OutputType.ATAC,
        dna_client.OutputType.CAGE,
        dna_client.OutputType.CHIP_HISTONE,
        dna_client.OutputType.CONTACT_MAPS,
        dna_client.OutputType.DNASE,
        dna_client.OutputType.PROCAP,
    ],
    ontology_terms=[],  # this can be an empty list
)

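You can find the list of available outputs and ontologies here.

To make predictions for a list of DNA sequences, you can use the following code:

# list_of_sequences is a list of DNA strings, each of a supported length
outputs = alphagenome_model.predict_sequences(sequences=list_of_sequences, requested_outputs=[dna_client.OutputType.RNA_SEQ], ontology_terms=[])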

What does the output look like?

It is an alphagenome.models.dna_output.Output object.

It contains the following attributes:

# you can access the following attributes of the output object
# some are methods that act on the entire object, e.g. to filter or calculate statistics
# others hold the actual data you want to visualize or analyze, i.e. the profiles you requested predictions for
[f for f in dir(output) if not f.startswith('__')]

['atac',
 'cage',
 'chip_histone',
 'chip_tf',
 'contact_maps',
 'dnase',
 'filter_ontology_terms',
 'filter_output_type',
 'filter_to_strand',
 'get',
 'map_track_data',
 'procap',
 'resize',
 'rna_seq',
 'splice_junctions',
 'splice_site_usage',
 'splice_sites']

# we can explore the atac track or profiles
# this is now a TrackData object
atac_profiles = output.atac

type(atac_profiles)

alphagenome.data.track_data.TrackData

# and it contains the following attributes
[f for f in dir(atac_profiles) if not f.startswith('_')]

['bin_index',
 'change_resolution',
 'copy',
 'downsample',
 'filter_to_negative_strand',
 'filter_to_nonnegative_strand',
 'filter_to_nonpositive_strand',
 'filter_to_positive_strand',
 'filter_to_stranded',
 'filter_to_unstranded',
 'filter_tracks',
 'groupby',
 'interval',
 'metadata',
 'names',
 'num_tracks',
 'ontology_terms',
 'pad',
 'positional_axes',
 'resize',
 'resolution',
 'reverse_complement',
 'select_tracks_by_index',
 'select_tracks_by_name',
 'slice_by_interval',
 'slice_by_positions',
 'strands',
 'to_protos',
 'uns',
 'upsample',
 'values',
 'width']

# you can extract the predictions directly
atac_profiles.values, atac_profiles.values.shape

(array([[0.00230408, 0.00120544, 0.00154877, ..., 0.00744629, 0.00442505,
         0.00524902],
        [0.0072937 , 0.00402832, 0.00491333, ..., 0.0234375 , 0.01330566,
         0.01556396],
        [0.00127411, 0.00074768, 0.0008812 , ..., 0.00396729, 0.00260925,
         0.00299072],
        ...,
        [0.00105286, 0.00048447, 0.00064468, ..., 0.00338745, 0.00238037,
         0.00254822],
        [0.00077057, 0.00037575, 0.00053406, ..., 0.00263977, 0.00195312,
         0.00204468],
        [0.00112152, 0.00061798, 0.0008316 , ..., 0.00338745, 0.00260925,
         0.00271606]], shape=(1048576, 167), dtype=float32),
 (1048576, 167))

There are 167 ATAC-seq profiles, each with one value for each of the 1048576 input nucleotides.
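Because the values are a plain NumPy array with positions along rows and tracks along columns, the usual array operations apply. The sketch below uses a small synthetic array as a stand-in for atac_profiles.values (the numbers are random, not model predictions) to show per-track summaries and coarse binning.

```python
import numpy as np

# Synthetic stand-in for atac_profiles.values: rows are positions, columns are tracks.
rng = np.random.default_rng(0)
values = rng.random((2048, 5))

track_means = values.mean(axis=0)       # one mean per track
peak_positions = values.argmax(axis=0)  # position of the strongest signal per track

# Sum the signal in non-overlapping 128-bp bins (2048 / 128 = 16 bins per track).
binned = values.reshape(-1, 128, values.shape[1]).sum(axis=1)

print(track_means.shape, peak_positions.shape, binned.shape)
```

The same calls should work on the real array, though at 1048576 x 167 the binning step is worth doing before any plotting.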

Plotting

The AlphaGenome team provided tools to make visualization easy.

To visualize predictions, we can first predict for a small interval of the genome:

small_interval = genome.Interval('chr22', 36_150_498, 36_252_898).resize(
    dna_client.SEQUENCE_LENGTH_2KB
)
small_interval

Interval(chromosome='chr22', start=36200674, end=36202722, strand='.', name='')
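The resized coordinates are consistent with keeping the center of the original interval and taking a 2048-bp window around it. The function below is a plain-Python sketch of that arithmetic; it reproduces the coordinates shown above for this case but is not the library's implementation, and its rounding for odd widths is an assumption.

```python
def resize_around_center(start: int, end: int, width: int) -> tuple[int, int]:
    """Return (start, end) of a window of `width` bases centered on the original interval."""
    center = (start + end) // 2
    new_start = center - width // 2
    return new_start, new_start + width


print(resize_around_center(36_150_498, 36_252_898, 2048))  # (36200674, 36202722)
```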

# we will request just the RNA_SEQ track
# note that we use a different function here: predict_interval

predictions_small_interval = alphagenome_model.predict_interval(
    interval=small_interval,
    requested_outputs=[dna_client.OutputType.RNA_SEQ],
    ontology_terms=[],
)

rna_seq_predictions = predictions_small_interval.rna_seq
type(rna_seq_predictions)

alphagenome.data.track_data.TrackData

# how many tracks are there
rna_seq_predictions.values.shape

(2048, 667)

The AlphaGenome team supplies helper code to visualize predictions, but plotting all 667 tracks at once is impractical. Instead, we can subset to a manageable number of tracks. Let’s plot the first 6 tracks.

# the argument is a boolean mask with one entry per track: True for tracks to keep, False for the rest
# here the first 6 tracks are True and the remaining tracks are False

filtered_rna_seq_predictions = rna_seq_predictions.filter_tracks(
    [True] * 6 + [False] * (rna_seq_predictions.values.shape[1] - 6)
)
type(filtered_rna_seq_predictions)

alphagenome.data.track_data.TrackData
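The boolean mask can also be built with NumPy, which scales better once the selection rule is more than "the first few". The sketch below uses a synthetic array in place of the real predictions; whether filter_tracks accepts a NumPy array directly is not confirmed here, so mask.tolist() is shown as a way to get a plain list.

```python
import numpy as np

num_positions, num_tracks = 2048, 667  # shape reported above for the RNA-seq predictions
values = np.zeros((num_positions, num_tracks), dtype=np.float32)  # synthetic stand-in

keep_first = 6
mask = np.arange(num_tracks) < keep_first  # True for the first 6 tracks, False otherwise

subset = values[:, mask]       # boolean indexing along the track axis
mask_as_list = mask.tolist()   # plain list of bools, one per track

print(int(mask.sum()), subset.shape)
```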

# then we can visualize these tracks
plot_components.plot(
    [
        plot_components.Tracks(
            tdata=filtered_rna_seq_predictions
        )
    ],
    interval = filtered_rna_seq_predictions.interval
)
plt.show()