Pertpy-GPU

Accelerated Perturbation Distance Analysis

Authors: Lukas Heumos, Severin Dicks
Copyright scverse

Here, we explore GPU-accelerated perturbation distance computations using rapids-singlecell’s rsc.ptg module, which mirrors the API of pertpy’s Distance class.

By running these analyses on GPUs, we can scale to large perturbation screens (many groups, many cells) where pairwise distance computation would otherwise be a bottleneck. We use the E-distance (energy distance) to quantify how strongly each perturbation shifts the cell-state distribution relative to controls.
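To make the E-distance concrete, here is a minimal CPU sketch using only the Python standard library. It assumes the textbook form E(X, Y) = 2·d(X, Y) − d(X, X) − d(Y, Y), where d is the mean pairwise Euclidean distance between two sets of cells. The helper names (`mean_distance`, `energy_distance`, `sample_cells`) and the synthetic Gaussian data are illustrative assumptions; rsc.ptg and pertpy may use a different underlying distance (for example squared Euclidean) or a different normalization.

```python
import math
import random
from itertools import product


def mean_distance(xs, ys):
    """Mean Euclidean distance over all (x, y) pairs of cells."""
    return sum(math.dist(x, y) for x, y in product(xs, ys)) / (len(xs) * len(ys))


def energy_distance(xs, ys):
    """Textbook energy distance: 2*d(X, Y) - d(X, X) - d(Y, Y)."""
    return 2 * mean_distance(xs, ys) - mean_distance(xs, xs) - mean_distance(ys, ys)


def sample_cells(rng, n_cells, n_dims, shift):
    """Synthetic 'cells': Gaussian points centred at `shift` in every dimension."""
    return [[rng.gauss(shift, 1.0) for _ in range(n_dims)] for _ in range(n_cells)]


rng = random.Random(0)
control = sample_cells(rng, 100, 5, shift=0.0)
unshifted = sample_cells(rng, 100, 5, shift=0.0)  # same distribution as control
shifted = sample_cells(rng, 100, 5, shift=2.0)    # clearly displaced distribution

e_unshifted = energy_distance(control, unshifted)
e_shifted = energy_distance(control, shifted)
print(f"control vs unshifted: {e_unshifted:.3f}")
print(f"control vs shifted:   {e_shifted:.3f}")
```

In this sketch a group drawn from the same distribution as the control scores close to zero, while the displaced group scores much higher, which is the behaviour the analyses in this notebook rely on.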

import rapids_singlecell as rsc
import anndata as ad
import pertpy as pt
import seaborn as sns
import matplotlib.pyplot as plt

Load Example Data

We use pertpy's distance_example dataset, a small, preprocessed subset of the Perturb-seq data from Dixit et al., 2016. It contains a perturbation annotation in .obs and a PCA embedding in .obsm["X_pca"].

adata = pt.dt.distance_example()
adata
AnnData object with n_obs × n_vars = 3200 × 2000
    obs: 'perturbation', 'grna_lenient', 'target', 'moi', 'cell_line', 'celltype', 'perturbation_type', 'cancer', 'disease', 'guide_id', 'ncounts', 'ngenes', 'percent_mito', 'percent_ribo', 'nperts', 'n_counts'
    var: 'gene_id', 'mt', 'ribo', 'ncounts', 'ncells', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
    uns: 'hvg', 'log1p', 'pca'
    obsm: 'X_pca'
    varm: 'PCs'

Prepare for distance metrics

Distance metrics are computed in PCA space to avoid the curse of dimensionality. The cell below moves the AnnData object to the GPU and recomputes a 50-component PCA there.

rsc.get.anndata_to_GPU(adata)
rsc.pp.pca(adata, n_comps=50)
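The motivation for working in a low-dimensional space can be illustrated with a small standard-library experiment. It draws synthetic Gaussian points (not the dataset above) and compares how much pairwise distances vary relative to their mean in 2 dimensions versus 2000 dimensions, the number of genes in the example data. The helper name `relative_spread` is an assumption made for this sketch.

```python
import math
import random
from itertools import combinations


def relative_spread(n_points, n_dims, seed=0):
    """(max - min) / mean of all pairwise Euclidean distances between random points."""
    rng = random.Random(seed)
    points = [[rng.gauss(0.0, 1.0) for _ in range(n_dims)] for _ in range(n_points)]
    dists = [math.dist(a, b) for a, b in combinations(points, 2)]
    return (max(dists) - min(dists)) / (sum(dists) / len(dists))


spread_low = relative_spread(n_points=40, n_dims=2)
spread_high = relative_spread(n_points=40, n_dims=2000)
print(f"relative spread in    2 dimensions: {spread_low:.2f}")
print(f"relative spread in 2000 dimensions: {spread_high:.2f}")
```

With pure noise, the distances in 2000 dimensions become nearly indistinguishable from one another, so a distance-based statistic has little contrast to work with. Reducing to a few dozen principal components is one common way to counter this.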

Pairwise E-distances

The Distance class computes pairwise distances between all groups defined by a column in .obs. By default it reads the embedding from .obsm["X_pca"].

%%time
distance = rsc.ptg.Distance(metric="edistance", obsm_key="X_pca")
df = distance.pairwise(adata, groupby="perturbation")
df.head()
CPU times: user 34.2 ms, sys: 88.9 ms, total: 123 ms
Wall time: 140 ms
perturbation control p-INTERGENIC216151 p-INTERGENIC393453 p-INTERGENIC393453_p-sgELF1-2 p-INTERGENIC1144056 p-INTERGENIC1216445 p-sgCREB1-2 p-sgCREB1-4 p-sgCREB1-5 p-sgE2F4-6 ... p-sgETS1-5 p-sgGABPA-1 p-sgGABPA-9 p-sgIRF1-2 p-sgIRF1-3 p-sgNR2C2-2 p-sgNR2C2-3 p-sgNR2C2-5 p-sgYY1-3 p-sgYY1-10
perturbation
control 0.000000 0.186348 0.221331 0.271880 0.262410 0.259739 0.252042 0.388346 0.255588 0.305329 ... 11.117786 11.345467 10.817892 10.989595 10.938345 10.762506 10.968601 11.135164 10.956791 11.002567
p-INTERGENIC216151 0.186348 0.000000 -0.004433 0.029946 -0.004575 -0.007889 0.033900 0.007107 -0.023688 -0.010982 ... 10.886147 11.144894 10.528135 10.766272 10.794701 10.597757 10.688713 10.957037 10.697678 10.740091
p-INTERGENIC393453 0.221331 -0.004433 0.000000 -0.011823 0.014314 0.090172 0.001603 0.047546 0.031596 -0.002492 ... 10.911249 11.148912 10.549551 10.722809 10.763332 10.516935 10.690411 10.937792 10.667976 10.665668
p-INTERGENIC393453_p-sgELF1-2 0.271880 0.029946 -0.011823 0.000000 0.037041 0.076026 0.072382 0.057895 0.060162 -0.025307 ... 10.918885 11.118722 10.542020 10.687470 10.792460 10.574527 10.676620 10.971744 10.721988 10.744924
p-INTERGENIC1144056 0.262410 -0.004575 0.014314 0.037041 0.000000 0.050689 0.090553 0.043807 0.030745 -0.004827 ... 10.845412 11.141158 10.521075 10.756488 10.758624 10.565110 10.643741 10.930161 10.650750 10.681153

5 rows × 32 columns
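Conceptually, a pairwise computation groups the rows of the embedding by label and evaluates the metric for every pair of groups. The following is a minimal CPU sketch of that idea, not the rsc.ptg implementation: `pairwise_by_group` is a hypothetical helper, the toy 2-D embedding and the labels `guide_A` and `guide_B` are made up, and the metric is the textbook Euclidean energy distance.

```python
import math
from itertools import product


def mean_distance(xs, ys):
    """Mean Euclidean distance over all (x, y) pairs of cells."""
    return sum(math.dist(x, y) for x, y in product(xs, ys)) / (len(xs) * len(ys))


def energy_distance(xs, ys):
    """Textbook energy distance: 2*d(X, Y) - d(X, X) - d(Y, Y)."""
    return 2 * mean_distance(xs, ys) - mean_distance(xs, xs) - mean_distance(ys, ys)


def pairwise_by_group(embedding, labels, metric):
    """Return {group_a: {group_b: metric(cells_a, cells_b)}} for all group pairs."""
    groups = {}
    for row, label in zip(embedding, labels):
        groups.setdefault(label, []).append(row)
    return {a: {b: metric(groups[a], groups[b]) for b in groups} for a in groups}


# Toy 2-D "embedding": two groups near the origin, one far away.
embedding = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
labels = ["control", "control", "guide_A", "guide_A", "guide_B", "guide_B"]

matrix = pairwise_by_group(embedding, labels, energy_distance)
for group, row in matrix.items():
    print(group, {other: round(value, 3) for other, value in row.items()})
```

The result is symmetric with zeros on the diagonal, which matches the layout of the table above. The nested loop evaluates every pair twice for simplicity; a real implementation would likely exploit the symmetry.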

sns.heatmap(df, robust=True, cmap="viridis", xticklabels=True, yticklabels=True)
plt.title("Pairwise E-distances between perturbations")
plt.show()
[Figure: heatmap of pairwise E-distances between perturbations]

Contrast Against a Baseline

A common question in perturbation screens is how strongly each perturbation shifts cells away from the unperturbed baseline. We answer it in two steps: Distance.create_contrasts builds a tidy contrasts table with one row per (target, reference) pair, and Distance.contrast_distances fills in the distance for each contrast. This is more flexible than the raw pairwise matrix: you can pass multiple references, restrict to a subset of targets, or stratify by another .obs column (e.g. cell type) via split_by.

%%time
contrasts = rsc.ptg.Distance.create_contrasts(
    adata, groupby="perturbation", selected_group="control"
)
result = distance.contrast_distances(adata, contrasts=contrasts)
result.sort_values("edistance", ascending=False).head(10)
CPU times: user 5.82 ms, sys: 0 ns, total: 5.82 ms
Wall time: 5.43 ms
    perturbation                   reference  edistance
6   p-sgCREB1-4                    control    0.388346
8   p-sgE2F4-6                     control    0.305329
14  p-sgELF1-2                     control    0.284894
4   p-INTERGENIC393453_p-sgELF1-2  control    0.271880
0   p-INTERGENIC1144056            control    0.262410
1   p-INTERGENIC1216445            control    0.259739
13  p-sgELF1-1                     control    0.256044
7   p-sgCREB1-5                    control    0.255588
5   p-sgCREB1-2                    control    0.252042
12  p-sgEGR1-4                     control    0.231539
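The idea of a tidy contrasts table, including stratification by a second annotation, can be sketched in plain Python. This is an illustration under stated assumptions and not the rapids-singlecell implementation: `build_contrasts`, its `split_values` argument, and the row keys `target`, `reference` and `stratum` are hypothetical names chosen for this example, and the labels are made up.

```python
def build_contrasts(labels, reference, split_values=None):
    """One row per (target, reference) pair, optionally repeated per stratum.

    Hypothetical helper for illustration only.
    """
    if split_values is None:
        split_values = [None] * len(labels)

    # Collect the groups observed in each stratum, preserving first-seen order.
    groups_by_stratum = {}
    for label, stratum in zip(labels, split_values):
        groups_by_stratum.setdefault(stratum, {})[label] = None

    rows = []
    for stratum, groups in groups_by_stratum.items():
        if reference not in groups:
            continue  # no baseline cells in this stratum, nothing to contrast against
        for target in groups:
            if target == reference:
                continue
            row = {"target": target, "reference": reference}
            if stratum is not None:
                row["stratum"] = stratum
            rows.append(row)
    return rows


labels = ["control", "guide_A", "guide_B", "control", "guide_A"]
cell_types = ["T", "T", "T", "B", "B"]

flat = build_contrasts(labels, reference="control")
stratified = build_contrasts(labels, reference="control", split_values=cell_types)

for row in flat:
    print(row)
for row in stratified:
    print(row)
```

Separating "which comparisons to make" from "how to compute each distance" is what allows multiple references or a restricted target list: the table is filtered or extended first, and only the listed pairs are evaluated afterwards.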

Bootstrap Variance Estimation

Setting bootstrap=True returns both the distance estimates and a per-pair variance, computed by resampling cells. Unlike pertpy’s CPU implementation, the GPU version recomputes distances in each bootstrap iteration instead of precomputing an n×n cell-to-cell distance matrix, so memory scales linearly with the number of cells.

%%time
df_mean, df_var = distance.pairwise(
    adata, groupby="perturbation", bootstrap=True, n_bootstrap=50, random_state=0
)
df_var.head()
CPU times: user 198 ms, sys: 36 ms, total: 234 ms
Wall time: 233 ms
perturbation control p-INTERGENIC216151 p-INTERGENIC393453 p-INTERGENIC393453_p-sgELF1-2 p-INTERGENIC1144056 p-INTERGENIC1216445 p-sgCREB1-2 p-sgCREB1-4 p-sgCREB1-5 p-sgE2F4-6 ... p-sgETS1-5 p-sgGABPA-1 p-sgGABPA-9 p-sgIRF1-2 p-sgIRF1-3 p-sgNR2C2-2 p-sgNR2C2-3 p-sgNR2C2-5 p-sgYY1-3 p-sgYY1-10
perturbation
control 0.000000 0.615649 0.676248 0.707371 0.690252 0.755486 0.730110 0.638013 0.613150 0.629978 ... 0.492958 0.507647 0.545649 0.510374 0.527144 0.538761 0.512155 0.527787 0.539033 0.561692
p-INTERGENIC216151 0.615649 0.000000 0.620489 0.845119 0.677616 0.606445 0.695760 0.449387 0.503359 0.487891 ... 0.372297 0.412535 0.424212 0.457976 0.380719 0.561012 0.367475 0.583241 0.481966 0.469871
p-INTERGENIC393453 0.676248 0.620489 0.000000 0.729666 0.739730 0.572041 0.667218 0.475155 0.465109 0.439003 ... 0.378681 0.478793 0.450716 0.400178 0.476515 0.455279 0.392606 0.477264 0.418814 0.487899
p-INTERGENIC393453_p-sgELF1-2 0.707371 0.845119 0.729666 0.000000 0.940516 0.728477 0.796336 0.615655 0.631689 0.575075 ... 0.507098 0.495822 0.604118 0.634317 0.585327 0.621911 0.548687 0.698822 0.528945 0.545236
p-INTERGENIC1144056 0.690252 0.677616 0.739730 0.940516 0.000000 0.739009 0.698736 0.606369 0.644801 0.539035 ... 0.463750 0.559078 0.561818 0.606893 0.667035 0.612539 0.575785 0.593551 0.513972 0.595869

5 rows × 32 columns
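The bootstrap procedure for a single pair of groups can be sketched on the CPU as follows. The sketch assumes cells are resampled with replacement within each group and that the reported variance is the sample variance of the bootstrap estimates; the exact resampling scheme and variance definition used by rsc.ptg are not confirmed here. `bootstrap_energy_distance` is a hypothetical helper and the data are synthetic.

```python
import math
import random
from itertools import product
from statistics import mean, variance


def mean_distance(xs, ys):
    """Mean Euclidean distance over all (x, y) pairs, accumulated without storing a matrix."""
    return sum(math.dist(x, y) for x, y in product(xs, ys)) / (len(xs) * len(ys))


def energy_distance(xs, ys):
    """Textbook energy distance: 2*d(X, Y) - d(X, X) - d(Y, Y)."""
    return 2 * mean_distance(xs, ys) - mean_distance(xs, xs) - mean_distance(ys, ys)


def bootstrap_energy_distance(xs, ys, n_bootstrap=50, seed=0):
    """Resample cells with replacement within each group; return (mean, sample variance)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_bootstrap):
        xs_star = rng.choices(xs, k=len(xs))
        ys_star = rng.choices(ys, k=len(ys))
        # Distances are recomputed from the resampled cells in every iteration.
        estimates.append(energy_distance(xs_star, ys_star))
    return mean(estimates), variance(estimates)


data_rng = random.Random(1)
control = [[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(60)]
perturbed = [[data_rng.gauss(1.0, 1.0), data_rng.gauss(1.0, 1.0)] for _ in range(60)]

boot_mean, boot_var = bootstrap_energy_distance(control, perturbed)
print(f"bootstrap mean: {boot_mean:.3f}, bootstrap variance: {boot_var:.5f}")
```

Because each iteration streams over the resampled pairs instead of caching an n×n matrix, this sketch trades extra computation for a small memory footprint, which is the same trade-off described above for the GPU version.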