pertpy-GPU: ptg
pertpy provides tools for perturbation analysis [HJM+25].
rapids_singlecell.ptg accelerates some of these methods.
Distance
- class rapids_singlecell.ptg.Distance(metric='edistance', layer_key=None, obsm_key=None, **kwargs)[source]
GPU-accelerated distance computation between groups of cells.
API compatible with pertpy’s Distance class.
Currently supported metrics:
- "edistance": Energy distance (default). Twice the mean pairwise distance between cells of two groups minus the mean pairwise distance between cells within each group. See Peidli et al. (2023).
- "euclidean" and "root_mean_squared_error": Euclidean distance between group mean vectors.
- "mse": Mean squared distance between group mean vectors.
- "mean_absolute_error": Mean absolute distance between group mean vectors.
- "pearson_distance": Pearson distance between group mean vectors.
- "cosine_distance": Cosine distance between group mean vectors.
- "r2_distance": One minus the coefficient of determination between group mean vectors.
- "wasserstein": Entropy-regularized 2-Wasserstein via Sinkhorn. Squared-Euclidean ground cost; per-pair auto-epsilon defaulting to 0.05 * std(C) to match OTT-JAX. Returns OTT’s reg_ot_cost value.
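The energy distance definition above can be illustrated with a small CPU-only NumPy sketch. This is not the library's GPU implementation: the helper names are hypothetical, and the choice of squared Euclidean cell-to-cell distances and of within-group means that include the zero self-distances are assumptions made for the example.

```python
import numpy as np


def sq_euclidean(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Pairwise squared Euclidean distances, shape (len(A), len(B))."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sum(diff * diff, axis=-1)


def energy_distance(X: np.ndarray, Y: np.ndarray) -> float:
    """2 * mean d(X, Y) - mean d(X, X) - mean d(Y, Y).

    Assumptions (not taken from the library): squared Euclidean
    distances, and within-group means that include self-pairs.
    """
    between = sq_euclidean(X, Y).mean()
    within_x = sq_euclidean(X, X).mean()
    within_y = sq_euclidean(Y, Y).mean()
    return float(2.0 * between - within_x - within_y)


X = np.array([[0.0], [2.0]])
Y = np.array([[1.0], [3.0]])
print(energy_distance(X, Y))  # 2.0
print(energy_distance(X, X))  # 0.0
```

Under these assumptions the distance of a group to itself is exactly zero, which is a convenient sanity check when comparing against GPU results.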
- Parameters:
- metric
Literal['edistance', 'euclidean', 'root_mean_squared_error', 'mse', 'mean_absolute_error', 'pearson_distance', 'cosine_distance', 'r2_distance', 'wasserstein'] (default: 'edistance') Distance metric to use.
- layer_key
str | None (default: None) Key in adata.layers for cell data. Mutually exclusive with obsm_key.
- obsm_key
str | None (default: None) Key in adata.obsm for embeddings. Mutually exclusive with layer_key. Defaults to "X_pca" if neither is specified.
Notes
The edistance bootstrap implementation differs from pertpy: rather than precomputing an n×n cell distance matrix and sampling from it, this implementation resamples cells and recomputes distances from scratch each iteration. This scales better for large datasets (O(n) vs O(n²) memory) and leverages multi-GPU parallelism for each bootstrap iteration.
"edistance" and "wasserstein" use multi-GPU (pairs are split across devices). Pseudobulk metrics aggregate cells into K group-mean vectors before computing distances, and the resulting K×K kernel is cheap enough on a single GPU that distributing it is not worth the cost. Passing multi_gpu=True for those metrics falls back to a single device with a warning.
Examples
>>> import rapids_singlecell as rsc
>>> distance = rsc.ptg.Distance(metric='edistance')
>>> result = distance.pairwise(adata, groupby='perturbation')
>>> # Direct computation on arrays
>>> d = distance(X, Y)
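The Notes above describe pseudobulk metrics as aggregating cells into K group-mean vectors and then computing a K×K distance matrix. A minimal NumPy sketch of that idea follows, using the Euclidean metric; the function names are hypothetical and the code runs on the CPU, so it only illustrates the shape of the computation, not the library's kernels.

```python
import numpy as np


def group_means(X, labels):
    """Aggregate cells (rows of X) into one mean vector per group."""
    groups = np.unique(labels)  # sorted unique group names
    means = np.stack([X[labels == g].mean(axis=0) for g in groups])
    return groups, means  # means has shape (K, n_features)


def pairwise_euclidean(means):
    """K x K Euclidean distances between group-mean vectors."""
    diff = means[:, None, :] - means[None, :, :]
    return np.sqrt(np.sum(diff * diff, axis=-1))


X = np.array([[0.0, 0.0], [2.0, 0.0], [3.0, 4.0], [5.0, 4.0]])
labels = np.array(["ctrl", "ctrl", "ko", "ko"])
groups, means = group_means(X, labels)
D = pairwise_euclidean(means)
print(groups)   # ['ctrl' 'ko']
print(D[0, 1])  # 5.0
```

Because the pairwise step only sees K vectors rather than all cells, its cost is independent of the number of cells, which is consistent with the note that distributing it across GPUs is not worthwhile.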
Methods
pairwise(adata, groupby, *[, groups, ...]) | Compute pairwise distances between all cell groups.
onesided_distances(adata, groupby, ...[, ...]) | Compute distances from one selected group to all other groups.
contrast_distances(adata, contrasts, *[, ...]) | Compute distances for contrasts.
create_contrasts(adata, groupby, ...[, ...]) | Build a contrasts DataFrame for use with contrast_distances().
bootstrap(X, Y, *[, n_bootstrap, random_state]) | Compute bootstrap mean and variance for distance between two arrays.
- __call__(X, Y)[source]
Compute distance between two cell groups directly from arrays.
This provides pertpy-compatible API for direct distance computation.
- Parameters:
- X np.ndarray | cp.ndarray
First array of shape (n_samples_x, n_features)
- Y np.ndarray | cp.ndarray
Second array of shape (n_samples_y, n_features)
- Return type:
float
- Returns:
float Distance between X and Y
Examples
>>> distance = Distance(metric='edistance')
>>> X = adata.obsm["X_pca"][adata.obs["group"] == "A"]
>>> Y = adata.obsm["X_pca"][adata.obs["group"] == "B"]
>>> d = distance(X, Y)
- pairwise(adata, groupby, *, groups=None, bootstrap=False, n_bootstrap=100, random_state=0, multi_gpu=None)[source]
Compute pairwise distances between all cell groups.
- Parameters:
- adata
AnnData Annotated data matrix
- groupby
str Key in adata.obs for grouping cells
- groups
Sequence[str] | None (default: None) Specific groups to compute (if None, use all)
- bootstrap
bool (default: False) Whether to compute bootstrap variance estimates
- n_bootstrap
int (default: 100) Number of bootstrap iterations (if bootstrap=True)
- random_state
int (default: 0) Random seed for reproducibility
- multi_gpu
bool | list[int] | str | None (default: None) GPU selection:
  - None: Use all GPUs if metric supports it, else GPU 0 (default)
  - True: Use all available GPUs
  - False: Use only GPU 0
  - list[int]: Use specific GPU IDs (e.g., [0, 2])
  - str: Comma-separated GPU IDs (e.g., "0,2")
- Returns:
result DataFrame with pairwise distances. If bootstrap=True, returns tuple of (distances, distances_var) DataFrames.
Examples
>>> distance = Distance(metric='edistance')
>>> result = distance.pairwise(adata, groupby='condition')
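The accepted multi_gpu values for pairwise() can be read as a mapping onto a list of GPU IDs. The sketch below shows one way such a mapping might look. resolve_devices, n_available, and supports_multi_gpu are hypothetical names introduced for illustration; the single-device fallback for explicit ID lists is also an assumption, and nothing here is taken from the library's source.

```python
import warnings


def resolve_devices(multi_gpu, n_available, supports_multi_gpu=True):
    """Map a multi_gpu argument to a list of GPU IDs (hypothetical helper)."""
    if isinstance(multi_gpu, str):
        # Comma-separated IDs such as "0,2"
        devices = [int(part) for part in multi_gpu.split(",")]
    elif multi_gpu is None or multi_gpu is True:
        devices = list(range(n_available))
    elif multi_gpu is False:
        devices = [0]
    else:
        # Any other sequence is treated as explicit GPU IDs
        devices = [int(i) for i in multi_gpu]

    if not supports_multi_gpu and len(devices) > 1:
        # None falls back silently; an explicit request triggers a warning.
        if multi_gpu is not None:
            warnings.warn("metric runs on a single GPU; using one device")
        devices = devices[:1]
    return devices


print(resolve_devices(None, 4))                            # [0, 1, 2, 3]
print(resolve_devices(None, 4, supports_multi_gpu=False))  # [0]
print(resolve_devices("0,2", 4))                           # [0, 2]
```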
- onesided_distances(adata, groupby, selected_group, *, groups=None, bootstrap=False, n_bootstrap=100, random_state=0, multi_gpu=None)[source]
Compute distances from one selected group to all other groups.
- Parameters:
- adata
AnnData Annotated data matrix
- groupby
str Key in adata.obs for grouping cells
- selected_group
Sequence[str] | str Reference group to compute distances from
- groups
Sequence[str] | None (default: None) Specific groups to compute distances to (if None, use all)
- bootstrap
bool (default: False) Whether to compute bootstrap variance estimates
- n_bootstrap
int (default: 100) Number of bootstrap iterations (if bootstrap=True)
- random_state
int (default: 0) Random seed for reproducibility
- multi_gpu
bool | list[int] | str | None (default: None) GPU selection:
  - None: Use all GPUs if metric supports it, else GPU 0 (default)
  - True: Use all available GPUs
  - False: Use only GPU 0
  - list[int]: Use specific GPU IDs (e.g., [0, 2])
  - str: Comma-separated GPU IDs (e.g., "0,2")
- Return type:
Series | DataFrame | tuple[Series, Series] | tuple[DataFrame, DataFrame]
- Returns:
distances Series containing distances from selected_group to all other groups. If bootstrap=True, returns tuple of (distances, distances_var).
Examples
>>> distance = Distance(metric='edistance')
>>> distances = distance.onesided_distances(
...     adata, groupby='condition', selected_group='control'
... )
- contrast_distances(adata, contrasts, *, multi_gpu=None)[source]
Compute distances for contrasts.
Accepts a DataFrame (from create_contrasts() or constructed manually) with the following layout:
- First column: the groupby column (target values to compare)
- reference column: the control value in the groupby column
- Other columns: split-by filters (e.g., cell type)
- Parameters:
- adata
AnnData Annotated data matrix
- contrasts
DataFrame DataFrame with a groupby column, a reference column, and optional split columns.
- multi_gpu
bool | list[int] | str | None (default: None) GPU selection:
  - None: Use all GPUs if metric supports it, else GPU 0 (default)
  - True: Use all available GPUs
  - False: Use only GPU 0
  - list[int]: Use specific GPU IDs (e.g., [0, 2])
  - str: Comma-separated GPU IDs (e.g., "0,2")
- Return type:
DataFrame
- Returns:
pd.DataFrame Copy of the input DataFrame with an added distance column.
Examples
>>> distance = Distance(metric='edistance')
>>> # Using create_contrasts helper
>>> contrasts = Distance.create_contrasts(
...     adata, groupby="target_gene", selected_group="Non_target",
...     split_by="group_name",
... )
>>> result = distance.contrast_distances(adata, contrasts=contrasts)
>>> # Manual DataFrame construction
>>> import pandas as pd
>>> contrasts = pd.DataFrame({
...     "target_gene": ["Irf7", "Ski"],
...     "reference": ["Non_target", "Non_target"],
...     "group_name": ["CD4", "CD4"],
... })
>>> result = distance.contrast_distances(adata, contrasts)
- static create_contrasts(adata, groupby, selected_group, *, groups=None, split_by=None)[source]
Build a contrasts DataFrame for use with contrast_distances().
Each row represents one contrast: comparing a group against the reference, optionally within each level of split_by columns. The resulting DataFrame can be filtered or modified before passing to contrast_distances().
The output layout is:
- First column (groupby): the target values to compare
- reference column: the control value in the groupby column
- Remaining columns (split_by): stratification filters
- Parameters:
- adata
AnnData Annotated data matrix
- groupby
str Column in adata.obs whose levels are compared against selected_group
- selected_group
str | Sequence[str] The reference (control) value(s) in the groupby column. When a sequence is passed, each target is compared against every reference, producing one row per (target, reference) combination.
- groups
Sequence[str] | None (default: None) Specific groups to include. If None, all non-reference groups are included.
- split_by
str | Sequence[str] | None (default: None) Column(s) in adata.obs to stratify by. If provided, contrasts are computed within each unique combination of these columns. Only combinations where the reference group exists are included.
- Return type:
DataFrame
- Returns:
pd.DataFrame One row per contrast. First column is groupby, then reference, then any split_by columns.
Examples
>>> # All targets vs control, ignoring celltype
>>> contrasts = Distance.create_contrasts(
...     adata, groupby="target_gene", selected_group="Non_target"
... )
>>> # Multiple references
>>> contrasts = Distance.create_contrasts(
...     adata, groupby="target_gene",
...     selected_group=["Non_target", "Scramble"],
... )
>>> # Stratified by celltype
>>> contrasts = Distance.create_contrasts(
...     adata, groupby="target_gene", selected_group="Non_target",
...     split_by="group_name",
... )
>>> # Filter before computing
>>> contrasts = contrasts[contrasts["group_name"] != "rare_type"]
>>> result = distance.contrast_distances(adata, contrasts=contrasts)
>>> # Manual construction (no helper needed)
>>> import pandas as pd
>>> contrasts = pd.DataFrame({
...     "target_gene": ["Irf7", "Ski"],
...     "reference": ["Non_target", "Non_target"],
...     "group_name": ["CD4", "CD4"],
... })
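To make the documented contrasts layout concrete, the following pandas-only sketch builds such a table from a plain obs-like DataFrame. build_contrasts is a hypothetical stand-in written for this illustration, not the library's create_contrasts(); it follows the rules stated above (one row per target and reference pair, and only strata where the reference is present), while the row ordering is an arbitrary choice.

```python
import pandas as pd


def build_contrasts(obs, groupby, selected_group, split_by=None):
    """Hypothetical re-creation of the documented contrasts layout."""
    references = [selected_group] if isinstance(selected_group, str) else list(selected_group)
    if split_by is None:
        split_cols = []
    elif isinstance(split_by, str):
        split_cols = [split_by]
    else:
        split_cols = list(split_by)

    # One stratum per unique combination of the split columns (or a single
    # unstratified stratum when no split columns are given).
    combos = obs[split_cols].drop_duplicates().to_dict("records") if split_cols else [{}]

    rows = []
    for combo in combos:
        mask = pd.Series(True, index=obs.index)
        for col, val in combo.items():
            mask &= (obs[col] == val)
        present = set(obs.loc[mask, groupby])
        for ref in references:
            if ref not in present:
                continue  # keep only strata where the reference exists
            for target in sorted(present - set(references)):
                rows.append({groupby: target, "reference": ref, **combo})
    return pd.DataFrame(rows, columns=[groupby, "reference", *split_cols])


obs = pd.DataFrame({
    "target_gene": ["Non_target", "Irf7", "Ski", "Non_target", "Irf7"],
    "group_name": ["CD4", "CD4", "CD4", "CD8", "CD8"],
})
contrasts = build_contrasts(obs, "target_gene", "Non_target", split_by="group_name")
print(contrasts)
```

The resulting frame has the groupby column first, then reference, then the split column, so it can be filtered row-wise before being handed to a distance computation.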
- bootstrap(X, Y, *, n_bootstrap=100, random_state=0)[source]
Compute bootstrap mean and variance for distance between two arrays.
This provides pertpy-compatible API for bootstrap computation directly on arrays without requiring an AnnData object.
- Parameters:
- X np.ndarray | cp.ndarray
First array of shape (n_samples_x, n_features)
- Y np.ndarray | cp.ndarray
Second array of shape (n_samples_y, n_features)
- n_bootstrap int (default: 100)
Number of bootstrap iterations
- random_state int (default: 0)
Random seed for reproducibility
- Return type:
MeanVar
- Returns:
result Named tuple containing mean and variance of bootstrapped distances
Examples
>>> distance = Distance(metric='edistance')
>>> X = adata.obsm["X_pca"][adata.obs["group"] == "A"]
>>> Y = adata.obsm["X_pca"][adata.obs["group"] == "B"]
>>> result = distance.bootstrap(X, Y, n_bootstrap=100)
>>> print(f"Distance: {result.mean:.3f} ± {result.variance**0.5:.3f}")
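The resample-and-recompute strategy described in the class Notes can be sketched on the CPU with NumPy. Everything below is illustrative: bootstrap_distance and mean_euclidean are hypothetical helpers, MeanVar is a local stand-in for the named tuple mentioned above, and the use of the population variance (ddof=0) is an assumption.

```python
from collections import namedtuple

import numpy as np

# Local stand-in for the documented result type (fields: mean, variance).
MeanVar = namedtuple("MeanVar", ["mean", "variance"])


def mean_euclidean(X, Y):
    """Euclidean distance between the two group-mean vectors."""
    return float(np.linalg.norm(X.mean(axis=0) - Y.mean(axis=0)))


def bootstrap_distance(X, Y, distance_fn, n_bootstrap=100, random_state=0):
    """Resample cells with replacement and recompute the distance each time."""
    rng = np.random.default_rng(random_state)
    values = np.empty(n_bootstrap)
    for i in range(n_bootstrap):
        xi = rng.integers(0, len(X), size=len(X))  # row indices drawn with replacement
        yi = rng.integers(0, len(Y), size=len(Y))
        values[i] = distance_fn(X[xi], Y[yi])
    # Population variance (ddof=0) is an assumption of this sketch.
    return MeanVar(mean=float(values.mean()), variance=float(values.var()))


rng = np.random.default_rng(42)
X = rng.normal(0.0, 1.0, size=(50, 3))
Y = rng.normal(1.0, 1.0, size=(40, 3))
result = bootstrap_distance(X, Y, mean_euclidean, n_bootstrap=100, random_state=0)
print(f"Distance: {result.mean:.3f} ± {result.variance**0.5:.3f}")
```

Only the resampled rows are held in memory at any time, so no n×n cell distance matrix is ever materialized.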
GuideAssignment
- class rapids_singlecell.ptg.GuideAssignment[source]
GPU-accelerated guide RNA assignment.
Provides threshold-based and mixture-model-based methods for assigning cells to guide RNAs, compatible with pertpy’s GuideAssignment API. The mixture model follows crispat’s Poisson-Gaussian assignment rule while using batched EM on GPU instead of per-guide Pyro SVI, yielding orders-of-magnitude speedup.
Methods
assign_by_threshold(adata, *, ...[, layer, ...]) | Assign cells to gRNAs exceeding a count threshold.
assign_to_max_guide(adata, *, ...[, layer, ...]) | Assign each cell to its most expressed gRNA.
assign_mixture_model(adata, *[, layer, ...]) | Assign gRNAs using a GPU-accelerated Poisson–Gaussian mixture model.
- assign_by_threshold(adata, *, assignment_threshold, layer=None, output_layer='assigned_guides')[source]
Assign cells to gRNAs exceeding a count threshold.
Each cell is assigned to every gRNA with at least assignment_threshold counts. Expects unnormalized count data.
- Parameters:
- adata
AnnData Annotated data matrix of shape n_obs x n_vars.
- assignment_threshold
float Minimum count for a viable assignment.
- layer
str | None (default: None) Layer with raw counts. Uses adata.X if None.
- output_layer
str (default: 'assigned_guides') Key under which the binary assignment matrix is stored in adata.layers.
- Return type:
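The threshold rule described for assign_by_threshold amounts to an element-wise comparison on the count matrix. A minimal NumPy sketch follows; threshold_assignment is a hypothetical name, the input is a dense CPU array rather than an AnnData layer, and the int8 output dtype is an assumption.

```python
import numpy as np


def threshold_assignment(counts, assignment_threshold):
    """Binary cells x guides matrix: 1 where count >= threshold, else 0."""
    return (np.asarray(counts) >= assignment_threshold).astype(np.int8)


counts = np.array([[0, 7, 1],
                   [5, 5, 0],
                   [2, 0, 0]])
assigned = threshold_assignment(counts, 5)
print(assigned)
# [[0 1 0]
#  [1 1 0]
#  [0 0 0]]
```

Note that a cell can receive several guides (second row) or none (third row) under this rule.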
- assign_to_max_guide(adata, *, assignment_threshold, layer=None, obs_key='assigned_guide', no_grna_assigned_key='Negative')[source]
Assign each cell to its most expressed gRNA.
Each cell is assigned to the gRNA with the highest count, provided that count is at least assignment_threshold. Expects unnormalized count data.
- Parameters:
- adata
AnnData Annotated data matrix of shape n_obs x n_vars.
- assignment_threshold
float Minimum count for a viable assignment.
- layer
str | None (default: None) Layer with raw counts. Uses adata.X if None.
- obs_key
str (default: 'assigned_guide') Column in adata.obs where the assignment is stored.
- no_grna_assigned_key
str (default: 'Negative') Label for cells with no guide above threshold.
- Return type:
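The max-guide rule described for assign_to_max_guide can be illustrated with an argmax over each row of the count matrix. In the sketch below, max_guide_assignment and guide_names are hypothetical names, the input is a dense CPU array, and resolving ties by taking the first maximal guide is an assumption (it is simply NumPy's argmax behaviour).

```python
import numpy as np


def max_guide_assignment(counts, guide_names, assignment_threshold,
                         no_grna_assigned_key="Negative"):
    """Label each cell with its top guide, or a fallback below threshold."""
    counts = np.asarray(counts)
    best = counts.argmax(axis=1)  # first maximal guide per cell
    best_counts = counts[np.arange(counts.shape[0]), best]
    labels = np.asarray(guide_names, dtype=object)[best]
    labels[best_counts < assignment_threshold] = no_grna_assigned_key
    return labels


counts = np.array([[0, 7, 1],
                   [5, 5, 0],
                   [2, 0, 0]])
labels = max_guide_assignment(counts, ["g1", "g2", "g3"], assignment_threshold=5)
print(list(labels))  # ['g2', 'g1', 'Negative']
```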
- assign_mixture_model(adata, *, layer=None, assigned_guides_key='assigned_guide', no_grna_assigned_key='negative', max_assignments_per_cell=5, multiple_grna_assigned_key='multiple', multiple_grna_assignment_string='+', only_return_results=False, max_iter=90, tol=0.0001, posterior_threshold=0.5)[source]
Assign gRNAs using a GPU-accelerated Poisson–Gaussian mixture model.
Fits a two-component mixture (Poisson background + Gaussian signal) to the log₂-transformed non-zero counts of each guide simultaneously using batched Expectation-Maximization on GPU. Like crispat’s Poisson-Gaussian assignment, the fitted model is converted to an integer raw-count threshold. The default posterior cutoff matches pertpy’s crispat-style threshold rule.
- Parameters:
- adata
AnnData Annotated data matrix with guide RNA counts.
- layer
str | None (default: None) Layer with raw counts. Uses adata.X if None.
- assigned_guides_key
str (default: 'assigned_guide') Key in adata.obs for storing the assignment result.
- no_grna_assigned_key
str (default: 'negative') Label for cells negative for all gRNAs.
- max_assignments_per_cell
int (default: 5) Maximum number of gRNAs a cell can be assigned to.
- multiple_grna_assigned_key
str (default: 'multiple') Label for cells exceeding max_assignments_per_cell.
- multiple_grna_assignment_string
str (default: '+') Delimiter for joining multiple guide names.
- only_return_results
bool (default: False) If True, return assignments without modifying adata.
- max_iter
int (default: 90) Maximum number of EM iterations.
- tol
float (default: 0.0001) Convergence tolerance on parameter changes.
- posterior_threshold
float (default: 0.5) Minimum posterior probability of the Gaussian component required for a raw UMI count to define the assignment threshold.
- Return type:
- Returns:
If only_return_results is True, returns an array of assignments. Otherwise modifies adata in-place and returns None.
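The conversion of a fitted two-component model into an integer raw-count threshold can be sketched as follows. This is not the library's or crispat's code: the function names and the example parameter values are hypothetical, and evaluating the Poisson mass at non-integer log₂ counts through the gamma function is an assumption of this sketch. The idea is to scan raw counts upward and return the first one whose posterior probability of the Gaussian (signal) component reaches posterior_threshold.

```python
import math


def poisson_logpdf(x, lam):
    # Poisson log-mass extended to non-integer x via lgamma (an assumption).
    return x * math.log(lam) - lam - math.lgamma(x + 1.0)


def gaussian_logpdf(x, mu, sigma):
    return -0.5 * math.log(2.0 * math.pi * sigma**2) - (x - mu) ** 2 / (2.0 * sigma**2)


def gaussian_posterior(count, weight_gauss, lam, mu, sigma):
    """Posterior probability of the Gaussian component for one raw count."""
    x = math.log2(count)  # the model is evaluated on log2 counts
    log_signal = math.log(weight_gauss) + gaussian_logpdf(x, mu, sigma)
    log_background = math.log(1.0 - weight_gauss) + poisson_logpdf(x, lam)
    diff = log_background - log_signal
    # Guard against overflow in exp for extremely background-like counts.
    return 0.0 if diff > 700.0 else 1.0 / (1.0 + math.exp(diff))


def count_threshold(weight_gauss, lam, mu, sigma,
                    posterior_threshold=0.5, max_count=100_000):
    """Smallest raw count whose Gaussian posterior reaches the cutoff."""
    for count in range(1, max_count + 1):
        if gaussian_posterior(count, weight_gauss, lam, mu, sigma) >= posterior_threshold:
            return count
    return None  # no count up to max_count qualifies


# Hypothetical fitted parameters for a single guide.
params = dict(weight_gauss=0.5, lam=0.5, mu=5.0, sigma=1.0)
threshold = count_threshold(**params)
print(threshold)
```

Cells whose raw count for that guide is at or above the returned threshold would then be treated as positive for the guide, which is the sense in which the posterior cutoff defines the assignment threshold.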