pertpy-GPU: ptg


pertpy provides tools for perturbation analysis [HJM+25]. rapids_singlecell.ptg provides GPU-accelerated implementations of some of these methods.

Distance

class rapids_singlecell.ptg.Distance(metric='edistance', layer_key=None, obsm_key=None, **kwargs)

GPU-accelerated distance computation between groups of cells.

API compatible with pertpy’s Distance class.

Currently supported metrics:

"edistance": Energy distance (default). Twice the mean pairwise distance between cells of two groups minus the mean pairwise distance between cells within each group. See Peidli et al. (2023).
"euclidean" and "root_mean_squared_error": Euclidean distance between group mean vectors.
"mse": Mean squared distance between group mean vectors.
"mean_absolute_error": Mean absolute distance between group mean vectors.
"pearson_distance": Pearson distance between group mean vectors.
"cosine_distance": Cosine distance between group mean vectors.
"r2_distance": One minus the coefficient of determination between group mean vectors.
"wasserstein": Entropy-regularized 2-Wasserstein via Sinkhorn. Squared-Euclidean ground cost; per-pair auto-epsilon defaulting to 0.05 * std(C) to match OTT-JAX. Returns OTT’s reg_ot_cost value.
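
To make the two metric families above concrete, here is a minimal CPU sketch in NumPy. It is not the library's GPU implementation: the helper names (`mean_pairwise`, `energy_distance`, `pseudobulk_euclidean`) are hypothetical, and the energy distance below assumes plain Euclidean distance between cells, which may differ from the ground distance rapids_singlecell uses.

```python
import numpy as np


def mean_pairwise(A, B):
    """Mean Euclidean distance over all (row of A, row of B) pairs."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1)).mean()


def energy_distance(X, Y):
    """Twice the between-group mean distance minus both within-group means."""
    return 2.0 * mean_pairwise(X, Y) - mean_pairwise(X, X) - mean_pairwise(Y, Y)


def pseudobulk_euclidean(X, Y):
    """Euclidean distance between the two group mean vectors."""
    return float(np.linalg.norm(X.mean(axis=0) - Y.mean(axis=0)))


# Two tiny one-dimensional groups of two cells each.
X = np.array([[0.0], [1.0]])
Y = np.array([[2.0], [3.0]])
print(energy_distance(X, Y))       # 3.0
print(pseudobulk_euclidean(X, Y))  # 2.0
```

The energy distance looks at every cell pair, while the pseudobulk metric only compares the two mean vectors, which is why the latter is much cheaper to compute.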

Parameters:

metric Literal['edistance', 'euclidean', 'root_mean_squared_error', 'mse', 'mean_absolute_error', 'pearson_distance', 'cosine_distance', 'r2_distance', 'wasserstein'] (default: 'edistance'): Distance metric to use.
layer_key str | None (default: None): Key in adata.layers for cell data. Mutually exclusive with obsm_key.
obsm_key str | None (default: None): Key in adata.obsm for embeddings. Mutually exclusive with layer_key. Defaults to "X_pca" if neither is specified.

Notes

The edistance bootstrap implementation differs from pertpy: rather than precomputing an n×n cell distance matrix and sampling from it, this implementation resamples cells and recomputes distances from scratch each iteration. This scales better for large datasets (O(n) vs O(n²) memory) and leverages multi-GPU parallelism for each bootstrap iteration.
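
The resample-and-recompute strategy described above can be sketched on the CPU as follows. This is an illustration under stated assumptions, not the library's code: `bootstrap_distance` and the toy `mean_gap` distance are hypothetical names, cells are resampled with replacement within each group, and the variance uses NumPy's default (population) normalization, which may not match the library.

```python
import numpy as np


def mean_gap(X, Y):
    """Toy distance: Euclidean distance between the two group means."""
    return float(np.linalg.norm(X.mean(axis=0) - Y.mean(axis=0)))


def bootstrap_distance(X, Y, distance_fn, n_bootstrap=100, random_state=0):
    """Resample cells with replacement and recompute the distance each time."""
    rng = np.random.default_rng(random_state)
    values = np.empty(n_bootstrap)
    for i in range(n_bootstrap):
        # Draw row indices instead of indexing into a precomputed n x n matrix.
        xb = X[rng.integers(0, len(X), size=len(X))]
        yb = Y[rng.integers(0, len(Y), size=len(Y))]
        values[i] = distance_fn(xb, yb)
    return values.mean(), values.var()


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
Y = rng.normal(loc=3.0, size=(50, 2))
mean, var = bootstrap_distance(X, Y, mean_gap, n_bootstrap=20)
```

Only the resampled rows are held in memory per iteration, so no n×n cell distance matrix is ever stored.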

"edistance" and "wasserstein" use multi-GPU (pairs are split across devices). Pseudobulk metrics aggregate cells into K group-mean vectors before computing distances, and the resulting K×K kernel is cheap enough on a single GPU that distributing it is not worth the cost. Passing multi_gpu=True for those metrics falls back to a single device with a warning.

Examples

>>> import rapids_singlecell as rsc
>>> distance = rsc.ptg.Distance(metric='edistance')
>>> result = distance.pairwise(adata, groupby='perturbation')

>>> # Direct computation on arrays
>>> # X, Y: arrays of shape (n_samples, n_features), one per cell group
>>> d = distance(X, Y)

Methods

`pairwise`(adata, groupby, *[, groups, ...])	Compute pairwise distances between all cell groups.
`onesided_distances`(adata, groupby, ...[, ...])	Compute distances from one selected group to all other groups.
`contrast_distances`(adata, contrasts, *[, ...])	Compute distances for contrasts.
`create_contrasts`(adata, groupby, ...[, ...])	Build a contrasts DataFrame for use with `contrast_distances()`.
`bootstrap`(X, Y, *[, n_bootstrap, random_state])	Compute bootstrap mean and variance for distance between two arrays.

__call__(X, Y)

Compute distance between two cell groups directly from arrays.

This provides a pertpy-compatible API for direct distance computation.

Parameters:

X np.ndarray | cp.ndarray: First array of shape (n_samples_x, n_features)
Y np.ndarray | cp.ndarray: Second array of shape (n_samples_y, n_features)

Return type:

float

Returns:

float Distance between X and Y

Examples

>>> from rapids_singlecell.ptg import Distance
>>> distance = Distance(metric='edistance')
>>> X = adata.obsm["X_pca"][adata.obs["group"] == "A"]
>>> Y = adata.obsm["X_pca"][adata.obs["group"] == "B"]
>>> d = distance(X, Y)

pairwise(adata, groupby, *, groups=None, bootstrap=False, n_bootstrap=100, random_state=0, multi_gpu=None)

Compute pairwise distances between all cell groups.

Parameters:

adata AnnData: Annotated data matrix
groupby str: Key in adata.obs for grouping cells
groups Sequence[str] | None (default: None): Specific groups to compute (if None, use all)
bootstrap bool (default: False): Whether to compute bootstrap variance estimates
n_bootstrap int (default: 100): Number of bootstrap iterations (if bootstrap=True)
random_state int (default: 0): Random seed for reproducibility
multi_gpu bool | list[int] | str | None (default: None): GPU selection:
- None: Use all GPUs if the metric supports it, else GPU 0 (default)
- True: Use all available GPUs
- False: Use only GPU 0
- list[int]: Use specific GPU IDs (e.g., [0, 2])
- str: Comma-separated GPU IDs (e.g., "0,2")

Returns:

result DataFrame with pairwise distances. If bootstrap=True, returns a tuple of (distances, distances_var) DataFrames.

Examples

>>> from rapids_singlecell.ptg import Distance
>>> distance = Distance(metric='edistance')
>>> result = distance.pairwise(adata, groupby='condition')

onesided_distances(adata, groupby, selected_group, *, groups=None, bootstrap=False, n_bootstrap=100, random_state=0, multi_gpu=None)

Compute distances from one selected group to all other groups.

Parameters:

adata AnnData: Annotated data matrix
groupby str: Key in adata.obs for grouping cells
selected_group Sequence[str] | str: Reference group to compute distances from
groups Sequence[str] | None (default: None): Specific groups to compute distances to (if None, use all)
bootstrap bool (default: False): Whether to compute bootstrap variance estimates
n_bootstrap int (default: 100): Number of bootstrap iterations (if bootstrap=True)
random_state int (default: 0): Random seed for reproducibility
multi_gpu bool | list[int] | str | None (default: None): GPU selection:
- None: Use all GPUs if the metric supports it, else GPU 0 (default)
- True: Use all available GPUs
- False: Use only GPU 0
- list[int]: Use specific GPU IDs (e.g., [0, 2])
- str: Comma-separated GPU IDs (e.g., "0,2")

Return type:

Series | DataFrame | tuple[Series, Series] | tuple[DataFrame, DataFrame]

Returns:

distances Series containing distances from selected_group to all other groups. If bootstrap=True, returns a tuple of (distances, distances_var).

Examples

>>> from rapids_singlecell.ptg import Distance
>>> distance = Distance(metric='edistance')
>>> distances = distance.onesided_distances(
...     adata, groupby='condition', selected_group='control'
... )

contrast_distances(adata, contrasts, *, multi_gpu=None)

Compute distances for contrasts.

Accepts a DataFrame (from create_contrasts() or constructed manually) with the following layout:

First column: the groupby column (target values to compare)
`reference` column: the control value in the groupby column
Other columns: split-by filters (e.g., cell type)

Parameters:

adata AnnData: Annotated data matrix
contrasts DataFrame: DataFrame with a groupby column, a reference column, and optional split columns.
multi_gpu bool | list[int] | str | None (default: None): GPU selection:
- None: Use all GPUs if the metric supports it, else GPU 0 (default)
- True: Use all available GPUs
- False: Use only GPU 0
- list[int]: Use specific GPU IDs (e.g., [0, 2])
- str: Comma-separated GPU IDs (e.g., "0,2")

Return type:

DataFrame

Returns:

pd.DataFrame Copy of the input DataFrame with an added distance column.

Examples

>>> from rapids_singlecell.ptg import Distance
>>> distance = Distance(metric='edistance')

>>> # Using create_contrasts helper
>>> contrasts = Distance.create_contrasts(
...     adata, groupby="target_gene", selected_group="Non_target",
...     split_by="group_name",
... )
>>> result = distance.contrast_distances(adata, contrasts=contrasts)

>>> # Manual DataFrame construction
>>> import pandas as pd
>>> contrasts = pd.DataFrame({
...     "target_gene": ["Irf7", "Ski"],
...     "reference": ["Non_target", "Non_target"],
...     "group_name": ["CD4", "CD4"],
... })
>>> result = distance.contrast_distances(adata, contrasts)

static create_contrasts(adata, groupby, selected_group, *, groups=None, split_by=None)

Build a contrasts DataFrame for use with contrast_distances().

Each row represents one contrast: comparing a group against the reference, optionally within each level of split_by columns. The resulting DataFrame can be filtered or modified before passing to contrast_distances().

The output layout is:

First column (groupby): the target values to compare
`reference` column: the control value in the groupby column
Remaining columns (split_by): stratification filters

Parameters:

adata AnnData: Annotated data matrix
groupby str: Column in adata.obs whose levels are compared against selected_group
selected_group str | Sequence[str]: The reference (control) value(s) in the groupby column. When a sequence is passed, each target is compared against every reference, producing one row per (target, reference) combination.
groups Sequence[str] | None (default: None): Specific groups to include. If None, all non-reference groups are included.
split_by str | Sequence[str] | None (default: None): Column(s) in adata.obs to stratify by. If provided, contrasts are computed within each unique combination of these columns. Only combinations where the reference group exists are included.

Return type:

DataFrame

Returns:

pd.DataFrame One row per contrast. First column is groupby, then reference, then any split_by columns.

Examples

>>> from rapids_singlecell.ptg import Distance
>>> distance = Distance(metric='edistance')

>>> # All targets vs control, ignoring celltype
>>> contrasts = Distance.create_contrasts(
...     adata, groupby="target_gene", selected_group="Non_target"
... )

>>> # Multiple references
>>> contrasts = Distance.create_contrasts(
...     adata, groupby="target_gene",
...     selected_group=["Non_target", "Scramble"],
... )

>>> # Stratified by celltype
>>> contrasts = Distance.create_contrasts(
...     adata, groupby="target_gene", selected_group="Non_target",
...     split_by="group_name",
... )

>>> # Filter before computing
>>> contrasts = contrasts[contrasts["group_name"] != "rare_type"]
>>> result = distance.contrast_distances(adata, contrasts=contrasts)

>>> # Manual construction (no helper needed)
>>> import pandas as pd
>>> contrasts = pd.DataFrame({
...     "target_gene": ["Irf7", "Ski"],
...     "reference": ["Non_target", "Non_target"],
...     "group_name": ["CD4", "CD4"],
... })
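
To illustrate the contrasts layout described above, the following pandas sketch builds a stratified contrasts table from a plain `obs`-like DataFrame. It is a simplified stand-in rather than the library's implementation: `build_contrasts` is a hypothetical helper, the toy data is invented, and only a single reference value and a single split column are handled.

```python
import pandas as pd

# Toy stand-in for adata.obs; the CD8 stratum has no "Non_target" cells.
obs = pd.DataFrame({
    "target_gene": ["Non_target", "Irf7", "Ski", "Irf7", "Ski"],
    "group_name": ["CD4", "CD4", "CD4", "CD8", "CD8"],
})


def build_contrasts(obs, groupby, reference, split_by):
    """One row per (target, split level) where the reference is also present."""
    # Unique (target, split level) combinations observed in the data.
    combos = obs[[groupby, split_by]].drop_duplicates()
    # Split levels that contain at least one reference cell.
    ref_levels = combos.loc[combos[groupby] == reference, split_by]
    keep = (combos[groupby] != reference) & combos[split_by].isin(ref_levels)
    out = combos[keep].copy()
    # Layout: groupby column first, then `reference`, then the split column.
    out.insert(1, "reference", reference)
    return out.reset_index(drop=True)


contrasts = build_contrasts(obs, "target_gene", "Non_target", "group_name")
print(contrasts)
```

In this toy input the CD8 rows are dropped because that stratum contains no reference cells, mirroring the rule that only combinations where the reference group exists are included.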

bootstrap(X, Y, *, n_bootstrap=100, random_state=0)

Compute bootstrap mean and variance for distance between two arrays.

This provides a pertpy-compatible API for bootstrap computation directly on arrays, without requiring an AnnData object.

Parameters:

X np.ndarray | cp.ndarray: First array of shape (n_samples_x, n_features)
Y np.ndarray | cp.ndarray: Second array of shape (n_samples_y, n_features)
n_bootstrap int (default: 100): Number of bootstrap iterations
random_state int (default: 0): Random seed for reproducibility

Return type:

MeanVar

Returns:

result Named tuple containing mean and variance of bootstrapped distances

Examples

>>> from rapids_singlecell.ptg import Distance
>>> distance = Distance(metric='edistance')
>>> X = adata.obsm["X_pca"][adata.obs["group"] == "A"]
>>> Y = adata.obsm["X_pca"][adata.obs["group"] == "B"]
>>> result = distance.bootstrap(X, Y, n_bootstrap=100)
>>> print(f"Distance: {result.mean:.3f} ± {result.variance**0.5:.3f}")

GuideAssignment

class rapids_singlecell.ptg.GuideAssignment

GPU-accelerated guide RNA assignment.

Provides threshold-based and mixture-model-based methods for assigning cells to guide RNAs, compatible with pertpy’s GuideAssignment API. The mixture model follows crispat’s Poisson-Gaussian assignment rule while using batched EM on GPU instead of per-guide Pyro SVI, yielding orders-of-magnitude speedup.

Methods

`assign_by_threshold`(adata, *, ...[, layer, ...])	Assign cells to gRNAs exceeding a count threshold.
`assign_to_max_guide`(adata, *, ...[, layer, ...])	Assign each cell to its most expressed gRNA.
`assign_mixture_model`(adata, *[, layer, ...])	Assign gRNAs using a GPU-accelerated Poisson–Gaussian mixture model.

assign_by_threshold(adata, *, assignment_threshold, layer=None, output_layer='assigned_guides')

Assign cells to gRNAs exceeding a count threshold.

Each cell is assigned to every gRNA with at least assignment_threshold counts. Expects unnormalized count data.

Parameters:

adata AnnData: Annotated data matrix of shape n_obs x n_vars.
assignment_threshold float: Minimum count for a viable assignment.
layer str | None (default: None): Layer with raw counts. Uses adata.X if None.
output_layer str (default: 'assigned_guides'): Key under which the binary assignment matrix is stored in adata.layers.

Return type:

None
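
The thresholding rule for assign_by_threshold can be illustrated on a small dense count matrix. This is a NumPy sketch of the rule as described, not the GPU code path; the matrix and variable names are made up for the example.

```python
import numpy as np

# Rows are cells, columns are gRNAs (raw, unnormalized counts).
counts = np.array([
    [0, 7, 1],
    [5, 5, 0],
    [2, 0, 0],
])
assignment_threshold = 5

# A cell is assigned to every gRNA with at least `assignment_threshold` counts.
assigned = (counts >= assignment_threshold).astype(np.int8)
print(assigned)
```

The second cell is assigned to two guides and the third to none, since the rule is applied independently per gRNA.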

assign_to_max_guide(adata, *, assignment_threshold, layer=None, obs_key='assigned_guide', no_grna_assigned_key='Negative')

Assign each cell to its most expressed gRNA.

Each cell is assigned to the gRNA with the highest count, provided that count is at least assignment_threshold. Expects unnormalized count data.

Parameters:

adata AnnData: Annotated data matrix of shape n_obs x n_vars.
assignment_threshold float: Minimum count for a viable assignment.
layer str | None (default: None): Layer with raw counts. Uses adata.X if None.
obs_key str (default: 'assigned_guide'): Column in adata.obs where the assignment is stored.
no_grna_assigned_key str (default: 'Negative'): Label for cells with no guide above threshold.

Return type:

None
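
The max-guide rule for assign_to_max_guide can be sketched the same way. Again this is only a NumPy illustration with invented data; in particular, how the library breaks ties between equally expressed guides is not specified here, and `argmax` simply picks the first.

```python
import numpy as np

guide_names = np.array(["g1", "g2", "g3"])
# Rows are cells, columns are gRNAs (raw, unnormalized counts).
counts = np.array([
    [0, 7, 1],
    [5, 6, 0],
    [2, 0, 0],
])
assignment_threshold = 5

best = counts.argmax(axis=1)     # index of the most expressed gRNA per cell
best_count = counts.max(axis=1)  # its count
labels = np.where(
    best_count >= assignment_threshold, guide_names[best], "Negative"
)
print(labels.tolist())  # ['g2', 'g2', 'Negative']
```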

assign_mixture_model(adata, *, layer=None, assigned_guides_key='assigned_guide', no_grna_assigned_key='negative', max_assignments_per_cell=5, multiple_grna_assigned_key='multiple', multiple_grna_assignment_string='+', only_return_results=False, max_iter=90, tol=0.0001, posterior_threshold=0.5)

Assign gRNAs using a GPU-accelerated Poisson–Gaussian mixture model.

Fits a two-component mixture (Poisson background + Gaussian signal) to the log₂-transformed non-zero counts of each guide simultaneously using batched Expectation-Maximization on GPU. Like crispat’s Poisson-Gaussian assignment, the fitted model is converted to an integer raw-count threshold. The default posterior cutoff matches pertpy’s crispat-style threshold rule.

Parameters:

adata AnnData: Annotated data matrix with guide RNA counts.
layer str | None (default: None): Layer with raw counts. Uses adata.X if None.
assigned_guides_key str (default: 'assigned_guide'): Key in adata.obs for storing the assignment result.
no_grna_assigned_key str (default: 'negative'): Label for cells negative for all gRNAs.
max_assignments_per_cell int (default: 5): Maximum number of gRNAs a cell can be assigned to.
multiple_grna_assigned_key str (default: 'multiple'): Label for cells exceeding max_assignments_per_cell.
multiple_grna_assignment_string str (default: '+'): Delimiter for joining multiple guide names.
only_return_results bool (default: False): If True, return assignments without modifying adata.
max_iter int (default: 90): Maximum number of EM iterations.
tol float (default: 0.0001): Convergence tolerance on parameter changes.
posterior_threshold float (default: 0.5): Minimum posterior probability of the Gaussian component required for a raw UMI count to define the assignment threshold.

Return type:

ndarray | None

Returns:

If only_return_results is True, returns an array of assignments. Otherwise modifies adata in-place and returns None.
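
One way to picture how a fitted mixture is turned into an integer raw-count threshold is sketched below for a single guide. Everything here is an assumption for illustration: the function name `raw_count_threshold`, the parameter names (`lam`, `mu`, `sigma`, `pi_signal`), the gamma-function extension of the Poisson mass to non-integer log₂ counts, and the rule of taking the smallest count whose posterior reaches the cutoff. The library's actual EM fit and conversion may differ.

```python
import math

import numpy as np


def raw_count_threshold(lam, mu, sigma, pi_signal,
                        posterior_threshold=0.5, max_count=1000):
    """Smallest raw count whose Gaussian-component posterior reaches the cutoff."""
    counts = np.arange(1, max_count + 1)
    x = np.log2(counts)  # the mixture is defined on log2 counts

    # Poisson background, extended to non-integer x via the gamma function.
    log_pois = x * np.log(lam) - lam - np.array([math.lgamma(v + 1.0) for v in x])
    # Gaussian signal component.
    log_gauss = -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

    log_bg = np.log1p(-pi_signal) + log_pois
    log_sig = np.log(pi_signal) + log_gauss
    posterior = 1.0 / (1.0 + np.exp(log_bg - log_sig))

    above = np.nonzero(posterior >= posterior_threshold)[0]
    return int(counts[above[0]]) if above.size else None


# Hypothetical fitted parameters for one guide.
threshold = raw_count_threshold(lam=0.5, mu=5.0, sigma=1.0, pi_signal=0.3)
print(threshold)
```

Cells with at least `threshold` raw counts for this guide would then be treated as positive; raising `posterior_threshold` can only keep or increase the threshold in this sketch.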
