rapids_singlecell.ptg.Distance#

class rapids_singlecell.ptg.Distance(metric='edistance', layer_key=None, obsm_key=None, **kwargs)[source]#

GPU-accelerated distance computation between groups of cells.

API compatible with pertpy’s Distance class.

Currently supported metrics:

  • "edistance": Energy distance (default).

    Twice the mean pairwise distance between cells of two groups minus the mean pairwise distance between cells within each group. See Peidli et al. (2023).

  • "euclidean" and "root_mean_squared_error": Euclidean distance

    between group mean vectors.

  • "mse": Mean squared distance between group mean vectors.

  • "mean_absolute_error": Mean absolute distance between group mean

    vectors.

  • "pearson_distance": Pearson distance between group mean vectors.

  • "cosine_distance": Cosine distance between group mean vectors.

  • "r2_distance": One minus the coefficient of determination between

    group mean vectors.

  • "wasserstein": Entropy-regularized 2-Wasserstein via Sinkhorn.

    Squared-Euclidean ground cost; per-pair auto-epsilon defaulting to 0.05 * std(C) to match OTT-JAX. Returns OTT’s reg_ot_cost value.

Parameters:
metric Literal['edistance', 'euclidean', 'root_mean_squared_error', 'mse', 'mean_absolute_error', 'pearson_distance', 'cosine_distance', 'r2_distance', 'wasserstein'] (default: 'edistance')

Distance metric to use.

layer_key str | None (default: None)

Key in adata.layers for cell data. Mutually exclusive with obsm_key.

obsm_key str | None (default: None)

Key in adata.obsm for embeddings. Mutually exclusive with layer_key. Defaults to "X_pca" if neither is specified.

Notes

The edistance bootstrap implementation differs from pertpy: rather than precomputing an n×n cell distance matrix and sampling from it, this implementation resamples cells and recomputes distances from scratch each iteration. This scales better for large datasets (O(n) vs O(n²) memory) and leverages multi-GPU parallelism for each bootstrap iteration.

"edistance" and "wasserstein" use multi-GPU (pairs are split across devices). Pseudobulk metrics aggregate cells into K group-mean vectors before computing distances, and the resulting K×K kernel is cheap enough on a single GPU that distributing it is not worth the cost. Passing multi_gpu=True for those metrics falls back to a single device with a warning.

Examples

>>> import rapids_singlecell as rsc
>>> distance = rsc.ptg.Distance(metric='edistance')
>>> result = distance.pairwise(adata, groupby='perturbation')
>>> # Direct computation on arrays
>>> d = distance(X, Y)

Methods table#

bootstrap(X, Y, *[, n_bootstrap, random_state])

Compute bootstrap mean and variance for distance between two arrays.

contrast_distances(adata, contrasts, *[, ...])

Compute distances for contrasts.

create_contrasts(adata, groupby, ...[, ...])

Build a contrasts DataFrame for use with contrast_distances().

onesided_distances(adata, groupby, ...[, ...])

Compute distances from one selected group to all other groups.

pairwise(adata, groupby, *[, groups, ...])

Compute pairwise distances between all cell groups.

validate_contrasts(adata, contrasts)

Validate a contrasts DataFrame against an AnnData object.

Methods#

bootstrap#

Distance.bootstrap(X, Y, *, n_bootstrap=100, random_state=0)[source]#

Compute bootstrap mean and variance for distance between two arrays.

This provides pertpy-compatible API for bootstrap computation directly on arrays without requiring an AnnData object.

Parameters:
X np.ndarray | cp.ndarray

First array of shape (n_samples_x, n_features)

Y np.ndarray | cp.ndarray

Second array of shape (n_samples_y, n_features)

n_bootstrap int (default: 100)

Number of bootstrap iterations

random_state int (default: 0)

Random seed for reproducibility

Return type:

MeanVar

Returns:

result Named tuple containing mean and variance of bootstrapped distances

Examples

>>> distance = Distance(metric='edistance')
>>> X = adata.obsm["X_pca"][adata.obs["group"] == "A"]
>>> Y = adata.obsm["X_pca"][adata.obs["group"] == "B"]
>>> result = distance.bootstrap(X, Y, n_bootstrap=100)
>>> print(f"Distance: {result.mean:.3f} ± {result.variance**0.5:.3f}")

contrast_distances#

Distance.contrast_distances(adata, contrasts, *, multi_gpu=None)[source]#

Compute distances for contrasts.

Accepts a DataFrame (from create_contrasts() or constructed manually) with the following layout:

  • First column: the groupby column (target values to compare)

  • ``reference`` column: the control value in the groupby column

  • Other columns: split-by filters (e.g., cell type)

Parameters:
adata AnnData

Annotated data matrix

contrasts DataFrame

DataFrame with a groupby column, a reference column, and optional split columns.

multi_gpu bool | list[int] | str | None (default: None)

GPU selection: - None: Use all GPUs if metric supports it, else GPU 0 (default) - True: Use all available GPUs - False: Use only GPU 0 - list[int]: Use specific GPU IDs (e.g., [0, 2]) - str: Comma-separated GPU IDs (e.g., “0,2”)

Return type:

DataFrame

Returns:

pd.DataFrame Copy of the input DataFrame with an added distance column.

Examples

>>> distance = Distance(metric='edistance')
>>> # Using create_contrasts helper
>>> contrasts = Distance.create_contrasts(
...     adata, groupby="target_gene", selected_group="Non_target",
...     split_by="group_name",
... )
>>> result = distance.contrast_distances(adata, contrasts=contrasts)
>>> # Manual DataFrame construction
>>> import pandas as pd
>>> contrasts = pd.DataFrame({
...     "target_gene": ["Irf7", "Ski"],
...     "reference": ["Non_target", "Non_target"],
...     "group_name": ["CD4", "CD4"],
... })
>>> result = distance.contrast_distances(adata, contrasts)

create_contrasts#

static Distance.create_contrasts(adata, groupby, selected_group, *, groups=None, split_by=None)[source]#

Build a contrasts DataFrame for use with contrast_distances().

Each row represents one contrast: comparing a group against the reference, optionally within each level of split_by columns. The resulting DataFrame can be filtered or modified before passing to contrast_distances().

The output layout is:

  • First column (groupby): the target values to compare

  • ``reference`` column: the control value in the groupby column

  • Remaining columns (split_by): stratification filters

Parameters:
adata AnnData

Annotated data matrix

groupby str

Column in adata.obs whose levels are compared against selected_group

selected_group str | Sequence[str]

The reference (control) value(s) in the groupby column. When a sequence is passed, each target is compared against every reference, producing one row per (target, reference) combination.

groups Sequence[str] | None (default: None)

Specific groups to include. If None, all non-reference groups are included.

split_by str | Sequence[str] | None (default: None)

Column(s) in adata.obs to stratify by. If provided, contrasts are computed within each unique combination of these columns. Only combinations where the reference group exists are included.

Return type:

DataFrame

Returns:

pd.DataFrame One row per contrast. First column is groupby, then reference, then any split_by columns.

Examples

>>> # All targets vs control, ignoring celltype
>>> contrasts = Distance.create_contrasts(
...     adata, groupby="target_gene", selected_group="Non_target"
... )
>>> # Multiple references
>>> contrasts = Distance.create_contrasts(
...     adata, groupby="target_gene",
...     selected_group=["Non_target", "Scramble"],
... )
>>> # Stratified by celltype
>>> contrasts = Distance.create_contrasts(
...     adata, groupby="target_gene", selected_group="Non_target",
...     split_by="group_name",
... )
>>> # Filter before computing
>>> contrasts = contrasts[contrasts["group_name"] != "rare_type"]
>>> result = distance.contrast_distances(adata, contrasts=contrasts)
>>> # Manual construction (no helper needed)
>>> import pandas as pd
>>> contrasts = pd.DataFrame({
...     "target_gene": ["Irf7", "Ski"],
...     "reference": ["Non_target", "Non_target"],
...     "group_name": ["CD4", "CD4"],
... })

onesided_distances#

Distance.onesided_distances(adata, groupby, selected_group, *, groups=None, bootstrap=False, n_bootstrap=100, random_state=0, multi_gpu=None)[source]#

Compute distances from one selected group to all other groups.

Parameters:
adata AnnData

Annotated data matrix

groupby str

Key in adata.obs for grouping cells

selected_group Sequence[str] | str

Reference group to compute distances from

groups Sequence[str] | None (default: None)

Specific groups to compute distances to (if None, use all)

bootstrap bool (default: False)

Whether to compute bootstrap variance estimates

n_bootstrap int (default: 100)

Number of bootstrap iterations (if bootstrap=True)

random_state int (default: 0)

Random seed for reproducibility

multi_gpu bool | list[int] | str | None (default: None)

GPU selection: - None: Use all GPUs if metric supports it, else GPU 0 (default) - True: Use all available GPUs - False: Use only GPU 0 - list[int]: Use specific GPU IDs (e.g., [0, 2]) - str: Comma-separated GPU IDs (e.g., “0,2”)

Return type:

Series | DataFrame | tuple[Series, Series] | tuple[DataFrame, DataFrame]

Returns:

distances Series containing distances from selected_group to all other groups. If bootstrap=True, returns tuple of (distances, distances_var).

Examples

>>> distance = Distance(metric='edistance')
>>> distances = distance.onesided_distances(
...     adata, groupby='condition', selected_group='control'
... )

pairwise#

Distance.pairwise(adata, groupby, *, groups=None, bootstrap=False, n_bootstrap=100, random_state=0, multi_gpu=None)[source]#

Compute pairwise distances between all cell groups.

Parameters:
adata AnnData

Annotated data matrix

groupby str

Key in adata.obs for grouping cells

groups Sequence[str] | None (default: None)

Specific groups to compute (if None, use all)

bootstrap bool (default: False)

Whether to compute bootstrap variance estimates

n_bootstrap int (default: 100)

Number of bootstrap iterations (if bootstrap=True)

random_state int (default: 0)

Random seed for reproducibility

multi_gpu bool | list[int] | str | None (default: None)

GPU selection: - None: Use all GPUs if metric supports it, else GPU 0 (default) - True: Use all available GPUs - False: Use only GPU 0 - list[int]: Use specific GPU IDs (e.g., [0, 2]) - str: Comma-separated GPU IDs (e.g., “0,2”)

Returns:

result DataFrame with pairwise distances. If bootstrap=True, returns tuple of (distances, distances_var) DataFrames.

Examples

>>> distance = Distance(metric='edistance')
>>> result = distance.pairwise(adata, groupby='condition')

validate_contrasts#

static Distance.validate_contrasts(adata, contrasts)[source]#

Validate a contrasts DataFrame against an AnnData object.

Expects the DataFrame layout produced by create_contrasts(): first column is the groupby column, reference column contains the control value, remaining columns are split-by filters.

Parameters:
adata AnnData

Annotated data matrix

contrasts DataFrame

DataFrame to validate

Raises:

ValueError – If validation fails.

Return type:

None