rapids_singlecell.ptg.Distance#
- class rapids_singlecell.ptg.Distance(metric='edistance', layer_key=None, obsm_key=None, **kwargs)[source]#
GPU-accelerated distance computation between groups of cells.
API compatible with pertpy’s Distance class.
Currently supported metrics:
"edistance": Energy distance (default). Twice the mean pairwise distance between cells of two groups minus the mean pairwise distance between cells within each group. See Peidli et al. (2023).
"euclidean" and "root_mean_squared_error": Euclidean distance between group mean vectors.
"mse": Mean squared distance between group mean vectors.
"mean_absolute_error": Mean absolute distance between group mean vectors.
"pearson_distance": Pearson distance between group mean vectors.
"cosine_distance": Cosine distance between group mean vectors.
"r2_distance": One minus the coefficient of determination between group mean vectors.
"wasserstein": Entropy-regularized 2-Wasserstein via Sinkhorn. Squared-Euclidean ground cost; per-pair auto-epsilon defaulting to 0.05 * std(C) to match OTT-JAX. Returns OTT's reg_ot_cost value.
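The energy distance definition above can be illustrated on the CPU with NumPy. This is a minimal sketch, not the library's GPU kernel: the helper names are hypothetical, plain Euclidean distance is assumed as the cell-wise distance, and the within-group means include self-pairs. The library may differ on each of these points.

```python
import numpy as np


def mean_pairwise_distance(A: np.ndarray, B: np.ndarray) -> float:
    """Mean Euclidean distance over all (row of A, row of B) pairs."""
    diff = A[:, None, :] - B[None, :, :]  # shape (n_a, n_b, n_features)
    return float(np.sqrt((diff ** 2).sum(axis=-1)).mean())


def energy_distance(X: np.ndarray, Y: np.ndarray) -> float:
    """2 * mean d(X, Y) - mean d(X, X) - mean d(Y, Y)."""
    return (
        2.0 * mean_pairwise_distance(X, Y)
        - mean_pairwise_distance(X, X)
        - mean_pairwise_distance(Y, Y)
    )


rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 5))  # toy "cells" for group 1
Y = rng.normal(3.0, 1.0, size=(200, 5))  # toy "cells" for group 2, shifted

same = energy_distance(X, X)     # identical groups: the three terms cancel
shifted = energy_distance(X, Y)  # well-separated groups: positive value
```

The broadcasted difference array needs O(n_a * n_b * n_features) memory, so this form is only suitable for small toy inputs.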
- Parameters:
- metric
Literal['edistance', 'euclidean', 'root_mean_squared_error', 'mse', 'mean_absolute_error', 'pearson_distance', 'cosine_distance', 'r2_distance', 'wasserstein'] (default: 'edistance') Distance metric to use.
- layer_key
str | None (default: None) Key in adata.layers for cell data. Mutually exclusive with obsm_key.
- obsm_key
str | None (default: None) Key in adata.obsm for embeddings. Mutually exclusive with layer_key. Defaults to "X_pca" if neither is specified.
Notes
The edistance bootstrap implementation differs from pertpy: rather than precomputing an n×n cell distance matrix and sampling from it, this implementation resamples cells and recomputes distances from scratch each iteration. This scales better for large datasets (O(n) vs O(n²) memory) and leverages multi-GPU parallelism for each bootstrap iteration.
"edistance" and "wasserstein" use multi-GPU (pairs are split across devices). Pseudobulk metrics aggregate cells into K group-mean vectors before computing distances, and the resulting K×K kernel is cheap enough on a single GPU that distributing it is not worth the cost. Passing multi_gpu=True for those metrics falls back to a single device with a warning.
Examples
>>> import rapids_singlecell as rsc
>>> distance = rsc.ptg.Distance(metric='edistance')
>>> result = distance.pairwise(adata, groupby='perturbation')
>>> # Direct computation on arrays
>>> d = distance(X, Y)
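The pseudobulk path described in the Notes (aggregate cells into K group-mean vectors, then compute a K×K matrix) can be sketched with NumPy. The data, labels and variable names are made up for illustration, and Euclidean distance stands in for whichever pseudobulk metric is selected.

```python
import numpy as np

rng = np.random.default_rng(0)
labels = np.array(["ctrl"] * 50 + ["geneA"] * 30 + ["geneB"] * 20)
cells = rng.normal(size=(labels.size, 8))  # 100 toy cells, 8 features

groups = np.unique(labels)  # the K group names, sorted
# Aggregate cells into one mean vector per group: shape (K, n_features).
means = np.stack([cells[labels == g].mean(axis=0) for g in groups])

# K x K Euclidean distance matrix between the group means.
diff = means[:, None, :] - means[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))
```

After aggregation the work depends only on K and the feature count, not on the number of cells, which is consistent with the note that distributing it across GPUs is not worthwhile.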
Methods table#
| bootstrap() | Compute bootstrap mean and variance for distance between two arrays. |
| contrast_distances() | Compute distances for contrasts. |
| create_contrasts() | Build a contrasts DataFrame for use with contrast_distances(). |
| onesided_distances() | Compute distances from one selected group to all other groups. |
| pairwise() | Compute pairwise distances between all cell groups. |
| validate_contrasts() | Validate a contrasts DataFrame against an AnnData object. |
Methods#
bootstrap#
- Distance.bootstrap(X, Y, *, n_bootstrap=100, random_state=0)[source]#
Compute bootstrap mean and variance for distance between two arrays.
This provides a pertpy-compatible API for bootstrap computation directly on arrays, without requiring an AnnData object.
- Parameters:
- X np.ndarray | cp.ndarray
First array of shape (n_samples_x, n_features)
- Y np.ndarray | cp.ndarray
Second array of shape (n_samples_y, n_features)
- n_bootstrap int (default: 100)
Number of bootstrap iterations
- random_state int (default: 0)
Random seed for reproducibility
- Return type:
MeanVar
- Returns:
result Named tuple containing mean and variance of bootstrapped distances
Examples
>>> from rapids_singlecell.ptg import Distance
>>> distance = Distance(metric='edistance')
>>> X = adata.obsm["X_pca"][adata.obs["group"] == "A"]
>>> Y = adata.obsm["X_pca"][adata.obs["group"] == "B"]
>>> result = distance.bootstrap(X, Y, n_bootstrap=100)
>>> print(f"Distance: {result.mean:.3f} ± {result.variance**0.5:.3f}")
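The resample-and-recompute strategy can be sketched on the CPU as follows. This is an illustration under stated assumptions, not the library's implementation: `MeanVar` here is a hypothetical stand-in for the returned named tuple, `mean_distance` is a toy metric, and the population variance (`ddof=0`) is an arbitrary choice.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for the library's MeanVar result type.
MeanVar = namedtuple("MeanVar", ["mean", "variance"])


def mean_distance(X, Y):
    """Toy metric: Euclidean distance between the two group means."""
    return float(np.linalg.norm(X.mean(axis=0) - Y.mean(axis=0)))


def bootstrap_distance(X, Y, metric, n_bootstrap=100, random_state=0):
    rng = np.random.default_rng(random_state)
    values = np.empty(n_bootstrap)
    for i in range(n_bootstrap):
        # Resample rows with replacement, then recompute the distance
        # from scratch; no n x n distance matrix is stored.
        xb = X[rng.integers(0, len(X), size=len(X))]
        yb = Y[rng.integers(0, len(Y), size=len(Y))]
        values[i] = metric(xb, yb)
    return MeanVar(mean=float(values.mean()), variance=float(values.var()))


rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(100, 4))
Y = rng.normal(2.0, 1.0, size=(120, 4))
result = bootstrap_distance(X, Y, mean_distance, n_bootstrap=50)
```

Because the generator is seeded from `random_state`, repeated calls with the same arguments return the same mean and variance.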
contrast_distances#
- Distance.contrast_distances(adata, contrasts, *, multi_gpu=None)[source]#
Compute distances for contrasts.
Accepts a DataFrame (from create_contrasts() or constructed manually) with the following layout:
First column: the groupby column (target values to compare)
reference column: the control value in the groupby column
Other columns: split-by filters (e.g., cell type)
- Parameters:
- adata
AnnData Annotated data matrix
- contrasts
DataFrame DataFrame with a groupby column, a reference column, and optional split columns.
- multi_gpu
bool | list[int] | str | None (default: None) GPU selection:
  - None: Use all GPUs if the metric supports it, else GPU 0 (default)
  - True: Use all available GPUs
  - False: Use only GPU 0
  - list[int]: Use specific GPU IDs (e.g., [0, 2])
  - str: Comma-separated GPU IDs (e.g., "0,2")
- Return type:
DataFrame
- Returns:
pd.DataFrame Copy of the input DataFrame with an added distance column.
Examples
>>> from rapids_singlecell.ptg import Distance
>>> distance = Distance(metric='edistance')
>>> # Using create_contrasts helper
>>> contrasts = Distance.create_contrasts(
...     adata, groupby="target_gene", selected_group="Non_target",
...     split_by="group_name",
... )
>>> result = distance.contrast_distances(adata, contrasts=contrasts)
>>> # Manual DataFrame construction
>>> import pandas as pd
>>> contrasts = pd.DataFrame({
...     "target_gene": ["Irf7", "Ski"],
...     "reference": ["Non_target", "Non_target"],
...     "group_name": ["CD4", "CD4"],
... })
>>> result = distance.contrast_distances(adata, contrasts)
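To make the contrasts layout concrete, the following pandas sketch shows one plausible reading of how a row selects cells: the split-by columns filter both sides, then the first column picks the target cells and the reference column picks the control cells. The `obs` table is a toy stand-in for adata.obs, and the selection logic is an assumption for illustration rather than the library's code.

```python
import pandas as pd

# Toy stand-in for adata.obs.
obs = pd.DataFrame({
    "target_gene": ["Irf7", "Irf7", "Non_target", "Non_target", "Ski", "Non_target"],
    "group_name": ["CD4", "CD8", "CD4", "CD8", "CD4", "CD4"],
})

contrasts = pd.DataFrame({
    "target_gene": ["Irf7", "Ski"],
    "reference": ["Non_target", "Non_target"],
    "group_name": ["CD4", "CD4"],
})

groupby = contrasts.columns[0]  # first column is the groupby column
split_cols = [c for c in contrasts.columns if c not in (groupby, "reference")]

cell_counts = []
for _, row in contrasts.iterrows():
    # Split-by columns act as filters applied to both sides of the comparison.
    in_split = pd.Series(True, index=obs.index)
    for col in split_cols:
        in_split &= obs[col] == row[col]
    target_mask = in_split & (obs[groupby] == row[groupby])
    reference_mask = in_split & (obs[groupby] == row["reference"])
    cell_counts.append((int(target_mask.sum()), int(reference_mask.sum())))
```

Each entry of `cell_counts` holds the number of target and reference cells that a distance for that row would be computed between.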
create_contrasts#
- static Distance.create_contrasts(adata, groupby, selected_group, *, groups=None, split_by=None)[source]#
Build a contrasts DataFrame for use with contrast_distances().
Each row represents one contrast: comparing a group against the reference, optionally within each level of the split_by columns. The resulting DataFrame can be filtered or modified before passing to contrast_distances().
The output layout is:
First column (groupby): the target values to compare
reference column: the control value in the groupby column
Remaining columns (split_by): stratification filters
- Parameters:
- adata
AnnData Annotated data matrix
- groupby
str Column in adata.obs whose levels are compared against selected_group
- selected_group
str | Sequence[str] The reference (control) value(s) in the groupby column. When a sequence is passed, each target is compared against every reference, producing one row per (target, reference) combination.
- groups
Sequence[str] | None (default: None) Specific groups to include. If None, all non-reference groups are included.
- split_by
str | Sequence[str] | None (default: None) Column(s) in adata.obs to stratify by. If provided, contrasts are computed within each unique combination of these columns. Only combinations where the reference group exists are included.
- Return type:
DataFrame
- Returns:
pd.DataFrame One row per contrast. First column is groupby, then reference, then any split_by columns.
Examples
>>> from rapids_singlecell.ptg import Distance
>>> # All targets vs control, ignoring celltype
>>> contrasts = Distance.create_contrasts(
...     adata, groupby="target_gene", selected_group="Non_target"
... )
>>> # Multiple references
>>> contrasts = Distance.create_contrasts(
...     adata, groupby="target_gene",
...     selected_group=["Non_target", "Scramble"],
... )
>>> # Stratified by celltype
>>> contrasts = Distance.create_contrasts(
...     adata, groupby="target_gene", selected_group="Non_target",
...     split_by="group_name",
... )
>>> # Filter before computing
>>> distance = Distance(metric='edistance')
>>> contrasts = contrasts[contrasts["group_name"] != "rare_type"]
>>> result = distance.contrast_distances(adata, contrasts=contrasts)
>>> # Manual construction (no helper needed)
>>> import pandas as pd
>>> contrasts = pd.DataFrame({
...     "target_gene": ["Irf7", "Ski"],
...     "reference": ["Non_target", "Non_target"],
...     "group_name": ["CD4", "CD4"],
... })
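The rule that only split_by combinations containing the reference group are kept can be illustrated with a small pandas sketch. `build_contrasts` is a hypothetical helper written for this example; it handles a single reference and a single split column, and is not the library's implementation.

```python
import pandas as pd

# Toy stand-in for adata.obs; the CD8 stratum has no reference cells.
obs = pd.DataFrame({
    "target_gene": ["Irf7", "Non_target", "Ski", "Irf7", "Ski"],
    "group_name": ["CD4", "CD4", "CD4", "CD8", "CD8"],
})


def build_contrasts(obs, groupby, reference, split_by):
    """Hypothetical helper: one row per (target, split level) with a reference."""
    rows = []
    for level, sub in obs.groupby(split_by, sort=True):
        present = set(sub[groupby])
        if reference not in present:
            continue  # skip strata without reference cells
        for target in sorted(present - {reference}):
            rows.append({groupby: target, "reference": reference, split_by: level})
    return pd.DataFrame(rows, columns=[groupby, "reference", split_by])


contrasts = build_contrasts(obs, "target_gene", "Non_target", "group_name")
```

In this toy input the CD8 stratum is dropped entirely, so the result contains only the two CD4 contrasts.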
onesided_distances#
- Distance.onesided_distances(adata, groupby, selected_group, *, groups=None, bootstrap=False, n_bootstrap=100, random_state=0, multi_gpu=None)[source]#
Compute distances from one selected group to all other groups.
- Parameters:
- adata
AnnData Annotated data matrix
- groupby
str Key in adata.obs for grouping cells
- selected_group
Sequence[str] | str Reference group to compute distances from
- groups
Sequence[str] | None (default: None) Specific groups to compute distances to (if None, use all)
- bootstrap
bool (default: False) Whether to compute bootstrap variance estimates
- n_bootstrap
int (default: 100) Number of bootstrap iterations (if bootstrap=True)
- random_state
int (default: 0) Random seed for reproducibility
- multi_gpu
bool | list[int] | str | None (default: None) GPU selection:
  - None: Use all GPUs if the metric supports it, else GPU 0 (default)
  - True: Use all available GPUs
  - False: Use only GPU 0
  - list[int]: Use specific GPU IDs (e.g., [0, 2])
  - str: Comma-separated GPU IDs (e.g., "0,2")
- Return type:
Series | DataFrame | tuple[Series, Series] | tuple[DataFrame, DataFrame]
- Returns:
distances Series containing distances from selected_group to all other groups. If bootstrap=True, returns tuple of (distances, distances_var).
Examples
>>> from rapids_singlecell.ptg import Distance
>>> distance = Distance(metric='edistance')
>>> distances = distance.onesided_distances(
...     adata, groupby='condition', selected_group='control'
... )
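The accepted forms of the multi_gpu argument could be normalised to a list of device IDs roughly as shown here. `resolve_devices` and its parameters are hypothetical names for illustration; the sketch omits validation of the IDs and the single-device fallback warning that the class Notes describe for pseudobulk metrics.

```python
def resolve_devices(multi_gpu, n_available, metric_supports_multi_gpu=True):
    """Hypothetical helper mapping a multi_gpu argument to device IDs."""
    if multi_gpu is None:
        # Default: all GPUs if the metric supports it, else GPU 0.
        use_all = metric_supports_multi_gpu
    elif isinstance(multi_gpu, bool):
        use_all = multi_gpu
    elif isinstance(multi_gpu, str):
        # Comma-separated IDs such as "0,2".
        return [int(part) for part in multi_gpu.split(",") if part.strip()]
    else:
        # Explicit list of IDs such as [0, 2].
        return [int(i) for i in multi_gpu]
    return list(range(n_available)) if use_all else [0]


all_devices = resolve_devices(None, n_available=4)
from_string = resolve_devices("0,2", n_available=4)
```

The bool check comes before the generic sequence branch so that True and False are never treated as device IDs.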
pairwise#
- Distance.pairwise(adata, groupby, *, groups=None, bootstrap=False, n_bootstrap=100, random_state=0, multi_gpu=None)[source]#
Compute pairwise distances between all cell groups.
- Parameters:
- adata
AnnData Annotated data matrix
- groupby
str Key in adata.obs for grouping cells
- groups
Sequence[str] | None (default: None) Specific groups to compute (if None, use all)
- bootstrap
bool (default: False) Whether to compute bootstrap variance estimates
- n_bootstrap
int (default: 100) Number of bootstrap iterations (if bootstrap=True)
- random_state
int (default: 0) Random seed for reproducibility
- multi_gpu
bool | list[int] | str | None (default: None) GPU selection:
  - None: Use all GPUs if the metric supports it, else GPU 0 (default)
  - True: Use all available GPUs
  - False: Use only GPU 0
  - list[int]: Use specific GPU IDs (e.g., [0, 2])
  - str: Comma-separated GPU IDs (e.g., "0,2")
- Returns:
result DataFrame with pairwise distances. If bootstrap=True, returns tuple of (distances, distances_var) DataFrames.
Examples
>>> from rapids_singlecell.ptg import Distance
>>> distance = Distance(metric='edistance')
>>> result = distance.pairwise(adata, groupby='condition')
validate_contrasts#
- static Distance.validate_contrasts(adata, contrasts)[source]#
Validate a contrasts DataFrame against an AnnData object.
Expects the DataFrame layout produced by create_contrasts(): the first column is the groupby column, the reference column contains the control value, and the remaining columns are split-by filters.
- Parameters:
- adata
AnnData Annotated data matrix
- contrasts
DataFrame Contrasts DataFrame to validate.
- Raises:
ValueError – If validation fails.
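The page does not list the individual checks, so the following is only a guess at what validating this layout might involve: a reference column must be present, every column must map to a column of adata.obs, and every value must occur there. `check_contrasts` is a hypothetical function operating on a toy `obs` table, not the library's validator.

```python
import pandas as pd


def check_contrasts(obs, contrasts):
    """Hypothetical validation: raise ValueError on the first problem found."""
    if "reference" not in contrasts.columns:
        raise ValueError("contrasts must contain a 'reference' column")
    groupby = contrasts.columns[0]
    for col in contrasts.columns:
        # Reference values live in the groupby column of obs.
        obs_col = groupby if col == "reference" else col
        if obs_col not in obs.columns:
            raise ValueError(f"column {obs_col!r} not found in obs")
        unknown = set(contrasts[col]) - set(obs[obs_col])
        if unknown:
            raise ValueError(f"unknown values in {col!r}: {sorted(unknown)}")


obs = pd.DataFrame({
    "target_gene": ["Irf7", "Non_target", "Ski"],
    "group_name": ["CD4", "CD4", "CD4"],
})
good = pd.DataFrame({
    "target_gene": ["Irf7"], "reference": ["Non_target"], "group_name": ["CD4"],
})
bad = good.assign(group_name=["CD8"])  # CD8 does not occur in obs

check_contrasts(obs, good)  # passes silently
try:
    check_contrasts(obs, bad)
    raised = False
except ValueError:
    raised = True
```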