rapids_singlecell.ptg.Distance#
- class rapids_singlecell.ptg.Distance(metric='edistance', layer_key=None, obsm_key=None, **kwargs)[source]#
GPU-accelerated distance computation between groups of cells.
API compatible with pertpy’s Distance class.
Currently supported metrics:
"edistance": Energy distance (default).Twice the mean pairwise distance between cells of two groups minus the mean pairwise distance between cells within each group. See Peidli et al. (2023). Accepts dense embeddings (e.g.
obsm_key="X_pca") or sparse CSR expression data (a sparse layer orlayer_key="X"), which is densified inside the kernel rather than on the host.
"euclidean"and"root_mean_squared_error": Euclidean distancebetween group mean vectors.
"mse": Mean squared distance between group mean vectors."mean_absolute_error": Mean absolute distance between group meanvectors.
"pearson_distance": Pearson distance between group mean vectors."cosine_distance": Cosine distance between group mean vectors."r2_distance": One minus the coefficient of determination betweengroup mean vectors.
"wasserstein": Entropy-regularized 2-Wasserstein via Sinkhorn.Squared-Euclidean ground cost; per-pair auto-epsilon defaulting to
0.05 * std(C)to match OTT-JAX. Returns OTT’sreg_ot_costvalue.
- Parameters:
- metric
Literal['edistance','euclidean','root_mean_squared_error','mse','mean_absolute_error','pearson_distance','cosine_distance','r2_distance','wasserstein'] (default:'edistance') Distance metric to use.
- layer_key
str|None(default:None) Key in adata.layers for cell data, or
"X"to useadata.X. Mutually exclusive withobsm_key.- obsm_key
str|None(default:None) Key in adata.obsm for embeddings. Mutually exclusive with
layer_key. Defaults to"X_pca"if neither is specified.
- metric
Notes
The
edistancebootstrap implementation differs from pertpy: rather than precomputing an n×n cell distance matrix and sampling from it, this implementation resamples cells and recomputes distances from scratch each iteration. This scales better for large datasets (O(n) vs O(n²) memory) and leverages multi-GPU parallelism for each bootstrap iteration."edistance"and"wasserstein"use multi-GPU (pairs are split across devices). Pseudobulk metrics aggregate cells into K group-mean vectors before computing distances, and the resulting K×K kernel is cheap enough on a single GPU that distributing it is not worth the cost. Passingmulti_gpu=Truefor those metrics falls back to a single device with a warning.Examples
>>> import rapids_singlecell as rsc >>> distance = rsc.ptg.Distance(metric='edistance') >>> result = distance.pairwise(adata, groupby='perturbation')
>>> # Direct computation on arrays >>> d = distance(X, Y)
Methods table#
|
Compute bootstrap mean and variance for distance between two arrays. |
|
Compute distances for contrasts. |
|
Build a contrasts DataFrame for use with |
|
Compute distances from one selected group to all other groups. |
|
Compute pairwise distances between all cell groups. |
|
Validate a contrasts DataFrame against an AnnData object. |
Methods#
bootstrap#
- Distance.bootstrap(X, Y, *, n_bootstrap=100, random_state=0)[source]#
Compute bootstrap mean and variance for distance between two arrays.
This provides pertpy-compatible API for bootstrap computation directly on arrays without requiring an AnnData object.
- Parameters:
- X np.ndarray | cp.ndarray
First array of shape (n_samples_x, n_features)
- Y np.ndarray | cp.ndarray
Second array of shape (n_samples_y, n_features)
- n_bootstrap int (default:
100) Number of bootstrap iterations
- random_state int (default:
0) Random seed for reproducibility
- Return type:
MeanVar
- Returns:
result Named tuple containing mean and variance of bootstrapped distances
Examples
>>> distance = Distance(metric='edistance') >>> X = adata.obsm["X_pca"][adata.obs["group"] == "A"] >>> Y = adata.obsm["X_pca"][adata.obs["group"] == "B"] >>> result = distance.bootstrap(X, Y, n_bootstrap=100) >>> print(f"Distance: {result.mean:.3f} ± {result.variance**0.5:.3f}")
contrast_distances#
- Distance.contrast_distances(adata, contrasts, *, multi_gpu=None)[source]#
Compute distances for contrasts.
Accepts a DataFrame (from
create_contrasts()or constructed manually) with the following layout:First column: the groupby column (target values to compare)
``reference`` column: the control value in the groupby column
Other columns: split-by filters (e.g., cell type)
- Parameters:
- adata
AnnData Annotated data matrix
- contrasts
DataFrame DataFrame with a groupby column, a
referencecolumn, and optional split columns.- multi_gpu
bool|list[int] |str|None(default:None) GPU selection: - None: Use all GPUs if metric supports it, else GPU 0 (default) - True: Use all available GPUs - False: Use only GPU 0 - list[int]: Use specific GPU IDs (e.g., [0, 2]) - str: Comma-separated GPU IDs (e.g., “0,2”)
- adata
- Return type:
- Returns:
pd.DataFrame Copy of the input DataFrame with an added distance column.
Examples
>>> distance = Distance(metric='edistance')
>>> # Using create_contrasts helper >>> contrasts = Distance.create_contrasts( ... adata, groupby="target_gene", selected_group="Non_target", ... split_by="group_name", ... ) >>> result = distance.contrast_distances(adata, contrasts=contrasts)
>>> # Manual DataFrame construction >>> import pandas as pd >>> contrasts = pd.DataFrame({ ... "target_gene": ["Irf7", "Ski"], ... "reference": ["Non_target", "Non_target"], ... "group_name": ["CD4", "CD4"], ... }) >>> result = distance.contrast_distances(adata, contrasts)
create_contrasts#
- static Distance.create_contrasts(adata, groupby, selected_group, *, groups=None, split_by=None)[source]#
Build a contrasts DataFrame for use with
contrast_distances().Each row represents one contrast: comparing a group against the reference, optionally within each level of
split_bycolumns. The resulting DataFrame can be filtered or modified before passing tocontrast_distances().The output layout is:
First column (
groupby): the target values to compare``reference`` column: the control value in the groupby column
Remaining columns (
split_by): stratification filters
- Parameters:
- adata
AnnData Annotated data matrix
- groupby
str Column in
adata.obswhose levels are compared againstselected_group- selected_group
str|Sequence[str] The reference (control) value(s) in the
groupbycolumn. When a sequence is passed, each target is compared against every reference, producing one row per (target, reference) combination.- groups
Sequence[str] |None(default:None) Specific groups to include. If None, all non-reference groups are included.
- split_by
str|Sequence[str] |None(default:None) Column(s) in
adata.obsto stratify by. If provided, contrasts are computed within each unique combination of these columns. Only combinations where the reference group exists are included.
- adata
- Return type:
- Returns:
pd.DataFrame One row per contrast. First column is
groupby, thenreference, then anysplit_bycolumns.
Examples
>>> # All targets vs control, ignoring celltype >>> contrasts = Distance.create_contrasts( ... adata, groupby="target_gene", selected_group="Non_target" ... )
>>> # Multiple references >>> contrasts = Distance.create_contrasts( ... adata, groupby="target_gene", ... selected_group=["Non_target", "Scramble"], ... )
>>> # Stratified by celltype >>> contrasts = Distance.create_contrasts( ... adata, groupby="target_gene", selected_group="Non_target", ... split_by="group_name", ... )
>>> # Filter before computing >>> contrasts = contrasts[contrasts["group_name"] != "rare_type"] >>> result = distance.contrast_distances(adata, contrasts=contrasts)
>>> # Manual construction (no helper needed) >>> import pandas as pd >>> contrasts = pd.DataFrame({ ... "target_gene": ["Irf7", "Ski"], ... "reference": ["Non_target", "Non_target"], ... "group_name": ["CD4", "CD4"], ... })
onesided_distances#
- Distance.onesided_distances(adata, groupby, selected_group, *, groups=None, bootstrap=False, n_bootstrap=100, random_state=0, multi_gpu=None)[source]#
Compute distances from one selected group to all other groups.
- Parameters:
- adata
AnnData Annotated data matrix
- groupby
str Key in adata.obs for grouping cells
- selected_group
Sequence[str] |str Reference group to compute distances from
- groups
Sequence[str] |None(default:None) Specific groups to compute distances to (if None, use all)
- bootstrap
bool(default:False) Whether to compute bootstrap variance estimates
- n_bootstrap
int(default:100) Number of bootstrap iterations (if bootstrap=True)
- random_state
int(default:0) Random seed for reproducibility
- multi_gpu
bool|list[int] |str|None(default:None) GPU selection: - None: Use all GPUs if metric supports it, else GPU 0 (default) - True: Use all available GPUs - False: Use only GPU 0 - list[int]: Use specific GPU IDs (e.g., [0, 2]) - str: Comma-separated GPU IDs (e.g., “0,2”)
- adata
- Return type:
Series|DataFrame|tuple[Series,Series] |tuple[DataFrame,DataFrame]- Returns:
distances Series containing distances from selected_group to all other groups. If bootstrap=True, returns tuple of (distances, distances_var).
Examples
>>> distance = Distance(metric='edistance') >>> distances = distance.onesided_distances( ... adata, groupby='condition', selected_group='control' ... )
pairwise#
- Distance.pairwise(adata, groupby, *, groups=None, bootstrap=False, n_bootstrap=100, random_state=0, multi_gpu=None)[source]#
Compute pairwise distances between all cell groups.
- Parameters:
- adata
AnnData Annotated data matrix
- groupby
str Key in adata.obs for grouping cells
- groups
Sequence[str] |None(default:None) Specific groups to compute (if None, use all)
- bootstrap
bool(default:False) Whether to compute bootstrap variance estimates
- n_bootstrap
int(default:100) Number of bootstrap iterations (if bootstrap=True)
- random_state
int(default:0) Random seed for reproducibility
- multi_gpu
bool|list[int] |str|None(default:None) GPU selection: - None: Use all GPUs if metric supports it, else GPU 0 (default) - True: Use all available GPUs - False: Use only GPU 0 - list[int]: Use specific GPU IDs (e.g., [0, 2]) - str: Comma-separated GPU IDs (e.g., “0,2”)
- adata
- Returns:
result DataFrame with pairwise distances. If bootstrap=True, returns tuple of (distances, distances_var) DataFrames.
Examples
>>> distance = Distance(metric='edistance') >>> result = distance.pairwise(adata, groupby='condition')
validate_contrasts#
- static Distance.validate_contrasts(adata, contrasts)[source]#
Validate a contrasts DataFrame against an AnnData object.
Expects the DataFrame layout produced by
create_contrasts(): first column is the groupby column,referencecolumn contains the control value, remaining columns are split-by filters.- Parameters:
- Raises:
ValueError – If validation fails.
- Return type: