pertpy-GPU: ptg

pertpy provides tools for perturbation analysis [HJM+25]. rapids_singlecell.ptg accelerates some of these methods.

Distance

class rapids_singlecell.ptg.Distance(metric='edistance', layer_key=None, obsm_key=None, **kwargs)

GPU-accelerated distance computation between groups of cells.

API compatible with pertpy’s Distance class.

Currently supported metrics:

"edistance": Energy distance (default).
Twice the mean pairwise distance between cells of the two groups, minus the mean pairwise distance between cells within each group (one such term per group). See Peidli et al. (2023). Accepts dense embeddings (e.g. obsm_key="X_pca") or sparse CSR expression data (a sparse layer, or layer_key="X"), which is densified inside the kernel rather than on the host.
"euclidean" and "root_mean_squared_error": Euclidean distance between group mean vectors.
"mse": Mean squared distance between group mean vectors.
"mean_absolute_error": Mean absolute distance between group mean vectors.
"pearson_distance": Pearson distance (one minus the Pearson correlation) between group mean vectors.
"cosine_distance": Cosine distance (one minus the cosine similarity) between group mean vectors.
"r2_distance": One minus the coefficient of determination between group mean vectors.
"wasserstein": Entropy-regularized 2-Wasserstein distance computed with Sinkhorn iterations.
Uses a squared-Euclidean ground cost C and sets epsilon automatically for each pair to 0.05 * std(C), matching the OTT-JAX default. Returns OTT's reg_ot_cost value.
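The two cell-level metrics can be illustrated with a small NumPy sketch. This is not the library's CUDA code: `energy_distance` follows the edistance definition above, with a `squared` flag because this page does not state whether cell-to-cell distances are squared, and `sinkhorn_cost` is a textbook log-domain Sinkhorn loop with uniform cell weights, a squared-Euclidean cost C and epsilon = 0.05 * std(C). The function names, the uniform weights, the stopping rule and the KL-to-product-measure regularizer are assumptions; the returned dual value equals the entropy-regularized cost at convergence but is not guaranteed to reproduce OTT's reg_ot_cost digit for digit.

```python
import numpy as np


def _sq_dists(A, B):
    """Squared Euclidean distance between every row of A and every row of B."""
    d2 = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.maximum(d2, 0.0)  # clip tiny negatives caused by rounding


def energy_distance(X, Y, squared=True):
    """2 * mean d(x, y) - mean d(x, x') - mean d(y, y'); self-pairs are included."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)

    def d(A, B):
        d2 = _sq_dists(A, B)
        return d2 if squared else np.sqrt(d2)

    return float(2.0 * d(X, Y).mean() - d(X, X).mean() - d(Y, Y).mean())


def _logsumexp(M, axis):
    m = M.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(M - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def sinkhorn_cost(X, Y, rel_eps=0.05, max_iter=2000, tol=1e-6):
    """Entropy-regularized OT cost between two point clouds with uniform weights."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    C = _sq_dists(X, Y)  # squared-Euclidean ground cost
    eps = rel_eps * C.std()  # per-pair auto-epsilon: 0.05 * std(C)
    if eps <= 0:
        raise ValueError("constant cost matrix: epsilon would be zero")
    log_a = np.full(len(X), -np.log(len(X)))  # uniform weights on the cells of X
    log_b = np.full(len(Y), -np.log(len(Y)))  # uniform weights on the cells of Y
    f, g = np.zeros(len(X)), np.zeros(len(Y))
    for _ in range(max_iter):
        # Alternating exact updates of the dual potentials (log-domain Sinkhorn).
        f = -eps * _logsumexp(log_b[None, :] + (g[None, :] - C) / eps, axis=1)
        g = -eps * _logsumexp(log_a[:, None] + (f[:, None] - C) / eps, axis=0)
        # Columns now match b exactly; stop once the rows also match a.
        log_P = log_a[:, None] + log_b[None, :] + (f[:, None] + g[None, :] - C) / eps
        if np.abs(np.exp(_logsumexp(log_P, axis=1)) - np.exp(log_a)).sum() < tol:
            break
    # Dual objective <f, a> + <g, b>; equals the regularized primal cost at convergence.
    return float(f @ np.exp(log_a) + g @ np.exp(log_b))
```

Both functions materialize full cell-by-cell matrices, so they are only suitable for small groups.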

Parameters:

metric Literal['edistance', 'euclidean', 'root_mean_squared_error', 'mse', 'mean_absolute_error', 'pearson_distance', 'cosine_distance', 'r2_distance', 'wasserstein'] (default: 'edistance'): Distance metric to use.
layer_key str | None (default: None): Key in adata.layers for cell data, or "X" to use adata.X. Mutually exclusive with obsm_key.
obsm_key str | None (default: None): Key in adata.obsm for embeddings. Mutually exclusive with layer_key. Defaults to "X_pca" if neither is specified.

Notes

The edistance bootstrap implementation differs from pertpy: rather than precomputing an n×n cell distance matrix and sampling from it, this implementation resamples cells and recomputes distances from scratch each iteration. This scales better for large datasets (O(n) vs O(n²) memory) and leverages multi-GPU parallelism for each bootstrap iteration.
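The resampling strategy described above can be sketched as a plain loop. In this hypothetical CPU version, `MeanVar` is a stand-in named tuple with the `mean` and `variance` fields used in the bootstrap() example on this page, `dist_fn` can be any two-group distance, and reporting the population variance (ddof=0) is an assumption. The loop only illustrates resampling cells and recomputing from scratch; the library runs the iterations on the GPU and spreads them across devices.

```python
from typing import Callable, NamedTuple

import numpy as np


class MeanVar(NamedTuple):
    # Hypothetical stand-in for the named tuple returned by bootstrap().
    mean: float
    variance: float


def bootstrap_distance(X, Y, dist_fn: Callable, n_bootstrap=100, random_state=0) -> MeanVar:
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(random_state)
    values = np.empty(n_bootstrap)
    for i in range(n_bootstrap):
        # Resample cells with replacement within each group ...
        xi = rng.integers(0, len(X), size=len(X))
        yi = rng.integers(0, len(Y), size=len(Y))
        # ... and recompute the distance; no precomputed n x n matrix is sampled.
        values[i] = dist_fn(X[xi], Y[yi])
    return MeanVar(float(values.mean()), float(values.var()))
```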

"edistance" and "wasserstein" use multi-GPU (pairs are split across devices). Pseudobulk metrics aggregate cells into K group-mean vectors before computing distances, and the resulting K×K kernel is cheap enough on a single GPU that distributing it is not worth the cost. Passing multi_gpu=True for those metrics falls back to a single device with a warning.

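The pseudobulk metrics reduce to a few lines once the group means are known. The following hypothetical NumPy sketch computes one mean vector per group and fills the K×K matrix, which shows why this step is cheap. Treating the first group's mean as the reference for r2_distance is an assumption, since R² is not symmetric.

```python
import numpy as np


def group_means(X, labels):
    """Collapse cells into one mean ("pseudobulk") vector per group."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    groups = list(dict.fromkeys(labels.tolist()))  # order of first appearance
    return groups, np.stack([X[labels == g].mean(axis=0) for g in groups])


def pseudobulk_distance(x, y, metric):
    diff = x - y
    if metric in ("euclidean", "root_mean_squared_error"):
        return float(np.linalg.norm(diff))
    if metric == "mse":
        return float(np.mean(diff**2))
    if metric == "mean_absolute_error":
        return float(np.mean(np.abs(diff)))
    if metric == "pearson_distance":
        return float(1.0 - np.corrcoef(x, y)[0, 1])
    if metric == "cosine_distance":
        return float(1.0 - x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
    if metric == "r2_distance":
        # 1 - R^2 = SS_res / SS_tot, with x as the reference (an assumption).
        return float(np.sum(diff**2) / np.sum((x - x.mean()) ** 2))
    raise ValueError(f"unsupported metric: {metric}")


def pairwise_pseudobulk(X, labels, metric):
    groups, M = group_means(X, labels)
    D = np.zeros((len(groups), len(groups)))
    for i in range(len(groups)):  # K x K loop over mean vectors only
        for j in range(len(groups)):
            D[i, j] = pseudobulk_distance(M[i], M[j], metric)
    return groups, D
```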
Examples

>>> import rapids_singlecell as rsc
>>> distance = rsc.ptg.Distance(metric='edistance')
>>> result = distance.pairwise(adata, groupby='perturbation')

>>> # Direct computation on arrays
>>> d = distance(X, Y)

Methods

`pairwise`(adata, groupby, *[, groups, ...])	Compute pairwise distances between all cell groups.
`onesided_distances`(adata, groupby, ...[, ...])	Compute distances from one selected group to all other groups.
`contrast_distances`(adata, contrasts, *[, ...])	Compute distances for contrasts.
`create_contrasts`(adata, groupby, ...[, ...])	Build a contrasts DataFrame for use with `contrast_distances()`.
`bootstrap`(X, Y, *[, n_bootstrap, random_state])	Compute bootstrap mean and variance for distance between two arrays.

__call__(X, Y)

Compute distance between two cell groups directly from arrays.

This provides a pertpy-compatible API for direct distance computation.

Parameters:

X np.ndarray | cp.ndarray: First array of shape (n_samples_x, n_features)
Y np.ndarray | cp.ndarray: Second array of shape (n_samples_y, n_features)

Return type:

float

Returns:

float Distance between X and Y

Examples

>>> distance = Distance(metric='edistance')
>>> X = adata.obsm["X_pca"][adata.obs["group"] == "A"]
>>> Y = adata.obsm["X_pca"][adata.obs["group"] == "B"]
>>> d = distance(X, Y)

pairwise(adata, groupby, *, groups=None, bootstrap=False, n_bootstrap=100, random_state=0, multi_gpu=None)

Compute pairwise distances between all cell groups.

Parameters:

adata AnnData: Annotated data matrix
groupby str: Key in adata.obs for grouping cells
groups Sequence[str] | None (default: None): Specific groups to compute (if None, use all)
bootstrap bool (default: False): Whether to compute bootstrap variance estimates
n_bootstrap int (default: 100): Number of bootstrap iterations (if bootstrap=True)
random_state int (default: 0): Random seed for reproducibility
multi_gpu bool | list[int] | str | None (default: None): GPU selection. None: use all GPUs if the metric supports it, otherwise GPU 0 (default); True: use all available GPUs; False: use only GPU 0; list[int]: use specific GPU IDs (e.g. [0, 2]); str: comma-separated GPU IDs (e.g. "0,2").
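How the accepted multi_gpu values map to devices can be sketched as follows. Both helpers are hypothetical: falling back to the first listed device for metrics without multi-GPU support, and splitting group pairs across devices round-robin, are assumptions used only to illustrate the options listed above.

```python
import warnings


def resolve_gpu_ids(multi_gpu, n_available, metric_supports_multi_gpu=True):
    """Hypothetical helper: turn a multi_gpu argument into a list of device IDs."""
    if multi_gpu is None:
        ids = list(range(n_available)) if metric_supports_multi_gpu else [0]
    elif multi_gpu is True:
        ids = list(range(n_available))
    elif multi_gpu is False:
        ids = [0]
    elif isinstance(multi_gpu, str):
        ids = [int(part) for part in multi_gpu.split(",") if part.strip()]  # "0,2"
    else:
        ids = [int(i) for i in multi_gpu]  # e.g. [0, 2]
    if len(ids) > 1 and not metric_supports_multi_gpu:
        # Pseudobulk metrics: fall back to one device (assumed: the first listed).
        warnings.warn("metric does not support multi-GPU; using a single device")
        ids = ids[:1]
    return ids


def split_pairs(pairs, gpu_ids):
    """Assign group pairs to devices round-robin (an assumed strategy)."""
    return {gpu: pairs[k :: len(gpu_ids)] for k, gpu in enumerate(gpu_ids)}
```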

Returns:

result DataFrame with pairwise distances. If bootstrap=True, returns a tuple of (distances, distances_var) DataFrames.

Examples

>>> distance = Distance(metric='edistance')
>>> result = distance.pairwise(adata, groupby='condition')

onesided_distances(adata, groupby, selected_group, *, groups=None, bootstrap=False, n_bootstrap=100, random_state=0, multi_gpu=None)

Compute distances from one selected group to all other groups.

Parameters:

adata AnnData: Annotated data matrix
groupby str: Key in adata.obs for grouping cells
selected_group Sequence[str] | str: Reference group to compute distances from
groups Sequence[str] | None (default: None): Specific groups to compute distances to (if None, use all)
bootstrap bool (default: False): Whether to compute bootstrap variance estimates
n_bootstrap int (default: 100): Number of bootstrap iterations (if bootstrap=True)
random_state int (default: 0): Random seed for reproducibility
multi_gpu bool | list[int] | str | None (default: None): GPU selection. None: use all GPUs if the metric supports it, otherwise GPU 0 (default); True: use all available GPUs; False: use only GPU 0; list[int]: use specific GPU IDs (e.g. [0, 2]); str: comma-separated GPU IDs (e.g. "0,2").

Return type:

Series | DataFrame | tuple[Series, Series] | tuple[DataFrame, DataFrame]

Returns:

distances Series containing distances from selected_group to all other groups. If bootstrap=True, returns a tuple of (distances, distances_var).

Examples

>>> distance = Distance(metric='edistance')
>>> distances = distance.onesided_distances(
...     adata, groupby='condition', selected_group='control'
... )

contrast_distances(adata, contrasts, *, multi_gpu=None)

Compute distances for contrasts.

Accepts a DataFrame (from create_contrasts() or constructed manually) with the following layout:

First column: the groupby column (target values to compare)
`reference` column: the control value in the groupby column
Other columns: split-by filters (e.g., cell type)

Parameters:

adata AnnData: Annotated data matrix
contrasts DataFrame: DataFrame with a groupby column, a reference column, and optional split columns.
multi_gpu bool | list[int] | str | None (default: None): GPU selection. None: use all GPUs if the metric supports it, otherwise GPU 0 (default); True: use all available GPUs; False: use only GPU 0; list[int]: use specific GPU IDs (e.g. [0, 2]); str: comma-separated GPU IDs (e.g. "0,2").

Return type:

DataFrame

Returns:

pd.DataFrame Copy of the input DataFrame with an added distance column.
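Evaluating a contrasts table amounts to selecting, for each row, the target cells and the reference cells within that row's split-by stratum. The following hypothetical CPU sketch takes the obs table and a cell matrix directly instead of an AnnData object; the name of the added column ("distance") and the row-by-row evaluation are assumptions.

```python
import numpy as np
import pandas as pd


def contrast_distances_sketch(obs: pd.DataFrame, X, contrasts: pd.DataFrame, dist_fn):
    """Hypothetical CPU version: one distance per row of a contrasts table."""
    groupby = contrasts.columns[0]  # first column holds the target values
    split_cols = [c for c in contrasts.columns[1:] if c != "reference"]
    X = np.asarray(X, dtype=float)
    dists = []
    for _, row in contrasts.iterrows():
        in_split = np.ones(len(obs), dtype=bool)
        for col in split_cols:  # restrict to the row's stratum, e.g. one cell type
            in_split &= (obs[col] == row[col]).to_numpy()
        target = in_split & (obs[groupby] == row[groupby]).to_numpy()
        reference = in_split & (obs[groupby] == row["reference"]).to_numpy()
        dists.append(dist_fn(X[target], X[reference]))
    out = contrasts.copy()  # leave the caller's table untouched
    out["distance"] = dists  # column name assumed for illustration
    return out
```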

Examples

>>> distance = Distance(metric='edistance')

>>> # Using create_contrasts helper
>>> contrasts = Distance.create_contrasts(
...     adata, groupby="target_gene", selected_group="Non_target",
...     split_by="group_name",
... )
>>> result = distance.contrast_distances(adata, contrasts=contrasts)

>>> # Manual DataFrame construction
>>> import pandas as pd
>>> contrasts = pd.DataFrame({
...     "target_gene": ["Irf7", "Ski"],
...     "reference": ["Non_target", "Non_target"],
...     "group_name": ["CD4", "CD4"],
... })
>>> result = distance.contrast_distances(adata, contrasts)

static create_contrasts(adata, groupby, selected_group, *, groups=None, split_by=None)

Build a contrasts DataFrame for use with contrast_distances().

Each row represents one contrast: comparing a group against the reference, optionally within each level of split_by columns. The resulting DataFrame can be filtered or modified before passing to contrast_distances().

The output layout is:

First column (groupby): the target values to compare
`reference` column: the control value in the groupby column
Remaining columns (split_by): stratification filters

Parameters:

adata AnnData: Annotated data matrix
groupby str: Column in adata.obs whose levels are compared against selected_group
selected_group str | Sequence[str]: The reference (control) value(s) in the groupby column. When a sequence is passed, each target is compared against every reference, producing one row per (target, reference) combination.
groups Sequence[str] | None (default: None): Specific groups to include. If None, all non-reference groups are included.
split_by str | Sequence[str] | None (default: None): Column(s) in adata.obs to stratify by. If provided, contrasts are computed within each unique combination of these columns. Only combinations where the reference group exists are included.

Return type:

DataFrame

Returns:

pd.DataFrame One row per contrast. First column is groupby, then reference, then any split_by columns.
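The row layout described above can be reproduced with pandas. This hypothetical helper mirrors the documented behavior on an obs DataFrame: one row per (target, reference) combination and, with split_by, only strata in which the reference occurs. Additionally skipping targets that have no cells in a stratum is an assumption of this sketch, not documented behavior.

```python
import itertools

import pandas as pd


def create_contrasts_sketch(obs, groupby, selected_group, groups=None, split_by=None):
    refs = [selected_group] if isinstance(selected_group, str) else list(selected_group)
    if split_by is None:
        splits = []
    else:
        splits = [split_by] if isinstance(split_by, str) else list(split_by)
    if groups is not None:
        targets = list(groups)
    else:
        targets = [g for g in obs[groupby].unique() if g not in refs]
    columns = [groupby, "reference", *splits]
    if not splits:
        return pd.DataFrame(list(itertools.product(targets, refs)), columns=columns)
    rows = []
    for key, sub in obs.groupby(splits, sort=True):
        key = key if isinstance(key, tuple) else (key,)
        present = set(sub[groupby])
        for t, r in itertools.product(targets, refs):
            # Documented: the reference must exist in the stratum.
            # Assumed here: the target must have cells there too.
            if r in present and t in present:
                rows.append((t, r, *key))
    return pd.DataFrame(rows, columns=columns)
```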

Examples

>>> # All targets vs control, ignoring celltype
>>> contrasts = Distance.create_contrasts(
...     adata, groupby="target_gene", selected_group="Non_target"
... )

>>> # Multiple references
>>> contrasts = Distance.create_contrasts(
...     adata, groupby="target_gene",
...     selected_group=["Non_target", "Scramble"],
... )

>>> # Stratified by celltype
>>> contrasts = Distance.create_contrasts(
...     adata, groupby="target_gene", selected_group="Non_target",
...     split_by="group_name",
... )

>>> # Filter before computing
>>> contrasts = contrasts[contrasts["group_name"] != "rare_type"]
>>> result = distance.contrast_distances(adata, contrasts=contrasts)

>>> # Manual construction (no helper needed)
>>> import pandas as pd
>>> contrasts = pd.DataFrame({
...     "target_gene": ["Irf7", "Ski"],
...     "reference": ["Non_target", "Non_target"],
...     "group_name": ["CD4", "CD4"],
... })

bootstrap(X, Y, *, n_bootstrap=100, random_state=0)

Compute bootstrap mean and variance for distance between two arrays.

This provides a pertpy-compatible API for bootstrap computation directly on arrays without requiring an AnnData object.

Parameters:

X np.ndarray | cp.ndarray: First array of shape (n_samples_x, n_features)
Y np.ndarray | cp.ndarray: Second array of shape (n_samples_y, n_features)
n_bootstrap int (default: 100): Number of bootstrap iterations
random_state int (default: 0): Random seed for reproducibility

Return type:

MeanVar

Returns:

result Named tuple containing mean and variance of bootstrapped distances

Examples

>>> distance = Distance(metric='edistance')
>>> X = adata.obsm["X_pca"][adata.obs["group"] == "A"]
>>> Y = adata.obsm["X_pca"][adata.obs["group"] == "B"]
>>> result = distance.bootstrap(X, Y, n_bootstrap=100)
>>> print(f"Distance: {result.mean:.3f} ± {result.variance**0.5:.3f}")

GuideAssignment

class rapids_singlecell.ptg.GuideAssignment

GPU-accelerated guide RNA assignment.

Provides threshold-based and mixture-model-based methods for assigning cells to guide RNAs, compatible with pertpy's GuideAssignment API. The mixture model fits a Poisson-Gaussian mixture per guide, with all guides fit together by batched EM on the GPU, yielding an orders-of-magnitude speedup.

Methods

`assign_by_threshold`(adata, *, ...[, layer, ...])	Assign cells to gRNAs exceeding a count threshold.
`assign_to_max_guide`(adata, *, ...[, layer, ...])	Assign each cell to its most expressed gRNA.
`assign_mixture_model`(adata, *[, layer, ...])	Assign gRNAs using a GPU-accelerated Poisson–Gaussian mixture model.

assign_by_threshold(adata, *, assignment_threshold, layer=None, output_layer='assigned_guides')

Assign cells to gRNAs exceeding a count threshold.

Each cell is assigned to every gRNA with at least assignment_threshold counts. Expects unnormalized count data.

Parameters:

adata AnnData: Annotated data matrix of shape n_obs x n_vars.
assignment_threshold float: Minimum count for a viable assignment.
layer str | None (default: None): Layer with raw counts. Uses adata.X if None.
output_layer str (default: 'assigned_guides'): Key under which the binary assignment matrix is stored in adata.layers.

Return type:

None
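The threshold rule is a single element-wise comparison. A minimal dense NumPy sketch follows; the function name is hypothetical, and the real method reads counts from adata and stores the matrix in adata.layers[output_layer] instead of returning it.

```python
import numpy as np


def assign_by_threshold_sketch(counts, assignment_threshold):
    """Binary cells x guides matrix: 1 where a guide has >= threshold counts."""
    counts = np.asarray(counts)
    # A cell can be assigned to several guides at once.
    return (counts >= assignment_threshold).astype(np.int8)
```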

assign_to_max_guide(adata, *, assignment_threshold, layer=None, obs_key='assigned_guide', no_grna_assigned_key='Negative')

Assign each cell to its most expressed gRNA.

Each cell is assigned to the gRNA with the highest count, provided that count is at least assignment_threshold. Expects unnormalized count data.

Parameters:

adata AnnData: Annotated data matrix of shape n_obs x n_vars.
assignment_threshold float: Minimum count for a viable assignment.
layer str | None (default: None): Layer with raw counts. Uses adata.X if None.
obs_key str (default: 'assigned_guide'): Column in adata.obs where the assignment is stored.
no_grna_assigned_key str (default: 'Negative'): Label for cells whose highest guide count is below assignment_threshold.

Return type:

None
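The max-guide rule can be sketched with an argmax over the guide columns. The helper is hypothetical and returns labels instead of writing adata.obs[obs_key]; resolving ties in favor of the first guide is an assumption (it is simply NumPy's argmax behavior).

```python
import numpy as np


def assign_to_max_guide_sketch(counts, guide_names, assignment_threshold,
                               no_grna_assigned_key="Negative"):
    counts = np.asarray(counts)
    best = counts.argmax(axis=1)  # ties resolve to the first guide (assumption)
    best_count = counts[np.arange(counts.shape[0]), best]
    names = np.asarray(guide_names, dtype=object)[best]
    # Cells whose top guide misses the threshold get the "no guide" label.
    return np.where(best_count >= assignment_threshold, names, no_grna_assigned_key)
```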

assign_mixture_model(adata, *, layer=None, assigned_guides_key='assigned_guide', no_grna_assigned_key='negative', max_assignments_per_cell=5, multiple_grna_assigned_key='multiple', multiple_grna_assignment_string='+', only_return_results=False, max_iter=90, tol=0.0001, posterior_threshold=0.5)

Assign gRNAs using a GPU-accelerated Poisson–Gaussian mixture model.

Fits a two-component mixture (Poisson background + Gaussian signal) to each guide's log₂-transformed non-zero counts, fitting all guides simultaneously with batched Expectation-Maximization (EM) on the GPU. The fitted model is converted to an integer raw-count threshold; the default posterior cutoff matches pertpy's threshold rule.

Parameters:

adata AnnData: Annotated data matrix with guide RNA counts.
layer str | None (default: None): Layer with raw counts. Uses adata.X if None.
assigned_guides_key str (default: 'assigned_guide'): Key in adata.obs for storing the assignment result.
no_grna_assigned_key str (default: 'negative'): Label for cells negative for all gRNAs.
max_assignments_per_cell int (default: 5): Maximum number of gRNAs a cell can be assigned to.
multiple_grna_assigned_key str (default: 'multiple'): Label for cells exceeding max_assignments_per_cell.
multiple_grna_assignment_string str (default: '+'): Delimiter for joining multiple guide names.
only_return_results bool (default: False): If True, return assignments without modifying adata.
max_iter int (default: 90): Maximum number of EM iterations.
tol float (default: 0.0001): Convergence tolerance on parameter changes.
posterior_threshold float (default: 0.5): Minimum posterior probability of the Gaussian component required for a raw UMI count to define the assignment threshold.

Return type:

ndarray | None

Returns:

If only_return_results is True, returns an array of assignments. Otherwise modifies adata in-place and returns None.
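For a single guide, the mixture fit and its conversion to a count threshold can be sketched in NumPy. The library fits all guides at once with batched EM on the GPU; this sketch fits one. The initialization, the parameter floors, evaluating the Poisson log-pmf at non-integer log₂ values through the gamma function, and the rule "smallest raw count whose signal posterior reaches posterior_threshold" are all assumptions made for illustration.

```python
import math

import numpy as np


def _log_components(x, w, lam, mu, sigma):
    """log(weight * density) of the Poisson background and the Gaussian signal."""
    x = np.asarray(x, dtype=float)
    lgam = np.array([math.lgamma(v + 1.0) for v in np.atleast_1d(x)]).reshape(x.shape)
    lp_bg = math.log(1.0 - w) + x * math.log(lam) - lam - lgam  # Poisson via lgamma
    lp_sig = (math.log(w) - 0.5 * math.log(2.0 * math.pi * sigma**2)
              - (x - mu) ** 2 / (2.0 * sigma**2))
    return lp_bg, lp_sig


def fit_guide_mixture(counts, max_iter=90, tol=1e-4):
    counts = np.asarray(counts, dtype=float)
    x = np.log2(counts[counts > 0])  # log2 of one guide's non-zero counts
    lam, mu = max(np.quantile(x, 0.25), 0.5), np.quantile(x, 0.75)
    sigma, w = max(x.std(), 0.5), 0.5  # w = weight of the Gaussian (signal) part
    for _ in range(max_iter):
        lp_bg, lp_sig = _log_components(x, w, lam, mu, sigma)
        r = np.exp(lp_sig - np.logaddexp(lp_bg, lp_sig))  # E-step: P(signal | x)
        # M-step: closed-form weighted updates, floored to stay valid.
        w_new = float(np.clip(r.mean(), 1e-6, 1 - 1e-6))
        lam_new = max(float(((1 - r) * x).sum() / (1 - r).sum()), 1e-6)
        mu_new = float((r * x).sum() / r.sum())
        sigma_new = max(math.sqrt(float((r * (x - mu_new) ** 2).sum() / r.sum())), 1e-3)
        change = max(abs(w_new - w), abs(lam_new - lam), abs(mu_new - mu),
                     abs(sigma_new - sigma))
        w, lam, mu, sigma = w_new, lam_new, mu_new, sigma_new
        if change < tol:
            break
    return w, lam, mu, sigma


def count_threshold(params, max_count, posterior_threshold=0.5):
    """Smallest raw count whose signal posterior reaches the cutoff (assumed rule)."""
    for c in range(1, max_count + 1):
        lp_bg, lp_sig = _log_components(math.log2(c), *params)
        if float(np.exp(lp_sig - np.logaddexp(lp_bg, lp_sig))) >= posterior_threshold:
            return c
    return None
```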

Mixscape

class rapids_singlecell.ptg.Mixscape

GPU-accelerated Mixscape for pooled CRISPR screens.

Identifies cells with a detectable perturbation effect and separates them from cells that escaped perturbation, following Seurat's Mixscape and pertpy's Mixscape. The perturbation signature and the iterative Gaussian-mixture classification run on the GPU; the spherical mixtures (with the control component held fixed) for all target genes are fit in a single batched CUDA kernel, _gmm_cuda.mixscape_project_em, which uses one CUDA block per gene.

Methods

`perturbation_signature`(adata, pert_key, ...)	Calculate the perturbation signature.
`mixscape`(adata, pert_key, control, *[, ...])	Identify perturbed and escaping cells per target gene.
`lda`(adata, pert_key, control, *[, ...])	Linear discriminant analysis on the mixscape result.

perturbation_signature(adata, pert_key, control, *, ref_selection_mode='nn', split_by=None, n_neighbors=20, use_rep=None, n_dims=15, n_pcs=None, knn_algorithm='brute', knn_kwargs=None, copy=False)

Calculate the perturbation signature.

The perturbation signature replaces each cell’s expression with the residual against comparable control cells, removing confounding variation so that what remains reflects the perturbation. The result is written to adata.layers["X_pert"]. As in the original implementation, this is intended to run on unscaled log-normalized data.

Parameters:

adata AnnData: The annotated data object.
pert_key str: The column of .obs with perturbation categories; must also contain control.
control str: Name of the control category in pert_key.
ref_selection_mode Literal['nn', 'split_by'] (default: 'nn'): How reference cells are selected. "nn" uses the n_neighbors nearest control cells in the chosen representation; "split_by" uses all control cells within the same split_by group.
split_by str | None (default: None): Column of .obs used to compute the signature separately per group (e.g. biological replicate). Required for ref_selection_mode="split_by".
n_neighbors int (default: 20): Number of control neighbors used for ref_selection_mode="nn". Capped to the number of control cells available in each split, so a split with fewer controls than n_neighbors still runs (pertpy raises an error in that case).
use_rep str | None (default: None): Representation to use for neighbor selection. "X" or any .obsm key. If None, .X is used when n_vars is below 50, otherwise "X_pca" (computed if absent).
n_dims int | None (default: 15): Number of dimensions of the representation to use. None uses all.
n_pcs int | None (default: None): Number of principal components to compute if a PCA representation is built.
knn_algorithm str (default: 'brute'): Nearest-neighbor backend for ref_selection_mode="nn": "brute" (exact, default) or one of the approximate cuVS backends "ivfflat", "cagra" and "ivfpq", which are much faster on large datasets.
knn_kwargs dict | None (default: None): Extra parameters for the approximate backends (e.g. n_lists / n_probes for "ivfflat").
copy bool (default: False): Whether to return a copy of adata.

Return type:

AnnData | None

Returns:

Returns the modified copy if copy=True, otherwise writes adata.layers["X_pert"] in place and returns None.
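A dense sketch of ref_selection_mode="nn" within a single split is shown below. It is hypothetical: the brute-force neighbor search in `rep`, averaging the log-normalized values directly, and the sign convention (control mean minus cell) are assumptions; only capping n_neighbors at the number of available controls comes from the parameter description above.

```python
import numpy as np


def perturbation_signature_sketch(X, rep, is_control, n_neighbors=20):
    """Hypothetical dense version of ref_selection_mode="nn" for one split."""
    X = np.asarray(X, dtype=float)
    rep = np.asarray(rep, dtype=float)
    ctrl_idx = np.flatnonzero(is_control)
    k = min(n_neighbors, len(ctrl_idx))  # cap k at the number of available controls
    # Brute-force squared distances from every cell to every control cell.
    d2 = ((rep[:, None, :] - rep[None, ctrl_idx, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d2, axis=1)[:, :k]  # (n_cells, k) positions into ctrl_idx
    ctrl_mean = X[ctrl_idx][nearest].mean(axis=1)  # (n_cells, n_genes)
    return ctrl_mean - X  # residual against matched controls (sign is assumed)
```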

mixscape(adata, pert_key, control, *, new_class_name='mixscape_class', layer=None, min_de_genes=5, logfc_threshold=0.25, de_layer=None, test_method='wilcoxon', iter_num=10, scale=True, split_by=None, pval_cutoff=0.05, perturbation_type='KO', random_state=0, copy=False)

Identify perturbed and escaping cells per target gene.

For each target gene, differentially expressed genes are found against the control, the perturbation signature is projected onto the gene-specific perturbation direction, and a two-component spherical Gaussian mixture (with the control component held fixed) iteratively separates knocked-out (perturbed) from non-perturbed cells.

Parameters:

adata AnnData: The annotated data object.
pert_key str: The column of .obs with target gene labels.
control str: Control category in pert_key.
new_class_name str (default: 'mixscape_class'): Name of the .obs column for the classification result.
layer str | None (default: None): Layer used for the mixture. Defaults to .layers["X_pert"].
min_de_genes int (default: 5): Minimum number of differentially expressed genes required to test a gene for perturbation.
logfc_threshold float (default: 0.25): Minimum absolute log fold change for a gene to count as differentially expressed.
de_layer str | None (default: None): Layer used for differential expression. None uses .X.
test_method str (default: 'wilcoxon'): Differential-expression test passed to rapids_singlecell.tl.rank_genes_groups().
iter_num int (default: 10): Maximum number of refinement iterations.
scale bool (default: True): Scale the mixture input before fitting.
split_by str | None (default: None): .obs column with a condition/cell-type annotation if perturbations are condition specific.
pval_cutoff float (default: 0.05): Adjusted p-value cutoff for differentially expressed genes.
perturbation_type str (default: 'KO'): Label suffix used for perturbed cells (e.g. "KO").
random_state int (default: 0): Accepted for pertpy.tl.Mixscape API compatibility; has no effect, as the spherical mixture is initialized deterministically from the per-gene projection statistics.
copy bool (default: False): Whether to return a copy of adata.

Return type:

AnnData | None

Returns:

Returns the modified copy if copy=True, otherwise writes the results to adata in place and returns None. The results are adata.obs[new_class_name] (per-gene classification), adata.obs[f"{new_class_name}_global"] (perturbed/NP/NT), adata.obs[f"{new_class_name}_p_{perturbation_type.lower()}"] (posterior probability) and adata.uns["mixscape"].
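The core of the classification, a two-component 1-D Gaussian mixture in which the control component is held fixed, can be sketched as a short EM loop over the projected signatures. This is a simplified, hypothetical version: the library also selects differentially expressed genes, recomputes the projection and repeats for up to iter_num refinements, all in one batched kernel. Fitting on control and guide cells together and initializing the perturbed component at the guide-cell mean are assumptions of this sketch.

```python
import math

import numpy as np


def _norm_logpdf(x, mu, sd):
    return -0.5 * math.log(2.0 * math.pi * sd**2) - (x - mu) ** 2 / (2.0 * sd**2)


def fixed_control_mixture(proj_ctrl, proj_guide, max_iter=500, tol=1e-8):
    """Posterior P(perturbed) for each guide cell; the control component is frozen."""
    proj_ctrl = np.asarray(proj_ctrl, dtype=float)
    proj_guide = np.asarray(proj_guide, dtype=float)
    x = np.concatenate([proj_ctrl, proj_guide])
    mu_c, sd_c = proj_ctrl.mean(), proj_ctrl.std()  # fixed: never re-estimated
    mu_p, sd_p, w_p = proj_guide.mean(), max(proj_guide.std(), 1e-3), 0.5
    for _ in range(max_iter):
        lp_c = math.log(1.0 - w_p) + _norm_logpdf(x, mu_c, sd_c)
        lp_p = math.log(w_p) + _norm_logpdf(x, mu_p, sd_p)
        r = np.exp(lp_p - np.logaddexp(lp_c, lp_p))  # P(perturbed | x)
        # Only the perturbed component's weight, mean and sd are updated.
        w_new = float(np.clip(r.mean(), 1e-6, 1 - 1e-6))
        mu_new = float((r * x).sum() / r.sum())
        sd_new = max(math.sqrt(float((r * (x - mu_new) ** 2).sum() / r.sum())), 1e-3)
        done = max(abs(w_new - w_p), abs(mu_new - mu_p), abs(sd_new - sd_p)) < tol
        w_p, mu_p, sd_p = w_new, mu_new, sd_new
        if done:
            break
    lp_c = math.log(1.0 - w_p) + _norm_logpdf(proj_guide, mu_c, sd_c)
    lp_p = math.log(w_p) + _norm_logpdf(proj_guide, mu_p, sd_p)
    return np.exp(lp_p - np.logaddexp(lp_c, lp_p))
```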

lda(adata, pert_key, control, *, mixscape_class_global='mixscape_class_global', layer=None, n_comps=10, min_de_genes=5, logfc_threshold=0.25, test_method='wilcoxon', split_by=None, pval_cutoff=0.05, perturbation_type='KO', copy=False)

Linear discriminant analysis on the mixscape result.

Requires mixscape() to have been run. For each perturbed gene, a PCA is fit on its differentially expressed genes and all perturbed and control cells are projected into that subspace; the concatenated projections are then reduced with a GPU linear discriminant analysis (a CuPy port of scikit-learn’s SVD solver). The embedding is written to adata.uns["mixscape_lda"].

Parameters:

adata AnnData: The annotated data object.
pert_key str: The column of .obs with target gene labels.
control str: Control category in pert_key.
mixscape_class_global str (default: 'mixscape_class_global'): The .obs column with the global mixscape classification.
layer str | None (default: None): Layer used for differential expression. None uses .X.
n_comps int (default: 10): Number of principal components per gene subspace. Reduced per gene to min(n_comps, min(cells, genes) - 1); genes left with fewer than one component are skipped rather than raising an error (pertpy raises an error instead).
min_de_genes int (default: 5): Minimum number of differentially expressed genes to test a gene.
logfc_threshold float (default: 0.25): Minimum absolute log fold change for a differentially expressed gene.
test_method str (default: 'wilcoxon'): Differential-expression test passed to rapids_singlecell.tl.rank_genes_groups().
split_by str | None (default: None): .obs column with a condition/cell-type annotation if perturbations are condition specific.
pval_cutoff float (default: 0.05): Adjusted p-value cutoff for differentially expressed genes.
perturbation_type str (default: 'KO'): Label used for perturbed cells (e.g. "KO").
copy bool (default: False): Whether to return a copy of adata.

Return type:

AnnData | None

Returns:

Returns the modified copy if copy=True, otherwise writes adata.uns["mixscape_lda"] in place and returns None.
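The LDA pipeline described above can be sketched on the CPU with scikit-learn. The helper name, the `de_genes` mapping from target to DE-gene column indices, and fitting each PCA on the target's cells plus controls before projecting every cell are assumptions; the library instead runs a CuPy port of the SVD solver on the GPU.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def mixscape_lda_sketch(X, labels, control, de_genes, n_comps=10):
    """Per-target PCA on DE genes, concatenated, then LDA on the class labels."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    blocks = []
    for target, genes in de_genes.items():
        fit_cells = np.isin(labels, [target, control])  # assumed fitting subset
        sub = X[fit_cells][:, genes]
        k = min(n_comps, min(sub.shape) - 1)  # shrink n_comps per gene
        if k < 1:
            continue  # skip the gene instead of raising
        pca = PCA(n_components=k).fit(sub)
        blocks.append(pca.transform(X[:, genes]))  # project every cell
    Z = np.hstack(blocks)
    return LinearDiscriminantAnalysis(solver="svd").fit_transform(Z, labels)
```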

Mixscale

class rapids_singlecell.ptg.Mixscale

GPU-accelerated Mixscale for continuous perturbation-efficiency scoring.

Unlike Mixscape, which performs a binary knocked-out/non-perturbed classification with a Gaussian mixture, Mixscale assigns each cell a continuous perturbation-efficiency score. It follows Seurat’s Mixscale and pertpy’s Mixscale; the perturbation signature is computed via perturbation_signature() and the per-gene projection and z-score scoring run on the GPU.

Methods

`perturbation_signature`(adata, pert_key, ...)	Calculate the perturbation signature.
`mixscale`(adata, pert_key, control, *[, ...])	Continuous perturbation efficiency scores (Mixscale).

perturbation_signature(adata, pert_key, control, *, ref_selection_mode='nn', split_by=None, n_neighbors=20, use_rep=None, n_dims=15, n_pcs=None, knn_algorithm='brute', knn_kwargs=None, copy=False)

Calculate the perturbation signature.

The perturbation signature replaces each cell’s expression with the residual against comparable control cells, removing confounding variation so that what remains reflects the perturbation. The result is written to adata.layers["X_pert"]. As in the original implementation, this is intended to run on unscaled log-normalized data.

Parameters:

adata AnnData: The annotated data object.
pert_key str: The column of .obs with perturbation categories; must also contain control.
control str: Name of the control category in pert_key.
ref_selection_mode Literal['nn', 'split_by'] (default: 'nn'): How reference cells are selected. "nn" uses the n_neighbors nearest control cells in the chosen representation; "split_by" uses all control cells within the same split_by group.
split_by str | None (default: None): Column of .obs used to compute the signature separately per group (e.g. biological replicate). Required for ref_selection_mode="split_by".
n_neighbors int (default: 20): Number of control neighbors used for ref_selection_mode="nn". Capped to the number of control cells available in each split, so a split with fewer controls than n_neighbors still runs (pertpy raises an error in that case).
use_rep str | None (default: None): Representation to use for neighbor selection. "X" or any .obsm key. If None, .X is used when n_vars is below 50, otherwise "X_pca" (computed if absent).
n_dims int | None (default: 15): Number of dimensions of the representation to use. None uses all.
n_pcs int | None (default: None): Number of principal components to compute if a PCA representation is built.
knn_algorithm str (default: 'brute'): Nearest-neighbor backend for ref_selection_mode="nn": "brute" (exact, default) or one of the approximate cuVS backends "ivfflat", "cagra" and "ivfpq", which are much faster on large datasets.
knn_kwargs dict | None (default: None): Extra parameters for the approximate backends (e.g. n_lists / n_probes for "ivfflat").
copy bool (default: False): Whether to return a copy of adata.

Return type:

AnnData | None

Returns:

Returns the modified copy if copy=True, otherwise writes adata.layers["X_pert"] in place and returns None.

mixscale(adata, pert_key, control, *, new_class_name='mixscale_score', layer=None, min_de_genes=5, max_de_genes=100, logfc_threshold=0.25, de_layer=None, test_method='wilcoxon', scale=True, split_by=None, pval_cutoff=0.05, perturbation_type='KO', copy=False)

Continuous perturbation efficiency scores (Mixscale).

Unlike mixscape(), which performs a binary knocked-out/non-perturbed classification with a Gaussian mixture, this assigns each cell a continuous perturbation-efficiency score: the scalar projection of its perturbation signature onto the per-gene perturbation direction (mean perturbed minus mean control), z-score standardized relative to the non-targeting control distribution. This is useful for CRISPRi/CRISPRa screens where cells show a gradient of perturbation strength rather than a binary knockout. Control cells receive a score of 0.

Implements Jiang, Dalgarno et al., “Systematic reconstruction of molecular pathway signatures using scalable single-cell perturbation screens”, Nature Cell Biology (2025), following pertpy’s mixscale().

Parameters:

adata AnnData: The annotated data object.
pert_key str: The column of .obs with target gene labels.
control str: Control category in pert_key.
new_class_name str (default: 'mixscale_score'): Name of the .obs column for the continuous score.
layer str | None (default: None): Layer used for scoring. Defaults to .layers["X_pert"].
min_de_genes int (default: 5): Minimum number of differentially expressed genes required to score a gene; genes with fewer are skipped.
max_de_genes int (default: 100): Maximum number of (top-ranked) differentially expressed genes used to define the perturbation direction.
logfc_threshold float (default: 0.25): Minimum absolute log fold change for a gene to count as differentially expressed.
de_layer str | None (default: None): Layer used for differential expression. None uses .X.
test_method str (default: 'wilcoxon'): Differential-expression test passed to rapids_singlecell.tl.rank_genes_groups().
scale bool (default: True): Scale the per-gene sub-matrix before computing scores.
split_by str | None (default: None): .obs column with a condition/cell-type annotation if perturbations are condition specific.
pval_cutoff float (default: 0.05): Adjusted p-value cutoff for differentially expressed genes.
perturbation_type str (default: 'KO'): Accepted for pertpy.tl.Mixscale.mixscale API compatibility; has no effect on the continuous score.
copy bool (default: False): Whether to return a copy of adata.

Return type:

AnnData | None

Returns:

Returns the modified copy if copy=True, otherwise writes adata.obs[new_class_name] in place and returns None. Higher absolute values indicate a stronger perturbation effect; control cells and any gene that cannot be scored receive 0.
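The scoring step for one target gene can be sketched in NumPy. The helper is hypothetical: the scale option and split_by handling are omitted, the population standard deviation (ddof=0) is an assumption, and cells of other targets are simply left at 0 here.

```python
import numpy as np


def mixscale_scores_sketch(sig, is_control, is_target, de_genes):
    """Continuous score for the cells of one target gene; controls get 0."""
    sig = np.asarray(sig, dtype=float)
    S = sig[:, de_genes]
    # Per-gene perturbation direction: mean perturbed minus mean control.
    direction = S[is_target].mean(axis=0) - S[is_control].mean(axis=0)
    proj = S @ direction / np.linalg.norm(direction)  # scalar projection
    # z-score relative to the non-targeting control distribution.
    z = (proj - proj[is_control].mean()) / proj[is_control].std()
    scores = np.zeros(len(sig))
    scores[is_target] = z[is_target]  # control (and all other) cells stay at 0
    return scores
```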
