rapids_singlecell.ptg.Mixscape

class rapids_singlecell.ptg.Mixscape

GPU-accelerated Mixscape for pooled CRISPR screens.

Identifies cells with a detectable perturbation effect and separates them from cells that escaped perturbation, following Seurat’s Mixscape and pertpy’s Mixscape. The perturbation signature and the iterative Gaussian-mixture classification run on the GPU. The spherical mixtures for all genes, each with a fixed control component, are fit in a single batched CUDA kernel (one block per gene) via _gmm_cuda.mixscape_project_em.

Methods table

lda(adata, pert_key, control, *[, ...])

Linear discriminant analysis on the mixscape result.

mixscale(adata, pert_key, control, *[, ...])

Continuous perturbation efficiency scores (Mixscale).

mixscape(adata, pert_key, control, *[, ...])

Identify perturbed and escaping cells per target gene.

perturbation_signature(adata, pert_key, ...)

Calculate the perturbation signature.

Methods

lda

Mixscape.lda(adata, pert_key, control, *, mixscape_class_global='mixscape_class_global', layer=None, n_comps=10, min_de_genes=5, logfc_threshold=0.25, test_method='wilcoxon', split_by=None, pval_cutoff=0.05, perturbation_type='KO', copy=False)

Linear discriminant analysis on the mixscape result.

Requires mixscape() to have been run. For each perturbed gene, a PCA is fit on its differentially expressed genes and all perturbed and control cells are projected into that subspace; the concatenated projections are then reduced with a GPU linear discriminant analysis (a CuPy port of scikit-learn’s SVD solver). The embedding is written to adata.uns["mixscape_lda"].

Parameters:
adata AnnData

The annotated data object.

pert_key str

The column of .obs with target gene labels.

control str

Control category in pert_key.

mixscape_class_global str (default: 'mixscape_class_global')

The .obs column with the global mixscape classification.

layer str | None (default: None)

Layer used for differential expression. None uses .X.

n_comps int (default: 10)

Number of principal components per gene subspace. Reduced per gene to min(n_comps, min(cells, genes) - 1); genes left with fewer than one component are skipped instead of raising an error (pertpy raises in this case).

min_de_genes int (default: 5)

Minimum number of differentially expressed genes required for a target gene to be tested.

logfc_threshold float (default: 0.25)

Minimum absolute log fold change for a differentially expressed gene.

test_method str (default: 'wilcoxon')

Differential-expression test passed to rapids_singlecell.tl.rank_genes_groups().

split_by str | None (default: None)

.obs column with a condition/cell-type annotation if perturbations are condition specific.

pval_cutoff float (default: 0.05)

Adjusted p-value cutoff for differentially expressed genes.

perturbation_type str (default: 'KO')

Label used for perturbed cells (e.g. "KO").

copy bool (default: False)

Whether to return a copy of adata.

Return type:

AnnData | None

Returns:

Returns the modified copy if copy=True, otherwise writes adata.uns["mixscape_lda"] in place and returns None.
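The per-gene cap on n_comps described in the parameter list can be sketched in plain Python. The helper name effective_n_comps is hypothetical and not part of rapids_singlecell; it only reproduces the documented rule min(n_comps, min(cells, genes) - 1) and the documented skip condition.

```python
def effective_n_comps(n_comps, n_cells, n_genes):
    """Return the capped component count, or None if the gene would be skipped.

    Hypothetical helper illustrating the documented rule; not a library function.
    """
    capped = min(n_comps, min(n_cells, n_genes) - 1)
    # Fewer than one usable component: the gene is skipped instead of raising.
    return capped if capped >= 1 else None


print(effective_n_comps(10, 500, 40))  # many cells and DE genes: cap not hit
print(effective_n_comps(10, 500, 6))   # only 6 DE genes: reduced
print(effective_n_comps(10, 500, 1))   # a single DE gene: skipped
```

Under this rule, a target gene with few differentially expressed genes contributes a smaller subspace to the concatenated projections rather than aborting the run.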

mixscale

Mixscape.mixscale(adata, pert_key, control, *, new_class_name='mixscale_score', layer=None, min_de_genes=5, max_de_genes=100, logfc_threshold=0.25, de_layer=None, test_method='wilcoxon', scale=True, split_by=None, pval_cutoff=0.05, perturbation_type='KO', copy=False)

Continuous perturbation efficiency scores (Mixscale).

Unlike mixscape(), which performs a binary knocked-out/non-perturbed classification with a Gaussian mixture, this assigns each cell a continuous perturbation-efficiency score: the scalar projection of its perturbation signature onto the per-gene perturbation direction (mean perturbed minus mean control), z-score standardized relative to the non-targeting control distribution. This is useful for CRISPRi/CRISPRa screens where cells show a gradient of perturbation strength rather than a binary knockout. Control cells receive a score of 0.

Implements Jiang, Dalgarno et al., “Systematic reconstruction of molecular pathway signatures using scalable single-cell perturbation screens”, Nature Cell Biology (2025), following pertpy’s mixscale().
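The scoring idea described above can be illustrated with a minimal CPU sketch in plain Python. It is not the library's GPU code: the function name mixscale_scores, the division by the norm of the direction, and the use of the population standard deviation of the control projections are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def mixscale_scores(perturbed, control):
    """Score perturbed cells along the perturbation direction.

    `perturbed` and `control` are lists of per-cell signature vectors,
    restricted to the DE genes of one target gene (illustrative inputs).
    """
    n_genes = len(control[0])
    # Perturbation direction: mean perturbed minus mean control, per gene.
    direction = [
        mean(c[g] for c in perturbed) - mean(c[g] for c in control)
        for g in range(n_genes)
    ]
    norm = math.sqrt(sum(d * d for d in direction))

    def project(cell):
        # Scalar projection of one cell onto the direction (assumed form).
        return sum(x * d for x, d in zip(cell, direction)) / norm

    ctrl_proj = [project(c) for c in control]
    mu, sigma = mean(ctrl_proj), pstdev(ctrl_proj)
    # Standardize against the control distribution (assumed z-score form).
    return [(project(c) - mu) / sigma for c in perturbed]


control = [[0.0, 0.0], [0.2, -0.2], [-0.2, 0.2], [0.1, 0.1], [-0.1, -0.1]]
perturbed = [[2.0, 2.0], [0.1, 0.0]]  # one strongly and one weakly shifted cell
scores = mixscale_scores(perturbed, control)
print(scores)
```

In this toy input the strongly shifted cell receives a much larger score than the weakly shifted one, which is the gradient that a binary classification would collapse into two labels.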

Parameters:
adata AnnData

The annotated data object.

pert_key str

The column of .obs with target gene labels.

control str

Control category in pert_key.

new_class_name str (default: 'mixscale_score')

Name of the .obs column for the continuous score.

layer str | None (default: None)

Layer used for scoring. Defaults to .layers["X_pert"].

min_de_genes int (default: 5)

Minimum number of differentially expressed genes required to score a gene; genes with fewer are skipped.

max_de_genes int (default: 100)

Maximum number of (top-ranked) differentially expressed genes used to define the perturbation direction.

logfc_threshold float (default: 0.25)

Minimum absolute log fold change for a gene to count as differentially expressed.

de_layer str | None (default: None)

Layer used for differential expression. None uses .X.

test_method str (default: 'wilcoxon')

Differential-expression test passed to rapids_singlecell.tl.rank_genes_groups().

scale bool (default: True)

Scale the per-gene sub-matrix before computing scores.

split_by str | None (default: None)

.obs column with a condition/cell-type annotation if perturbations are condition specific.

pval_cutoff float (default: 0.05)

Adjusted p-value cutoff for differentially expressed genes.

perturbation_type str (default: 'KO')

Accepted for pertpy.tl.Mixscape.mixscale API compatibility; has no effect on the continuous score.

copy bool (default: False)

Whether to return a copy of adata.

Return type:

AnnData | None

Returns:

Returns the modified copy if copy=True; otherwise writes adata.obs[new_class_name] in place and returns None. Higher absolute values indicate a stronger perturbation effect; control cells and cells of any target gene that cannot be scored receive 0.

mixscape

Mixscape.mixscape(adata, pert_key, control, *, new_class_name='mixscape_class', layer=None, min_de_genes=5, logfc_threshold=0.25, de_layer=None, test_method='wilcoxon', iter_num=10, scale=True, split_by=None, pval_cutoff=0.05, perturbation_type='KO', random_state=0, copy=False)

Identify perturbed and escaping cells per target gene.

For each target gene, differentially expressed genes are found against the control, the perturbation signature is projected onto the gene-specific perturbation direction, and a two-component spherical Gaussian mixture (with the control component held fixed) iteratively separates knocked-out (perturbed) from non-perturbed cells.
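The mixture step can be sketched on the CPU for a single target gene, using the one-dimensional projected values as input. This is a rough illustration, not the batched CUDA kernel: the function names, the initialization from the mean and standard deviation of the targeted cells, the starting weight of 0.5, and the number of EM iterations are assumptions.

```python
import math
from statistics import mean, pstdev


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_fixed_control_mixture(targeted, control, n_iter=50):
    """Posterior probability that each targeted cell is perturbed.

    `targeted` and `control` are 1-D projections (illustrative inputs).
    """
    # Control component: estimated once from control cells, then held fixed.
    mu0, sd0 = mean(control), pstdev(control)
    # Perturbed component: initialized from the targeted cells (an assumption).
    mu1, sd1, weight = mean(targeted), pstdev(targeted), 0.5
    post = [0.5] * len(targeted)
    for _ in range(n_iter):
        # E-step: responsibility of the perturbed component for each cell.
        post = []
        for x in targeted:
            p1 = weight * normal_pdf(x, mu1, sd1)
            p0 = (1.0 - weight) * normal_pdf(x, mu0, sd0)
            post.append(p1 / (p0 + p1))
        # M-step: update only the perturbed component and the mixing weight.
        total = sum(post)
        weight = total / len(targeted)
        mu1 = sum(r * x for r, x in zip(post, targeted)) / total
        var1 = sum(r * (x - mu1) ** 2 for r, x in zip(post, targeted)) / total
        sd1 = max(math.sqrt(var1), 1e-6)
    return post


control = [-0.3, -0.1, 0.0, 0.1, 0.3]
targeted = [0.05, -0.2, 0.15, 4.8, 5.1, 5.3, 4.9]  # 3 escaping, 4 shifted cells
posterior = fit_fixed_control_mixture(targeted, control)
labels = ["KO" if p > 0.5 else "NP" for p in posterior]
print(labels)
```

Holding the control component fixed means that targeted cells are only called non-perturbed when they resemble actual control cells, rather than whichever cluster the mixture happens to find.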

Parameters:
adata AnnData

The annotated data object.

pert_key str

The column of .obs with target gene labels.

control str

Control category in pert_key.

new_class_name str (default: 'mixscape_class')

Name of the .obs column for the classification result.

layer str | None (default: None)

Layer used for the mixture. Defaults to .layers["X_pert"].

min_de_genes int (default: 5)

Minimum number of differentially expressed genes required to test a gene for perturbation.

logfc_threshold float (default: 0.25)

Minimum absolute log fold change for a gene to count as differentially expressed.

de_layer str | None (default: None)

Layer used for differential expression. None uses .X.

test_method str (default: 'wilcoxon')

Differential-expression test passed to rapids_singlecell.tl.rank_genes_groups().

iter_num int (default: 10)

Maximum number of refinement iterations.

scale bool (default: True)

Scale the mixture input before fitting.

split_by str | None (default: None)

.obs column with a condition/cell-type annotation if perturbations are condition specific.

pval_cutoff float (default: 0.05)

Adjusted p-value cutoff for differentially expressed genes.

perturbation_type str (default: 'KO')

Label suffix used for perturbed cells (e.g. "KO").

random_state int (default: 0)

Accepted for pertpy.tl.Mixscape API compatibility; has no effect, as the spherical mixture is initialized deterministically from the per-gene projection statistics.

copy bool (default: False)

Whether to return a copy of adata.

Return type:

AnnData | None

Returns:

Returns the modified copy if copy=True, otherwise writes the results to adata in place and returns None. The results are adata.obs[new_class_name] (per-gene classification), adata.obs[f"{new_class_name}_global"] (perturbed/NP/NT), adata.obs[f"{new_class_name}_p_{perturbation_type.lower()}"] (posterior probability) and adata.uns["mixscape"].
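The result keys listed above are derived from new_class_name and perturbation_type. The short snippet below only spells out that naming for the default arguments; the dictionary result_keys and its labels are illustrative and not part of the API.

```python
new_class_name = "mixscape_class"  # default value of new_class_name
perturbation_type = "KO"           # default value of perturbation_type

# Names of the adata.obs columns written by mixscape(), per the Returns text.
result_keys = {
    "per_gene_class": new_class_name,
    "global_class": f"{new_class_name}_global",
    "posterior": f"{new_class_name}_p_{perturbation_type.lower()}",
}
print(result_keys)
```

With the defaults, the global column name matches the default of the mixscape_class_global parameter of lda().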

perturbation_signature

Mixscape.perturbation_signature(adata, pert_key, control, *, ref_selection_mode='nn', split_by=None, n_neighbors=20, use_rep=None, n_dims=15, n_pcs=None, knn_algorithm='brute', knn_kwargs=None, copy=False)

Calculate the perturbation signature.

The perturbation signature replaces each cell’s expression with the residual against comparable control cells, removing confounding variation so that what remains reflects the perturbation. The result is written to adata.layers["X_pert"]. As in the original implementation, this is intended to run on unscaled log-normalized data.
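The idea behind the "nn" mode can be shown with a small brute-force sketch in plain Python. Several simplifications are assumptions of this example: neighbors are searched directly in expression space instead of a PCA representation, the function name signature_sketch is hypothetical, and the sign convention (cell minus mean of its control neighbors) is chosen for illustration only.

```python
import math


def signature_sketch(cells, is_control, n_neighbors=2):
    """Residual of each cell against the mean of its nearest control cells."""
    controls = [c for c, flag in zip(cells, is_control) if flag]
    k = min(n_neighbors, len(controls))  # cap to the controls available
    signature = []
    for cell in cells:
        # Exact (brute-force) search for the k closest control cells.
        nearest = sorted(controls, key=lambda ctrl: math.dist(cell, ctrl))[:k]
        ref = [sum(col) / k for col in zip(*nearest)]
        signature.append([x - r for x, r in zip(cell, ref)])
    return signature


# Gene 0 encodes a confounding cell state (low vs. high);
# gene 1 is raised only in the two perturbed cells.
cells = [
    [1.0, 0.0], [1.2, 0.0],  # controls, state A
    [5.0, 0.1], [5.2, 0.1],  # controls, state B
    [1.1, 3.0],              # perturbed, state A
    [5.1, 3.0],              # perturbed, state B
]
is_control = [True, True, True, True, False, False]
sig = signature_sketch(cells, is_control)
print(sig[4], sig[5])
```

In the toy data, the state difference in gene 0 cancels out of the two perturbed cells' residuals, while the shift in gene 1 remains.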

Parameters:
adata AnnData

The annotated data object.

pert_key str

The column of .obs with perturbation categories; must also contain control.

control str

Name of the control category in pert_key.

ref_selection_mode Literal['nn', 'split_by'] (default: 'nn')

How reference cells are selected. "nn" uses the n_neighbors nearest control cells in the chosen representation; "split_by" uses all control cells within the same split_by group.

split_by str | None (default: None)

Column of .obs used to compute the signature separately per group (e.g. biological replicate). Required for ref_selection_mode="split_by".

n_neighbors int (default: 20)

Number of control neighbors used for ref_selection_mode="nn". Capped at the number of control cells available in each split, so a split with fewer controls than n_neighbors still runs (pertpy would raise an error).

use_rep str | None (default: None)

Representation to use for neighbor selection. "X" or any .obsm key. If None, .X is used when n_vars is below 50, otherwise "X_pca" (computed if absent).

n_dims int | None (default: 15)

Number of dimensions of the representation to use. None uses all.

n_pcs int | None (default: None)

Number of principal components to compute if a PCA representation is built.

knn_algorithm str (default: 'brute')

Nearest-neighbor backend for ref_selection_mode="nn": "brute" (exact, default) or one of the approximate cuVS backends "ivfflat", "cagra", or "ivfpq", which are much faster on large datasets.

knn_kwargs dict | None (default: None)

Extra parameters for the approximate backends (e.g. n_lists / n_probes for "ivfflat").

copy bool (default: False)

Whether to return a copy of adata.

Return type:

AnnData | None

Returns:

Returns the modified copy if copy=True, otherwise writes adata.layers["X_pert"] in place and returns None.
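Two of the parameter rules above, the default choice of representation and the cap on n_neighbors, can be written out as small helpers. Both function names are hypothetical and exist only to restate the documented behavior; they are not part of rapids_singlecell.

```python
def default_representation(use_rep, n_vars):
    """Resolve the representation used for neighbor search (documented rule)."""
    if use_rep is not None:
        return use_rep
    # With use_rep=None: .X for fewer than 50 variables, otherwise "X_pca".
    return "X" if n_vars < 50 else "X_pca"


def capped_neighbors(n_neighbors, n_controls_in_split):
    """Cap the neighbor count to the controls available in a split."""
    return min(n_neighbors, n_controls_in_split)


print(default_representation(None, 30))
print(default_representation(None, 2000))
print(capped_neighbors(20, 12))
```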