rapids_singlecell.tl.rank_genes_groups
rapids_singlecell.tl.rank_genes_groups(adata, groupby, *, mask_var=None, use_raw=None, groups='all', reference='rest', n_genes=None, rankby_abs=False, pts=False, key_added=None, method=None, corr_method='benjamini-hochberg', tie_correct=False, use_continuity=False, layer=None, chunk_size=None, pre_load=False, n_bins=None, bin_range=None, **kwds)
Rank genes for characterizing groups using GPU acceleration.
Expects logarithmized data.
Note
Dask support: 't-test', 't-test_overestim_var', and 'wilcoxon_binned' support Dask arrays. The 'wilcoxon' and 'logreg' methods do not support Dask arrays.
- Parameters:
- adata
AnnData Annotated data matrix.
- groupby
str The key of the observations grouping to consider.
- mask_var
ndarray[tuple[Any, ...], dtype[bool]] | str | None (default: None) Select a subset of genes to use in statistical tests. Can be a boolean array of shape (n_vars,) or a key in adata.var.
- use_raw
bool | None (default: None) Use the raw attribute of adata if present.
- groups
Union[Literal['all'], Iterable[str]] (default: 'all') Subset of groups, e.g. ['g1', 'g2', 'g3'], to which the comparison is restricted, or 'all' (default) for all groups.
- reference
str (default: 'rest') If 'rest', compare each group to the union of all other groups. If a group identifier, compare each group against that group.
- n_genes
int | None (default: None) The number of genes that appear in the returned tables. Defaults to all genes.
- rankby_abs
bool (default: False) Rank genes by the absolute value of the score, not by the score. The returned scores are never the absolute values.
- pts
bool (default: False) Compute the fraction of cells expressing the genes.
- key_added
str | None (default: None) The key in adata.uns under which the results are saved.
- method
Literal['logreg', 't-test', 't-test_overestim_var', 'wilcoxon', 'wilcoxon_binned'] | None (default: None) 't-test' uses Welch’s t-test (default), 't-test_overestim_var' overestimates the variance of each group, 'wilcoxon' uses the Wilcoxon rank-sum test, 'wilcoxon_binned' uses a histogram-based approximate Wilcoxon rank-sum test (faster for large datasets, supports Dask arrays), and 'logreg' uses logistic regression.
- corr_method
Literal['benjamini-hochberg', 'bonferroni'] (default: 'benjamini-hochberg') p-value correction method. Used only for 't-test', 't-test_overestim_var', 'wilcoxon', and 'wilcoxon_binned'.
- tie_correct
bool (default: False) Use tie correction for 'wilcoxon' and 'wilcoxon_binned' scores. Adjusts the variance of the rank-sum statistic for tied values. For 'wilcoxon_binned', each histogram bin acts as a tie group and the correction is derived from the bin counts.
- use_continuity
bool (default: False) Apply continuity correction to 'wilcoxon' and 'wilcoxon_binned' z-scores. Subtracts 0.5 from |R - E[R]| before dividing by the standard deviation, matching the default behavior of scipy.stats.mannwhitneyu().
- layer
str | None (default: None) Key in adata.layers whose values are used for the tests.
- chunk_size
int | None (default: None) Number of genes to process at once for 'wilcoxon' and 'wilcoxon_binned'. Default is 128 for 'wilcoxon'. For 'wilcoxon_binned' the default is sized dynamically based on n_groups and n_bins to keep histogram memory stable.
- pre_load
bool (default: False) Pre-load the data into GPU memory. Used only for 'wilcoxon'.
- n_bins
int | None (default: None) Number of histogram bins for 'wilcoxon_binned'. Higher values give a better approximation at slightly increased cost. Default is 1000 for in-memory arrays and 200 for Dask arrays.
- bin_range
Optional[Literal['log1p', 'auto']] (default: None) How to determine the histogram bin range for 'wilcoxon_binned'. None (default) uses 'auto' for in-memory arrays and 'log1p' for Dask arrays (to avoid a costly data scan). 'log1p' uses a fixed [0, 15] range suitable for most log1p-normalized data. 'auto' computes the actual data range; use it for z-scored or unnormalized data.
- **kwds
Additional arguments passed to the method. For 'logreg', these are passed to cuml.linear_model.LogisticRegression.
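The tie_correct and use_continuity descriptions above can be illustrated with a small pure-Python sketch of the textbook rank-sum z-score computed from histogram bin counts. This is not the library's GPU code: the function name binned_rank_sum_z and its arguments are hypothetical, and clipping the continuity-corrected difference at zero is an assumption made for this illustration.

```python
import math


def binned_rank_sum_z(group_counts, total_counts,
                      tie_correct=False, use_continuity=False):
    """Textbook rank-sum z-score from per-bin counts (illustrative only).

    group_counts[b]: cells of the tested group that fall in bin b
    total_counts[b]: cells of group plus reference that fall in bin b
    """
    n1 = sum(group_counts)
    n = sum(total_counts)
    n2 = n - n1

    rank_sum = 0.0   # R: sum of (mid)ranks of the tested group
    tie_term = 0.0   # sum over bins of t^3 - t
    seen = 0         # cells in all lower bins
    for g, t in zip(group_counts, total_counts):
        if t == 0:
            continue
        # All cells in a bin share the average of the ranks the bin spans,
        # so each bin behaves like one tie group.
        midrank = seen + (t + 1) / 2.0
        rank_sum += g * midrank
        tie_term += t ** 3 - t
        seen += t

    expected = n1 * (n + 1) / 2.0            # E[R]
    variance = n1 * n2 * (n + 1) / 12.0      # Var[R] without ties
    if tie_correct:
        # Standard tie adjustment, derived from the bin counts.
        variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return 0.0  # every cell in one bin: no rank information

    diff = rank_sum - expected
    if use_continuity:
        # Subtract 0.5 from |R - E[R]|; clipping at 0 is an assumption here.
        diff = math.copysign(max(abs(diff) - 0.5, 0.0), diff)
    return diff / math.sqrt(variance)
```

With four bins, `binned_rank_sum_z([0, 1, 1, 2], [2, 2, 1, 3])` gives R = 22.5 against E[R] = 18. Enabling tie_correct shrinks the variance and so increases the magnitude of the score, while use_continuity reduces it.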
- Return type:
- Returns:
Updates adata with the following fields:
- adata.uns['rank_genes_groups' | key_added]['names']
Structured array to be indexed by group id storing the gene names. Ordered according to scores.
- adata.uns['rank_genes_groups' | key_added]['scores']
Structured array to be indexed by group id storing the z-score underlying the computation of a p-value for each gene for each group. Ordered according to scores.
- adata.uns['rank_genes_groups' | key_added]['logfoldchanges']
Structured array to be indexed by group id storing the log2 fold change for each gene for each group.
- adata.uns['rank_genes_groups' | key_added]['pvals']
p-values. Only for 't-test', 't-test_overestim_var', 'wilcoxon', and 'wilcoxon_binned'.
- adata.uns['rank_genes_groups' | key_added]['pvals_adj']
Corrected p-values. Only for 't-test', 't-test_overestim_var', 'wilcoxon', and 'wilcoxon_binned'.
- adata.uns['rank_genes_groups' | key_added]['pts']
Fraction of cells expressing genes per group. Only if pts=True.
- adata.uns['rank_genes_groups' | key_added]['pts_rest']
Fraction of cells expressing genes in rest. Only if pts=True and reference='rest'.
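To make the relationship between the pvals and pvals_adj fields concrete, here is a minimal pure-Python sketch of the two standard correction procedures named by corr_method. The helper names benjamini_hochberg and bonferroni are hypothetical and are not part of rapids_singlecell; the library's own implementation may differ in details.

```python
def benjamini_hochberg(pvals):
    """Standard Benjamini-Hochberg adjusted p-values, in input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def bonferroni(pvals):
    """Standard Bonferroni adjusted p-values, capped at 1."""
    n = len(pvals)
    return [min(p * n, 1.0) for p in pvals]
```

For the same inputs, Bonferroni is never less conservative than Benjamini-Hochberg, which is why the latter is the usual choice when many genes are tested at once.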