rapids_singlecell.pp.pca
- rapids_singlecell.pp.pca(adata, n_comps=None, *, layer=None, zero_center=True, svd_solver=None, random_state=0, mask_var=<object object>, use_highly_variable=None, dtype='float32', chunked=False, chunk_size=None, key_added=None, copy=False, **kwargs)
Principal component analysis using GPU acceleration [HMT09, TQOA24].
Uses the following implementations based on data type (defaults for svd_solver in parentheses):

| | Dense | Sparse | Dask |
|---|---|---|---|
| zero_center=True | cuML PCA ('full') | Custom ('lanczos' if n_vars > 8k, else 'covariance_eigh') | Custom ('covariance_eigh') |
| zero_center=False | cuML TruncatedSVD ('full') | Custom ('lanczos' if n_vars > 8k, else 'covariance_eigh') | Custom ('covariance_eigh') |
| chunked=True | cuML IncrementalPCA | cuML IncrementalPCA | Not supported |
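
The dispatch described in the table can be summarized as a small lookup. The following is a minimal sketch, not the library's code: `default_implementation` is a hypothetical helper, the `kind` labels ('dense', 'sparse', 'dask') are illustrative, and the exact threshold of 8,000 is an assumed reading of "8k".

```python
from typing import Optional, Tuple


def default_implementation(
    kind: str,
    n_vars: int,
    zero_center: bool = True,
    chunked: bool = False,
) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: return (implementation, default svd_solver).

    Mirrors the table in the documentation; it is not part of rapids_singlecell.
    """
    if kind not in {"dense", "sparse", "dask"}:
        raise ValueError(f"unknown data kind: {kind!r}")

    if chunked:
        # Incremental PCA ignores svd_solver, so no default solver is reported.
        if kind == "dask":
            raise NotImplementedError("chunked=True is not supported for Dask arrays")
        return ("cuML IncrementalPCA", None)

    if kind == "dense":
        return ("cuML PCA" if zero_center else "cuML TruncatedSVD", "full")

    if kind == "sparse":
        # Assumption: "8k" in the table is taken to mean 8000 variables.
        return ("Custom", "lanczos" if n_vars > 8000 else "covariance_eigh")

    # Dask arrays use the custom covariance-based solver in both cases.
    return ("Custom", "covariance_eigh")


sparse_wide = default_implementation("sparse", n_vars=20000)
sparse_narrow = default_implementation("sparse", n_vars=2000)
dense_no_center = default_implementation("dense", n_vars=2000, zero_center=False)
```

Passing an explicit `svd_solver` overrides these defaults, subject to the per-solver restrictions listed under the `svd_solver` parameter.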
- Parameters:
- adata
AnnData AnnData object
- n_comps
int | None (default: None) Number of principal components to compute. Defaults to 50, or to the minimum dimension size of the selected representation minus 1 when that is smaller.
- layer
str (default: None) If provided, use adata.layers[layer] for expression values instead of adata.X.
- zero_center
bool (default: True) If True, compute standard PCA from covariance matrix. If False, omit zero-centering variables (truncated SVD).
- svd_solver
str | None (default: None) SVD solver to use. See the table above for which implementation is used for each data type, and for the default solver when svd_solver=None.
None: Choose automatically based on data type (see table above).
'covariance_eigh': Eigendecomposition of the covariance matrix. Fast for sparse matrices with fewer than ~8,000 features. Works with Dask arrays.
'lanczos': Lanczos bidiagonalization with implicit restarts. Memory efficient for large sparse matrices (>8,000 features). Best singular value accuracy. Does not support Dask arrays.
'randomized': Randomized SVD (Halko et al. 2009) with CholeskyQR2 orthogonalization (Tomás et al. 2024). Faster than Lanczos but approximate. Does not support Dask arrays.
'full': cuML: Full eigendecomposition of covariance matrix. For dense arrays only.
'jacobi': cuML: Jacobi iterative solver. Faster but less accurate. For dense arrays only.
- random_state
int | None (default: 0) Random state for initialization.
- mask_var
ndarray[tuple[Any, ...], dtype[bool]] | str | None (default: <object object>) Mask to use for the PCA computation. If None, all variables are used. If np.ndarray, use the provided mask. If str, use the mask stored in adata.var[mask_var].
- use_highly_variable
bool | None (default: None) Whether to use highly variable genes only, stored in .var['highly_variable']. By default uses them if they have been determined beforehand.
- dtype
str (default: 'float32') NumPy data type string to which to convert the result.
- chunked
bool (default: False) If True, perform an incremental PCA on segments of chunk_size. The incremental PCA automatically zero centers and ignores settings of random_state and svd_solver. If False, perform a full PCA.
- chunk_size
int (default: None) Number of observations to include in each chunk. Required if chunked=True was passed.
- key_added
str | None (default: None) If not specified, the embedding is stored as obsm['X_pca'], the loadings as varm['PCs'], and the parameters in uns['pca']. If specified, the embedding is stored as obsm[key_added], the loadings as varm[key_added], and the parameters in uns[key_added].
- copy
bool (default: False) Whether to return a copy or update adata.
- **kwargs
Additional arguments for specific SVD solvers. For svd_solver='randomized':
n_oversamples: Extra random vectors for better approximation. Higher values improve accuracy. Default is 10.
n_iter: Number of power iterations. Higher values improve accuracy for matrices with slowly decaying singular values. Default is 2.
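
To make the roles of n_oversamples and n_iter concrete, here is a minimal CPU sketch of a randomized SVD in the style of Halko et al. It is an illustration under stated assumptions, not the library's GPU implementation: `randomized_svd_sketch` is a hypothetical name, it runs on NumPy arrays, and it uses `np.linalg.qr` for orthogonalization in place of CholeskyQR2.

```python
import numpy as np


def randomized_svd_sketch(A, n_comps, n_oversamples=10, n_iter=2, random_state=0):
    """Hypothetical NumPy illustration of a randomized truncated SVD."""
    rng = np.random.default_rng(random_state)
    # Sample a few more random vectors than requested components.
    n_random = n_comps + n_oversamples
    omega = rng.standard_normal((A.shape[1], n_random))

    # Orthonormal basis for the approximate range of A.
    Q, _ = np.linalg.qr(A @ omega)

    # Power iterations sharpen the basis when singular values decay slowly.
    # Plain QR is used here instead of CholeskyQR2 (a deliberate substitution).
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)

    # Project A onto the basis and take an exact SVD of the small matrix.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :n_comps], s[:n_comps], Vt[:n_comps]


# Demo on an exactly rank-5 matrix, where the approximation should be tight.
demo_rng = np.random.default_rng(42)
A_demo = demo_rng.standard_normal((200, 5)) @ demo_rng.standard_normal((5, 50))
U_demo, s_demo, Vt_demo = randomized_svd_sketch(A_demo, n_comps=5)
s_exact = np.linalg.svd(A_demo, compute_uv=False)[:5]
```

Larger n_oversamples widens the random basis, and larger n_iter adds more passes over the matrix; both trade runtime for accuracy.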
- Return type:
- Returns:
Adds fields to adata:
.obsm['X_pca' | key_added]: PCA representation of data.
.varm['PCs' | key_added]: The principal components containing the loadings.
.uns['pca' | key_added]['variance_ratio']: Ratio of explained variance.
.uns['pca' | key_added]['variance']: Explained variance, equivalent to the eigenvalues of the covariance matrix.
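
How the returned fields relate to one another can be illustrated with a small CPU sketch of PCA via eigendecomposition of the covariance matrix. This is an assumption-laden illustration, not the library's code: `pca_covariance_eigh_sketch` is a hypothetical name, it works on dense NumPy arrays in float64, and the dictionary keys only mimic the field names listed above.

```python
import numpy as np


def pca_covariance_eigh_sketch(X, n_comps):
    """Hypothetical NumPy illustration of PCA from the covariance matrix."""
    X = np.asarray(X, dtype=np.float64)
    n_obs = X.shape[0]

    # Zero-center each variable, as zero_center=True would.
    X_centered = X - X.mean(axis=0)
    cov = (X_centered.T @ X_centered) / (n_obs - 1)

    # eigh returns eigenvalues in ascending order; keep the largest n_comps.
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1][:n_comps]
    variance = evals[order]
    components = evecs[:, order]  # shape (n_vars, n_comps)

    return {
        "X_pca": X_centered @ components,           # analogous to .obsm['X_pca']
        "PCs": components,                          # analogous to .varm['PCs']
        "variance": variance,                       # eigenvalues of the covariance
        "variance_ratio": variance / evals.sum(),   # share of total variance
    }


demo_rng = np.random.default_rng(0)
X_demo = demo_rng.standard_normal((100, 20))
result = pca_covariance_eigh_sketch(X_demo, n_comps=5)
```

In this sketch the sample variance of each column of `X_pca` equals the corresponding entry of `variance`, which is the sense in which the explained variance matches the eigenvalues of the covariance matrix.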