rapids_singlecell.pp.pca

rapids_singlecell.pp.pca(adata, n_comps=None, *, layer=None, zero_center=True, svd_solver=None, random_state=0, mask_var=<object object>, use_highly_variable=None, dtype='float32', chunked=False, chunk_size=None, key_added=None, copy=False, **kwargs)

Principal component analysis using GPU acceleration [HMT09, TQOA24].

Uses the following implementations based on data type (defaults for svd_solver in parentheses):

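|                   | Dense                      | Sparse                                                    | Dask                       |
|-------------------|----------------------------|-----------------------------------------------------------|----------------------------|
| zero_center=True  | cuML PCA ('full')          | Custom ('lanczos' if n_vars > 8k, else 'covariance_eigh') | Custom ('covariance_eigh') |
| zero_center=False | cuML TruncatedSVD ('full') | Custom ('lanczos' if n_vars > 8k, else 'covariance_eigh') | Custom ('covariance_eigh') |
| chunked=True      | cuML IncrementalPCA        | cuML IncrementalPCA                                       | Not supported              |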
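
The default solver choices listed above can be read as a small decision rule. The sketch below is only an illustration: `default_svd_solver` is a hypothetical helper, not part of rapids_singlecell, and the cutoff of 8000 is an assumed reading of "8k".

```python
def default_svd_solver(data_kind: str, n_vars: int, threshold: int = 8000) -> str:
    """Hypothetical helper mirroring the documented svd_solver defaults.

    data_kind is one of "dense", "sparse" or "dask" (labels chosen for this sketch).
    threshold is an assumed value for the "8k" cutoff.
    """
    if data_kind == "dense":
        return "full"  # cuML PCA or cuML TruncatedSVD
    if data_kind == "sparse":
        return "lanczos" if n_vars > threshold else "covariance_eigh"
    if data_kind == "dask":
        return "covariance_eigh"
    raise ValueError(f"unknown data kind: {data_kind!r}")


solver_for_wide_sparse = default_svd_solver("sparse", n_vars=20000)
solver_for_narrow_sparse = default_svd_solver("sparse", n_vars=2000)
```

Passing svd_solver explicitly overrides these defaults; the rule only describes what happens when svd_solver=None.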

Parameters:
adata AnnData

AnnData object

n_comps int | None (default: None)

Number of principal components to compute. Defaults to 50, or to the minimum dimension size of the selected representation minus 1, whichever is smaller.
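
One way to read the default for n_comps is sketched below. `default_n_comps` is a hypothetical helper written for this illustration, and it assumes the rule is "at most 50, and strictly less than the smallest dimension".

```python
def default_n_comps(n_obs: int, n_vars: int, cap: int = 50) -> int:
    """Hypothetical helper: 50, or one less than the smallest dimension if that is smaller."""
    return min(cap, min(n_obs, n_vars) - 1)


large = default_n_comps(n_obs=3000, n_vars=2000)  # both dimensions exceed 50
small = default_n_comps(n_obs=30, n_vars=2000)    # only 30 observations
```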

layer str (default: None)

If provided, use adata.layers[layer] for expression values instead of adata.X.

zero_center bool (default: True)

If True, compute standard PCA from covariance matrix. If False, omit zero-centering variables (truncated SVD).
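
The effect of zero_center can be illustrated on the CPU with NumPy. This is a toy sketch, not the GPU implementation: it assumes that centering means subtracting per-variable means before the decomposition, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=1.0, size=(200, 10))  # toy data with a large mean offset

# zero_center=True analogue: subtract per-variable means, then decompose.
Xc = X - X.mean(axis=0)
_, s_centered, vt_centered = np.linalg.svd(Xc, full_matrices=False)

# zero_center=False analogue: decompose the raw matrix (truncated-SVD style).
_, s_raw, vt_raw = np.linalg.svd(X, full_matrices=False)

# Without centering, the leading direction mostly tracks the mean vector.
mean_dir = X.mean(axis=0) / np.linalg.norm(X.mean(axis=0))
alignment = abs(float(vt_raw[0] @ mean_dir))
```

On data like this, `alignment` is close to 1, which is why skipping centering is usually reserved for cases such as sparse count data where centering would destroy sparsity.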

svd_solver str | None (default: None)

SVD solver to use. See table above for which implementation is used based on data type, as well as the default solver when svd_solver=None.

None

Choose automatically based on data type (see table above).

'covariance_eigh'

Eigendecomposition of the covariance matrix. Fast for sparse matrices with fewer than ~8,000 features. Works with Dask arrays.

'lanczos'

Lanczos bidiagonalization with implicit restarts. Memory efficient for large sparse matrices (>8,000 features). Best singular value accuracy. Does not support Dask arrays.

'randomized'

Randomized SVD (Halko et al. 2009) with CholeskyQR2 orthogonalization (Tomás et al. 2024). Faster than Lanczos but approximate. Does not support Dask arrays.

'full'

cuML: Full eigendecomposition of covariance matrix. For dense arrays only.

'jacobi'

cuML: Jacobi iterative solver. Faster but less accurate. For dense arrays only.
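
To make the 'covariance_eigh' option more concrete, here is a minimal CPU sketch of PCA via eigendecomposition of the covariance matrix. It uses NumPy on a dense array and is an assumption about the general technique, not the library's GPU code; `pca_covariance_eigh` is a hypothetical name.

```python
import numpy as np


def pca_covariance_eigh(X: np.ndarray, n_comps: int):
    """Toy sketch: PCA from the eigendecomposition of the covariance matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = (Xc.T @ Xc) / (X.shape[0] - 1)       # (n_vars, n_vars) covariance matrix
    evals, evecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_comps]  # indices of the largest n_comps
    components = evecs[:, order].T             # (n_comps, n_vars) loadings
    variance = evals[order]                    # explained variance per component
    embedding = Xc @ components.T              # (n_obs, n_comps) projection
    return embedding, components, variance


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
embedding, components, variance = pca_covariance_eigh(X, n_comps=5)
```

The covariance matrix has shape (n_vars, n_vars), so the cost of this approach grows with the number of features, which is consistent with the guidance to prefer it below roughly 8,000 features.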

random_state int | None (default: 0)

Random state for initialization.

mask_var ndarray[tuple[Any, ...], dtype[bool]] | str | None (default: <object object>)

Mask to use for the PCA computation. If None, all variables are used. If np.ndarray, use the provided mask. If str, use the mask stored in adata.var[mask_var].
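
The three accepted forms of mask_var can be sketched as follows. `resolve_mask` is a hypothetical helper, and a plain dict stands in for adata.var so that the example runs without AnnData.

```python
import numpy as np


def resolve_mask(var_columns: dict, mask_var, n_vars: int) -> np.ndarray:
    """Hypothetical helper: turn mask_var into a boolean vector over variables."""
    if mask_var is None:
        return np.ones(n_vars, dtype=bool)                     # use every variable
    if isinstance(mask_var, str):
        return np.asarray(var_columns[mask_var], dtype=bool)   # stand-in for adata.var[mask_var]
    mask = np.asarray(mask_var, dtype=bool)
    if mask.shape != (n_vars,):
        raise ValueError("mask must have one entry per variable")
    return mask


var_columns = {"highly_variable": np.array([True, False, True, False])}
X = np.arange(12, dtype=float).reshape(3, 4)
X_sub = X[:, resolve_mask(var_columns, "highly_variable", n_vars=4)]
```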

use_highly_variable bool | None (default: None)

Whether to use highly variable genes only, stored in .var['highly_variable']. By default uses them if they have been determined beforehand.

dtype str (default: 'float32')

NumPy data type string to which the result is converted.

chunked bool (default: False)

If True, perform an incremental PCA on segments of chunk_size observations. The incremental PCA automatically zero-centers and ignores the random_state and svd_solver settings. If False, perform a full PCA.

chunk_size int (default: None)

Number of observations to include in each chunk. Required if chunked=True was passed.
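
The idea of processing chunk_size observations at a time can be illustrated with a simplified NumPy sketch. This is not the IncrementalPCA algorithm used by cuML; it swaps in a plain chunk-wise accumulation of sums to show that statistics needed for PCA can be built without holding all rows in one operation. `chunked_covariance` is a hypothetical name.

```python
import numpy as np


def chunked_covariance(X: np.ndarray, chunk_size: int) -> np.ndarray:
    """Toy sketch: accumulate sums over row chunks, then form the covariance."""
    n_obs, n_vars = X.shape
    total = np.zeros(n_vars)
    gram = np.zeros((n_vars, n_vars))
    for start in range(0, n_obs, chunk_size):
        chunk = X[start:start + chunk_size]  # only this slice is processed at once
        total += chunk.sum(axis=0)
        gram += chunk.T @ chunk
    mean = total / n_obs
    return (gram - n_obs * np.outer(mean, mean)) / (n_obs - 1)


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
cov = chunked_covariance(X, chunk_size=128)
```

Smaller chunks lower peak memory per step at the cost of more iterations; the last chunk may hold fewer than chunk_size rows.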

key_added str | None (default: None)

If not specified, the embedding is stored as obsm['X_pca'], the loadings as varm['PCs'], and the parameters in uns['pca']. If specified, the embedding is stored as obsm[key_added], the loadings as varm[key_added], and the parameters in uns[key_added].
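
The storage rule for key_added can be summarised in a few lines. `pca_result_keys` is a hypothetical helper for illustration only; it returns the key used in each AnnData slot.

```python
def pca_result_keys(key_added=None) -> dict:
    """Hypothetical helper: where the results are stored for a given key_added."""
    if key_added is None:
        return {"obsm": "X_pca", "varm": "PCs", "uns": "pca"}
    return {"obsm": key_added, "varm": key_added, "uns": key_added}


default_keys = pca_result_keys()
custom_keys = pca_result_keys("pca_hvg")  # "pca_hvg" is an arbitrary example key
```

Using a distinct key_added makes it possible to keep several PCA results side by side on the same object.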

copy bool (default: False)

Whether to return a copy or update adata.

**kwargs

Additional arguments for specific SVD solvers. For svd_solver='randomized':

  • n_oversamples: Extra random vectors for better approximation. Higher values improve accuracy. Default is 10.

  • n_iter: Number of power iterations. Higher values improve accuracy for matrices with slowly decaying singular values. Default is 2.
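
The roles of n_oversamples and n_iter can be seen in a minimal CPU sketch of randomized SVD in the style of Halko et al. This is an illustration under assumptions, not the library's code: it runs on NumPy, and it swaps in `np.linalg.qr` for the CholeskyQR2 orthogonalization mentioned above. `randomized_svd` here is a local, hypothetical function.

```python
import numpy as np


def randomized_svd(A, n_comps, n_oversamples=10, n_iter=2, seed=0):
    """Toy sketch of randomized SVD with power iterations."""
    rng = np.random.default_rng(seed)
    k = n_comps + n_oversamples                # sample a few extra directions
    omega = rng.normal(size=(A.shape[1], k))   # random test matrix
    Q, _ = np.linalg.qr(A @ omega)             # orthonormal basis for the sampled range
    for _ in range(n_iter):                    # power iterations sharpen the basis
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    B = Q.T @ A                                # small (k, n_cols) projected matrix
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :n_comps], s[:n_comps], Vt[:n_comps]


rng = np.random.default_rng(1)
A = rng.normal(size=(300, 8)) @ rng.normal(size=(8, 60))  # matrix of rank 8
U, s, Vt = randomized_svd(A, n_comps=5)
s_exact = np.linalg.svd(A, compute_uv=False)[:5]
```

On this low-rank toy matrix the approximate singular values match the exact ones closely; for matrices with slowly decaying spectra, raising n_iter or n_oversamples is what narrows the gap.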

Return type:

None | AnnData

Returns:

Adds the following fields to adata (or to the returned copy if copy=True):

.obsm['X_pca' | key_added]

PCA representation of data.

.varm['PCs' | key_added]

The principal components containing the loadings.

.uns['pca' | key_added]['variance_ratio']

Ratio of explained variance.

.uns['pca' | key_added]['variance']

Explained variance, equivalent to the eigenvalues of the covariance matrix.
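
The relationship between the two variance fields can be checked with a short NumPy sketch. It assumes the common convention that the ratio is taken relative to the total variance of the data; the variable names are illustrative and nothing here calls rapids_singlecell.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)) * np.array([3.0, 2.0, 1.0, 1.0, 0.5, 0.5])

Xc = X - X.mean(axis=0)
s = np.linalg.svd(Xc, compute_uv=False)     # singular values of the centered data

variance = s**2 / (X.shape[0] - 1)          # eigenvalues of the covariance matrix
variance_ratio = variance / variance.sum()  # share of total variance per component

cov_eigenvalues = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
```

Here all six components are kept, so the ratios sum to 1; with fewer components than variables the stored ratios sum to less than 1.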