rapids_singlecell.tl.umap

rapids_singlecell.tl.umap(adata, *, min_dist=0.5, spread=1.0, n_components=2, maxiter=None, alpha=1.0, negative_sample_rate=5, init_pos='auto', random_state=0, a=None, b=None, key_added=None, neighbors_key=None, copy=False)

Embed the neighborhood graph using UMAP [MHM18] [NLR+21].

UMAP (Uniform Manifold Approximation and Projection) is a manifold learning technique suitable for visualizing high-dimensional data. Besides tending to be faster than tSNE, it optimizes the embedding such that it best reflects the topology of the data, which we represent throughout rapids-singlecell using a neighborhood graph. tSNE, by contrast, optimizes the distribution of nearest-neighbor distances in the embedding such that these best match the distribution of distances in the high-dimensional space.

Parameters:
adata AnnData

Annotated data matrix.

min_dist float (default: 0.5)

The effective minimum distance between embedded points. Smaller values will result in a more clustered/clumped embedding where nearby points on the manifold are drawn closer together, while larger values will result in a more even dispersal of points. The value should be set relative to the spread value, which determines the scale at which embedded points will be spread out.

spread float (default: 1.0)

The effective scale of embedded points. In combination with min_dist this determines how clustered/clumped the embedded points are.

n_components int (default: 2)

The number of dimensions of the embedding.

maxiter int | None (default: None)

The number of iterations (epochs) of the optimization. Called n_epochs in the original UMAP.

alpha float (default: 1.0)

The initial learning rate for the embedding optimization.

negative_sample_rate int (default: 5)

The number of negative edge/1-simplex samples to use per positive edge/1-simplex sample in optimizing the low dimensional embedding.

init_pos Union[Literal['auto', 'spectral', 'random', 'paga'], ndarray, ndarray, str, None] (default: 'auto')

How to initialize the low dimensional embedding. Called init in the original UMAP. Options are:

  • 'auto': chooses 'spectral' for n_samples < 1000000, 'random' otherwise.

  • 'spectral': use a spectral embedding of the graph.

  • 'random': assign initial embedding positions at random.

  • 'paga': use the paga() layout as initial embedding positions.

  • Array of shape (n_obs, 2)

  • Any key for obsm

Note

If your embedding looks odd, try setting init_pos to 'random'.
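The 'auto' rule described for init_pos amounts to a single size check. The following is a minimal sketch of that rule only, not the library's internal code; `resolve_auto_init` and `SPECTRAL_LIMIT` are hypothetical names introduced for illustration.

```python
SPECTRAL_LIMIT = 1_000_000  # threshold quoted in the 'auto' description


def resolve_auto_init(init_pos, n_samples):
    """Return the concrete strategy that 'auto' stands for (hypothetical helper)."""
    # Only the literal string 'auto' is resolved; arrays, obsm keys and the
    # other named strategies are passed through unchanged.
    if isinstance(init_pos, str) and init_pos == "auto":
        return "spectral" if n_samples < SPECTRAL_LIMIT else "random"
    return init_pos
```

Under this reading, a dataset with exactly 1000000 observations already falls on the 'random' side of the threshold.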

random_state int (default: 0)

Seed used by the random number generator.

a float | None (default: None)

More specific parameters controlling the embedding. If None these values are set automatically as determined by min_dist and spread.

b float | None (default: None)

More specific parameters controlling the embedding. If None these values are set automatically as determined by min_dist and spread.
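When a and b are None they are derived from min_dist and spread. In the reference UMAP implementation this is done by a least-squares fit of the curve 1 / (1 + a * d^(2b)) to a target that equals 1 below min_dist and decays exponentially, with scale spread, beyond it. The sketch below reproduces that idea with a plain grid search in pure Python. It is an assumption that the backend used by rapids-singlecell derives the values the same way, and `target_curve`, `smooth_curve` and `fit_ab` are illustrative names, not library functions.

```python
import math


def target_curve(d, min_dist, spread):
    """Target membership strength at embedding distance d."""
    if d < min_dist:
        return 1.0
    return math.exp(-(d - min_dist) / spread)


def smooth_curve(d, a, b):
    """Differentiable curve that UMAP optimizes against."""
    return 1.0 / (1.0 + a * d ** (2.0 * b))


def fit_ab(min_dist=0.5, spread=1.0, n_points=300, grid=21, rounds=6):
    """Approximate a and b by coarse-to-fine grid search (illustrative only)."""
    # Sample distances on [0, 3 * spread], as the reference implementation does.
    xs = [3.0 * spread * i / (n_points - 1) for i in range(n_points)]
    ys = [target_curve(x, min_dist, spread) for x in xs]

    def sse(a, b):
        # Sum of squared errors between the smooth curve and the target.
        return sum((smooth_curve(x, a, b) - y) ** 2 for x, y in zip(xs, ys))

    # Assumed search box; wide enough for typical min_dist/spread settings.
    a_lo, a_hi = 0.05, 5.0
    b_lo, b_hi = 0.3, 3.0
    a = b = None
    for _ in range(rounds):
        a_step = (a_hi - a_lo) / (grid - 1)
        b_step = (b_hi - b_lo) / (grid - 1)
        _, a, b = min(
            (sse(a_lo + i * a_step, b_lo + j * b_step),
             a_lo + i * a_step,
             b_lo + j * b_step)
            for i in range(grid)
            for j in range(grid)
        )
        # Shrink the search box around the current best point.
        a_lo, a_hi = max(a - 2 * a_step, 1e-6), a + 2 * a_step
        b_lo, b_hi = max(b - 2 * b_step, 1e-6), b + 2 * b_step
    return a, b
```

Calling `fit_ab(min_dist=0.5, spread=1.0)` gives the pair that corresponds to the defaults of this function; lowering min_dist yields a larger a, which is what produces tighter clumps. Passing a and b explicitly skips this derivation.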

key_added str | None (default: None)

If not specified, the embedding is stored as obsm['X_umap'] and the parameters in uns['umap']. If specified, the embedding is stored as obsm[key_added] and the parameters in uns[key_added].

neighbors_key str | None (default: None)

If not specified, umap looks in .uns['neighbors'] for neighbors settings and in .obsp['connectivities'] for connectivities (the default storage places for pp.neighbors). If specified, umap looks in .uns[neighbors_key] for neighbors settings and in .obsp[.uns[neighbors_key]['connectivities_key']] for connectivities.
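The lookup described for neighbors_key can be sketched with plain dictionaries standing in for adata.uns and adata.obsp. `find_connectivities` is a hypothetical helper written for illustration and is not part of rapids-singlecell; the key names 'nn_alt' and 'nn_alt_connectivities' are made up for the example.

```python
def find_connectivities(uns, obsp, neighbors_key=None):
    """Mimic the documented lookup of neighbor settings and connectivities."""
    if neighbors_key is None:
        # Default storage places written by pp.neighbors.
        settings = uns["neighbors"]
        connectivities = obsp["connectivities"]
    else:
        # The settings entry names the obsp slot that holds the graph.
        settings = uns[neighbors_key]
        connectivities = obsp[settings["connectivities_key"]]
    return settings, connectivities


# Stand-ins for adata.uns and adata.obsp; strings replace the sparse matrices.
uns = {
    "neighbors": {"params": {}},
    "nn_alt": {"connectivities_key": "nn_alt_connectivities", "params": {}},
}
obsp = {
    "connectivities": "default graph",
    "nn_alt_connectivities": "alternative graph",
}

default_settings, default_graph = find_connectivities(uns, obsp)
alt_settings, alt_graph = find_connectivities(uns, obsp, neighbors_key="nn_alt")
```

Keeping the indirection through 'connectivities_key' means several neighbor graphs can coexist in one object, and each UMAP run can pick the one it embeds.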

copy bool (default: False)

Return a copy instead of writing to adata.

Return type:

AnnData | None

Returns:

Depending on copy, returns or updates adata with the following fields.

adata.obsm['X_umap' | key_added] ndarray (dtype float)

UMAP coordinates of data.

adata.uns['umap' | key_added]['params'] dict

UMAP parameters a, b, and random_state (if specified).
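A typical call sequence might look like the sketch below. It assumes `adata` is an already preprocessed AnnData object and that a CUDA-capable GPU is available; `embed_with_umap` and the key 'X_umap_tight' are hypothetical names chosen for the example, while pp.neighbors and tl.umap are the functions referenced on this page.

```python
def embed_with_umap(adata, *, min_dist=0.3, key_added="X_umap_tight"):
    """Build the neighbor graph, embed it, and return the stored results."""
    # Imported lazily because rapids-singlecell needs a CUDA-capable GPU.
    import rapids_singlecell as rsc

    # tl.umap embeds the neighborhood graph, so that graph must exist first.
    rsc.pp.neighbors(adata, n_neighbors=15)

    # With key_added set, results go to obsm[key_added] and uns[key_added].
    rsc.tl.umap(
        adata,
        min_dist=min_dist,
        spread=1.0,
        random_state=0,
        key_added=key_added,
    )

    coords = adata.obsm[key_added]           # UMAP coordinates
    params = adata.uns[key_added]["params"]  # stored UMAP parameters
    return coords, params
```

Writing to a custom key leaves any existing obsm['X_umap'] untouched, which makes it easy to compare embeddings computed with different min_dist values side by side.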