rapids_singlecell.pp.neighbors


rapids_singlecell.pp.neighbors(adata, n_neighbors=15, n_pcs=None, use_rep=None, random_state=0, algorithm='brute', metric='euclidean', metric_kwds=mappingproxy({}), key_added=None, copy=False)

Compute a neighborhood graph of observations with cuml.

The efficiency of the neighbor search relies heavily on cuml, which also provides a method for estimating the connectivities of data points, that is, the connectivity of the manifold.

Parameters:
adata AnnData

Annotated data matrix.

n_neighbors int (default: 15)

The size of the local neighborhood (in number of neighboring data points) used for manifold approximation. Larger values give a more global view of the manifold, while smaller values preserve more local structure. In general, values should be in the range 2 to 100.

n_pcs Optional[int] (default: None)

Use this many PCs. If n_pcs == 0 and use_rep is None, .X is used.

use_rep Optional[str] (default: None)

Use the indicated representation. 'X' or any key for .obsm is valid. If None, the representation is chosen automatically: For .n_vars < 50, .X is used, otherwise 'X_pca' is used. If 'X_pca' is not present, it’s computed with default parameters or n_pcs if present.
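The selection rule described for n_pcs and use_rep can be sketched in plain Python. This is only an illustration of the documented behavior, not the library's code: choose_representation and its arguments are hypothetical names, and the threshold follows the text above (.n_vars < 50).

```python
def choose_representation(n_vars, obsm_keys, use_rep=None, n_pcs=None):
    """Return the name of the representation the documented rule would pick.

    Hypothetical helper for illustration; it does not touch any AnnData object.
    """
    if use_rep is None and n_pcs == 0:
        # n_pcs == 0 with no explicit representation means "use .X"
        return "X"
    if use_rep is None:
        # Automatic choice: small feature spaces use .X, otherwise 'X_pca'
        # (which, per the text above, is computed if it is not present yet)
        return "X" if n_vars < 50 else "X_pca"
    if use_rep == "X" or use_rep in obsm_keys:
        # An explicit choice must be 'X' or an existing .obsm key
        return use_rep
    raise KeyError(f"{use_rep!r} is neither 'X' nor a key of .obsm")


print(choose_representation(2000, ["X_pca"]))                    # X_pca
print(choose_representation(30, []))                             # X
print(choose_representation(2000, ["X_umap"], use_rep="X_umap"))  # X_umap
```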

random_state Union[None, int, RandomState] (default: 0)

A numpy random seed.

algorithm Literal['brute', 'ivfflat', 'ivfpq', 'cagra'] (default: 'brute')

The query algorithm to use. Valid options are:
  • 'brute': Brute-force search that computes distances to all data points, guaranteeing exact results.

  • 'ivfflat': Uses inverted file indexing to partition the dataset into coarse quantizer cells and performs the search within the relevant cells.

  • 'ivfpq': Combines inverted file indexing with product quantization to encode sub-vectors of the dataset, facilitating faster distance computation.

  • 'cagra': Employs the Compressed, Accurate Graph-based search to quickly find nearest neighbors by traversing a graph structure.

Please ensure that the chosen algorithm is compatible with your dataset and the specific requirements of your search problem.
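To make the 'brute' option concrete, here is a minimal CPU sketch of exact brute-force search with the Euclidean metric. It is not how cuml implements the search on the GPU; brute_force_knn is a hypothetical name, and excluding each point from its own neighbor list is an assumption of this sketch.

```python
import math


def brute_force_knn(points, k):
    """Exact k-nearest-neighbor search: compare every point with every other point."""
    neighbors = []
    for i, p in enumerate(points):
        # Distance from point i to all other points (the point itself is skipped)
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()  # ascending by distance, ties broken by index
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
knn = brute_force_knn(points, k=2)
print(knn)  # [[1, 2], [0, 2], [0, 1], [2, 1]]
```

Because every pair of points is compared, the result is exact but the cost grows quadratically with the number of observations, which is the trade-off the approximate options ('ivfflat', 'ivfpq', 'cagra') are meant to address.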

metric Union[Literal['l2', 'chebyshev', 'manhattan', 'taxicab', 'correlation', 'inner_product', 'euclidean', 'canberra', 'lp', 'minkowski', 'cosine', 'jensenshannon', 'linf', 'cityblock', 'l1', 'haversine', 'sqeuclidean'], Literal['canberra', 'chebyshev', 'cityblock', 'cosine', 'euclidean', 'hellinger', 'inner_product', 'jaccard', 'l1', 'l2', 'linf', 'lp', 'manhattan', 'minkowski', 'taxicab']] (default: 'euclidean')

A known metric’s name or a callable that returns a distance.

metric_kwds Mapping[str, Any] (default: mappingproxy({}))

Options for the metric.

key_added Optional[str] (default: None)

If not specified, the neighbors data is stored in .uns['neighbors'], and distances and connectivities are stored in .obsp['distances'] and .obsp['connectivities'] respectively. If specified, the neighbors data is added to .uns[key_added], distances are stored in .obsp[key_added+'_distances'] and connectivities in .obsp[key_added+'_connectivities'].
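The naming rule for key_added can be written out as a small helper. storage_keys is a hypothetical function used only to illustrate the rule; it returns the key names and does not access an AnnData object.

```python
def storage_keys(key_added=None):
    """Return the .uns and .obsp keys that the documented rule would produce."""
    if key_added is None:
        return {
            "uns": "neighbors",
            "distances": "distances",
            "connectivities": "connectivities",
        }
    return {
        "uns": key_added,
        "distances": key_added + "_distances",
        "connectivities": key_added + "_connectivities",
    }


print(storage_keys())
print(storage_keys("harmony"))
```

Using a distinct key_added for each run makes it possible to keep several neighbor graphs (for example, one per representation) in the same object without overwriting each other.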

copy bool (default: False)

Return a copy instead of writing to adata.

Return type:

Optional[AnnData]

Returns:

Depending on copy, updates or returns adata with the following:

See key_added parameter description for the storage path of connectivities and distances.

connectivities sparse matrix of dtype float32.

Weighted adjacency matrix of the neighborhood graph of data points. Weights should be interpreted as connectivities.

distances sparse matrix of dtype float32.

Instead of decaying weights, this stores distances for each pair of neighbors.
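A hedged usage sketch follows. It assumes rapids_singlecell is installed, a CUDA-capable GPU is available, and adata is an AnnData object that has already been prepared for the library; build_neighbor_graph is a hypothetical wrapper, and only keyword arguments from the signature above are passed.

```python
def build_neighbor_graph(adata, n_neighbors=15, key_added=None):
    """Compute the neighbor graph in place and return the two stored matrices."""
    # Imported lazily so that defining this function does not require a GPU
    import rapids_singlecell as rsc

    rsc.pp.neighbors(
        adata,
        n_neighbors=n_neighbors,
        algorithm="brute",      # exact search; see the algorithm parameter
        metric="euclidean",
        key_added=key_added,
    )
    # Storage keys follow the key_added rule described above
    prefix = "" if key_added is None else key_added + "_"
    return adata.obsp[prefix + "distances"], adata.obsp[prefix + "connectivities"]
```

With copy left at its default of False, the call writes to adata, so the wrapper reads the results back from .obsp rather than from a return value.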