rapids_singlecell.pp.neighbors#
- rapids_singlecell.pp.neighbors(adata, n_neighbors=15, n_pcs=None, *, use_rep=None, random_state=0, algorithm='brute', metric='euclidean', metric_kwds=mappingproxy({}), algorithm_kwds=mappingproxy({}), method='umap', key_added=None, copy=False)[source]#
Compute a neighborhood graph of observations [ONN+24].
The neighbor search efficiency of this heavily relies on cuVS, which also provides a method for estimating connectivities of data points - the connectivity of the manifold.
- Parameters:
- adata
AnnData Annotated data matrix.
- n_neighbors
int(default:15) The size of local neighborhood (in terms of number of neighboring data points) used for manifold approximation. Larger values result in more global views of the manifold, while smaller values result in more local data being preserved. In general values should be in the range 2 to 100.
- n_pcs
int|None(default:None) Use this many PCs. If
n_pcs==0use.Xifuse_rep is None.- use_rep
str|None(default:None) Use the indicated representation.
'X'or any key for.obsmis valid. If None, the representation is chosen automatically: For .n_vars < 50, .X is used, otherwise'X_pca'is used. If'X_pca'is not present, it’s computed with default parameters orn_pcsif present.- random_state
None|int|RandomState(default:0) A numpy random seed.
- algorithm
Literal['brute','cagra','ivfflat','ivfpq','nn_descent','all_neighbors','mg_ivfflat','mg_ivfpq'] (default:'brute') - The query algorithm to use. Valid options are:
'brute'Brute-force search that computes distances to all data points, guaranteeing exact results.
'ivfflat'Uses inverted file indexing to partition the dataset into coarse quantizer cells and performs the search within the relevant cells.
'ivfpq'Combines inverted file indexing with product quantization to encode sub-vectors of the dataset, facilitating faster distance computation.
'cagra'Employs the Compressed, Accurate Graph-based search to quickly find nearest neighbors by traversing a graph structure.
'nn_descent'Uses the NN-descent algorithm to approximate the k-nearest neighbors. Note: Performance may be degraded when Dask is active.
'all_neighbors'Uses the all-neighbors algorithm to approximate the k-nearest neighbors. Note: Performance may be degraded when Dask is active and algorithm is
nn_descent.'mg_ivfflat'Uses the Multi-GPU inverted file indexing to partition the dataset into coarse quantizer cells and performs the search within the relevant cells.
'mg_ivfpq'Combines Multi-GPU inverted file indexing with product quantization to encode sub-vectors of the dataset, facilitating faster distance computation.
Please ensure that the chosen algorithm is compatible with your dataset and the specific requirements of your search problem.
- metric
Union[Literal['l2','chebyshev','manhattan','taxicab','correlation','inner_product','euclidean','canberra','lp','minkowski','cosine','jensenshannon','linf','cityblock','l1','haversine','sqeuclidean'],Literal['canberra','chebyshev','cityblock','cosine','euclidean','hellinger','inner_product','jaccard','l1','l2','linf','lp','manhattan','minkowski','taxicab']] (default:'euclidean') A known metric’s name or a callable that returns a distance.
- metric_kwds
Mapping[str,Any] (default:mappingproxy({})) Options for the metric.
- method
Literal['umap','gauss','jaccard'] (default:'umap') Method for computing connectivities.
'umap'UMAP fuzzy simplicial set (default).
'gauss'Adaptive Gaussian kernel.
'jaccard'PhenoGraph Jaccard index.
- algorithm_kwds
Mapping[str,Any] (default:mappingproxy({})) Options for the algorithm. For
ivfflatandivfpqalgorithms, the following parameters can be specified:’n_lists’: Number of inverted lists for IVF indexing. Default is 2 * next_power_of_2(sqrt(n_samples)).
’n_probes’: Number of lists to probe during search. Default is 20. Higher values increase accuracy but reduce speed.
For
nn_descentalgorithm, the following parameters can be specified:’intermediate_graph_degree’: The degree of the intermediate graph. Default is None. It is recommended to set it to
>= 1.5 * n_neighbors.
For
all_neighborsalgorithm, the following parameters can be specified:’algo’: The algorithm to use. Valid options are: ‘ivf_pq’ and ‘nn_descent’. Default is ‘nn_descent’.
’n_clusters’: Number of clusters/batches to partition the dataset into (> overlap_factor). Default is number of GPUs.
’overlap_factor’: Number of clusters each point is assigned to (must be < n_clusters). Default is 1.
’n_lists’: Number of inverted lists for IVF indexing. Default is 2 * next_power_of_2(sqrt(n_samples)). Only available for
ivf_pqalgorithm.’intermediate_graph_degree’: The degree of the intermediate graph. Default is None. It is recommended to set it to
>= 1.5 * n_neighbors. Only available fornn_descentalgorithm.
For
mg_ivfflatandmg_ivfpqalgorithms, the following parameters can be specified:’distribution_mode’: The distribution mode to use. Valid options are: ‘replicated’ and ‘shared’. Default is ‘replicated’.
’n_lists’: Number of inverted lists for IVF indexing. Default is 2 * next_power_of_2(sqrt(n_samples)).
’n_probes’: Number of lists to probe during search. Default is 20. Higher values increase accuracy but reduce speed.
- key_added
str|None(default:None) If not specified, the neighbors data is stored in .uns[‘neighbors’], distances and connectivities are stored in .obsp[‘distances’] and .obsp[‘connectivities’] respectively. If specified, the neighbors data is added to .uns[key_added], distances are stored in .obsp[key_added+’_distances’] and connectivities in .obsp[key_added+’_connectivities’].
- copy
bool(default:False) Return a copy instead of writing to adata.
- adata
- Return type:
- Returns:
Depending on
copy, updates or returnsadatawith the following:See
key_addedparameter description for the storage path of connectivities and distances.- connectivitiessparse matrix of dtype
float32. Weighted adjacency matrix of the neighborhood graph of data points. Weights should be interpreted as connectivities.
- distancessparse matrix of dtype
float32. Instead of decaying weights, this stores distances for each pair of neighbors.
- connectivitiessparse matrix of dtype