rapids_singlecell.pp.neighbors
- rapids_singlecell.pp.neighbors(adata, n_neighbors=15, n_pcs=None, *, use_rep=None, random_state=0, algorithm='brute', metric='euclidean', metric_kwds=mappingproxy({}), key_added=None, copy=False)
Compute a neighborhood graph of observations with cuML.
The efficiency of the neighbor search relies heavily on cuML, which also provides a method for estimating the connectivities of data points, i.e. the connectivity of the manifold.
- Parameters:
- adata : AnnData
  Annotated data matrix.
- n_neighbors : int (default: 15)
  The size of the local neighborhood (in terms of number of neighboring data points) used for manifold approximation. Larger values result in more global views of the manifold, while smaller values preserve more local structure. In general, values should be in the range 2 to 100.
- n_pcs : int | None (default: None)
  Use this many PCs. If n_pcs == 0, use .X if use_rep is None.
- use_rep : str | None (default: None)
  Use the indicated representation. 'X' or any key for .obsm is valid. If None, the representation is chosen automatically: for .n_vars < 50, .X is used, otherwise 'X_pca' is used. If 'X_pca' is not present, it is computed with default parameters or with n_pcs if present.
- random_state : None | int | RandomState (default: 0)
  A NumPy random seed.
- algorithm : Literal['brute', 'ivfflat', 'ivfpq', 'cagra'] (default: 'brute')
  The query algorithm to use. Valid options are:
  'brute': Brute-force search that computes distances to all data points, guaranteeing exact results.
  'ivfflat': Uses inverted file indexing to partition the dataset into coarse quantizer cells and searches only within the relevant cells.
  'ivfpq': Combines inverted file indexing with product quantization, which encodes sub-vectors of the dataset in compressed form for faster distance computation.
  'cagra': Uses CAGRA, a GPU graph-based method that finds nearest neighbors by traversing a proximity graph.
  'ivfflat', 'ivfpq', and 'cagra' are approximate methods that trade some accuracy for speed; make sure the chosen algorithm suits your dataset and the accuracy requirements of your search.
- metric : Union[Literal['l2', 'chebyshev', 'manhattan', 'taxicab', 'correlation', 'inner_product', 'euclidean', 'canberra', 'lp', 'minkowski', 'cosine', 'jensenshannon', 'linf', 'cityblock', 'l1', 'haversine', 'sqeuclidean'], Literal['canberra', 'chebyshev', 'cityblock', 'cosine', 'euclidean', 'hellinger', 'inner_product', 'jaccard', 'l1', 'l2', 'linf', 'lp', 'manhattan', 'minkowski', 'taxicab']] (default: 'euclidean')
  A known metric's name or a callable that returns a distance.
- metric_kwds : Mapping[str, Any] (default: mappingproxy({}))
  Options for the metric.
- key_added : str | None (default: None)
  If not specified, the neighbors data is stored in .uns['neighbors'], and distances and connectivities are stored in .obsp['distances'] and .obsp['connectivities'], respectively. If specified, the neighbors data is added to .uns[key_added], distances are stored in .obsp[key_added + '_distances'], and connectivities in .obsp[key_added + '_connectivities'].
- copy : bool (default: False)
  Return a copy instead of writing to adata.
- Return type:
  AnnData | None
- Returns:
  Depending on copy, updates or returns adata with the following (see the key_added parameter description for the storage path of connectivities and distances):
- connectivities : sparse matrix of dtype float32
  Weighted adjacency matrix of the neighborhood graph of data points. Weights should be interpreted as connectivities.
- distances : sparse matrix of dtype float32
  Instead of decaying weights, this stores distances for each pair of neighbors.
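The n_neighbors parameter sets how many nearest neighbors each observation is linked to. The minimal NumPy sketch below shows an exact, brute-force neighbor search of the kind the 'brute' option describes; it is not the cuML implementation. `knn_brute` is a hypothetical helper, it supports Euclidean distance only, and it excludes each point from its own neighbor list, which may differ from how the library counts neighbors.

```python
import numpy as np


def knn_brute(X: np.ndarray, n_neighbors: int):
    """Exact k-nearest neighbors under Euclidean distance (illustrative sketch).

    Each point is excluded from its own neighbor list. Returns (indices, distances),
    both of shape (n_obs, n_neighbors), sorted by increasing distance per row.
    """
    # Squared distances via ||a||^2 + ||b||^2 - 2 a.b, computed for all pairs at once
    sq_norms = (X ** 2).sum(axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    np.maximum(d2, 0.0, out=d2)      # clip small negatives caused by round-off
    np.fill_diagonal(d2, np.inf)     # a point is not its own neighbor in this sketch
    # Keep the n_neighbors smallest entries per row, then sort them by distance
    idx = np.argpartition(d2, n_neighbors - 1, axis=1)[:, :n_neighbors]
    rows = np.arange(X.shape[0])[:, None]
    order = np.argsort(d2[rows, idx], axis=1)
    idx = idx[rows, order]
    return idx, np.sqrt(d2[rows, idx])


# Four points on a line; each row of idx lists that point's two nearest neighbors
X = np.array([[0.0], [1.0], [3.0], [7.0]])
idx, dist = knn_brute(X, n_neighbors=2)
```

Every distance is computed, so the result is exact but the cost grows with the square of the number of observations.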
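The automatic choice between .X and 'X_pca' described for n_pcs and use_rep can be written as a small decision function. This is a hedged sketch of the documented rules only, not the library's source; `choose_representation` and its `needs_pca` flag are hypothetical, and it does not model how n_pcs limits the number of PCA columns used.

```python
from typing import Collection, Optional, Tuple


def choose_representation(
    obsm_keys: Collection[str],
    n_vars: int,
    use_rep: Optional[str] = None,
    n_pcs: Optional[int] = None,
) -> Tuple[str, bool]:
    """Apply the documented selection rules; return (representation, needs_pca)."""
    if use_rep is not None:
        return use_rep, False          # an explicit 'X' or .obsm key always wins
    if n_pcs == 0 or n_vars < 50:
        return "X", False              # n_pcs == 0, or few variables: use .X
    # Otherwise 'X_pca'; it has to be computed first if it is not in .obsm
    return "X_pca", "X_pca" not in obsm_keys


print(choose_representation({"X_pca"}, n_vars=2000))   # ('X_pca', False)
print(choose_representation(set(), n_vars=30))         # ('X', False)
```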
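The 'ivfflat' option described above can be illustrated with a toy NumPy inverted-file index: k-means centroids act as the coarse quantizer, each point is filed under its nearest centroid, and a query is compared exactly against only the points in the few closest cells. This is a conceptual sketch, not cuML's implementation; `kmeans`, `ivfflat_search`, `n_lists`, and `n_probes` are hypothetical names chosen for illustration.

```python
import numpy as np


def kmeans(X, n_lists, n_iter=10, seed=0):
    """Plain Lloyd's k-means, used here as the coarse quantizer."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), n_lists, replace=False)].copy()
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(1)
        for c in range(n_lists):
            members = X[labels == c]
            if len(members):                    # keep the old centroid if a cell empties
                centroids[c] = members.mean(0)
    labels = ((X[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(1)
    return centroids, labels


def ivfflat_search(X, centroids, labels, query, k, n_probes):
    """Exact search restricted to the n_probes cells whose centroids are nearest."""
    probe = np.argsort(((centroids - query) ** 2).sum(1))[:n_probes]
    cand = np.flatnonzero(np.isin(labels, probe))  # indices of points in probed cells
    d = ((X[cand] - query) ** 2).sum(1)
    return cand[np.argsort(d)[:k]]


rng = np.random.default_rng(42)
X = rng.normal(size=(500, 8))
q = rng.normal(size=8)
centroids, labels = kmeans(X, n_lists=16)

exact = np.argsort(((X - q) ** 2).sum(1))[:10]     # brute-force reference
approx = ivfflat_search(X, centroids, labels, q, k=10, n_probes=4)
full = ivfflat_search(X, centroids, labels, q, k=10, n_probes=16)  # probes every cell
```

Probing every cell reproduces the brute-force result, while probing fewer cells inspects fewer points at the risk of missing some true neighbors; that trade-off is what makes the method approximate.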
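For orientation, the sketch below implements a handful of the metric names listed above with their standard SciPy-style definitions in NumPy. It is illustrative only: `distance` is a hypothetical helper, it covers just six names, and the remaining names and aliases, as well as cuML's own implementations, are not modeled.

```python
import numpy as np


def distance(a: np.ndarray, b: np.ndarray, metric: str = "euclidean") -> float:
    """Distance between two vectors for a few of the listed metric names."""
    diff = a - b
    if metric == "euclidean":
        return float(np.sqrt(np.sum(diff ** 2)))
    if metric == "sqeuclidean":
        return float(np.sum(diff ** 2))
    if metric in ("manhattan", "cityblock"):
        return float(np.sum(np.abs(diff)))
    if metric == "chebyshev":
        return float(np.max(np.abs(diff)))
    if metric == "cosine":
        return float(1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    if metric == "correlation":
        # Cosine distance after centering each vector on its own mean
        ac, bc = a - a.mean(), b - b.mean()
        return float(1.0 - ac @ bc / (np.linalg.norm(ac) * np.linalg.norm(bc)))
    raise ValueError(f"metric {metric!r} is not covered by this sketch")


a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])
for name in ("euclidean", "sqeuclidean", "manhattan", "chebyshev"):
    print(name, distance(a, b, name))
```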
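The default metric_kwds=mappingproxy({}) is an immutable empty mapping, which avoids the shared-mutable-default pitfall that a plain {} default would have. The sketch below shows that pattern together with a metric option; the key name 'p' for a Minkowski exponent is an assumption made for illustration, not a documented cuML option, and `minkowski` is a hypothetical helper.

```python
from types import MappingProxyType
from typing import Any, Mapping

import numpy as np


def minkowski(a, b, metric_kwds: Mapping[str, Any] = MappingProxyType({})) -> float:
    """Minkowski distance whose exponent is read from metric_kwds["p"] (assumed key)."""
    p = metric_kwds.get("p", 2)  # this sketch falls back to p = 2 (Euclidean)
    diff = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    return float(np.sum(diff ** p) ** (1.0 / p))


a, b = [0.0, 0.0], [3.0, 4.0]
d_default = minkowski(a, b)           # p = 2
d_p1 = minkowski(a, b, {"p": 1})      # p = 1, i.e. Manhattan distance

# The shared default is a read-only view, so it cannot be mutated across calls
try:
    minkowski.__defaults__[0]["p"] = 3
    default_is_read_only = False
except TypeError:
    default_is_read_only = True
```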
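The storage rule for key_added can be captured in a few lines, which is handy when later code needs to locate the results of several neighbor graphs. `neighbors_keys` is a hypothetical helper that mirrors the key_added description above; it is not part of rapids_singlecell.

```python
from typing import Optional, Tuple


def neighbors_keys(key_added: Optional[str] = None) -> Tuple[str, str, str]:
    """Return the (.uns key, distances .obsp key, connectivities .obsp key)."""
    if key_added is None:
        return "neighbors", "distances", "connectivities"
    return key_added, f"{key_added}_distances", f"{key_added}_connectivities"


print(neighbors_keys())           # ('neighbors', 'distances', 'connectivities')
print(neighbors_keys("pca_knn"))  # ('pca_knn', 'pca_knn_distances', 'pca_knn_connectivities')
```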
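The copy flag follows the convention stated under Returns: depending on copy, adata is either updated in place or a modified copy is returned. The sketch below imitates that contract with a hypothetical `ToyData` class and `toy_neighbors` function instead of real AnnData objects; the placeholder string stands in for the actual graph.

```python
import copy as copy_module
from typing import Optional


class ToyData:
    """Hypothetical stand-in for AnnData that only carries an .obsp dict."""

    def __init__(self) -> None:
        self.obsp = {}

    def copy(self) -> "ToyData":
        return copy_module.deepcopy(self)


def toy_neighbors(data: ToyData, copy: bool = False) -> Optional[ToyData]:
    """Mimic the copy convention: write in place and return None, or return a modified copy."""
    target = data.copy() if copy else data
    target.obsp["distances"] = "computed"   # placeholder for the real graph
    return target if copy else None


original = ToyData()
result = toy_neighbors(original, copy=True)
untouched_after_copy = "distances" not in original.obsp  # copy=True left the input alone
in_place = toy_neighbors(original)                       # copy=False modifies the input
```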
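The distances output is a sparse matrix in which row i holds the distances from observation i to its neighbors. Assuming the search produced dense (n_obs, n_neighbors) arrays of neighbor indices and distances, the sketch below shows one way to scatter them into a float32 CSR matrix with SciPy. `knn_to_sparse_distances` is hypothetical, and it does not compute the connectivities, which require a separate weighting step.

```python
import numpy as np
from scipy.sparse import csr_matrix


def knn_to_sparse_distances(indices: np.ndarray, distances: np.ndarray) -> csr_matrix:
    """Scatter (n_obs, n_neighbors) kNN results into an n_obs x n_obs float32 CSR matrix."""
    n_obs, n_neighbors = indices.shape
    rows = np.repeat(np.arange(n_obs), n_neighbors)  # row i appears once per neighbor
    data = distances.ravel().astype(np.float32)
    return csr_matrix((data, (rows, indices.ravel())), shape=(n_obs, n_obs))


# Neighbor indices and distances for 3 observations with 2 neighbors each
indices = np.array([[1, 2], [0, 2], [1, 0]])
distances = np.array([[1.0, 3.0], [1.0, 2.0], [2.0, 3.0]])
D = knn_to_sparse_distances(indices, distances)
```

Because j being among i's nearest neighbors does not imply the converse, a matrix built this way is generally not symmetric.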