scanpy-GPU#
These functions offer accelerated near drop-in replacements for common tools provided by scanpy [WAT18].
Preprocessing: pp#
Filtering of highly-variable genes, batch-effect correction, per-cell normalization.
Any transformation of the data matrix that is not a tool.
Unlike tools, preprocessing steps usually don’t return an easily interpretable annotation; they perform a basic transformation of the data matrix.
Basic Preprocessing#
|
Calculates basic QC parameters [MCLW17]. |
|
Filter cell outliers based on counts and numbers of genes expressed. |
|
Filter genes based on number of cells or counts. |
|
Normalizes rows in matrix so they sum to |
|
Logarithmize the data matrix. |
|
Annotate highly variable genes [AH19, LBK21, SFG+15, SBH+19, ZTB+17]. |
|
Use linear regression to adjust for the effects of unwanted noise and variation. |
|
Scales matrix to unit variance and clips values |
|
Principal component analysis using GPU acceleration [HMT09, TQOA24]. |
|
Applies analytic Pearson residual normalization [LBK21]. |
|
Flags a gene or gene_family in .var with boolean. |
|
Filters the |
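
Several of the steps listed above are simple matrix transformations. The following is a minimal CPU sketch in NumPy of per-cell normalization, logarithmization, and scaling with clipping. It is an illustration only, not the GPU implementation described on this page; the function names and the `target_sum` and `max_value` parameters (including their defaults) are assumptions chosen for the example.

```python
import numpy as np


def normalize_total(counts, target_sum=1e4):
    """Rescale each row (cell) so that its counts sum to target_sum (assumed name)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # all-zero cells stay all-zero instead of dividing by 0
    return counts / totals * target_sum


def log_transform(matrix):
    """Logarithmize the matrix as log(1 + x), which keeps zeros at zero."""
    return np.log1p(matrix)


def scale(matrix, max_value=10.0):
    """Centre each column (gene), scale it to unit variance, then clip.

    Clipping symmetrically to [-max_value, max_value] is an assumption
    made for this sketch.
    """
    mean = matrix.mean(axis=0)
    std = matrix.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # constant genes are only centred
    return np.clip((matrix - mean) / std, -max_value, max_value)


# Toy matrix: 3 cells (rows) x 3 genes (columns).
counts = np.array([[1, 3, 0], [2, 2, 4], [0, 5, 5]])
normalized = normalize_total(counts, target_sum=100.0)
scaled = scale(log_transform(normalized), max_value=1.0)
```

After `normalize_total`, every row of `normalized` sums to the chosen target, and every entry of `scaled` lies within the clipping bounds.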
Batch effect correction#
|
Integrate different experiments using the Harmony algorithm [KMF+19, PYM+26]. |
Doublet detection#
|
Predict doublets using Scrublet [WLK19]. |
|
Simulate doublets by adding the counts of random observed transcriptome pairs. |
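
The doublet simulation described above (adding the counts of random observed transcriptome pairs) can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the library's implementation: the function name `simulate_doublets`, its parameters, and the choice to sample parent cells uniformly with replacement are all hypothetical.

```python
import numpy as np


def simulate_doublets(counts, n_doublets, seed=0):
    """Build synthetic doublets by summing the counts of random cell pairs.

    Returns the simulated count matrix and the (n_doublets, 2) array of
    parent cell indices. Parents are drawn uniformly with replacement, so a
    cell may occasionally be paired with itself (an assumption of this sketch).
    """
    counts = np.asarray(counts)
    rng = np.random.default_rng(seed)
    parents = rng.integers(0, counts.shape[0], size=(n_doublets, 2))
    doublets = counts[parents[:, 0]] + counts[parents[:, 1]]
    return doublets, parents


# Toy matrix: 4 observed cells x 3 genes.
observed = np.array([[1, 0, 2], [0, 3, 1], [4, 4, 0], [2, 1, 1]])
doublets, parents = simulate_doublets(observed, n_doublets=6)
```

Each simulated row has a total count equal to the sum of its two parents' totals, which is why simulated doublets tend to have higher library sizes than observed singlets.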
Neighbors#
|
Compute a neighborhood graph of observations [ONN+24]. |
|
Batch balanced KNN [PYM+19]: alters the KNN procedure to identify each cell's top neighbours in each batch separately, rather than in the entire cell pool without accounting for batch. |
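
To make the batch-balanced idea concrete, here is a brute-force NumPy sketch that finds each cell's nearest neighbours within every batch separately and concatenates the results. It is meant only to illustrate the principle; the function name, the `k_within_batch` parameter, and the use of exact Euclidean distances are assumptions of this example.

```python
import numpy as np


def batch_balanced_knn(X, batches, k_within_batch=3):
    """Return, for every cell, its k nearest neighbours from each batch.

    The result has one block of k_within_batch columns per batch (batches in
    sorted order). A cell counts as its own neighbour within its own batch.
    """
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    # Pairwise squared Euclidean distances between all cells.
    sq_norms = (X ** 2).sum(axis=1)
    dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    np.maximum(dist, 0.0, out=dist)  # guard against tiny negative round-off

    blocks = []
    for batch in np.unique(batches):
        members = np.flatnonzero(batches == batch)
        # Rank only the cells of this batch for every query cell.
        order = np.argsort(dist[:, members], axis=1)[:, :k_within_batch]
        blocks.append(members[order])
    return np.hstack(blocks)


# Toy data: 8 cells in 2 dimensions, split across two batches.
rng = np.random.default_rng(0)
cells = rng.normal(size=(8, 2))
batch_labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
neighbors = batch_balanced_knn(cells, batch_labels, k_within_batch=2)
```

With two batches and `k_within_batch=2`, every cell ends up with four neighbours: two from each batch, regardless of which batch dominates its local neighbourhood.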
Tools: tl#
The tl module offers tools for the accelerated processing of AnnData objects. For visualization, use scanpy.pl.
Embedding#
|
|
|
|
|
|
|
|
|
Calculate the density of cells in an embedding (per condition). Gaussian kernel density estimation is used to calculate the density of cells in an embedded space. This can be performed per category over a categorical cell annotation. The cell density can be plotted using the |
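
The per-category Gaussian kernel density estimate mentioned above can be illustrated with a small NumPy sketch. It uses a fixed, user-supplied bandwidth for simplicity, which is an assumption of this example and not necessarily how the library chooses it; the function name and parameters are likewise hypothetical.

```python
import numpy as np


def embedding_density(embedding, groups=None, bandwidth=1.0):
    """Gaussian KDE evaluated at every cell, computed separately per category.

    embedding: (n_cells, n_dims) coordinates in the embedded space.
    groups:    optional categorical annotation; None treats all cells as one group.
    """
    embedding = np.asarray(embedding, dtype=float)
    n_cells, n_dims = embedding.shape
    if groups is None:
        groups = np.zeros(n_cells, dtype=int)
    groups = np.asarray(groups)

    # Normalization constant of an isotropic Gaussian kernel.
    norm = (2.0 * np.pi * bandwidth ** 2) ** (n_dims / 2.0)
    density = np.empty(n_cells)
    for group in np.unique(groups):
        mask = groups == group
        points = embedding[mask]
        diff = points[:, None, :] - points[None, :, :]
        sq_dist = (diff ** 2).sum(axis=-1)
        kernel = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
        density[mask] = kernel.mean(axis=1) / norm
    return density


# Toy embedding: a tight cluster of five cells plus one distant outlier.
coords = np.array(
    [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [-0.1, 0.0], [0.0, -0.1], [5.0, 5.0]]
)
density_all = embedding_density(coords)
density_by_group = embedding_density(coords, groups=[0, 0, 0, 1, 1, 1])
```

Cells in the tight cluster receive a higher density than the outlier. Passing `groups` restricts each estimate to cells of the same category.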
Clustering#
|
Cluster cells into subgroups using the Louvain algorithm [BGLL08]. |
|
Cluster cells into subgroups using the Leiden algorithm [TWvE19]. |
|
KMeans is a basic but powerful clustering method which is optimized via Expectation Maximization. |
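
The alternating optimization behind KMeans can be sketched as follows: an assignment step (each point joins its nearest centre) followed by an update step (each centre moves to the mean of its points), repeated until the centres stop moving. This is a plain NumPy version of Lloyd's algorithm with random initialization, offered as an illustration only; the function name and parameters are assumptions.

```python
import numpy as np


def kmeans(X, n_clusters, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centre-update steps."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centres with distinct randomly chosen data points.
    centers = X[rng.choice(X.shape[0], size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest centre by squared Euclidean distance.
        sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = sq_dist.argmin(axis=1)
        # Update step: move each centre to the mean of its members.
        new_centers = centers.copy()
        for c in range(n_clusters):
            members = X[labels == c]
            if len(members) > 0:  # keep the old centre if a cluster is empty
                new_centers[c] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Toy data: two well-separated groups of four points each.
square = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
points = np.vstack([square, square + 10.0])
labels, centers = kmeans(points, n_clusters=2)
```

On this toy data the two groups are recovered as separate clusters; on real data the result depends on the initialization, so several restarts are commonly used.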
Gene scores, Cell cycle#
|
|
|
Score cell cycle genes [SNS+15]. |
Marker genes#
|
Rank genes for characterizing groups using GPU acceleration. |
Plotting#
For plotting, please use scanpy’s plotting API, scanpy.pl.