scanpy-GPU#

These functions offer accelerated near drop-in replacements for common tools provided by scanpy [WAT18].

Preprocessing `pp`#

Filtering of highly-variable genes, batch-effect correction, per-cell normalization.

Preprocessing covers any transformation of the data matrix that is not a tool. Unlike tools, preprocessing steps usually don’t return an easily interpretable annotation; they perform a basic transformation on the data matrix.

Basic Preprocessing#

`pp.calculate_qc_metrics`(adata, *[, ...])	Calculates basic QC parameters [MCLW17].
`pp.filter_cells`(data, *[, min_counts, ...])	Filter cell outliers based on counts and numbers of genes expressed.
`pp.filter_genes`(data, *[, min_counts, ...])	Filter genes based on number of cells or counts.
`pp.normalize_total`(adata, *[, target_sum, ...])	Normalizes rows in matrix so they sum to `target_sum`.
`pp.log1p`(data, *[, base, layer, obsm, ...])	Logarithmize the data matrix.
`pp.sqrt`(data, *[, layer, obsm, inplace, copy])	Take the square root of the data matrix.
`pp.highly_variable_genes`(adata, *[, layer, ...])	Annotate highly variable genes [AH19, LBK21, SFG+15, SBH+19, ZTB+17].
`pp.regress_out`(adata, keys, *[, layer, ...])	Use linear regression to adjust for the effects of unwanted noise and variation.
`pp.scale`(data, *[, zero_center, max_value, ...])	Scales the matrix to unit variance and clips values to `max_value`.
`pp.pca`(data[, n_comps, layer, zero_center, ...])	Principal component analysis using GPU acceleration [HMT09, TQOA24].
`pp.normalize_pearson_residuals`(adata, *[, ...])	Applies analytic Pearson residual normalization [LBK21].
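
To make the row-wise arithmetic behind `pp.normalize_total` and `pp.log1p` concrete, here is a minimal, dependency-free sketch in plain Python. It is not the GPU implementation and does not operate on AnnData; the helper functions and the toy count matrix are assumptions made for illustration, and only the `target_sum` parameter name comes from the table above.

```python
import math


def normalize_total(counts, target_sum=1e4):
    """Rescale each row (cell) so that its entries sum to target_sum."""
    out = []
    for row in counts:
        total = sum(row)
        # Leave all-zero rows untouched to avoid dividing by zero.
        factor = target_sum / total if total > 0 else 1.0
        out.append([value * factor for value in row])
    return out


def log1p(matrix):
    """Apply the natural logarithm of (1 + x) to every entry."""
    return [[math.log1p(value) for value in row] for row in matrix]


# Toy count matrix: 2 cells (rows) x 3 genes (columns).
counts = [
    [1, 3, 0],    # cell 0 has 4 counts in total
    [10, 0, 10],  # cell 1 has 20 counts in total
]

normalized = normalize_total(counts, target_sum=100)
logged = log1p(normalized)

print(normalized)  # every row now sums to 100
print(logged)
```

After normalization both cells have the same total, so the subsequent log transform compares relative rather than absolute expression.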

Batch effect correction#

`pp.harmony_integrate`(adata, key, *[, basis, ...])	Integrate different experiments using the Harmony algorithm [KMF+19, PYM+26].

Doublet detection#

`pp.scrublet`(adata[, adata_sim, batch_key, ...])	Predict doublets using Scrublet [WLK19].
`pp.scrublet_simulate_doublets`(adata, *[, ...])	Simulate doublets by adding the counts of random observed transcriptome pairs.
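
The doublet simulation described above (adding the counts of random pairs of observed transcriptomes) can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the library's code: the function name, its arguments, and the toy matrix are hypothetical, and it assumes parents are drawn uniformly at random, so a cell may occasionally be paired with itself.

```python
import random


def simulate_doublets(counts, n_doublets, seed=0):
    """Build synthetic doublets by summing the counts of random cell pairs."""
    rng = random.Random(seed)  # seeded for reproducibility
    n_cells = len(counts)
    doublets, parents = [], []
    for _ in range(n_doublets):
        # Assumption: both parents are drawn uniformly at random.
        i = rng.randrange(n_cells)
        j = rng.randrange(n_cells)
        parents.append((i, j))
        # A simulated doublet is the gene-wise sum of its two parents.
        doublets.append([a + b for a, b in zip(counts[i], counts[j])])
    return doublets, parents


# Toy count matrix: 3 observed cells x 3 genes.
observed = [
    [5, 0, 1],
    [0, 7, 2],
    [3, 3, 0],
]

doublets, parents = simulate_doublets(observed, n_doublets=4)
for (i, j), profile in zip(parents, doublets):
    print(f"cells {i} + {j} -> {profile}")
```

Keeping the parent indices alongside the simulated profiles makes it easy to check which observed cells contributed to each synthetic doublet.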

Neighbors#

`pp.neighbors`(adata[, n_neighbors, n_pcs, ...])	Compute a neighborhood graph of observations [ONN+24].
`pp.bbknn`(adata[, neighbors_within_batch, ...])	Batch balanced KNN [PYM+19], which alters the KNN procedure to identify each cell's top neighbours within each batch separately, rather than across the entire cell pool without accounting for batch.
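
The batch-balanced neighbour search can be illustrated with a brute-force sketch in plain Python. It is a simplified illustration under stated assumptions, not the accelerated implementation: distances are Euclidean, a cell is excluded from its own neighbour list, and the `batch_balanced_neighbors` helper and toy coordinates are hypothetical. Only the `neighbors_within_batch` parameter name is taken from the signature above.

```python
import math


def batch_balanced_neighbors(points, batches, neighbors_within_batch=1):
    """For each point, pick its nearest neighbours separately within every batch."""
    members = {}
    for idx, batch in enumerate(batches):
        members.setdefault(batch, []).append(idx)

    graph = []
    for i, point in enumerate(points):
        neighbours = []
        for batch in sorted(members):
            # Simplification: a cell is not counted as its own neighbour.
            candidates = [j for j in members[batch] if j != i]
            candidates.sort(key=lambda j: math.dist(point, points[j]))
            neighbours.extend(candidates[:neighbors_within_batch])
        graph.append(neighbours)
    return graph


# Toy 2-D embedding with two batches of three cells each.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.2, 0.1), (5.1, 5.0), (4.9, 5.2)]
batches = ["a", "a", "a", "b", "b", "b"]

graph = batch_balanced_neighbors(points, batches, neighbors_within_batch=1)
for i, neighbours in enumerate(graph):
    print(i, neighbours)
```

In this toy data, cell 2 sits next to two cells from batch "b", so an ordinary two-nearest-neighbour search would return only batch "b" cells; the batch-balanced search returns one neighbour from each batch instead.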

Tools: `tl`#

The `tl` module offers tools for the accelerated processing of AnnData objects. For visualization, use `scanpy.pl`.

Embedding#

`tl.umap`(adata, *[, min_dist, spread, ...])	Embed the neighborhood graph using UMAP [MHM18] [NLR+21].
`tl.tsne`(adata[, n_pcs, use_rep, perplexity, ...])	t-SNE [vdMH08] [CRHC18].
`tl.diffmap`(adata[, n_comps, neighbors_key, ...])	Diffusion Maps [CLL+05, HBT15].
`tl.draw_graph`(adata, *[, init_pos, ...])	Force-directed graph drawing [FR91, JVHB14].
`tl.embedding_density`(adata[, basis, ...])	Calculate the density of cells in an embedding (per condition).

Data integration#

`tl.ingest`(adata, adata_ref, *[, obs, ...])	Map labels and embeddings from reference data to new data.

Clustering#

`tl.louvain`(adata[, resolution, restrict_to, ...])	Cluster cells into subgroups using the Louvain algorithm [BGLL08].
`tl.leiden`(adata[, resolution, random_state, ...])	Cluster cells into subgroups using the Leiden algorithm [TWvE19].

Gene scores, Cell cycle#

`tl.score_genes`(adata, gene_list, *[, ...])	Score a set of genes [SFG+15, TIP+16].
`tl.score_genes_cell_cycle`(adata, *, s_genes, ...)	Score cell cycle genes [SNS+15].
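
As a rough illustration of what a gene score measures, the sketch below computes, per cell, the mean expression of a gene set minus the mean expression of a reference set, which is the general idea behind scanpy-style gene scoring. It is a simplified sketch under stated assumptions: the reference genes are passed in explicitly through a hypothetical `reference_genes` argument instead of being sampled automatically, and the helper function and toy data are not part of the library.

```python
def score_genes(expression, gene_names, gene_list, reference_genes):
    """Per cell: mean expression of gene_list minus mean of reference_genes."""
    index = {gene: k for k, gene in enumerate(gene_names)}
    target_idx = [index[gene] for gene in gene_list]
    reference_idx = [index[gene] for gene in reference_genes]

    scores = []
    for row in expression:
        target_mean = sum(row[k] for k in target_idx) / len(target_idx)
        reference_mean = sum(row[k] for k in reference_idx) / len(reference_idx)
        scores.append(target_mean - reference_mean)
    return scores


# Toy expression matrix: 2 cells x 4 genes.
gene_names = ["G1", "G2", "G3", "G4"]
expression = [
    [4.0, 2.0, 1.0, 1.0],  # cell 0 expresses the gene set strongly
    [0.0, 0.0, 2.0, 2.0],  # cell 1 does not express it
]

scores = score_genes(expression, gene_names, ["G1", "G2"], ["G3", "G4"])
print(scores)
```

A positive score indicates that the gene set is expressed above the reference level in that cell, and a negative score indicates the opposite.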

Marker genes#

`tl.rank_genes_groups`(adata, groupby, *[, ...])	Rank genes for characterizing groups using GPU acceleration.

Plotting#

For plotting, please use scanpy’s plotting API, `scanpy.pl`.