scanpy-GPU#

These functions offer accelerated near drop-in replacements for common tools provided by scanpy [WAT18].

Preprocessing pp#

Filtering of highly-variable genes, batch-effect correction, per-cell normalization.

Any transformation of the data matrix that is not a tool. Unlike tools, preprocessing steps usually don’t return an easily interpretable annotation; instead, they perform a basic transformation on the data matrix.

Basic Preprocessing#

pp.calculate_qc_metrics(adata, *[, ...])

Calculates basic QC metrics [MCLW17].

pp.filter_cells(data, *[, min_counts, ...])

Filter cell outliers based on counts and numbers of genes expressed.

pp.filter_genes(data, *[, min_counts, ...])

Filter genes based on number of cells or counts.
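As a rough illustration of what per-cell QC metrics and count-based filtering compute, here is a minimal NumPy sketch. It does not call the GPU functions listed above and is not their implementation. The toy matrix, the threshold values, and the `min_cells` name (borrowed from scanpy's CPU API) are assumptions made for this example.

```python
import numpy as np

# Toy count matrix: 4 cells (rows) x 5 genes (columns); values are illustrative only.
counts = np.array([
    [0, 3, 0, 1, 0],
    [5, 0, 2, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0],
])

# Per-cell QC: total counts and number of genes with at least one count.
total_counts = counts.sum(axis=1)
n_genes_by_counts = (counts > 0).sum(axis=1)

# Keep cells with at least `min_counts` total counts (threshold chosen for illustration).
min_counts = 1
cell_mask = total_counts >= min_counts
filtered = counts[cell_mask]

# Keep genes detected in at least `min_cells` of the remaining cells.
min_cells = 1
gene_mask = (filtered > 0).sum(axis=0) >= min_cells
filtered = filtered[:, gene_mask]
```

Filtering cells before genes means the gene threshold is evaluated only on the cells that survive, which is why the two masks are applied in sequence rather than computed together.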

pp.normalize_total(adata, *[, target_sum, ...])

Normalizes rows in matrix so they sum to target_sum.

pp.log1p(adata, *[, base, layer, obsm, ...])

Logarithmize the data matrix.
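The two steps above, total-count normalization followed by a log transform, can be sketched on a dense array with NumPy. This is only a CPU illustration of the arithmetic, not the GPU code path; the toy values and the choice of `target_sum = 100.0` are assumptions for the example.

```python
import numpy as np

# Toy matrix: 2 cells x 3 genes; every row has a non-zero total.
counts = np.array([[1.0, 3.0, 0.0],
                   [10.0, 0.0, 30.0]])
target_sum = 100.0  # illustrative value

# Rescale each row (cell) so that its counts sum to target_sum.
row_sums = counts.sum(axis=1, keepdims=True)
normalized = counts / row_sums * target_sum

# Natural log of (1 + x); zeros stay zero, which preserves sparsity.
logged = np.log1p(normalized)
```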

pp.highly_variable_genes(adata, *[, layer, ...])

Annotate highly variable genes [AH19, LBK21, SFG+15, SBH+19, ZTB+17].

pp.regress_out(adata, keys, *[, layer, ...])

Use linear regression to adjust for the effects of unwanted noise and variation.

pp.scale(adata, *[, zero_center, max_value, ...])

Scales matrix to unit variance and clips values.
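A minimal NumPy sketch of per-gene scaling with optional clipping is shown here. The helper `scale_columns` is hypothetical, and several details are assumptions rather than documented behavior of `pp.scale`: the sample standard deviation (`ddof=1`), leaving constant genes at zero, and clipping symmetrically at `±max_value`.

```python
import numpy as np

def scale_columns(x, zero_center=True, max_value=None):
    """Scale each column (gene) to unit variance, optionally centering and clipping."""
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0, ddof=1)  # assumption: sample standard deviation
    std[std == 0] = 1.0          # avoid division by zero for constant genes
    out = (x - mean) / std if zero_center else x / std
    if max_value is not None:
        # Assumption: symmetric clipping to [-max_value, max_value].
        out = np.clip(out, -max_value, max_value)
    return out
```

For example, `scale_columns([[1, 2], [3, 2], [5, 2]])` centers the first column at zero with unit variance and leaves the constant second column at zero.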

pp.pca(adata[, n_comps, layer, zero_center, ...])

Principal component analysis using GPU acceleration [HMT09, TQOA24].

pp.normalize_pearson_residuals(adata, *[, ...])

Applies analytic Pearson residual normalization [LBK21].
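For orientation, analytic Pearson residuals can be written in a few lines of NumPy. This sketch follows the general formulation of a negative binomial null model with a fixed overdispersion parameter; the helper name `pearson_residuals`, the default `theta=100.0`, and the default clipping at the square root of the number of cells are assumptions for this example and are not taken from the GPU implementation. It also assumes no cell or gene has a zero total, so every expected count is positive.

```python
import numpy as np

def pearson_residuals(counts, theta=100.0, clip=None):
    """Analytic Pearson residuals for a dense cells x genes count matrix."""
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[0]
    if clip is None:
        clip = np.sqrt(n_cells)  # assumption: clip at sqrt(number of cells)

    # Expected counts under the null model: outer product of the margins.
    cell_totals = counts.sum(axis=1, keepdims=True)
    gene_totals = counts.sum(axis=0, keepdims=True)
    mu = cell_totals @ gene_totals / counts.sum()

    # Negative binomial variance: mu + mu^2 / theta.
    residuals = (counts - mu) / np.sqrt(mu + mu**2 / theta)
    return np.clip(residuals, -clip, clip)
```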

pp.flag_gene_family(adata, *, gene_family_name)

Flags a gene or gene family in .var with a boolean.

pp.filter_highly_variable(adata)

Filters the AnnData object for highly_variable genes.

Batch effect correction#

pp.harmony_integrate(adata, key, *[, basis, ...])

Integrate different experiments using the Harmony algorithm [KMF+19, PYM+26].

Doublet detection#

pp.scrublet(adata[, adata_sim, batch_key, ...])

Predict doublets using Scrublet [WLK19].

pp.scrublet_simulate_doublets(adata, *[, ...])

Simulate doublets by adding the counts of random observed transcriptome pairs.
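The idea of building synthetic doublets by summing the counts of randomly paired observed cells can be sketched as follows. The helper `simulate_doublets`, its arguments, and the sampling of pairs with replacement are assumptions for illustration; this is not the library's implementation.

```python
import numpy as np

def simulate_doublets(counts, n_doublets, seed=0):
    """Return synthetic doublets and the indices of the parent cells."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)
    n_cells = counts.shape[0]

    # Draw two random parent cells per doublet (with replacement).
    pairs = rng.integers(0, n_cells, size=(n_doublets, 2))

    # A simulated doublet is the element-wise sum of its two parents.
    doublets = counts[pairs[:, 0]] + counts[pairs[:, 1]]
    return doublets, pairs
```

Returning the parent indices alongside the summed counts makes it possible to trace each synthetic doublet back to the cells it was built from.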

Neighbors#

pp.neighbors(adata[, n_neighbors, n_pcs, ...])

Compute a neighborhood graph of observations [ONN+24].

pp.bbknn(adata[, neighbors_within_batch, ...])

Batch balanced KNN [PYM+19], altering the KNN procedure to identify each cell's top neighbours in each batch separately instead of the entire cell pool with no accounting for batch.

Tools: tl#

tl offers tools for the accelerated processing of AnnData. For visualization, use scanpy.pl.

Embedding#

tl.umap(adata, *[, min_dist, spread, ...])

Embed the neighborhood graph using UMAP [MHM18] [NLR+21].

tl.tsne(adata[, n_pcs, use_rep, perplexity, ...])

t-SNE [vdMH08] [CRHC18].

tl.diffmap(adata[, n_comps, neighbors_key, ...])

Diffusion Maps [CLL+05, HBT15].

tl.draw_graph(adata, *[, init_pos, ...])

Force-directed graph drawing [FR91, JVHB14].

tl.embedding_density(adata[, basis, ...])

Calculate the density of cells in an embedding (per condition).

Gaussian kernel density estimation is used to calculate the density of cells in an embedded space. This can be performed per category over a categorical cell annotation. The cell density can be plotted using the pl.embedding_density function.

Note that density values are scaled to be between 0 and 1. Thus, the density value at each cell is only comparable to densities in the same category.

This function was written by Sophie Tritschler and implemented into Scanpy by Malte Luecken.

Parameters:

adata (AnnData): The annotated data matrix.
basis (str, default: 'umap'): The embedding over which the density will be calculated. This embedded representation should be found in adata.obsm['X_[basis]'].
groupby (str | None, default: None): Key for categorical observation/cell annotation for which densities are calculated per category.
key_added (str | None, default: None): Name of the .obs covariate that will be added with the density estimates.
components (str | Sequence[str], default: None): The embedding dimensions over which the density should be calculated. This is limited to two components.
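To make the density description concrete, here is a small NumPy sketch of a Gaussian kernel density estimate evaluated at each cell and rescaled to the range 0 to 1. The helper `embedding_density_sketch`, the fixed `bandwidth`, and the min-max rescaling are assumptions for illustration; the actual function may select the bandwidth and scale the values differently.

```python
import numpy as np

def embedding_density_sketch(coords, bandwidth=1.0):
    """Gaussian KDE at each point of a 2D embedding, rescaled to [0, 1]."""
    coords = np.asarray(coords, dtype=float)

    # Pairwise squared distances between all cells in the embedding.
    diff = coords[:, None, :] - coords[None, :, :]
    sq_dist = (diff**2).sum(axis=-1)

    # Unnormalized Gaussian kernel sum; the constant factor cancels after rescaling.
    density = np.exp(-sq_dist / (2 * bandwidth**2)).sum(axis=1)

    # Assumption: min-max rescaling so that values lie between 0 and 1.
    lo, hi = density.min(), density.max()
    if hi == lo:
        return np.ones_like(density)
    return (density - lo) / (hi - lo)
```

To mimic the per-category behavior, the same helper would be applied separately to the rows belonging to each category, which is also why values are only comparable within a category.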

Clustering#

tl.louvain(adata[, resolution, restrict_to, ...])

Cluster cells into subgroups using the Louvain algorithm [BGLL08].

tl.leiden(adata[, resolution, random_state, ...])

Cluster cells into subgroups using the Leiden algorithm [TWvE19].

tl.kmeans(adata[, n_clusters, n_pcs, ...])

KMeans is a basic but powerful clustering method which is optimized via Expectation Maximization.

Gene scores, Cell cycle#

tl.score_genes(adata, gene_list, *[, ...])

Score a set of genes [SFG+15, TIP+16].
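In simplified form, a gene-set score is the mean expression of the genes of interest minus the mean expression of a reference set. The sketch below shows only that subtraction; the helper `score_gene_set` and its index arguments are hypothetical, and the reference set is passed in explicitly here, whereas scanpy-style scoring samples reference genes from bins of similar expression.

```python
import numpy as np

def score_gene_set(x, gene_idx, ref_idx):
    """Per-cell score: mean over the gene set minus mean over the reference set."""
    x = np.asarray(x, dtype=float)
    return x[:, gene_idx].mean(axis=1) - x[:, ref_idx].mean(axis=1)
```

A positive score means the gene set is expressed above the reference level in that cell, and a negative score means it is expressed below it.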

tl.score_genes_cell_cycle(adata, *, s_genes, ...)

Score cell cycle genes [SNS+15].

Marker genes#

tl.rank_genes_groups(adata, groupby, *[, ...])

Rank genes for characterizing groups using GPU acceleration.

Plotting#

For plotting, please use scanpy’s plotting API, scanpy.pl.