rapids_singlecell.get.aggregate#

rapids_singlecell.get.aggregate(adata, by, func, *, axis=None, mask=None, dof=1, layer=None, obsm=None, varm=None, return_sparse=False)[source]#

Aggregate data matrix based on some categorical grouping.

This function is useful for pseudobulking as well as plotting.

Aggregation to perform is specified by func, which can be a single metric or a list of metrics. Each metric is computed over the group and results in a new layer in the output AnnData object.

If none of layer, obsm, or varm are passed in, X will be used for aggregation data. If func only has length 1 or is just an AggType, then aggregation data is written to X. Otherwise, it is written to layers or xxxm as appropriate for the dimensions of the aggregation data.

Params#

adata: AnnData to be aggregated.
by: Key of the column to be grouped-by.
func: How to aggregate.
axis: Axis on which to find group by column.
mask: Boolean mask (or key to column containing mask) to apply along the axis.
dof: Degrees of freedom for variance. Defaults to 1.
layer: If not None, key for aggregation data.
obsm: If not None, key for aggregation data.
varm: If not None, key for aggregation data.
return_sparse: Whether to return a sparse matrix. Only works for sparse input data.

rtype:: AnnData
returns:: Aggregated AnnData.

Examples

Calculating mean expression and number of nonzero entries per cluster:

>>> import scanpy as sc, pandas as pd
>>> import rapids_singlecell as rsc
>>> pbmc = sc.datasets.pbmc3k_processed().raw.to_adata()
>>> rsc.get.anndata_to_GPU(pbmc)
>>> pbmc.shape
(2638, 13714)
>>> aggregated = rsc.get.aggregate(pbmc, by="louvain", func=["mean", "count_nonzero"])
>>> aggregated
AnnData object with n_obs × n_vars = 8 × 13714
    obs: 'louvain'
    var: 'n_cells'
    layers: 'mean', 'count_nonzero'

We can group over multiple columns:

>>> pbmc.obs["percent_mito_binned"] = pd.cut(pbmc.obs["percent_mito"], bins=5)
>>> rsc.get.aggregate(pbmc, by=["louvain", "percent_mito_binned"], func=["mean", "count_nonzero"])
AnnData object with n_obs × n_vars = 40 × 13714
    obs: 'louvain', 'percent_mito_binned'
    var: 'n_cells'
    layers: 'mean', 'count_nonzero'

Note that this filters out any combination of groups that wasn’t present in the original data.

rapids_singlecell.get.aggregate

Contents

rapids_singlecell.get.aggregate#

Params#