rapids_singlecell.get.aggregate
- rapids_singlecell.get.aggregate(adata, by, func, *, axis=None, mask=None, dof=1, layer=None, obsm=None, varm=None, return_sparse=False)
Aggregate data matrix based on some categorical grouping.
This function is useful for pseudobulking as well as plotting.
The aggregation to perform is specified by func, which can be a single metric or a list of metrics. Each metric is computed over the group and results in a new layer in the output AnnData object.

If none of layer, obsm, or varm are passed in, X will be used for aggregation data. If func only has length 1 or is just an AggType, then aggregation data is written to X. Otherwise, it is written to layers or xxxm as appropriate for the dimensions of the aggregation data.

Params
- adata
  AnnData to be aggregated.
- by
  Key of the column to group by, or a list of keys to group over several columns.
- func
  How to aggregate: a single metric or a list of metrics.
- axis
  Axis on which to find the group-by column.
- mask
  Boolean mask (or key to column containing mask) to apply along the axis.
- dof
  Degrees of freedom for variance. Defaults to 1.
- layer
  If not None, key of the layer to use as aggregation data.
- obsm
  If not None, key of the obsm entry to use as aggregation data.
- varm
  If not None, key of the varm entry to use as aggregation data.
- return_sparse
  Whether to return a sparse matrix. Only works for sparse input data.
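To make the role of func and dof concrete, the following is a minimal pure-Python sketch of what per-group metrics mean on a toy matrix. It is not how rapids_singlecell computes them on the GPU. The helper aggregate_rows is hypothetical, and the metric name "var" is an assumption inferred from the dof parameter; only "mean" and "count_nonzero" appear in the examples on this page.

```python
from collections import defaultdict


def aggregate_rows(rows, groups, dof=1):
    """Group rows by label and compute per-column metrics for each group.

    Hypothetical helper for illustration only, not part of rapids_singlecell.
    """
    by_group = defaultdict(list)
    for row, label in zip(rows, groups):
        by_group[label].append(row)

    result = {}
    for label, members in by_group.items():
        n = len(members)
        columns = list(zip(*members))  # transpose: one tuple per column
        means = [sum(col) / n for col in columns]
        result[label] = {
            "mean": means,
            "count_nonzero": [sum(1 for v in col if v != 0) for col in columns],
            # Variance divides by (n - dof); dof=1 gives the sample variance.
            # Groups with n <= dof have no defined variance here.
            "var": [
                sum((v - m) ** 2 for v in col) / (n - dof)
                if n > dof
                else float("nan")
                for col, m in zip(columns, means)
            ],
        }
    return result


# Toy data: 4 observations x 2 variables, split into two groups.
rows = [[1.0, 0.0], [3.0, 2.0], [0.0, 0.0], [4.0, 6.0]]
groups = ["a", "a", "b", "b"]

out = aggregate_rows(rows, groups)
print(out["a"]["mean"])
print(out["b"]["count_nonzero"])
print(out["b"]["var"])
```

Each group label yields one output row per metric, which mirrors the page's description of one layer per metric with one observation per group.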
Examples
Calculating mean expression and number of nonzero entries per cluster:
>>> import scanpy as sc, pandas as pd
>>> import rapids_singlecell as rsc
>>> pbmc = sc.datasets.pbmc3k_processed().raw.to_adata()
>>> rsc.get.anndata_to_GPU(pbmc)
>>> pbmc.shape
(2638, 13714)
>>> aggregated = rsc.get.aggregate(pbmc, by="louvain", func=["mean", "count_nonzero"])
>>> aggregated
AnnData object with n_obs × n_vars = 8 × 13714
    obs: 'louvain'
    var: 'n_cells'
    layers: 'mean', 'count_nonzero'
We can group over multiple columns:
>>> pbmc.obs["percent_mito_binned"] = pd.cut(pbmc.obs["percent_mito"], bins=5)
>>> rsc.get.aggregate(pbmc, by=["louvain", "percent_mito_binned"], func=["mean", "count_nonzero"])
AnnData object with n_obs × n_vars = 40 × 13714
    obs: 'louvain', 'percent_mito_binned'
    var: 'n_cells'
    layers: 'mean', 'count_nonzero'
Note that this filters out any combination of groups that wasn’t present in the original data.
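As a rough illustration of that filtering, the sketch below compares the full Cartesian product of two grouping columns with the combinations that actually occur. The labels are made-up toy values rather than the pbmc3k annotations, and the code only mimics the described behaviour; it does not call rapids_singlecell.

```python
from itertools import product

# Hypothetical per-cell labels for two grouping columns.
louvain = ["B cells", "B cells", "T cells", "T cells", "T cells"]
mito_bin = ["low", "high", "low", "low", "low"]

# Combinations that occur in at least one cell.
observed = sorted(set(zip(louvain, mito_bin)))

# Every combination that could exist in principle.
all_pairs = sorted(product(set(louvain), set(mito_bin)))

# Combinations with no cells; these would get no row in the aggregated output.
missing = [pair for pair in all_pairs if pair not in observed]

print(len(all_pairs))
print(len(observed))
print(missing)
```

Here only three of the four possible pairs are observed, so an aggregation that keeps observed combinations would return three groups instead of four.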