rapids_singlecell.dcg.ulm

rapids_singlecell.dcg.ulm = <rapids_singlecell.decoupler_gpu._helper._Method.Method object>

Univariate Linear Model (ULM).

For each observation, this approach treats its molecular features as the population of samples and fits a linear model with a single covariate: the feature weights of a set \(F\).

\[y_i = \beta_0 + \beta_1 x_i + \varepsilon, \quad i = 1, 2, \ldots, n\]

Where:

\(y_i\) is the observed feature statistic (e.g. gene expression, \(log_{2}FC\), etc.) for feature \(i\)
\(x_i\) is the weight of feature \(i\) in feature set \(F\). For unweighted sets, membership in the set is indicated by 1, and non-membership by 0.
\(\beta_0\) is the intercept
\(\beta_1\) is the slope coefficient
\(\varepsilon\) is the error term (the residual of the fit for feature \(i\))

The enrichment score \(ES\) is then calculated as the t-value of the slope coefficient.

\[ES = t_{\beta_1} = \frac{\hat{\beta}_1}{\mathrm{SE}(\hat{\beta}_1)}\]

Where:

\(t_{\beta_1}\) is the t-value of the slope
\(\mathrm{SE}(\hat{\beta}_1)\) is the standard error of the slope
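
The two formulas above can be sketched for a single observation in plain Python. This is only an illustration of the arithmetic, not the library's GPU implementation; the helper name `ulm_tvalue` and the toy values are hypothetical, and the sketch assumes the fit is not perfect (a zero residual would make the standard error zero).

```python
import math

def ulm_tvalue(y, x):
    """Return (slope, t-value) of the least-squares fit y = b0 + b1 * x."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squares of x and cross-products of x and y around their means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se_slope

# Hypothetical toy data: six features, the first three belong to the set F
# (unweighted set, so membership is encoded as 1 and non-membership as 0).
y = [2.0, 3.0, 4.0, 0.0, 1.0, 2.0]
x = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
slope, es = ulm_tvalue(y, x)
```

With an unweighted set, the slope is simply the difference between the mean statistic of members and non-members, and the enrichment score is that difference scaled by its standard error.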

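Next, the \(p_{value}\) of each score is obtained by evaluating the two-sided survival function (\(sf\)) of the Student’s t-distribution at \(|ES|\), where \(\text{df}\) denotes the degrees of freedom (\(n - 2\) for a linear model with an intercept and one covariate).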

\[p_{value} = 2 \times \mathrm{sf}(|ES|, \text{df})\]
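
To make the two-sided survival function concrete, here is a minimal standard-library sketch that integrates the Student's t density numerically with Simpson's rule. The numerical integration is a stand-in chosen for self-containment; in practice a library routine such as `scipy.stats.t.sf` would be the usual choice, and this approximation loses relative accuracy for very small p-values. The function names are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of the Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

def two_sided_pvalue(es, df, steps=2000):
    """Approximate 2 * sf(|es|, df) using Simpson's rule on [0, |es|]."""
    upper = abs(es)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central_mass = total * h / 3  # P(0 <= T <= |es|)
    # By symmetry, sf(|es|) = 0.5 - P(0 <= T <= |es|).
    return max(0.0, 2 * (0.5 - central_mass))
```

Because the distribution is symmetric, the sign of the enrichment score does not affect the result; only its magnitude does.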

Finally, the resulting \(p_{value}\) are adjusted for multiple testing with the Benjamini-Hochberg correction.
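
As an illustration of the Benjamini-Hochberg step, the following plain-Python sketch adjusts one family of p-values. It is not the library's code, which may run this step on the GPU (see `adj_pv_gpu`); the helper name and the example values are hypothetical, and which p-values are grouped into one family is left open here.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical p-values for four feature sets.
pvals = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(pvals)
```

Each p-value is scaled by the number of tests divided by its rank, and the running minimum ensures that a smaller raw p-value never ends up with a larger adjusted value.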

Parameters:

data: AnnData instance, DataFrame or tuple of [matrix, samples, features].
net: DataFrame in long format. Must include source and target columns, and optionally a weight column.
tmin default: 5: Minimum number of targets per source. Sources with fewer targets will be removed.
layer: Layer key name of an anndata.AnnData instance.
raw default: False: Whether to use the .raw attribute of anndata.AnnData.
empty default: True: Whether to remove empty observations (rows) or features (columns).
bsize default: 5000: For large datasets in sparse format, this parameter controls how many observations are processed at once. Increasing this value speeds up computation but uses more memory.
verbose default: False: Whether to display progress messages and additional execution details.
pre_load default: False: Whether to pre-load the data into memory before processing.
adj_pv_gpu default: False: Whether to use GPU for adjusting p-values.
tval: Whether to return the t-value (tval=True) or the coefficient of the fitted model (tval=False).
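
The effect of `tmin` on a long-format network can be sketched as follows. This is a simplified illustration on plain tuples rather than a DataFrame; the helper `filter_sources` and the toy rows are hypothetical, and the sketch assumes targets are counted directly from the network rows, which may differ from how the library counts them.

```python
from collections import Counter

def filter_sources(net_rows, tmin=5):
    """Keep only rows whose source has at least tmin targets."""
    counts = Counter(source for source, _target, _weight in net_rows)
    return [row for row in net_rows if counts[row[0]] >= tmin]

# Hypothetical long-format network: (source, target, weight) per row.
net_rows = [
    ("T1", "G1", 1.0),
    ("T1", "G2", 1.0),
    ("T1", "G3", -1.0),
    ("T2", "G1", 1.0),
    ("T2", "G4", 0.5),
]
kept = filter_sources(net_rows, tmin=3)
```

Here source T2 has only two targets, so with `tmin=3` all of its rows are dropped and only the three rows of T1 remain.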

Returns:

Enrichment scores \(ES\) and, if applicable, Benjamini-Hochberg-adjusted \(p_{value}\).

Example

import decoupler as dc
import rapids_singlecell as rsc

adata, net = dc.ds.toy()
rsc.dcg.ulm(adata, net, tmin=3)
