rapids_singlecell.dcg.mlm
- rapids_singlecell.dcg.mlm = <rapids_singlecell.decoupler_gpu._helper._Method.Method object>
Multivariate Linear Model (MLM).
This approach treats the molecular features of one observation as the population of samples and fits a linear model with multiple covariates, which are the weights of all feature sets \(F\).
\[y^i = \beta_0 + \beta_1 x_{1}^{i} + \beta_2 x_{2}^{i} + \cdots + \beta_p x_{p}^{i} + \varepsilon\]
Where:
\(y^i\) is the observed feature statistic (e.g. gene expression, \(log_{2}FC\), etc.) for feature \(i\)
\(x_{p}^{i}\) is the weight of feature \(i\) in feature set \(F_p\). For unweighted sets, membership in the set is indicated by 1, and non-membership by 0.
\(\beta_0\) is the intercept
\(\beta_p\) is the slope coefficient for feature set \(F_p\)
\(\varepsilon\) is the error term for feature \(i\)
The enrichment score \(ES\) for each \(F\) is then calculated as the t-value of the slope coefficients.
\[ES = t_{\beta_1} = \frac{\hat{\beta}_1}{\mathrm{SE}(\hat{\beta}_1)}\]
Where:
\(t_{\beta_1}\) is the t-value of the slope
\(\mathrm{SE}(\hat{\beta}_1)\) is the standard error of the slope
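The model and enrichment score above can be sketched for a single observation with NumPy. This is a minimal illustration of an ordinary least squares fit, not the GPU implementation used by rapids_singlecell; the function name mlm_tvalues and the toy data are hypothetical.

```python
import numpy as np


def mlm_tvalues(y, X):
    """Return the slope t-values and residual degrees of freedom.

    y: (n_features,) observed statistic for one observation.
    X: (n_features, n_sets) weight of each feature in each feature set.
    """
    n, p = X.shape
    # Prepend a column of ones so the intercept beta_0 is estimated as well.
    A = np.column_stack([np.ones(n), X])
    # Least-squares estimate of [beta_0, beta_1, ..., beta_p].
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    # Residual degrees of freedom of a fit with p slopes and one intercept.
    df = n - p - 1
    sigma2 = resid @ resid / df
    # SE(beta_j) = sqrt(sigma^2 * [(A^T A)^-1]_jj)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))
    t = beta / se
    return t[1:], df  # drop the intercept, keep one t-value per feature set


# Toy data: 50 features, 3 unweighted feature sets (1 = member, 0 = not).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(50, 3)).astype(float)
# Only the first feature set actually shifts the statistic.
y = 1.0 + 2.0 * X[:, 0] + 0.5 * rng.normal(size=50)

es, df = mlm_tvalues(y, X)
```

Here es holds one enrichment score per feature set; the first set, which drives y in the toy data, should receive a much larger score than the other two.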
Next, each \(p_{value}\) is obtained by evaluating the two-sided survival function (\(sf\)) of the Student’s t-distribution at the absolute enrichment score.
\[p_{value} = 2 \times \mathrm{sf}(|ES|, \text{df})\]
- Parameters:
- data
AnnData instance, DataFrame or tuple of [matrix, samples, features].
- net
DataFrame in long format. Must include source and target columns, and optionally a weight column.
- tmin default: 5
Minimum number of targets per source. Sources with fewer targets will be removed.
- layer
Layer key name of an anndata.AnnData instance.
- raw default: False
Whether to use the .raw attribute of anndata.AnnData.
- empty default: True
Whether to remove empty observations (rows) or features (columns).
- bsize default: 5000
For large datasets in sparse format, this parameter controls how many observations are processed at once. Increasing this value speeds up computation but uses more memory.
- verbose default: False
Whether to display progress messages and additional execution details.
- pre_load default: False
Whether to pre-load the data into memory. If True, the data is loaded into memory before processing.
- adj_pv_gpu default: False
Whether to use the GPU for adjusting p-values.
- tval
Whether to return the t-value (tval=True) or the coefficient of the fitted model (tval=False).
- Returns:
Enrichment scores \(ES\) and, if applicable, \(p_{value}\) adjusted by the Benjamini-Hochberg procedure.
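The p-value formula and the Benjamini-Hochberg adjustment mentioned above can be sketched as follows. This is an illustrative version built on scipy.stats.t.sf and NumPy, not the library's own code; the helper names two_sided_pvalues and benjamini_hochberg are hypothetical, and df is assumed to be the residual degrees of freedom of the fitted model.

```python
import numpy as np
from scipy import stats


def two_sided_pvalues(es, df):
    """p = 2 * sf(|ES|, df) under the Student's t-distribution."""
    return 2.0 * stats.t.sf(np.abs(es), df)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values for a 1-D array."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the k-th smallest p-value by m / k.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


# Hypothetical enrichment scores for three feature sets, with df = 46.
es = np.array([0.0, 1.0, -3.0])
pvals = two_sided_pvalues(es, df=46)
padj = benjamini_hochberg(pvals)
```

A score of 0 gives a p-value of 1, and larger absolute scores give smaller p-values regardless of sign. Adjusted values are never smaller than the raw ones.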
Example
import decoupler as dc
import rapids_singlecell as rsc

adata, net = dc.ds.toy()
rsc.dcg.mlm(adata, net, tmin=3)
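To make the net and tmin parameters concrete, the sketch below builds a small long-format network with pandas, drops sources with fewer than tmin targets, and pivots the rest into the feature-by-set weight matrix that the linear model uses as covariates. The data and the helper filter_net are hypothetical; the real function may count only targets that are also present in the data, which this sketch does not do.

```python
import pandas as pd

# Long format: one row per (source, target) pair, with an optional weight.
net = pd.DataFrame(
    {
        "source": ["T1", "T1", "T1", "T2", "T2", "T3", "T3", "T3"],
        "target": ["G1", "G2", "G3", "G3", "G4", "G1", "G4", "G5"],
        "weight": [1.0, 1.0, -1.0, 1.0, 0.5, 1.0, -1.0, 1.0],
    }
)


def filter_net(net, tmin=5):
    """Keep only sources that have at least tmin distinct targets."""
    n_targets = net.groupby("source")["target"].transform("nunique")
    return net[n_targets >= tmin]


kept = filter_net(net, tmin=3)  # T2 has only two targets and is dropped

# Wide matrix: rows are targets (features), columns are sources (feature sets).
# A target outside a set gets weight 0.
weights = kept.pivot(index="target", columns="source", values="weight").fillna(0.0)
```

With tmin=3, only T1 and T3 remain, and weights has one row per remaining target and one column per remaining source.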