Ligrec Benchmark
This notebook benchmarks gr.ligrec for squidpy and rapids-singlecell.
To run this notebook, please make sure you have a working RAPIDS environment with all necessary dependencies. Run the data_downloader notebook first to create the AnnData object we are working with. In this example workflow we'll be looking at a dataset of ca. 90,000 cells from Qian et al., Cell Research 2020.
import scanpy as sc
import squidpy as sq
import cupy as cp
import rapids_singlecell as rsc
import warnings
warnings.filterwarnings("ignore")
import rmm
from rmm.allocators.cupy import rmm_cupy_allocator
rmm.reinitialize(
    managed_memory=False,  # set to True to allow oversubscription via managed memory
    pool_allocator=False,  # default is False
    devices=0,  # GPU device IDs to register. By default registers only GPU 0.
)
cp.cuda.set_allocator(rmm_cupy_allocator)
Load and Prepare Data
We load the sparse count matrix from an h5ad file using Scanpy. The matrix is then moved to the GPU, where we run basic preprocessing for rsc.gr.ligrec.
%%time
adata = sc.read("h5/adata.raw.h5ad")
CPU times: user 2.09 s, sys: 147 ms, total: 2.23 s
Wall time: 2.23 s
rsc.get.anndata_to_GPU(adata)
adata.shape
(93575, 33694)
%%time
rsc.pp.flag_gene_family(adata, gene_family_name="MT", gene_family_prefix="MT-")
CPU times: user 4.78 ms, sys: 0 ns, total: 4.78 ms
Wall time: 4.77 ms
%%time
rsc.pp.calculate_qc_metrics(adata, qc_vars=["MT"])
CPU times: user 82.6 ms, sys: 0 ns, total: 82.6 ms
Wall time: 82.2 ms
%%time
adata = adata[adata.obs["n_genes_by_counts"] < 5000]
adata.shape
CPU times: user 107 ms, sys: 24.2 ms, total: 131 ms
Wall time: 130 ms
(92666, 33694)
%%time
adata = adata[adata.obs["pct_counts_MT"] < 20]
adata.shape
CPU times: user 9.85 ms, sys: 15.3 ms, total: 25.1 ms
Wall time: 24.7 ms
(91180, 33694)
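The mitochondrial filter above relies on two steps: flagging genes whose names start with "MT-" and computing, per cell, the percentage of counts that come from those genes. The following is a minimal pure-Python sketch of that arithmetic for a single cell, assuming the usual Scanpy-style definition of `pct_counts_MT`. The helper name `pct_counts_for_prefix` and the toy counts are hypothetical and are not part of rapids-singlecell.

```python
def pct_counts_for_prefix(counts_by_gene, prefix="MT-"):
    """Percentage of a cell's total counts that come from genes with the given prefix."""
    total = sum(counts_by_gene.values())
    if total == 0:
        return 0.0
    flagged = sum(c for gene, c in counts_by_gene.items() if gene.startswith(prefix))
    return 100.0 * flagged / total


# Toy cell (made-up counts): 40 of 200 counts are mitochondrial.
cell = {"MT-CO1": 30, "MT-ND1": 10, "CD3E": 100, "JAK2": 60}
pct_mt = pct_counts_for_prefix(cell)

# The notebook keeps cells with pct_counts_MT < 20 (strict inequality),
# so a cell sitting exactly on the threshold is dropped.
keep_cell = pct_mt < 20
print(pct_mt, keep_cell)
```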
%%time
rsc.pp.filter_genes(adata, min_count=3)
filtered out 8034 genes based on n_cells_by_counts
CPU times: user 64.7 ms, sys: 32.2 ms, total: 96.9 ms
Wall time: 96.4 ms
%%time
rsc.pp.normalize_total(adata, target_sum=1e4)
CPU times: user 1.03 ms, sys: 319 µs, total: 1.35 ms
Wall time: 863 µs
%%time
rsc.pp.log1p(adata)
CPU times: user 0 ns, sys: 7 ms, total: 7 ms
Wall time: 6.58 ms
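To make the two transformations above concrete, here is a minimal pure-Python sketch of what total-count normalization followed by log1p does to a single cell's counts. It only illustrates the arithmetic, assuming every cell is scaled so its counts sum to `target_sum`. The function name `normalize_and_log` is hypothetical, and the GPU implementation works on the whole sparse matrix rather than on Python lists.

```python
import math


def normalize_and_log(counts, target_sum=1e4):
    """Scale one cell's counts to sum to target_sum, then apply log(1 + x)."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    scale = target_sum / total
    return [math.log1p(c * scale) for c in counts]


# Toy cell with 100 total counts (made-up values).
cell_counts = [0, 5, 15, 80]
transformed = normalize_and_log(cell_counts)

# Undoing the log recovers the normalized counts, which sum to target_sum.
recovered_total = sum(math.expm1(v) for v in transformed)
print(transformed)
print(recovered_total)
```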
%%time
rsc.get.anndata_to_CPU(adata)
adata.raw = adata
CPU times: user 121 ms, sys: 36.2 ms, total: 157 ms
Wall time: 156 ms
adata
AnnData object with n_obs × n_vars = 91180 × 25660
obs: 'nGene', 'nUMI', 'CellFromTumor', 'PatientNumber', 'TumorType', 'TumorSite', 'CellType', 'n_genes_by_counts', 'total_counts', 'log1p_n_genes_by_counts', 'log1p_total_counts', 'total_counts_MT', 'pct_counts_MT', 'log1p_total_counts_MT'
var: 'gene_ids', 'MT', 'n_cells_by_counts', 'total_counts', 'mean_counts', 'pct_dropout_by_counts', 'log1p_total_counts', 'log1p_mean_counts'
uns: 'log1p'
Ligrec Benchmark
First, we download the interactions so that both functions are evaluated in the same way.
interactions = rsc.squidpy_gpu._ligrec._get_interactions()
Next, we execute the function using both the rapids-singlecell and squidpy versions for comparison.
%%time
res_rsc = rsc.gr.ligrec(
    adata,
    n_perms=1000,
    interactions=interactions,
    cluster_key="CellType",
    copy=True,
    use_raw=True,
)
CPU times: user 3.45 s, sys: 316 ms, total: 3.77 s
Wall time: 3.77 s
res_rsc["means"].iloc[:10, :10]
Columns: cluster_1 = Alveolar throughout; column headers give cluster_2.
source | target | Alveolar | B_cell | Cancer | EC | Epithelial | Erythroblast | Fibroblast | Mast_cell | Myeloid | T_cell |
---|---|---|---|---|---|---|---|---|---|---|---|
EPOR | TRPC3 | 0.000000 | 0.020600 | 0.000000 | 0.020986 | 0.000000 | 0.0 | 0.023505 | 0.000000 | 0.000000 | 0.021047 |
EPOR | JAK2 | 0.030588 | 0.027678 | 0.027384 | 0.039786 | 0.036642 | 0.0 | 0.042807 | 0.030903 | 0.059531 | 0.033056 |
FYN | JAK2 | 0.021167 | 0.018256 | 0.017962 | 0.030365 | 0.027220 | 0.0 | 0.033385 | 0.021481 | 0.050110 | 0.023634 |
CCL2 | JAK2 | 0.168153 | 0.165242 | 0.164949 | 0.177351 | 0.174207 | 0.0 | 0.180372 | 0.168468 | 0.197096 | 0.170620 |
KIT | JAK2 | 0.013606 | 0.010695 | 0.010402 | 0.022804 | 0.019660 | 0.0 | 0.025825 | 0.013920 | 0.042549 | 0.016073 |
EPO | JAK2 | 0.010124 | 0.007213 | 0.006920 | 0.019322 | 0.016178 | 0.0 | 0.022343 | 0.010439 | 0.039067 | 0.012591 |
IFNG | JAK2 | 0.018772 | 0.015861 | 0.015568 | 0.027970 | 0.024826 | 0.0 | 0.030991 | 0.019086 | 0.047715 | 0.021239 |
KITLG | JAK2 | 0.054305 | 0.051394 | 0.051101 | 0.063503 | 0.060359 | 0.0 | 0.066524 | 0.054620 | 0.083248 | 0.056772 |
NRG1 | JAK2 | 0.029253 | 0.026343 | 0.026049 | 0.038451 | 0.035307 | 0.0 | 0.041472 | 0.029568 | 0.058196 | 0.031721 |
IL4R | JAK2 | 0.053766 | 0.050856 | 0.050562 | 0.062964 | 0.059820 | 0.0 | 0.065985 | 0.054081 | 0.082709 | 0.056234 |
res_rsc["pvalues"].iloc[:10, :10]
Columns: cluster_1 = Alveolar throughout; column headers give cluster_2.
source | target | Alveolar | B_cell | Cancer | EC | Epithelial | Erythroblast | Fibroblast | Mast_cell | Myeloid | T_cell |
---|---|---|---|---|---|---|---|---|---|---|---|
EPOR | TRPC3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
EPOR | JAK2 | 0.516 | 0.942 | 0.970 | 0.0 | 0.164 | NaN | 0.000 | 0.484 | 0.000 | 0.074 |
FYN | JAK2 | 1.000 | 1.000 | 1.000 | 1.0 | 1.000 | NaN | 1.000 | 1.000 | 1.000 | 1.000 |
CCL2 | JAK2 | 0.000 | 0.000 | 0.000 | 0.0 | 0.000 | NaN | 0.000 | 0.000 | 0.000 | 0.000 |
KIT | JAK2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
EPO | JAK2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
IFNG | JAK2 | 1.000 | 1.000 | 1.000 | 1.0 | 1.000 | NaN | 1.000 | 1.000 | 1.000 | 1.000 |
KITLG | JAK2 | 0.000 | 0.000 | 0.000 | 0.0 | 0.000 | NaN | 0.000 | 0.000 | 0.000 | 0.000 |
NRG1 | JAK2 | 0.011 | 0.079 | 0.081 | 0.0 | 0.037 | NaN | 0.000 | 0.071 | 0.000 | 0.000 |
IL4R | JAK2 | 1.000 | 1.000 | 1.000 | 1.0 | 0.992 | NaN | 0.984 | 1.000 | 0.005 | 1.000 |
%%time
res_sq = sq.gr.ligrec(
    adata,
    n_perms=1000,
    interactions=interactions,
    cluster_key="CellType",
    copy=True,
    use_raw=True,
    n_jobs=32,
)
CPU times: user 20.4 s, sys: 2.02 s, total: 22.5 s
Wall time: 52.3 s
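The two wall times reported above (3.77 s for rsc.gr.ligrec and 52.3 s for sq.gr.ligrec with n_jobs=32) can be turned into a speedup factor. The short sketch below only does that arithmetic with the numbers printed in this notebook. They come from a single run on one machine, so the ratio is indicative rather than a general result.

```python
# Wall times copied from the %%time outputs in this notebook (single run each).
rsc_wall_s = 3.77   # rsc.gr.ligrec on the GPU
sq_wall_s = 52.3    # sq.gr.ligrec on the CPU with n_jobs=32

speedup = sq_wall_s / rsc_wall_s
print(f"rapids-singlecell was about {speedup:.1f}x faster in this run")
```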
res_sq["means"].iloc[:10, :10]
Columns: cluster_1 = Alveolar throughout; column headers give cluster_2.
source | target | Alveolar | B_cell | Cancer | EC | Epithelial | Erythroblast | Fibroblast | Mast_cell | Myeloid | T_cell |
---|---|---|---|---|---|---|---|---|---|---|---|
EPOR | TRPC3 | 0.000000 | 0.020600 | 0.000000 | 0.020986 | 0.000000 | 0.0 | 0.023505 | 0.000000 | 0.000000 | 0.021047 |
EPOR | JAK2 | 0.030588 | 0.027678 | 0.027384 | 0.039786 | 0.036642 | 0.0 | 0.042807 | 0.030903 | 0.059531 | 0.033056 |
FYN | JAK2 | 0.021167 | 0.018256 | 0.017962 | 0.030365 | 0.027220 | 0.0 | 0.033385 | 0.021481 | 0.050110 | 0.023634 |
CCL2 | JAK2 | 0.168153 | 0.165242 | 0.164949 | 0.177351 | 0.174207 | 0.0 | 0.180372 | 0.168468 | 0.197096 | 0.170620 |
KIT | JAK2 | 0.013606 | 0.010695 | 0.010402 | 0.022804 | 0.019660 | 0.0 | 0.025825 | 0.013920 | 0.042549 | 0.016073 |
EPO | JAK2 | 0.010124 | 0.007213 | 0.006920 | 0.019322 | 0.016178 | 0.0 | 0.022343 | 0.010439 | 0.039067 | 0.012591 |
IFNG | JAK2 | 0.018772 | 0.015861 | 0.015568 | 0.027970 | 0.024826 | 0.0 | 0.030991 | 0.019086 | 0.047715 | 0.021239 |
KITLG | JAK2 | 0.054305 | 0.051394 | 0.051101 | 0.063503 | 0.060359 | 0.0 | 0.066524 | 0.054620 | 0.083248 | 0.056772 |
NRG1 | JAK2 | 0.029253 | 0.026343 | 0.026049 | 0.038451 | 0.035307 | 0.0 | 0.041472 | 0.029568 | 0.058196 | 0.031721 |
IL4R | JAK2 | 0.053766 | 0.050856 | 0.050562 | 0.062964 | 0.059820 | 0.0 | 0.065985 | 0.054081 | 0.082709 | 0.056234 |
res_sq["pvalues"].iloc[:10, :10]
Columns: cluster_1 = Alveolar throughout; column headers give cluster_2.
source | target | Alveolar | B_cell | Cancer | EC | Epithelial | Erythroblast | Fibroblast | Mast_cell | Myeloid | T_cell |
---|---|---|---|---|---|---|---|---|---|---|---|
EPOR | TRPC3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
EPOR | JAK2 | 0.497 | 0.922 | 0.964 | 0.000 | 0.135 | NaN | 0.000 | 0.444 | 0.000 | 0.06 |
FYN | JAK2 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | NaN | 1.000 | 1.000 | 1.000 | 1.00 |
CCL2 | JAK2 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | NaN | 0.000 | 0.000 | 0.000 | 0.00 |
KIT | JAK2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
EPO | JAK2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
IFNG | JAK2 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | NaN | 1.000 | 1.000 | 1.000 | 1.00 |
KITLG | JAK2 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | NaN | 0.000 | 0.000 | 0.000 | 0.00 |
NRG1 | JAK2 | 0.007 | 0.070 | 0.080 | 0.000 | 0.037 | NaN | 0.000 | 0.054 | 0.000 | 0.00 |
IL4R | JAK2 | 1.000 | 1.000 | 1.000 | 0.999 | 0.993 | NaN | 0.977 | 1.000 | 0.003 | 1.00 |
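In the tables above, the means from both libraries agree to the displayed precision, while some p-values differ slightly (for example 0.516 vs. 0.497 for EPOR/JAK2 in Alveolar/Alveolar). That pattern is what one would expect from a permutation test: the observed statistic is deterministic, but the p-value is estimated from random label shuffles. The sketch below illustrates the idea in pure Python for a single interaction and a single cluster pair. It is a simplified illustration, not the squidpy or rapids-singlecell implementation; the function `permutation_pvalue`, the score definition (average of the ligand mean in one cluster and the receptor mean in the other), and the toy data are all assumptions made for this example.

```python
import random
from statistics import mean


def permutation_pvalue(ligand, receptor, labels, cluster_1, cluster_2,
                       n_perms=1000, seed=0):
    """Return (observed_score, p_value) for one interaction and one cluster pair."""

    def score(current_labels):
        lig = [x for x, lab in zip(ligand, current_labels) if lab == cluster_1]
        rec = [x for x, lab in zip(receptor, current_labels) if lab == cluster_2]
        return (mean(lig) + mean(rec)) / 2

    observed = score(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perms):
        rng.shuffle(shuffled)  # permute cluster labels across cells
        if score(shuffled) >= observed:
            hits += 1
    return observed, hits / n_perms


# Toy data (made up): 60 cells in two hypothetical clusters "A" and "B".
data_rng = random.Random(42)
labels = ["A"] * 30 + ["B"] * 30
ligand = [data_rng.gauss(1.2 if lab == "A" else 1.0, 0.5) for lab in labels]
receptor = [data_rng.gauss(1.0, 0.5) for _ in labels]

obs_1, p_1 = permutation_pvalue(ligand, receptor, labels, "A", "B", seed=1)
obs_2, p_2 = permutation_pvalue(ligand, receptor, labels, "A", "B", seed=2)
print(obs_1 == obs_2)  # the observed score does not depend on the seed
print(p_1, p_2)        # p-values are multiples of 1/n_perms and may differ by seed
```

With n_perms=1000, p-values can only take values in steps of 0.001, which matches the three-decimal resolution seen in the tables.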