Ligrec Benchmark

Ligrec Benchmark#

This notebook benchmarks gr.ligrec for squidpy and rapids-singlecell.

To run this notebook please make sure you have a working rapids environment with all nessaray dependencies. Run the data_downloader notebook first to create the AnnData object we are working with. In this example workflow we’ll be looking at a dataset of ca. 90000 cells from Quin et al., Cell Research 2020.

import scanpy as sc
import squidpy as sq
import cupy as cp
import rapids_singlecell as rsc

import warnings

warnings.filterwarnings("ignore")
import rmm
from rmm.allocators.cupy import rmm_cupy_allocator

rmm.reinitialize(
    managed_memory=False,  # Allows oversubscription
    pool_allocator=False,  # default is False
    devices=0,  # GPU device IDs to register. By default registers only GPU 0.
)
cp.cuda.set_allocator(rmm_cupy_allocator)

Load and Prepare Data#

We load the sparse count matrix from an h5ad file using Scanpy. The sparse count matrix will then be placed on the GPU and run basic preprocessing for rsc.gr.ligrec

%%time
adata = sc.read("h5/adata.raw.h5ad")
CPU times: user 2.09 s, sys: 147 ms, total: 2.23 s
Wall time: 2.23 s
rsc.get.anndata_to_GPU(adata)
(93575, 33694)
%%time
rsc.pp.flag_gene_family(adata, gene_family_name="MT", gene_family_prefix="MT-")
CPU times: user 4.78 ms, sys: 0 ns, total: 4.78 ms
Wall time: 4.77 ms
%%time
rsc.pp.calculate_qc_metrics(adata, qc_vars=["MT"])
CPU times: user 82.6 ms, sys: 0 ns, total: 82.6 ms
Wall time: 82.2 ms
%%time
adata = adata[adata.obs["n_genes_by_counts"] < 5000]
adata.shape
CPU times: user 107 ms, sys: 24.2 ms, total: 131 ms
Wall time: 130 ms
(92666, 33694)
%%time
adata = adata[adata.obs["pct_counts_MT"] < 20]
adata.shape
CPU times: user 9.85 ms, sys: 15.3 ms, total: 25.1 ms
Wall time: 24.7 ms
(91180, 33694)
%%time
rsc.pp.filter_genes(adata, min_count=3)
filtered out 8034 genes based on n_cells_by_counts
CPU times: user 64.7 ms, sys: 32.2 ms, total: 96.9 ms
Wall time: 96.4 ms
%%time
rsc.pp.normalize_total(adata, target_sum=1e4)
CPU times: user 1.03 ms, sys: 319 µs, total: 1.35 ms
Wall time: 863 µs
%%time
rsc.pp.log1p(adata)
CPU times: user 0 ns, sys: 7 ms, total: 7 ms
Wall time: 6.58 ms
%%time
rsc.get.anndata_to_CPU(adata)
adata.raw = adata
CPU times: user 121 ms, sys: 36.2 ms, total: 157 ms
Wall time: 156 ms
adata
AnnData object with n_obs × n_vars = 91180 × 25660
    obs: 'nGene', 'nUMI', 'CellFromTumor', 'PatientNumber', 'TumorType', 'TumorSite', 'CellType', 'n_genes_by_counts', 'total_counts', 'log1p_n_genes_by_counts', 'log1p_total_counts', 'total_counts_MT', 'pct_counts_MT', 'log1p_total_counts_MT'
    var: 'gene_ids', 'MT', 'n_cells_by_counts', 'total_counts', 'mean_counts', 'pct_dropout_by_counts', 'log1p_total_counts', 'log1p_mean_counts'
    uns: 'log1p'

Ligrec Benchmark#

First we download the interactions so that both function get evaluated in the same way

interactions = rsc.squidpy_gpu._ligrec._get_interactions()

Next, we execute the function using both the rapids-singlecell and squidpy versions for comparison

%%time
res_rsc = rsc.gr.ligrec(
    adata,
    n_perms=1000,
    interactions=interactions,
    cluster_key="CellType",
    copy=True,
    use_raw=True,
)
CPU times: user 3.45 s, sys: 316 ms, total: 3.77 s
Wall time: 3.77 s
res_rsc["means"].iloc[:10, :10]
cluster_1 Alveolar
cluster_2 Alveolar B_cell Cancer EC Epithelial Erythroblast Fibroblast Mast_cell Myeloid T_cell
source target
EPOR TRPC3 0.000000 0.020600 0.000000 0.020986 0.000000 0.0 0.023505 0.000000 0.000000 0.021047
JAK2 0.030588 0.027678 0.027384 0.039786 0.036642 0.0 0.042807 0.030903 0.059531 0.033056
FYN JAK2 0.021167 0.018256 0.017962 0.030365 0.027220 0.0 0.033385 0.021481 0.050110 0.023634
CCL2 JAK2 0.168153 0.165242 0.164949 0.177351 0.174207 0.0 0.180372 0.168468 0.197096 0.170620
KIT JAK2 0.013606 0.010695 0.010402 0.022804 0.019660 0.0 0.025825 0.013920 0.042549 0.016073
EPO JAK2 0.010124 0.007213 0.006920 0.019322 0.016178 0.0 0.022343 0.010439 0.039067 0.012591
IFNG JAK2 0.018772 0.015861 0.015568 0.027970 0.024826 0.0 0.030991 0.019086 0.047715 0.021239
KITLG JAK2 0.054305 0.051394 0.051101 0.063503 0.060359 0.0 0.066524 0.054620 0.083248 0.056772
NRG1 JAK2 0.029253 0.026343 0.026049 0.038451 0.035307 0.0 0.041472 0.029568 0.058196 0.031721
IL4R JAK2 0.053766 0.050856 0.050562 0.062964 0.059820 0.0 0.065985 0.054081 0.082709 0.056234
res_rsc["pvalues"].iloc[:10, :10]
cluster_1 Alveolar
cluster_2 Alveolar B_cell Cancer EC Epithelial Erythroblast Fibroblast Mast_cell Myeloid T_cell
source target
EPOR TRPC3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
JAK2 0.516 0.942 0.970 0.0 0.164 NaN 0.000 0.484 0.000 0.074
FYN JAK2 1.000 1.000 1.000 1.0 1.000 NaN 1.000 1.000 1.000 1.000
CCL2 JAK2 0.000 0.000 0.000 0.0 0.000 NaN 0.000 0.000 0.000 0.000
KIT JAK2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
EPO JAK2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
IFNG JAK2 1.000 1.000 1.000 1.0 1.000 NaN 1.000 1.000 1.000 1.000
KITLG JAK2 0.000 0.000 0.000 0.0 0.000 NaN 0.000 0.000 0.000 0.000
NRG1 JAK2 0.011 0.079 0.081 0.0 0.037 NaN 0.000 0.071 0.000 0.000
IL4R JAK2 1.000 1.000 1.000 1.0 0.992 NaN 0.984 1.000 0.005 1.000
%%time
res_sq = sq.gr.ligrec(
    adata,
    n_perms=1000,
    interactions=interactions,
    cluster_key="CellType",
    copy=True,
    use_raw=True,
    n_jobs=32,
)
CPU times: user 20.4 s, sys: 2.02 s, total: 22.5 s
Wall time: 52.3 s
res_sq["means"].iloc[:10, :10]
cluster_1 Alveolar
cluster_2 Alveolar B_cell Cancer EC Epithelial Erythroblast Fibroblast Mast_cell Myeloid T_cell
source target
EPOR TRPC3 0.000000 0.020600 0.000000 0.020986 0.000000 0.0 0.023505 0.000000 0.000000 0.021047
JAK2 0.030588 0.027678 0.027384 0.039786 0.036642 0.0 0.042807 0.030903 0.059531 0.033056
FYN JAK2 0.021167 0.018256 0.017962 0.030365 0.027220 0.0 0.033385 0.021481 0.050110 0.023634
CCL2 JAK2 0.168153 0.165242 0.164949 0.177351 0.174207 0.0 0.180372 0.168468 0.197096 0.170620
KIT JAK2 0.013606 0.010695 0.010402 0.022804 0.019660 0.0 0.025825 0.013920 0.042549 0.016073
EPO JAK2 0.010124 0.007213 0.006920 0.019322 0.016178 0.0 0.022343 0.010439 0.039067 0.012591
IFNG JAK2 0.018772 0.015861 0.015568 0.027970 0.024826 0.0 0.030991 0.019086 0.047715 0.021239
KITLG JAK2 0.054305 0.051394 0.051101 0.063503 0.060359 0.0 0.066524 0.054620 0.083248 0.056772
NRG1 JAK2 0.029253 0.026343 0.026049 0.038451 0.035307 0.0 0.041472 0.029568 0.058196 0.031721
IL4R JAK2 0.053766 0.050856 0.050562 0.062964 0.059820 0.0 0.065985 0.054081 0.082709 0.056234
res_sq["pvalues"].iloc[:10, :10]
cluster_1 Alveolar
cluster_2 Alveolar B_cell Cancer EC Epithelial Erythroblast Fibroblast Mast_cell Myeloid T_cell
source target
EPOR TRPC3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
JAK2 0.497 0.922 0.964 0.000 0.135 NaN 0.000 0.444 0.000 0.06
FYN JAK2 1.000 1.000 1.000 1.000 1.000 NaN 1.000 1.000 1.000 1.00
CCL2 JAK2 0.000 0.000 0.000 0.000 0.000 NaN 0.000 0.000 0.000 0.00
KIT JAK2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
EPO JAK2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
IFNG JAK2 1.000 1.000 1.000 1.000 1.000 NaN 1.000 1.000 1.000 1.00
KITLG JAK2 0.000 0.000 0.000 0.000 0.000 NaN 0.000 0.000 0.000 0.00
NRG1 JAK2 0.007 0.070 0.080 0.000 0.037 NaN 0.000 0.054 0.000 0.00
IL4R JAK2 1.000 1.000 1.000 0.999 0.993 NaN 0.977 1.000 0.003 1.00