Demo Workflow & Decoupler#

Author: Severin Dicks

To run this notebook please make sure you have a working rapids environment with all nessaray dependencies. Run the data_downloader notebook first to create the AnnData object we are working with. In this example workflow we’ll be looking at a dataset of ca. 90000 cells from Quin et al., Cell Research 2020.

import scanpy as sc
import cupy as cp

import time
import rapids_singlecell as rsc

import warnings

warnings.filterwarnings("ignore")

import rmm
from rmm.allocators.cupy import rmm_cupy_allocator

rmm.reinitialize(
    managed_memory=False,  # Allows oversubscription
    pool_allocator=False,  # default is False
    devices=0,  # GPU device IDs to register. By default registers only GPU 0.
)
cp.cuda.set_allocator(rmm_cupy_allocator)

Load and Prepare Data#

We load the sparse count matrix from an h5ad file using Scanpy. The sparse count matrix will then be placed on the GPU.

data_load_start = time.time()

%%time
adata = sc.read("h5/adata.raw.h5ad")

CPU times: user 2.34 s, sys: 146 ms, total: 2.49 s
Wall time: 2.53 s

%%time
rsc.get.anndata_to_GPU(adata)

CPU times: user 57.9 ms, sys: 205 ms, total: 263 ms
Wall time: 262 ms

data_load_time = time.time()
print("Total data load and format time: %s" % (data_load_time - data_load_start))

Total data load and format time: 2.810950517654419

Preprocessing#

preprocess_start = time.time()

Quality Control#

We perform a basic qulitiy control and plot the results

%%time
rsc.pp.flag_gene_family(adata, gene_family_name="MT", gene_family_prefix="MT")

CPU times: user 5.76 ms, sys: 561 µs, total: 6.33 ms
Wall time: 6.19 ms

%%time
rsc.pp.flag_gene_family(adata, gene_family_name="RIBO", gene_family_prefix="RPS")

CPU times: user 5.68 ms, sys: 0 ns, total: 5.68 ms
Wall time: 5.58 ms

%%time
rsc.pp.calculate_qc_metrics(adata, qc_vars=["MT", "RIBO"])

CPU times: user 87.1 ms, sys: 6.5 ms, total: 93.6 ms
Wall time: 168 ms

sc.pl.scatter(adata, x="total_counts", y="pct_counts_MT")
sc.pl.scatter(adata, x="total_counts", y="n_genes_by_counts")

../_images/4eff10cb99490cb487bbe42d6d387c447d3b7595f78f51e56fb98d7da841c902.png

../_images/de830ab1ab6e60f461acda83be8741d35f775ea52c830daa1af13037e82ba322.png

sc.pl.violin(adata, "n_genes_by_counts", jitter=0.4, groupby="PatientNumber")
sc.pl.violin(adata, "total_counts", jitter=0.4, groupby="PatientNumber")
sc.pl.violin(adata, "pct_counts_MT", jitter=0.4, groupby="PatientNumber")

../_images/76f86c11a6688d9c0826f9d888c1f15ee5bd531e38a3d04f30a90d1dfe63bdf4.png

../_images/53d56c2cfdc598eee37f3cfc63396c4b245eee366a567402a8aad2f79e586aeb.png

../_images/7f34f051f19892ead80e73c5d268a2f36f40aead8e9b38e0a06b81033e9f2a1c.png

Filter#

We filter the count matrix to remove cells with an extreme number of genes expressed. We also filter out cells with a mitchondrial countent of more than 20%.

%%time
adata = adata[adata.obs["n_genes_by_counts"] < 5000]
adata = adata[adata.obs["pct_counts_MT"] < 20]

CPU times: user 16.8 ms, sys: 4.56 ms, total: 21.4 ms
Wall time: 20.6 ms

We also filter out genes that are expressed in less than 3 cells.

%%time
rsc.pp.filter_genes(adata, min_count=3)

filtered out 8034 genes based on n_cells_by_counts
CPU times: user 54.7 ms, sys: 27.6 ms, total: 82.3 ms
Wall time: 91.3 ms

The size of our count matrix is now reduced.

adata.shape

(91068, 25660)

Normalize#

We normalize the count matrix so that the total counts in each cell sum to 1e4.

%%time
rsc.pp.normalize_total(adata, target_sum=1e4)

CPU times: user 239 µs, sys: 1.09 ms, total: 1.33 ms
Wall time: 7.03 ms

Next, we data transform the count matrix.

%%time
rsc.pp.log1p(adata)

CPU times: user 7 µs, sys: 3.01 ms, total: 3.02 ms
Wall time: 2.92 ms

Select Most Variable Genes#

Now we search for highly variable genes. This function only supports the flavors cell_ranger seurat seurat_v3 and pearson_residuals. As you can in scanpy you can filter based on cutoffs or select the top n cells. You can also use a batch_key to reduce batcheffects.

In this example we use cell_ranger for selecting highly variable genes based on the log normalized counts in .X

%%time
rsc.pp.highly_variable_genes(adata, n_top_genes=5000, flavor="cell_ranger")

CPU times: user 256 ms, sys: 27.6 ms, total: 284 ms
Wall time: 316 ms

Now we safe this version of the AnnData as adata.raw.

%%time
adata.raw = adata

CPU times: user 64.4 ms, sys: 52.9 ms, total: 117 ms
Wall time: 116 ms

Now we restrict our AnnData object to the highly variable genes.

%%time
rsc.pp.filter_highly_variable(adata)

CPU times: user 116 ms, sys: 118 ms, total: 234 ms
Wall time: 234 ms

Next we regress out effects of counts per cell and the mitochondrial content of the cells. As you can with scanpy you can use every numerical column in .obs for this.

%%time
rsc.pp.regress_out(adata, keys=["total_counts", "pct_counts_MT"])

CPU times: user 531 ms, sys: 663 ms, total: 1.19 s
Wall time: 1.23 s

Scale#

Finally, we scale the count matrix to obtain a z-score and apply a cutoff value of 10 standard deviations.

%%time
rsc.pp.scale(adata, max_value=10)

CPU times: user 29 ms, sys: 3.58 ms, total: 32.6 ms
Wall time: 45.4 ms

Now we move .X out of the GPU.

%%time
rsc.get.anndata_to_CPU(adata)

CPU times: user 153 ms, sys: 103 ms, total: 256 ms
Wall time: 256 ms

preprocess_time = time.time()
print("Total Preprocessing time: %s" % (preprocess_time - preprocess_start))

Total Preprocessing time: 6.157665252685547

We have now finished the preprocessing of the data.

Clustering and Visualization#

Principal component analysis#

We use PCA to reduce the dimensionality of the matrix to its top 100 principal components. We use the PCA implementation from cuMLs.

%%time
rsc.tl.pca(adata, n_comps=100)

CPU times: user 1.41 s, sys: 582 ms, total: 1.99 s
Wall time: 2 s

We can use scanpy pca_variance_ratio plot to inspect the contribution of single PCs to the total variance in the data.

sc.pl.pca_variance_ratio(adata, log=True, n_pcs=100)

../_images/9d98306d1bf75d2ef07694076e1f8c46fd840093ba7f6433f440cced37a2a874.png

Computing the neighborhood graph and UMAP#

Next we compute the neighborhood graph using rsc.

Scanpy CPU implementation of nearest neighbor uses an approximation, while the GPU version calculates the exact graph. Both methods are valid, but you might see differences.

%%time
rsc.pp.neighbors(adata, n_neighbors=15, n_pcs=40)

CPU times: user 243 ms, sys: 27.9 ms, total: 271 ms
Wall time: 278 ms

Next we calculate the UMAP embedding using rapdis.

%%time
rsc.tl.umap(adata)

CPU times: user 250 ms, sys: 13.3 ms, total: 263 ms
Wall time: 263 ms

Clustering#

Next, we use the Louvain and Leiden algorithm for graph-based clustering.

%%time
rsc.tl.louvain(adata, resolution=0.6)

CPU times: user 3.36 s, sys: 815 ms, total: 4.18 s
Wall time: 6.06 s

%%time
rsc.tl.leiden(adata, resolution=0.6)

CPU times: user 414 ms, sys: 564 ms, total: 979 ms
Wall time: 981 ms

%%time
sc.pl.umap(adata, color=["louvain", "leiden"], legend_loc="on data")

../_images/ea5ad6594c9ca36ffda8c3a08587a392960a99d5d8c6c7159bc223e690410e43.png

CPU times: user 709 ms, sys: 161 ms, total: 869 ms
Wall time: 691 ms

sc.pl.umap(adata, color=["CellType", "PatientNumber"])

../_images/7b63b57fcc78d6922cc9445d54af24bb11be63e6e6d998aae4e0978203b7706c.png

TSNE + k-Means#

Next we use TSNE on the GPU to visualize the cells in two dimensions. We also perform k-Means clustering of the cells into 8 clusters.

%%time
rsc.tl.tsne(adata, n_pcs=40, perplexity=30, early_exaggeration=12, learning_rate=200)

[W] [17:02:59.730800] # of Nearest Neighbors should be at least 3 * perplexity. Your results might be a bit strange...
CPU times: user 1.36 s, sys: 1.35 s, total: 2.71 s
Wall time: 2.71 s

rsc.tl.kmeans(adata, n_clusters=8)

%%time
sc.pl.tsne(adata, color=["kmeans"], legend_loc="on data")

../_images/7f0dbf770f900d6e88a109f5d5ddaf1154a720e3b6ddd2b11053a5e411188686.png

CPU times: user 300 ms, sys: 156 ms, total: 456 ms
Wall time: 274 ms

sc.pl.tsne(adata, color=["CellType", "PatientNumber"])

../_images/abd4acfdd03965197b3d350a196279f9babda114296f4eea1b838b4f757f32e1.png

Differential expression analysis#

We now use logistic regression to compute a ranking for highly differential genes in each Louvain cluster.

We use logistic regression to identify the top 50 genes distinguishing each cluster.

%%time
rsc.tl.rank_genes_groups_logreg(adata, groupby="CellType", use_raw=False)

[W] [17:03:06.404040] L-BFGS stopped, because the line search failed to advance (step delta = 0.000000)
CPU times: user 2.24 s, sys: 638 ms, total: 2.88 s
Wall time: 2.91 s

sc.pl.rank_genes_groups(adata)

../_images/9cf2c4f7f6db4aec08eb89122269143d32ec8cab95596911fdb86897ad7abdf1.png

post_time = time.time()

print("Total Postprocessing time: %s" % (post_time - preprocess_time))

Total Postprocessing time: 18.99196195602417

Diffusion Maps#

With cupy 9 its possible to compute Eigenvalues of sparse matrixes. We now create a Diffusion Map of the T-Cells to look at trajectories.

First we create a subset of only the T-Cells

tdata = adata[adata.obs["CellType"] == "T_cell", :].copy()

We can repeat the dimension reduction, clustering and visulatization.

%%time
rsc.pp.pca(tdata, n_comps=50)
sc.pl.pca_variance_ratio(tdata, log=True, n_pcs=50)

../_images/45e5357fff640c4c2e70dc3ef2abcfc8280ad17bf8349ccc8ea819cbaab5a4c9.png

CPU times: user 833 ms, sys: 497 ms, total: 1.33 s
Wall time: 1.15 s

%%time
rsc.pp.neighbors(tdata, n_neighbors=15, n_pcs=20)
rsc.tl.umap(tdata)
rsc.tl.louvain(tdata)

CPU times: user 397 ms, sys: 322 ms, total: 719 ms
Wall time: 721 ms

sc.pl.umap(tdata, color=["louvain"])

../_images/e86dd802ecd03ff62ae0921822c7530b81e74361fc12e655301257957bcc00ee.png

As stated before Diffusion Maps have become an integral part of single cell analysis.

%%time
rsc.tl.diffmap(tdata)

CPU times: user 347 ms, sys: 1.29 s, total: 1.63 s
Wall time: 192 ms

sc.pl.diffmap(tdata, color="louvain")

../_images/7500f055dc6a8cd4b206a9e454cfbd83319faa5704911b31805151a2d6815422.png

%%time
rsc.tl.draw_graph(tdata)

CPU times: user 333 ms, sys: 9.48 ms, total: 342 ms
Wall time: 346 ms

sc.pl.draw_graph(tdata, color="louvain")

../_images/45a37eac23843f32b744f9f751fe3b1607e5a77508a605df0cd0076ab333cdcc.png

After this you can use X_diffmap for sc.pp.neighbors and other functions.

print("Total Processing time: %s" % (time.time() - preprocess_start))

Total Processing time: 29.676725387573242

Decoupler-GPU#

Here I introduce 2 functions of the decoupler package that have been accelerated with cupy: mlmand wsum. You can use the same nets that you would use with the CPU implementation.

import decoupler as dc

net = dc.get_dorothea(organism="human", levels=["A", "B", "C"])

%%time
rsc.dcg.run_mlm(
    mat=adata, net=net, source="source", target="target", weight="weight", verbose=True
)

4 features of mat are empty, they will be removed.
Running mlm on mat with 91068 samples and 25656 targets for 297 sources.
CPU times: user 8.68 s, sys: 2.93 s, total: 11.6 s
Wall time: 11.7 s

acts_mlm = dc.get_acts(adata, obsm_key="mlm_estimate")
sc.pl.umap(acts_mlm, color=["KLF5", "FOXA1", "CellType"], cmap="coolwarm", vcenter=0)

../_images/cba13251606a23ac01b133854ea0d02214ac0f74d0c166654620e05f60bbe871.png

model = dc.get_progeny(organism="human", top=100)

%%time
rsc.dcg.run_wsum(
    mat=adata,
    net=model,
    source="source",
    target="target",
    weight="weight",
    verbose=True,
)

4 features of mat are empty, they will be removed.
Running wsum on mat with 91068 samples and 25656 targets for 14 sources.
CPU times: user 19.3 s, sys: 4.38 s, total: 23.6 s
Wall time: 23.8 s

acts_wsum = dc.get_acts(adata, obsm_key="wsum_estimate")
sc.pl.umap(acts_wsum, color=["p53", "Trail", "CellType"], cmap="coolwarm", vcenter=0)

../_images/63d178fce6e50c5d1e9fbbca54241a1e7028a3680d41f9dcbd9fc1740746c049.png

Demo Workflow & Decoupler

Contents