Here, we demonstrate BANKSY analysis on 10x Visium data of the human dorsolateral prefrontal cortex from Maynard et al (2018). The data comprise 12 samples obtained from 3 subjects, with manual annotation of the layers in each sample. We will focus on 4 of the 12 samples from subject 3, demonstrating multi-sample analysis with BANKSY.
library(Banksy)
library(SummarizedExperiment)
library(SpatialExperiment)
library(Seurat)
library(scater)
library(cowplot)
library(ggplot2)
We fetch the data for all 12 DLPFC samples with the spatialLIBD package. This might take awhile.
library(spatialLIBD)
library(ExperimentHub)
ehub <- ExperimentHub::ExperimentHub()
spe <- spatialLIBD::fetch_data(type = "spe", eh = ehub)
After the download is completed, we trim the SpatialExperiment object, retaining only the counts and some metadata such as the sample identifier and pathology annotations. This saves some memory.
imgData(spe) <- NULL
assay(spe, "logcounts") <- NULL
reducedDims(spe) <- NULL
rowData(spe) <- NULL
colData(spe) <- DataFrame(
sample_id = spe$sample_id,
clust_annotation = factor(
addNA(spe$layer_guess_reordered_short),
exclude = NULL, labels = seq(8)
),
in_tissue = spe$in_tissue,
row.names = colnames(spe)
)
invisible(gc())
Next, subset spe
to samples from the last subject
(samples 151673
, 151674
, 151675
,
151676
). This stores each sample in its own
SpatialExperiment object, all placed in a list.
sample_names <- as.character(151673:151676)
spe_list <- lapply(sample_names, function(x) spe[, spe$sample_id == x])
rm(spe)
invisible(gc())
Using Seurat, we perform basic normalisation of the data, and select the top 2000 highly variable features from each sample. Other methods for normalisation and feature selection may also be used. We take the union of these features for downstream analysis.
#' Normalize data
seu_list <- lapply(spe_list, function(x) {
x <- as.Seurat(x, data = NULL)
NormalizeData(x, scale.factor = 5000, normalization.method = 'RC')
})
#' Compute HVGs
hvgs <- lapply(seu_list, function(x) {
VariableFeatures(FindVariableFeatures(x, nfeatures = 2000))
})
hvgs <- Reduce(union, hvgs)
#' Add data to SpatialExperiment and subset to HVGs
aname <- "normcounts"
spe_list <- Map(function(spe, seu) {
assay(spe, aname) <- GetAssayData(seu)
spe[hvgs,]
}, spe_list, seu_list)
rm(seu_list)
invisible(gc())
To run BANKSY across multiple samples, we first compute the BANKSY
neighborhood feature matrices for each sample separately. We use
k_geom=6
corresponding to the first-order neighbors in 10x
Visium assays (k_geom=18
corresponding to first and
second-order neighbors may also be used).
compute_agf <- FALSE
k_geom <- 6
spe_list <- lapply(spe_list, computeBanksy, assay_name = aname,
compute_agf = compute_agf, k_geom = k_geom)
We then merge the samples to perform joint dimensional reduction and clustering:
When running multi-sample BANKSY PCA, the group
argument
may be provided. This specifies the grouping variable for the cells or
spots across the samples. Features belonging to cells or spots
corresponding to each level of the grouping variable will be z-scaled
separately. In this case, sample_id
in
colData(spe_joint)
gives the grouping based on the sample
of origin.
lambda <- 0.2
use_agf <- FALSE
spe_joint <- runBanksyPCA(spe_joint, use_agf = use_agf, lambda = lambda, group = "sample_id", seed = 1000)
Run UMAP on the BANKSY embedding:
spe_joint <- runBanksyUMAP(spe_joint, use_agf = use_agf, lambda = lambda, seed = 1000)
Finally, we obtain cluster labels for spots across all 4 samples. We
use connectClusters
for visual comparison of the manual
annotations and BANKSY clusters.
res <- 0.7
spe_joint <- clusterBanksy(spe_joint, use_agf = use_agf, lambda = lambda, resolution = res, seed = 1000)
cnm <- sprintf("clust_M%s_lam%s_k50_res%s", as.numeric(use_agf), lambda, res)
spe_joint <- connectClusters(spe_joint)
Once joint clustering is performed, we split the samples into their
own SpatialExperiment
objects:
spe_list <- lapply(sample_names, function(x) spe_joint[, spe_joint$sample_id == x])
rm(spe_joint)
invisible(gc())
As an optional step, we smooth the cluster labels of each sample separately. This can be done if smooth spatial domains are expected in the biological sample or tissue in question.
spe_list <- lapply(spe_list, smoothLabels, cluster_names = cnm, k = 6L, verbose = FALSE)
names(spe_list) <- paste0("sample_", sample_names)
The raw and smoothed cluster labels are stored in the
colData
slot of each SingleCellExperiment
or
SpatialExperiment
object.
#> DataFrame with 6 rows and 5 columns
#> sample_id clust_annotation in_tissue
#> <character> <factor> <logical>
#> AAACAAGTATCTCCCA-1 151673 1 TRUE
#> AAACAATCTACTAGCA-1 151673 7 TRUE
#> AAACACCAATAACTGC-1 151673 2 TRUE
#> AAACAGAGCGACTCCT-1 151673 1 TRUE
#> AAACAGCTTTCAGAAG-1 151673 5 TRUE
#> AAACAGGGTCTATATT-1 151673 3 TRUE
#> clust_M0_lam0.2_k50_res0.7 clust_M0_lam0.2_k50_res0.7_smooth
#> <factor> <factor>
#> AAACAAGTATCTCCCA-1 4 1
#> AAACAATCTACTAGCA-1 7 7
#> AAACACCAATAACTGC-1 2 2
#> AAACAGAGCGACTCCT-1 1 1
#> AAACAGCTTTCAGAAG-1 3 3
#> AAACAGGGTCTATATT-1 2 2
We can compare BANKSY clusters to pathology annotations using several
cluster comparison measures such as the adjusted Rand index (ARI) or
normalized mutual information (NMI) with compareClusters
.
The function computes the selected comparison measure between all pairs
of cluster labels:
compareClusters(spe_list$sample_151673, func = 'ARI')
#> clust_annotation clust_M0_lam0.2_k50_res0.7
#> clust_annotation 1.000 0.548
#> clust_M0_lam0.2_k50_res0.7 0.548 1.000
#> clust_M0_lam0.2_k50_res0.7_smooth 0.564 0.917
#> clust_M0_lam0.2_k50_res0.7_smooth
#> clust_annotation 0.564
#> clust_M0_lam0.2_k50_res0.7 0.917
#> clust_M0_lam0.2_k50_res0.7_smooth 1.000
We evaluate the ARI and NMI for each sample:
ari <- sapply(spe_list, function(x) as.numeric(tail(compareClusters(x, func = "ARI")[, 1], n = 1)))
ari
#> sample_151673 sample_151674 sample_151675 sample_151676
#> 0.564 0.655 0.624 0.587
nmi <- sapply(spe_list, function(x) as.numeric(tail(compareClusters(x, func = "NMI")[, 1], n = 1)))
nmi
#> sample_151673 sample_151674 sample_151675 sample_151676
#> 0.704 0.750 0.717 0.694
Visualise pathology annotation and BANKSY cluster on spatial coordinates with the ggspavis package:
# Use scater:::.get_palette('tableau10medium')
pal <- c(
"#729ECE", "#FF9E4A", "#67BF5C", "#ED665D", "#AD8BC9",
"#A8786E", "#ED97CA", "#A2A2A2", "#CDCC5D", "#6DCCDA"
)
plot_bank <- lapply(spe_list, function(x) {
df <- cbind.data.frame(
clust=colData(x)[[sprintf("%s_smooth", cnm)]], spatialCoords(x))
ggplot(df, aes(x=pxl_row_in_fullres, y=pxl_col_in_fullres, col=clust)) +
geom_point(size = 0.5) +
scale_color_manual(values = pal) +
theme_classic() +
theme(
legend.position = "none",
axis.text.x=element_blank(),
axis.text.y=element_blank(),
axis.ticks=element_blank(),
axis.title.x=element_blank(),
axis.title.y=element_blank()) +
labs(title = "BANKSY clusters") +
coord_equal()
})
plot_anno <- lapply(spe_list, function(x) {
df <- cbind.data.frame(
clust=colData(x)[['clust_annotation']], spatialCoords(x))
ggplot(df, aes(x=pxl_row_in_fullres, y=pxl_col_in_fullres, col=clust)) +
geom_point(size = 0.5) +
scale_color_manual(values = pal) +
theme_classic() +
theme(
legend.position = "none",
axis.text.x=element_blank(),
axis.text.y=element_blank(),
axis.ticks=element_blank(),
axis.title.x=element_blank(),
axis.title.y=element_blank()) +
labs(title = sprintf("Sample %s", x$sample_id[1])) +
coord_equal()
})
plot_list <- c(plot_anno, plot_bank)
plot_grid(plotlist = plot_list, ncol = 4, byrow = TRUE)
Visualize joint UMAPs for each sample:
umap_bank <- lapply(spe_list, function(x) {
plotReducedDim(x,
"UMAP_M0_lam0.2",
colour_by = sprintf("%s_smooth", cnm),
point_size = 0.5
) +
theme(legend.position = "none") +
labs(title = "BANKSY clusters")
})
umap_anno <- lapply(spe_list, function(x) {
plotReducedDim(x,
"UMAP_M0_lam0.2",
colour_by = "clust_annotation",
point_size = 0.5
) +
theme(legend.position = "none") +
labs(title = sprintf("Sample %s", x$sample_id[1]))
})
umap_list <- c(umap_anno, umap_bank)
plot_grid(plotlist = umap_list, ncol = 4, byrow = TRUE)
Vignette runtime:
#> Time difference of 1.640723 mins
sessionInfo()
#> R version 4.3.2 (2023-10-31)
#> Platform: aarch64-apple-darwin20 (64-bit)
#> Running under: macOS Sonoma 14.2.1
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: America/Detroit
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] ExperimentHub_2.10.0 AnnotationHub_3.10.0
#> [3] BiocFileCache_2.10.1 dbplyr_2.4.0
#> [5] spatialLIBD_1.14.1 cowplot_1.1.3
#> [7] scater_1.30.1 ggplot2_3.4.4
#> [9] scuttle_1.12.0 Seurat_5.0.1
#> [11] SeuratObject_5.0.1 sp_2.1-3
#> [13] SpatialExperiment_1.12.0 SingleCellExperiment_1.24.0
#> [15] SummarizedExperiment_1.32.0 Biobase_2.62.0
#> [17] GenomicRanges_1.54.1 GenomeInfoDb_1.38.6
#> [19] IRanges_2.36.0 S4Vectors_0.40.2
#> [21] BiocGenerics_0.48.1 MatrixGenerics_1.14.0
#> [23] matrixStats_1.2.0 Banksy_0.99.12
#> [25] BiocStyle_2.30.0
#>
#> loaded via a namespace (and not attached):
#> [1] fs_1.6.3 spatstat.sparse_3.0-3
#> [3] bitops_1.0-7 doParallel_1.0.17
#> [5] httr_1.4.7 RColorBrewer_1.1-3
#> [7] tools_4.3.2 sctransform_0.4.1
#> [9] utf8_1.2.4 R6_2.5.1
#> [11] DT_0.31 lazyeval_0.2.2
#> [13] uwot_0.1.16 withr_3.0.0
#> [15] gridExtra_2.3 progressr_0.14.0
#> [17] cli_3.6.2 textshaping_0.3.7
#> [19] spatstat.explore_3.2-6 fastDummies_1.7.3
#> [21] labeling_0.4.3 sass_0.4.8
#> [23] spatstat.data_3.0-4 ggridges_0.5.6
#> [25] pbapply_1.7-2 pkgdown_2.0.7
#> [27] Rsamtools_2.18.0 systemfonts_1.0.5
#> [29] dbscan_1.1-12 aricode_1.0.3
#> [31] sessioninfo_1.2.2 parallelly_1.37.0
#> [33] attempt_0.3.1 maps_3.4.2
#> [35] limma_3.58.1 rstudioapi_0.15.0
#> [37] RSQLite_2.3.5 BiocIO_1.12.0
#> [39] generics_0.1.3 ica_1.0-3
#> [41] spatstat.random_3.2-2 dplyr_1.1.4
#> [43] Matrix_1.6-5 ggbeeswarm_0.7.2
#> [45] fansi_1.0.6 abind_1.4-5
#> [47] lifecycle_1.0.4 edgeR_4.0.15
#> [49] yaml_2.3.8 SparseArray_1.2.4
#> [51] Rtsne_0.17 paletteer_1.6.0
#> [53] grid_4.3.2 blob_1.2.4
#> [55] promises_1.2.1 crayon_1.5.2
#> [57] miniUI_0.1.1.1 lattice_0.22-5
#> [59] beachmat_2.18.0 KEGGREST_1.42.0
#> [61] magick_2.8.2 pillar_1.9.0
#> [63] knitr_1.45 rjson_0.2.21
#> [65] future.apply_1.11.1 codetools_0.2-19
#> [67] leiden_0.4.3.1 glue_1.7.0
#> [69] data.table_1.15.0 vctrs_0.6.5
#> [71] png_0.1-8 spam_2.10-0
#> [73] gtable_0.3.4 rematch2_2.1.2
#> [75] cachem_1.0.8 xfun_0.42
#> [77] S4Arrays_1.2.0 mime_0.12
#> [79] survival_3.5-7 RcppHungarian_0.3
#> [81] iterators_1.0.14 fields_15.2
#> [83] statmod_1.5.0 interactiveDisplayBase_1.40.0
#> [85] ellipsis_0.3.2 fitdistrplus_1.1-11
#> [87] ROCR_1.0-11 nlme_3.1-164
#> [89] bit64_4.0.5 filelock_1.0.3
#> [91] RcppAnnoy_0.0.22 bslib_0.6.1
#> [93] irlba_2.3.5.1 vipor_0.4.7
#> [95] KernSmooth_2.23-22 colorspace_2.1-0
#> [97] DBI_1.2.1 tidyselect_1.2.0
#> [99] bit_4.0.5 compiler_4.3.2
#> [101] curl_5.2.0 BiocNeighbors_1.20.2
#> [103] desc_1.4.3 DelayedArray_0.28.0
#> [105] plotly_4.10.4 rtracklayer_1.62.0
#> [107] bookdown_0.37 scales_1.3.0
#> [109] lmtest_0.9-40 rappdirs_0.3.3
#> [111] stringr_1.5.1 digest_0.6.34
#> [113] goftest_1.2-3 spatstat.utils_3.0-4
#> [115] rmarkdown_2.25 benchmarkmeData_1.0.4
#> [117] XVector_0.42.0 htmltools_0.5.7
#> [119] pkgconfig_2.0.3 sparseMatrixStats_1.14.0
#> [121] highr_0.10 fastmap_1.1.1
#> [123] rlang_1.1.3 htmlwidgets_1.6.4
#> [125] shiny_1.8.0 DelayedMatrixStats_1.24.0
#> [127] farver_2.1.1 jquerylib_0.1.4
#> [129] zoo_1.8-12 jsonlite_1.8.8
#> [131] BiocParallel_1.36.0 mclust_6.0.1
#> [133] config_0.3.2 BiocSingular_1.18.0
#> [135] RCurl_1.98-1.14 magrittr_2.0.3
#> [137] GenomeInfoDbData_1.2.11 dotCall64_1.1-1
#> [139] patchwork_1.2.0 munsell_0.5.0
#> [141] Rcpp_1.0.12 viridis_0.6.5
#> [143] reticulate_1.35.0 leidenAlg_1.1.2
#> [145] stringi_1.8.3 zlibbioc_1.48.0
#> [147] MASS_7.3-60.0.1 plyr_1.8.9
#> [149] parallel_4.3.2 listenv_0.9.1
#> [151] ggrepel_0.9.5 deldir_2.0-2
#> [153] Biostrings_2.70.2 sccore_1.0.4
#> [155] splines_4.3.2 tensor_1.5
#> [157] locfit_1.5-9.8 igraph_2.0.1.1
#> [159] spatstat.geom_3.2-8 RcppHNSW_0.6.0
#> [161] reshape2_1.4.4 ScaledMatrix_1.10.0
#> [163] XML_3.99-0.16.1 BiocVersion_3.18.1
#> [165] evaluate_0.23 golem_0.4.1
#> [167] BiocManager_1.30.22 foreach_1.5.2
#> [169] httpuv_1.6.14 RANN_2.6.1
#> [171] tidyr_1.3.1 purrr_1.0.2
#> [173] polyclip_1.10-6 benchmarkme_1.0.8
#> [175] future_1.33.1 scattermore_1.2
#> [177] rsvd_1.0.5 xtable_1.8-4
#> [179] restfulr_0.0.15 RSpectra_0.16-1
#> [181] later_1.3.2 viridisLite_0.4.2
#> [183] ragg_1.2.7 tibble_3.2.1
#> [185] GenomicAlignments_1.38.2 memoise_2.0.1
#> [187] beeswarm_0.4.0 AnnotationDbi_1.64.1
#> [189] cluster_2.1.6 shinyWidgets_0.8.1
#> [191] globals_0.16.2