Here, we demonstrate BANKSY domain segmentation on a STARmap PLUS dataset of the mouse brain from Shi et al. (2022).
library(Banksy)
library(data.table)
library(SummarizedExperiment)
library(SpatialExperiment)
library(scater)
library(cowplot)
library(ggplot2)
Data from the study are available from the Single Cell Portal. We analyze data from well11. The data comprise 1,022 genes profiled at subcellular resolution in 43,341 cells.
#' Change paths accordingly
gcm_path <- "../data/well11processed_expression_pd.csv.gz"
mdata_path <- "../data/well11_spatial.csv.gz"
#' Gene cell matrix
gcm <- fread(gcm_path)
genes <- gcm$GENE
gcm <- as.matrix(gcm[, -1])
rownames(gcm) <- genes
#' Spatial coordinates and metadata
mdata <- fread(mdata_path, skip = 1)
headers <- names(fread(mdata_path, nrows = 0))
colnames(mdata) <- headers
#' Orient spatial coordinates
xx <- mdata$X
yy <- mdata$Y
mdata$X <- max(yy) - yy
mdata$Y <- max(xx) - xx
mdata <- data.frame(mdata)
rownames(mdata) <- colnames(gcm)
locs <- as.matrix(mdata[, c("X", "Y", "Z")])
#' Create SpatialExperiment
se <- SpatialExperiment(
    assays = list(processedExp = gcm),
    spatialCoords = locs,
    colData = mdata
)
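The orientation step above swaps the two axes and flips each one. As a quick sanity check, the toy sketch below applies the same transform to a few made-up points (hypothetical coordinates, not the STARmap PLUS data) and confirms that pairwise distances are unchanged. This suggests that the choice of orientation affects only how the tissue is displayed, not which cells count as spatial neighbors.

```r
# Toy coordinates (hypothetical values, for illustration only)
toy <- data.frame(X = c(0, 2, 2), Y = c(0, 0, 1))

# Same transform as in the vignette: swap the axes and flip each one
oriented <- data.frame(
    X = max(toy$Y) - toy$Y,
    Y = max(toy$X) - toy$X
)

# Pairwise Euclidean distances before and after the transform
d_before <- as.vector(dist(toy))
d_after <- as.vector(dist(oriented))

# TRUE if the transform preserved all pairwise distances
distances_preserved <- isTRUE(all.equal(d_before, d_after))
print(oriented)
print(distances_preserved)
```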
Run BANKSY in domain segmentation mode with lambda = 0.8.
This places larger weights on the mean neighborhood expression and
azimuthal Gabor filter in constructing the BANKSY matrix. We adjust the
resolution to yield 23 clusters based on the results from Maher et
al. (2023) (see Fig. 1, 2).
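To give a feel for what the mixing parameter does, here is a deliberately simplified base R sketch of a neighbor-augmented matrix. It is not the package's implementation. It assumes an unweighted mean over the k nearest neighbors, omits the azimuthal Gabor filter block and any feature scaling, and uses made-up toy data. The names `toy_banksy`, `w_own` and `w_nbr` are hypothetical. The point is only that with lambda = 0.8, the neighborhood block carries most of the weight.

```r
set.seed(1)
n_genes <- 3
n_cells <- 6
k <- 2          # hypothetical neighborhood size for the toy example
lambda <- 0.8   # same mixing value used in this vignette

# Toy gene-by-cell expression matrix and 2D cell locations
expr <- matrix(rpois(n_genes * n_cells, 5), nrow = n_genes)
locs <- matrix(runif(n_cells * 2), ncol = 2)

# Indices of the k nearest neighbors of each cell (excluding the cell itself);
# the result is a k x n_cells matrix, one column per cell
d <- as.matrix(dist(locs))
diag(d) <- Inf
nbr_idx <- apply(d, 1, function(row) order(row)[seq_len(k)])

# Unweighted mean expression over each cell's neighbors (genes x cells)
nbr_mean <- sapply(seq_len(n_cells), function(j) {
    rowMeans(expr[, nbr_idx[, j], drop = FALSE])
})

# Stack the own-expression and neighborhood blocks.
# Assumption: weights are sqrt(1 - lambda) and sqrt(lambda), so their squares sum to 1
w_own <- sqrt(1 - lambda)
w_nbr <- sqrt(lambda)
toy_banksy <- rbind(w_own * expr, w_nbr * nbr_mean)

print(dim(toy_banksy))
print(c(own = w_own^2, neighborhood = w_nbr^2))
```

Under these assumptions, a small lambda would leave the stacked matrix dominated by each cell's own expression, while a large lambda shifts the emphasis to the surrounding tissue context.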
Note that for datasets generated using the older Visium v1 / v2 55 µm technologies, the parameter values for domain segmentation are lambda = 0.2 and k_geom = 18. See the note in the tutorial on the main page for more details.
lambda <- 0.8
k_geom <- 30
npcs <- 50
aname <- "processedExp"
se <- Banksy::computeBanksy(se, assay_name = aname, k_geom = k_geom)
set.seed(1000)
se <- Banksy::runBanksyPCA(se, lambda = lambda, npcs = npcs)
set.seed(1000)
se <- Banksy::clusterBanksy(se, lambda = lambda, npcs = npcs, resolution = 0.8)
Cluster labels are stored in the colData slot:
head(colData(se))
#> DataFrame with 6 rows and 4 columns
#>           X         Y clust_M1_lam0.8_k50_res0.8   sample_id
#>   <numeric> <numeric>                   <factor> <character>
#> 1   24225.5   23984.2                         10    sample01
#> 2   24849.2   22679.1                         10    sample01
#> 3   24488.3   22970.3                         10    sample01
#> 4   24371.4   23727.5                         10    sample01
#> 5   24362.2   23300.6                         10    sample01
#> 6   24644.5   23112.8                         10    sample01
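Since the resolution was tuned to reach a target number of clusters, it can help to count how many clusters a run actually produced. The sketch below does this on a small made-up data frame standing in for `as.data.frame(colData(se))`. The coordinates and labels are hypothetical, and only the column name is copied from the output above.

```r
# Toy stand-in for as.data.frame(colData(se)); values are made up
cd <- data.frame(
    X = c(1.0, 2.0, 3.0, 4.0, 5.0),
    Y = c(5.0, 4.0, 3.0, 2.0, 1.0),
    clust_M1_lam0.8_k50_res0.8 = factor(c(10, 10, 3, 7, 3)),
    sample_id = "sample01",
    check.names = FALSE
)

# Pick out clustering columns by their "clust" prefix
clust_cols <- grep("^clust", colnames(cd), value = TRUE)

# Number of distinct clusters in each clustering column
n_clusters <- vapply(
    clust_cols,
    function(cn) length(unique(cd[[cn]])),
    integer(1)
)
print(n_clusters)

# Cells per cluster for the first clustering column
print(table(cd[[clust_cols[1]]]))
```

On the real object, if the count differs from the intended number of clusters, one option is to adjust the `resolution` argument and rerun the clustering step.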
Visualize clustering results:
cnames <- colnames(colData(se))
cnames <- cnames[grep("^clust", cnames)]
plotColData(se, x = "X", y = "Y", point_size = 0.01, colour_by = cnames[1]) +
    scale_color_manual(values = pals::glasbey()) +
    coord_equal() +
    theme(legend.position = "none")
sessionInfo()
#> R version 4.3.2 (2023-10-31)
#> Platform: aarch64-apple-darwin20 (64-bit)
#> Running under: macOS Sonoma 14.2.1
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: America/Detroit
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] cowplot_1.1.3 scater_1.30.1
#> [3] ggplot2_3.4.4 scuttle_1.12.0
#> [5] SpatialExperiment_1.12.0 SingleCellExperiment_1.24.0
#> [7] SummarizedExperiment_1.32.0 Biobase_2.62.0
#> [9] GenomicRanges_1.54.1 GenomeInfoDb_1.38.6
#> [11] IRanges_2.36.0 S4Vectors_0.40.2
#> [13] BiocGenerics_0.48.1 MatrixGenerics_1.14.0
#> [15] matrixStats_1.2.0 data.table_1.15.0
#> [17] Banksy_0.99.12 BiocStyle_2.30.0
#>
#> loaded via a namespace (and not attached):
#> [1] bitops_1.0-7 gridExtra_2.3
#> [3] rlang_1.1.3 magrittr_2.0.3
#> [5] compiler_4.3.2 sccore_1.0.4
#> [7] DelayedMatrixStats_1.24.0 systemfonts_1.0.5
#> [9] vctrs_0.6.5 stringr_1.5.1
#> [11] pkgconfig_2.0.3 crayon_1.5.2
#> [13] fastmap_1.1.1 magick_2.8.2
#> [15] XVector_0.42.0 utf8_1.2.4
#> [17] rmarkdown_2.25 ggbeeswarm_0.7.2
#> [19] ragg_1.2.7 purrr_1.0.2
#> [21] xfun_0.42 zlibbioc_1.48.0
#> [23] cachem_1.0.8 beachmat_2.18.0
#> [25] jsonlite_1.8.8 DelayedArray_0.28.0
#> [27] BiocParallel_1.36.0 irlba_2.3.5.1
#> [29] parallel_4.3.2 aricode_1.0.3
#> [31] R6_2.5.1 bslib_0.6.1
#> [33] stringi_1.8.3 leidenAlg_1.1.2
#> [35] jquerylib_0.1.4 Rcpp_1.0.12
#> [37] bookdown_0.37 knitr_1.45
#> [39] Matrix_1.6-5 igraph_2.0.1.1
#> [41] tidyselect_1.2.0 viridis_0.6.5
#> [43] rstudioapi_0.15.0 abind_1.4-5
#> [45] yaml_2.3.8 codetools_0.2-19
#> [47] lattice_0.22-5 tibble_3.2.1
#> [49] withr_3.0.0 evaluate_0.23
#> [51] desc_1.4.3 mclust_6.0.1
#> [53] pillar_1.9.0 BiocManager_1.30.22
#> [55] generics_0.1.3 dbscan_1.1-12
#> [57] RCurl_1.98-1.14 sparseMatrixStats_1.14.0
#> [59] munsell_0.5.0 scales_1.3.0
#> [61] glue_1.7.0 tools_4.3.2
#> [63] BiocNeighbors_1.20.2 ScaledMatrix_1.10.0
#> [65] fs_1.6.3 grid_4.3.2
#> [67] colorspace_2.1-0 GenomeInfoDbData_1.2.11
#> [69] RcppHungarian_0.3 beeswarm_0.4.0
#> [71] BiocSingular_1.18.0 vipor_0.4.7
#> [73] cli_3.6.2 rsvd_1.0.5
#> [75] textshaping_0.3.7 fansi_1.0.6
#> [77] viridisLite_0.4.2 S4Arrays_1.2.0
#> [79] dplyr_1.1.4 uwot_0.1.16
#> [81] gtable_0.3.4 sass_0.4.8
#> [83] digest_0.6.34 ggrepel_0.9.5
#> [85] SparseArray_1.2.4 rjson_0.2.21
#> [87] memoise_2.0.1 htmltools_0.5.7
#> [89] pkgdown_2.0.7 lifecycle_1.0.4