library(Seurat)
Attaching SeuratObject
We load the combined Seurat object :
sobj <- readRDS("/shared/projects/form_2022_32/SingleCellRNASeq/Normalization/Scaled_Normalized_Seurat_Object.RDS")
sobj
An object of class Seurat
6941 features across 3984 samples within 1 assay
Active assay: RNA (6941 features, 2000 variable features)
We remove unused columns in metadata :
sobj@meta.data[,c("nCount", "nFeature", "log_nCount", "log_nFeature")] <- NULL
We run a PCA on the HVG, to compute 50 principal components :
sobj <- RunPCA(object = sobj, npcs = 50, verbose = FALSE)
Elbow plot
Elbow plot helps to choose the number of relevant PC to use for the 2D projection.
ElbowPlot(sobj, ndims = 50)
We run a UMAP on 20 PC :
sobj <- RunUMAP(object = sobj, dims = 1:20)
Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session
00:33:33 UMAP embedding parameters a = 0.9922 b = 1.112
00:33:33 Read 3984 rows and found 20 numeric columns
00:33:33 Using Annoy for neighbor search, n_neighbors = 30
00:33:33 Building Annoy index with metric = cosine, n_trees = 50
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
00:33:33 Writing NN index file to temp file /shared/home/bjob/MYRTMP/RtmpN6wejG/file126b53afe461
00:33:33 Searching Annoy index using 1 thread, search_k = 3000
00:33:34 Annoy recall = 100%
00:33:35 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
00:33:35 Initializing from normalized Laplacian + noise (using irlba)
00:33:35 Commencing optimization for 500 epochs, with 163890 positive edges
00:33:41 Optimization finished
We check whether there is a batch-effect or not :
DimPlot(object = sobj, reduction = "umap", group.by = "orig.ident", cols = c("gold2", "purple"))+ggplot2::labs(title = "Sample id")
Some corrections are required because the batch effect is obvious.
We are going to use the FastMNN (Fast Mutual Nearest Neighbors) algorithm
set.seed(42)
sobj <- harmony::RunHarmony(sobj, group.by.vars = "orig.ident")
Harmony 1/10
Harmony 2/10
Harmony converged after 2 iterations
Warning: Invalid name supplied, making object name syntactically valid. New
object name is Seurat..ProjectDim.RNA.harmony; see ?make.names for more details
on syntax validity
sobj
An object of class Seurat
6941 features across 3984 samples within 1 assay
Active assay: RNA (6941 features, 2000 variable features)
3 dimensional reductions calculated: pca, umap, harmony
We compute again a UMAP on this batch effect corrected reduced space :
sobj <- RunUMAP(object = sobj, dims = 1:20, reduction = "harmony")
00:33:45 UMAP embedding parameters a = 0.9922 b = 1.112
00:33:45 Read 3984 rows and found 20 numeric columns
00:33:45 Using Annoy for neighbor search, n_neighbors = 30
00:33:45 Building Annoy index with metric = cosine, n_trees = 50
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
00:33:46 Writing NN index file to temp file /shared/home/bjob/MYRTMP/RtmpN6wejG/file126b7b176b9d
00:33:46 Searching Annoy index using 1 thread, search_k = 3000
00:33:47 Annoy recall = 100%
00:33:47 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
00:33:48 Initializing from normalized Laplacian + noise (using irlba)
00:33:48 Commencing optimization for 500 epochs, with 166752 positive edges
00:33:54 Optimization finished
Is there still a batch effet ?
DimPlot(object = sobj, reduction = "umap", group.by = "orig.ident", cols = c("gold2", "purple"))+ggplot2::labs(title = "Sample id")
We can split by sample ID :
DimPlot(object = sobj, reduction = "umap", group.by = "orig.ident", cols = c("gold2", "purple"), split.by = "orig.ident", pt.size = 2)+ggplot2::labs(title = "Sample id")
DimPlot(object = sobj, reduction = "umap", group.by = "cc_seurat.Phase")
FeaturePlot(object = sobj, features = "cc_seurat.SmG2M.Score", cols = c("blue", "snow2", "red"))
FeaturePlot(object = sobj, features = "subsets_Mito_percent")
We cluster cells :
sobj <- FindNeighbors(object = sobj, dims = 1:20, reduction = "harmony")
Computing nearest neighbor graph
Computing SNN
sobj <- FindClusters(object = sobj, resolution = c(.6, 0.8, 1))
We can visualize clusters :
DimPlot(object = sobj, reduction = "umap", group.by = c("RNA_snn_res.0.6", "RNA_snn_res.0.8", "RNA_snn_res.1"), label = TRUE, repel = TRUE)
DimPlot(object = sobj, reduction = "umap", group.by = c("RNA_snn_res.0.6", "RNA_snn_res.0.8", "RNA_snn_res.1"), split.by = "orig.ident", label = TRUE, repel = TRUE, pt.size = 1, label.size = 9)
table(sobj$RNA_snn_res.0.8)
0 1 2 3 4 5 6 7 8 9 10
827 717 568 529 379 332 321 151 65 49 46
round(100*prop.table(table(sobj$orig.ident, sobj$RNA_snn_res.0.8), margin = 2),2)
0 1 2 3 4 5 6 7 8 9 10
TD3A 43.17 33.89 60.56 48.02 48.02 56.93 91.28 24.50 87.69 93.88 10.87
TDCT 56.83 66.11 39.44 51.98 51.98 43.07 8.72 75.50 12.31 6.12 89.13
path <- "/shared/projects/form_2022_32/SingleCellRNASeq/Normalization/"
saveRDS(object = sobj,
file = paste0(path, "/Scaled_Normalized_Harmony_Clustering_Seurat_Object.RDS"))
sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 18.04.6 LTS
Matrix products: default
BLAS/LAPACK: /shared/ifbstor1/software/miniconda/envs/r-4.2.1/lib/libopenblasp-r0.3.21.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] SeuratObject_4.1.3 Seurat_4.2.1
loaded via a namespace (and not attached):
[1] Rtsne_0.16 colorspace_2.0-3 deldir_1.0-6
[4] ellipsis_0.3.2 ggridges_0.5.4 rstudioapi_0.14
[7] spatstat.data_3.0-0 farver_2.1.1 leiden_0.4.3
[10] listenv_0.8.0 ggrepel_0.9.2 fansi_1.0.3
[13] codetools_0.2-18 splines_4.2.1 cachem_1.0.6
[16] knitr_1.40 polyclip_1.10-4 jsonlite_1.8.3
[19] ica_1.0-3 cluster_2.1.4 png_0.1-7
[22] uwot_0.1.14 shiny_1.7.3 sctransform_0.3.5
[25] spatstat.sparse_3.0-0 compiler_4.2.1 httr_1.4.4
[28] assertthat_0.2.1 Matrix_1.5-3 fastmap_1.1.0
[31] lazyeval_0.2.2 cli_3.4.1 later_1.3.0
[34] htmltools_0.5.3 tools_4.2.1 igraph_1.3.5
[37] gtable_0.3.1 glue_1.6.2 RANN_2.6.1
[40] reshape2_1.4.4 dplyr_1.0.10 Rcpp_1.0.9
[43] scattermore_0.8 jquerylib_0.1.4 vctrs_0.5.0
[46] nlme_3.1-160 spatstat.explore_3.0-5 progressr_0.11.0
[49] lmtest_0.9-40 spatstat.random_3.0-1 xfun_0.34
[52] stringr_1.4.1 globals_0.16.1 mime_0.12
[55] miniUI_0.1.1.1 lifecycle_1.0.3 irlba_2.3.5.1
[58] goftest_1.2-3 future_1.29.0 MASS_7.3-58.1
[61] zoo_1.8-11 scales_1.2.1 promises_1.2.0.1
[64] spatstat.utils_3.0-1 parallel_4.2.1 RColorBrewer_1.1-3
[67] yaml_2.3.6 reticulate_1.26 pbapply_1.5-0
[70] gridExtra_2.3 ggplot2_3.4.0 sass_0.4.2
[73] stringi_1.7.8 highr_0.9 harmony_0.1.1
[76] rlang_1.0.6 pkgconfig_2.0.3 matrixStats_0.62.0
[79] evaluate_0.18 lattice_0.20-45 tensor_1.5
[82] ROCR_1.0-11 purrr_0.3.5 labeling_0.4.2
[85] patchwork_1.1.2 htmlwidgets_1.5.4 cowplot_1.1.1
[88] tidyselect_1.2.0 parallelly_1.32.1 RcppAnnoy_0.0.20
[91] plyr_1.8.8 magrittr_2.0.3 R6_2.5.1
[94] generics_0.1.3 DBI_1.1.3 withr_2.5.0
[97] pillar_1.8.1 fitdistrplus_1.1-8 survival_3.4-0
[100] abind_1.4-5 sp_1.5-1 tibble_3.1.8
[103] future.apply_1.10.0 KernSmooth_2.23-20 utf8_1.2.2
[106] spatstat.geom_3.0-3 plotly_4.10.1 rmarkdown_2.18
[109] grid_4.2.1 data.table_1.14.4 digest_0.6.30
[112] xtable_1.8-4 tidyr_1.2.1 httpuv_1.6.6
[115] munsell_0.5.0 viridisLite_0.4.1 bslib_0.4.1