library(Seurat)
Attaching SeuratObject

We load the combined Seurat object :

sobj <- readRDS("/shared/projects/form_2022_32/SingleCellRNASeq/Normalization/Scaled_Normalized_Seurat_Object.RDS")
sobj
An object of class Seurat 
6941 features across 3984 samples within 1 assay 
Active assay: RNA (6941 features, 2000 variable features)

We remove unused columns in metadata :

sobj@meta.data[,c("nCount", "nFeature", "log_nCount", "log_nFeature")] <- NULL

Dimentionality Reduction

Linear Dimentionality reduction (PCA)

We run a PCA on the HVG, to compute 50 principal components :

sobj <- RunPCA(object = sobj, npcs = 50, verbose = FALSE)

Chosing number of relevant principal components

Elbow plot

Elbow plot helps to choose the number of relevant PC to use for the 2D projection.

ElbowPlot(sobj, ndims = 50)

Non-Linear Dimentionality reduction (UMAP)

We run a UMAP on 20 PC :

sobj <- RunUMAP(object = sobj, dims = 1:20)
Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session
00:33:33 UMAP embedding parameters a = 0.9922 b = 1.112
00:33:33 Read 3984 rows and found 20 numeric columns
00:33:33 Using Annoy for neighbor search, n_neighbors = 30
00:33:33 Building Annoy index with metric = cosine, n_trees = 50
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
00:33:33 Writing NN index file to temp file /shared/home/bjob/MYRTMP/RtmpN6wejG/file126b53afe461
00:33:33 Searching Annoy index using 1 thread, search_k = 3000
00:33:34 Annoy recall = 100%
00:33:35 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
00:33:35 Initializing from normalized Laplacian + noise (using irlba)
00:33:35 Commencing optimization for 500 epochs, with 163890 positive edges
00:33:41 Optimization finished

We check whether there is a batch-effect or not :

DimPlot(object = sobj, reduction = "umap", group.by = "orig.ident", cols = c("gold2", "purple"))+ggplot2::labs(title = "Sample id")

Some corrections are required because the batch effect is obvious.

We are going to use the FastMNN (Fast Mutual Nearest Neighbors) algorithm

set.seed(42)
sobj <- harmony::RunHarmony(sobj, group.by.vars = "orig.ident")
Harmony 1/10
Harmony 2/10
Harmony converged after 2 iterations
Warning: Invalid name supplied, making object name syntactically valid. New
object name is Seurat..ProjectDim.RNA.harmony; see ?make.names for more details
on syntax validity
sobj
An object of class Seurat 
6941 features across 3984 samples within 1 assay 
Active assay: RNA (6941 features, 2000 variable features)
 3 dimensional reductions calculated: pca, umap, harmony

We compute again a UMAP on this batch effect corrected reduced space :

sobj <- RunUMAP(object = sobj, dims = 1:20, reduction = "harmony")
00:33:45 UMAP embedding parameters a = 0.9922 b = 1.112
00:33:45 Read 3984 rows and found 20 numeric columns
00:33:45 Using Annoy for neighbor search, n_neighbors = 30
00:33:45 Building Annoy index with metric = cosine, n_trees = 50
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
00:33:46 Writing NN index file to temp file /shared/home/bjob/MYRTMP/RtmpN6wejG/file126b7b176b9d
00:33:46 Searching Annoy index using 1 thread, search_k = 3000
00:33:47 Annoy recall = 100%
00:33:47 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
00:33:48 Initializing from normalized Laplacian + noise (using irlba)
00:33:48 Commencing optimization for 500 epochs, with 166752 positive edges
00:33:54 Optimization finished

Is there still a batch effet ?

DimPlot(object = sobj, reduction = "umap", group.by = "orig.ident", cols = c("gold2", "purple"))+ggplot2::labs(title = "Sample id")

We can split by sample ID :

DimPlot(object = sobj, reduction = "umap", group.by = "orig.ident", cols = c("gold2", "purple"), split.by = "orig.ident", pt.size = 2)+ggplot2::labs(title = "Sample id")

Projecting diverse information onto the umap space

  • Cell Cycle Phases
DimPlot(object = sobj, reduction = "umap", group.by = "cc_seurat.Phase")

  • G2M score
FeaturePlot(object = sobj, features = "cc_seurat.SmG2M.Score", cols = c("blue", "snow2", "red"))

  • Percentage of mitochondrial genes
FeaturePlot(object = sobj, features = "subsets_Mito_percent")

Clustering

We cluster cells :

sobj <- FindNeighbors(object = sobj, dims = 1:20, reduction = "harmony")
Computing nearest neighbor graph
Computing SNN
sobj <- FindClusters(object = sobj, resolution = c(.6, 0.8, 1))

We can visualize clusters :

DimPlot(object = sobj, reduction = "umap", group.by = c("RNA_snn_res.0.6", "RNA_snn_res.0.8", "RNA_snn_res.1"), label = TRUE, repel = TRUE)

DimPlot(object = sobj, reduction = "umap", group.by = c("RNA_snn_res.0.6", "RNA_snn_res.0.8", "RNA_snn_res.1"), split.by = "orig.ident", label = TRUE, repel = TRUE, pt.size = 1, label.size = 9)

table(sobj$RNA_snn_res.0.8)

  0   1   2   3   4   5   6   7   8   9  10 
827 717 568 529 379 332 321 151  65  49  46 
round(100*prop.table(table(sobj$orig.ident, sobj$RNA_snn_res.0.8), margin = 2),2)
      
           0     1     2     3     4     5     6     7     8     9    10
  TD3A 43.17 33.89 60.56 48.02 48.02 56.93 91.28 24.50 87.69 93.88 10.87
  TDCT 56.83 66.11 39.44 51.98 51.98 43.07  8.72 75.50 12.31  6.12 89.13

Save

path <- "/shared/projects/form_2022_32/SingleCellRNASeq/Normalization/"
saveRDS(object = sobj,
        file = paste0(path, "/Scaled_Normalized_Harmony_Clustering_Seurat_Object.RDS"))

Session

sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 18.04.6 LTS

Matrix products: default
BLAS/LAPACK: /shared/ifbstor1/software/miniconda/envs/r-4.2.1/lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] SeuratObject_4.1.3 Seurat_4.2.1      

loaded via a namespace (and not attached):
  [1] Rtsne_0.16             colorspace_2.0-3       deldir_1.0-6          
  [4] ellipsis_0.3.2         ggridges_0.5.4         rstudioapi_0.14       
  [7] spatstat.data_3.0-0    farver_2.1.1           leiden_0.4.3          
 [10] listenv_0.8.0          ggrepel_0.9.2          fansi_1.0.3           
 [13] codetools_0.2-18       splines_4.2.1          cachem_1.0.6          
 [16] knitr_1.40             polyclip_1.10-4        jsonlite_1.8.3        
 [19] ica_1.0-3              cluster_2.1.4          png_0.1-7             
 [22] uwot_0.1.14            shiny_1.7.3            sctransform_0.3.5     
 [25] spatstat.sparse_3.0-0  compiler_4.2.1         httr_1.4.4            
 [28] assertthat_0.2.1       Matrix_1.5-3           fastmap_1.1.0         
 [31] lazyeval_0.2.2         cli_3.4.1              later_1.3.0           
 [34] htmltools_0.5.3        tools_4.2.1            igraph_1.3.5          
 [37] gtable_0.3.1           glue_1.6.2             RANN_2.6.1            
 [40] reshape2_1.4.4         dplyr_1.0.10           Rcpp_1.0.9            
 [43] scattermore_0.8        jquerylib_0.1.4        vctrs_0.5.0           
 [46] nlme_3.1-160           spatstat.explore_3.0-5 progressr_0.11.0      
 [49] lmtest_0.9-40          spatstat.random_3.0-1  xfun_0.34             
 [52] stringr_1.4.1          globals_0.16.1         mime_0.12             
 [55] miniUI_0.1.1.1         lifecycle_1.0.3        irlba_2.3.5.1         
 [58] goftest_1.2-3          future_1.29.0          MASS_7.3-58.1         
 [61] zoo_1.8-11             scales_1.2.1           promises_1.2.0.1      
 [64] spatstat.utils_3.0-1   parallel_4.2.1         RColorBrewer_1.1-3    
 [67] yaml_2.3.6             reticulate_1.26        pbapply_1.5-0         
 [70] gridExtra_2.3          ggplot2_3.4.0          sass_0.4.2            
 [73] stringi_1.7.8          highr_0.9              harmony_0.1.1         
 [76] rlang_1.0.6            pkgconfig_2.0.3        matrixStats_0.62.0    
 [79] evaluate_0.18          lattice_0.20-45        tensor_1.5            
 [82] ROCR_1.0-11            purrr_0.3.5            labeling_0.4.2        
 [85] patchwork_1.1.2        htmlwidgets_1.5.4      cowplot_1.1.1         
 [88] tidyselect_1.2.0       parallelly_1.32.1      RcppAnnoy_0.0.20      
 [91] plyr_1.8.8             magrittr_2.0.3         R6_2.5.1              
 [94] generics_0.1.3         DBI_1.1.3              withr_2.5.0           
 [97] pillar_1.8.1           fitdistrplus_1.1-8     survival_3.4-0        
[100] abind_1.4-5            sp_1.5-1               tibble_3.1.8          
[103] future.apply_1.10.0    KernSmooth_2.23-20     utf8_1.2.2            
[106] spatstat.geom_3.0-3    plotly_4.10.1          rmarkdown_2.18        
[109] grid_4.2.1             data.table_1.14.4      digest_0.6.30         
[112] xtable_1.8-4           tidyr_1.2.1            httpuv_1.6.6          
[115] munsell_0.5.0          viridisLite_0.4.1      bslib_0.4.1