A unified interface for standardized single-cell RNA-seq preprocessing.
It accepts raw counts (matrix/data.frame), AnnData (via anndata or anndataR), or existing Seurat objects.
The analysis flow is customized through two arguments: pipeline, a character string that selects and orders the steps, and params, a list of per-step arguments.
Usage
SCPreProcess(sc, ...)
# Default S3 method
SCPreProcess(
  sc = NULL,
  ...,
  pipeline = "onsvpcetu",
  params = list(
    o = list(project = "SC_Screen_Proj", min.cells = 400L),
    n = list(), s = list(), v = list(), p = list(), e = list(),
    c = list(resolution = 0.6), t = list(), u = list()
  ),
  quality_control = list(pattern = c("^MT-")),
  data_filter = list(
    nFeature_thresh = c(200L, 6000L),
    nCount_thresh = c(500L, 50000L),
    percent.mt = 20L,
    percent.rp = 60L
  ),
  column2only_tumor = NULL
)
# S3 method for class 'R6'
SCPreProcess(
  sc,
  ...,
  pipeline = "onsvpcetu",
  params = list(
    o = list(project = "SC_Screen_Proj", min.cells = 400L),
    n = list(), s = list(), v = list(), p = list(), e = list(),
    c = list(resolution = 0.6), t = list(), u = list()
  ),
  quality_control = list(pattern = c("^MT-")),
  data_filter = list(
    nFeature_thresh = c(200L, 6000L),
    nCount_thresh = c(500L, 50000L),
    percent.mt = 20L,
    percent.rp = 60L
  ),
  column2only_tumor = NULL
)
# S3 method for class 'Seurat'
SCPreProcess(sc, column2only_tumor = NULL, ...)
Arguments
- sc
Input data. Can be:
Matrix/Data frame: Raw count matrix (genes x cells).
AnnData: Python AnnData object (read via the anndata or anndataR packages).
Seurat: A Seurat object (automatically validated and repaired if necessary).
- ...
Additional arguments for backward compatibility (mapped to params) or verbose control.
- pipeline
A character string defining the processing steps and their order. Characters map to Seurat functions:
'o': CreateSeuratObject (must be the first step and cannot be removed)
'n': NormalizeData
's': ScaleData
'v': FindVariableFeatures
'p': RunPCA
'e': FindNeighbors ('e' because 'n' is already used by NormalizeData)
'c': FindClusters
't': RunTSNE
'u': RunUMAP
'r': SCTransform (alternative to n/s/v)
Default is "onsvpcetu".
- params
A named list of lists containing arguments for each pipeline step. Keys match the pipeline characters (e.g., params$n for NormalizeData). The default structure is the params value shown under Usage.
- quality_control
A list containing regex patterns for QC metric calculation (see QCPatternDetect). Default: list(pattern = "^MT-"). Detected metrics (e.g., percent.mt) are added to meta.data.
- data_filter
A list of thresholds for cell filtering. Default: assay ("RNA"), nFeature_RNA (200-6000), nCount_RNA (500-50000), percent.mt (<20), percent.rp (<60). Only metrics detected via quality_control are filtered, so with the default quality_control only nFeature_RNA, nCount_RNA and percent.mt apply.
- column2only_tumor
Optional character. Name of the metadata column used to keep only tumor cells (matches "Tumor", "Cancer", "Malignant", etc.). If NULL, no filtering is performed.
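The label matching described for column2only_tumor can be pictured with a small base-R sketch. It is only an illustration: the pattern covers just the three labels named above, case-insensitive matching is an assumption, and the toy metadata and the column name "tissue" are made up. It is not SigBridgeR's internal code.

```r
# Toy metadata; the column name "tissue" and its labels are illustrative only.
meta <- data.frame(
  tissue = c("Tumor", "Normal", "malignant", "Adjacent", "Cancer"),
  row.names = paste0("cell", 1:5)
)

# Assumed pattern: only the three labels named in the documentation.
tumor_pattern <- "tumor|cancer|malignant"

# Assumption: matching ignores case.
is_tumor <- grepl(tumor_pattern, meta$tissue, ignore.case = TRUE)

# Names of the cells that would be kept.
tumor_cells <- rownames(meta)[is_tumor]
tumor_cells
```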
Details
Pipeline Strategy:
The function parses the pipeline string and executes corresponding Seurat functions in order.
To use SCTransform, replace the n/s/v steps with r in the pipeline string (e.g., "orpetu") and pass its arguments in params$r.
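As a rough picture of how a pipeline string can be turned into an ordered list of steps, here is a minimal base-R sketch. The lookup table is copied from the step letters documented under Arguments; the names step_map and parse_pipeline are hypothetical and do not come from SigBridgeR.

```r
# Hypothetical lookup table built from the documented step letters.
step_map <- c(
  o = "CreateSeuratObject", n = "NormalizeData", s = "ScaleData",
  v = "FindVariableFeatures", p = "RunPCA", e = "FindNeighbors",
  c = "FindClusters", t = "RunTSNE", u = "RunUMAP", r = "SCTransform"
)

# Split a pipeline string into single letters and return the step names in order.
parse_pipeline <- function(pipeline) {
  steps <- strsplit(pipeline, "")[[1]]
  unknown <- setdiff(steps, names(step_map))
  if (length(unknown) > 0) {
    stop("Unknown pipeline step(s): ", paste(unknown, collapse = ", "))
  }
  # The documentation requires 'o' as the first step.
  if (length(steps) == 0 || steps[1] != "o") {
    stop("The pipeline must start with 'o' (CreateSeuratObject).")
  }
  unname(step_map[steps])
}

# Step names for the SCTransform example string from the text.
parse_pipeline("orpetu")
```

Keying the table by single letters is also what lets params reuse the same letters as list names, such as params$r for SCTransform.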
Quality Control & Filtering:
QC metrics are generated based on regex patterns in quality_control.
Cells are then filtered based on thresholds in data_filter.
Column names for filtering are auto-generated (e.g., pattern "^MT-" -> filter "percent.mt").
If unsure which column name a pattern produces, check it with SigBridgeR:::Pattern2Colname().
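The two stages above (compute a metric from a regex pattern, then filter on a threshold) can be sketched in base R on a toy matrix. This is an illustration under assumptions, not the package's implementation: the helper percent_matching is hypothetical, the counts are made up, and only the documented default cutoff percent.mt < 20 is applied.

```r
# Toy genes x cells count matrix; gene names and counts are made up.
counts <- matrix(
  c( 5,  0,  1,
     1,  9,  0,
     4,  1,  9,
    10, 10, 10),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("MT-CO1", "MT-ND1", "RPL3", "ACTB"),
    c("cell1", "cell2", "cell3")
  )
)

# Hypothetical helper: percentage of each cell's counts that comes from
# genes whose names match the pattern.
percent_matching <- function(counts, pattern) {
  hits <- grepl(pattern, rownames(counts))
  100 * colSums(counts[hits, , drop = FALSE]) / colSums(counts)
}

# Per-cell metadata, using the column names mentioned in the documentation.
meta <- data.frame(
  nCount_RNA = colSums(counts),
  nFeature_RNA = colSums(counts > 0),
  percent.mt = percent_matching(counts, "^MT-"),
  row.names = colnames(counts)
)

# Keep cells below the documented default mitochondrial cutoff.
keep <- meta$percent.mt < 20
filtered <- counts[, keep, drop = FALSE]
meta[keep, ]
```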
See also
Other single_cell_preprocess:
FilterTumorCell(),
FindRobustElbow(),
Pattern2Colname(),
QCPatternDetect(),
RegisterSeuratMethod(),
SCAnnotate(),
SCIntegrate(),
SCPreProcessStrategy,
compatible_with_3.0.2()
Other input_preprocess:
BulkPreProcess(),
PhenoMap(),
PhenoPreProcess()
Examples
if (FALSE) { # \dontrun{
# 1. Standard pipeline (LogNormalize -> Scale -> PCA -> UMAP)
obj <- SCPreProcess(
  sc = counts_matrix,
  pipeline = "onsvpcetu",
  params = list(c = list(resolution = 0.8))
)

# 2. SCTransform pipeline
obj_sct <- SCPreProcess(
  sc = counts_matrix,
  pipeline = "orpcu", # Create -> SCT -> PCA -> Clusters -> UMAP
  quality_control = list(pattern = c("^MT-", "^RP[LS]")),
  params = list(
    r = list(vars.to.regress = "percent.mt")
  )
)

# 3. Start from AnnData with tumor filtering
adata_object <- anndataR::read_h5ad("data.h5ad")
obj_ad <- SCPreProcess(
  sc = adata_object,
  column2only_tumor = "tissue"
)
} # }