A generic function for standardized preprocessing of single-cell RNA-seq data from multiple sources. It accepts data.frame, matrix, dgCMatrix, AnnData, and Seurat inputs, can optionally restrict the data to tumor cells, and runs a complete analysis pipeline from raw counts to clustered embeddings.
Usage
SCPreProcess(sc, ...)
# Default S3 method
SCPreProcess(sc, ...)
# S3 method for class 'matrix'
SCPreProcess(
sc,
meta_data = NULL,
column2only_tumor = NULL,
project = glue::glue("{TimeStamp()}_SC_Screening_Proj"),
min_cells = 400,
min_features = 0,
quality_control = TRUE,
quality_control.pattern = c("^MT-", "^mt-"),
data_filter = TRUE,
data_filter.nFeature_RNA_thresh = c(200, 6000),
data_filter.percent.mt = 20,
normalization_method = "LogNormalize",
scale_factor = 10000,
scale_features = NULL,
selection_method = "vst",
resolution = 0.6,
dims = 1:10,
verbose = TRUE,
...
)
# S3 method for class 'data.frame'
SCPreProcess(
sc,
meta_data = NULL,
column2only_tumor = NULL,
project = glue::glue("{TimeStamp()}_SC_Screening_Proj"),
min_cells = 400,
min_features = 0,
quality_control = TRUE,
quality_control.pattern = c("^MT-", "^mt-"),
data_filter = TRUE,
data_filter.nFeature_RNA_thresh = c(200, 6000),
data_filter.percent.mt = 20,
normalization_method = "LogNormalize",
scale_factor = 10000,
scale_features = NULL,
selection_method = "vst",
resolution = 0.6,
dims = 1:10,
verbose = TRUE,
...
)
# S3 method for class 'dgCMatrix'
SCPreProcess(
sc,
meta_data = NULL,
column2only_tumor = NULL,
project = glue::glue("{TimeStamp()}_SC_Screening_Proj"),
min_cells = 400,
min_features = 0,
quality_control = TRUE,
quality_control.pattern = "^MT-",
data_filter = TRUE,
data_filter.nFeature_RNA_thresh = c(200, 6000),
data_filter.percent.mt = 20,
normalization_method = "LogNormalize",
scale_factor = 10000,
scale_features = NULL,
selection_method = "vst",
resolution = 0.6,
dims = 1:10,
verbose = TRUE,
...
)
# S3 method for class 'AnnDataR6'
SCPreProcess(
sc,
meta_data = NULL,
column2only_tumor = NULL,
project = glue::glue("{TimeStamp()}_SC_Screening_Proj"),
min_cells = 400,
min_features = 0,
quality_control = TRUE,
quality_control.pattern = c("^MT-", "^mt-"),
data_filter = TRUE,
data_filter.nFeature_RNA_thresh = c(200, 6000),
data_filter.percent.mt = 20,
normalization_method = "LogNormalize",
scale_factor = 10000,
scale_features = NULL,
selection_method = "vst",
resolution = 0.6,
dims = 1:10,
verbose = TRUE,
...
)
# S3 method for class 'Seurat'
SCPreProcess(sc, column2only_tumor = NULL, verbose = TRUE, ...)
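Because SCPreProcess is an S3 generic, the method that runs is chosen from the class of `sc`. The sketch below uses a hypothetical toy generic, `toy_preprocess()`, to show how base R's `UseMethod()` routes a matrix, a data frame, or an unsupported input. It illustrates the dispatch mechanism only and does not reproduce the package's actual method bodies; the data.frame-to-matrix conversion is an assumption for illustration.

```r
# Hypothetical toy generic illustrating S3 dispatch on the class of `sc`;
# this is not the package's implementation.
toy_preprocess <- function(sc, ...) {
  UseMethod("toy_preprocess")
}

# Fallback used when no class-specific method exists
toy_preprocess.default <- function(sc, ...) {
  stop("Unsupported input class: ", paste(class(sc), collapse = "/"))
}

# Matrix input: assumed to be features x cells already
toy_preprocess.matrix <- function(sc, ...) {
  "matrix method"
}

# Data frame input (assumption): convert to a matrix and dispatch again
toy_preprocess.data.frame <- function(sc, ...) {
  toy_preprocess(as.matrix(sc), ...)
}

counts <- matrix(c(0, 3, 1, 5), nrow = 2,
                 dimnames = list(c("GeneA", "GeneB"), c("cell1", "cell2")))
toy_preprocess(counts)                 # returns "matrix method"
toy_preprocess(as.data.frame(counts))  # returns "matrix method" via the data.frame method
```

Routing data frames through the matrix method keeps a single code path for count matrices, which is one plausible reason the matrix and data.frame signatures above share the same defaults.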
Arguments
- sc
Input data, one of:
  - data.frame/matrix/dgCMatrix: raw count matrix (features x cells)
  - AnnDataR6: Python AnnData object accessed via reticulate
  - Seurat: preprocessed Seurat object
- ...
Method-specific arguments (see below)
- meta_data
Optional metadata data frame (rows = cells, columns = attributes)
- column2only_tumor
Metadata column used to keep only tumor cells. Its values are matched case-insensitively against patterns such as "tumor" (or "tumour"), "cancer", "malignant", or "neoplasm".
- project
Project name for the Seurat object
- min_cells
Minimum number of cells in which a feature must be detected for the feature to be retained
- min_features
Minimum number of features that must be detected in a cell for the cell to be retained
- quality_control
Logical indicating whether to perform mitochondrial QC
- quality_control.pattern
Regex pattern(s) identifying mitochondrial genes (e.g., "^MT-" for human, "^mt-" for mouse)
- data_filter
Logical indicating whether to filter low-quality cells
- data_filter.nFeature_RNA_thresh
Numeric vector of length 2 specifying (min, max) features per cell
- data_filter.percent.mt
Maximum mitochondrial percentage allowed (0-100)
- normalization_method
Normalization method ("LogNormalize", "CLR", or "RC")
- scale_factor
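Scale factor for normalization; with "LogNormalize" and "RC", each cell's counts are divided by its total and multiplied by this value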
- scale_features
Scale features to unit variance
- selection_method
Variable feature selection method ("vst", "mvp", or "disp")
- resolution
Clustering resolution; higher values yield more clusters
- dims
PCA dimensions to use
- verbose
Logical indicating whether to print progress messages
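The `column2only_tumor` argument keeps cells whose annotation matches tumor-related keywords. The sketch below restates that rule in base R with a hypothetical helper, `is_tumor_label()`. The regex simply mirrors the keywords listed for the argument and is an assumption; the pattern used inside SCPreProcess may differ.

```r
# Hypothetical helper; the regex mirrors the keywords documented for
# `column2only_tumor`, but SCPreProcess's own pattern may differ.
is_tumor_label <- function(labels) {
  # Case-insensitive match; NA labels give FALSE, so unannotated cells are dropped
  grepl("tumou?r|cancer|malignant|neoplasm", labels, ignore.case = TRUE)
}

meta_data <- data.frame(
  cell_type = c("Tumor", "T cell", "Malignant epithelial", "Fibroblast", "tumour"),
  row.names = paste0("cell", 1:5)
)
column2only_tumor <- "cell_type"

keep <- is_tumor_label(meta_data[[column2only_tumor]])
tumor_cells <- rownames(meta_data)[keep]
tumor_cells  # "cell1" "cell3" "cell5"
```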
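The `min_cells` and `min_features` arguments act on different axes of the features x cells matrix. The sketch below restates them in base R with a hypothetical `filter_counts()` helper. It assumes they behave like the `min.cells` and `min.features` arguments of `Seurat::CreateSeuratObject()`, with cells filtered before features. Both the pass-through and the ordering are assumptions.

```r
# Hypothetical sketch of the min_cells / min_features filters on a
# features x cells count matrix; filtering order is an assumption.
filter_counts <- function(counts, min_cells = 0, min_features = 0) {
  # Cells: keep those with at least `min_features` detected (non-zero) features
  n_features <- colSums(counts > 0)
  counts <- counts[, n_features >= min_features, drop = FALSE]
  # Features: keep those detected in at least `min_cells` of the remaining cells
  n_cells <- rowSums(counts > 0)
  counts[n_cells >= min_cells, , drop = FALSE]
}

counts <- matrix(c(1, 0, 2, 0,
                   0, 0, 0, 0,
                   3, 1, 0, 1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("GeneA", "GeneB", "GeneC"), paste0("cell", 1:4)))

filtered <- filter_counts(counts, min_cells = 2, min_features = 1)
rownames(filtered)  # "GeneA" "GeneC" (GeneB is never detected)

strict <- filter_counts(counts, min_cells = 1, min_features = 2)
colnames(strict)    # "cell1" (the only cell with two detected features)
```

With the documented default `min_cells = 400`, any feature detected in fewer than 400 cells is dropped. On small datasets this can remove most features, so the default may need lowering.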
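The `quality_control` and `data_filter` arguments together compute a per-cell mitochondrial percentage and drop low-quality cells. The base-R sketch below uses a hypothetical `qc_filter()` helper. It assumes three things: multiple patterns are combined with "|", percent.mt is the share of a cell's counts from matching genes, and the thresholds are strict inequalities in the style of the Seurat tutorial. None of these is confirmed by this page.

```r
# Hypothetical sketch of mitochondrial QC plus cell filtering on a
# features x cells count matrix; see the assumptions stated above.
qc_filter <- function(counts, pattern = c("^MT-", "^mt-"),
                      nfeature_thresh = c(200, 6000), max_percent_mt = 20) {
  # Assumption: several patterns are OR-ed into one regex
  mt_genes <- grepl(paste(pattern, collapse = "|"), rownames(counts))
  # percent.mt: share (0-100) of each cell's counts from mitochondrial genes
  percent_mt <- 100 * colSums(counts[mt_genes, , drop = FALSE]) / colSums(counts)
  # nFeature_RNA: number of detected (non-zero) features per cell
  n_feature <- colSums(counts > 0)
  keep <- n_feature > nfeature_thresh[1] &
    n_feature < nfeature_thresh[2] &
    percent_mt < max_percent_mt
  # which() drops NA, e.g. from cells with zero total counts
  counts[, which(keep), drop = FALSE]
}

counts <- matrix(
  c(1, 0, 5, 2, 2,   # cell1: 10% mitochondrial, 4 features detected
    6, 2, 1, 1, 0,   # cell2: 80% mitochondrial
    0, 0, 3, 0, 0),  # cell3: only 1 feature detected
  nrow = 5,
  dimnames = list(c("MT-CO1", "MT-ND1", "GeneA", "GeneB", "GeneC"),
                  paste0("cell", 1:3))
)
kept <- qc_filter(counts, nfeature_thresh = c(1, 5), max_percent_mt = 20)
colnames(kept)  # "cell1"
```

The toy thresholds `c(1, 5)` replace the documented default `c(200, 6000)` only so that the tiny matrix produces a visible result.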
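The `normalization_method` and `scale_factor` arguments correspond to Seurat's `NormalizeData()` options. The base-R sketch below uses a hypothetical `normalize_counts()` helper to restate the two methods that use `scale_factor`. With "RC", each cell's counts are divided by its total and multiplied by `scale_factor`. "LogNormalize" applies the same step and then `log1p`. "CLR" is omitted here because it does not use `scale_factor`.

```r
# Hypothetical base-R restatement of the RC and LogNormalize methods on a
# features x cells count matrix.
normalize_counts <- function(counts, method = c("LogNormalize", "RC"),
                             scale_factor = 10000) {
  method <- match.arg(method)
  # Divide each cell (column) by its total counts, then multiply by scale_factor
  rc <- sweep(counts, 2, colSums(counts), "/") * scale_factor
  # LogNormalize additionally applies the natural log of (1 + x)
  if (method == "RC") rc else log1p(rc)
}

counts <- matrix(c(1, 3, 2, 2), nrow = 2,
                 dimnames = list(c("GeneA", "GeneB"), c("cell1", "cell2")))
normalize_counts(counts, "RC")            # cell1: 2500, 7500; cell2: 5000, 5000
normalize_counts(counts, "LogNormalize")  # log1p() of the RC values
```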