A generic function for standardized preprocessing of single-cell RNA-seq data from multiple sources. Handles data.frame/matrix, AnnData, and Seurat inputs with tumor cell filtering. Implements a complete analysis pipeline from raw data to clustered embeddings.
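The per-class methods listed under Usage rely on ordinary S3 dispatch. The following is a minimal sketch of that pattern in base R; `toy_preprocess` and its methods are hypothetical names used only for illustration and are not part of the package, and the bodies do not reproduce what SCPreProcess actually does.

```r
# Hypothetical toy generic; NOT the package's implementation.
toy_preprocess <- function(sc, ...) UseMethod("toy_preprocess")

# Fallback used when no class-specific method matches
toy_preprocess.default <- function(sc, ...) {
  sprintf("default method for class %s", class(sc)[1])
}

# Matrix input: interpreted as a features x cells count matrix
toy_preprocess.matrix <- function(sc, ...) {
  sprintf("matrix with %d features x %d cells", nrow(sc), ncol(sc))
}

# data.frame input: converted to a matrix, then dispatched again
toy_preprocess.data.frame <- function(sc, ...) {
  toy_preprocess(as.matrix(sc), ...)
}

m <- matrix(
  rpois(20, 5),
  nrow = 4,
  dimnames = list(paste0("Gene", 1:4), paste0("Cell", 1:5))
)
res_matrix <- toy_preprocess(m)                 # dispatches to the matrix method
res_df <- toy_preprocess(as.data.frame(m))      # data.frame method, then matrix
res_other <- toy_preprocess(1:3)                # falls back to the default method
```

The point of the design is that callers use one function name regardless of the container holding the counts; each method only has to bring its input into a common form.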
Usage
SCPreProcess(sc, ...)
# Default S3 method
SCPreProcess(
sc,
meta_data = NULL,
column2only_tumor = NULL,
project = "SC_Screening_Proj",
min_cells = 400L,
min_features = 0L,
quality_control = TRUE,
quality_control.pattern = c("^MT-"),
data_filter = TRUE,
data_filter.nFeature_RNA_thresh = c(200L, 6000L),
data_filter.percent.mt = 20L,
normalization_method = "LogNormalize",
scale_factor = 10000L,
scale_features = NULL,
selection_method = "vst",
resolution = 0.6,
dims = NULL,
verbose = TRUE,
...
)
# S3 method for class 'matrix'
SCPreProcess(
sc,
meta_data = NULL,
column2only_tumor = NULL,
project = "SC_Screening_Proj",
min_cells = 400L,
min_features = 0L,
quality_control = TRUE,
quality_control.pattern = c("^MT-", "^mt-"),
data_filter = TRUE,
data_filter.nFeature_RNA_thresh = c(200L, 6000L),
data_filter.percent.mt = 20L,
normalization_method = "LogNormalize",
scale_factor = 10000L,
scale_features = NULL,
selection_method = "vst",
resolution = 0.6,
dims = NULL,
verbose = TRUE,
...
)
# S3 method for class 'data.frame'
SCPreProcess(
sc,
meta_data = NULL,
column2only_tumor = NULL,
project = "SC_Screening_Proj",
min_cells = 400L,
min_features = 0L,
quality_control = TRUE,
quality_control.pattern = c("^MT-", "^mt-"),
data_filter = TRUE,
data_filter.nFeature_RNA_thresh = c(200L, 6000L),
data_filter.percent.mt = 20L,
normalization_method = "LogNormalize",
scale_factor = 10000L,
scale_features = NULL,
selection_method = "vst",
resolution = 0.6,
dims = NULL,
verbose = TRUE,
...
)
# S3 method for class 'dgCMatrix'
SCPreProcess(
sc,
meta_data = NULL,
column2only_tumor = NULL,
project = "SC_Screening_Proj",
min_cells = 400L,
min_features = 0L,
quality_control = TRUE,
quality_control.pattern = c("^MT-", "^mt-"),
data_filter = TRUE,
data_filter.nFeature_RNA_thresh = c(200L, 6000L),
data_filter.percent.mt = 20L,
normalization_method = "LogNormalize",
scale_factor = 10000L,
scale_features = NULL,
selection_method = "vst",
resolution = 0.6,
dims = NULL,
verbose = TRUE,
...
)
# S3 method for class 'AnnDataR6'
SCPreProcess(
sc,
meta_data = NULL,
column2only_tumor = NULL,
project = "SC_Screening_Proj",
min_cells = 400L,
min_features = 0L,
quality_control = TRUE,
quality_control.pattern = c("^MT-", "^mt-"),
data_filter = TRUE,
data_filter.nFeature_RNA_thresh = c(200L, 6000L),
data_filter.percent.mt = 20L,
normalization_method = "LogNormalize",
scale_factor = 10000L,
scale_features = NULL,
selection_method = "vst",
resolution = 0.6,
dims = NULL,
verbose = TRUE,
...
)
# S3 method for class 'Seurat'
SCPreProcess(sc, column2only_tumor = NULL, verbose = TRUE, ...)
Arguments
- sc
Input data, one of:
data.frame/matrix/dgCMatrix: Raw count matrix (features x cells)
AnnDataR6: Python AnnData object via reticulate
Seurat: Preprocessed Seurat object
- ...
Additional arguments passed to specific methods. Currently unused.
- meta_data
A data.frame containing metadata for each cell. It is added to the Seurat object as @meta.data. If NULL, it is extracted from the input object when possible.
- column2only_tumor
A character string naming the column in meta_data that is used to filter the Seurat object to tumor cells only. If NULL, no filtering is performed.
- project
A character string giving the project name, used to name the Seurat object.
- min_cells
Minimum number of cells that must express a feature for it to be included in the analysis. Defaults to 400.
- min_features
Minimum number of features that must be detected in a cell for it to be included in the analysis. Defaults to 0.
- quality_control
Logical indicating whether to perform mitochondrial percentage quality control. Defaults to TRUE.
- quality_control.pattern
Character vector of patterns identifying mitochondrial genes, ribosomal protein genes, or other unwanted genes, alone or in combination. Custom patterns are supported. Defaults to "^MT-" in the default method and to c("^MT-", "^mt-") in the matrix, data.frame, dgCMatrix, and AnnDataR6 methods.
- data_filter
Logical indicating whether to filter cells based on quality metrics. Defaults to TRUE.
- data_filter.nFeature_RNA_thresh
Numeric vector of length 2 specifying the minimum and maximum number of features per cell. Defaults to c(200, 6000).
- data_filter.percent.mt
Maximum mitochondrial percentage allowed. Defaults to 20.
- normalization_method
Method for normalization: "LogNormalize", "CLR", or "RC". Defaults to "LogNormalize".
- scale_factor
Scaling factor for normalization. Defaults to 10000.
- scale_features
Features to use for scaling. If NULL, all variable features are used. Defaults to NULL.
- selection_method
Method for variable feature selection: "vst", "mvp", or "disp". Defaults to "vst".
- resolution
Resolution parameter for clustering. Higher values lead to more clusters. Defaults to 0.6.
- dims
Dimensions to use for clustering and dimensionality reduction. If NULL, they are determined automatically by the elbow method. Defaults to NULL.
- verbose
Logical indicating whether to print progress messages. Defaults to TRUE.
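To make the filtering and normalization arguments concrete, here is a rough base-R sketch of what they mean when applied to a plain count matrix. It is an illustration under assumptions, not the package's code: the variable names are made up, several patterns are assumed to be combined with "|", the thresholds are assumed to be strict inequalities, and the toy feature thresholds are lowered because the matrix has only 50 features. The LogNormalize step follows the usual definition (counts divided by the cell total, multiplied by the scale factor, then log1p).

```r
set.seed(1)

# Toy count matrix (features x cells) with two "mitochondrial" features
counts <- matrix(
  rpois(50 * 30, 2),
  nrow = 50,
  dimnames = list(
    c("MT-CO1", "mt-Nd1", paste0("Gene", 1:48)),
    paste0("Cell", 1:30)
  )
)

# Illustrative stand-ins for the documented arguments (toy values)
min_cells <- 3L
min_features <- 0L
qc_pattern <- c("^MT-", "^mt-")
nfeature_thresh <- c(10L, 6000L)  # lowered from c(200, 6000) for the toy data
percent_mt_max <- 20
scale_factor <- 10000

# min_features: drop cells with too few detected features
counts <- counts[, colSums(counts > 0) >= min_features, drop = FALSE]
# min_cells: drop features expressed in too few cells
counts <- counts[rowSums(counts > 0) >= min_cells, , drop = FALSE]

# Quality control: percentage of counts from features matching the pattern(s)
is_mt <- grepl(paste(qc_pattern, collapse = "|"), rownames(counts))
percent_mt <- 100 * colSums(counts[is_mt, , drop = FALSE]) / colSums(counts)
n_feature <- colSums(counts > 0)

# Data filter: keep cells within the feature range and below the mito cutoff
pass <- n_feature > nfeature_thresh[1] &
  n_feature < nfeature_thresh[2] &
  percent_mt < percent_mt_max
counts <- counts[, pass, drop = FALSE]

# LogNormalize: per-cell relative counts times scale_factor, then log1p
normalized <- log1p(sweep(counts, 2, colSums(counts), "/") * scale_factor)
```

Back-transforming with expm1 and summing each column of `normalized` gives `scale_factor` again, which is a quick sanity check on the normalization step.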
Value
A Seurat object containing:
Filtered and quality-controlled cells
Normalized and scaled expression data
Variable features
PCA/tSNE/UMAP reductions
Cluster identities
When tumor cells are filtered: original dimensions in @misc$raw_dim
Final dimensions in @misc$self_dim
Examples
if (FALSE) { # \dontrun{
  # Example with matrix input. The matrix has enough genes and cells that the
  # defaults (min_cells = 400, nFeature_RNA between 200 and 6000) do not
  # discard all of the data.
  counts_matrix <- matrix(rpois(2000 * 500, 5), nrow = 2000, ncol = 500)
  rownames(counts_matrix) <- paste0("Gene", 1:2000)
  colnames(counts_matrix) <- paste0("Cell", 1:500)
  seurat_obj <- SCPreProcess(
    sc = counts_matrix,
    project = "TestProject",
    min_features = 50,
    resolution = 0.8
  )
  # Example with tumor cell filtering; metadata row names match the cell names
  metadata <- data.frame(
    cell_type = c(rep("Tumor", 250), rep("Normal", 250)),
    row.names = paste0("Cell", 1:500)
  )
  tumor_seurat <- SCPreProcess(
    sc = counts_matrix,
    meta_data = metadata,
    column2only_tumor = "cell_type",
    project = "TumorAnalysis"
  )
} # }