A generic function for standardized preprocessing of single-cell RNA-seq data from multiple sources. Handles data.frame, matrix, dgCMatrix, AnnData, and Seurat inputs, with optional tumor cell filtering. Implements a complete analysis pipeline from raw counts to clustered embeddings.

Usage

SCPreProcess(sc, ...)

# Default S3 method
SCPreProcess(
  sc,
  meta_data = NULL,
  column2only_tumor = NULL,
  project = "SC_Screening_Proj",
  min_cells = 400L,
  min_features = 0L,
  quality_control = TRUE,
  quality_control.pattern = c("^MT-"),
  data_filter = TRUE,
  data_filter.nFeature_RNA_thresh = c(200L, 6000L),
  data_filter.percent.mt = 20L,
  normalization_method = "LogNormalize",
  scale_factor = 10000L,
  scale_features = NULL,
  selection_method = "vst",
  resolution = 0.6,
  dims = NULL,
  verbose = TRUE,
  ...
)

# S3 method for class 'matrix'
SCPreProcess(
  sc,
  meta_data = NULL,
  column2only_tumor = NULL,
  project = "SC_Screening_Proj",
  min_cells = 400L,
  min_features = 0L,
  quality_control = TRUE,
  quality_control.pattern = c("^MT-", "^mt-"),
  data_filter = TRUE,
  data_filter.nFeature_RNA_thresh = c(200L, 6000L),
  data_filter.percent.mt = 20L,
  normalization_method = "LogNormalize",
  scale_factor = 10000L,
  scale_features = NULL,
  selection_method = "vst",
  resolution = 0.6,
  dims = NULL,
  verbose = TRUE,
  ...
)

# S3 method for class 'data.frame'
SCPreProcess(
  sc,
  meta_data = NULL,
  column2only_tumor = NULL,
  project = "SC_Screening_Proj",
  min_cells = 400L,
  min_features = 0L,
  quality_control = TRUE,
  quality_control.pattern = c("^MT-", "^mt-"),
  data_filter = TRUE,
  data_filter.nFeature_RNA_thresh = c(200L, 6000L),
  data_filter.percent.mt = 20L,
  normalization_method = "LogNormalize",
  scale_factor = 10000L,
  scale_features = NULL,
  selection_method = "vst",
  resolution = 0.6,
  dims = NULL,
  verbose = TRUE,
  ...
)

# S3 method for class 'dgCMatrix'
SCPreProcess(
  sc,
  meta_data = NULL,
  column2only_tumor = NULL,
  project = "SC_Screening_Proj",
  min_cells = 400L,
  min_features = 0L,
  quality_control = TRUE,
  quality_control.pattern = c("^MT-", "^mt-"),
  data_filter = TRUE,
  data_filter.nFeature_RNA_thresh = c(200L, 6000L),
  data_filter.percent.mt = 20L,
  normalization_method = "LogNormalize",
  scale_factor = 10000L,
  scale_features = NULL,
  selection_method = "vst",
  resolution = 0.6,
  dims = NULL,
  verbose = TRUE,
  ...
)

# S3 method for class 'AnnDataR6'
SCPreProcess(
  sc,
  meta_data = NULL,
  column2only_tumor = NULL,
  project = "SC_Screening_Proj",
  min_cells = 400L,
  min_features = 0L,
  quality_control = TRUE,
  quality_control.pattern = c("^MT-", "^mt-"),
  data_filter = TRUE,
  data_filter.nFeature_RNA_thresh = c(200L, 6000L),
  data_filter.percent.mt = 20L,
  normalization_method = "LogNormalize",
  scale_factor = 10000L,
  scale_features = NULL,
  selection_method = "vst",
  resolution = 0.6,
  dims = NULL,
  verbose = TRUE,
  ...
)

# S3 method for class 'Seurat'
SCPreProcess(sc, column2only_tumor = NULL, verbose = TRUE, ...)

Arguments

sc

Input data, one of:

  • data.frame/matrix/dgCMatrix: Raw count matrix (features x cells)

  • AnnDataR6: Python AnnData object via reticulate

  • Seurat: Preprocessed Seurat object

...

Additional arguments passed to specific methods. Currently unused.

meta_data

A data.frame containing metadata for each cell. It will be added to the Seurat object as @meta.data. If NULL, it will be extracted from the input object if possible.

column2only_tumor

A character vector of column names in meta_data, used to restrict the Seurat object to tumor cells only. If NULL, no filtering is performed.

project

A character string giving the project name, used to name the Seurat object.

min_cells

Minimum number of cells that must express a feature for it to be included in the analysis. Defaults to 400.

min_features

Minimum number of features that must be detected in a cell for it to be included in the analysis. Defaults to 0.
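As a rough illustration of what min_cells and min_features mean, the base R sketch below applies both thresholds to a toy count matrix. It assumes they behave like the min.cells and min.features arguments of Seurat's CreateSeuratObject() (a feature is kept if it is detected in at least min_cells cells, a cell is kept if it has at least min_features detected features). The toy matrix and the small threshold values are made up for readability, and the order in which the real pipeline applies the two filters is not shown here.

```r
# Toy count matrix: 4 genes (rows) x 5 cells (columns); values are invented.
counts <- matrix(
  c(0, 1, 0, 2, 0,
    3, 0, 4, 1, 2,
    0, 0, 0, 0, 1,
    5, 2, 1, 0, 3),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("Gene", 1:4), paste0("Cell", 1:5))
)

min_cells <- 2L     # illustrative only; the documented default is 400
min_features <- 2L  # illustrative only; the documented default is 0

# Number of cells in which each gene is detected (count > 0)
cells_per_gene <- rowSums(counts > 0)
# Number of genes detected in each cell
genes_per_cell <- colSums(counts > 0)

# Assumed semantics: keep genes and cells that meet the minimums
kept <- counts[cells_per_gene >= min_cells,
               genes_per_cell >= min_features,
               drop = FALSE]
dim(kept)
```

With the default min_cells = 400, an input with fewer than 400 cells cannot contain any feature that passes this threshold under the assumed semantics, so small test datasets need a lower value.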

quality_control

Logical indicating whether to perform mitochondrial percentage quality control. Defaults to TRUE.

quality_control.pattern

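Character vector of regular-expression patterns used to identify mitochondrial genes, ribosomal protein genes, or other unwanted genes; several patterns can be supplied together, and custom patterns are supported. Defaults to "^MT-" in the default method and to c("^MT-", "^mt-") in the matrix, data.frame, dgCMatrix, and AnnDataR6 methods.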
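To check which genes a given quality_control.pattern would match before running the pipeline, the patterns can be tested against gene names with base R. The sketch below assumes that multiple patterns are treated as alternatives (joined with "|"); how SCPreProcess combines them internally is not stated here, and the gene names and the "^RP[SL]" ribosomal pattern are illustrative.

```r
# Example gene symbols (human-style and mouse-style mitochondrial prefixes)
genes <- c("MT-CO1", "MT-ND1", "mt-Co1", "RPS6", "RPL11", "ACTB")

# Patterns as used in the matrix/data.frame method defaults
mito_patterns <- c("^MT-", "^mt-")

# Assumption: several patterns act as alternatives, so join them with "|"
mito_regex <- paste(mito_patterns, collapse = "|")
is_mito <- grepl(mito_regex, genes)
genes[is_mito]

# A customized combination that also flags ribosomal protein genes
custom_regex <- paste(c(mito_patterns, "^RP[SL]"), collapse = "|")
genes[grepl(custom_regex, genes)]
```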

data_filter

Logical indicating whether to filter cells based on quality metrics. Defaults to TRUE.

data_filter.nFeature_RNA_thresh

Numeric vector of length 2 specifying the minimum and maximum number of features per cell. Defaults to c(200, 6000).

data_filter.percent.mt

Maximum mitochondrial percentage allowed. Defaults to 20.
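The effect of data_filter.nFeature_RNA_thresh and data_filter.percent.mt can be pictured with a small table of per-cell metrics. This is a minimal base R sketch, not the package's implementation: the metric values are invented, and the use of strict inequalities (as in common Seurat workflows) is an assumption, since the exact boundary handling is not documented here.

```r
# Hypothetical per-cell QC metrics, named as in a Seurat object's metadata
qc <- data.frame(
  nFeature_RNA = c(150, 850, 3200, 7000, 2500),
  percent.mt   = c(5, 3, 25, 10, 19),
  row.names    = paste0("Cell", 1:5)
)

nFeature_RNA_thresh <- c(200L, 6000L)  # documented default
percent_mt_max <- 20L                  # documented default

# Assumption: bounds are exclusive on both sides
keep <- qc$nFeature_RNA > nFeature_RNA_thresh[1] &
  qc$nFeature_RNA < nFeature_RNA_thresh[2] &
  qc$percent.mt < percent_mt_max

# Cells that would survive the filter under these assumptions
rownames(qc)[keep]
```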

normalization_method

Method for normalization: "LogNormalize", "CLR", or "RC". Defaults to "LogNormalize".

scale_factor

Scaling factor for normalization. Defaults to 10000.
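For orientation, Seurat's "LogNormalize" method divides each cell's counts by that cell's total counts, multiplies by the scale factor, and applies log1p. The base R sketch below reproduces that arithmetic on a made-up two-cell matrix; it only illustrates the role of scale_factor and does not stand in for the call SCPreProcess makes internally.

```r
# Toy counts: 4 genes x 2 cells (filled column by column); values are invented
counts <- matrix(
  c(1, 3, 0, 6,
    2, 0, 4, 4),
  nrow = 4,
  dimnames = list(paste0("Gene", 1:4), c("CellA", "CellB"))
)

scale_factor <- 10000  # documented default

# Total counts per cell
lib_size <- colSums(counts)

# Divide each column by its total, rescale, then natural-log transform
normalized <- log1p(sweep(counts, 2, lib_size, "/") * scale_factor)
round(normalized, 3)
```

Zero counts stay at zero after the transform, so sparsity in the input is preserved.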

scale_features

Features to use for scaling. If NULL, uses all variable features. Defaults to NULL.

selection_method

Method for variable feature selection: "vst", "mvp", or "disp". Defaults to "vst".

resolution

Resolution parameter for clustering. Higher values lead to more clusters. Defaults to 0.6.

dims

Dimensions to use for clustering and dimensionality reduction. If NULL, they are determined automatically by the elbow method. Defaults to NULL.

verbose

Logical indicating whether to print progress messages. Defaults to TRUE.

Value

A Seurat object containing:

  • Quality-control metrics and filtered cells (when enabled)

  • Normalized and scaled expression data

  • Variable features

  • PCA/tSNE/UMAP reductions

  • Cluster identities

  • When tumor cells are filtered: original dimensions in @misc$raw_dim

  • Final dimensions in @misc$self_dim

Examples

if (FALSE) { # \dontrun{
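# Example with matrix input
# Simulated counts, sized so that the default min_cells = 400 and the
# default nFeature_RNA thresholds (200, 6000) can be satisfied
n_genes <- 1000
n_cells <- 1000
counts_matrix <- matrix(rpois(n_genes * n_cells, 5), nrow = n_genes, ncol = n_cells)
rownames(counts_matrix) <- paste0("Gene", seq_len(n_genes))
colnames(counts_matrix) <- paste0("Cell", seq_len(n_cells))

seurat_obj <- SCPreProcess(
  sc = counts_matrix,
  project = "TestProject",
  min_features = 50,
  resolution = 0.8
)

# Example with tumor cell filtering
# Metadata rows must be named after the cells (columns of the count matrix)
metadata <- data.frame(
  cell_type = rep(c("Tumor", "Normal"), each = n_cells / 2),
  row.names = colnames(counts_matrix)
)

tumor_seurat <- SCPreProcess(
  sc = counts_matrix,
  meta_data = metadata,
  column2only_tumor = "cell_type",
  project = "TumorAnalysis"
)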
} # }