A generic function for standardized preprocessing of single-cell RNA-seq data from multiple sources. It accepts data.frame, matrix, dgCMatrix, AnnData, and Seurat inputs, can restrict the data to tumor cells, and runs a complete analysis pipeline from raw counts to clustered embeddings.

Usage

SCPreProcess(sc, ...)

# Default S3 method
SCPreProcess(sc, ...)

# S3 method for class 'matrix'
SCPreProcess(
  sc,
  meta_data = NULL,
  column2only_tumor = NULL,
  project = glue::glue("{TimeStamp()}_SC_Screening_Proj"),
  min_cells = 400,
  min_features = 0,
  quality_control = TRUE,
  quality_control.pattern = c("^MT-", "^mt-"),
  data_filter = TRUE,
  data_filter.nFeature_RNA_thresh = c(200, 6000),
  data_filter.percent.mt = 20,
  normalization_method = "LogNormalize",
  scale_factor = 10000,
  scale_features = NULL,
  selection_method = "vst",
  resolution = 0.6,
  dims = 1:10,
  verbose = TRUE,
  ...
)

# S3 method for class 'data.frame'
SCPreProcess(
  sc,
  meta_data = NULL,
  column2only_tumor = NULL,
  project = glue::glue("{TimeStamp()}_SC_Screening_Proj"),
  min_cells = 400,
  min_features = 0,
  quality_control = TRUE,
  quality_control.pattern = c("^MT-", "^mt-"),
  data_filter = TRUE,
  data_filter.nFeature_RNA_thresh = c(200, 6000),
  data_filter.percent.mt = 20,
  normalization_method = "LogNormalize",
  scale_factor = 10000,
  scale_features = NULL,
  selection_method = "vst",
  resolution = 0.6,
  dims = 1:10,
  verbose = TRUE,
  ...
)

# S3 method for class 'dgCMatrix'
SCPreProcess(
  sc,
  meta_data = NULL,
  column2only_tumor = NULL,
  project = glue::glue("{TimeStamp()}_SC_Screening_Proj"),
  min_cells = 400,
  min_features = 0,
  quality_control = TRUE,
  quality_control.pattern = "^MT-",
  data_filter = TRUE,
  data_filter.nFeature_RNA_thresh = c(200, 6000),
  data_filter.percent.mt = 20,
  normalization_method = "LogNormalize",
  scale_factor = 10000,
  scale_features = NULL,
  selection_method = "vst",
  resolution = 0.6,
  dims = 1:10,
  verbose = TRUE,
  ...
)

# S3 method for class 'AnnDataR6'
SCPreProcess(
  sc,
  meta_data = NULL,
  column2only_tumor = NULL,
  project = glue::glue("{TimeStamp()}_SC_Screening_Proj"),
  min_cells = 400,
  min_features = 0,
  quality_control = TRUE,
  quality_control.pattern = c("^MT-", "^mt-"),
  data_filter = TRUE,
  data_filter.nFeature_RNA_thresh = c(200, 6000),
  data_filter.percent.mt = 20,
  normalization_method = "LogNormalize",
  scale_factor = 10000,
  scale_features = NULL,
  selection_method = "vst",
  resolution = 0.6,
  dims = 1:10,
  verbose = TRUE,
  ...
)

# S3 method for class 'Seurat'
SCPreProcess(sc, column2only_tumor = NULL, verbose = TRUE, ...)

Arguments

sc

Input data, one of:

  • data.frame/matrix/dgCMatrix: Raw count matrix (features x cells)

  • AnnDataR6: Python AnnData object via reticulate

  • Seurat: Preprocessed Seurat object

...

Method-specific arguments (see below)

meta_data

Optional metadata data frame (rows = cells, columns = attributes)

column2only_tumor

Name of the metadata column used to keep only tumor cells. Its values are matched case-insensitively against patterns such as "tumor" (or "tumour"), "cancer", "malignant", or "neoplasm".
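
As a rough illustration of the case-insensitive matching described for column2only_tumor, the base R sketch below flags tumor cells in a metadata column. The regular expression, the helper name is_tumor_label, and the example column cell_type are assumptions made for illustration; the patterns the function actually uses may differ.

```r
# Hypothetical helper: TRUE where a label looks like a tumor annotation.
# The pattern is an assumption assembled from the examples listed above.
is_tumor_label <- function(labels) {
  grepl("tumou?r|cancer|malignant|neoplasm", labels, ignore.case = TRUE)
}

# Toy metadata: rows are cells, cell_type is a made-up annotation column.
meta <- data.frame(
  cell_type = c("Tumor", "T cell", "Malignant epithelial", "tumour", "Fibroblast"),
  row.names = paste0("cell", 1:5)
)

keep <- is_tumor_label(meta$cell_type)
tumor_cells <- rownames(meta)[keep]
```

Here tumor_cells holds the names of the cells whose label matched, which could then be used to subset a count matrix by column.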

project

Project name for Seurat object

min_cells

Minimum cells per gene to retain (features present in at least this many cells)

min_features

Minimum features per cell to retain (cells with at least this many features)
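
The two thresholds above can be sketched on a toy matrix in base R. This is only an illustration of the filtering idea, not the function's own code: the two filters are computed independently here, whereas the real pipeline may apply them one after the other, and the toy threshold values are chosen to fit a tiny matrix.

```r
# Toy count matrix: 4 genes (rows) x 5 cells (columns).
counts <- matrix(
  c(1, 0, 2, 0, 3,
    0, 0, 0, 4, 0,
    5, 1, 1, 2, 1,
    0, 0, 0, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:5))
)

min_cells <- 2     # toy value; the documented default is 400
min_features <- 2  # toy value; the documented default is 0

# Genes detected (count > 0) in at least `min_cells` cells.
keep_genes <- rowSums(counts > 0) >= min_cells
# Cells expressing at least `min_features` genes.
keep_cells <- colSums(counts > 0) >= min_features

filtered <- counts[keep_genes, keep_cells, drop = FALSE]
```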

quality_control

Logical indicating whether to perform mitochondrial QC

quality_control.pattern

Regex pattern(s) used to identify mitochondrial genes (e.g., "^MT-" for human, "^mt-" for mouse)
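
To make the mitochondrial QC metric concrete, the base R sketch below computes a per-cell mitochondrial percentage from a toy count matrix. Joining several patterns with "|" is an assumption about how a pattern vector might be combined, and the gene names are invented for the example.

```r
# Toy count matrix: 3 genes (rows) x 3 cells (columns).
counts <- matrix(
  c(10,  0,  5,
     5,  5,  0,
    85, 95, 15),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("MT-CO1", "mt-Nd1", "ACTB"), paste0("cell", 1:3))
)

patterns <- c("^MT-", "^mt-")

# Assumption: multiple patterns are combined into one alternation.
is_mt <- grepl(paste(patterns, collapse = "|"), rownames(counts))

# Percentage of each cell's counts that come from mitochondrial genes.
percent_mt <- colSums(counts[is_mt, , drop = FALSE]) / colSums(counts) * 100
```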

data_filter

Logical indicating whether to filter low-quality cells

data_filter.nFeature_RNA_thresh

Numeric vector of length 2 specifying (min, max) features per cell

data_filter.percent.mt

Maximum mitochondrial percentage allowed (0-100)
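
The cell-level filter described by data_filter.nFeature_RNA_thresh and data_filter.percent.mt can be sketched as follows. Whether the function treats the bounds as strict or inclusive is not stated here; strict comparisons are an assumption, and the QC table is made up for the example.

```r
# Toy per-cell QC table with the usual Seurat-style column names.
cell_qc <- data.frame(
  nFeature_RNA = c(150, 2500, 7000, 3000),
  percent.mt   = c(5, 8, 3, 35),
  row.names    = paste0("cell", 1:4)
)

nFeature_RNA_thresh <- c(200, 6000)  # (min, max) features per cell
max_percent_mt <- 20                 # maximum mitochondrial percentage

# Assumption: bounds are exclusive on both sides.
keep <- cell_qc$nFeature_RNA > nFeature_RNA_thresh[1] &
  cell_qc$nFeature_RNA < nFeature_RNA_thresh[2] &
  cell_qc$percent.mt < max_percent_mt

kept_cells <- rownames(cell_qc)[keep]
```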

normalization_method

Normalization method ("LogNormalize", "CLR", or "RC")

scale_factor

Scaling factor for normalization
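
For intuition about how scale_factor enters the default "LogNormalize" method, the sketch below reproduces the transformation as Seurat documents it: each count is divided by the cell's total, multiplied by the scale factor, and passed through log1p. It assumes these arguments are forwarded to Seurat's normalization unchanged, and uses base R on a toy matrix rather than the package itself.

```r
# Toy count matrix: 2 genes (rows) x 2 cells (columns).
counts <- matrix(
  c(1, 3,
    0, 10),
  nrow = 2,
  dimnames = list(c("geneA", "geneB"), c("cell1", "cell2"))
)

scale_factor <- 10000

# Total counts per cell (column sums).
totals <- colSums(counts)

# Divide each column by its total, rescale, then apply log(1 + x).
normalized <- log1p(sweep(counts, 2, totals, "/") * scale_factor)
```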

scale_features

Scale features to unit variance

selection_method

Variable feature selection method ("vst", "mvp", or "disp")

resolution

Cluster resolution (higher for more clusters)

dims

PCA dimensions to use

verbose

Print progress messages

Value

A Seurat object containing:

  • Quality-control metrics and filtered data

  • Normalized and scaled expression data

  • Variable features

  • PCA/tSNE/UMAP reductions

  • Cluster identities

  • When tumor cells are filtered: original dimensions in @misc$raw_dim

  • Final dimensions in @misc$self_dim