
Performs single-cell phenotype screening using gene set enrichment analysis. It can either apply a fixed probability threshold or search for the optimal threshold by testing several values and selecting the one that maximizes the difference in normalized enrichment score (NES).

Usage

ScPP.optimized(sc_dataset, geneList, probs = c(0.2, NULL), verbose)

Arguments

sc_dataset

A Seurat object containing single-cell RNA-seq data. It must have an RNA assay with normalized data.

geneList

A named list containing gene sets:

  • gene_pos - Genes associated with positive phenotype (required)

  • gene_neg - Genes associated with negative phenotype (required)

  • genes_sort - Named numeric vector of ranked genes (required for optimization mode)
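A minimal sketch of assembling geneList in base R. The gene symbols and statistic values are placeholders chosen for illustration, and ranking genes_sort in decreasing order is an assumption based on common GSEA practice, not something this page specifies.

```r
# Placeholder per-gene statistics (e.g. log fold changes); values are made up.
stats <- c(CD8A = -1.8, CD4 = 2.1, IL7R = 1.4, CD8B = -1.2, GAPDH = 0.1)

# Assumption: genes_sort is ranked from highest to lowest statistic.
ranked_genes <- sort(stats, decreasing = TRUE)

geneList <- list(
  gene_pos   = c("CD4", "IL7R"),   # genes linked to the positive phenotype
  gene_neg   = c("CD8A", "CD8B"),  # genes linked to the negative phenotype
  genes_sort = ranked_genes        # only needed for optimization mode
)

# Inspect the structure before passing it to ScPP.optimized().
str(geneList)
```

Keeping genes_sort as a named vector matters because the names carry the gene identifiers while the values carry the ranking statistic.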

probs

Numeric value or vector of probability thresholds:

  • Single value (e.g., 0.2): Performs phenotype profiling with fixed threshold

  • Multiple values (e.g., seq(0.2, 0.45, by = 0.05)): Finds optimal threshold

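  • NULL: Automatically searches for the optimal threshold using seq(0.2, 0.45, by = 0.05). The Usage signature lists the default as c(0.2, NULL), which R evaluates to the single value 0.2, so pass probs = NULL explicitly to request this search.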
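The rule for how probs selects a mode can be sketched as below. The helper scpp_mode() is hypothetical and is not part of the package; it only mirrors the behavior described on this page.

```r
# Hypothetical helper (not a package function) mirroring the documented rule:
# NULL or more than one value means optimization, a single value means fixed.
scpp_mode <- function(probs) {
  if (is.null(probs) || length(probs) > 1) "optimization" else "fixed"
}

scpp_mode(0.2)                         # single value
scpp_mode(seq(0.2, 0.45, by = 0.05))   # several candidate thresholds
scpp_mode(NULL)                        # automatic search

# The grid used for the automatic search contains six candidate thresholds.
seq(0.2, 0.45, by = 0.05)
```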

Value

A list with three components:

  • metadata - Data frame with cell metadata including scPP_AUCup, scPP_AUCdown, scPP

  • Genes_pos - Genes upregulated in Positive vs Negative

  • Genes_neg - Genes upregulated in Negative vs Positive

Details

This function operates in two modes based on the probs parameter:

Fixed Threshold Mode (length(probs) == 1):

  1. Computes AUCell scores for positive and negative gene sets

  2. Classifies cells based on the specified threshold

  3. Identifies differential markers between phenotype groups

  4. Returns complete results with metadata and marker genes
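Step 2 above can be illustrated with a small base R sketch. This page does not state how the threshold is applied, so the quantile rule, the simulated scores, and the label names "Phenotype+", "Phenotype-", and "Background" are all assumptions made for illustration and may differ from the function's actual behavior.

```r
set.seed(1)

# Simulated stand-ins for per-cell AUCell scores (not real data).
n_cells  <- 200
auc_up   <- runif(n_cells)  # score for the gene_pos set
auc_down <- runif(n_cells)  # score for the gene_neg set
probs    <- 0.2

# Assumption: a cell is positive when its gene_pos score lies in the top
# `probs` fraction and its gene_neg score lies in the bottom `probs` fraction.
# Negative cells are the mirror image; all other cells are background.
is_pos <- auc_up >= quantile(auc_up, 1 - probs) &
  auc_down <= quantile(auc_down, probs)
is_neg <- auc_up <= quantile(auc_up, probs) &
  auc_down >= quantile(auc_down, 1 - probs)

label <- ifelse(is_pos, "Phenotype+",
                ifelse(is_neg, "Phenotype-", "Background"))

# Count how many cells fall into each group.
table(label)
```

Under this assumed rule, a larger probs value admits more cells into the two phenotype groups at the cost of less extreme scores.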

Optimization Mode (length(probs) > 1 or probs = NULL):

  1. Tests multiple probability thresholds

  2. For each threshold, classifies cells and finds markers

  3. Runs GSEA to calculate NES for marker sets

  4. Returns the threshold with the maximum NES difference (Positive - Negative)

  5. Requires genes_sort in geneList to run GSEA
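The selection in step 4 can be sketched as follows. The NES values are placeholders invented for illustration; in practice they would come from running GSEA on the marker sets found at each threshold. The column names are also hypothetical.

```r
# Placeholder NES values per candidate threshold (not real results).
nes <- data.frame(
  prob    = seq(0.2, 0.45, by = 0.05),
  NES_pos = c(1.9, 2.3, 2.1, 1.8, 1.6, 1.4),
  NES_neg = c(-1.5, -1.9, -2.0, -1.4, -1.2, -1.0)
)

# Difference between the positive and negative marker set NES.
nes$diff <- nes$NES_pos - nes$NES_neg

# Keep the threshold with the largest Positive - Negative difference.
best_prob <- nes$prob[which.max(nes$diff)]
best_prob
```

Note that which.max() returns the first maximum, so ties would resolve to the smallest threshold in this sketch.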

Note

  • Fixed threshold mode: Faster; returns detailed results for the threshold you supply

  • Optimization mode: Slower and requires genes_sort, but chooses the threshold from the data instead of relying on a preset value

Examples

if (FALSE) { # \dontrun{
# seurat_obj is assumed to be an existing Seurat object with a normalized RNA assay.
# Fixed threshold mode
result <- ScPP.optimized(
  sc_dataset = seurat_obj,
  geneList = list(
    gene_pos = c("CD4", "IL7R"),
    gene_neg = c("CD8A", "CD8B")
  ),
  probs = 0.2
)

# Optimization mode
# ranked_genes is assumed to be an existing named numeric vector of ranked genes.
result <- ScPP.optimized(
  sc_dataset = seurat_obj,
  geneList = list(
    gene_pos = c("CD4", "IL7R"),
    gene_neg = c("CD8A", "CD8B"),
    genes_sort = ranked_genes
  ),
  probs = NULL  # or seq(0.2, 0.45, by = 0.05)
)
} # }