Performs single-cell phenotype association analysis using the SCIPAC method. Integrates bulk RNA-seq phenotype information with single-cell transcriptomic data to identify cell populations associated with clinical outcomes.
Usage
DoSCIPAC(
matched_bulk,
sc_data,
phenotype,
label_type = "SCIPAC",
phenotype_class = c("binary", "survival", "continuous"),
hvg = 1000L,
do_pca_sc = FALSE,
n_pc = 60L,
sc_batch_col = NULL,
resolution = 2L,
ela_net_alpha = 0.4,
bt_size = 50L,
ncore = 7L,
ci_alpha = 0.05,
nfold = 10L,
...
)Arguments
- matched_bulk
Matrix or data frame of preprocessed bulk RNA-seq expression data (genes × samples). Column names must match names/IDs in
phenotype.- sc_data
A matrix/Matrix (genes × cells) or a Seurat object containing scRNA-seq data to be screened.
- phenotype
Phenotype data, either:
Named vector (names match
matched_bulkcolumns)Data frame with row names matching
matched_bulkcolumns, containing"time"and"status"columns for survival analysis
- label_type
Character. Context info specifying the screening method identifier. Default:
"SCIPAC".- phenotype_class
Type of phenotypic outcome:
"binary"Binary traits (e.g., case/control, responder/non-responder)
"continuous"Continuous measurements (e.g., age, biomarker levels)
"survival"Survival information (time-to-event data)
Partial matching is supported.
- hvg
Integer. Number of highly variable genes to use for preprocessing. Default:
1000L.- do_pca_sc
Logical. if
TRUE, first do PCA onsc.datand use the rotation matrix onbulk.dat; ifFalse, first do PCA onbulk.dataand use the rotation matrix onsc.dat. The default isFALSE.- n_pc
Integer. Number of principal components to use. Default:
60L.- sc_batch_col
Character or vector. Batch variable for single-cell data. If character, should be a column name in
sc_data@metadata. If vector, should be a batch assignment vector matching cell order. Default:NULL(no batch correction).- resolution
Integer. Clustering resolution for cell type identification. Higher values produce more clusters. Default:
2L.- ela_net_alpha
Numeric. Elastic net mixing parameter (0 = ridge, 1 = lasso). Default:
0.4.- bt_size
Integer. Bootstrap sample size for stability assessment. Default:
50L.- ncore
Integer. Number of CPU cores for parallel computation. Default:
7L.- ci_alpha
Numeric. Significance level for confidence intervals. Default:
0.05.- nfold
Integer. Number of folds for cross-validation for regression models. Default:
10L.- ...
Additional arguments (support
assay(character),verbose(logical) &seed(integer)).
Value
A named list containing:
scRNA_dataModified Seurat object with SCIPAC screening results added as metadata columns. Includes association statistics, p-values, and confidence intervals for each cell cluster.
pca_resPCA rotation results from
SCIPAC::sc.bulk.pca()cluster_resClustering results from
SCIPAC::seurat.ct()
Workflow
Gene overlap: Identifies common genes between scRNA-seq and bulk data
Preprocessing: Selects HVGs and preprocesses both datasets
PCA rotation: Aligns scRNA-seq and bulk data in shared PCA space
Clustering: Identifies cell populations using Seurat clustering
Association testing: Tests cluster-phenotype associations using elastic net regression with bootstrap validation
Result integration: Adds screening results to Seurat object metadata
Requirements
R package:
SCIPAC(install fromExceret/SCIPAC)Gene symbols must be consistent between scRNA-seq and bulk datasets
Sufficient overlapping genes (typically >100) for reliable rotation
Notes
For survival analysis,
phenotypemust be a data frame with"time"and"status"columnsBinary phenotypes are automatically converted to factors
Continuous phenotypes use Gaussian family in elastic net regression
Screening parameters are stored in
scRNA_data@misc$SCIPAC_paramsfor reproducibility
See also
SCIPAC, sc.bulk.pca,
seurat.ct, RegisterScreenMethod
Other screen_method:
DoDEGAS(),
DoLP_SGL(),
DoPIPET(),
DoSIDISH(),
DoScissor(),
DoscAB(),
DoscPAS(),
DoscPP()
Examples
if (FALSE) { # \dontrun{
# Example 1: Binary phenotype (case/control)
library(SCIPAC)
# Prepare data
bulk_expr <- matrix(rpois(1000, 10), nrow = 200, ncol = 5)
colnames(bulk_expr) <- paste0("Sample", 1:5)
rownames(bulk_expr) <- paste0("Gene", 1:200)
seurat_obj <- CreateSeuratObject(counts = matrix(rpois(2000, 5), nrow = 200, ncol = 10))
phenotype_binary <- c(Sample1 = 1, Sample2 = 0, Sample3 = 1, Sample4 = 0, Sample5 = 1)
# Run SCIPAC screening
result <- DoSCIPAC(
matched_bulk = bulk_expr,
sc_data = seurat_obj,
phenotype = phenotype_binary,
phenotype_class = "binary",
hvg = 100,
resolution = 1,
verbose = TRUE
)
# Access results
screened_seurat <- result$scRNA_data
head(screened_seurat[[]])
# Example 2: Survival analysis
survival_data <- data.frame(
time = c(12.5, 24.3, 18.7, 36.2, 30.1),
status = c(1, 0, 1, 1, 0),
row.names = paste0("Sample", 1:5)
)
result <- DoSCIPAC(
matched_bulk = bulk_expr,
sc_data = seurat_obj,
phenotype = survival_data,
phenotype_class = "survival",
ela_net_alpha = 0.5,
bt_size = 100,
ncore = 4
)
# Example 3: Continuous phenotype with custom parameters
continuous_pheno <- c(Sample1 = 25.3, Sample2 = 45.7, Sample3 = 38.2,
Sample4 = 52.1, Sample5 = 31.8)
result <- DoSCIPAC(
matched_bulk = bulk_expr,
sc_data = seurat_obj,
phenotype = continuous_pheno,
phenotype_class = "continuous",
hvg = 500,
n_pc = 30,
resolution = 2,
ela_net_alpha = 0.3,
ci_alpha = 0.01,
seed = 42
)
# View screening parameters used
result$scRNA_data@misc$SCIPAC_params
# Example 4: With batch correction
seurat_obj$batch_id <- sample(c("Batch1", "Batch2"), ncol(seurat_obj), replace = TRUE)
result <- DoSCIPAC(
matched_bulk = bulk_expr,
sc_data = seurat_obj,
phenotype = phenotype_binary,
phenotype_class = "binary",
sc_batch_col = "batch_id",
ncore = 8
)
} # }