Integrates matched bulk expression data and phenotype information to identify phenotype-associated cell populations in single-cell RNA-seq data using one of five computational methods (Scissor, scPP, scPAS, scAB, or DEGAS). Checks that the bulk and phenotype data are consistent before analysis.
Arguments
- matched_bulk
Matrix or data frame of preprocessed bulk RNA-seq expression data (genes x samples). Column names must match names/IDs in phenotype.
- sc_data
A Seurat object containing scRNA-seq data to be screened.
- phenotype
Phenotype data, either:
  - Named vector (names match matched_bulk columns), or
  - Data frame with row names matching matched_bulk columns
- label_type
Character specifying phenotype label type (e.g., "SBS1", "time")
- phenotype_class
Type of phenotypic outcome (must be consistent with input data):
  - "binary": Binary traits (e.g., case/control)
  - "continuous": Continuous measurements
  - "survival": Survival objects
- screen_method
Screening algorithm to use. There are five options:
  - "Scissor": see also DoScissor()
  - "scPP": see also DoscPP()
  - "scPAS": see also DoscPAS()
  - "scAB": see also DoscAB(); continuous phenotypes are not supported
  - "DEGAS": see also DoDEGAS()
- ...
Additional method-specific parameters:
- Scissor
- alpha
(numeric or NULL) Significance threshold. When NULL, alpha will keep increasing iteratively until the corresponding cells are screened out, default 0.05
- cutoff
(numeric) Threshold for terminating the iteration of alpha; only takes effect when alpha is NULL, default 0.2
- path2load_scissor_cache
(character) default NULL
- path2save_scissor_inputs
(character) A path to save the intermediary data. By using path2load_scissor_cache, the intermediary data can be loaded from the specified path. default "Scissor_inputs.RData"
- reliability_test
(logical) Whether to perform reliability test, default FALSE
- reliability_test.nfold
(integer) Cross-validation folds for reliability test, default 10
- reliability_test.n
(integer) Number of cells to use for reliability test, default 10
- cell_evaluation
(logical) Whether to perform cell evaluation, default FALSE
- cell_evaluation.benchmark_data
(.RData) Benchmark data for cell evaluation, default NULL
- cell_evaluation.FDR
(numeric) FDR threshold for cell evaluation, default 0.05
- cell_evaluation.bootstrap_n
(integer) Number of bootstrap samples for cell evaluation, default 10
- scPP
- ref_group
(integer or character) Reference group or baseline for binary comparisons, e.g. "Normal" for Tumor/Normal studies and 0 for 0/1 case-control studies. default: 0
- Log2FC_cutoff
(numeric) Minimum log2 fold-change for binary markers, default 0.585
- estimate_cutoff
(numeric) Effect size threshold for continuous traits, default 0.2
- probs
(numeric) Quantile cutoff for cell classification, default 0.2
- scPAS
- assay
(character) Assay to use from sc_data, default "RNA"
- imputation
(logical) Whether to perform imputation, default FALSE
- nfeature
(integer) Number of features to select, default 3000
- alpha
(numeric or NULL) Significance threshold. When NULL, alpha will keep increasing iteratively until the corresponding cells are screened out, default 0.01
- independent
(logical) The background distribution of risk scores is constructed independently of each cell. default: TRUE
- network_class
(character) Network class to use. default: 'SC', indicating gene-gene similarity networks derived from single-cell data. The other one is 'bulk'.
- permutation_times
(integer) Number of permutations, default 2000
- FDR_threshold
(numeric) FDR threshold for identifying phenotype-associated cells, default 0.05
- scAB
- alpha
(numeric) Coefficient of phenotype regularization, default 0.005
- alpha_2
(numeric) Coefficient of cell-cell similarity regularization, default 5e-05
- maxiter
(integer) NMF optimization iterations, default 2000
- tred
(integer) Z-score threshold, default 2
- DEGAS
- sc_data.pheno_colname
(character) Phenotype column name in sc_data, default "NULL"
- select_fraction
(numeric) Fraction of cells to select for DEGAS, default 0.05
- tmp_dir
(character) Temporary directory for DEGAS, default "NULL"
- env_params
(list) Environment parameters for DEGAS, default "list()"
- degas_params
(list) DEGAS parameters, default "list()"
- normality_test_method
(character) Normality test method for DEGAS, default "jarque-bera"
Value
A list containing:
- scRNA_data
Filtered Seurat object with phenotype-associated cells
- Some screen_result
Important information about the screened result related to the selected method
Data Matching Requirements
- matched_bulk column names and phenotype names/rownames must be identical
- Phenotype values must correspond to bulk samples (not directly to single cells)
A built-in pre-run check verifies these requirements; any mismatch triggers an error before analysis begins.
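The built-in check is internal to the package and is not reproduced here. As a rough illustration of the requirement above, the following base-R sketch performs a comparable check before calling the screening function. The helper name check_bulk_phenotype_match() and the toy data are hypothetical, and the sketch only compares the sets of sample IDs; whether the order must also match is not stated on this page.

```r
# Hypothetical helper (not part of the package): verify that the sample IDs in
# matched_bulk and phenotype are the same set, as required above.
check_bulk_phenotype_match <- function(matched_bulk, phenotype) {
  bulk_ids <- colnames(matched_bulk)
  # Named vector -> names(); data frame -> rownames()
  pheno_ids <- if (is.data.frame(phenotype)) rownames(phenotype) else names(phenotype)
  if (is.null(bulk_ids) || is.null(pheno_ids)) {
    stop("matched_bulk columns and phenotype must both carry sample IDs.",
         call. = FALSE)
  }
  missing_in_pheno <- setdiff(bulk_ids, pheno_ids)
  missing_in_bulk <- setdiff(pheno_ids, bulk_ids)
  if (length(missing_in_pheno) > 0 || length(missing_in_bulk) > 0) {
    stop("Sample ID mismatch. Missing in phenotype: ",
         paste(missing_in_pheno, collapse = ", "),
         "; missing in matched_bulk: ",
         paste(missing_in_bulk, collapse = ", "),
         call. = FALSE)
  }
  invisible(TRUE)
}

# Toy inputs: 4 genes x 3 samples, with a binary phenotype as a named vector
matched_bulk <- matrix(
  as.numeric(1:12), nrow = 4,
  dimnames = list(paste0("gene", 1:4), c("S1", "S2", "S3"))
)
phenotype <- c(S1 = 0, S2 = 1, S3 = 1)

check_bulk_phenotype_match(matched_bulk, phenotype)  # passes silently
```

If the IDs agree but the order differs, reordering the phenotype with phenotype[colnames(matched_bulk)] before screening is a cheap safeguard.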
Method Compatibility
| Method | Supported Phenotypes | Additional Parameters |
| --- | --- | --- |
| Scissor | All three types | alpha, cutoff, path2load_scissor_cache, path2save_scissor_inputs, reliability_test, reliability_test.n, reliability_test.nfold, cell_evaluation, cell_evaluation.benchmark_data, cell_evaluation.FDR, cell_evaluation.bootstrap_n |
| scPP | All three types | ref_group, Log2FC_cutoff, estimate_cutoff, probs |
| scPAS | All three types | n_components, assay, imputation, nfeature, alpha, network_class, permutation_times, FDR_threshold, independent |
| scAB | Binary/Survival | alpha, alpha_2, maxiter, tred |
| DEGAS | All three types | sc_data.pheno_colname, select_fraction, tmp_dir, env_params, degas_params, normality_test_method |
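The compatibility table above can be encoded as a small lookup so that an unsupported combination is caught before a long-running analysis starts. This is only a sketch: method_support and is_supported() are hypothetical names, not package functions, and the contents are transcribed from the table.

```r
# Supported phenotype classes per screening method, transcribed from the table.
all_classes <- c("binary", "continuous", "survival")
method_support <- list(
  Scissor = all_classes,
  scPP    = all_classes,
  scPAS   = all_classes,
  scAB    = c("binary", "survival"),  # no continuous support
  DEGAS   = all_classes
)

# Hypothetical helper: TRUE if screen_method supports phenotype_class
is_supported <- function(screen_method, phenotype_class) {
  if (!screen_method %in% names(method_support)) {
    stop("Unknown screen_method: ", screen_method, call. = FALSE)
  }
  phenotype_class %in% method_support[[screen_method]]
}

is_supported("scAB", "continuous")   # FALSE
is_supported("Scissor", "survival")  # TRUE
```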