Identifies phenotype-associated cell subpopulations in single-cell data using regularized regression on matched bulk expression profiles. Scissor integrates bulk and single-cell RNA-seq data to identify cells that are significantly associated with phenotypic outcomes.
Usage
DoScissor(
  matched_bulk,
  sc_data,
  phenotype,
  label_type = "scissor",
  alpha = c(0.05, NULL),
  cutoff = 0.2,
  family = c("gaussian", "binomial", "cox"),
  reliability_test = list(
    run = FALSE, # whether to run reliability test
    n = 10L, # permutation times
    nfold = 10L # cross validation folds
  ),
  cell_evaluation = list(
    run = FALSE, # whether to run cell evaluation
    benchmark_data = "path_to_file.RData", # path to benchmark data
    FDR_cutoff = 0.05,
    bootstrap_n = 100L
  ),
  path2load_scissor_cache = NULL,
  path2save_scissor_inputs = "Scissor_inputs.RData",
  ...
)
Arguments
- matched_bulk
Normalized bulk expression matrix (features × samples). Column names must match phenotype identifiers.
- sc_data
Seurat object containing single-cell RNA-seq data.
- phenotype
Clinical outcome data. Either a vector named with sample IDs, or a data frame whose row names match the bulk column names.
- label_type
Character specifying phenotype label type (e.g., "SBS1", "time"), stored in scRNA_data@misc.
- alpha
Parameter used to balance the effect of the l1 norm and the network-based penalties. It can be a number or a searching vector. If alpha = NULL, a default searching vector is used. The range of alpha is between 0 and 1. A larger alpha lays more emphasis on the l1 norm.
- cutoff
Cutoff used to determine the optimal alpha when alpha = NULL (default: 0.2). Higher values increase specificity.
- family
Model family for outcome type: "gaussian" for continuous outcomes, "binomial" for binary outcomes (default), "cox" for survival outcomes.
- reliability_test
List controlling reliability testing:
run: Whether to perform reliability test (default: FALSE)
n: Permutation times (default: 10L)
nfold: Cross-validation folds (default: 10L)
- cell_evaluation
List controlling cell evaluation:
run: Whether to perform cell evaluation (default: FALSE)
benchmark_data: Path to benchmark data (RData file)
FDR_cutoff: FDR threshold for evaluation (default: 0.05)
bootstrap_n: Bootstrap iterations (default: 100L)
- path2load_scissor_cache
Path to precomputed Scissor inputs (RData file). If provided, skips recomputation (default: NULL).
- path2save_scissor_inputs
Path to save intermediate files (default: "Scissor_inputs.RData").
- ...
Additional arguments. Currently supports:
verbose: Logical indicating whether to print progress messages. Defaults to TRUE.
seed: Random seed for reproducibility. Defaults to 123L.
assay: Assay to use for single-cell data. Defaults to "RNA".
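As a rough illustration of the input requirements above, the sketch below builds toy inputs in base R and checks that the phenotype names line up with the bulk column names before any call is made. The object names (bulk, pheno, rel_test, seurat_obj), the toy values, and the "response" label are made up for illustration. Reordering the phenotype is a defensive step; whether DoScissor() reorders internally is not stated here. The call itself is left as a not-run comment because it needs a real Seurat object.

```r
# Toy bulk matrix: 4 features x 3 samples (values are arbitrary)
set.seed(1)
bulk <- matrix(
  rnorm(12),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), c("S1", "S2", "S3"))
)

# Binary phenotype as a named vector, deliberately in a different order
pheno <- c(S3 = 1, S1 = 0, S2 = 1)

# Every bulk sample needs exactly one phenotype entry
stopifnot(setequal(names(pheno), colnames(bulk)))

# Reorder the phenotype so it follows the bulk column order
pheno <- pheno[colnames(bulk)]

# Nested list argument, spelled out with the documented fields
rel_test <- list(run = TRUE, n = 10L, nfold = 10L)

## Not run:
# `seurat_obj` is a hypothetical Seurat object prepared elsewhere
# res <- DoScissor(
#   matched_bulk = bulk,
#   sc_data = seurat_obj,
#   phenotype = pheno,
#   label_type = "response",
#   family = "binomial",
#   reliability_test = rel_test
# )
## End(Not run)
```

Checking the names up front gives a clear error at the point of the mismatch, rather than a failure inside the regression step.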
Value
A list containing:
- scRNA_data
A Seurat object with screened cells containing metadata:
- scissor
"Positive"/"Negative"/"Neutral" classification
- label_type
Outcome label used
- scissor_result
Raw Scissor results
- reliability_result
If reliability_test$run = TRUE, contains:
- statistic
A value between 0 and 1
- p
p-value of the test statistic
- AUC_test_real
10 values of AUC for real data
- AUC_test_back
A list of AUC for background data
- cell_evaluation
If cell_evaluation$run = TRUE, contains:
- evaluation_res
A data.frame with some supporting information for each Scissor selected cell
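One way to get a quick overview of the returned classification is to tabulate the scissor labels. The sketch below uses a stand-in for the result: a plain list replaces the Seurat object, so `$scissor` only mimics access to the metadata column of the same name, and the five labels are invented. With a real result the same tabulation would apply to the actual metadata column.

```r
# Stand-in for the returned list (NOT a real Seurat object)
res <- list(
  scRNA_data = list(
    scissor = c("Positive", "Neutral", "Negative", "Neutral", "Positive")
  )
)

# Fix the level order so empty classes still show up as zero counts
labels <- factor(
  res$scRNA_data$scissor,
  levels = c("Positive", "Negative", "Neutral")
)
counts <- table(labels)
print(counts)

# Fraction of cells that were selected (Positive or Negative)
selected_fraction <- mean(labels != "Neutral")
print(selected_fraction)
```

Comparing the selected fraction against the cutoff argument is a simple sanity check on how many cells were retained.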
LICENSE
Licensed under the GNU General Public License version 3 (GPL-3.0). A copy of the license is available at https://www.gnu.org/licenses/gpl-3.0.en.html.
References
Sun D, Guan X, Moran AE, Wu LY, Qian DZ, Schedin P, et al. Identifying phenotype-associated subpopulations by integrating bulk and single-cell sequencing data. Nat Biotechnol. 2022 Apr;40(4):527–38.
See also
Other screen_method:
DoDEGAS(),
DoLP_SGL(),
DoPIPET(),
DoSCIPAC(),
DoSIDISH(),
DoscAB(),
DoscPAS(),
DoscPP()
Other scissor:
DoScissorCellEval(),
DoScissorRelTest()