Run DEGAS Analysis for Single-Cell and Bulk RNA-seq Data Integration
Source: R/25-DEGAS_Screen.R
DoDEGAS.Rd
This function performs DEGAS to integrate single-cell and bulk RNA-seq data, identifying phenotype-associated cells using a bootstrap aggregated multi-task learning approach.
Usage
DoDEGAS(
select_fraction = 0.05,
min_thresh = 0.4,
matched_bulk,
sc_data,
phenotype = NULL,
sc_data.pheno_colname = NULL,
label_type = "DEGAS",
phenotype_class = c("binary", "continuous", "survival"),
tmp_dir = "tmp",
env_params = list(),
degas_params = list(),
normality_test_method = c("jarque-bera", "d'agostino", "kolmogorov-smirnov"),
verbose = TRUE,
...
)
Arguments
- select_fraction
The fraction of top-ranked cells to label as Positive, regardless of how much larger the correlation coefficient of the observation group is than that of the control group. Only used when phenotype_class is "binary" or "survival". (default: 0.05)
- min_thresh
DEGAS calculates, for each cell, correlation coefficients with the phenotype. When the coefficient of the observation group is at least min_thresh larger than that of the control group, the cell is considered related to the phenotype and is marked as Positive. min_thresh takes priority over select_fraction. (default: 0.4)
- matched_bulk
Bulk RNA-seq data as matrix or data.frame (rows=genes, columns=samples)
- sc_data
Single-cell data as Seurat object containing RNA assay
- phenotype
Bulk-level phenotype data. For classification: binary matrix with one-hot encoding. For survival: matrix with two columns (time and event status). Can be NULL, matrix, data.frame, or vector.
- sc_data.pheno_colname
Column name for single-cell phenotype in metadata (if available)
- label_type
Label type for DEGAS results (default: "DEGAS")
- phenotype_class
Type of phenotype: "binary" (classification), "continuous", or "survival"
- tmp_dir
Temporary directory for intermediate files (default: "tmp")
- env_params
List of environment parameters for Python setup including:
env.name: environment name (default: "r-reticulate-degas")
env.type: environment type, one of "conda", "environment", or "venv" (default: "environment")
env.method: environment setup method, "system" or "conda" (default: "system")
env.file: path to environment file (default: system.file("conda/DEGAS_environment.yml", package = "SigBridgeR"))
env.python_version: Python version (default: "3.9.15")
env.packages: named vector of Python packages and versions (default: c("tensorflow" = "2.4.1", "protobuf" = "3.20", "numpy" = "any"))
env.recreate: whether to recreate environment (default: FALSE)
env.use_conda_forge: whether to use conda-forge channel (conda only, default: TRUE)
env.verbose: verbose output (default: FALSE)
- degas_params
List of DEGAS algorithm parameters including:
DEGAS.model_type: model type ("BlankClass", "ClassBlank", "ClassClass", "ClassCox", "BlankCox")
DEGAS.architecture: "Standard" (feed forward) or "DenseNet" (dense net)
DEGAS.ff_depth: number of layers in model (>=1, default: 3)
DEGAS.pyloc: path to Python executable (default: NULL, automatic detection)
DEGAS.bag_depth: bootstrap aggregation depth (>=1, default: 5)
DEGAS.train_steps: training steps (default: 2000)
DEGAS.scbatch_sz: single-cell batch size (default: 200)
DEGAS.patbatch_sz: patient batch size (default: 50)
DEGAS.hidden_feats: hidden features (default: 50)
DEGAS.do_prc: dropout percentage (default: 0.5)
DEGAS.lambda1: regularization parameter 1 (default: 3.0)
DEGAS.lambda2: regularization parameter 2 (default: 3.0)
DEGAS.lambda3: regularization parameter 3 (default: 3.0)
DEGAS.seed: random seed (default: 2)
- normality_test_method
Method for normality testing: "jarque-bera", "d'agostino", or "kolmogorov-smirnov"
- verbose
Logical, whether to print messages.
- ...
Reserved for future compatibility.
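The input shapes described for matched_bulk and phenotype can be illustrated with toy data. This is a minimal base-R sketch, not taken from the package: the sample IDs, gene names, values, and the column names `time` and `status` are all made up for illustration, and the assumption that bulk samples and phenotype rows should share the same order is ours.

```r
# Toy inputs only; all names and values below are illustrative assumptions.
sample_ids <- paste0("S", 1:4)

# matched_bulk: genes in rows, samples in columns.
bulk_matrix <- matrix(
  c(5.1, 0.0, 2.3, 4.8, 0.2, 2.9, 1.1, 3.3, 0.7, 1.4, 3.0, 0.9),
  nrow = 3,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), sample_ids)
)

# Binary phenotype as a one-hot matrix: one row per sample, one column per class.
group <- c("tumor", "normal", "tumor", "normal")
classes <- sort(unique(group))
bulk_phenotype <- outer(group, classes, "==") * 1L
dimnames(bulk_phenotype) <- list(sample_ids, classes)

# Survival phenotype as a two-column matrix: time first, then event status.
survival_data <- cbind(
  time = c(320, 1045, 210, 880),
  status = c(1, 0, 1, 0)
)
rownames(survival_data) <- sample_ids

# Sanity checks (assumed to be desirable, not documented requirements).
stopifnot(
  all(rowSums(bulk_phenotype) == 1),
  identical(colnames(bulk_matrix), rownames(bulk_phenotype)),
  identical(colnames(bulk_matrix), rownames(survival_data))
)
```

Because phenotype may also be a vector or data.frame, building the one-hot matrix by hand may be unnecessary in practice; see Vec2sparse for the package's own transformation.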
Value
A list containing:
scRNA_data: Seurat object with DEGAS labels added to metadata
model: The model trained on the input data; it can be used to predict cell classifications.
DEGAS_prediction: Data table with DEGAS predictions containing:
Predicted label probabilities for each cell
Cell labels ("Positive"/"Other") based on selection criteria
Difference scores for binary phenotypes
Cell identifiers
Details
The function performs the following steps:
Validates input data and parameters
Sets up Python environment with required dependencies
Trains bootstrap aggregated DEGAS model using runCCMTLBag
Generates cell-level predictions using predClassBag
Applies statistical testing to identify phenotype-associated cells
Labels cells as "Positive" or "Other" based on selection criteria
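The interaction between min_thresh and select_fraction in the final labeling step can be sketched as follows. This is only one plausible reading of the argument descriptions, not the package's implementation: `label_cells_sketch` and `diff_score` are hypothetical names, the sketch assumes that the top-fraction rule applies only when no cell reaches min_thresh, and it skips the statistical testing step entirely.

```r
# Hypothetical helper, not part of SigBridgeR.
# diff_score: per-cell difference between the observation-group and
# control-group coefficients (assumed to be precomputed).
label_cells_sketch <- function(diff_score, select_fraction = 0.05, min_thresh = 0.4) {
  stopifnot(
    is.numeric(diff_score), length(diff_score) > 0,
    select_fraction > 0, select_fraction <= 1
  )
  labels <- rep("Other", length(diff_score))
  above <- diff_score >= min_thresh
  if (any(above)) {
    # Assumed priority rule: the threshold decides whenever any cell passes it.
    labels[above] <- "Positive"
  } else {
    # Assumed fallback: label the top fraction of cells, at least one cell.
    n_top <- max(1, ceiling(select_fraction * length(diff_score)))
    top_idx <- order(diff_score, decreasing = TRUE)[seq_len(n_top)]
    labels[top_idx] <- "Positive"
  }
  labels
}

# Two cells clear the 0.4 threshold, so only those are labeled Positive.
label_cells_sketch(c(0.9, 0.1, -0.2, 0.45, 0.0))

# No cell clears the threshold, so the single top-ranked cell is labeled.
label_cells_sketch(c(0.1, 0.3, -0.2, 0.05))
```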
Model type is automatically determined:
BlankClass: only bulk phenotype specified (scLab = NULL)
ClassBlank: only single-cell phenotype specified (patLab = NULL)
ClassClass: both single-cell and bulk phenotypes specified
ClassCox: single-cell phenotype + bulk survival data
BlankCox: only bulk survival data specified
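The mapping above amounts to concatenating a single-cell part ("Class" or "Blank") with a bulk part ("Class", "Cox", or "Blank"). A minimal sketch of that rule is given below; `infer_model_type_sketch` is a hypothetical helper rather than a package function, and it covers only the five listed combinations, so continuous phenotypes are deliberately left out.

```r
# Hypothetical helper illustrating the documented mapping; not part of SigBridgeR.
infer_model_type_sketch <- function(has_sc_pheno, has_bulk_pheno,
                                    phenotype_class = c("binary", "survival")) {
  phenotype_class <- match.arg(phenotype_class)
  if (!has_sc_pheno && !has_bulk_pheno) {
    stop("At least one of the single-cell or bulk phenotypes must be specified.")
  }
  # First half: single-cell labels present or absent.
  sc_part <- if (has_sc_pheno) "Class" else "Blank"
  # Second half: bulk labels absent, survival (Cox), or classification.
  bulk_part <- if (!has_bulk_pheno) {
    "Blank"
  } else if (phenotype_class == "survival") {
    "Cox"
  } else {
    "Class"
  }
  paste0(sc_part, bulk_part)
}

infer_model_type_sketch(has_sc_pheno = FALSE, has_bulk_pheno = TRUE)
infer_model_type_sketch(has_sc_pheno = TRUE, has_bulk_pheno = TRUE, phenotype_class = "survival")
```

The first call corresponds to BlankClass and the second to ClassCox in the list above.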
References
Johnson TS, Yu CY, Huang Z, Xu S, Wang T, Dong C, et al. Diagnostic Evidence GAuge of Single cells (DEGAS): a flexible deep transfer learning framework for prioritizing cells in relation to disease. Genome Med. 2022 Feb 1;14(1):11.
See also
Vec2sparse for the structure transformation of phenotype
jb.test.modified for modified Jarque-Bera test
mad.test for outlier detection using Median Absolute Deviation
runCCMTLBag.optimized for DEGAS model training
predClassBag.optimized for DEGAS model prediction
LabelBinaryCells for binary classification
LabelSurvivalCells for survival classification
LabelContinuousCells for continuous phenotype labeling
Other screen_method:
DoScissor(),
DoscAB(),
DoscPAS(),
DoscPP()
Other DEGAS:
LabelBinaryCells(),
LabelContinuousCells(),
LabelSurvivalCells(),
Vec2sparse(),
predClassBag.optimized(),
readOutputFiles.optimized(),
runCCMTL.optimized(),
runCCMTLBag.optimized(),
writeInputFiles.optimized()
Examples
if (FALSE) { # \dontrun{
# Binary classification example
result <- DoDEGAS(
select_fraction = 0.05, # `select_fraction` only used in binary and survival phenotyping
matched_bulk = bulk_matrix,
sc_data = seurat_obj,
phenotype = bulk_phenotype,
phenotype_class = "binary"
)
# Survival analysis example
result <- DoDEGAS(
select_fraction = 0.05, # `select_fraction` only used in binary and survival phenotyping
matched_bulk = bulk_matrix,
sc_data = seurat_obj,
phenotype = survival_data,
phenotype_class = "survival"
)
} # }
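The env_params and degas_params arguments can be passed in the same way. The sketch below uses only entry names listed under Arguments, with values equal to or within the documented options; it assumes, without confirmation from the source, that entries omitted from either list fall back to their defaults. As in the examples above, `bulk_matrix`, `seurat_obj`, and `bulk_phenotype` are placeholders, so the call itself is guarded and not run.

```r
# Override a few documented entries; omitted entries are assumed to keep
# their defaults (an assumption, not verified against the implementation).
env_params <- list(
  env.name = "r-reticulate-degas",
  env.recreate = FALSE,
  env.verbose = TRUE
)
degas_params <- list(
  DEGAS.architecture = "DenseNet",
  DEGAS.bag_depth = 5,
  DEGAS.train_steps = 2000,
  DEGAS.seed = 2
)

if (FALSE) {
  # Not run: requires SigBridgeR, a Python environment, and real input data.
  result <- DoDEGAS(
    matched_bulk = bulk_matrix,
    sc_data = seurat_obj,
    phenotype = bulk_phenotype,
    phenotype_class = "binary",
    env_params = env_params,
    degas_params = degas_params
  )
}
```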