Optimized Bootstrap Aggregation for Cross-Condition Multi-Task Learning
Source: R/25-DEGAS_Screen.R, runCCMTLBag.optimized.Rd
Performs bootstrap aggregated (bagged) training of multiple CCMTL models to improve robustness and reduce the variance of predictions. This function trains an ensemble of models, each with a different random seed, and collects the results.
Usage
runCCMTLBag.optimized(
  scExp,
  scLab,
  patExp,
  patLab,
  tmpDir,
  model_type,
  architecture,
  FFdepth,
  Bagdepth,
  DEGAS.seed,
  verbose = TRUE
)
Arguments
- scExp
A matrix or data frame containing single-cell expression data for model training.
- scLab
A matrix containing single-cell labels corresponding to the expression data.
- patExp
A matrix or data frame containing patient-level expression data for multi-task learning.
- patLab
A matrix containing patient-level labels corresponding to the patient expression data.
- tmpDir
Character string specifying the temporary directory path for storing intermediate files and model outputs.
- model_type
Character string specifying the type of model to train. Should match available DEGAS model types.
- architecture
Character string specifying the neural network architecture. One of: "DenseNet", "Standard".
- FFdepth
Integer specifying the number of layers in the feed-forward network architecture.
- Bagdepth
Integer specifying the number of bootstrap models to train in the ensemble.
- DEGAS.seed
Integer specifying the base random seed for reproducible model training. Each model in the ensemble uses a seed derived from it (base seed + model index).
- verbose
Logical; whether to print messages. Defaults to TRUE.
Value
Returns a list of trained CCMTL model objects produced by the bootstrap aggregation process. Failed training attempts are caught, and only successfully trained models are included, so the list may contain fewer than Bagdepth elements.
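The error-handling behaviour described above can be sketched in base R as follows. This is a hypothetical illustration, not the function's actual code: fit_one() is an invented stand-in for training one CCMTL model (it deliberately fails for one seed), and catching errors with tryCatch() and dropping NULL results with Filter() is an assumed way of keeping only successful models.

```r
# Hypothetical stand-in for training one CCMTL model (not part of DEGAS).
# It fails for seed 44 to mimic a training error.
fit_one <- function(seed) {
  if (seed == 44) stop("simulated training failure")
  set.seed(seed)
  list(seed = seed, weights = rnorm(3))
}

seeds <- 42:46

# Catch errors per model so one failure does not abort the whole ensemble;
# a failed attempt is recorded as NULL.
results <- lapply(seeds, function(s) {
  tryCatch(
    fit_one(s),
    error = function(e) {
      message("Model with seed ", s, " failed: ", conditionMessage(e))
      NULL
    }
  )
})

# Keep only successfully trained models
models <- Filter(Negate(is.null), results)
length(models)  # 4: one of the five attempts failed
```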
Details
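This function implements bootstrap aggregated training (bagging) for CCMTL models. It trains Bagdepth models in turn (following the purrr::map() iteration pattern noted under See also), gives each model its own seed derived from DEGAS.seed, and handles failed training attempts so that a single failure does not stop the ensemble.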
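For readers unfamiliar with bagging, the sketch below shows the general technique in base R: each ensemble member is fitted on a bootstrap resample of the rows drawn under its own derived seed, and the members' predictions are averaged. It uses lm() on a toy data set purely for illustration; bag_lm() and predict_bag() are hypothetical helpers, and nothing here reproduces the CCMTL networks or how DEGAS aggregates predictions.

```r
# Generic bagging sketch (not the DEGAS/CCMTL implementation).
bag_lm <- function(data, formula, n_models, base_seed) {
  lapply(seq_len(n_models), function(i) {
    set.seed(base_seed + i)                    # assumed derived seed: base + index
    idx <- sample(nrow(data), replace = TRUE)  # bootstrap resample of rows
    lm(formula, data = data[idx, , drop = FALSE])
  })
}

# Aggregate by averaging the members' predictions
predict_bag <- function(models, newdata) {
  Reduce(`+`, lapply(models, predict, newdata = newdata)) / length(models)
}

# Toy data: y is roughly 2 * x
set.seed(1)
toy <- data.frame(x = 1:50)
toy$y <- 2 * toy$x + rnorm(50)

ens <- bag_lm(toy, y ~ x, n_models = 10, base_seed = 42)
pred <- predict_bag(ens, data.frame(x = c(10, 20)))
```

Averaging over models fitted to different resamples is what reduces the variance of the ensemble's predictions relative to a single model.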
Note
The bootstrap aggregation process can be computationally intensive: because Bagdepth separate models are trained, runtime grows with Bagdepth as well as with dataset size and network depth (FFdepth). The function derives a seed for each model (base seed + model index) to ensure reproducibility while maintaining diversity in the ensemble.
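The derived-seed scheme can be illustrated with a short base R sketch. It assumes seed_i = DEGAS.seed + i with the model index i starting at 1; the exact offset used internally is not stated on this page, and derive_seeds() and draw_for_seed() are hypothetical helpers. Rerunning with the same base seed reproduces every member, while distinct seeds give the members different random draws.

```r
# Assumption: seed for model i is base_seed + i, with i starting at 1.
derive_seeds <- function(base_seed, n_models) base_seed + seq_len(n_models)

# Stand-in for the random parts of training one model
draw_for_seed <- function(seed) {
  set.seed(seed)
  sample(100, 3)
}

seeds <- derive_seeds(42, 3)  # 43 44 45
run1 <- lapply(seeds, draw_for_seed)
run2 <- lapply(seeds, draw_for_seed)

identical(run1, run2)  # TRUE: same base seed reproduces every member
# run1[[1]], run1[[2]] and run1[[3]] come from different seeds,
# which is what provides diversity across the ensemble.
```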
References
Johnson TS, Yu CY, Huang Z, Xu S, Wang T, Dong C, et al. Diagnostic Evidence GAuge of Single cells (DEGAS): a flexible deep transfer learning framework for prioritizing cells in relation to disease. Genome Med. 2022 Feb 1;14(1):11.
See also
runCCMTL.optimized() for single-model training,
purrr::map() for the iterative execution pattern.
Other DEGAS:
DoDEGAS(),
LabelBinaryCells(),
LabelContinuousCells(),
LabelSurvivalCells(),
Vec2sparse(),
predClassBag.optimized(),
readOutputFiles.optimized(),
runCCMTL.optimized(),
writeInputFiles.optimized()
Examples
if (FALSE) { # \dontrun{
# Train an ensemble of 10 CCMTL models
ensemble_models <- runCCMTLBag.optimized(
scExp = sc_expression,
scLab = sc_labels,
patExp = patient_expression,
patLab = patient_labels,
tmpDir = "/tmp/degas_models",
model_type = "ClassClass",
architecture = "DenseNet",
FFdepth = 3,
Bagdepth = 10,
DEGAS.seed = 42
)
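# Failed training attempts are excluded, so this can be smaller than Bagdepth
length(ensemble_models)
# Access individual models from the ensemble
first_model <- ensemble_models[[1]]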
} # }