Trains an ensemble of CCMTL models by bootstrap aggregation (bagging) to improve robustness and reduce prediction variance. Each model in the ensemble is trained with a different random seed derived from DEGAS.seed, and the trained models are collected into a single list.

Usage

runCCMTLBag.optimized(
  scExp,
  scLab,
  patExp,
  patLab,
  tmpDir,
  model_type,
  architecture,
  FFdepth,
  Bagdepth,
  DEGAS.seed,
  verbose = TRUE
)

Arguments

scExp

A matrix or data frame containing single-cell expression data for model training.

scLab

A matrix containing single-cell labels corresponding to the expression data.

patExp

A matrix or data frame containing patient-level expression data for multi-task learning.

patLab

A matrix containing patient-level labels corresponding to the patient expression data.

tmpDir

Character string specifying the temporary directory path for storing intermediate files and model outputs.

model_type

Character string specifying the type of model to train. Should match available DEGAS model types.

architecture

Character string specifying the neural network architecture. One of: "DenseNet", "Standard".

FFdepth

Integer specifying the number of layers in the feed-forward network architecture.

Bagdepth

Integer specifying the number of bootstrap models to train in the ensemble.

DEGAS.seed

Integer specifying the base random seed for reproducible model training. Each model in the ensemble uses a derived seed.

verbose

Logical. If TRUE (the default), progress messages are printed during ensemble training.

Value

A list of trained CCMTL model objects, one per successful training run. Models whose training failed are omitted, so the list may contain fewer than Bagdepth elements.

Details

This function implements bootstrap aggregated training (bagging) for CCMTL models with the following features:

Ensemble Training:

  • Trains multiple models with different random seeds derived from the base seed

  • Uses parallel-safe file management to avoid I/O conflicts

  • Handles errors per model (see Error Handling below)
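
One common way to keep file output from concurrent or repeated training runs separate is to give each ensemble member its own subdirectory under tmpDir. The sketch below illustrates that idea only. The helper model_dir and the "model_001" naming scheme are hypothetical and are not taken from the DEGAS source, which may organize its files differently.

```r
# Hypothetical helper (not part of DEGAS): create and return a dedicated
# working directory for ensemble member `index` under `tmp_dir`.
model_dir <- function(tmp_dir, index) {
  path <- file.path(tmp_dir, sprintf("model_%03d", index))
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  path
}

# Use the session temp directory so the sketch leaves no stray files behind.
tmp_dir <- file.path(tempdir(), "degas_models")
dirs <- vapply(1:3, function(i) model_dir(tmp_dir, i), character(1))

# Each model writes only inside its own directory, so intermediate files
# from different models cannot overwrite one another.
print(basename(dirs))
```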

Error Handling:

  • Continues training even if individual models fail

  • Returns only successfully trained models

  • Provides progress feedback for long-running ensemble training
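
The behaviour listed above can be sketched with base R's tryCatch. This is a minimal illustration under stated assumptions, not the package's implementation: train_one is a hypothetical stand-in for a single CCMTL training run, and it fails on purpose for one seed to simulate a failed model.

```r
# Hypothetical stand-in for one training run (not a DEGAS function).
# It fails for seed 44 to simulate a model that cannot be trained.
train_one <- function(seed) {
  if (seed == 44) stop("simulated training failure")
  list(seed = seed)
}

train_ensemble <- function(base_seed, bag_depth, verbose = TRUE) {
  results <- lapply(seq_len(bag_depth), function(i) {
    if (verbose) message("Training model ", i, " of ", bag_depth)
    tryCatch(
      train_one(base_seed + i),
      error = function(e) {
        # Report the failure and carry on with the next model.
        if (verbose) message("Model ", i, " failed: ", conditionMessage(e))
        NULL
      }
    )
  })
  # Keep only the runs that completed successfully.
  Filter(Negate(is.null), results)
}

models <- train_ensemble(base_seed = 42, bag_depth = 5, verbose = FALSE)
length(models)  # one of the five simulated runs failed, so 4 remain
```

Because failed runs are dropped, callers may want to compare the length of the returned list with Bagdepth before relying on the ensemble.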

Note

The bootstrap aggregation process can be computationally intensive, especially for large datasets or deep architectures. The function creates derived seeds for each model (base seed + model index) to ensure reproducibility while maintaining diversity in the ensemble.
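
The seed derivation described in this note amounts to simple arithmetic, sketched below. The helper derive_seeds is hypothetical, and whether the model index starts at 1 or 0 in the package is an assumption here.

```r
# Hypothetical helper (not part of DEGAS): one seed per ensemble member,
# computed as base seed + model index, with indices assumed to start at 1.
derive_seeds <- function(base_seed, bag_depth) {
  base_seed + seq_len(bag_depth)
}

seeds <- derive_seeds(base_seed = 42, bag_depth = 10)

# The seeds are all distinct, which gives each model a different random
# initialization, and the same base seed always yields the same set.
print(seeds)
```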

References

Johnson TS, Yu CY, Huang Z, Xu S, Wang T, Dong C, et al. Diagnostic Evidence GAuge of Single cells (DEGAS): a flexible deep transfer learning framework for prioritizing cells in relation to disease. Genome Med. 2022 Feb 1;14(1):11.

Examples

if (FALSE) { # \dontrun{
# Train an ensemble of 10 CCMTL models
ensemble_models <- runCCMTLBag.optimized(
  scExp = sc_expression,
  scLab = sc_labels,
  patExp = patient_expression,
  patLab = patient_labels,
  tmpDir = "/tmp/degas_models",
  model_type = "classification",
  architecture = "DenseNet",
  FFdepth = 3,
  Bagdepth = 10,
  DEGAS.seed = 42
)

# Access individual models from the ensemble
first_model <- ensemble_models[[1]]
} # }