Label Survival-Associated Phenotype Cells Based on Hazard Scores
Source: R/25-DEGAS_Screen.R
Classifies cells into survival-associated phenotype groups ("Positive" vs. "Other") from their hazard scores. The function tests the distribution of hazard scores for normality and chooses a thresholding strategy accordingly, so that cells with significantly elevated hazard scores, which may be associated with survival outcomes, are labeled "Positive".
Usage
LabelSurvivalCells(
  pred_dt,
  select_fraction,
  test_method,
  min_threshold = 0.7,
  verbose = TRUE
)
Arguments
- pred_dt
A data.table containing hazard scores for cells. Must contain a column named 'Hazard' with numeric hazard scores.
- select_fraction
Numeric value between 0 and 1 specifying the target fraction of cells to classify as "Positive". The actual fraction may be adjusted based on distribution characteristics and minimum threshold constraints.
- test_method
Character string specifying the statistical test used to assess the normality of the hazard scores. One of "jarque-bera", "d'agostino", or "kolmogorov-smirnov".
- verbose
Logical; whether to print messages. Defaults to TRUE.
Value
The input pred_dt with an additional column:
- label
Character vector with cell classifications: "Positive" (high-hazard cells) or "Other".
Details
Classification Strategies:
- Non-normal distributions (p-value < 0.05): Uses quantile-based selection, where the top select_fraction of cells by hazard score are classified as "Positive", subject to minimum threshold constraints.
- Normal distributions (p-value ≥ 0.05): Uses normal distribution quantiles to determine the classification threshold, adjusted to meet minimum requirements.
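The two strategies can be illustrated with a minimal base-R sketch. This is not the package's implementation: the helper name sketch_label_hazard and its alpha argument are hypothetical, the normality check uses stats::ks.test() against a normal distribution with the sample mean and standard deviation, and the minimum threshold adjustment is left out because this page does not describe how it is applied.

```r
# Hypothetical sketch of the two thresholding strategies; NOT the package code.
# The min_threshold adjustment is deliberately omitted (not specified here).
sketch_label_hazard <- function(hazard, select_fraction, alpha = 0.05) {
  stopifnot(is.numeric(hazard), select_fraction > 0, select_fraction < 1)

  # Kolmogorov-Smirnov test against a normal distribution parameterised by
  # the sample mean and standard deviation (an assumed choice of test).
  p_value <- suppressWarnings(
    stats::ks.test(hazard, "pnorm", mean(hazard), stats::sd(hazard))$p.value
  )

  if (p_value < alpha) {
    # Non-normal: empirical quantile, so the top select_fraction is selected.
    threshold <- stats::quantile(hazard, probs = 1 - select_fraction,
                                 names = FALSE)
  } else {
    # Normal: theoretical quantile of the fitted normal distribution.
    threshold <- stats::qnorm(1 - select_fraction,
                              mean = mean(hazard), sd = stats::sd(hazard))
  }

  list(
    p_value = p_value,
    threshold = threshold,
    label = ifelse(hazard > threshold, "Positive", "Other")
  )
}

# Exponentially distributed scores are strongly skewed, so the
# quantile-based branch is taken here.
set.seed(1)
skewed <- sketch_label_hazard(rexp(1000, rate = 2), select_fraction = 0.1)
table(skewed$label)
```

With the empirical-quantile branch, the share of "Positive" cells matches select_fraction closely; with the normal-quantile branch it only approximates it, which is one plausible reason the actual fraction may differ from the requested one.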
Note
The function assumes the input data.table contains a column named 'Hazard' with numeric hazard scores from upstream analysis. The minimum threshold (min_threshold in the usage above, default 0.7) is intended to ensure the biological relevance of the identified cell populations.
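Because the function relies on this input assumption, it can help to check the table before calling it. The helper below is a hypothetical sketch (check_hazard_input is not part of the package) that works for both data.frame and data.table inputs.

```r
# Hypothetical pre-flight check; not part of the package.
check_hazard_input <- function(pred_dt) {
  # The column must be present under the exact name 'Hazard'.
  if (!"Hazard" %in% names(pred_dt)) {
    stop("pred_dt must contain a column named 'Hazard'.")
  }
  # The column must hold numeric hazard scores.
  if (!is.numeric(pred_dt[["Hazard"]])) {
    stop("Column 'Hazard' must be numeric.")
  }
  invisible(TRUE)
}

check_hazard_input(data.frame(cell_id = c("cell_1", "cell_2"),
                              Hazard = c(0.12, 0.87)))
```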
See also
jb.test.modified(), dagostino.test(), ks.test() for the underlying
normality tests used in the classification process.
Other DEGAS:
DoDEGAS(),
LabelBinaryCells(),
LabelContinuousCells(),
Vec2sparse(),
predClassBag.optimized(),
readOutputFiles.optimized(),
runCCMTL.optimized(),
runCCMTLBag.optimized(),
writeInputFiles.optimized()
Examples
if (FALSE) { # \dontrun{
library(data.table) # provides data.table() and the dt[i, j] syntax used below

# Create example hazard score data
set.seed(42) # make the simulated scores reproducible
hazard_data <- data.table(
  cell_id = paste0("cell_", 1:1000),
  Hazard = rexp(1000, rate = 2) # Simulated hazard scores
)

# Identify survival-associated cells
result <- LabelSurvivalCells(
  pred_dt = hazard_data,
  select_fraction = 0.1,
  test_method = "jarque-bera"
)

# Check classification results
table(result$label)

# Analyze the hazard scores of positive cells
summary(result[label == "Positive", Hazard])
} # }