Label Binary Phenotype Cells Based on Prediction Scores with Minimum Threshold
Source: R/25-DEGAS_Screen.R
LabelBinaryCells.Rd
Classifies cells into binary phenotype groups ("Positive" vs "Other") based on prediction score differences between two phenotypic conditions. This enhanced version includes a minimum threshold constraint to ensure biological relevance and provides detailed reporting on classification outcomes.
Usage
LabelBinaryCells(
  pred_dt,
  pheno_colnames,
  select_fraction,
  test_method,
  min_threshold = 0.7,
  verbose = TRUE
)
Arguments
- pred_dt
A data.table containing prediction scores for two phenotypic conditions. Must contain the columns specified in pheno_colnames.
- pheno_colnames
Character vector of length 2 specifying the column names for the two phenotypic conditions to compare. The second element is used as the reference group if not found with regex matching.
- select_fraction
Numeric value between 0 and 1 specifying the fraction of cells to classify as "Positive". Default selection depends on the distribution characteristics.
- test_method
Character string specifying the statistical test to use for normality assessment. One of: "jarque-bera", "d'agostino", "kolmogorov-smirnov".
- min_threshold
Numeric value specifying the minimum score difference required for a cell to be considered "Positive". This ensures biological relevance by filtering out weak associations. Default: 0.7.
- verbose
Logical, whether to print messages.
Value
The input pred_dt with additional columns, including:
- diff: Numeric vector of score differences between the two conditions
- label: Character vector with cell classifications: "Positive" or "Other"
The function also provides detailed console output about the classification process and results.
Details
This function classifies cells in a way that adapts to the distribution of the prediction score differences while enforcing a minimum threshold for biological significance:
Classification Strategies with Minimum Threshold:
- Non-normal distributions (p-value < 0.05): Uses quantile-based selection where the top select_fraction of cells by score difference are classified as "Positive", but only if they exceed min_threshold.
- Normal distributions (p-value >= 0.05): Uses normal distribution quantiles to determine the classification threshold, adjusted upward if necessary to meet the minimum threshold requirement.
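The two strategies above can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: the helper name sketch_label and its is_normal argument are hypothetical, and fitting the normal quantile with the sample mean and standard deviation is an assumption about how the "normal distribution quantiles" might be computed.

```r
# Hypothetical helper (not part of the package): illustrates the two
# classification strategies and the minimum threshold constraint.
sketch_label <- function(score_diff, select_fraction, min_threshold, is_normal) {
  if (is_normal) {
    # Normal case: upper quantile of a normal fitted to the differences
    # (assumption: parameters estimated by sample mean and sd)
    cutoff <- qnorm(1 - select_fraction,
                    mean = mean(score_diff), sd = sd(score_diff))
  } else {
    # Non-normal case: empirical quantile of the differences
    cutoff <- unname(quantile(score_diff, probs = 1 - select_fraction))
  }
  # Raise the cutoff if it falls below the minimum threshold
  cutoff <- max(cutoff, min_threshold)
  ifelse(score_diff > cutoff, "Positive", "Other")
}

set.seed(1)
score_diff <- rnorm(1000, mean = 0, sd = 0.5)
labels <- sketch_label(score_diff, select_fraction = 0.1,
                       min_threshold = 0.7, is_normal = TRUE)
table(labels)
```

In this sketch every cell labelled "Positive" has a score difference above min_threshold, whichever branch produced the initial cutoff.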
Note
The minimum threshold parameter (min_threshold) helps prevent
over-interpretation of weak phenotypic associations and ensures that classified
cells show substantial differences between conditions. The function provides
comprehensive feedback about threshold adjustments and final classification
statistics.
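One consequence of the minimum threshold is that the fraction of cells actually labelled "Positive" can be smaller than select_fraction. The sketch below shows this with simulated score differences; the variable names are illustrative and the cutoff logic is an assumption, not the function's own code.

```r
set.seed(123)
# Simulated score differences: difference of two uniform scores
score_diff <- runif(1000) - runif(1000)

select_fraction <- 0.1
min_threshold <- 0.7

# Cutoff that would select the top 10% of cells by score difference
quantile_cutoff <- unname(quantile(score_diff, probs = 1 - select_fraction))

# Cutoff after enforcing the minimum threshold
final_cutoff <- max(quantile_cutoff, min_threshold)

# Fraction selected before and after applying the minimum threshold
frac_quantile <- mean(score_diff > quantile_cutoff)
frac_final <- mean(score_diff > final_cutoff)
c(quantile = frac_quantile, thresholded = frac_final)
```

For the difference of two independent uniform scores, the probability of exceeding 0.7 is (1 - 0.7)^2 / 2 = 0.045, so roughly 4.5% of cells would be expected to pass the threshold rather than the requested 10%.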
See also
jb.test.modified(), dagostino.test(), ks.test() for the underlying
normality tests used in the classification process.
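As a rough illustration of the normality check, the sketch below uses only ks.test() from base R's stats package. How LabelBinaryCells calls the test internally is not documented here, so the choice of reference distribution and parameters is an assumption; jb.test.modified() and dagostino.test() are not shown because their signatures are not given on this page.

```r
set.seed(42)
# Simulated score differences (illustrative data only)
score_diff <- runif(1000) - runif(1000)

# One-sample Kolmogorov-Smirnov test against a normal distribution whose
# mean and sd are estimated from the data (assumption; estimating the
# parameters from the same sample makes the p-value approximate)
ks_res <- ks.test(score_diff, "pnorm",
                  mean = mean(score_diff), sd = sd(score_diff))

# Decision rule described in Details: p-value >= 0.05 is treated as normal
is_normal <- ks_res$p.value >= 0.05
ks_res$p.value
is_normal
```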
Other DEGAS:
DoDEGAS(),
LabelContinuousCells(),
LabelSurvivalCells(),
Vec2sparse(),
predClassBag.optimized(),
readOutputFiles.optimized(),
runCCMTL.optimized(),
runCCMTLBag.optimized(),
writeInputFiles.optimized()
Examples
if (FALSE) { # \dontrun{
library(data.table)

# Create example prediction data (seeded for reproducibility)
set.seed(1)
pred_data <- data.table(
  condition_A = runif(1000),
  condition_B = runif(1000)
)

# Classify cells using D'Agostino test with minimum threshold
result <- LabelBinaryCells(
  pred_dt = pred_data,
  pheno_colnames = c("condition_A", "condition_B"),
  select_fraction = 0.1,
  test_method = "d'agostino",
  min_threshold = 0.7
)

# Check classification results
table(result$label)

# View the actual fraction of positive cells
prop.table(table(result$label))
} # }