Classifies cells into binary phenotype groups ("Positive" vs "Other") based on prediction score differences between two phenotypic conditions. This enhanced version includes a minimum threshold constraint to ensure biological relevance and provides detailed reporting on classification outcomes.

Usage

LabelBinaryCells(
  pred_dt,
  pheno_colnames,
  select_fraction,
  test_method,
  min_threshold = 0.7,
  verbose = TRUE
)

Arguments

pred_dt

A data.table containing prediction scores for two phenotypic conditions. Must contain columns specified in pheno_colnames.

pheno_colnames

Character vector of length 2 specifying the column names for the two phenotypic conditions to compare. The second element is used as the reference group if not found with regex matching.

select_fraction

Numeric value between 0 and 1 specifying the fraction of cells to classify as "Positive". How this fraction is converted into a cutoff depends on the distribution of the score differences (see Details).

test_method

Character string specifying the statistical test to use for normality assessment. One of: "jarque-bera", "d'agostino", "kolmogorov-smirnov".

min_threshold

Numeric value specifying the minimum score difference required for a cell to be considered "Positive". This ensures biological relevance by filtering out weak associations. Default: 0.7.

verbose

Logical. Whether to print progress and summary messages to the console. Default: TRUE.

Value

The input pred_dt with the following additional columns:

  • diff - Numeric vector of score differences between the two conditions

  • label - Character vector with cell classifications: "Positive" or "Other"

The function also provides detailed console output about the classification process and results.

Details

This function classifies cells by first testing whether the prediction score differences are normally distributed, then choosing a cutoff strategy that matches the result. In both cases a minimum threshold is enforced so that only cells with a substantial score difference are labeled "Positive":

Classification Strategies with Minimum Threshold:

  • Non-normal distributions (normality test p-value < 0.05): Uses quantile-based selection, where the top select_fraction of cells by score difference are classified as "Positive", but only if their difference exceeds min_threshold

  • Normal distributions (normality test p-value >= 0.05): Uses quantiles of the normal distribution to determine the classification cutoff, which is raised to min_threshold if it would otherwise fall below it
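The two strategies above can be sketched in base R as follows. This is an illustration under assumptions, not the package's implementation: `sketch_labels` is a hypothetical helper, the normal cutoff is taken from a normal distribution fitted with the sample mean and standard deviation, and the minimum threshold is applied as a floor on the cutoff with a strict comparison.

```r
# Hypothetical helper illustrating the cutoff logic described above.
# It is NOT the code used inside LabelBinaryCells().
sketch_labels <- function(diff, select_fraction, min_threshold, is_normal) {
  if (is_normal) {
    # Upper-tail quantile of a normal fitted to the differences (assumption)
    cutoff <- qnorm(1 - select_fraction, mean = mean(diff), sd = sd(diff))
  } else {
    # Empirical quantile: the top select_fraction of cells lie above it
    cutoff <- quantile(diff, probs = 1 - select_fraction, names = FALSE)
  }
  # Enforce the minimum threshold as a floor on the cutoff
  cutoff <- max(cutoff, min_threshold)
  ifelse(diff > cutoff, "Positive", "Other")
}

# Usage on simulated score differences
set.seed(1)
d <- runif(1000) - runif(1000)
table(sketch_labels(d, select_fraction = 0.1, min_threshold = 0.7,
                    is_normal = FALSE))
```

Because the floor can only raise the cutoff, the realized fraction of "Positive" cells in this sketch is at most roughly select_fraction and may be much smaller.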

Supported Normality Tests:

  • Jarque-Bera: Tests for skewness and kurtosis deviations from normality

  • D'Agostino: Extended normality test focusing on skewness

  • Kolmogorov-Smirnov: Non-parametric test comparing empirical distribution to normal distribution
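For intuition about what these tests measure, the sketch below computes two of them with base R only. The helper names `jarque_bera_p` and `ks_normal_p` are hypothetical, and the package may rely on different implementations. The Jarque-Bera statistic is computed from sample skewness and kurtosis and compared to a chi-squared distribution with 2 degrees of freedom; the Kolmogorov-Smirnov test uses `stats::ks.test` against a normal with the sample mean and standard deviation.

```r
# Hypothetical helpers for illustration; not taken from the package.

# Jarque-Bera: combines squared skewness and excess kurtosis
jarque_bera_p <- function(x) {
  n <- length(x)
  z <- x - mean(x)
  m2 <- mean(z^2)
  skew <- mean(z^3) / m2^1.5
  kurt <- mean(z^4) / m2^2
  stat <- n / 6 * (skew^2 + (kurt - 3)^2 / 4)
  # Asymptotically chi-squared with 2 degrees of freedom under normality
  pchisq(stat, df = 2, lower.tail = FALSE)
}

# Kolmogorov-Smirnov against a normal with estimated parameters
ks_normal_p <- function(x) {
  ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$p.value
}

set.seed(42)
skewed <- rexp(2000)          # clearly non-normal sample
jarque_bera_p(skewed) < 0.05  # would select the quantile-based strategy
ks_normal_p(skewed) < 0.05
```

Note that estimating the mean and standard deviation from the same data makes the Kolmogorov-Smirnov p-value conservative, so the tests can disagree on borderline samples.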

Note

The minimum threshold parameter (min_threshold) helps prevent over-interpretation of weak phenotypic associations and ensures that classified cells show substantial differences between conditions. The function reports any threshold adjustments and the final classification statistics.

Examples

if (FALSE) { # \dontrun{
library(data.table)

# Create example prediction data
pred_data <- data.table(
  condition_A = runif(1000),
  condition_B = runif(1000)
)

# Classify cells using D'Agostino test with minimum threshold
result <- LabelBinaryCells(
  pred_dt = pred_data,
  pheno_colnames = c("condition_A", "condition_B"),
  select_fraction = 0.1,
  test_method = "d'agostino",
  min_threshold = 0.7
)

# Check classification results
table(result$label)

# View the actual fraction of positive cells
prop.table(table(result$label))
} # }
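With uniform scores like those in the example, the minimum threshold is likely to be the binding constraint rather than select_fraction. The base-R sketch below illustrates why, under the assumption that the score difference is simply one column minus the other (the direction does not matter here because the difference is symmetric). For two independent uniform scores the difference D has a triangular density on [-1, 1], so

P(D > c) = (1 - c)^2 / 2 for 0 <= c <= 1,

which gives about 0.045 at c = 0.7, while the top-10% cutoff sits near 1 - sqrt(0.2), roughly 0.55.

```r
# Simulation of the score difference for two independent uniform scores.
# Assumption: diff is one score column minus the other.
set.seed(1)
d <- runif(1e5) - runif(1e5)

# Cutoff that would select the top 10% of cells
quantile_cutoff <- quantile(d, probs = 0.9, names = FALSE)

# Fraction of cells that exceed the 0.7 minimum threshold
frac_above_floor <- mean(d > 0.7)

quantile_cutoff
frac_above_floor
```

In this setting the realized "Positive" fraction would be expected to fall well below the requested 0.1, which is why the example inspects prop.table(table(result$label)).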