This guide explains how to extend SigBridgeR with custom algorithms.

Installation

We recommend installing the following packages, which are used to check the code:

pak::pkg_install(c(
  "tictoc",
  "yonicd/tidycheckUsage",
  "codetools",
  "knitr",
  "lintr"
))

Prepare a custom function

As of v3.2.0, SigBridgeR supports registering custom algorithms for screening phenotype-associated cells. Let’s walk through a detailed example:

my_screen_function <- function(
  matched_bulk,
  sc_data,
  phenotype,
  label_type = NULL,
  phenotype_class = c("binary", "survival", "continuous"),
  ...
) {
  dots <- list(...)
  verbose <- dots$verbose %||% TRUE
  # Placeholder logic: label the first half of the cells "Positive" and the rest "Negative"
  modified_sc_data <- SeuratObject::AddMetaData(
    sc_data,
    c(
      rep("Positive", floor(ncol(sc_data) / 2)),
      rep("Negative", ceiling(ncol(sc_data) / 2))
    ),
    col.name = "my_method"
  ) %>%
    # record parameters
    AddMisc(
      my_method_label = label_type,
      phenotype = phenotype_class
    )

  intermediate_var <- "value"

  if (verbose) {
    cli::cli_alert_success("my_screen_function finished")
  }

  list(
    scRNA_data = modified_sc_data,
    intermediate_var = intermediate_var
  )
}

Format requirements for custom extension functions:

  1. The input arguments must include
    • sc_data (required): A Seurat object
    • matched_bulk (required): A data.frame/matrix/Matrix (genes × samples) containing RNA-seq counts:
      • Genes must overlap with those in sc_data.
      • Samples must correspond to those in phenotype.
    • phenotype (required):
      • For survival: a data.frame with columns time and status; rownames must match colnames(matched_bulk)
      • For binary or continuous: a named vector; names must match colnames(matched_bulk)
    • label_type (required): A character string used to label cells with the study’s identifiers.
    • phenotype_class (required): One or more of "binary", "survival", "continuous".
  2. The output must be a list containing at least one element:
    • scRNA_data (required): A Seurat object with meta.data modified
    • Other elements are optional.
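
The input requirements above can also be enforced inside a custom function before any screening logic runs. The following is a minimal base R sketch under stated assumptions: check_screen_inputs is a hypothetical helper, not part of SigBridgeR, and it assumes sample names must match in both content and order.

```r
# Hypothetical helper (not part of SigBridgeR): verifies that `phenotype`
# lines up with the columns of `matched_bulk` for the chosen phenotype class.
check_screen_inputs <- function(matched_bulk, phenotype, phenotype_class) {
  phenotype_class <- match.arg(
    phenotype_class,
    c("binary", "survival", "continuous")
  )
  sample_ids <- if (phenotype_class == "survival") {
    # Survival phenotypes are data.frames with `time` and `status` columns
    stopifnot(
      is.data.frame(phenotype),
      all(c("time", "status") %in% colnames(phenotype))
    )
    rownames(phenotype)
  } else {
    # Binary and continuous phenotypes are named vectors
    names(phenotype)
  }
  # Assumption: names must agree in content and order
  if (!identical(sample_ids, colnames(matched_bulk))) {
    stop("Sample names in `phenotype` must match colnames(matched_bulk)")
  }
  invisible(phenotype_class)
}

bulk_demo <- matrix(
  1:20,
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:5))
)
pheno_demo <- setNames(c(0, 1, 0, 1, 1), paste0("sample", 1:5))
check_screen_inputs(bulk_demo, pheno_demo, "binary")
```

Calling such a helper at the top of a screening function gives an early, readable error instead of a failure deep inside the algorithm.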

To facilitate format validation, we provide a function ValidateScreenFunc to check the above requirements. The validation output resembles that of rcmdcheck; please ensure there are no errors or warnings, and as few notes as possible.

ValidateScreenFunc(my_screen_function)
# ── Screening Function Validation ──────────────────────────────────────────────────────────────────────
# Start at 2026/01/19 22:04:17

# ✔ All input arguments explicitly specified

# ✔ Verbose control supported

# ✔ Syntax check passed

# ✔ Return value is a list with `scRNA_data` slot

# Duration: 0.095 sec elapsed

# 0 error ✔ | 0 warning ✔ | 0 note ✔

For comparison, here is the output when a malformed function is supplied:

bad_fun <- function(x) {
  z <- x + 1
  y
  return(NULL)
}

ValidateScreenFunc(bad_fun)
# ── Screening Function Validation ───────────────────────────────────────────────────────────────────
# Start at 2026/01/19 22:07:43

# ❯ Missing required arguments ... ERROR
#   More arguments should be added:
#   sc_data: A fully preprocessed Seurat object
#   matched_bulk: Bulk RNA-seq matrix (gene * samples):
#                 • Samples match `phenotype` in number/order;
#                 • Senes overlap with `sc_data`
#   phenotype: Phenotype: a named vector or data.frame, names/rownames match `matched_bulk`
#              • For binary/continuous: named vector recommended
#              • For survival: data.frame with `time` (1st col) and `status` (2nd col) recommended.
#   label_type: Labeling phenotype-associated cell with real study identifiers
#   phenotype_class: Phenotype types:
#                    • Must be one or more of `binary`, `continuous` and `survival`

# ❯ Verbose control not supported ... NOTE
#   Consider adding `verbose` control to ease error tracing

# ❯ Syntax error in function ... ERROR

# | line | object | col1 | col2 |   warning_type    |                     warning                     |
# |:----:|:------:|:----:|:----:|:-----------------:|:-----------------------------------------------:|
# |  2   |   z    |  3   |  3   |   unused_local    | local variable ‘z’ assigned but may not be used |
# |  3   |   y    |  3   |  3   | no_global_binding |   no visible binding for global variable ‘y’    |

# ❯ Return value is not a list ... ERROR
# scRNA_data: recommended to be the first element of the return value
#             • Should be of class <Seurat>

# Duration: 0.158 sec elapsed

# 3 errors ✖ | 0 warning ✔ | 1 note ✖
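
The usage warnings in the table above resemble what static analysis with codetools reports. The following is a minimal sketch of running such a check directly; whether ValidateScreenFunc calls codetools::checkUsage in exactly this way is an assumption, and the msgs collector is illustrative only.

```r
bad_fun <- function(x) {
  z <- x + 1
  y
  return(NULL)
}

# Collect the messages in a character vector instead of printing them
msgs <- character()
codetools::checkUsage(
  bad_fun,
  name = "bad_fun",
  all = TRUE, # also report locals that are assigned but never used
  report = function(m) msgs <<- c(msgs, m)
)
print(msgs)
```

Running a check like this while developing can surface unused locals and undefined globals before the function is validated.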

Registering the function

Now we can register the function with the package:

RegisterScreenMethod(
  my_method = my_screen_function,
  supported_phenotypes = c("binary", "survival"),
  parameter_mapper = function(params) {
    params$a <- params$a %||% 123
    params
  },
  registry = ScreenStrategy,
  verbose = TRUE
)
# ✔ Registered `my_method`

Details of the arguments:

  1. my_method = my_screen_function:
    • Given in the form key = func. The key names the registered method; if no key is provided, the function’s own name is used.
  2. supported_phenotypes: The phenotype types supported by the function.
  3. parameter_mapper: A function that transforms the input parameter list before it is passed to the executor. Useful for adapting parameters passed in from the interface function. It receives a named list and must return a modified list.
  4. registry: The registry in which to register the function.
  5. verbose: Whether to print messages.
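
To make the parameter_mapper contract concrete (named list in, modified named list out), here is a minimal base R sketch. The mapper my_mapper and the parameter names a, alpha and n_iter are hypothetical and chosen only for illustration.

```r
# Hypothetical mapper: `a`, `alpha` and `n_iter` are illustrative names only.
my_mapper <- function(params) {
  # Accept `alpha` as an alias for `a` when `a` itself was not supplied
  if (!is.null(params$alpha) && is.null(params$a)) {
    params$a <- params$alpha
  }
  params$alpha <- NULL
  # Fill in defaults; values supplied by the caller take precedence
  defaults <- list(a = 123, n_iter = 10)
  utils::modifyList(defaults, params)
}

str(my_mapper(list(alpha = 5, verbose = FALSE)))
```

A mapper written this way leaves unrelated entries such as verbose untouched, so it can sit between the interface function and the custom method without dropping arguments.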

Let’s check whether it has indeed been registered:

GetExistingStrategy()
# [1] "Scissor"   "scPP"      "LP_SGL"    "my_method" "PIPET"     "DEGAS"     "scAB"      "scPAS"

Use the function

The registered method can now be called through the Screen function:

a_seurat <- SeuratObject::CreateSeuratObject(Matrix::Matrix(
  1:100,
  nrow = 10,
  dimnames = list(paste0("gene", 1:10), paste0("cell", 1:10))
))

bulk <- matrix(
  1:100,
  nrow = 10,
  dimnames = list(paste0("gene", 1:10), paste0("sample", 1:10))
)

pheno <- setNames(sample(0:1, 10, TRUE), paste0("sample", 1:10))

my_res <- Screen(
  sc_data = a_seurat,
  matched_bulk = bulk,
  phenotype = pheno,
  label_type = "Test",
  phenotype_class = "binary",
  screen_method = "my_method",
  a = 123
)
# ✔ my_screen_function finished
my_res$scRNA_data |> class()
# [1] "Seurat"
# attr(,"package")
# [1] "SeuratObject"
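
Because my_method was registered with support for survival phenotypes as well, phenotype can also be a data.frame instead of a named vector. The following is a minimal base R sketch of building such an input; the simulated times and event indicators are made-up illustration values, and the names bulk_surv and pheno_surv are hypothetical.

```r
bulk_surv <- matrix(
  1:100,
  nrow = 10,
  dimnames = list(paste0("gene", 1:10), paste0("sample", 1:10))
)

set.seed(42)
pheno_surv <- data.frame(
  time = round(rexp(10, rate = 0.1), 1),    # simulated follow-up times
  status = sample(0:1, 10, replace = TRUE), # simulated event indicators
  row.names = colnames(bulk_surv)           # rownames must match the bulk samples
)

# Confirm the alignment required by the format rules
stopifnot(identical(rownames(pheno_surv), colnames(bulk_surv)))
```

Such an object would be passed as phenotype together with phenotype_class = "survival".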

If you still have questions, please use GitHub Issues or Discussions.