This guide explains how to extend SigBridgeR with custom algorithms.
Too Long; Didn't Read
- Extend SCPreProcess (key = func):
  RegisterSeuratMethod(h = harmony::RunHarmony)
- Extend SCAnnotate (key = func):
  RegisterAnnoMethod(my = my_function)
- Extend Screen (key = func):
  TemplateScreenFunc() # create a template file
  ValidateScreenFunc(my_function)
  RegisterScreenMethod(my = my_function)
Once registered, a method is dispatched automatically by the corresponding interface function.
Extend Screening Methods
Installation
We recommend installing the following packages, which are used to check custom code:
pak::pkg_install(c(
"tictoc",
"yonicd/tidycheckUsage",
"codetools",
"knitr",
"lintr"
))
Prepare a custom function
Since v3.2.0, SigBridgeR supports registering custom algorithms that screen for phenotype-associated cells. Let's work through a detailed example.
(A template can be created with TemplateScreenFunc(), which writes it to a file and opens it in the editor.)
my_screen_function <- function(
  matched_bulk,
  sc_data,
  phenotype,
  label_type = NULL,
  phenotype_class = c("binary", "survival", "continuous"),
  ...
) {
  dots <- list(...)
  # `%||%` is available in base R >= 4.4.0 (otherwise import it from rlang)
  verbose <- dots$verbose %||% TRUE
  # Do the real work here; this placeholder labels the first half
  # of the cells "Positive" and the rest "Other"
  modified_sc_data <- SeuratObject::AddMetaData(
    sc_data,
    c(
      rep("Positive", floor(ncol(sc_data) / 2)),
      rep("Other", ceiling(ncol(sc_data) / 2))
    ),
    col.name = "my_method"
  ) |>
    # Record the study label and run parameters in the @misc slot
    AddMisc(
      my_method_type = label_type,
      my_method_para = list(phenotype_class = phenotype_class)
    )
  intermediate_var <- "value"
  if (verbose) {
    cli::cli_alert_success("my_screen_function finished")
  }
  list(
    scRNA_data = modified_sc_data,
    intermediate_var = intermediate_var
  )
}
Format requirements for custom extension functions:
- The input arguments must include:
  - sc_data (required): A Seurat object.
  - matched_bulk (required): A data.frame/matrix/Matrix (genes × samples) containing bulk RNA-seq counts:
    - Genes must overlap with those in sc_data.
    - Samples must correspond to those in phenotype.
  - phenotype (required):
    - For survival: a data.frame with columns time and status; rownames must match colnames(matched_bulk).
    - For binary or continuous: a named vector; names must match colnames(matched_bulk).
  - label_type (required): A character string that labels phenotype-associated cells with the study context (e.g., "relapse").
  - phenotype_class (required): One or more of "binary", "survival", "continuous".
- The output must be a list containing at least one element:
  - scRNA_data (required): A Seurat object with modified meta.data.
    - For columns added to the meta.data slot, use the method name as a prefix to distinguish them from columns added by other methods.
    - When assigning labels to cells with three categories, use "Positive", "Negative", and "Neutral": "Positive" marks cells positively associated with the phenotype, "Negative" marks cells negatively associated with it, and "Neutral" marks unrelated cells. With two categories, use "Positive" and "Other", where "Other" marks cells not positively associated with the phenotype.
    - Store label_type as {method}_type, which records the biological context of the screen, e.g., scissor_type = "relapse". This prevents confusion when the same algorithm is run with different labels.
    - If you need to store the parameters used to run the algorithm, use the Seurat@misc slot and store them as a list named with the _para suffix, e.g., scissor_para = list(alpha = 0.05, cutoff = 0.2). AddMisc is recommended for this:
      seurat <- seurat |>
        SeuratObject::AddMetaData(rep("test", ncol(seurat)), col.name = "scissor") |>
        AddMisc(
          scissor_type = "relapse",
          scissor_para = list(alpha = 0.05, cutoff = 0.2),
          cover = FALSE
        )
  - Other elements are optional; necessary intermediate data can also be returned.
- If intermediate data and results need to be saved as files, store them in a folder named <method>_res.
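Before handing data to a custom method, it can help to check the input and labeling conventions in plain R. The sketch below is a minimal, hypothetical illustration of the requirements above and is not part of SigBridgeR: `check_screen_inputs()` and `label_cells()` are made-up helper names, the single-cell genes are passed as a character vector instead of a Seurat object so the example has no dependencies, and the score cutoffs in `label_cells()` are arbitrary assumptions.

```r
# Hypothetical helpers that mirror the conventions above; they are not part of SigBridgeR.

# Check bulk/phenotype/single-cell consistency before screening.
# `sc_genes` stands in for rownames(sc_data) so the sketch needs no Seurat object.
check_screen_inputs <- function(matched_bulk, phenotype, sc_genes,
                                phenotype_class = c("binary", "survival", "continuous")) {
  phenotype_class <- match.arg(phenotype_class)

  # Genes must overlap with the single-cell data
  shared_genes <- intersect(rownames(matched_bulk), sc_genes)
  if (length(shared_genes) == 0) {
    stop("No genes shared between matched_bulk and sc_data")
  }

  if (phenotype_class == "survival") {
    # Survival: data.frame with `time` and `status`; rownames are bulk sample IDs
    stopifnot(is.data.frame(phenotype), all(c("time", "status") %in% colnames(phenotype)))
    sample_ids <- rownames(phenotype)
  } else {
    # Binary/continuous: named vector; names are bulk sample IDs
    stopifnot(is.atomic(phenotype), !is.null(names(phenotype)))
    sample_ids <- names(phenotype)
  }
  if (!identical(sample_ids, colnames(matched_bulk))) {
    stop("Phenotype sample IDs must match colnames(matched_bulk) in the same order")
  }
  invisible(shared_genes)
}

# Map a per-cell score to the three recommended labels.
# The cutoffs are arbitrary placeholders, not values used by any SigBridgeR method.
label_cells <- function(score, upper = 0.5, lower = -0.5) {
  ifelse(score > upper, "Positive", ifelse(score < lower, "Negative", "Neutral"))
}

# Toy data: 2 genes x 3 bulk samples
bulk <- matrix(1:6, nrow = 2, dimnames = list(c("g1", "g2"), c("s1", "s2", "s3")))
pheno <- c(s1 = 0, s2 = 1, s3 = 1)
shared <- check_screen_inputs(bulk, pheno, sc_genes = c("g2", "g3"), phenotype_class = "binary")
print(shared)
print(label_cells(c(1.2, 0, -0.9)))
```

The sample check is order-sensitive on purpose, matching the validator's note that samples should match `phenotype` in number and order.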
To facilitate format validation, we provide a function
ValidateScreenFunc to check the above requirements. The
validation output resembles that of rcmdcheck; please
ensure there are no errors or warnings, and as few notes as
possible.
ValidateScreenFunc(my_screen_function)
# ── Screening Function Validation ──────────────────────────────────────────────────────────────────────
# Start at 2026/01/19 22:04:17
# ✔ All input arguments explicitly specified
# ✔ Verbose control supported
# ✔ Syntax check passed
# ✔ Return value is a list with `scRNA_data` slot
# Duration: 0.095 sec elapsed
# 0 error ✔ | 0 warning ✔ | 0 note ✔
By contrast, here is the result for a malformed function:
bad_fun <- function(x) {
z <- x + 1
y
return(NULL)
}
ValidateScreenFunc(bad_fun)
# ── Screening Function Validation ───────────────────────────────────────────────────────────────────
# Start at 2026/01/19 22:07:43
# ❯ Missing required arguments ... ERROR
# More arguments should be added:
# sc_data: A fully preprocessed Seurat object
# matched_bulk: Bulk RNA-seq matrix (gene * samples):
# • Samples match `phenotype` in number/order;
# • Senes overlap with `sc_data`
# phenotype: Phenotype: a named vector or data.frame, names/rownames match `matched_bulk`
# • For binary/continuous: named vector recommended
# • For survival: data.frame with `time` (1st col) and `status` (2nd col) recommended.
# label_type: Labeling phenotype-associated cell with real study identifiers
# phenotype_class: Phenotype types:
# • Must be one or more of `binary`, `continuous` and `survival`
# ❯ Verbose control not supported ... NOTE
# Consider adding `verbose` control to ease error tracing
# ❯ Syntax error in function ... ERROR
# | line | object | col1 | col2 | warning_type | warning |
# |:----:|:------:|:----:|:----:|:-----------------:|:-----------------------------------------------:|
# | 2 | z | 3 | 3 | unused_local | local variable ‘z’ assigned but may not be used |
# | 3 | y | 3 | 3 | no_global_binding | no visible binding for global variable ‘y’ |
# ❯ Return value is not a list ... ERROR
# scRNA_data: recommended to be the first element of the return value
# • Should be of class <Seurat>
# Duration: 0.158 sec elapsed
# 3 errors ✖ | 0 warning ✔ | 1 note ✖
Registering the function
Now we can register the function with the package:
RegisterScreenMethod(
my_method = my_screen_function,
supported_phenotypes = c("binary", "survival"),
parameter_mapper = function(params) {
params$a <- params$a %||% 123
params
},
registry = ScreenStrategy,
verbose = TRUE
)
# ✔ Registered `my_method`
Details of the arguments:
- my_method = my_screen_function: formatted as key = func. The key is used as the method name; if no key is given, the function name is used.
- supported_phenotypes: the phenotype types supported by the function.
- parameter_mapper: a function that transforms the parameter list before it is passed to the executor, which is useful for adapting parameters coming from the interface function (Screen). It receives a named list and must return the modified list.
- registry: the registry in which to register the function.
- verbose: whether to print messages.
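To see what `parameter_mapper` does in isolation, the snippet below applies the mapper from the registration example to two parameter lists. How SigBridgeR assembles the list before calling the mapper is not shown here; the sketch only assumes, as described above, that the mapper receives a named list and returns it.

```r
# `%||%` ships with base R >= 4.4.0; define it for older versions
if (!exists("%||%", mode = "function")) {
  `%||%` <- function(x, y) if (is.null(x)) y else x
}

# Same mapper as in the RegisterScreenMethod() call above:
# fill in `a` with 123 when the caller did not supply it
mapper <- function(params) {
  params$a <- params$a %||% 123
  params
}

str(mapper(list(label_type = "Test")))        # `a` is added with the default 123
str(mapper(list(label_type = "Test", a = 1))) # a user-supplied `a` is kept
```

Because the mapper runs before the executor, it is a convenient place to supply defaults without changing the method's own signature.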
Let's check that the method has indeed been registered:
tbl <- InterceptStrategy("ScreenStrategy")
# A tibble: 22 × 4
names(ScreenStrategy)
# [1] "Scissor" "scPP" "LP_SGL" "my_method" "PIPET" "DEGAS" "scAB" "scPAS"
Use the function
The new method can now be selected in Screen() through `screen_method`:
a_seurat <- SeuratObject::CreateSeuratObject(Matrix::Matrix(
1:100,
nrow = 10,
dimnames = list(paste0("gene", 1:10), paste0("cell", 1:10))
))
bulk <- matrix(
1:100,
nrow = 10,
dimnames = list(paste0("gene", 1:10), paste0("sample", 1:10))
)
pheno <- setNames(sample(0:1, 10, TRUE), paste0("sample", 1:10))
my_res <- Screen(
sc_data = a_seurat,
matched_bulk = bulk,
phenotype = pheno,
label_type = "Test",
phenotype_class = "binary",
screen_method = "my_method",
a = 123
)
# ✔ my_screen_function finished
my_res$scRNA_data |> class()
# [1] "Seurat"
# attr(,"package")
# [1] "SeuratObject"
If you still have questions, please use GitHub Issues or Discussions.
Extend Seurat Methods
In general, any Seurat function can be registered for use within SCPreProcess. However, because each step in the pipeline code string is encoded as a single letter, at most 52 functions can be registered: one for each uppercase and lowercase letter (A–Z, a–z).
By default, the registry is defined as follows:
Pipeline Code Table:
| Code | Function | Description |
|---|---|---|
| o | CreateSeuratObject | Required. Must be the first step. |
| n | NormalizeData | Standard normalization. |
| s | ScaleData | Scales data for PCA. |
| v | FindVariableFeatures | Selects highly variable genes. |
| p | RunPCA | Principal Component Analysis. |
| e | FindNeighbors | Computes SNN graph. |
| c | FindClusters | Louvain algorithm clustering. |
| t | RunTSNE | t-SNE reduction. |
| u | RunUMAP | UMAP reduction. |
| r | SCTransform | SCT workflow. Replaces n, s, v. |
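The one-letter-per-step scheme can be illustrated in a few lines of base R. The sketch below is hypothetical: `default_codes` copies the table above, and `decode_pipeline()` is a made-up helper that only shows how a code string such as `"onvspecu"` reads as a sequence of steps; it is not how SCPreProcess parses its input internally, and it does not enforce the rule that `r` replaces `n`, `s`, `v`.

```r
# Codes from the Pipeline Code Table above (copied for illustration)
default_codes <- c(
  o = "CreateSeuratObject", n = "NormalizeData", s = "ScaleData",
  v = "FindVariableFeatures", p = "RunPCA", e = "FindNeighbors",
  c = "FindClusters", t = "RunTSNE", u = "RunUMAP", r = "SCTransform"
)

# One step per letter: only A-Z and a-z are available, i.e. 52 codes
code_alphabet <- c(LETTERS, letters)

# Hypothetical decoder: split the string into letters and look each one up
decode_pipeline <- function(code, registry = default_codes) {
  steps <- strsplit(code, "")[[1]]
  unknown <- setdiff(steps, names(registry))
  if (length(unknown) > 0) {
    stop("Unregistered code(s): ", paste(unknown, collapse = ", "))
  }
  if (length(steps) == 0 || steps[1] != "o") {
    stop("'o' (CreateSeuratObject) must be the first step")
  }
  unname(registry[steps])
}

print(decode_pipeline("onvspecu"))

# Letters still free for new registrations under this default table
print(length(setdiff(code_alphabet, names(default_codes))))
```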
For example, to load Xenium data with Seurat::LoadXenium for screening, register the function first:
RegisterSeuratMethod(x = Seurat::LoadXenium)
# ✔ Registered x
To check:
names(SCPreProcessStrategy)
# [1] "t" "u" "v" "x" "c" "e" "i" "n" "o" "p" "r" "s"
tbl <- InterceptStrategy("SCPreProcessStrategy")
# tibble [12 × 2]
Extend Cell Type Annotation Methods
By default, SigBridgeR includes three built-in cell type annotation algorithms: SingleR, CellTypist, and mLLMCelltype. All of them require additional installation and dependencies (see the README). They can all be invoked through the unified interface function SCAnnotate.
Adding a custom cell annotation method is just as easy:
my_method <- function(seurat_obj, ...) {
# placeholder
return(seurat_obj)
}
RegisterAnnoMethod(
my = my_method
)
# ✔ Registered `my`
To check:
names(SCAnnotateStrategy)
# [1] "my" "mLLMCelltype" "CellTypist" "SingleR"
Here are some recommendations for user-defined cell annotation functions:
- The first input and the first output must both be Seurat objects.
- If multiple objects need to be returned, return them as a list.
- Use tidycheckUsage::tidycheckUsage() to check the function's syntax.