scPAS: Single-Cell Phenotype-Associated Subpopulations (Optimized)
Source: R/22-scPAS_Screen.R
scPAS.optimized.Rd
An optimized implementation of scPAS for identifying phenotype-associated cell subpopulations from single-cell RNA-seq data by integrating bulk transcriptomic data. This version includes performance optimizations, memory-efficient matrix operations, and enhanced statistical testing.
Usage
scPAS.optimized(
  bulk_dataset,
  sc_dataset,
  phenotype,
  assay = "RNA",
  tag = NULL,
  nfeature = NULL,
  imputation = TRUE,
  imputation_method = c("KNN", "ALRA"),
  alpha = NULL,
  cutoff = 0.2,
  network_class = c("SC", "bulk"),
  independent = TRUE,
  family = c("gaussian", "binomial", "cox"),
  permutation_times = 2000,
  FDR.threshold = 0.05,
  verbose = TRUE,
  ...
)
Arguments
- bulk_dataset
A matrix or data frame containing bulk expression data. Each row represents a gene and each column represents a sample. Expression values should be continuous.
- sc_dataset
A Seurat object or matrix containing single-cell RNA-seq expression data. If a matrix is provided, it will be automatically processed using Seurat's default pipeline.
- phenotype
Phenotype annotation for bulk samples. The format depends on the regression family:
For family = "gaussian": A continuous numeric vector
For family = "binomial": A binary group indicator vector (0/1 encoded) or factor with two levels
For family = "cox": A two-column matrix with columns named 'time' and 'status' (1 = event, 0 = censored)
- assay
Character string specifying the assay name in the Seurat object to use for analysis. Default: 'RNA'.
- tag
Optional character vector of length 2 specifying names for each phenotypic group. Used only for logistic regression (family = "binomial").
- nfeature
Numeric value or character vector specifying the number of variable features to select, or a custom set of feature names. If NULL, all common genes between bulk and single-cell data are used.
- imputation
Logical indicating whether to perform imputation on single-cell data. Default: TRUE.
- imputation_method
Character string specifying the imputation method. One of: 'KNN', 'ALRA'. Default: 'KNN'.
- alpha
Numeric value or vector specifying the regularization parameter balancing L1 and network-based penalties. If NULL, a default sequence from 0.001 to 0.9 is used.
- cutoff
When alpha = NULL, the threshold for selecting the optimal alpha value. Default: 0.2.
- network_class
Character string specifying the source for constructing the gene-gene similarity network. One of: 'SC' (single-cell data), 'bulk' (bulk data). Default: 'SC'.
- independent
Logical indicating whether to construct background distributions independently for each cell. Default: TRUE.
- family
Character string specifying the regression family. One of: "gaussian" (linear regression), "binomial" (logistic regression), "cox" (Cox regression). Default: "gaussian".
- permutation_times
Numeric value specifying the number of permutations for statistical testing. Default: 2000.
- FDR.threshold
Numeric value specifying the false discovery rate threshold for identifying phenotype-associated cells. Default: 0.05.
- verbose
Logical indicating whether to print progress messages. Default: TRUE.
- ...
Additional arguments to be passed to scPAS.optimized(). Currently none are supported.
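The three phenotype formats described under the phenotype argument can be sketched in base R as follows. The sample IDs, the values, and the use of names/rownames to tie each entry to a bulk sample are illustrative assumptions, not requirements confirmed by this page.

```r
# Hypothetical sample IDs; in practice these would correspond to the
# columns of bulk_dataset (an assumption for illustration).
samples <- paste0("S", 1:6)

# family = "gaussian": continuous numeric vector, one value per bulk sample
pheno_gaussian <- c(2.1, 3.4, 1.8, 4.0, 2.9, 3.3)
names(pheno_gaussian) <- samples

# family = "binomial": 0/1 encoded vector
# (a factor with two levels is the documented alternative)
pheno_binomial <- c(0, 0, 0, 1, 1, 1)
names(pheno_binomial) <- samples

# family = "cox": two-column matrix named 'time' and 'status'
# (status: 1 = event, 0 = censored)
pheno_cox <- cbind(
  time   = c(12, 30, 7, 45, 22, 18),
  status = c(1, 0, 1, 0, 1, 1)
)
rownames(pheno_cox) <- samples

# Basic sanity checks one might run before calling scPAS.optimized()
stopifnot(
  is.numeric(pheno_gaussian),
  all(pheno_binomial %in% c(0, 1)),
  identical(colnames(pheno_cox), c("time", "status")),
  all(pheno_cox[, "status"] %in% c(0, 1))
)
```

Checking the phenotype shape up front is cheap and tends to surface mismatches (wrong column names, non-binary labels) before the regression step is reached.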
Value
Returns the input Seurat object with the following additions:
Metadata columns:
scPAS_RS - Raw risk scores for each cell
scPAS_NRS - Normalized risk scores (Z-statistics)
scPAS_Pvalue - P-values from permutation testing
scPAS_FDR - False discovery rate adjusted p-values
scPAS - Cell classification labels: "Positive", "Negative", or "Neutral"
Miscellaneous slot (sc_dataset@misc$scPAS_para):
alpha - Alpha values used in model optimization
lambda - Lambda values used in model optimization
family - Regression family used
Coefs - Final model coefficients for each gene
bulk - Processed bulk expression matrix
phenotype - Processed phenotype vector
Network - Gene-gene similarity network used
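As a rough illustration of how the returned metadata columns might be summarized, the sketch below uses a small mock data frame standing in for the metadata of the returned object. The values are invented for illustration and are not real output; only the column names come from the list above.

```r
# Mock stand-in for the metadata of the returned Seurat object
# (illustrative values only, not real scPAS output)
meta <- data.frame(
  scPAS_NRS = c(3.1, -2.8, 0.2, 2.5, -0.4),
  scPAS_FDR = c(0.001, 0.004, 0.80, 0.03, 0.65),
  scPAS     = c("Positive", "Negative", "Neutral", "Positive", "Neutral"),
  row.names = paste0("cell", 1:5)
)

# Count cells in each class
class_counts <- table(meta$scPAS)

# Keep only phenotype-associated cells, most significant first
associated <- meta[meta$scPAS != "Neutral", ]
associated <- associated[order(associated$scPAS_FDR), ]
```

On a real result the same columns would presumably be read from the object's metadata, and the fitted coefficients from the scPAS_para entry of the miscellaneous slot listed above.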
Details
This optimized implementation of scPAS integrates bulk and single-cell transcriptomic data to identify phenotype-associated cell subpopulations through a comprehensive analytical workflow:
Workflow Overview:
Data Preprocessing:
Identifies common genes between bulk and single-cell datasets
Filters ribosomal and mitochondrial genes
Performs quantile normalization on bulk data
Optionally imputes single-cell data using specified methods
Network Construction:
Builds gene-gene similarity networks from either single-cell or bulk data
Uses correlation-based similarity measures
Applies shared nearest neighbor (SNN) graph construction
Regularized Regression:
Implements network-regularized sparse regression (APML0)
Optimizes alpha and lambda parameters through cross-validation
Supports multiple regression families (gaussian, binomial, cox)
Risk Score Calculation:
Computes phenotype-associated risk scores for each cell
Uses matrix optimizations for efficient computation
Statistical Validation:
Performs permutation testing to assess significance
Calculates Z-statistics and false discovery rates
Classifies cells based on statistical thresholds
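The risk score and statistical validation steps above can be sketched in base R on toy data. This is a minimal illustration under stated assumptions, not the package's implementation: the background is built here by shuffling the coefficients across genes, p-values come from a two-sided normal approximation, and FDR uses the Benjamini-Hochberg method. All variable names are hypothetical.

```r
set.seed(1)

n_genes <- 50
n_cells <- 20
n_perm  <- 200   # kept small for the toy example; the documented default is 2000

# Toy inputs: a cell-by-gene expression matrix and sparse model coefficients
expr <- matrix(
  rnorm(n_cells * n_genes), nrow = n_cells,
  dimnames = list(paste0("cell", 1:n_cells), paste0("g", 1:n_genes))
)
coefs <- c(rnorm(10), rep(0, n_genes - 10))

# Raw risk score: coefficient-weighted sum of expression for each cell
rs <- as.vector(expr %*% coefs)

# Background scores (assumption: coefficients shuffled across genes).
# Rows of `bg` are cells, columns are permutations.
bg <- sapply(seq_len(n_perm), function(i) as.vector(expr %*% sample(coefs)))

# Normalized risk score: Z-statistic of each cell against its own background
nrs <- (rs - rowMeans(bg)) / apply(bg, 1, sd)

# Two-sided p-values (normal approximation) and BH-adjusted FDR
pval <- 2 * pnorm(-abs(nrs))
fdr  <- p.adjust(pval, method = "BH")

# Classify cells by the sign of the Z-statistic and an FDR threshold
fdr_threshold <- 0.05
label <- ifelse(fdr < fdr_threshold & nrs > 0, "Positive",
         ifelse(fdr < fdr_threshold & nrs < 0, "Negative", "Neutral"))
names(label) <- rownames(expr)
```

Computing all permutations as columns of one matrix keeps the per-cell mean and standard deviation a single vectorized pass, which is the kind of matrix-level optimization the workflow refers to.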
Note
The function requires both bulk and single-cell data from related biological
conditions. For survival analysis (family = "cox"), the phenotype must be
a properly formatted survival object or matrix with 'time' and 'status'
columns.
References
Xie A, Wang H, Zhao J, Wang Z, Xu J, Xu Y. scPAS: single-cell phenotype-associated subpopulation identifier. Briefings in Bioinformatics. 2024 Nov 22;26(1):bbae655.
See also
Other scPAS:
DoscPAS()
Examples
if (FALSE) { # \dontrun{
# Example with continuous phenotype (linear regression)
result <- scPAS.optimized(
  bulk_dataset = bulk_expr_matrix,
  sc_dataset = seurat_obj,
  phenotype = continuous_phenotype,
  family = "gaussian"
)

# Example with binary phenotype (logistic regression)
result <- scPAS.optimized(
  bulk_dataset = bulk_expr_matrix,
  sc_dataset = seurat_obj,
  phenotype = binary_groups,
  family = "binomial",
  tag = c("Control", "Disease")
)

# Example with survival phenotype (Cox regression) and custom parameters
result <- scPAS.optimized(
  bulk_dataset = bulk_expr_matrix,
  sc_dataset = seurat_obj,
  phenotype = survival_data,
  family = "cox",
  nfeature = 2000,
  permutation_times = 5000,
  FDR.threshold = 0.01
)
} # }