Skip to contents

This function checks each row of a matrix (including sparse matrices of class dgCMatrix) for zero variance. Rows with zero variance or only NA values are identified, and an error is thrown listing the names of these rows. This is useful for preprocessing data where constant rows may cause issues in analyses (e.g., PCA, regression).

Usage

Check0VarRows(mat, call = rlang::caller_env())

Arguments

mat

A numeric matrix or a sparse matrix of class dgCMatrix (from the Matrix package). Rows represent features (e.g., genes), and columns represent observations.

call

The environment from which the function was called, used for error reporting. Defaults to rlang::caller_env(). Most users can ignore this parameter.

Value

Invisibly returns a numeric vector of row variances. If zero-variance rows are found, the function throws an error with a message listing the problematic row names.

Details

For dense matrices, variance is computed using an optimized rowVars function that efficiently calculates row variances with proper NA handling. For sparse matrices of class dgCMatrix, variance is computed using a mathematical identity that avoids creating large intermediate matrices. Rows with fewer than 2 non-zero observations are treated as zero-variance.

This implementation is memory-efficient and handles large matrices better than the original version.

Examples

if (FALSE) { # \dontrun{
# Dense matrix example
set.seed(123)
mat_dense <- matrix(rnorm(100), nrow = 10)
rownames(mat_dense) <- paste0("Gene", 1:10)
Check0VarRows(mat_dense) # No error if all rows have variance

# Introduce zero variance
mat_dense[1, ] <- rep(5, 10) # First row is constant
Check0VarRows(mat_dense)     # Throws error listing "Gene1"

# Sparse matrix example
library(Matrix)
mat_sparse <- as(matrix(rpois(100, 0.5), nrow = 10), "dgCMatrix")
rownames(mat_sparse) <- paste0("Gene", 1:10)
Check0VarRows(mat_sparse)
} # }