Fix lda.default "group means are numerically identical" error on macOS/Windows #98
…S/Windows Agent-Logs-Url: https://github.com/waldronlab/lefser/sessions/63fcf32c-29fa-4538-897f-746dc418b972 Co-authored-by: lwaldron <639942+lwaldron@users.noreply.github.com>
@copilot I don't like the idea of masking an error (identical group means) by adding jitter. The error occurs because of true collinearity, and adding jitter only masks that collinearity. A correct fix would be to filter (features?) to actually address the collinearity. The user should be aware of and have control over such filtering.
Agent-Logs-Url: https://github.com/waldronlab/lefser/sessions/d25741e4-ef4e-4202-a62c-b6861c9d5562 Co-authored-by: lwaldron <639942+lwaldron@users.noreply.github.com>
Replaced the jitter approach with a proper collinearity filter (`filterCollinearFeatures`).

How it works: before calling LDA, linearly dependent features are detected and removed, and a warning names the features that were dropped. This surfaces the collinearity to the user rather than silently masking it.

The tests were updated accordingly: the score test now uses non-degenerate data (different within-group variation patterns per feature), and a dedicated test verifies the warning is emitted for genuinely collinear inputs.
lwaldron left a comment:
Not a fan of these "fixes". The real problem was artificial rank-deficient test data (each class is rank 1), which relied on seemingly undefined behavior of MASS::lda in such cases, where it should have errored. Apparently MASS now correctly throws this error. I will replace the test data with something non rank-deficient offline.
test_that("ldaFunction correctly identifies classes and calculates scores", {
    class_A_data <- data.frame(
        feature1 = c(10, 11, 12),
        feature2 = c(1, 2, 3), # -> c(2, 0, 4)
        class = "A"
    )
    class_B_data <- data.frame(
        feature1 = c(1, 2, 3), # -> c(2, 0, 4)
        feature2 = c(10, 11, 12),
        class = "B"
    )
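The rank deficiency described in the review comment can be checked directly. The sketch below is an illustration, not code from the PR: `centre_within` is a hypothetical helper, and it assumes the `# -> c(2, 0, 4)` comments in the snippet mark intended replacement values. It compares the rank of the within-group-centred feature matrix for the original and the revised data.

```r
# Hypothetical helper (not part of lefser): subtract each group's column
# means from the rows belonging to that group.
centre_within <- function(x, grp) {
  x - apply(x, 2, function(col) ave(col, grp))
}

grp <- factor(rep(c("A", "B"), each = 3))

# Original test data: both features deviate from their group mean by
# (-1, 0, 1) in each class, so the centred columns are identical.
original <- cbind(
  feature1 = c(10, 11, 12, 1, 2, 3),
  feature2 = c(1, 2, 3, 10, 11, 12)
)

# Assumed revision, using the c(2, 0, 4) values from the snippet comments.
revised <- cbind(
  feature1 = c(10, 11, 12, 2, 0, 4),
  feature2 = c(2, 0, 4, 10, 11, 12)
)

rank_original <- qr(centre_within(original, grp))$rank
rank_revised <- qr(centre_within(revised, grp))$rank
print(c(original = rank_original, revised = rank_revised))
```

With two features, a centred rank below 2 means one feature carries no within-group information beyond the other, which is the degenerate case the comment refers to.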
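Newer MASS binaries (macOS/Windows) apply a strict rank check in `lda.default` that throws "group means are numerically identical" when the between-class scatter matrix has rank 0 in the sphered space. This is triggered by features that are linearly dependent in the within-group-centred space, including the original unit test data, where the within-group deviations of `feature1` and `feature2` are identical, causing the group means to project to the same point after within-class sphering.

Changes

- `R/lefser.R` (`filterCollinearFeatures`): New internal function that detects and removes linearly dependent features before LDA. It computes the within-group-centred feature matrix, runs QR decomposition to identify redundant columns, removes them, and emits a named warning so users know which features were affected and can take action (e.g. using `get_terminal_nodes`).
- `R/lefser.R` (`ldaFunction`): Calls `filterCollinearFeatures()` on the feature matrix before fitting the LDA model, replacing the earlier `tryCatch` + jitter approach. Also removes the now-redundant commented-out `createUniqueValues` call in `lefser()`.
- `tests/testthat/test-lefser.R`: Updated the score test to use non-degenerate data (5 samples per class with distinct within-group variation patterns for each feature), so no filtering occurs and LDA runs normally. Added a dedicated test "ldaFunction warns and removes within-group collinear features" that verifies the collinearity warning is emitted for the original pathological anti-symmetric data.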
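The filtering step described above (within-group centring, QR decomposition, removal, warning) could look roughly like the following. This is a minimal sketch under stated assumptions, not the PR's actual `filterCollinearFeatures`: the name `filter_collinear_sketch`, its arguments, and the warning text are all hypothetical, and it assumes a numeric matrix with column names.

```r
# Hypothetical sketch (not the lefser implementation): drop features that
# are linearly dependent after within-group centring, and warn about them.
# Assumes `x` is a numeric matrix with column names, `grp` a class vector.
filter_collinear_sketch <- function(x, grp) {
  x <- as.matrix(x)
  grp <- as.factor(grp)

  # Within-group centring: subtract each group's column means.
  centred <- x - apply(x, 2, function(col) ave(col, grp))

  # Pivoted QR: the first `rank` pivot entries index a set of linearly
  # independent columns; the remaining columns are redundant.
  decomp <- qr(centred)
  keep <- sort(decomp$pivot[seq_len(decomp$rank)])
  dropped <- setdiff(seq_len(ncol(x)), keep)

  if (length(dropped) > 0) {
    warning(
      "Removed within-group collinear feature(s): ",
      paste(colnames(x)[dropped], collapse = ", "),
      call. = FALSE
    )
  }
  x[, keep, drop = FALSE]
}

# Usage with the original (degenerate) unit test data.
x <- cbind(
  feature1 = c(10, 11, 12, 1, 2, 3),
  feature2 = c(1, 2, 3, 10, 11, 12)
)
grp <- rep(c("A", "B"), each = 3)
filtered <- suppressWarnings(filter_collinear_sketch(x, grp))
```

Emitting a warning instead of filtering silently leaves the decision to the user, which is the concern raised in the review. Which of two mutually dependent features is kept depends on column order, so a real implementation might want to expose that choice.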
R/lefser.R—filterCollinearFeatures: New internal function that detects and removes linearly dependent features before LDA. It computes the within-group-centred feature matrix, runs QR decomposition to identify redundant columns, removes them, and emits a named warning so users know which features were affected and can take action (e.g. usingget_terminal_nodes):R/lefser.R—ldaFunction: CallsfilterCollinearFeatures()on the feature matrix before fitting the LDA model, replacing the earliertryCatch+jitter approach. Also removes the now-redundant commented-outcreateUniqueValuescall inlefser().tests/testthat/test-lefser.R: Updated the score test to use non-degenerate data (5 samples per class with distinct within-group variation patterns for each feature), so no filtering occurs and LDA runs normally. Added a dedicated test"ldaFunction warns and removes within-group collinear features"that verifies the collinearity warning is emitted for the original pathological anti-symmetric data.