error bound tests #3104

Merged

petrelharp merged 3 commits into tskit-dev:main from petrelharp:pca_error

Mar 19, 2025

Contributor

petrelharp commented Mar 19, 2025

Here's a more precise description of what (we think) the error bound implies for precision, and additional tests.

I was having trouble getting this to pass, but then realized the cause: the time_window in the test made the matrix low rank in some genomic windows. So, I removed the time windowing (we don't need to test that part here, as I don't see how it'd interact with the random approximation other than in trivial ways like this).
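As a rough illustration of the low-rank situation described above (not the actual tskit test, and not necessarily the exact failure seen there), the sketch below builds a made-up symmetric matrix whose rank is smaller than the number of requested components. The names `n`, `r`, and `k` are hypothetical. The trailing eigenvalues are numerically zero, so any bound that divides by the eigenvalue says nothing useful about those components.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r, k = 10, 2, 4   # matrix size, true rank, requested components (all made up)

B = rng.normal(size=(n, r))
A = B @ B.T          # symmetric PSD matrix with rank r < k

evals = np.linalg.eigvalsh(A)[::-1]   # eigenvalues in descending order
rank = np.linalg.matrix_rank(A)
trailing = evals[r:k]                 # requested components beyond the rank

print("rank:", rank)
print("trailing eigenvalues:", trailing)
```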

Also note that the error_bound is labeled "experimental", which is good, as this is not a rigorous benchmark.


          error bound tests

79d1ec1

codecov bot commented Mar 19, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.95%. Comparing base (c78ffbc) to head (d47fa8b).
Report is 3 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #3104   +/-   ##
=======================================
  Coverage   89.95%   89.95%           
=======================================
  Files          29       29           
  Lines       32471    32471           
  Branches     5823     5823           
=======================================
  Hits        29209    29209           
  Misses       1860     1860           
  Partials     1402     1402

Flag	Coverage Δ
c-tests	`86.69% <ø> (ø)`
lwt-tests	`80.78% <ø> (ø)`
python-c-tests	`89.23% <ø> (ø)`
python-tests	`98.99% <ø> (ø)`

Files with missing lines	Coverage Δ
python/tskit/trees.py	`98.86% <ø> (ø)`

... and 1 file with indirect coverage changes

jeromekelleher approved these changes

Member

jeromekelleher left a comment

LGTM. @hanbin973 can you have a look please?

hanbin973 reviewed

python/tests/test_relatedness_vector.py Outdated

    
    #   = \sum_i b_i (lambda - L_i)^2 + lambda^2 |delta|^2
    #   < \epsilon   (where epsilon is the spectral norm bound error_bound)
    # so
    #  |delta| < sqrt(epsilon / lambda)

Contributor

hanbin973 Mar 19, 2025

lambda should be outside the sqrt.
lambda^2 |delta|^2 < epsilon, so |delta| < sqrt(epsilon) / lambda.

python/tests/test_relatedness_vector.py Outdated

    
    # then let v = \sum_i b_i u_i + delta be the projection of v into U,
    # and we have that
    #  |lambda v - U L U* v|^2
    #   = \sum_i b_i (lambda - L_i)^2 + lambda^2 |delta|^2

Contributor

hanbin973 Mar 19, 2025

b_i should be b_i^2.
See |lambda v - U L U* v|^2 = \sum_i (lambda - L_i)^2 b_i^2 + lambda^2 |delta|^2.

Contributor Author

petrelharp Mar 19, 2025

whoops - typo!
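The corrected identity can be checked numerically. The following is a minimal sketch on made-up data, assuming U has orthonormal columns and v is a unit vector written as v = \sum_i b_i u_i + delta with delta orthogonal to the columns of U. The values of `L` and `lam` are arbitrary placeholders, not taken from the test.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 3

# U: n x k matrix with orthonormal columns; L: k made-up approximate eigenvalues.
U, _ = np.linalg.qr(rng.normal(size=(n, k)))
L = np.array([5.0, 3.0, 1.5])

# v: an arbitrary unit vector, split into its part in span(U) and the remainder.
v = rng.normal(size=n)
v /= np.linalg.norm(v)
b = U.T @ v          # coefficients b_i
delta = v - U @ b    # residual, orthogonal to the columns of U

lam = 4.0            # hypothetical eigenvalue
# U @ (L * b) is U diag(L) U^T v.
lhs = np.linalg.norm(lam * v - U @ (L * b)) ** 2
rhs = np.sum((lam - L) ** 2 * b**2) + lam**2 * np.linalg.norm(delta) ** 2

print(lhs, rhs)
```

The cross terms vanish because the u_i are orthonormal and delta is orthogonal to all of them, which is why each coefficient enters as b_i^2.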

python/tests/test_relatedness_vector.py Outdated

    
    #  |delta| < sqrt(epsilon / lambda)
    # since this is the amount by which the eigenvector v isn't hit by the columns of U.
    # Then also for each i that if b_i is not small then
    #  |lambda - L_i| < sqrt(epsilon)

Contributor

hanbin973 Mar 19, 2025

I would like to elaborate on this part.
Let m = min_i |lambda - L_i|^2. Then, epsilon > \sum_i (lambda - L_i)^2 b_i^2 + lambda^2 |delta|^2 >= m * \sum_i b_i^2 + lambda^2 |delta|^2 = m * (1-|delta|^2) + lambda^2 |delta|^2.
Hence, min_i |lambda-L_i|^2 = m < (epsilon - lambda^2 |delta|^2) / (1- |delta|^2).

I'm not sure how exactly the RHS of the final equation is strictly smaller than sqrt(epsilon).

Contributor

hanbin973 Mar 19, 2025

The RHS is epsilon + |delta|^2 * (epsilon - lambda^2) / (1-|delta|^2). Assuming that lambda is suitably larger than epsilon, which is roughly lambda_{k+1} (k is the rank of the approximation; see eq. 1.11 of the paper), the residual term |delta|^2 * (epsilon - lambda^2) / (1-|delta|^2) is negative, which proves the inequality. This somewhat explains why we can't trust an l-th component that is too close to the k-th component: the singular value of the l-th component should be reasonably larger than that of the (k+1)-th component for the bound to hold.
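For reference, the rearrangement used in this comment can be written out step by step. This is only a restatement of the algebra above, with epsilon as it appears in the thread at this point and assuming |delta| < 1:

```latex
\begin{align*}
\frac{\epsilon - \lambda^2 |\delta|^2}{1 - |\delta|^2}
  &= \frac{\epsilon \,(1 - |\delta|^2) + |\delta|^2 (\epsilon - \lambda^2)}{1 - |\delta|^2} \\
  &= \epsilon + \frac{|\delta|^2 (\epsilon - \lambda^2)}{1 - |\delta|^2}.
\end{align*}
% The second term is non-positive whenever \lambda^2 \ge \epsilon and |\delta| < 1,
% so in that case the whole expression is at most \epsilon.
```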

python/tests/test_relatedness_vector.py Outdated

    
    # Bounds on the error are from equation 1.11 in https://arxiv.org/pdf/0909.4061 -
    # this gives a bound on reconstruction error (i.e., spectral norm between the GRM
    # and the low-diml approx). But since the spectral norm is
    # |X| = sup_v |Xv|^2/|v|^2,

Contributor

hanbin973 Mar 19, 2025

The correct definition is |X| = sup_v |Xv|_2 / |v|_2.

Contributor Author

petrelharp Mar 19, 2025

oh good! I got it in my head that it was squared, which seemed wrong.
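A quick numerical check of the unsquared definition, as a sketch on a made-up random matrix: the spectral norm equals the largest singular value, the ratio |Xv|_2 / |v|_2 never exceeds it, and the supremum is attained at the top right singular vector.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))

# Spectral norm as computed by NumPy, and the largest singular value of X.
spectral = np.linalg.norm(X, ord=2)
largest_sv = np.linalg.svd(X, compute_uv=False)[0]

# |Xv|_2 / |v|_2 for many random directions v (columns of V).
V = rng.normal(size=(4, 1000))
ratios = np.linalg.norm(X @ V, axis=0) / np.linalg.norm(V, axis=0)

# The ratio at the top right singular vector.
_, _, vt = np.linalg.svd(X)
attained = np.linalg.norm(X @ vt[0]) / np.linalg.norm(vt[0])

print(spectral, largest_sv, ratios.max(), attained)
```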


          removed erroneous square root

c618728

hanbin973 reviewed

python/tests/test_relatedness_vector.py Outdated

    
    #  epsilon > \sum_i (lambda - L_i)^2 b_i^2 + lambda^2 |delta|^2
    #   >= m * \sum_i b_i^2 + lambda^2 |delta|^2
    #   = m * (1-|delta|^2) + lambda^2 |delta|^2.
    # Hence, min_i |lambda-L_i|^2 = m < (epsilon - lambda^2  |delta|^2) / (1- |delta|^2).

Contributor

hanbin973 Mar 19, 2025

The epsilon on L841 and on L844 both need to be squared. Everything else looks okay.
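With epsilon squared, the two bounds from this review read lambda |delta| <= epsilon and m (1 - |delta|^2) <= epsilon^2 - lambda^2 |delta|^2, where m = min_i |lambda - L_i|^2. The sketch below checks both on made-up data. It assumes a basic randomized range finder as a stand-in for the approximation under test (this is not tskit's implementation), and it takes epsilon to be the realized spectral-norm error rather than an a priori bound such as error_bound.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 30, 5

# Symmetric PSD test matrix with a decaying spectrum (made-up data).
Qf, _ = np.linalg.qr(rng.normal(size=(n, n)))
A = (Qf * 2.0 ** -np.arange(n)) @ Qf.T

# Rank-k approximation U diag(L) U^T from a basic randomized range finder.
Q, _ = np.linalg.qr(A @ rng.normal(size=(n, k)))
L, W = np.linalg.eigh(Q.T @ A @ Q)
U = Q @ W

# epsilon: realized spectral-norm error of the approximation.
eps = np.linalg.norm(A - (U * L) @ U.T, ord=2)

# Check both bounds for the top k exact eigenpairs (lam, v) of A.
lam_all, V = np.linalg.eigh(A)
checks = []
for lam, v in zip(lam_all[::-1][:k], V[:, ::-1].T[:k]):
    b = U.T @ v
    d = np.linalg.norm(v - U @ b)          # |delta|
    m = np.min((lam - L) ** 2)             # min_i |lambda - L_i|^2
    delta_ok = lam * d <= eps * (1 + 1e-9)
    m_ok = m * (1 - d**2) <= eps**2 - lam**2 * d**2 + 1e-12
    checks.append((delta_ok, m_ok))

print(checks)
```

The second bound is written without dividing by 1 - |delta|^2 so that the check stays well defined even if |delta| is close to 1.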


          squared

d47fa8b

Contributor

hanbin973 commented Mar 19, 2025

Let's merge.

petrelharp merged commit f11db43 into tskit-dev:main

19 checks passed
