Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

approximate graph distances #275

Closed
ekg opened this issue Mar 24, 2016 · 4 comments
Closed

approximate graph distances #275

ekg opened this issue Mar 24, 2016 · 4 comments

Comments

@ekg
Copy link
Member

ekg commented Mar 24, 2016

One thing we get easily in a linear reference is distance. How can we derive these efficiently for variation graphs? There are a number of general techniques for deriving approximate graph distances, but maybe these would be overkill. If we have assembled the graph out of alignments, and these are embedded in it (as "threads") then we can convert them into "paths" and measure distances on them. We need everything to be close to some path for this to work. Perhaps both techniques can be useful. We just need a single model for storing the approximate distances.

@edawson
Copy link
Contributor

edawson commented Mar 24, 2016

This seems like it would be easy (if expensive) for DAGified graphs but nigh impossible for non-DAGs. For DAGs couldn't we just BFS from start node -> end node and sum the lengths of the sequences in each possible path? I guess we'd then want to take the min or max or average of that distance as our measure.

@ekg
Copy link
Member Author

ekg commented Mar 24, 2016

Check this out: http://dl.acm.org/citation.cfm?id=1044732

Approximate distance oracles

Let G = (V,E) be an undirected weighted graph with |V| = n and |E| = m. Let k ≥ 1 be an integer. We show that G = (V,E) can be preprocessed in O(kmn1/k) expected time, constructing a data structure of size O(kn1+1/k), such that any subsequent distance query can be answered, approximately, in O(k) time. The approximate distance returned is of stretch at most 2k−1, that is, the quotient obtained by dividing the estimated distance by the actual distance lies between 1 and 2k−1.

@ekg
Copy link
Member Author

ekg commented May 12, 2016

Two ideas.

  1. record chunks of the graph that are about a read pair X some factor wide in such a way that we can pick up that chunk when we're interested in mapping the mate of a given read; overlap them somehow so that each node tells us the chunk that it's most central in.
  2. the distances in the 2d rendering of the graph in graphviz provide approximate distances in terms of bases, and this is particularly true for DAGs; could we use a similar method to derive coordinates for each node/side in the graph? It doesn't need to be just 2d...

@glennhickey
Copy link
Contributor

Quick update here:

In a local xg, I have a hacky (but quick) path-based distance implemented:

Idea: project both nodes to a path, then compute the path distance.
Optionally repeat on multiple paths to find best fit. A node (that's not
on the path) is projected by the position of its (ID-space) neighbours on
the path, which are looked up directly from the existing XGPaths members
vector. Costs an sd_vector rank and select (the only change to xgpath,
being the rank and select helpers for the members vector), and should work
fine on most of our graphs...

While I was at it, I added an expand_context function that works in base
pairs, rather than steps.

The idea being to optionally plug these things into the mapper to see if it
helps us enforce pair consistency a bit better. I've been distracted by
caller bugs, but hope to be able to start testing this in earnest tomorrow
or Monday...

On Thu, May 12, 2016 at 5:07 PM, Erik Garrison notifications@github.com
wrote:

Two ideas.

record chunks of the graph that are about a read pair X some factor
wide in such a way that we can pick up that chunk when we're interested in
mapping the mate of a given read; overlap them somehow so that each node
tells us the chunk that it's most central in.
2.

the distances in the 2d rendering of the graph in graphviz provide
approximate distances in terms of bases, and this is particularly true for
DAGs; could we use a similar method to derive coordinates for each
node/side in the graph? It doesn't need to be just 2d...


You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub
#275 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants