Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compress edges #163

Closed
nwlandry opened this issue Sep 1, 2022 · 10 comments
Closed

Compress edges #163

nwlandry opened this issue Sep 1, 2022 · 10 comments

Comments

@nwlandry
Copy link
Collaborator

nwlandry commented Sep 1, 2022

Implement way of collapsing identical elements. Right now, IDView.duplicates() simply chooses the lowest ID as the non-duplicate edge, but there are several ways of

  • weighting the collapsed nodes/edges:
    1. Leave the resulting single node/edge unweighted
    2. Weighting the resulting single node/edge by its original multiplicity
  • labeling the node/edge:
    1. Give the new node/edge the smallest id of the duplicates
    2. Give the new node/edge a tuple ID where the tuple are the nodes/edges that were compressed.
    3. Give the new node/edge a completely new ID
      This would affect the duplicates method and potentially require another method.

Something along these lines is the collapse_identical_elements in HyperNetX.

@nwlandry nwlandry mentioned this issue Sep 1, 2022
@nwlandry
Copy link
Collaborator Author

nwlandry commented Sep 7, 2022

What do we think the syntax should be for calling this? Originally, I was thinking H.edges.collapse(weighted=True, relabel="tuple") and same for nodes, but unfortunately, this would involve modifying the hypergraph from the IDView, which seems to contradict the notion of what a view is. What about H.collapse_edges() and H.collapse_nodes()?

@leotrs
Copy link
Collaborator

leotrs commented Sep 8, 2022

Agreed this method does not belong in the EdgeView. It belongs in Hypergraph. The main objective of Hypergraph is after all to modify the network's structure.

What about H.collapse_edges(weighted=weight_mode, index=index_mode, return_collapse_dict). Here "weight_mode" could be None (no weight) or str (used as the name of the attribute that stores the weight), and "index_mode" could be one of "tuple", "min", or "auto". Auto means assign a completely new ID automatically.

I'm using "index" instead of "label" because elsewhere in the codebase we refer to nodes and edges using the word "ID", not the word "label". What do you think?

@maximelucas
Copy link
Collaborator

Would "merge" rather "collapse" make sense?
Collapse sounds a bit dramatic 😄

@leotrs
Copy link
Collaborator

leotrs commented Sep 27, 2022

As per #177, the simplicial complex class will (might?) have a method that "collapses" things so it might be wise to choose a different name for this one. I agree merge sounds nice.

@leotrs
Copy link
Collaborator

leotrs commented Oct 27, 2022

As per #196, we should keep in mind what happens to the incidence matrix before/after compressing multiple edges.

@nwlandry
Copy link
Collaborator Author

Some things I'm wondering:

  • What should we do with overlapping attributes? Make them a list?
  • Similarly, each edge could, in principle, have a weight. Should these be used in computing the merged edge weight? If that's the case, should we store it in a "weight" attribute?
  • If there isn't a "weight" attribute for non-multiedges, should we add it so that every edge has this attribute?

@maximelucas
Copy link
Collaborator

  • I feel like it might break some things to have some edges "colors" being "red" and other being ["red", "blue"], would it? For example if you decide to use xgi.draw() and want to color edges by their "color" attribute? If there is a way for things no to break, we could have to "attribute merging modes": union and intersection. Union would keep all values of the attributes for the merging edges as a list (set?), intersection would only keep one if it is the value of all merging edges, otherwise None. That could work for numerical attributes too, or we could have "sum", "mean", .., as modes for those, though it might get complicated.

  • Is anything in our codebase using the "weight" attribute explicitly? If not, the actual weight could be stored in the "weight" as a list/sum/mean (see above) and we could generate a new attribute "multiplicity" that would be a integer and could be used in draw() just as "weight" can be.

  • I feel this will be different for numerical and categorical again. If we have a "multiplicity", then I guess we should add it to all other edges (but it might make less sense for a "weight" in general?)

@leotrs
Copy link
Collaborator

leotrs commented Oct 30, 2022

  • In the case when the user wants to keep all attributes inside a list: "red" becomes ["red"] while two duplicate edges with "red" and "blue" become ["red", "blue"].
  • We do have Hypergraph.add_weighted_edges_from.
  • That's a good point RE categorical vs numerical.

My proposal is that we should have a common sense default, for example, two duplicate edges with IDs 1 and 2 become a single edge with ID 1 with the attributes of 1 taking precedence over the attributes of 2. But other than that, we could have a keyword parameter that accepts a lambda to give the user complete freedom of what happens:

# by default, attributes of the lowest ID take precedence
H.collapse_edges()
# -> `{1: {"color: "red"}, 2: {"color": "blue"}}` become `{1: {"color": "red"}}`

H.collapse_edges(attributes="collect")
# -> `{1: {"color": "red"}}` becomes `{1: {"color": ["red"]}}` and `{1: {"color: "red"}, 2: {"color": "blue"}}` become `{1: {"color": ["red", "blue"]}}`

H.collapse_edges(attributes=lambda x,y: x+y)
# ->`{1: {"age": 10}, 2: {"age": 20}}` become `{1: {"age": 30}}`

# If there are multiple attributes, we can accept a dict
H.collapse_edges(attributes={"age": lambda x,y: x+y, "color": "collect"})
# ...

Note that both the default behavior and attributes="collect" work for both categorical and numerical attributes, while in the latter case, the one with the lambda, the user would be responsible for passing a function that behaves well on the given attribute.

Note this discussion is very similar to the one in #203 about what to do with attributes when adding two hypergraphs.

@nwlandry nwlandry mentioned this issue Nov 3, 2022
@maximelucas
Copy link
Collaborator

Is there anything that #210 did not address or should we close?

@nwlandry
Copy link
Collaborator Author

nwlandry commented Dec 5, 2022

Good call. Closing this.

@nwlandry nwlandry closed this as completed Dec 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants