Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add clustering coefficient definitions #316

Merged
merged 17 commits into from
Apr 5, 2023
Merged

Add clustering coefficient definitions #316

merged 17 commits into from
Apr 5, 2023

Conversation

nwlandry
Copy link
Collaborator

@nwlandry nwlandry commented Apr 1, 2023

No description provided.

@codecov
Copy link

codecov bot commented Apr 1, 2023

Codecov Report

Patch coverage: 97.50% and project coverage change: +0.04 🎉

Comparison is base (c79775c) 90.25% compared to head (1ec484c) 90.29%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #316      +/-   ##
==========================================
+ Coverage   90.25%   90.29%   +0.04%     
==========================================
  Files          35       36       +1     
  Lines        2811     2926     +115     
==========================================
+ Hits         2537     2642     +105     
- Misses        274      284      +10     
Impacted Files Coverage Δ
xgi/stats/__init__.py 84.86% <ø> (ø)
xgi/algorithms/clustering.py 96.66% <96.66%> (ø)
xgi/algorithms/__init__.py 100.00% <100.00%> (ø)
xgi/algorithms/centrality.py 97.91% <100.00%> (ø)
xgi/stats/nodestats.py 100.00% <100.00%> (ø)

... and 3 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@nwlandry nwlandry linked an issue Apr 2, 2023 that may be closed by this pull request
@nwlandry
Copy link
Collaborator Author

nwlandry commented Apr 2, 2023

Before I un-draft this PR, I wanted to know what good names for the functions were so that I can also add them to the nodestats module. Any suggestions, @leotrs? I was thinking maybe we could remove "coefficient" from the names.

@maximelucas
Copy link
Collaborator

Incredible work Nich, thanks so much. We needed these different clustering coefficients.

Two comments:

  • some of the variable names are not very explicit and make it harder to understand the code.
  • the description is of all three functions is very similar. Could we briefly describe their specificity, what they actually compute? Maybe describe their behaviour in cases where they differ. Maybe also add their definition with a formula?

Co-authored-by: Maxime Lucas <maximelucas@users.noreply.github.com>
@leotrs
Copy link
Collaborator

leotrs commented Apr 4, 2023

@nwlandry Thanks for your work here. I agree I'd remove clustering from all the names. Other than that, we kinda do need descriptive names so I'm not sure there's a good way of shortening them further.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@nwlandry nwlandry marked this pull request as ready for review April 4, 2023 21:34
@nwlandry
Copy link
Collaborator Author

nwlandry commented Apr 4, 2023

Incredible work Nich, thanks so much. We needed these different clustering coefficients.

Two comments:

  • some of the variable names are not very explicit and make it harder to understand the code.
  • the description is of all three functions is very similar. Could we briefly describe their specificity, what they actually compute? Maybe describe their behaviour in cases where they differ. Maybe also add their definition with a formula?

Thanks :) I added a brief description in the docstrings. Some of the formulas are a bit long so I just described them.

@nwlandry
Copy link
Collaborator Author

nwlandry commented Apr 4, 2023

@nwlandry Thanks for your work here. I agree I'd remove clustering from all the names. Other than that, we kinda do need descriptive names so I'm not sure there's a good way of shortening them further.

After chatting with Max, I decided to keep "coefficient" in the name for maximum clarity, but if we feel it's too clunky down the road, we can remove it.

@nwlandry
Copy link
Collaborator Author

nwlandry commented Apr 4, 2023

I think it should be ready for review!

@nwlandry nwlandry linked an issue Apr 5, 2023 that may be closed by this pull request
xgi/algorithms/clustering.py Outdated Show resolved Hide resolved
@nwlandry
Copy link
Collaborator Author

nwlandry commented Apr 5, 2023

I did a few things - I changed all tests about nan to use np.isnan. I implemented your suggestion and I realized that isolated nodes and leaves currently have a clustering coefficient of np.nan, but I can change it to zero if you would like. I saw this paper (https://arxiv.org/abs/0802.2512) which seems to indicate that people typically use 0 as the default value.

@maximelucas
Copy link
Collaborator

No experience or strong opinion but if 0 seems used more let's go with that I guess

@leotrs
Copy link
Collaborator

leotrs commented Apr 5, 2023

No strong opinion either so I'll let someone else decide this one 😉

@nwlandry
Copy link
Collaborator Author

nwlandry commented Apr 5, 2023

No experience or strong opinion but if 0 seems used more let's go with that I guess

Sounds good! I'll add a note about this as well :)

@nwlandry
Copy link
Collaborator Author

nwlandry commented Apr 5, 2023

Just to leave good documentation on what I did:

  • I added three definition of the clustering coefficient and added these definitions to the algorithms module as well as nodestats.
  • I moved the utils and linalg modules to their own folder in the docs.
  • I renamed the centrality measures to be more intelligible.
  • I changed np.NaN to np.nan in all of the code as per (https://numpy.org/doc/stable/reference/constants.html)
  • I changed the evaluation of NaN values to np.isnan(val) in all of the code as per (https://numpy.org/doc/stable/user/misc.html)
  • I added unit tests for the clustering coefficients.

@nwlandry nwlandry merged commit 7ee8aa2 into main Apr 5, 2023
@nwlandry nwlandry deleted the add-clustering branch April 5, 2023 17:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

names of centrality measures clustering coefficient not 1 in all-to-all
3 participants