Major Refactor to GraphReporters #181

Merged
merged 16 commits into from
Mar 12, 2019

Collaborator

@jayqi jayqi commented Feb 10, 2019

I think this is ready to go!

New AbstractGraph and DirectedGraph classes:

These objects will encapsulate all of the functionality related to the graph model of the package networks, including methods for calculating node and graph measures. They slot into the pkg_graph in place of the igraph object.

AbstractGraph base class

  • This enables future reporters for other graph types, e.g., UndirectedGraph, BipartiteGraph (though I can't think of great reporters that use them. Maybe a package social graph with authors as edges?)

Calculating measures

  • Measures are split into two types: node_measures and graph_measures.
    • Graph measure names are prefixed, like graphBetweenness.
  • Functions to calculate each measure are split out separately. Methods node_measures() and graph_measures() are used both to retrieve and to calculate specified measures.
    • Both methods take a vector of measure names
    • The default value of NULL returns only the measures that have already been calculated and cached
  • Introduced idea of default measures, captured by methods default_node_measures and default_graph_measures of DirectedGraph.
    • I noticed that the tables were getting too wide. Some of the measures like closeness don't even make sense for unconnected graphs. I think it's fine to support more measures than we really want to show people by default.
    • Current defaults are: outDegree, inDegree, outSubgraphSize, inSubgraphSize, betweenness, pageRank, graphOutDegree, graphInDegree, graphBetweenness
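
The retrieve-and-calculate behavior described above can be sketched roughly as follows. This is a base R illustration only, not the code from this PR: make_graph() and its closure-based cache are hypothetical stand-ins for DirectedGraph, the SOURCE/TARGET column names are assumed, and only two measures are implemented.

```r
# Illustrative sketch only: make_graph() is a hypothetical helper,
# not pkgnet's DirectedGraph implementation.
make_graph <- function(edges) {
    nodes <- sort(unique(c(edges$SOURCE, edges$TARGET)))
    cache <- list()  # measure name -> vector aligned with `nodes`

    # One calculator per measure, kept separate from retrieval.
    calculators <- list(
        outDegree = function() {
            vapply(nodes, function(n) sum(edges$SOURCE == n),
                   integer(1), USE.NAMES = FALSE)
        },
        inDegree = function() {
            vapply(nodes, function(n) sum(edges$TARGET == n),
                   integer(1), USE.NAMES = FALSE)
        }
    )

    node_measures <- function(measures = NULL) {
        if (is.null(measures)) {
            # NULL: return only what is already cached.
            measures <- names(cache)
        }
        stopifnot(all(measures %in% names(calculators)))
        # Calculate anything requested that is not cached yet.
        for (m in setdiff(measures, names(cache))) {
            cache[[m]] <<- calculators[[m]]()
        }
        as.data.frame(c(list(node = nodes), cache[measures]),
                      stringsAsFactors = FALSE)
    }

    list(node_measures = node_measures)
}

# Made-up example edges (assumed columns: SOURCE, TARGET).
edges <- data.frame(
    SOURCE = c("a", "a", "b"),
    TARGET = c("b", "c", "c"),
    stringsAsFactors = FALSE
)
g <- make_graph(edges)
g$node_measures()             # only the node column: nothing cached yet
g$node_measures("outDegree")  # calculates and caches outDegree
g$node_measures()             # now includes the cached outDegree column
```

Keeping one calculator per measure means a non-default measure costs nothing until someone asks for it by name.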

Changes to individual measures

  • outSubgraphSize and inSubgraphSize have been replaced by recursiveDeps and recursiveRevDeps, which are decremented by 1 relative to the old measures so as not to count the node itself. Resolves "inSubgraphSize and outSubgraphSize: Should node include itself in count?" (#191).
  • Added new measures: inCloseness, graphInCloseness, authorityScore, graphAuthorityScore
  • Renamed outBetweenness to just betweenness. There is only one version of betweenness for directed graphs (pretty sure, and igraph doesn't have a flag for that), and I don't think it makes sense to calculate undirected betweenness on a directed graph.
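
To make the "decremented by 1" point concrete, here is a small base R sketch of counting recursive dependencies without counting the node itself. count_recursive_deps() is a hypothetical helper, not the PR's igraph-based code, and it assumes an edge list with SOURCE and TARGET columns where each edge points from a node to something it depends on.

```r
# Hypothetical helper (base R only): count the nodes reachable from `start`,
# excluding `start` itself. Set reverse = TRUE to follow edges backwards,
# which corresponds to counting reverse dependencies.
count_recursive_deps <- function(edges, start, reverse = FALSE) {
    from <- if (reverse) edges$TARGET else edges$SOURCE
    to <- if (reverse) edges$SOURCE else edges$TARGET

    visited <- start
    frontier <- start
    while (length(frontier) > 0) {
        # Neighbors of the current frontier that have not been seen yet.
        nxt <- unique(to[from %in% frontier])
        frontier <- setdiff(nxt, visited)
        visited <- c(visited, frontier)
    }
    # The reachable set includes `start`; subtract 1 to leave it out.
    length(visited) - 1L
}

# Made-up example: d depends on a; a depends on b and c; b depends on c.
edges <- data.frame(
    SOURCE = c("a", "b", "a", "d"),
    TARGET = c("b", "c", "c", "a"),
    stringsAsFactors = FALSE
)
count_recursive_deps(edges, "a")                  # b and c
count_recursive_deps(edges, "c", reverse = TRUE)  # b, a and d
```

Under this convention a node with no dependencies gets 0 rather than 1, which is the behavior the renamed measures are meant to have.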

Graph Reporter Refactor

  • All graph reporter classes now have a private variable graph_class that must be assigned with an AbstractGraph class constructor.
  • pkg_graph now stores the DirectedGraph object. Accessing the igraph object would need: reporter$pkg_graph$igraph
  • AbstractGraphReporter: new method calculate_default_measures that replaces the old calculate_network_measures.
    • The call to calculate_test_coverage in FunctionReporter has been moved into this method.
  • Removed hardcoding of measure calculation from get_summary_view and elsewhere.
    • calculate_default_measures is now explicitly called in each reporter's .Rmd file to account for the change.
  • The network_measures active binding now concatenates the pkg_graph's graph measures with any non-graph-related network measures. The only reporter currently with non-graph-related network measures is the FunctionReporter, which has coverage-related aggregate measures.
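
As a rough illustration of the network_measures idea, the sketch below uses base R's makeActiveBinding() on a plain environment rather than an R6 class. The reporter environment, the coverage_measures field, the packageTestCoverage name, and all values are made up for the example; only the concatenation pattern reflects the description above.

```r
# Plain environment standing in for a reporter object (assumption: the real
# reporters are R6 classes; this only mimics the active-binding behavior).
reporter <- new.env()
reporter$graph_measures <- list(graphOutDegree = 0.5, graphInDegree = 0.25)
# Hypothetical non-graph measure, e.g. something coverage-related.
reporter$coverage_measures <- list(packageTestCoverage = 0.9)

# The binding is recomputed on every access, so it always reflects the
# current graph measures plus any non-graph measures.
makeActiveBinding(
    "network_measures",
    function() c(reporter$graph_measures, reporter$coverage_measures),
    reporter
)

names(reporter$network_measures)

# Adding a graph measure later shows up without touching the binding.
reporter$graph_measures <- c(reporter$graph_measures,
                             list(graphBetweenness = 0.1))
names(reporter$network_measures)
```

Because the value is derived on access, there is no separate copy of the measures to keep in sync.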

Unit Test Overhaul

  • New structure: each reporter class now has two test files. test-XReporter-class.R tests the functionality of the class, and test-XReporter-network.R does expected value testing of the nodes and edges of the test packages against stored .csv files in tests/testthat/testdata.
  • Replaced class structure tests with a test that checks the public interface of instantiated objects via names. This will include inherited members and is ultimately what is more relevant to what a user sees.
  • Consolidated logger silencing into setup-logger.R and teardown-logger.R. Resolves "Remove duplicated logger silencing" (#170).

Other:

@bburns632
Collaborator

bburns632 commented Feb 10, 2019

@jayqi some initial feedback on the items you called out.

Network vs Graph:

I believe we should use the term network instead of graph throughout this package. Besides personal preference, I believe network is used more often when modeling real systems whereas graph is used more often as a theoretical construct.

Here's a snippet from the Graph Theory wiki page:

Graphs can be used to model many types of relations and processes in physical, biological, social and information systems. Many practical problems can be represented by graphs. Emphasizing their application to real-world systems, the term network is sometimes defined to mean a graph in which attributes (e.g. names) are associated with the nodes and/or edges.

and one from the Network Theory wiki page:

Network theory is the study of graphs as a representation of either symmetric relations or asymmetric relations between discrete objects. In computer science and network science, network theory is a part of graph theory: a network can be defined as a graph in which nodes and/or edges have attributes (e.g. names).

Given the extent of this refactor, this would be a good time to change the use of these terms in this package.

Report Header Change:
This is one of our main branding opportunities for new and existing pkgnet users. I like the idea of changing it, but if we're going to change it, what about something with a little more pizazz? For example, "a pkgnet created report", "a pkgnet creation", "a pkgnet report", etc. We should discuss further.

Change to calculate_default_measures:
If calculate_default_measures is to be called in the .Rmd file, does this mean that node measures will not be available in the nodes table in the returned R list object? I would vote to keep that functionality.

@bburns632
Collaborator

@jayqi, @jameslamb, and I had the luxury to meet IRL regarding this PR.

With regard to the three items above, the consensus is:

  1. Adopt the convention of using the term graph instead of network for most everything within the package (e.g. function names, variables, etc.). However, for externally facing outputs (i.e. reports, R objects), the term network is preferred to align with the user facing nomenclature.
  2. Pull the report header change out of this PR and make a separate one. Will discuss in that PR.
  3. Node table with metrics, albeit fewer metrics, remains available within the R object output.

@codecov-io

codecov-io commented Feb 23, 2019

Codecov Report

Merging #181 into master will increase coverage by 1.01%.
The diff coverage is 97.84%.

@@            Coverage Diff             @@
##           master     #181      +/-   ##
==========================================
+ Coverage   90.51%   91.52%   +1.01%     
==========================================
  Files          10       11       +1     
  Lines         938      920      -18     
==========================================
- Hits          849      842       -7     
+ Misses         89       78      -11

Impacted Files                  Coverage Δ
R/PackageFunctionReporter.R     93.33% <100%> (+0.32%) ⬆️
R/AbstractGraphReporter.R       86.28% <100%> (+1.63%) ⬆️
R/AbstractPackageReporter.R     100% <100%> (+10.34%) ⬆️
R/PackageInheritanceReporter.R  100% <100%> (ø) ⬆️
R/PackageDependencyReporter.R   81.65% <90.9%> (-1.82%) ⬇️
R/GraphClasses.R                93.82% <93.82%> (ø)
R/CreatePackageReport.R         98.24% <0%> (-1.76%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jayqi
Collaborator Author

jayqi commented Feb 24, 2019

@bburns632 @jameslamb --- This PR is ready for final review!

@bburns632
Collaborator

Can you update the copyright year in footer.html please?

Collaborator

@bburns632 bburns632 left a comment

@jayqi looks good. I have two suggested changes:

  1. sort the node tables in some fashion (they are not currently sorted by default).
  2. minor copyright year update

Also, I have a question on the naming of some active bindings in the function reporter.

report_markdown_path = function(){
    system.file(file.path("package_report", "package_function_reporter.Rmd"), package = "pkgnet")
},

Collaborator

@jayqi Why are there R6 active bindings saved in package function reporter and not active bindings for S4 or R5? Seems odd to build active bindings for only one of many possible types.

Collaborator Author

That's because they're not supported by Function Reporter. In any case, I'm going to move those bindings into private methods. I originally did it that way because I wanted to take advantage of active bindings' clean interface. But honestly I don't think users would or should need to use these methods, and they make the interface more complicated.

Collaborator Author

@jayqi jayqi Mar 2, 2019

I opened #197 to call out that Function Reporter doesn't support S4 and RC yet.

And I've removed these active bindings. ✅

Collaborator

@jameslamb jameslamb left a comment

I left a few comments. Overall looking good. One thing I couldn't figure out from the diff...can you give me a code snippet with the "main" user workflow with pkgnet? I'm still struggling to understand whether people now have to use a method call on the reporters to get more statistics or not.

NEWS.md
@@ -19,7 +32,7 @@ None
* Revised unit test setup and teardown files to enable devtools::test() to work as well as CRAN server testing ([#167](https://github.com/UptakeOpenSource/pkgnet/pull/167))

## BUG FIXES
- * Corrected node statisitcs table merging error ([#165](https://github.com/UptakeOpenSource/pkgnet/issues/165), [#166](https://github.com/UptakeOpenSource/pkgnet/pull/166))
+ * Corrected node statistics table merging error ([#165](https://github.com/UptakeOpenSource/pkgnet/issues/165), [#166](https://github.com/UptakeOpenSource/pkgnet/pull/166))
Collaborator

could you make this a separate PR

Collaborator Author

Really? I don't want to screw with git to undo committing this. I also disagree that this kind of thing needs to be a separate PR. There is no discussion to be had, and setting the bar that high for fixing this kind of small typo makes it less likely that they get fixed. This file has no code and fixing this has no risk of breaking anything.

Collaborator

"screw with git to undo committing this" --> that is not what I'm recommending. If you made another PR it would get merged immediately (since it's such a small and non-controversial change) and when you rebased this PR to master the diff would disappear with no conflict.

It's fine for now because it seems this PR is close to being merged anyway, but in general I disagree with "setting the bar that high for fixing this kind of small typo makes it less likely that they get fixed". I dislike the pattern of long-lived large PRs picking up other small, unrelated changes because it means that anyone else doing development on the project will be waiting on those changes unnecessarily.

Collaborator Author

Okay, that's fine. I have had some bad experiences in the past where the rebase still gives me conflicts, but maybe that was just bad luck and I'm overcompensating.

@jayqi
Collaborator Author

jayqi commented Mar 2, 2019

@jameslamb

This is the bit I have in the vignette:

Both the DependencyReporter and the FunctionReporter have an object called pkg_graph that contains the graph model of their respective networks. This object has methods to calculate additional node-level and graph-level measures. It is powered by igraph, and the igraph object itself is directly accessible with pkg_graph$igraph.

report2$FunctionReporter$pkg_graph$node_measures(c('hubScore', 'authorityScore'))

Basically right now, you can't add non-default measures into CreatePackageReport.

You'd need to take your reporter, whether from direct instantiation or via CreatePackageReport, and manually calculate any non-default measures you're interested in seeing. You'd only be able to get that interactively. We don't currently have any way of getting anything non-default into an HTML package report; that limitation isn't specific to these measures.
