-
-
Notifications
You must be signed in to change notification settings - Fork 3.5k
Only allow connected graphs in eigenvector_centrality_numpy
#7549
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
rossbar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @Peiffap
I wouldn't expect the connectedness-checking to be the performance bottleneck for such a compute heavy function. Some quick %timeit checks with various graphs bear that out locally, but if we wanted to be more rigorous we could add it to the benchmark suite. IMO that's not a blocker for this PR though!
|
All sensible suggestions, thanks for taking a look! It'd definitely be nice to have benchmarks for this kind of change (re: what I said in #7527). I could add a simple benchmark right now, but I think there might have to be a reorganisation of how benchmarking is done at some point in the near future if performance is something you want to prioritize. |
|
I added a simple benchmarking function. I didn't test this locally (it's a pain getting it to work without conda or mamba installed...), so we should make sure it's working as expected before merging |
|
FYI: somewhat related, we have, and are, exploring caching results of functions (see #7283). For example, this can allow us to compute connectedness of a graph only once, which may help mitigate the performance penalty of checking connectedness (and other such properties) here and elsewhere. At PyConUS @dschult, @rlratzel, @MridulS, and I discussed that caching of function results should be added narrowly, incrementally, and as use cases arise, unlike what was done in #7283. Hence maybe caching could begin with e.g. This doesn't change anything that needs done in this PR. It seems like a good idea to me to check for properties such as connectedness to ensure that an algorithm is well-defined for the input graph. |
| def time_eigenvector_centrality_numpy(self, graph): | ||
| # Added to ensure the connectivity check doesn't affect | ||
| # performance too much (see gh-6888, gh-7549). | ||
| _ = nx.eigenvector_centrality_numpy(self.graphs_dict[graph]) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a stab at this @Peiffap + the feedback on benchmarking (it's certainly something that needs to be documented better).
I had been doing this locally as well, and would be in slight favor of not adding the benchmark in this way for the simple reason that one of the example graphs in this benchmark (i.e. the drug interaction graph) is disconnected, which means running the benchmark with this input fails. Overall this doesn't really have an impact on the benchmarking suite (it ignores the failure and continues running) but I'd be in favor of keeping things "clean" for now.
However - it's not a blocker, I'm perfectly happy for it to go in now and just creating a follow-up issue to keep track of re-doing it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a look and testing it out locally! That's... annoying, but I'm +1 on not introducing failures into the benchmarking suite. I'm unsure of what the best solution would be then, but here are some options:
- Leave things as they are (but we agree that's not ideal)
- No benchmarking for now (also not ideal)
- Make a separate class with appropriate setup (i.e. no disconnected graphs) to benchmark
eigenvector_centrality_numpy(). This touches on a more general question of how to organize benchmarks. One way (that I haven't thought through, though) would be to somewhat mirror the structure of thenetworkx/folder; if nothing else, it'd at least be consistent and easy to navigate. I feel like if this is the option we go with, then it probably makes sense to not include it as part of this PR, but to make it part of a bigger "tracking benchmarking" issue. - Explicitly check whether the graph is the drug interaction network, and skip the benchmark if it is. It looks like asv has decorators for this.
I think option 3 is the one we should be looking at when it comes to the future of the package, but as a more immediate fix, I like option 4 best.
- Add exception for disconnected graphs
- Add explanation for inconsistency
Fixes networkx#6888.
Co-authored-by: Dan Schult <dschult@colgate.edu>
Co-authored-by: Joye Mang <joyemang33@users.noreply.github.com>
- Handle directed graphs correctly
- Rewrite docs slightly
- Add tests that `AmbiguousSolution` is correctly raised
both for directed and undirected graphs
- Change name in connectivity check - Avoid local var binding for error message Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>
Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>
a247744 to
10e1104
Compare
|
I went ahead and split up the benchmark in a naive way in order to unstick this - maybe @MridulS can doublecheck that this makes sense. Anyways, my approval of the changes themselves (regardless of benchmarking changes) stands! |
|
There is an error with the benchmark action, complaining about |
Fixes #6888.
As discussed in #6888, the SciPy solver does not handle disconnected graphs well, leading to inconsistent results. The proposed solution was to add a check that
Gis connected, as long as that didn't make the algorithm much slower (for reference, the execution time was usually around 1-2% longer with the check).This PR:
eigenvector_centrality_numpyto only accept connected graphs (strongly connected for directed graphs)