Only allow connected graphs in `eigenvector_centrality_numpy` #7549

Peiffap · 2024-07-04T20:33:47Z

As discussed in #6888, the SciPy solver does not handle disconnected graphs well, leading to inconsistent results. The proposed solution was to add a check that G is connected, as long as that didn't make the algorithm much slower (for reference, the execution time was usually around 1-2% longer with the check).

This PR:

Changes eigenvector_centrality_numpy to only accept connected graphs (strongly connected for directed graphs)
Updates the docstring to clearly mention the reason for this choice (i.e. the inconsistency, and its explanation by @dschult)
Adds two tests to verify that the function properly raises when given a disconnected graph as input
Reorganises the arguments in the docstring to match the arguments of the function
Has other very minor changes (improvements?) to the docstring.

rossbar

LGTM, thanks @Peiffap

I wouldn't expect the connectedness-checking to be the performance bottleneck for such a compute heavy function. Some quick %timeit checks with various graphs bear that out locally, but if we wanted to be more rigorous we could add it to the benchmark suite. IMO that's not a blocker for this PR though!

networkx/algorithms/centrality/eigenvector.py

Peiffap · 2024-07-08T14:56:29Z

All sensible suggestions, thanks for taking a look!

It'd definitely be nice to have benchmarks for this kind of change (re: what I said in #7527). I could add a simple benchmark right now, but I think there might have to be a reorganisation of how benchmarking is done at some point in the near future if performance is something you want to prioritize.

Peiffap · 2024-07-08T15:04:35Z

I added a simple benchmarking function. I didn't test this locally (it's a pain getting it to work without conda or mamba installed...), so we should make sure it's working as expected before merging

eriknw · 2024-07-09T07:46:35Z

FYI: somewhat related, we have, and are, exploring caching results of functions (see #7283). For example, this can allow us to compute connectedness of a graph only once, which may help mitigate the performance penalty of checking connectedness (and other such properties) here and elsewhere.

At PyConUS @dschult, @rlratzel, @MridulS, and I discussed that caching of function results should be added narrowly, incrementally, and as use cases arise, unlike what was done in #7283. Hence maybe caching could begin with e.g. is_connected, is_strongly_connected, is_weakly_connected, number_connected_components, number_strongly_connected_components, number_weakly_connected_components, or just some of these.

This doesn't change anything that needs done in this PR. It seems like a good idea to me to check for properties such as connectedness to ensure that an algorithm is well-defined for the input graph.

networkx/algorithms/centrality/eigenvector.py

rossbar · 2024-07-11T10:28:17Z

benchmarks/benchmarks/benchmark_algorithms.py

+    def time_eigenvector_centrality_numpy(self, graph):
+        # Added to ensure the connectivity check doesn't affect
+        # performance too much (see gh-6888, gh-7549).
+        _ = nx.eigenvector_centrality_numpy(self.graphs_dict[graph])
+


Thanks for taking a stab at this @Peiffap + the feedback on benchmarking (it's certainly something that needs to be documented better).

I had been doing this locally as well, and would be in slight favor of not adding the benchmark in this way for the simple reason that one of the example graphs in this benchmark (i.e. the drug interaction graph) is disconnected, which means running the benchmark with this input fails. Overall this doesn't really have an impact on the benchmarking suite (it ignores the failure and continues running) but I'd be in favor of keeping things "clean" for now.

However - it's not a blocker, I'm perfectly happy for it to go in now and just creating a follow-up issue to keep track of re-doing it!

Thanks for taking a look and testing it out locally! That's... annoying, but I'm +1 on not introducing failures into the benchmarking suite. I'm unsure of what the best solution would be then, but here are some options:

Leave things as they are (but we agree that's not ideal)

No benchmarking for now (also not ideal)

Make a separate class with appropriate setup (i.e. no disconnected graphs) to benchmark eigenvector_centrality_numpy(). This touches on a more general question of how to organize benchmarks. One way (that I haven't thought through, though) would be to somewhat mirror the structure of the networkx/ folder; if nothing else, it'd at least be consistent and easy to navigate. I feel like if this is the option we go with, then it probably makes sense to not include it as part of this PR, but to make it part of a bigger "tracking benchmarking" issue.

Explicitly check whether the graph is the drug interaction network, and skip the benchmark if it is. It looks like asv has decorators for this.

I think option 3 is the one we should be looking at when it comes to the future of the package, but as a more immediate fix, I like option 4 best.

- Add exception for disconnected graphs - Add explanation for inconsistency Fixes networkx#6888. Co-authored-by: Dan Schult <dschult@colgate.edu> Co-authored-by: Joye Mang <joyemang33@users.noreply.github.com>

- Handle directed graphs correctly - Rewrite docs slightly - Add tests that `AmbiguousSolution` is correctly raised both for directed and undirected graphs

- Change name in connectivity check - Avoid local var binding for error message Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>

Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>

rossbar · 2024-09-06T14:10:55Z

I went ahead and split up the benchmark in a naive way in order to unstick this - maybe @MridulS can doublecheck that this makes sense. Anyways, my approval of the changes themselves (regardless of benchmarking changes) stands!

MridulS · 2024-09-07T15:27:15Z

There is an error with the benchmark action, complaining about libmambapy. But that's not the blocker here :)

jarrodmillman added the type: Bug fix label Jul 4, 2024

rossbar approved these changes Jul 8, 2024

View reviewed changes

networkx/algorithms/centrality/eigenvector.py Outdated Show resolved Hide resolved

networkx/algorithms/centrality/eigenvector.py Outdated Show resolved Hide resolved

rossbar reviewed Jul 8, 2024

View reviewed changes

networkx/algorithms/centrality/eigenvector.py Outdated Show resolved Hide resolved

Peiffap commented Jul 9, 2024

View reviewed changes

networkx/algorithms/centrality/eigenvector.py Outdated Show resolved Hide resolved

rossbar reviewed Jul 11, 2024

View reviewed changes

Peiffap and others added 8 commits September 6, 2024 07:01

Raise AmbiguousSolution in eigenvector_centrality_numpy

5c03bfc

- Add exception for disconnected graphs - Add explanation for inconsistency Fixes networkx#6888. Co-authored-by: Dan Schult <dschult@colgate.edu> Co-authored-by: Joye Mang <joyemang33@users.noreply.github.com>

Add exception tests

d07d47d

- Handle directed graphs correctly - Rewrite docs slightly - Add tests that `AmbiguousSolution` is correctly raised both for directed and undirected graphs

Meld with existing explanation

3ece20c

Apply suggestions from code review

6828a4d

- Change name in connectivity check - Avoid local var binding for error message Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>

Revert reference indent changes for eigenvector_centrality_numpy

96770d7

Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>

Add simple benchmark for eigenvector_centrality_numpy

aa7a90c

Avoid using connected() for boolean check

b9acde6

Move ecn benchmark to class w/ only connected inputs.

10e1104

rossbar force-pushed the gilles/improve-ecn branch from a247744 to 10e1104 Compare September 6, 2024 14:02

MridulS added the run:benchmark label Sep 7, 2024

MridulS added this to the 3.4 milestone Sep 7, 2024

MridulS merged commit b73297c into networkx:main Sep 7, 2024

Peiffap deleted the gilles/improve-ecn branch September 11, 2024 13:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Only allow connected graphs in `eigenvector_centrality_numpy` #7549

Only allow connected graphs in `eigenvector_centrality_numpy` #7549

Uh oh!

Peiffap commented Jul 4, 2024

Uh oh!

rossbar left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Peiffap commented Jul 8, 2024

Uh oh!

Peiffap commented Jul 8, 2024

Uh oh!

eriknw commented Jul 9, 2024

Uh oh!

Uh oh!

rossbar Jul 11, 2024

Uh oh!

Peiffap Jul 11, 2024

Uh oh!

rossbar commented Sep 6, 2024

Uh oh!

MridulS commented Sep 7, 2024

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

5 participants

Uh oh!

Only allow connected graphs in eigenvector_centrality_numpy #7549

Only allow connected graphs in eigenvector_centrality_numpy #7549

Uh oh!

Conversation

Peiffap commented Jul 4, 2024

Uh oh!

rossbar left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Peiffap commented Jul 8, 2024

Uh oh!

Peiffap commented Jul 8, 2024

Uh oh!

eriknw commented Jul 9, 2024

Uh oh!

Uh oh!

rossbar Jul 11, 2024

Choose a reason for hiding this comment

Uh oh!

Peiffap Jul 11, 2024

Choose a reason for hiding this comment

Uh oh!

rossbar commented Sep 6, 2024

Uh oh!

MridulS commented Sep 7, 2024

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

5 participants

Only allow connected graphs in `eigenvector_centrality_numpy` #7549

Only allow connected graphs in `eigenvector_centrality_numpy` #7549