Skip to content

Conversation

@Peiffap
Copy link
Member

@Peiffap Peiffap commented Jul 4, 2024

Fixes #6888.

As discussed in #6888, the SciPy solver does not handle disconnected graphs well, leading to inconsistent results. The proposed solution was to add a check that G is connected, as long as that didn't make the algorithm much slower (for reference, the execution time was usually around 1-2% longer with the check).

This PR:

  • Changes eigenvector_centrality_numpy to only accept connected graphs (strongly connected for directed graphs)
  • Updates the docstring to clearly mention the reason for this choice (i.e. the inconsistency, and its explanation by @dschult)
  • Adds two tests to verify that the function properly raises when given a disconnected graph as input
  • Reorganises the arguments in the docstring to match the arguments of the function
  • Has other very minor changes (improvements?) to the docstring.

Copy link
Contributor

@rossbar rossbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @Peiffap

I wouldn't expect the connectedness-checking to be the performance bottleneck for such a compute heavy function. Some quick %timeit checks with various graphs bear that out locally, but if we wanted to be more rigorous we could add it to the benchmark suite. IMO that's not a blocker for this PR though!

@Peiffap
Copy link
Member Author

Peiffap commented Jul 8, 2024

All sensible suggestions, thanks for taking a look!

It'd definitely be nice to have benchmarks for this kind of change (re: what I said in #7527). I could add a simple benchmark right now, but I think there might have to be a reorganisation of how benchmarking is done at some point in the near future if performance is something you want to prioritize.

@Peiffap
Copy link
Member Author

Peiffap commented Jul 8, 2024

I added a simple benchmarking function. I didn't test this locally (it's a pain getting it to work without conda or mamba installed...), so we should make sure it's working as expected before merging

@eriknw
Copy link
Contributor

eriknw commented Jul 9, 2024

FYI: somewhat related, we have, and are, exploring caching results of functions (see #7283). For example, this can allow us to compute connectedness of a graph only once, which may help mitigate the performance penalty of checking connectedness (and other such properties) here and elsewhere.

At PyConUS @dschult, @rlratzel, @MridulS, and I discussed that caching of function results should be added narrowly, incrementally, and as use cases arise, unlike what was done in #7283. Hence maybe caching could begin with e.g. is_connected, is_strongly_connected, is_weakly_connected, number_connected_components, number_strongly_connected_components, number_weakly_connected_components, or just some of these.

This doesn't change anything that needs done in this PR. It seems like a good idea to me to check for properties such as connectedness to ensure that an algorithm is well-defined for the input graph.

Comment on lines 34 to 38
def time_eigenvector_centrality_numpy(self, graph):
# Added to ensure the connectivity check doesn't affect
# performance too much (see gh-6888, gh-7549).
_ = nx.eigenvector_centrality_numpy(self.graphs_dict[graph])

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a stab at this @Peiffap + the feedback on benchmarking (it's certainly something that needs to be documented better).

I had been doing this locally as well, and would be in slight favor of not adding the benchmark in this way for the simple reason that one of the example graphs in this benchmark (i.e. the drug interaction graph) is disconnected, which means running the benchmark with this input fails. Overall this doesn't really have an impact on the benchmarking suite (it ignores the failure and continues running) but I'd be in favor of keeping things "clean" for now.

However - it's not a blocker, I'm perfectly happy for it to go in now and just creating a follow-up issue to keep track of re-doing it!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look and testing it out locally! That's... annoying, but I'm +1 on not introducing failures into the benchmarking suite. I'm unsure of what the best solution would be then, but here are some options:

  1. Leave things as they are (but we agree that's not ideal)
  2. No benchmarking for now (also not ideal)
  3. Make a separate class with appropriate setup (i.e. no disconnected graphs) to benchmark eigenvector_centrality_numpy(). This touches on a more general question of how to organize benchmarks. One way (that I haven't thought through, though) would be to somewhat mirror the structure of the networkx/ folder; if nothing else, it'd at least be consistent and easy to navigate. I feel like if this is the option we go with, then it probably makes sense to not include it as part of this PR, but to make it part of a bigger "tracking benchmarking" issue.
  4. Explicitly check whether the graph is the drug interaction network, and skip the benchmark if it is. It looks like asv has decorators for this.

I think option 3 is the one we should be looking at when it comes to the future of the package, but as a more immediate fix, I like option 4 best.

Peiffap and others added 8 commits September 6, 2024 07:01
    - Add exception for disconnected graphs
    - Add explanation for inconsistency

Fixes networkx#6888.

Co-authored-by: Dan Schult <dschult@colgate.edu>
Co-authored-by: Joye Mang <joyemang33@users.noreply.github.com>
    - Handle directed graphs correctly
    - Rewrite docs slightly
    - Add tests that `AmbiguousSolution` is correctly raised
      both for directed and undirected graphs
- Change name in connectivity check
- Avoid local var binding for error message

Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>
Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>
@rossbar
Copy link
Contributor

rossbar commented Sep 6, 2024

I went ahead and split up the benchmark in a naive way in order to unstick this - maybe @MridulS can doublecheck that this makes sense. Anyways, my approval of the changes themselves (regardless of benchmarking changes) stands!

@MridulS
Copy link
Member

MridulS commented Sep 7, 2024

There is an error with the benchmark action, complaining about libmambapy. But that's not the blocker here :)

@MridulS MridulS added this to the 3.4 milestone Sep 7, 2024
@MridulS MridulS merged commit b73297c into networkx:main Sep 7, 2024
@Peiffap Peiffap deleted the gilles/improve-ecn branch September 11, 2024 13:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

Unstable and Incorrect "nx.eigenvector_centrality_numpy" result when the graph is disconnected

5 participants