Skip to content

Conversation

@vigna
Copy link
Contributor

@vigna vigna commented Jan 10, 2022

This PR implements the harmonic diameter, a measure of connectivity.

Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this. It looks well implemented.

Can you add the function name to doc/reference/algorithms/distance_measures.rst
so the sphinx auto-documentation builder can find the function.

Then you can check the documentation in the details of the Circle-CI "artifact" test.
:} Thanks!

@vigna
Copy link
Contributor Author

vigna commented Jan 11, 2022

Aha! I was wondering how to see the generated docs, indeed.

@vigna
Copy link
Contributor Author

vigna commented Jan 11, 2022

Well, line added, test completed, but under "Artifacts" I have absolutely nothing. 🤷🏻‍♂️

@vigna
Copy link
Contributor Author

vigna commented Jan 11, 2022

Ah OK, it finally appeared. I had to be patient. Docs are OK.

@vigna
Copy link
Contributor Author

vigna commented Jan 11, 2022

If there is interest, I might commit harmonic eccentricity, too, which follows the same idea (harmonic mean of distances from x to all other nodes).

@dschult dschult added this to the networkx-2.7 milestone Jan 11, 2022
Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me.
Thanks for this!

Harmonic Eccentricity is harder for me to find in the literature. And it doesn't seem to be in the reference you cite in this PR. Can you point to some literature?

@vigna
Copy link
Contributor Author

vigna commented Jan 12, 2022

Actually, no, I can't, it's a fairly obvious "harmonic" reworking of eccentricity. You can use harmonic means in closeness, and you get harmonic centrality, in diameter computation, and you get harmonic diameter, or in eccentricity computation, and you get harmonic eccentricity. Harmonic diameter is the harmonic mean of harmonic eccentricities, exactly like diameter is the maximum of eccentricities. But it might be premature. I should write a paper about this :).

@dschult
Copy link
Member

dschult commented Jan 13, 2022

Hee hee... Yes! It sounds like it is time to write a paper! :}

sum_invd = 0
for n in G:
if sp is None:
length = nx.single_source_shortest_path_length(G, n)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would not work for weighted graphs, is harmonic_diameter defined for a weighted graph?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK I see I've got confused between single_source_shortest_path_length and shortest_path_length. The implementation as such is correct for distances defined as shortest paths. You can use another notion of distance—nothing prevents you from taking the harmonic mean. I thought that NetworkX would automatically "understand" whether a graph was weighted and compute distances accordingly (as diameter and eccentricity are already implemented in that way), but now I'm seeing that, say, harmonic_centrality has an explicitly distance parameter.

It seems that different people implemented different APIs in NetworkX for using a non-standard metric: it is passed via a dictionary of distances in diameter, and via a property used to detect edge weights in harmonic_centrality.

I'd stick with the current implementation as it mimics that of diameter, but you guys know better than me.

Copy link
Member

@dschult dschult Jan 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this module (distance_measures.py) could use a pass through to update all functions to allow weighted graphs. The definition should be straightforward, following from the definition in shortest_path.
I'll open an issue for this suggestion separate from this PR. (see #5257 )

@jarrodmillman
Copy link
Member

Is it worth adding a brief Note mentioning how this relates to harmonic centrality:
https://networkx.org/documentation/latest/reference/algorithms/generated/networkx.algorithms.centrality.harmonic_centrality.html

@dschult
Copy link
Member

dschult commented Jan 16, 2022

I thought I had found Harmonic Diameter discussed on Wikipedia. But apparently not.
So, I am not as sure about this PR as I was. And I have found the cited article
(well -- the arxiv version) and it doesn't contain the word "diameter".

Do you have a better reference for harmonic diameter?

@vigna
Copy link
Contributor Author

vigna commented Jan 18, 2022

Yeah, I was dubious whether to add a note. I did now. In the paper the harmonic diameter is called "connectivity length". But the name "harmonic diameter" is standard in the theory of metric spaces for the same kind of harmonic mean.

"If we want to quantify the typical separation between two vertices in the graph, a good measure is
given by the connectivity length D(G): this is the fixed distance to which we have to set every two vertices in the graph in order to maintain its performance. Interestingly enough, the connectivity length of the graph is not the arithmetic
mean but the harmonic mean of all the distances:"

You can see the metric-space definition at p. 4 of https://www.researchgate.net/profile/Huse_Fatkic/project/The-Theory-of-Fractional-Market-Shares/attachment/57c0699508aec1ae131051f8/AS:399366213914624@1472227733721/download/Huse+Fatkic%2C+Slobodan+Sekulovic%2C+Fatih+Destovic%2C+and+Hana++Fatkic%2C+June+6%2C++2014.pdf?context=ProjectUpdatesLog

@jarrodmillman
Copy link
Member

My main concern with using diameter is that I've only seen it in the context of a max. This doesn't seem to be using a max.

The definition of harmonic diameter in the paper you link to, they do take a max.

@jarrodmillman
Copy link
Member

I guess the max sort of goes away (since there is just one element in the set you are taking the max of) when k = |V| as in your case.

@vigna
Copy link
Contributor Author

vigna commented Jan 19, 2022

Yes. So, you could define the standard diameter as the maximum distance between k points for every choice of k points and let k go to infinity, as they do for the harmonic diameter. But that would overkill, because in the case of the standard diameter after k=2 you don't get a different value.

For the harmonic diameter things are a bit more complicated (and interesting). The harmonic diameter (in continuous geometric) is obtained by maximizing over the harmonic mean of all distances between all subsets of k points, and then letting k go to infinity.

But for graphs, the maximum value is attained for k equal to the number of vertices—that is, by taking the harmonic mean of distances between all nodes. So the harmonic diameter defined in the documentation (the harmonic mean of all distances) is the result of applying the general definition to graphs.

Marchiori and Latora got exactly the same definition using a completely different motivation.

In the classic (and monumental) survey by Newman, "The Structure and Function of Complex Networks", harmonic diameter is commented as follows:

"An alternative and perhaps more satisfactory approach is to define 𝓁� to be the “harmonic mean” geodesic distance between all pairs, i.e., the reciprocal of the average of the reciprocals. [...] This approach has been adopted only occasionally in network calculations [259], but perhaps should be used more often."

@dschult
Copy link
Member

dschult commented Jan 25, 2022

So, if understand the discussion, this function computes a kind of average path length where by average we use the harmonic mean instead of the arithmetic mean.

What do people think of adding this feature by adding a keyword argument to the average_shortest_path_length function? The keyword could be "mean" or "harmonic", and maybe we could find a way to allow other strange measures of average like "median"? OK... too much... :}

@vigna
Copy link
Contributor Author

vigna commented Jan 25, 2022

Incidentally, that function is really weird. It states a formula but does not compute that formula. If I apply it to a directed graph with a single arc 0→1, it returns 1/2, but the correct value is ∞.

From the code, it appears that the actual formula computed is the sum of finite lengths, divided by n/(n-1). This does not make any sense if the graph is not strongly connected, as you're adding a certain number of lengths, and then dividing by a different amount.

My guess is that whoever put the check for "weakly connected" did not understand that the test had to be for "strongly connected". Then the code makes sense because if there are non-reachable pairs there will be an exception, so the mean will again make sense.

Personally, I'd prefer the harmonic diameter code to be distinct from this—it really needs fixing.

@vigna
Copy link
Contributor Author

vigna commented Mar 2, 2022

Sorry—is there anything else I have to do? Or is this going to be included in NetworkX at some point?

@vigna
Copy link
Contributor Author

vigna commented Sep 16, 2022

OK, so now this is conflicting. Should I make the effort of solving the (trivial) conflicts or are you guys not interested in this new code?

@vigna
Copy link
Contributor Author

vigna commented Sep 29, 2022

Incidentally, I realized I'm using the term probably because I found it in this 2003 paper by Fogaras:

Fogaras, D. (2003). Where to Start Browsing the Web?. In: Böhme, T., Heyer, G., Unger, H. (eds) Innovative Internet Community Systems. IICS 2003. Lecture Notes in Computer Science, vol 2877. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39884-4_6

Section 3.2 says “The connectivity is expressed by the harmonic diameter of the Web graph, the harmonic mean of distances between all the pairs of nodes.”

@rossbar rossbar modified the milestones: networkx-3.0, networkx-3.1 Nov 29, 2022
@vigna
Copy link
Contributor Author

vigna commented Mar 6, 2023

I see the milestone have been pushed to 3.1. This does not seem such a difficult PR—one year no hold? 😂

Here are some other papers computing the harmonic diameter on networks:

https://dl.acm.org/doi/10.1145/3132847.3133087
https://arxiv.org/abs/1704.07525
https://doi.org/10.14778/2777598.2777604

@dschult
Copy link
Member

dschult commented Mar 7, 2023

Sorry about the long delay. We should be able to get this going again.

The main issue seemed to be the name harmonic_diameter. We agree that computing average distances using the harmonic formula is a nice option to have. But it is not a diameter (which involves the max of all distances). It is a mean.

I'd like to propose the name harmonic_mean_distance. It'd be good to add this new function to the "See Also" section of the doc_string for the functions average_shortest_path_length. It's possible that we would want it as an option to the average_shortest_path_length function, but I think it is better for now to get it into the distance_measures module.
We can refactor later is desired. And this makes it closer to these distance measures that might be closer to "harmonic" measures if we add those. @jarrodmillman what do you think and are there other issues?

@vigna
Copy link
Contributor Author

vigna commented Mar 7, 2023

Well, there's a lot of literature talking about the "effective diameter" which is the 90th percentile of distances, even if it not a diameter. And as remarked above in functional analysis they say "harmonic diameter", not "harmonic mean of distances". I added a few papers above to show where it is used. It's not exactly commonplace, but it's fairly common among people taking measurements on large networks.

It is possible to add it to average_shortest_path_length with an option, but from my viewpoint it could be confusing that on a non-strongly-connected graph you would get an exception with an option and a result with another. The average path length makes sense only if all paths exists, the harmonic diameter is a more general concept that conflates the number of reachable pairs and the distribution of the set of distances between those pairs.

@jarrodmillman jarrodmillman removed this from the 3.2 milestone Jul 20, 2023
Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make a decision and get this merged. This review proposes language to describe a little more about some hesitations for the name, but in the end it leaves the name harmonic_diameter.

It would be good to eventually make this optionally depend on edge weights using a keyword argument such as "weight=None". But let's do that in another PR. See issue #7624.

I think that resolves all the discussion in this PR about the function and it is ready to merge.

@dschult dschult merged commit 7d22e08 into networkx:main Sep 7, 2024
@dschult
Copy link
Member

dschult commented Sep 7, 2024

Thanks @vigna !!
And sorry for the long review.

@jarrodmillman jarrodmillman added this to the 3.4 milestone Sep 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

5 participants