Restrict to unique distributions in entry points #281
…ique distributions. Fixes #280.
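As a rough illustration of what "unique distributions" means here, the sketch below keeps only the first distribution seen for each name, so an earlier path entry shadows later ones. `FakeDist` and its sample paths are hypothetical stand-ins, not importlib_metadata objects, and `unique_everseen` is written out following the well-known itertools recipe rather than imported.

```python
from dataclasses import dataclass


def unique_everseen(iterable, key=None):
    """Yield items in order, skipping any whose key was already seen."""
    seen = set()
    for item in iterable:
        k = item if key is None else key(item)
        if k not in seen:
            seen.add(k)
            yield item


@dataclass(frozen=True)
class FakeDist:
    """Hypothetical stand-in for a distribution discovered on the path."""
    name: str
    location: str


# Discovery order is assumed to follow the search path, first entry wins.
found = [
    FakeDist("example", "/venv/site-packages"),
    FakeDist("other", "/venv/site-packages"),
    FakeDist("example", "/usr/lib/site-packages"),  # shadowed duplicate
]

unique = list(unique_everseen(found, key=lambda d: d.name))
for dist in unique:
    print(dist.name, dist.location)
```

Only the first `example` survives, so entry points from the shadowed copy would not be reported twice.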
|
In this latest commit, I added 2b64fa2 against the main branch to test the performance of entry_points before and after the patch. On my 2019 MacBook Pro, I see 9.5 ms as the baseline and 28 ms after the patch, roughly a 200% increase. |
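For anyone wanting to reproduce a measurement like this locally, here is a minimal sketch. It assumes the standard-library `importlib.metadata` rather than the backport, and the helpers `best_ms` and `percent_increase` are hypothetical names, not part of the benchmark in the commit. Plugging in the figures quoted above gives an increase of about 195%.

```python
import timeit
from importlib.metadata import entry_points


def best_ms(func, repeat=3, number=1):
    """Return the best per-call wall time of func, in milliseconds."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number * 1000


def percent_increase(before, after):
    """Relative growth from before to after, as a percentage."""
    return (after - before) / before * 100


measured = best_ms(entry_points)
print(f"entry_points(): {measured:.1f} ms on this machine")

# Figures quoted in the comment above: 9.5 ms baseline, 28 ms patched.
print(f"{percent_increase(9.5, 28):.0f}% increase")
```

Taking the minimum over a few repeats is a common way to reduce noise from other processes; absolute numbers will vary with the number of installed packages.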
|
I'd guess the way to avoid the large performance penalty is along the lines of (simplified here):

diff --git i/importlib_metadata/__init__.py w/importlib_metadata/__init__.py
index 4c420a5..2a90b6a 100644
--- i/importlib_metadata/__init__.py
+++ w/importlib_metadata/__init__.py
@@ -590,6 +590,7 @@ class PathDistribution(Distribution):
                      .joinpath(), __div__, .parent, and .read_text().
         """
         self._path = path
+        self._normalized_name = path.stem.partition("-")[0]

     def read_text(self, filename):
         with suppress(
@@ -648,7 +649,7 @@ def entry_points():

     :return: EntryPoint objects for all installed packages.
     """
-    unique = functools.partial(unique_everseen, key=operator.attrgetter('name'))
+    unique = functools.partial(unique_everseen, key=operator.attrgetter('_normalized_name'))
     eps = itertools.chain.from_iterable(
         dist.entry_points for dist in unique(distributions())
     )

(on top of your PR)

Also, orthogonally, entry_points.txt parsing itself can be significantly sped up. I'd guess the best option would be a hand-rolled parser, as the format is actually much less general than everything ConfigParser supports (I didn't find a spec for entry_points.txt, but can it e.g. ever use https://docs.python.org/3/library/configparser.html#interpolation-of-values?), but a quick fix that already yields large gains (~2x speed up on

diff --git i/importlib_metadata/__init__.py w/importlib_metadata/__init__.py
index fac3063..257e19e 100644
--- i/importlib_metadata/__init__.py
+++ w/importlib_metadata/__init__.py
@@ -87,6 +87,10 @@ class EntryPoint(

     dist: Optional['Distribution'] = None

+    cp = ConfigParser(delimiters='=')
+    # case sensitive: https://stackoverflow.com/q/1611799/812183
+    cp.optionxform = str
+
     def load(self):
         """Load the entry point from its definition. If only a module
         is indicated by the value, return that module. Otherwise,
@@ -122,11 +126,9 @@ class EntryPoint(

     @classmethod
     def _from_text(cls, text):
-        config = ConfigParser(delimiters='=')
-        # case sensitive: https://stackoverflow.com/q/1611799/812183
-        config.optionxform = str
-        config.read_string(text)
-        return cls._from_config(config)
+        cls.cp.clear()
+        cls.cp.read_string(text)
+        return cls._from_config(cls.cp)

     @classmethod
     def _from_text_for(cls, text, dist):

Meanwhile, a hand-rolled parser such as

     @classmethod
     def _from_text(cls, text):
-        config = ConfigParser(delimiters='=')
-        # case sensitive: https://stackoverflow.com/q/1611799/812183
-        config.optionxform = str
-        config.read_string(text)
-        return cls._from_config(config)
+        if not text:
+            return
+        group = None
+        for line in filter(None, map(str.strip, text.splitlines())):
+            if line.startswith("["):
+                group = line[1:-1]
+            else:
+                name, value = map(str.strip, line.split("=", 1))
+                yield cls(name, value, group)

gives you another nearly 2-fold speedup (it depends on how well you want to handle malformed inputs, of course, and this is assuming we indeed don't need to support interpolation or anything fancy). (edit: actually, half of the hand-rolled implementation could even be shared with the implementation of Distribution._read_sections.) |
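To make the hand-rolled idea easier to try outside the library, here is a standalone sketch that checks such a parser against ConfigParser on a small sample. The function names and the sample text are hypothetical; unlike the snippet above, this version yields plain tuples instead of EntryPoint objects, skips full-line comments, and uses `str.partition`. It assumes well-formed input and no interpolation.

```python
from configparser import ConfigParser


def parse_entry_points(text):
    """Yield (group, name, value) tuples from entry_points.txt-style text."""
    group = None
    for line in filter(None, map(str.strip, text.splitlines())):
        if line.startswith(("#", ";")):
            continue  # full-line comments, which ConfigParser also skips
        if line.startswith("[") and line.endswith("]"):
            group = line[1:-1]
        else:
            name, _, value = line.partition("=")
            yield group, name.strip(), value.strip()


def parse_with_configparser(text):
    """Reference result from ConfigParser with interpolation disabled."""
    config = ConfigParser(delimiters="=", interpolation=None)
    config.optionxform = str  # keep option names case sensitive
    config.read_string(text)
    return [
        (section, name, value)
        for section in config.sections()
        for name, value in config.items(section)
    ]


SAMPLE = """
[console_scripts]
example = example.cli:main

[example.plugins]
Alpha = example.plugins.alpha:Plugin [extra]
"""

hand_rolled = list(parse_entry_points(SAMPLE))
for entry in hand_rolled:
    print(entry)
```

Comparing the two results on real entry_points.txt files from an environment would be one way to gain confidence before dropping ConfigParser; malformed lines are where the two are most likely to diverge.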
Could it fall back to ConfigParser in those cases? |
Yes, but we'd need to actually detect if interpolation is being used (for which I guess |
|
I'm all but certain that interpolation isn't expected. In fact, pkg_resources parses the file without interpolation, so it's definitely safe not to rely on ConfigParser. For this PR, I'm going to accept it as written, and we can work on optimization techniques subsequently. |
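For context on what interpolation would change, the sketch below reads the same text with ConfigParser's default settings and with `interpolation=None`. The sample text and the `read` helper are hypothetical and only illustrate standard-library behavior; this is not a claim about any real entry_points.txt file.

```python
from configparser import ConfigParser

# Hypothetical text that uses ConfigParser-style %(name)s interpolation.
TEXT = """
[console_scripts]
tool = pkg.%(suffix)s:main
suffix = cli
"""


def read(text, **kwargs):
    """Parse text with '=' as the only delimiter and case-sensitive names."""
    config = ConfigParser(delimiters="=", **kwargs)
    config.optionxform = str
    config.read_string(text)
    return config


# Default settings substitute %(suffix)s from the same section.
interpolated = read(TEXT)["console_scripts"]["tool"]

# With interpolation disabled, the value is returned exactly as written.
raw = read(TEXT, interpolation=None)["console_scripts"]["tool"]

print(interpolated)
print(raw)
```

A parser that never interpolates behaves like the second call, which is the behavior a hand-rolled replacement would have.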
Closes #280.