Re-use compiled regex for block level checks #1169
waylan merged 2 commits into Python-Markdown:master
Conversation
Please edit By the way, thanks for the change. Compiling regular expressions is almost always the right thing to do.
@mitya57 thanks for pinging me, I'll get that fixed. I always forget how the changelogs work.
Sorry for jumping in, but this part got me curious. As far as I know, CPython always caches[1] compiled regexps internally, so calling compile and storing the result globally should not be necessary (unless, perhaps, the program uses so many different regexps that the internal cache is too small to hold them all). I am a bit surprised that you got a performance improvement from this change. Can you confirm that the slowness initially reported is fixed after this change was applied?
Python-Markdown itself has around 70 different regular expressions. Small projects are unlikely to reach the cache size limit on their own. But a project may import other modules that use regular expressions, not only markdown, and in that case the limit will be reached sooner or later. So I think that even if compiling is not always useful, in some cases it is, and because it causes no harm, why not use it.
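For anyone who wants to see the caching behaviour discussed above, here is a minimal sketch. It relies on a CPython implementation detail (the `re` module keeps a module-level cache that `re.purge()` clears), and the pattern is an arbitrary example, not necessarily one used by Python-Markdown.

```python
import re

# Arbitrary example pattern, chosen only for illustration.
PATTERN = r'^</?([^ >]+)'

re.purge()  # start from an empty module-level cache

first = re.compile(PATTERN)
second = re.compile(PATTERN)
# In CPython the second call is served from the cache,
# so both names refer to the same pattern object.
same_while_cached = first is second

re.purge()  # simulate the entry being dropped from the cache
third = re.compile(PATTERN)
# The pattern had to be compiled again, giving a new object.
same_after_purge = third is first

print(same_while_cached, same_after_purge)
```

A pattern object held in a module-level name is unaffected by whatever happens to the cache, which is the argument for compiling explicitly.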
Problem
This page takes several seconds to load:
Cause
We've hit a somewhat exceptional case in that there are 76+ different markdown documents to render to HTML for that page (the author of the mod has been very prolific with creating releases, see the "changelog" list at the bottom).
With profiling enabled, it turns out that about one second is spent (re-)compiling a regex inside of isblocklevel, once per markdown document.
Other ideas
We (site developers) will probably take a number of other actions to address this slowness, such as paginating the releases and pre-rendering the markdown to HTML when saving to SQL rather than at render. But that regex still represents some low-hanging fruit for other users with many documents to render.
Changes
Now that regex is compiled once at startup and re-used in compiled form as needed. Should speed up rendering multiple documents modestly.
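As a rough sketch of the pattern described here (not the actual diff): the regex is compiled once at import time and the compiled object is reused on every call. The names `BLOCK_LEVEL_RE`, `BLOCK_TAGS` and `is_block_level`, as well as the pattern and the tag set, are assumptions made up for this illustration.

```python
import re

# Hypothetical pattern: captures the tag name of an opening or closing tag.
# Compiled once, when the module is imported.
BLOCK_LEVEL_RE = re.compile(r'^</?([^ >]+)')

# Hypothetical, deliberately short set of block-level tags.
BLOCK_TAGS = {'div', 'p', 'pre', 'table', 'ul', 'ol', 'blockquote'}


def is_block_level(html):
    """Return True if html starts with a tag from BLOCK_TAGS."""
    m = BLOCK_LEVEL_RE.match(html)  # reuses the precompiled pattern
    if m is None:
        return False
    tag = m.group(1).lower().rstrip('/')
    return tag in BLOCK_TAGS


for sample in ('<div class="x">', '</p>', '<span>', 'plain text'):
    print(sample, is_block_level(sample))
```

The check no longer depends on the pattern still being in the `re` module's internal cache when the next document is rendered.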
FYI to @DasSkelett, going to see how this goes.