Why does my regex "match" at the beginning of the string, but not at the end?

+5
−0

My impression is that Python's regular expressions behave a little oddly:

>>> import re
>>> r=re.compile("test")
>>> r.match("test")
<re.Match object; span=(0, 4), match='test'>
>>> r.match("1test")
>>> r.match("test2")
<re.Match object; span=(0, 4), match='test'>
>>> r.match("1test2")
>>> 

I also tried using python-pcre and got the same results.

The regex is just an ordinary word, so I think this should behave like a substring match, or perhaps a full-line match. Instead, it matches strings that start with "test", but not strings that have "test" somewhere else.

Why?

2 answers

+6
−0

Yes, re.match only matches strings that start with the described pattern. From the documentation:

If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding Match.

To check that the pattern describes the entire line, use re.fullmatch; to do a substring match, use re.search.
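
As a quick illustration of the three operations, here is a small sketch run against the strings from the question. The helper name classify is made up for this example; match, fullmatch and search are the real methods of a compiled pattern.

```python
import re

pattern = re.compile("test")
inputs = ["test", "1test", "test2", "1test2"]

def classify(s):
    """Report which of the three operations succeed for the string s."""
    return {
        "match": pattern.match(s) is not None,          # only at position 0
        "fullmatch": pattern.fullmatch(s) is not None,  # whole string
        "search": pattern.search(s) is not None,        # anywhere
    }

for s in inputs:
    print(repr(s), classify(s))
```

Only search succeeds for all four inputs, which is the substring behaviour the question expected.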

There's also a separate heading in the documentation describing these differences.

Besides choosing the operation to do with the regex, you can also use "anchors" for the pattern that only match at the beginning or end of the string. To match at the beginning, use ^ as the first character in the regex; to match at the end, use $ as the last character. These are "zero-width" matches; when the regex engine checks for them, it doesn't associate them with any characters from the input string, but only checks the current position as it's matching.
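
A minimal sketch of the anchors, using re.search so that only the anchors restrict where the pattern can match. The function names are hypothetical. One caveat that is easy to miss: in Python, $ by default also matches just before a trailing newline, so \Z is the stricter choice when you mean the very end of the string.

```python
import re

def starts_with_test(s):
    # ^ is zero-width: it only checks that we are at the start.
    return re.search(r"^test", s) is not None

def ends_with_test(s):
    # $ checks for the end (or the position before a final newline).
    return re.search(r"test$", s) is not None

def is_exactly_test(s):
    return re.search(r"^test$", s) is not None

print(starts_with_test("test2"), starts_with_test("1test"))
print(ends_with_test("1test"), ends_with_test("test2"))
print(is_exactly_test("test"), is_exactly_test("1test2"))

# An anchor on its own matches an empty span: it consumes no characters.
print(re.search(r"^", "abc").span())

# $ tolerates one trailing newline, \Z does not.
print(re.search(r"test$", "test\n") is not None)
print(re.search(r"test\Z", "test\n") is not None)
```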

Why?

Of course we should have different functions to do different things. The remaining question is why we should have a match function at all.

The simplest explanation I can think of is that match is easy to implement efficiently, and often useful. In particular, you can easily and efficiently implement fullmatch in terms of match: match the pattern, and additionally require that nothing is left in the input string afterwards (in effect, an end anchor appended to the pattern). But we can't do it the other way around: if we only have fullmatch and want to get the match effect, we can only modify the regex pattern to say "also match any characters after that" (.*), and matching against that takes extra time (or special work to optimize the regex engine).
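
To make both directions concrete, here is a rough sketch written against the public re API. It is not how the re module is implemented internally, and both helper names are invented for the example. It assumes Python 3.6 or later for the scoped flag group (?s:...), which lets .* also consume newlines.

```python
import re

def fullmatch_via_match(pattern, s):
    # match at position 0, and require the end of the string right after.
    return re.match(f"(?:{pattern})\\Z", s) is not None

def match_via_fullmatch(pattern, s):
    # fullmatch, but let a trailing (?s:.*) swallow the rest of the input.
    return re.fullmatch(f"(?:{pattern})(?s:.*)", s) is not None

for s in ["test", "1test", "test2"]:
    print(repr(s), fullmatch_via_match("test", s), match_via_fullmatch("test", s))
```

The end anchor has to be part of the pattern rather than a check on the leftover input afterwards: for a pattern like a|ab on the input "ab", a plain match stops after "a", and only the anchored version backtracks to the alternative that covers the whole string.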

Meanwhile, a search for a substring is generally slower. In the worst case, the engine has to attempt a match at every position in the input.

It should also be noted that there isn't an efficient way to check whether the input ends with a match for a regex. That's because matching a regex requires scanning forwards in the string from some starting point, but the regex pattern doesn't match a fixed amount of data. Therefore, if the pattern matches at the end, we don't know where to start looking; we need to do the slow searching operation, and then see if one of those matches is at the end.
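
In practice, the usual way to express "ends with this pattern" is a search with an end anchor. The sketch below uses a hypothetical helper, ends_with; note that it is still the search operation described above, so the engine may try many starting positions before it finds one where the match reaches the end.

```python
import re

def ends_with(pattern, s):
    # \Z only matches at the very end of the string.
    return re.search(f"(?:{pattern})\\Z", s) is not None

print(ends_with("test", "1test"))
print(ends_with("test", "test2"))
print(ends_with(r"\d+", "abc123"))
print(ends_with(r"\d+", "123abc"))
```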

+5
−0

The documentation for re.match(...) is explicit that it only matches at position 0.

If you're asking this question, what you probably want is re.search(...) to match at any point within the string.

1 comment thread

Thank you very much - exactly, `r.search()` is what I should have used. (1 comment)