In this article, we will see how to locate the position of a regex match in a string using the start(), end(), and span() methods of the Python re.Match object.
We will solve the following three scenarios:
- Get the start and end position of a regex match in a string
- Find the indexes of all regex matches
- Get the positions and values of each match
Note: Python's re module offers the search(), match(), and finditer() functions to match a regex pattern. search() and match() return a Match object if a match is found (and None otherwise), while finditer() returns an iterator that yields a Match object for each match. Use the Match object's start(), end(), and span() methods to extract information about the matching string.
These Match object methods are used to access the index positions of the matching string.
- start() returns the starting position of the match
- end() returns the ending position of the match (the index just past the last matched character)
- span() returns a tuple containing the (start, end) positions of the match
Example to get the position of a regex match
In this example, we will search for any 4-digit number inside the string. To achieve this, we must first write the regular expression pattern.
Pattern to match any 4-digit number: \d{4}
Steps:
- Search for the pattern using the search() method.
- Next, we can extract the match value using group().
- Now, we can use the start() and end() methods to get the starting and ending index of the match.
- Also, we can use the span() method to get both start and end indexes in a single tuple.
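The steps above can be sketched as follows. The target string is a hypothetical one chosen only for illustration; any string containing a 4-digit number would behave the same way.

```python
import re

# Hypothetical target string, chosen only for illustration
target_string = "Order 7 shipped in 2021 from depot 12"

# \d{4} matches four consecutive digits
result = re.search(r"\d{4}", target_string)

# search() returns None when nothing matches, so check before using the result
if result is not None:
    print("Match value:", result.group())  # the matched text
    print("Start index:", result.start())  # index of the first matched character
    print("End index:", result.end())      # index just past the last matched character
    print("Span:", result.span())          # (start, end) as a single tuple
```

Because end() points just past the last matched character, end() minus start() equals the length of the matched text.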
Access the matching string using start() and end()
You can save these positions and use them whenever you want to retrieve the matching string from the target string. Because end() is exclusive, string slicing with the index positions returned by the start() and end() methods gives the matching string directly.
Example
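A minimal sketch of this slicing approach, assuming a hypothetical target string:

```python
import re

# Hypothetical target string, chosen only for illustration
target_string = "Order 7 shipped in 2021 from depot 12"

result = re.search(r"\d{4}", target_string)

if result is not None:
    # Save the positions so they can be reused later
    start, end = result.start(), result.end()

    # Slicing with [start:end] returns exactly the matched text
    matched_text = target_string[start:end]
    print(matched_text)

    # The slice is the same string that group() returns
    print(matched_text == result.group())
```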
Find the indexes of all regex matches
Suppose you are finding all matches of a regular expression in Python and, in addition to the match values, you also want the index of each match. In such cases, use the finditer() function of the Python re module instead of findall().
The findall() function returns the matched strings as a Python list, which carries no position information. finditer(), on the other hand, returns an iterator yielding a Match object for every match of the regex pattern. We can then iterate over these Match objects to extract each match along with its position.
In this example, we will find all 5-letter words inside the following string and also print their start and end positions.
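A sketch of this approach is shown here. The target string and the pattern \b\w{5}\b are assumptions, chosen so that the three 5-letter words and their positions agree with the output listed for this example.

```python
import re

# Assumed target string (not taken from the original example)
target_string = "Jessa scored 56 and Kelly scored 65 marks"

# \b is a word boundary and \w{5} is exactly five word characters,
# so the pattern matches whole five-character words only
pattern = r"\b\w{5}\b"

# finditer() yields one Match object per match
matches = list(re.finditer(pattern, target_string))

for count, match in enumerate(matches, start=1):
    print("match", count, match.group(),
          "start index", match.start(),
          "End index", match.end())
```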
Output
match 1 Jessa start index 0 End index 5
match 2 Kelly start index 20 End index 25
match 3 marks start index 36 End index 41
Find the indexes of all occurrences of a word in a string
Example
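One way to do this is sketched here. The target string and the word being searched are assumptions, picked so that the two occurrences fall at the same indexes as in the output for this example. The re.escape() call and the \b word boundaries are a design choice of this sketch: they make the word match literally and only as a whole word.

```python
import re

# Assumed target string and search word (not taken from the original example)
target_string = "Emma knows Python. Emma knows ML and AI"
word = "emma"

# re.escape() guards against regex metacharacters in the word;
# re.IGNORECASE lets "emma" match "Emma"
pattern = r"\b" + re.escape(word) + r"\b"

occurrences = [
    (match.start(), match.end())
    for match in re.finditer(pattern, target_string, re.IGNORECASE)
]

for count, (start, end) in enumerate(occurrences, start=1):
    print(f"match {count}: start index {start}, end index {end}")
```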
Output
1 st match start index 0 End index 4
2 nd match start index 19 End index 23
Points to be remembered while using the start() method
Since the re.match() function only checks whether the regular expression matches at the beginning of the string, start() will always return zero for a successful match.
However, the re.search() method scans through the entire target string and looks for occurrences of the pattern that we want to find, so the match may not start at zero in that case.
Now let’s match any ten consecutive alphanumeric characters in the target string using both the match() and search() methods.
Example
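The difference can be sketched as follows, assuming a hypothetical target string whose first word is shorter than ten characters. Note that \w also matches the underscore, so it is only an approximation of "alphanumeric".

```python
import re

# Hypothetical target string, chosen only for illustration
target_string = "Our new photography workshop starts in September"

# \w{10} matches ten consecutive word characters
pattern = r"\w{10}"

# match() only tries at the beginning of the string, where "Our " is too short
match_result = re.match(pattern, target_string)
print("match():", match_result)

# search() scans the whole string, so the match can begin at a nonzero index
search_result = re.search(pattern, target_string)
if search_result is not None:
    print("search() value:", search_result.group())
    print("search() starts at:", search_result.start())

# When re.match() does succeed, the match begins at index 0
word_at_start = re.match(r"\w{3}", target_string)
if word_at_start is not None:
    print("match() starts at:", word_at_start.start())
```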
