In regular expressions, subpattern groups are used to match and capture specific parts of a string. These groups can be referenced later using backreferences (\1, \2, …).
Let’s take a closer look at how subpattern group references work in Python.
Creating Subpattern Groups
To create a subpattern group, we enclose the desired pattern within parentheses.
For example, let’s say we want to match a word followed by a comma, followed by the same word again. We can create a subpattern group for the word as follows:
import re
pattern = r'(\w+),\1' # Matches a word, followed by a comma, followed by the same word
Using Subpattern Group References
In the above example, the subpattern group (\w+) captures one or more word characters. We then use \1 to reference the captured part.
When we apply this pattern to a string using the re
module, it will find matches where a word is repeated after a comma:
import re
pattern = r'(\w+),\1'
text = "apple,apple banana,banana"
matches = re.findall(pattern, text)
print(matches) # Output: ['apple', 'banana']
The findall
function returns a list of all matches found in the string. In this case, it returns ['apple', 'banana']
, which are the words repeated after a comma.
Multiple Subpattern Group References
We can have multiple subpattern groups in a regular expression and use their respective references (\1, \2, …) in the order they were defined.
Let’s modify our previous example to match two words separated by a comma, where the second word is the reverse of the first:
import re
pattern = r'(\w+),(\w+)\1'
text = "hello,olleh world,dlrow"
matches = re.findall(pattern, text)
print(matches) # Output: [('hello', 'olleh'), ('world', 'dlrow')]
Here, we have two subpattern groups: (\w+) and (\w+). The second group is followed by \1, which references the first group. This will only match if the second word is the reverse of the first, resulting in ['hello', 'olleh']
and ['world', 'dlrow']
as the output.
Conclusion
Subpattern group references (\1, \2, …) in Python regular expressions allow us to capture and reuse specific parts of a string. They provide a powerful way to match patterns that involve repetition or a specific relationship between different parts of the string.