[Python] Subpattern Group References (\1, \2, ...)

01 Sep 2023

python

In regular expressions, subpattern groups are used to match and capture specific parts of a string. These groups can be referenced later using backreferences (\1, \2, …).

Let’s take a closer look at how subpattern group references work in Python.

Creating Subpattern Groups

To create a subpattern group, we enclose the desired pattern within parentheses.

For example, let’s say we want to match a word followed by a comma, followed by the same word again. We can create a subpattern group for the word as follows:

import re

pattern = r'(\w+),\1'  # Matches a word, followed by a comma, followed by the same word
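
As a quick check of what this pattern defines, the following minimal sketch compiles it and inspects one match. The sample string is an assumption made for illustration and does not come from the original example.

```python
import re

# Same pattern as above, compiled so it can be inspected.
pattern = re.compile(r'(\w+),\1')

# Pattern.groups reports how many capturing groups the pattern defines.
print(pattern.groups)  # 1

# Hypothetical sample text, chosen only for this sketch.
match = pattern.search("say hello,hello now")
print(match.group(0))  # hello,hello  (the whole match)
print(match.group(1))  # hello        (the text captured by group 1)
```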

Using Subpattern Group References

In the above example, the subpattern group (\w+) captures one or more word characters. We then use \1 to reference the captured part: \1 matches only the exact text that group 1 captured, not any text that would fit \w+.
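
To make the difference between a backreference and a repeated pattern concrete, here is a small sketch that compares the two. The variable names `backref` and `repeated` and the sample strings are illustrative assumptions.

```python
import re

backref = re.compile(r'(\w+),\1')    # second word must equal the first
repeated = re.compile(r'(\w+),\w+')  # second word can be any word

same = "apple,apple"
different = "apple,banana"

print(bool(backref.fullmatch(same)))        # True
print(bool(backref.fullmatch(different)))   # False
print(bool(repeated.fullmatch(different)))  # True
```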

When we apply this pattern to a string using the re module, it will find matches where a word is repeated after a comma:

import re

pattern = r'(\w+),\1'

text = "apple,apple banana,banana"

matches = re.findall(pattern, text)
print(matches)  # Output: ['apple', 'banana']

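The findall function returns a list with one entry per non-overlapping match. Because this pattern contains exactly one capturing group, each entry is the text captured by that group rather than the whole match. In this case, it returns ['apple', 'banana'], the words that are repeated after a comma.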
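
If the full matched text is needed as well, one option is re.finditer, which yields match objects. The sketch below also shows a caveat: without word boundaries the pattern can match a prefix of a longer word. The "apple,applesauce" string and the \b variant are assumptions added for illustration.

```python
import re

pattern = re.compile(r'(\w+),\1')
text = "apple,apple banana,banana"

# finditer yields match objects, so the whole match and the group are both available.
full_matches = [m.group(0) for m in pattern.finditer(text)]
captured = [m.group(1) for m in pattern.finditer(text)]

print(full_matches)  # ['apple,apple', 'banana,banana']
print(captured)      # ['apple', 'banana']

# Without \b, "apple" is found again at the start of "applesauce".
loose = re.findall(r'(\w+),\1', "apple,applesauce")
# With \b on both sides, the repeated text must be a complete word.
strict = re.findall(r'\b(\w+),\1\b', "apple,applesauce")

print(loose)   # ['apple']
print(strict)  # []
```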

Multiple Subpattern Group References

We can have multiple subpattern groups in a regular expression. Groups are numbered from left to right by the position of their opening parenthesis, and \1, \2, … refer to them by those numbers.

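Let’s extend the comma-separated example to match two words separated by a comma, followed by the same two words in reverse order:

import re

pattern = r'(\w+),(\w+),\2,\1'  # word A, word B, then B again, then A again

text = "hello,world,world,hello foo,bar,bar,foo"

matches = re.findall(pattern, text)
print(matches)  # Output: [('hello', 'world'), ('foo', 'bar')]

Here, we have two subpattern groups, both written as (\w+). After the two captured words, \2 must repeat the text captured by the second group and \1 must repeat the text captured by the first. Because the pattern has more than one group, findall returns a list of tuples, one per match: ('hello', 'world') and ('foo', 'bar'). Note that a backreference only matches the same text again; it cannot match the characters of a captured word in reverse, so a pattern such as (\w+),(\w+)\1 would not match "hello,olleh".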
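
To see how group numbers line up with backreferences in a pattern with two groups, here is a minimal sketch. The key=value format and the sample strings are hypothetical and chosen only for illustration.

```python
import re

# Group 1 captures a key, group 2 captures a value (illustrative roles).
# After the semicolon, \1 and \2 must repeat both captured parts exactly.
pattern = re.compile(r'(\w+)=(\w+);\1=\2')

match = pattern.search("mode=fast;mode=fast")
print(match.groups())  # ('mode', 'fast')
print(match.group(1))  # mode
print(match.group(2))  # fast

# The value differs in the second half, so \2 cannot match.
no_match = pattern.search("mode=fast;mode=slow")
print(no_match)  # None
```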

Conclusion

Subpattern group references (\1, \2, …) in Python regular expressions allow us to capture and reuse specific parts of a string. They provide a powerful way to match patterns that involve repetition or a specific relationship between different parts of the string.