[파이썬] 매칭의 최소화(*?, +?, ??, {m,n}?)

In regular expressions, there are certain quantifiers that can be used to control the matching behavior. These quantifiers help in specifying the number of occurrences of a pattern that should be matched.

In this blog post, we will explore the concept of “매칭의 최소화” or “minimization of matching” in Python regular expressions. We will discuss four quantifiers: *?, +?, ??, and {m,n}?, and understand how they differ from their greedy counterparts.

Quantifiers

  1. *? - The *? quantifier allows for a non-greedy match of zero or more occurrences of the preceding pattern. It matches as few characters as possible.

  2. +? - The +? quantifier is similar to *?, but it matches one or more occurrences of the preceding pattern in a non-greedy fashion.

  3. ?? - The ?? quantifier is also a non-greedy quantifier. It matches zero or one occurrence of the preceding pattern.

  4. {m,n}? - The {m,n}? quantifier is another non-greedy quantifier. It matches from m to n occurrences of the preceding pattern, with the minimal number of matches possible.

Example

Let’s consider an example where we have a string that contains repeating pattern ab.

import re

string = "ababab"

# Greedy matching
greedy_match = re.findall(r'ab*', string)
print("Greedy match:", greedy_match)

# Non-greedy matching
non_greedy_match = re.findall(r'ab*?', string)
print("Non-greedy match:", non_greedy_match)

Output:

Greedy match: ['ab', 'ab', 'ab']
Non-greedy match: ['a', 'a', 'a']

In this example, we are trying to match the pattern ab in the string “ababab”. The greedy match ab* matches the longest possible substring starting with ab, resulting in ab, ab, ab. However, the non-greedy match ab*? matches the shortest possible substring starting with ab, resulting in a, a, a.

Conclusion

By using the non-greedy quantifiers *?, +?, ??, and {m,n}?, we can minimize the matching behavior of regular expressions and match as few characters as possible. This can be useful in situations where we want to match the smallest possible substring or in cases where we want to optimize performance by avoiding unnecessary matches.

Understanding and utilizing these quantifiers can significantly enhance the flexibility and efficiency of regular expressions in Python.