python 正则表达式
python 正则表达式
- re — Regular expression operations — Python 3.9.5 documentation
- Regular Expression HOWTO — Python 3.9.5 documentation
- re 是 python 的标准库,位于
Lib/re.py
- regex 表示 regular expression。正则表达式会被编译成一个字节码,python 底层调用 C 编写的正则引擎执行该字节码。可以优化正则表达式的书写,从而使得字节码的执行速度更快。
quick reference¶
RE = r'pattern'
re_obj = re.compile(RE)
#regex obj or re.(RE, ..)
match
search
findall #return string list, not match obj
finiter
split
sub #替换
subn
#match obj
group #group(0) 返回匹配的整个字符串,group(1) 返回第一个 group
groupdict
start
end
m.span([group])
- 默认 group 为 0,即整个匹配
- 返回 (m.start(group), m.end(group))
- if group did not contribute to the match, this is
(-1, -1)
example¶
match 返回 match object
import re
pattern = '^a...s$'
test_string = 'abyss'
result = re.match(pattern, test_string)
if result:
print("Search successful.")
else:
print("Search unsuccessful.")
findall 返回字符串列表
# Program to extract numbers from a string
import re
string = 'hello 12 hi 89. Howdy 34'
pattern = '\d+'
result = re.findall(pattern, string)
print(result)
# Output: ['12', '89', '34']
group¶
(...)
匹配生成一个 group。并且可以在后面的字符串中使用\number
引用
(?...)
(?:...)
:non-capturing,产生的 group 相当于会被忽略(?P<key>...)
:产生一个命名 group,通过 goup
reference group¶
- 可以在定义 pattern 时引用 group 内容
- 可以在匹配的 match object 里获取 group
- 可以在 sub 替换函数中,引用 group 作为替代字符串
Context of reference to group “quote” | Ways to reference it |
---|---|
in the same pattern itself | - (?P=quote) (as shown)- \1 |
when processing match object m | - m.group('quote') - m.end('quote') (etc.) |
in a string passed to the repl argument of re.sub() |
- \g<quote> - \g<1> - \1 |
example |
- 匹配一个被引号括起来的字符串:
(?P<quote>['"]).*?(?P=quote)
- 替换掉匹配的 group,新值引用了 group
import re
# 示例字符串
text = "Name: John, Age: 30, City: New York"
# 使用正则表达式匹配 Name 和 Age,并替换为新的值
result = re.sub(r"(Name: )(\w+), (Age: )(\d+)", r"\1Alice, \325", text)
print(result)
lookahead assertion¶
(?=...)
Matches if ...
matches next, but doesn’t consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov)
will match 'Isaac '
only if it’s followed by 'Asimov'
.
(?!...)
Matches if ...
doesn’t match next. This is a negative lookahead assertion. For example, Isaac (?!Asimov)
will match 'Isaac '
only if it’s not followed by 'Asimov'
.
example
其它¶
VERBOSE¶
pat = re.compile(r"""
\s* # Skip leading whitespace
(?P<header>[^:]+) # Header name
\s* : # Whitespace, and a colon
(?P<value>.*?) # The header's value -- *? used to
# lose the following trailing whitespace
\s*$ # Trailing whitespace to end-of-line
""", re.VERBOSE)
Greedy vs Non-Greedy vs Possessive¶
- greedy: 默认情况
*, +, ?
都会尽可能匹配多的字符 - non-greedy:使用
?
修饰,即*?
,+?
,??
,匹配尽可能少的字符 - possessive:匹配尽可能多,并且不建立 backtrace 点,可能导致匹配失败。
- 比如
aaaa
,a*a
中的a*
匹配到第 4 个 a 后,会导致 a 匹配失败,因此会回溯。最终a*
匹配 3 个 a,最后一个 a 单独匹配。 - 但是
a*+a
没有这个回溯过程,导致表达式匹配失败
- 比如