Regular Expressions

Python

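`\w` (word character) matches **any single letter, digit or underscore**. For `str` patterns in Python 3 this includes Unicode letters and digits (such as `é`); with the `re.ASCII` flag it is the same as `[a-zA-Z0-9_]`.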
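A small sketch comparing what `\w+` picks up by default and with `re.ASCII`. The sample string is made up for illustration:

```python
import re

text = "café_1 naïve!"

# Default for str patterns in Python 3: \w includes Unicode letters such as é and ï
unicode_words = re.findall(r"\w+", text)

# re.ASCII restricts \w to [a-zA-Z0-9_], so é and ï break the words apart
ascii_words = re.findall(r"\w+", text, re.ASCII)

print(unicode_words)  # ['café_1', 'naïve']
print(ascii_words)    # ['caf', '_1', 'na', 've']
```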

Match object

A Match Object is an object containing information about the search and its result.
It is returned by `re.search()`, `re.match()` and `re.fullmatch()` (each returns `None` when there is no match), and `re.finditer()` yields match objects. `re.findall()` does not return match objects, only strings or tuples of strings.

It has properties and methods:

`.span()` returns a tuple containing the start and end positions of the match.
`.string` is the string passed into the function.

`.group()` returns the part of the string where there was a match.

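```python
import re

txt = "My name is Javeed Ali Khan"
# Group 1: a word starting with J; group 2: "A" followed by two letters from {l, i}
result = re.search(r"(\bJ\w+) (\bA[li]{2})", txt)
print(result.span())     # (start, end) of the whole match
print(result.group())    # whole match
print(result.group(0))   # whole match (same as group())
print(result.group(1))   # 1st group
print(result.group(2))   # 2nd group
print(result.string)     # the original input string
```

```
(11, 21)
Javeed Ali
Javeed Ali
Javeed
Ali
My name is Javeed Ali Khan
```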
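Because these functions return `None` instead of a match object when nothing matches, calling `.group()` directly on the result can raise `AttributeError`. A minimal sketch of the usual guard, reusing the same sentence (the `Z`/`K` patterns are just illustrations):

```python
import re

txt = "My name is Javeed Ali Khan"

# search() returns None when the pattern does not occur
missing = re.search(r"\bZ\w+", txt)
print(missing)  # None

# Check the result before calling match-object methods; None has no .group()
found = re.search(r"\bK\w+", txt)
if found:
    print(found.group(), found.start(), found.end())  # Khan 22 26
```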

re methods 1

re.compile(pattern, flags)	Compile pattern into a regular expression object, with optional flags
re.match(pattern, string)	Match pattern only at the beginning of string
re.search(pattern, string)	Search for the first match of pattern anywhere in string
re.split(pattern, string)	Split string by occurrences of pattern
re.sub(pattern, repl, string)	Replace all non-overlapping occurrences of pattern in string with repl
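A short sketch exercising each function in the table on one made-up string, to show how `match()` and `search()` differ and what `split()` and `sub()` return:

```python
import re

text = "order 66 shipped 3 items"

# compile(): build the pattern object once and reuse it
digits = re.compile(r"\d+")

# match(): only tries at position 0, so a digits pattern fails on this text
at_start = re.match(r"\d+", text)        # None
word_at_start = re.match(r"\w+", text)   # matches 'order'

# search(): scans the whole string and stops at the first match
first_number = digits.search(text)       # matches '66'

# split(): cut the string wherever the pattern matches
parts = re.split(r"\s+", text)

# sub(): replace every match with the replacement string
masked = digits.sub("#", text)

print(at_start, word_at_start.group(), first_number.group())  # None order 66
print(parts)   # ['order', '66', 'shipped', '3', 'items']
print(masked)  # order # shipped # items
```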

Match object methods

match.group("name")	Return subgroup "name" of match (a group number also works; 0 is the whole match)
match.groups()	Return tuple containing all subgroups of match
match.groupdict()	Return dict containing all named subgroups of match
match.start(group)	Return start index of substring matched by group
match.end(group)	Return end index of substring matched by group
match.span(group)	Return 2-tuple of start and end indices of group in match
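A sketch of the methods above on a match with named groups. The date pattern and the input string are illustrative, not from a particular source:

```python
import re

m = re.search(r"(?P<year>\d{4})-(?P<month>\d\d)-(?P<day>\d\d)", "due 2021-05-11")

print(m.group("month"))                  # 05
print(m.groups())                        # ('2021', '05', '11')
print(m.groupdict())                     # {'year': '2021', 'month': '05', 'day': '11'}
print(m.start("month"), m.end("month"))  # 9 11
print(m.span("day"))                     # (12, 14)
```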

re methods 2

re.fullmatch(pattern, string)	Match pattern only if the whole string matches the regular expression
re.findall(pattern, string)	Return all non-overlapping matches of pattern in string as a list of strings (a list of tuples if the pattern has more than one group)
re.finditer(pattern, string)	Return an iterator yielding match objects over non-overlapping matches of pattern in string
re.subn(pattern, repl, string)	Same as re.sub(), but return a tuple (new_string, number_of_subs_made)
re.purge()	Clear the regular expression cache
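A sketch of `fullmatch()`, `findall()`, `finditer()` and `subn()` on one made-up string, showing the different return types:

```python
import re

text = "a1 b22 c333"

# fullmatch(): the whole string must match, not just a prefix
whole = re.fullmatch(r"\d{3}", "123")
too_long = re.fullmatch(r"\d{3}", "1234")   # None

# findall(): plain strings without groups, tuples with two or more groups
plain = re.findall(r"[a-z]\d+", text)
grouped = re.findall(r"([a-z])(\d+)", text)

# finditer(): match objects, so positions are available too
spans = [(m.group(), m.span()) for m in re.finditer(r"\d+", text)]

# subn(): like sub(), plus the number of replacements made
new_text, count = re.subn(r"\d+", "#", text)

print(plain)            # ['a1', 'b22', 'c333']
print(grouped)          # [('a', '1'), ('b', '22'), ('c', '333')]
print(spans)            # [('1', (1, 2)), ('22', (4, 6)), ('333', (8, 11))]
print(new_text, count)  # a# b# c# 3
```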

Difference between match, search and findall

	match	search	findall
Finds	first occurrence, only at the start of the string	first occurrence anywhere in the string	all non-overlapping occurrences
Multi-line text	returns None if the pattern appears only on a later line (even with re.MULTILINE)	checks all lines, unlike match	checks all lines
Returns	match object (or None)	match object (or None)	list of strings or list of tuples of strings, not match objects
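A sketch of the three rows above on a made-up multi-line string:

```python
import re

text = "first line\nsecond line\nthird line"

# match() only looks at the start of the string, even with re.MULTILINE
no_match = re.match(r"second", text)
no_match_ml = re.match(r"second", text, re.MULTILINE)
print(no_match, no_match_ml)  # None None

# search() finds the first occurrence on any line and returns a match object
found = re.search(r"second", text)
print(found.group(), found.span())  # second (11, 17)

# findall() returns every occurrence as strings, not match objects
lines = re.findall(r"\w+ line", text)
print(lines)  # ['first line', 'second line', 'third line']
```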

findall example

```
>>> import re
>>> cc_list = '''Ezra Koenig <ekoenig@vpwk.com>,
... Rostam Batmanglij <rostam@vpwk.com>,
... Chris Tomson <ctomson@vpwk.com>,
... Bobbi Baio <bbaio@vpwk.com>'''
>>> matched = re.findall(r'\w+@\w+\.\w+', cc_list)
>>> matched
['ekoenig@vpwk.com', 'rostam@vpwk.com', 'ctomson@vpwk.com', 'bbaio@vpwk.com']
>>> # With groups, findall returns a list of tuples (one item per group)
>>> matched = re.findall(r'(\w+)@(\w+)\.(\w+)', cc_list)
>>> matched
[('ekoenig', 'vpwk', 'com'), ('rostam', 'vpwk', 'com'), ('ctomson', 'vpwk', 'com'), ('bbaio', 'vpwk', 'com')]
>>> names = [x[0] for x in matched]
>>> names
['ekoenig', 'rostam', 'ctomson', 'bbaio']
```
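One related findall detail: with exactly one capturing group, `findall()` returns only that group's text. A sketch (on a shortened, made-up address list) of using a non-capturing `(?:...)` group to get full matches, or `finditer()` to get both:

```python
import re

cc_list = "Ezra Koenig <ekoenig@vpwk.com>, Rostam Batmanglij <rostam@vpwk.com>"

# One capturing group: findall returns only that group's text
users = re.findall(r"(\w+)@\w+\.\w+", cc_list)

# (?:...) groups without capturing, so findall returns the full match
addresses = re.findall(r"\w+@(?:\w+\.)+\w+", cc_list)

# finditer yields match objects: full match and groups together
pairs = [(m.group(0), m.group(1)) for m in re.finditer(r"(\w+)@\w+\.\w+", cc_list)]

print(users)      # ['ekoenig', 'rostam']
print(addresses)  # ['ekoenig@vpwk.com', 'rostam@vpwk.com']
print(pairs)      # [('ekoenig@vpwk.com', 'ekoenig'), ('rostam@vpwk.com', 'rostam')]
```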

Examples

Named group

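`(?P<name>regex)` captures whatever `regex` matches into a group that can be retrieved by name, e.g. `match.group("name")`.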

```python
import re

txt = """Immune by first dose Last update Mon, 14 Jun 12:00 PM O Current Permits Immune by first dose Last update Mon, 14 Jun 12:00 PM O Current Permits"""

# Each (?P<name>...) group captures one part of a date such as "Mon, 14 Jun 12:00 PM"
pattern = r"""(?P<day_of_week>(mon|tues|wed(nes)?|thur(s)?|fri|sat(ur)?|sun)(day)?,?)(?P<day_of_month>\s*\d+)(?P<month>\s*(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(Nov|Dec)(?:ember)?),?\s*)(?P<time>(1[0-2]|0?[1-9]):([0-5][0-9]) ?([AaPp][Mm]))"""

result = re.search(pattern, txt, re.IGNORECASE)  # None if there is no match
print(result)
if result:
    # Read each captured part back by its group name
    print(result.group("day_of_week"))
    print(result.group("day_of_month"))
    print(result.group("month"))
    print(result.group("time"))
```

```
<re.Match object; span=(33, 53), match='Mon, 14 Jun 12:00 PM'>
Mon,
 14
 Jun 
12:00 PM
```
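`re.search()` stops at the first date, although `txt` contains it twice. A sketch, reusing the same text and pattern (split over several raw strings, which Python concatenates), that collects every date with `finditer()` and `groupdict()`. Stripping spaces and commas is an assumption about how the values will be used, needed because `\s*` and `,?` sit inside the named groups:

```python
import re

txt = """Immune by first dose Last update Mon, 14 Jun 12:00 PM O Current Permits Immune by first dose Last update Mon, 14 Jun 12:00 PM O Current Permits"""

pattern = re.compile(
    r"(?P<day_of_week>(mon|tues|wed(nes)?|thur(s)?|fri|sat(ur)?|sun)(day)?,?)"
    r"(?P<day_of_month>\s*\d+)"
    r"(?P<month>\s*(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?"
    r"|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(Nov|Dec)(?:ember)?),?\s*)"
    r"(?P<time>(1[0-2]|0?[1-9]):([0-5][0-9]) ?([AaPp][Mm]))",
    re.IGNORECASE,
)

# groupdict() keeps only the named groups; strip the spaces/commas they include
dates = [
    {name: value.strip(" ,") for name, value in m.groupdict().items()}
    for m in pattern.finditer(txt)
]
for d in dates:
    print(d)  # {'day_of_week': 'Mon', 'day_of_month': '14', 'month': 'Jun', 'time': '12:00 PM'}
```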
