### Regular Expressions

In [1]:
import re

In [13]:
# `match()` - checks for a match only at the beginning of the string
# `search()` - checks for a match anywhere in the string
# Both return a re.Match object when they find a match and None otherwise, so they can be used directly in an if statement

text = "This is a good day"

if re.search("bad", text):
    print("it's a bad day")
else:
    print("Alas :(")


Alas :(
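To make the difference between the two functions concrete, here is a small sketch on the same sentence: `re.match()` only tries position 0, while `re.search()` scans the whole string. A successful call returns a `re.Match` object (truthy) and a failed one returns `None` (falsy).

```python
import re

text = "This is a good day"

# match() only tries position 0, so "good" is not found there
print(re.match("good", text))            # None

# search() scans the whole string and returns a Match object
m = re.search("good", text)
print(m.group(), m.span())              # good (10, 14)

# A Match object is truthy and None is falsy, so both work in an if statement
print(bool(re.match("This", text)), bool(re.match("good", text)))  # True False
```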


In [16]:
# `findall()` - finds all non-overlapping matches of a pattern and returns them in a list
# `split()` - splits a string at each match of a pattern and returns the pieces as elements of a list

text = "Amy works diligently. Amy gets good grades. Our student Amy is successful"

re.split("Amy ", text)

['', 'works diligently. ', 'gets good grades. Our student ', 'is successful']

In [17]:
# How many times have we talked about Amy?
re.findall("Amy", text)

['Amy', 'Amy', 'Amy']
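Because `findall()` returns a list, counting the mentions is just `len()`. A related detail of `re.split()` worth knowing: if the pattern contains a capturing group (parentheses), the matched separators are kept in the result. A small sketch on the same text:

```python
import re

text = "Amy works diligently. Amy gets good grades. Our student Amy is successful"

# Counting mentions: findall returns a list, so len() gives the count
print(len(re.findall("Amy", text)))  # 3

# Without a group, the separator is dropped
print(re.split("Amy ", text))

# With a capturing group, each separator is kept in the output list
print(re.split("(Amy )", text))
```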

### Anchors  
Anchors specify the start (`^`) and/or the end (`$`) of the string you are trying to match.  
If you put `^` before a pattern, the match must occur at the start of the string.  
If you put `$` after a pattern, the match must occur at the end of the string.  
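By default `^` and `$` refer to the start and end of the whole string. With the `re.MULTILINE` flag they also match at the start and end of every line, which matters once the text spans several lines. A sketch on a made-up three-line string:

```python
import re

# Made-up sample text with three lines
notes = "Amy passed\nBob failed\nAmy passed again"

# Without MULTILINE, ^ only matches at the very start of the string
print(re.findall(r"^Amy", notes))                # ['Amy']

# With MULTILINE, ^ matches at the start of each line
print(re.findall(r"^Amy", notes, re.MULTILINE))  # ['Amy', 'Amy']

# $ behaves the same way for line endings
print(re.findall(r"\w+$", notes, re.MULTILINE))  # ['passed', 'failed', 'again']
```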

In [35]:
# A re.Match object is always truthy, and re.search() returns None when there is no match, so the result can be tested with an if statement. Its span tells you where the text matched.
if re.search("^Amy", text):
    print("yes, the text starts with Amy")
else:
    print("Amy not found at the beginning")

yes, the text starts with Amy


In [37]:
if re.search("successful$", text):
    print("yes, the text ends with successful")
else:
    print("successful not found at the end")

yes, the text ends with successful


### Patterns and Character Classes

In [22]:
grades = "ACAAAABCBCBAA"

re.findall("B", grades)

['B', 'B', 'B']

In [30]:
# To find the As and Bs, use a character class in square brackets; "AB" would only match the two-letter sequence
# len() gives the number of matches (a list has no .len() method)
len(re.findall("[AB]", grades))

10

In [40]:
# You can include a range of characters, which are ordered by their character codes; for instance, [a-z] matches any lowercase letter. Let's build a regex to find all instances where this student received an A followed by a B or C.
print(re.findall("[A][B-C]", grades))

print(re.findall("AB|AC", grades))

['AC', 'AB']
['AC', 'AB']


In [38]:
# Find all ACs or BCs; inside square brackets "|" is a literal character, so [AB] already means "A or B"
re.findall("[AB]C", grades)

['AC', 'BC', 'BC']

In [41]:
# Find everything that is not an A
re.findall("[^A]", grades)

['C', 'B', 'C', 'B', 'C', 'B']

In [42]:
# Find everything that is not an A or C
re.findall("[^AC]", grades)

['B', 'B', 'B']
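Inside square brackets most metacharacters, including `|`, lose their special meaning, so `[A|B]` means "A, B, or a pipe character". A quick check on a made-up string that actually contains a pipe:

```python
import re

s = "A|C B"

# "|" inside a class is just another character to match
print(re.findall("[A|B]", s))   # ['A', '|', 'B']

# The class [AB] expresses "A or B" on its own
print(re.findall("[AB]", s))    # ['A', 'B']

# Alternation with | outside a class works on whole sub-patterns
print(re.findall("AC|BC", "ACBCAB"))  # ['AC', 'BC']
```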

In [43]:
# What does this evaluate to? Outside the brackets, ^ anchors the match to the start of the string; inside, [^A] means "any character except A".
# So this matches a single non-A character at the very beginning, which finds nothing because the string starts with A.
re.findall("^[^A]", grades)

[]

### Quantifiers
A quantifier specifies how many times a pattern must repeat in order to match. The most basic quantifier is expressed as e{m,n}, where e is the expression or character to be matched, m is the minimum number of times it should occur, and n is the maximum number of times it should occur.
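Besides `{m,n}`, Python also accepts `{m}` (exactly m), `{m,}` (at least m) and `{,n}` (between 0 and n). A small sketch of the first three on the grades string used above:

```python
import re

grades = "ACAAAABCBCBAA"

print(re.findall("A{2,10}", grades))  # ['AAAA', 'AA']        between 2 and 10 As
print(re.findall("A{2}", grades))     # ['AA', 'AA', 'AA']    exactly 2 As
print(re.findall("A{3,}", grades))    # ['AAAA']              3 or more As
```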

In [58]:
# What if we want to know how many times a student has been on a back-to-back A streak?
# Make sure there is no space between the numbers in the braces!
print(re.findall("A{2,10}", grades))
grades

['AAAA', 'AA']


'ACAAAABCBCBAA'

In [67]:
# Note: to match exactly two As in a row, set the minimum and maximum to the same value; A{2,2}, A{1,1}A{1,1}, AA and the shorthand A{2} all give the same result
print(re.findall("A{2,2}", grades) == re.findall("A{1,1}A{1,1}", grades) == re.findall("AA", grades) == re.findall("A{2}", grades))

True


In [69]:
# We could look for a decreasing trend in grades
re.findall("A{1,10}B{1,10}C{1,10}", grades)

['AAAABC']

In [76]:
# Read in a text file and use other quantifiers
# * is used for 0 or more times
# ? is used for 0 or 1 times
# + is used for 1 or more times

with open("../Data/FERPA Guidelines.txt", "r") as file:
    wiki = file.read()

print(wiki)

Overview[edit]
FERPA gives parents access to their child's education records, an opportunity to seek to have the records amended, and some control over the disclosure of information from the records. With several exceptions, schools must have a student's consent prior to the disclosure of education records after that student is 18 years old. The law applies only to educational agencies and institutions that receive funds under a program administered by the U.S. Department of Education.

Other regulations under this act, effective starting January 3, 2012, allow for greater disclosures of personal and directory student identifying information and regulate student IDs and e-mail addresses.[2] For example, schools may provide external companies with a student's personally identifiable information without the student's consent.[2]

Examples of situations affected by FERPA include school employees divulging information to anyone other than the student about the student's grades or behavior,
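The three shorthand quantifiers are `*` (0 or more, the same as `{0,}`), `+` (1 or more, `{1,}`) and `?` (0 or 1, `{0,1}`). A small sketch on made-up words:

```python
import re

words = "color colour colouur"

# ? makes the u optional: 0 or 1 occurrence
print(re.findall(r"colou?r", words))   # ['color', 'colour']

# * allows any number of u's, including none
print(re.findall(r"colou*r", words))   # ['color', 'colour', 'colouur']

# + requires at least one u
print(re.findall(r"colou+r", words))   # ['colour', 'colouur']
```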

In [87]:
# How would we get all the headers? We know the headers are all followed by [edit]

# This is a clunky way of grabbing the text that has "[edit]" at the end of it, focusing only on retrieving letters
re.findall(r"[a-zA-Z]{1,100}\[edit\]", wiki)

['Overview[edit]', 'records[edit]', 'records[edit]']

In [89]:
# \w matches any word character: letters, digits, or the underscore
# \w is a special sequence (a shorthand for a character class)
re.findall(r"[\w]{1,100}\[edit\]", wiki)

['Overview[edit]', 'records[edit]', 'records[edit]']

In [93]:
# What we probably want is to match whole words, not a fixed number of letters. We can use * to match any word character 0 or more times.
re.findall(r"[\w]*\[edit\]", wiki)

['Overview[edit]', 'records[edit]', 'records[edit]']

In [94]:
# Match word or space characters any number of times. Notice the space inside the brackets after \w: subtle, but effective.
re.findall(r"[\w ]*\[edit\]", wiki)

['Overview[edit]',
 'Access to public records[edit]',
 'Student medical records[edit]']

In [99]:
# This gives us a list of titles that still end with [edit]. What if we loop through the list and remove that portion?
for title in re.findall(r"[\w ]*\[edit\]", wiki):
    print(re.sub(r"\[edit\]", '', title))

Overview
Access to public records
Student medical records


### Groups
You can match several patterns at the same time using groups. To group patterns together, wrap each one in parentheses.
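One detail worth knowing: when a pattern contains groups, `findall()` returns the groups rather than the whole match (a tuple per match when there are several groups). A non-capturing group `(?:...)` groups without capturing, which is handy for applying a quantifier to several characters. A small sketch:

```python
import re

title = "Student medical records[edit]"

# Two capturing groups -> a list of 2-tuples
print(re.findall(r"([\w ]*)(\[edit\])", title))   # [('Student medical records', '[edit]')]

# One capturing group -> a list of strings holding only the group's text
print(re.findall(r"([\w ]*)\[edit\]", title))     # ['Student medical records']

# (?:...) groups "ab" for the + quantifier without creating a capture
print(re.findall(r"(?:ab)+", "ababx abx"))         # ['abab', 'ab']
```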


In [112]:
# `title` still holds the last value from the loop above: 'Student medical records[edit]'
re.findall(r"([\w ]*)(\[edit\])", title)


[('Student medical records', '[edit]')]

In [120]:
# finditer() returns an iterator of match objects, one per match
for items in re.finditer(r"([\w ]*)(\[edit\])", wiki):
    print(items.groups())

('Overview', '[edit]')
('Access to public records', '[edit]')
('Student medical records', '[edit]')


In [121]:
# The groups() method returns a tuple of all the groups. To get an individual group, use group(number), where group(0) is the whole match and each higher number is one of the parenthesized portions.
# In this case we want group 1, which holds just the titles from the wiki article.
for items in re.finditer(r"([\w ]*)(\[edit\])", wiki):
    print(items.group(1))

Overview
Access to public records
Student medical records


In [127]:
# It makes more sense to give each group a name, so it's easier to see what we are returning. The syntax is (?P<name>...): the parenthesis starts the group, ?P indicates that this is a Python extension to the basic regex syntax, and <name> is the dictionary key for the group, wrapped in <>.
for items in re.finditer(r"(?P<title>[\w ]*)(?P<edit_link>\[edit\])", wiki):
    print(items.groupdict()['title'])
    print(items.groupdict()['edit_link'])

Overview
[edit]
Access to public records
[edit]
Student medical records
[edit]


In [128]:
print(items.groupdict())

{'title': 'Student medical records', 'edit_link': '[edit]'}
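Besides `groupdict()`, a named group can be read with `group("name")` or by indexing the match object like a dictionary (`m["name"]`, available since Python 3.6). Named groups keep their numbers too. A short sketch on one header line:

```python
import re

m = re.search(r"(?P<title>[\w ]*)(?P<edit_link>\[edit\])", "Access to public records[edit]")

print(m.group("title"))   # Access to public records
print(m["edit_link"])     # [edit]
print(m.groupdict())      # {'title': 'Access to public records', 'edit_link': '[edit]'}

# Named groups are still numbered, so group(1) is the same as group("title")
print(m.group(1) == m.group("title"))  # True
```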


### Look Ahead and Look Behind
With lookarounds, the pattern we give the regex engine describes the text immediately before (look behind) or after (look ahead) the text we want to isolate, without including it in the match.
So far we have been capturing the [edit] portion and throwing it away. Instead, we can require it to be present without capturing it by using a lookahead with the (?=...) syntax; a lookbehind uses (?<=...).
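A sketch of the four lookaround forms on a made-up price string: `(?=...)` positive lookahead, `(?!...)` negative lookahead, `(?<=...)` positive lookbehind and `(?<!...)` negative lookbehind. None of them consume characters, so the text they check is not part of the match. Python requires a lookbehind to have a fixed width.

```python
import re

prices = "apples $3, pears $12, 5 plums"

# Lookbehind: digits preceded by "$" (the "$" is not part of the match)
print(re.findall(r"(?<=\$)\d+", prices))      # ['3', '12']

# Negative lookbehind: digits NOT preceded by "$" or by another digit
print(re.findall(r"(?<![$\d])\d+", prices))   # ['5']

# Lookahead: words followed by " $"
print(re.findall(r"\w+(?= \$)", prices))      # ['apples', 'pears']

# Negative lookahead: whole words not followed by a comma
print(re.findall(r"\b\w+\b(?!,)", prices))    # ['apples', 'pears', '5', 'plums']
```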

In [133]:
# The pattern below has one named group and one lookahead, each written in parentheses
# The group named "title" matches one or more word characters or spaces
# The lookahead (?=\[edit\]) requires [edit] to follow, but it is not part of the match or the output

for items in re.finditer(r"(?P<title>[\w ]+)(?=\[edit\])", wiki):
    print(items)

<re.Match object; span=(0, 8), match='Overview'>
<re.Match object; span=(2715, 2739), match='Access to public records'>
<re.Match object; span=(3692, 3715), match='Student medical records'>


In [130]:
print(items.groupdict())

{'title': 'Student medical records'}


In [138]:
# Let's look at some other data
with open("../Data/Buddhist Notes.txt") as file:
    buddhist = file.read()
    
print(buddhist)

Buddhist universities and colleges in the United States
From Wikipedia, the free encyclopedia
Jump to navigationJump to search

This article needs additional citations for verification. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed.
Find sources: "Buddhist universities and colleges in the United States" – news · newspapers · books · scholar · JSTOR (December 2009) (Learn how and when to remove this template message)
There are several Buddhist universities in the United States. Some of these have existed for decades and are accredited. Others are relatively new and are either in the process of being accredited or else have no formal accreditation. The list includes:

Dhammakaya Open University – located in Azusa, California, part of the Thai Wat Phra Dhammakaya[1]
Dharmakirti College – located in Tucson, Arizona Now called Awam Tibetan Buddhist Institute (http://awaminstitute.org/)
Dharma Realm Buddhist Univers
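A detail that is easy to miss in the printed notes: the university name and the words "located in" are separated by an en dash (–), not a plain hyphen (-), so a pattern written with `-` alone finds nothing. A minimal sketch, run on two lines copied from the output above, that accepts either character with the class `[-–]` and uses the `re.VERBOSE` flag so the pattern can be spread over several commented lines:

```python
import re

# Two lines copied from the printed notes
sample = (
    "Dhammakaya Open University – located in Azusa, California, part of the Thai Wat Phra Dhammakaya[1]\n"
    "Dharmakirti College – located in Tucson, Arizona Now called Awam Tibetan Buddhist Institute (http://awaminstitute.org/)\n"
)

pattern = r"""
(?P<title>.*?)              # university name (non-greedy; . does not cross lines)
\s*[-–]\s*located\s+in\s+   # hyphen or en dash, then "located in"
(?P<city>[\w\s]+?)          # city, may contain spaces
,\s+                        # comma separating city and state
(?P<state>\w+)              # state name
"""

for item in re.finditer(pattern, sample, re.VERBOSE):
    print(item.groupdict())
```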

In [149]:
# Each university follows the same pattern: the university name, then a dash (an en dash – in this file), then the words "located in", followed by the city and state

# We can use verbose mode to write clean, multiline regex statements
patterns = r"""
(?P<title>.*)           # The university -- . matches any single character and * means 0 or more times, so .* matches any run of characters
([-–]\ located\ in\ )   # an indicator of the location (hyphen or en dash)
(?P<city>\w*)           # The city: any number of word characters (letters, digits or underscore)
(,\ )                   # separator for the state
(?P<state>\w*)          # the state of the city
"""

patterns_cgp = r"""
(?P<title>.*?)                  # Title (non-greedy)
(\s*[-–]\s*located\s+in\s+)     # Literal ' – located in ' with flexible spacing
(?P<city>[\w\s]+?)              # City name (may include spaces)
(,\s+)                          # Separator
(?P<state>\w+)                  # State abbreviation or name
"""

# Now we can call finditer and pass the re.VERBOSE flag as the last parameter; this makes a large regex easier to understand
for item in re.finditer(patterns_cgp, buddhist, re.VERBOSE):
    print(item.groupdict())

In [152]:
with open("../Data/NY Times Health.txt") as file:
    nyt = file.read()

print(nyt)

548662191340421120|Sat Dec 27 02:10:34 +0000 2014|Risks in Using Social Media to Spot Signs of Mental Distress http://nyti.ms/1rqi9I1
548579831169163265|Fri Dec 26 20:43:18 +0000 2014|RT @paula_span: The most effective nationwide diabetes prevention program you've probably never heard of:  http://newoldage.blogs.nytimes.com/2014/12/26/diabetes-prevention-that-works/
548579045269852161|Fri Dec 26 20:40:11 +0000 2014|The New Old Age Blog: Diabetes Prevention That Works http://nyti.ms/1xm7fTi
548444679529041920|Fri Dec 26 11:46:15 +0000 2014|Well: Comfort Casseroles for Winter Dinners http://nyti.ms/1xTNoO0
548311901227474944|Fri Dec 26 02:58:39 +0000 2014|High-Level Knowledge Before Veterans Affairs Scandal http://nyti.ms/13yCpvS
548305625449787392|Fri Dec 26 02:33:42 +0000 2014|Your Money: Affordable Care Act’s Tax Effects Now Loom for Filers http://nyti.ms/13yAtUf
548283182853160960|Fri Dec 26 01:04:32 +0000 2014|Well: Christmas in the Hospital http://nyti.ms/1vtPNcm
548278414504108033

In [168]:
# These are tweets; let's get a list of all the hashtags in the data set

# We want to include the hash sign first, then any number of word characters, and end when we see some whitespace
# whitespace --> \s
# the hash sign --> # (it needs no escaping here)

pattern = r'(#[\w]*)(?=\s)' # Start with the hash sign, capture any number (*) of word characters, then use the lookahead (?=\s) so the match ends at the first whitespace without including it. Note that * also accepts a bare "#"; \w+ would require at least one character.

for item in re.finditer(pattern, nyt):
    print(item.group())

#askwell
#pregnancy
#Colorado
#VegetarianThanksgiving
#FallPrevention
#Ebola
#Ebola
#ebola
#Ebola
#Ebola
#EbolaHysteria
#AskNYT
#Ebola
#Ebola
#Liberia
#Excalibur
#ebola
#Ebola
#dallas
#nobelprize2014
#ebola
#ebola
#monrovia
#ebola
#nobelprize2014
#ebola
#nobelprize2014
#Medicine
#Ebola
#Monrovia
#Ebola
#smell
#Ebola
#Ebola
#Ebola
#Monrovia
#Ebola
#ebola
#monrovia
#liberia
#benzos
#ClimateChange
#Whole
#Wheat
#Focaccia
#Tomatoes
#Olives
#Recipes
#Health
#Ebola
#Monrovia
#Liberia
#Ebola
#Ebola
#Liberia
#Ebola
#blood
#Ebola
#organtrafficking
#EbolaOutbreak
#SierraLeone
#Freetown
#SierraLeone
#ebolaoutbreak
#kenema
#ebola
#Ebola
#ebola
#ebola
#Ebola
#ASMR
#AIDS2014
#AIDS
#MH17
#benzos
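The lookahead `(?=\s)` means a hashtag is only found when whitespace follows it, so a tag at the very end of the text or directly before punctuation is skipped. A sketch of an alternative on made-up tweets: `#\w+` stops by itself at the first non-word character, so no lookahead is needed, and `+` rules out a lone `#`.

```python
import re

# Made-up sample tweets
tweets = "Stay safe #Ebola, read more #health\nLink # here #AskNYT"

# Lookahead approach: requires whitespace right after the tag
print(re.findall(r"(#[\w]*)(?=\s)", tweets))   # ['#health', '#']

# Alternative: \w+ ends at the first non-word character by itself
print(re.findall(r"#\w+", tweets))             # ['#Ebola', '#health', '#AskNYT']
```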


In [None]:
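# Rules for writing a (simplified) regex that recognizes .com URLs:
# - it has to end in .com
# - a dot separates each part of the string
# - two dots cannot appear consecutively
# Escape the dot: an unescaped . matches any character
url_pattern = r"^\w+(?:\.\w+)*\.com$"
bool(re.match(url_pattern, "www.example.com"))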

In [169]:
import re
string = 'bat, lat, mat, bet, let, met, bit, lit, mit, bot, lot, mot'
result = re.findall('b[ao]t', string)
print(result)

['bat', 'bot']


In [170]:
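import numpy as np

def l2_dist(a, b):
    # Euclidean (L2) distance: square root of the sum of squared element-wise differences
    result = ((a - b) * (a - b)).sum()
    result = result ** 0.5
    return result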

In [200]:
a = np.random.rand(20, 20)
a.shape

(20, 20)

In [201]:
b = np.random.rand(20, 20)
b.shape

(20, 20)

In [206]:
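# Both inputs must have the same shape: (400,) - (400, 1) would broadcast to a (400, 400) array and give a wrong distance
l2_dist(np.reshape(a, (20*20,)), np.reshape(b, (20*20,)))
l2_dist(a.T, b.T)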


np.float64(7.9007424049265085)
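`l2_dist` computes the Euclidean (L2) distance, the square root of the sum of squared element-wise differences. For plain Python sequences the standard library offers `math.dist` (Python 3.8+) as a cross-check. This sketch uses a hypothetical pure-Python helper, `l2_dist_py`, rather than NumPy, and refuses inputs of different lengths instead of letting them silently broadcast:

```python
import math

def l2_dist_py(a, b):
    """Euclidean distance between two equal-length sequences (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

p = [0.0, 3.0]
q = [4.0, 0.0]
print(l2_dist_py(p, q))   # 5.0
print(math.dist(p, q))    # 5.0
```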

In [207]:
np.linspace(1, 4, 4)

array([1., 2., 3., 4.])

In [212]:
np.random.rand(4).shape

(4,)

In [211]:
np.random.rand(4, 1).shape


(4, 1)

In [215]:
np.arange(1, 4, 1).ndim == 1

True

In [223]:
old = np.array([[1, 1, 1, 1], [1, 1, 1, 1]])
new = old  # `new` is another name for the same array, not a copy; use old.copy() for an independent array

In [228]:
new[1:3, 1:3]

array([[1, 1]])

In [222]:
new[0, :2]

array([1, 1])

In [221]:
new[0, 0:2] = 0


In [220]:
new

array([[0, 0, 1, 1],
       [1, 1, 1, 1]])

In [231]:
import re
s = 'ACAABAACAAB'
result = re.findall('A{1,2}', s)
len(result)

4

In [233]:
import re
s = 'ACAABAACAAAB'
result = re.findall('A{1,2}', s)
L = len(result)
L

5

In [234]:
a1 = np.random.rand(4)
a2 = np.random.rand(4,1)
a3 = np.array([[1,2,3,4]])
a4 = np.arange(1,4,1)
a5 = np.linspace(1, 4, 4)

In [242]:
a1.shape

(4,)

In [None]:
# Find a list of all the names in the following string
# So we need a pattern that captures capital letters

import re
def names():
    simple_string = """Amy is 5 years old, and her sister Mary is 2 years old. 
    Ruth and Peter, their parents, have 3 kids."""

    # Capture a capital letter followed by lowercase letters with [A-Z][a-z]*, where * means 0 or more times
    # Then look ahead for a whitespace character (\s) or a comma, without returning it: (?=\s|,)
    return re.findall(r"([A-Z][a-z]*)(?=\s|,)", simple_string)

In [264]:
names()

['Amy', 'Mary', 'Ruth', 'Peter']

In [243]:
string = "Amy is 5 years old, and her sister Mary is 2 years old. Ruth and Peter, their parents, have 3 kids."

In [259]:
re.findall(r"([A-Z][a-z]*)(?=\s|,)", string)

['Amy', 'Mary', 'Ruth', 'Peter']

In [263]:
import re
def names():
    simple_string = """Amy is 5 years old, and her sister Mary is 2 years old. 
    Ruth and Peter, their parents, have 3 kids."""

    return re.findall(r"([A-Z][a-z]*)(?=\s|,)", simple_string)

assert len(names()) == 4, "there are four names"

### Read in Grades Data

In [373]:
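with open("../Data/Grades.txt", "r") as file:
    grades = file.read()

# Use a raw string so \w reaches the regex engine unchanged (a plain string triggers an invalid-escape warning)
# A lookahead is zero-width, so it needs no + quantifier
b_list = []
for item in re.findall(r"(\w* \w*)(?=: B)", grades):
    b_list.append(item)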

In [387]:
b_list = []
for item in re.finditer(r"(\w* \w*)(?=: B)", grades):
    b_list.append(item.group())

b_list

['Bell Kassulke',
 'Simon Loidl',
 'Elias Jovanovic',
 'Hakim Botros',
 'Emilie Lorentsen',
 'Jake Wood',
 'Fatemeh Akhtar',
 'Kim Weston',
 'Yasmin Dar',
 'Viswamitra Upandhye',
 'Killian Kaufman',
 'Elwood Page',
 'Elodie Booker',
 'Adnan Chen',
 'Hank Spinka',
 'Hannah Bayer']
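The grades file itself is not shown here, but the pattern implies lines of the form `First Last: Grade`. Under that assumption, here is a sketch on a few made-up lines. Because the pattern has a single group, `findall()` already returns the list of names, so the loop is not strictly needed; `\w+` is used so both words must be non-empty.

```python
import re

# Hypothetical sample in the assumed "First Last: Grade" format
sample = """Ann Smith: A
Bob Jones: B
Cara Diaz: C
Dev Patel: B"""

b_names = re.findall(r"(\w+ \w+)(?=: B)", sample)
print(b_names)  # ['Bob Jones', 'Dev Patel']
```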

In [395]:
# Pluck out every name (two words separated by a space) that is immediately followed by ": B", i.e. the students who got a B
import re
def grades():
    with open("../Data/grades.txt", "r") as file:
        grades = file.read()

    b_list = []
    for item in re.finditer(r"(\w* \w*)(?=: B)", grades):
        b_list.append(item.group())
    return b_list

In [397]:
grades()

['Bell Kassulke',
 'Simon Loidl',
 'Elias Jovanovic',
 'Hakim Botros',
 'Emilie Lorentsen',
 'Jake Wood',
 'Fatemeh Akhtar',
 'Kim Weston',
 'Yasmin Dar',
 'Viswamitra Upandhye',
 'Killian Kaufman',
 'Elwood Page',
 'Elodie Booker',
 'Adnan Chen',
 'Hank Spinka',
 'Hannah Bayer']

In [1]:
import re
def grades2():
    with open("../Data/grades.txt", "r") as file:
        grades = file.read()

    b_list = []
    for item in re.finditer(r"(\w* \w*)(?=: B)", grades):
        b_list.append(item.group())
    return b_list

In [2]:
grades2()

['Bell Kassulke',
 'Simon Loidl',
 'Elias Jovanovic',
 'Hakim Botros',
 'Emilie Lorentsen',
 'Jake Wood',
 'Fatemeh Akhtar',
 'Kim Weston',
 'Yasmin Dar',
 'Viswamitra Upandhye',
 'Killian Kaufman',
 'Elwood Page',
 'Elodie Booker',
 'Adnan Chen',
 'Hank Spinka',
 'Hannah Bayer']

In [5]:
import re
def names():
    simple_string = """Amy is 5 years old, and her sister Mary is 2 years old. 
    Ruth and Peter, their parents, have 3 kids."""

    return re.findall(r"([A-Z][a-z]*)(?=\s|,)", simple_string)

In [6]:
names()

['Amy', 'Mary', 'Ruth', 'Peter']

In [7]:
with open("../Data/logdata.txt") as file:
    logs = file.read()

In [8]:
print(logs)

146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554
156.127.178.177 - okuneva5222 [21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701
100.32.205.59 - ortiz8891 [21/Jun/2019:15:45:28 -0700] "PATCH /architectures HTTP/1.0" 204 6048
168.95.156.240 - stark2413 [21/Jun/2019:15:45:31 -0700] "GET /engage HTTP/2.0" 201 9645
71.172.239.195 - dooley1853 [21/Jun/2019:15:45:32 -0700] "PUT /cutting-edge HTTP/2.0" 406 24498
180.95.121.94 - mohr6893 [21/Jun/2019:15:45:34 -0700] "PATCH /extensible/reinvent HTTP/1.1" 201 27330
144.23.247.108 - auer7552 [21/Jun/2019:15:45:35 -0700] "POST /extensible/infrastructures/one-to-one/enterprise HTTP/1.1" 100 22921
2.179.103.97 - lind8584 [21/Jun/2019:15:45:36 -0700] "POST /grow/front-end/e-commerce/robust HTTP/2.0" 304 14641
241.114.184.133 - t

In [13]:
# Extract the host, user_name, time, and request
# Raw strings (r"...") keep backslashes such as \. and \s intact for the regex engine

host = r"([0-9]{1,3}(?:\.[0-9]{1,3}){3})"
user_name = r"(-\s(-|\w{3,20}\d)\s)" # The username can be missing, as indicated by a -, so we need an alternation (|)
time = r"\[(.*)\]"
request = r"\"([A-Z]{1,10}.*\d)\""

#[\w]*)(?=\s)
#re.findall(r"([0-9]{1,3}(?:\.[0-9]{1,3}){3})(?=\s-)(-|\w+)", logs)
re.findall(request, logs)


['POST /incentivize HTTP/1.1',
 'DELETE /virtual/solutions/target/web+services HTTP/2.0',
 'DELETE /interactive/transparent/niches/revolutionize HTTP/1.1',
 'PATCH /architectures HTTP/1.0',
 'GET /engage HTTP/2.0',
 'PUT /cutting-edge HTTP/2.0',
 'PATCH /extensible/reinvent HTTP/1.1',
 'POST /extensible/infrastructures/one-to-one/enterprise HTTP/1.1',
 'POST /grow/front-end/e-commerce/robust HTTP/2.0',
 'GET /redefine/orchestrate HTTP/1.0',
 'PUT /orchestrate/out-of-the-box/unleash/syndicate HTTP/1.1',
 'POST /enhance/solutions/bricks-and-clicks HTTP/1.1',
 'DELETE /rich/reinvent HTTP/2.0',
 'HEAD /scale/global/leverage HTTP/1.0',
 'POST /innovative/roi/robust/systems HTTP/1.1',
 'HEAD /systems/sexy HTTP/1.1',
 'GET /incubate/incubate HTTP/1.1',
 'GET /convergence HTTP/2.0',
 'HEAD /convergence HTTP/2.0',
 'DELETE /bandwidth/reintermediate/engage HTTP/2.0',
 'PUT /optimize HTTP/1.1',
 'DELETE /bandwidth/turn-key/users HTTP/2.0',
 'POST /efficient/unleash HTTP/1.1',
 'POST /morph/optimi
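To test a combined pattern on its own, here is a sketch applied to the first line of the printed log. It extracts the same four fields, but swaps the `.*` style fields for negated character classes such as `[^\]]*` ("anything except a closing bracket") and `[^"]*`, which is one alternative rather than the approach used elsewhere in this notebook. `re.search` returns a single match whose `groupdict()` holds the named fields.

```python
import re

line = '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622'

pattern = r"""
(?P<host>\d{1,3}(?:\.\d{1,3}){3})   # IP address
\s-\s                               # VERBOSE ignores literal spaces, so use \s
(?P<user_name>-|\w+)                # user name, or "-" when missing
\s
\[(?P<time>[^\]]*)\]                # everything inside the square brackets
\s
"(?P<request>[^"]*)"                # everything inside the double quotes
"""

m = re.search(pattern, line, re.VERBOSE)
print(m.groupdict())
```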

In [843]:
patterns = r"""
(?P<host>[0-9]{1,3}(?:\.[0-9]{1,3}){3})   # Host: an IP address, made up of four groups of 1-3 digits
\s-\s                                      # VERBOSE mode ignores literal spaces, so match them explicitly with \s
(?P<user_name>-|\w{3,20}\d)                # user_name: a run of letters ending with a digit, or "-" when missing
#(?P<time>((?<=\[).*(?=\])))                   # time, starts with a bracket and ends with a bracket
#(?P<request>((?<=\")[A-Z]{1,8}.*\d(?=\")))    # request, starts with quotes and ends with quotes
"""

time = r"\[(.*)\]"
request = r'\"([A-Z]{1,10}.*\d)\"'


# Now we can call finditer and pass the re.VERBOSE flag as the last parameter; this makes a large regex easier to understand
for item in re.finditer(patterns, logs, re.VERBOSE):
    print(item.groupdict())


In [None]:
def logs():
    with open("../Data/logdata.txt", "r") as file:
        logdata = file.read()

    # Notes:
    # - Lookarounds are zero-width, so when chaining fields together like this the brackets and quotes
    #   still have to be consumed by the pattern itself.
    # - Put each named group inside the literal quotes or brackets if you don't want them in the returned value.

    patterns = r"""
    (?P<host>[0-9]{1,3}(?:\.[0-9]{1,3}){3})   # Host: IP address
    (\s-\s)                                   # Whitespace and the dash between fields
    (?P<user_name>(-|\w{3,20}\d))             # User name; some are missing and shown as -
    (\s)
    \[(?P<time>(.*?))\]                       # Time between brackets [...]; the non-greedy .*? stops at the first ]
    (\s)
    \"(?P<request>([A-Z]{1,10}.*\d))\"        # Request between quotes "..."
    """

    results = []
    for item in re.finditer(patterns, logdata, re.VERBOSE):
        results.append(item.groupdict())
    return results
    

In [952]:
logs()

[{'host': '146.204.224.152',
  'user_name': 'feest6811',
  'time': '21/Jun/2019:15:45:24 -0700',
  'request': 'POST /incentivize HTTP/1.1'},
 {'host': '197.109.77.178',
  'user_name': 'kertzmann3129',
  'time': '21/Jun/2019:15:45:25 -0700',
  'request': 'DELETE /virtual/solutions/target/web+services HTTP/2.0'},
 {'host': '156.127.178.177',
  'user_name': 'okuneva5222',
  'time': '21/Jun/2019:15:45:27 -0700',
  'request': 'DELETE /interactive/transparent/niches/revolutionize HTTP/1.1'},
 {'host': '100.32.205.59',
  'user_name': 'ortiz8891',
  'time': '21/Jun/2019:15:45:28 -0700',
  'request': 'PATCH /architectures HTTP/1.0'},
 {'host': '168.95.156.240',
  'user_name': 'stark2413',
  'time': '21/Jun/2019:15:45:31 -0700',
  'request': 'GET /engage HTTP/2.0'},
 {'host': '71.172.239.195',
  'user_name': 'dooley1853',
  'time': '21/Jun/2019:15:45:32 -0700',
  'request': 'PUT /cutting-edge HTTP/2.0'},
 {'host': '180.95.121.94',
  'user_name': 'mohr6893',
  'time': '21/Jun/2019:15:45:34 -0700'
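The non-greedy `.*?` in the time field of `logs()` matters: a greedy `.*` runs to the last `]` it can find on the line. A sketch on a made-up line with two bracketed fields:

```python
import re

s = "[21/Jun/2019] user [extra]"

# Greedy: .* grabs as much as possible, so the match ends at the LAST ]
print(re.findall(r"\[(.*)\]", s))    # ['21/Jun/2019] user [extra']

# Non-greedy: .*? stops at the FIRST ] after each [
print(re.findall(r"\[(.*?)\]", s))   # ['21/Jun/2019', 'extra']
```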

In [954]:
len(logs()) == 979
    

True