Please read the following chapters in the text:
- [Chapter 10 - Tuples](https://eng.libretexts.org/Bookshelves/Computer_Science/Book%3A_Python_for_Everybody_(Severance)/10%3A_Tuples)
- [Chapter 11 - Regular Expressions](https://eng.libretexts.org/Bookshelves/Computer_Science/Book%3A_Python_for_Everybody_(Severance)/11%3A_Regular_Expressions)
### Tuples and Regex
#### Tuples

In [None]:
# Using tuples!
# You already saw this in the last activity.
students = {'001234567': 'Generic Student', '009876543': 'Another Student', '000000001':'Cody Coyote'}

# Do you see tuples in there? ('001234567', 'Generic Student')? Let's pull one out.
first_dict_item = list(students.items())[0] # cast to list first to make it easier to pull out elements
print(first_dict_item, "\nType:", type(first_dict_item)) 
# Note: As of Python version 3.7, dictionaries keep their items in the order you inserted them.
# Before, there was no guaranteed order (and it was honestly a little annoying)
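
Since every item is a `(key, value)` tuple, a `for` loop can unpack it directly. Here is a minimal sketch reusing the `students` dictionary from above; the names `student_id`, `name`, and `pairs` are only illustrative.

```python
students = {'001234567': 'Generic Student', '009876543': 'Another Student', '000000001': 'Cody Coyote'}

# Each item from .items() is a (key, value) tuple, so it can be unpacked in the loop header
pairs = []
for student_id, name in students.items():
    pairs.append((student_id, name))
    print(student_id, "->", name)

# Insertion order is preserved (Python 3.7+), so the first pair is the first one inserted
print(pairs[0])
```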

In [None]:
# Tuples
# These are similar to lists, except they are immutable! They are also hashable (as long as everything inside them is hashable), so we can use them as keys in Python dictionaries 😀
# You can make a tuple in any of the ways below
tuple_example = 1,2,3,4,5 # separate values by commas with no brackets
tuple_example2 = (10,7,8,9,6) # separate values by commas with parentheses

# Cast string to tuple
string2tuple = tuple("hey there")
print("Example of a string cast to tuple:", string2tuple)

# Put an entire string into a tuple as a single value
string2tuple2 = ("hey there",) # the trailing comma is what makes this a tuple
print("Example of an entire string stored as one value of a tuple:", string2tuple2)

# Cast list to tuple
list2tuple = tuple([1, 3, 4, 7, 38])
print("Example of a list cast to tuple:", list2tuple)

# You can do slicing and subsetting the same way as lists!
print("Get a slice of the first 3 elements in a tuple:", tuple_example[0:3])

# And of course you can have more complicated structures, like tuples of tuples
is_it_yummy = (("yuck", "mayo"), ("yummy", "mustard"), ("yuck", "tabasco"), ("yummy", "tapatio"), ("yuck", "mayonnaise"), ("yummy", "mayoketchup"))
print("Tuple of tuples:", is_it_yummy)

# Tuples are iterables too! 😀 but you knew that from using it earlier as a dictionary (key,value) pair
for delicious, food in is_it_yummy: # for every tuple pair, such as ("yuck", "mayo"), in is_it_yummy, do the following:
    print("What do I think about", food, "🤔...", "😜"*(delicious=="yuck") + "😋"*(delicious=="yummy"))

### Extra stuff: What is this thing doing? "😜"*(delicious=="yuck") + "😋"*(delicious=="yummy")
# (delicious=="yuck") is either True or False, and Python treats True as 1 and False as 0 in arithmetic. If it was True...
# "😜"*True + "😋"*False = "😜"*1 + "😋"*0 = "😜"
# This is just one of many ways to write quick code that behaves like an "if" statement


In [None]:
# Remember though, tuples cannot be changed! 
tuple_example[0] = 100 # this won't work! It raises a TypeError
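
If you do need a "changed" tuple, one common approach is to build a new tuple from pieces of the old one. A minimal sketch (the name `updated_tuple` is just for illustration):

```python
tuple_example = (1, 2, 3, 4, 5)

# Item assignment on a tuple raises a TypeError; catch it so the cell keeps running
try:
    tuple_example[0] = 100
except TypeError as error:
    print("TypeError:", error)

# Build a NEW tuple instead: a one-item tuple plus a slice of the original
updated_tuple = (100,) + tuple_example[1:]
print("Original:", tuple_example)
print("New tuple:", updated_tuple)
```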

In [None]:
# Useful! You can use a tuple to assign more than one variable at a time
full_name = "Ben Becerra"
first_name, last_name = full_name.split() # This string method splits the string into words; by default it splits on whitespace
print("First Name:", first_name)
print("Last Name:", last_name)
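
Two related tricks that build on tuple assignment are swapping values and collecting "everything else" with a starred name. This is a small sketch; the three-word name is made up for illustration.

```python
# Swap two values without a temporary variable (the right side is packed into a tuple first)
low, high = 10, 1
low, high = high, low
print(low, high)

# A starred name collects whatever is left over into a list
# ("Cody The Coyote" is a made-up three-word name for illustration)
first_name, *middle_names, last_name = "Cody The Coyote".split()
print(first_name, middle_names, last_name)
```

Without the starred name, unpacking three words into two variables would raise a `ValueError`.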

In [None]:
# If tuples are immutable, how do I sort??
# Use sorted() to create a new list of sorted elements 😀
unsorted_tuple = (1000,-10,0,30)
sorted_tuple = sorted(unsorted_tuple) #Try sorted(unsorted_tuple, reverse=True) for reverse order
print("Before sorting:", unsorted_tuple)
print("After sorting:", sorted_tuple) #oh this is a list. Did you want a tuple? Try tuple(sorted_tuple)
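
`sorted()` also works on tuples of tuples, and its `key=` argument lets you choose which element to sort on. A minimal sketch with hypothetical `(name, score)` pairs:

```python
# Hypothetical (name, score) pairs, for illustration only
scores = (("Ana", 82), ("Raj", 95), ("Lee", 78))

# sorted() compares tuples element by element, so the default order is by name
by_name = sorted(scores)
print("Sorted by name:", by_name)

# key= picks what to sort on; here the score at index 1, highest first
by_score = tuple(sorted(scores, key=lambda pair: pair[1], reverse=True))
print("Sorted by score:", by_score)
```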

#### Intro to Regular Expressions (RegEx) - Pattern Matching and Data Extraction!

Regular expressions can look very complicated, almost like learning ANOTHER language (it's all syntax though). There are four common functions (findall, search, split, sub),
combined with three levels of syntax: metacharacters, special sequences, and character sets.


There are even MORE functions, but let's just stick to the basics. For MORE info, go to https://www.w3schools.com/python/python_regex.asp
and https://docs.python.org/3/howto/regex.html# (it's a little overwhelming though).

The goal of this lesson is an introduction to RegEx. After this lesson, we'll start using GenAI to help us write RegEx patterns. 🤖

In [8]:
import re  #import the regular expressions module
random_text = "The beauty of the sunset was obscured by the industrial cranes. 100 200 300 400" #from https://randomwordgenerator.com/sentence.php 😀


### the re.findall() function
Use re.findall() to find and extract every match of a pattern. It returns a list of the matching strings.


In [None]:
sunset_match = re.findall(r"sunset", random_text) #why is there an 'r' in front of the string?
# the 'r' makes this a raw string. It is not always needed, but it is good practice: RegEx syntax uses backslashes \ and Python would otherwise read them as the start of an "escape character"
# Remember \n ? That's an "escape character" that creates a new line.
print("Any matches to 'sunset'?",sunset_match)

### re.findall() with metacharacters (https://www.w3schools.com/python/python_regex.asp scroll down to Metacharacters)

sunset_match2 = re.findall(r"sunset|beauty", random_text)
print("Any matches to 'sunset' OR 'beauty'?",sunset_match2)

### re.findall() with character sets (https://www.w3schools.com/python/python_regex.asp scroll down to Sets)
digits_only = re.findall(r"[0-9]", random_text)
print("Any matches to digits only between 0-9?", digits_only)

### As you can see, it gets complicated REAL quick. RegEx skills take time and practice to develop.
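
The cell above covers metacharacters and character sets, so here is a short sketch of the third level, special sequences. It reuses `random_text`; the variable names are illustrative.

```python
import re

random_text = "The beauty of the sunset was obscured by the industrial cranes. 100 200 300 400"

# [0-9] matches ONE digit at a time, so every digit comes back separately
single_digits = re.findall(r"[0-9]", random_text)
print("Single digits:", single_digits)

# \d is the special sequence for a digit, and + means "one or more", so whole numbers stay together
whole_numbers = re.findall(r"\d+", random_text)
print("Whole numbers:", whole_numbers)

# \b marks a word boundary; re.IGNORECASE matches both "The" and "the"
the_matches = re.findall(r"\bthe\b", random_text, flags=re.IGNORECASE)
print("Matches for the word 'the':", the_matches)
```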

### The re.search() function
Use re.search() to find the first match of a pattern. It returns a Match object, or None if nothing matches.

In [None]:
search_example = re.search(r"sunset", random_text)
print(search_example) # What is this? a Match object?? 
# What does this object look like (i.e. attributes)? What can this object do (i.e. methods)?

# remove the # comment sign to run the help() function and see all the methods available in the Match object
#help(search_example)  # Oh! It can do a variety of things 😀 

print("The start and end positions for 'sunset':", search_example.span())

# So just from these two functions alone, this could be useful for identifying WHERE the matches occurred in a string. 
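
One thing to watch for: `re.search()` returns `None` when nothing matches, so calling `.span()` on the result would fail. A minimal sketch of checking first (the word "sunrise" is just an example of something that is not in the text):

```python
import re

random_text = "The beauty of the sunset was obscured by the industrial cranes. 100 200 300 400"

# No match: re.search() returns None instead of a Match object
missing = re.search(r"sunrise", random_text)
if missing is None:
    print("No match for 'sunrise'")

# Match found: group() gives the matched text, start()/end() give its position
found = re.search(r"sunset", random_text)
if found is not None:
    print("Matched text:", found.group())
    print("Slice using the positions:", random_text[found.start():found.end()])
```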

### The re.split() function
Split a string wherever a pattern matches. This is more powerful than the string method .split() since we can refine the pattern with metacharacters, special sequences, and character sets.

In [None]:
some_string = "Hello%20everyone!&nbspHow%20are%20you?"
print(re.split(r"%20|&nbsp", some_string)) #This is using Metacharacters to refine our split.

some_string2 = "I'm$doing$well!"
print(re.split(r"\$", some_string2)) #$ is a metacharacter (it means "end of string"), so put a \ in front to match a literal $
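
A character set can split on several single characters at once, and `maxsplit` limits how many splits happen. This sketch uses a hypothetical timestamp string:

```python
import re

timestamp = "2019-12-01 10:45:00"  # hypothetical value for illustration

# Split on a dash, a colon, or a space (the dash goes first inside [] so it is read literally)
parts = re.split(r"[-: ]", timestamp)
print(parts)

# \s+ matches one or more whitespace characters; maxsplit=1 stops after the first split
date_part, time_part = re.split(r"\s+", timestamp, maxsplit=1)
print("Date:", date_part, "| Time:", time_part)
```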


### The re.sub() function
Substitute matches with another string

In [None]:
sub_example = "I really like pizza 🍔 and hamburgers 🍔"
print("Before:", sub_example)
sub_burger = re.sub("🍔", "🍕", sub_example)
print("Replace 🍔 with 🍕:", sub_burger) # Hmm oops, don't replace ALL of them...
sub_burger_1 = re.sub("🍔", "🍕", sub_example, count=1) # pass count by keyword; passing it positionally is deprecated in newer Python versions
print("Replace just ONE occurrence of 🍔 with 🍕:", sub_burger_1)
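
`re.sub()` becomes more useful once the pattern is more than a fixed string. A minimal sketch with made-up example strings, combining a special sequence, a character set, and the `count` argument:

```python
import re

# Collapse any run of whitespace into a single space
messy_spacing = "too    many     spaces"
clean_spacing = re.sub(r"\s+", " ", messy_spacing)
print(clean_spacing)

# Replace every digit with #, then only the first four digits
card_text = "Card 1234 5678"  # made-up value for illustration
masked_all = re.sub(r"[0-9]", "#", card_text)
masked_first_four = re.sub(r"[0-9]", "#", card_text, count=4)
print(masked_all)
print(masked_first_four)
```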


### Tuples and RegEx - Activity

In [None]:
#1a - Create a tuple of the following numbers 1,2,3,4,5
number1to5 = 
print(number1to5)

#1b - Use this tuple as an iterable to show the squared terms instead. 
#Hmm, something isn't working... (hint it's a small review of 'for' loops) 
for value in number1to5:
    print(number1to5**2) #something isn't working right.. fix it!🔨



In [25]:
#1a 
number1to5 = (1,2,3,4,5)
print(number1to5)
#1b
for value in number1to5:
    print(value ** 2)

(1, 2, 3, 4, 5)
1
4
9
16
25


In [None]:
#2 - I'm trying to split the following tuple into two separate lists... but I want it sorted first. Help me out 🙂
year_value = ((2020,45),(1980,23),(2004,55),(1995,11))
###Add ONE line here 😀

###

year_list = []
value_list = []

for year, value in year_value:
    year_list.append(year)
    value_list.append(value)

print(year_list)
print(value_list)

# Note: Each year should still be paired with its original value after sorting, like this:
# [1980, 1995, 2004, 2020]
# [23, 11, 55, 45]
# Hint: sorted() creates a NEW object... might need to rename some stuff, or just overwrite the original...

In [5]:
#2
year_value = ((2020,45), (1980,23), (2004,55), (1995,11)) 
year_value = sorted(year_value)

year_list = []
value_list = []

for year, value in year_value:
    year_list.append(year)
    value_list.append(value)

print(year_list)
print(value_list)

[1980, 1995, 2004, 2020]
[23, 11, 55, 45]
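
For comparison, here is one alternative sketch that skips the loop: `zip(*...)` regroups the sorted pairs into one tuple of years and one tuple of values. This is not the intended one-line answer, just another way to reach the same lists.

```python
year_value = ((2020, 45), (1980, 23), (2004, 55), (1995, 11))

# sorted() orders the pairs by year; zip(*...) then regroups them column by column
years, values = zip(*sorted(year_value))

# zip produces tuples, so convert to lists to match the activity's output
year_list = list(years)
value_list = list(values)

print(year_list)
print(value_list)
```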


In [None]:
#3 - Oops, it was supposed to be Jan Smith... can you replace with RegEx? 
import re
messy_data = "<id value='jan.sith.$D2019-12-01$T10:45:00Z-85354-9'/>"



In [10]:
#3
import re
messy_data = "<id value='jan.sith.$D2019-12-01$T10:45:00Z-85354-9'/>"

# Escape the . with \ so it matches a literal period instead of "any character"
name_sub = re.sub(r'jan\.sith', 'Jan.Smith', messy_data)
print(name_sub)

<id value='Jan.Smith.$D2019-12-01$T10:45:00Z-85354-9'/>
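
One subtlety with patterns that contain a period: an unescaped `.` is a metacharacter that matches any single character. The sketch below shows the difference on a made-up `sample` string, and uses `re.escape()` to escape a literal string automatically.

```python
import re

sample = "jan.sith janXsith"  # made-up string for illustration

# Unescaped dot: matches "jan.sith" AND "janXsith"
loose_matches = re.findall(r"jan.sith", sample)
print(loose_matches)

# Escaped dot: matches only the literal period
strict_matches = re.findall(r"jan\.sith", sample)
print(strict_matches)

# re.escape() adds the backslashes for you when the text should be matched literally
escaped_matches = re.findall(re.escape("jan.sith"), sample)
print(escaped_matches)
```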


In [None]:
#4 - Can you also extract the date? 
messy_data = "<id value='jan.sith.$D2019-12-01$T10:45:00Z-85354-9'/>"
#Hint: Take it one step at a time! Notice how there is a $ on each side of the date? Try to split...
#Hint: after splitting, look at the list you made... just select the correct item such as split_messy_data[1]

split_messy_data = # First, complete the code to split

#Then select the correct item


#Optional: Remove the 'D' also 😀

In [24]:
#4
import re
messy_data = "<id value='jan.sith.$D2019-12-01$T10:45:00Z-85354-9'/>"

split_messy_data = re.split(r"\$", messy_data)

extract_date = split_messy_data[1][1:]
print(extract_date)


2019-12-01
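
As an alternative sketch, the date can also be pulled out in one step with `re.search()` and a capture group, assuming the date always has the form `YYYY-MM-DD` and sits between `$D` and the next `$`.

```python
import re

messy_data = "<id value='jan.sith.$D2019-12-01$T10:45:00Z-85354-9'/>"

# \$D and \$ match the literal markers around the date
# The parentheses capture just the part we want: 4 digits, 2 digits, 2 digits
date_match = re.search(r"\$D(\d{4}-\d{2}-\d{2})\$", messy_data)

if date_match is not None:
    extract_date = date_match.group(1)  # group(1) is the text inside the parentheses
    print(extract_date)
else:
    extract_date = None
    print("No date found")
```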


Copyright Benjamin J. Becerra v2024.02.14.0