
# Chapter 7 Strings

https://github.com/yongsa-nut/TU_Intro_Prog/

## Basic String Operations

* Many types of programs perform operations on strings

* Python provides many tools for examining and manipulating strings

* Strings are sequences, so many of the tools that work with sequences also work with strings

* Use the $\texttt{print()}$ function to display a string

* Create a string by enclosing text in single or double quotes, and use $\texttt{=}$ to assign it to a variable
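
A minimal sketch of creating and displaying strings (the variable names are illustrative):

```python
# A string literal can be written with double or single quotes.
greeting = "Hello"
name = 'Juliet'

# print() displays the strings; multiple arguments are separated by a space.
print(greeting, name)   # Hello Juliet
```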




## Accessing the Individual Characters in a String

* Use a $\texttt{for}$ loop to access each character in a string, one at a time
  * Format: $\texttt{for char in string:}$
  * Useful when you need to iterate over the whole string, such as to count <br> the occurrences of a specific character

In [None]:
name = "Juliet"

for char in name:
  print(char)

In [None]:
# counts the number of times the target letter appears in a string

TARGET = "T"

my_string = input("Enter a sentence: ")

count = 0
for ch in my_string:
  if ch == TARGET:
    count += 1

print(f"The letter {TARGET} appears {count} times.")

## Accessing the Individual Characters in a String

* Use **indexing** to access an individual character in a string:

  * Each character has an index specifying its position in the string, starting at 0

  * Format: $\texttt{character = my_string[i]}$

  * An $\texttt{IndexError}$ exception occurs if you try to access an index that <br> is out of range for the string




In [None]:
my_string = "Roses are red"
ch = my_string[6]
print(ch)

In [None]:
my_string[20] # index error

## $\texttt{len}$ to check string's length
* The $\texttt{len(string)}$ function returns the length of a string, that is, the number of characters it contains

In [None]:
my_string = "Roses are red"
print(len(my_string))
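
Because indexes start at 0, the last valid index is one less than the length. The sketch below (with an illustrative `index` value) uses $\texttt{len}$ to reach the last character and to guard against an out-of-range index:

```python
my_string = "Roses are red"

# Valid indexes run from 0 to len(my_string) - 1.
last_index = len(my_string) - 1
print(my_string[last_index])    # d

# Checking an index against the length avoids an IndexError.
index = 20
if index < len(my_string):
    print(my_string[index])
else:
    print(f"Index {index} is out of range for a string of length {len(my_string)}.")
```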

## String Concatenation $\texttt{+}$

* Concatenation: appending one string to the end of another string

* Use the $\texttt{+}$ operator to produce a string that is a combination of its operands

* The augmented assignment operator $\texttt{+=}$ can also be used to concatenate strings


In [None]:
# concat ex1
first_name = "Emily"
last_name  = "Yeager"
full_name = first_name + " " + last_name  # add a space between the names
print(full_name)

In [None]:
# concat ex2
letters = "abc"
letters += "def"
print(letters)

## Strings are immutable!

* Strings are immutable

* Once they are created, they cannot be changed

* Concatenation doesn’t actually change the existing string, but rather <br> creates a new string and assigns the new string to the previously used variable.

* You cannot use a statement of the form $\texttt{string[index] = new_character}$
  * A statement of this type raises a $\texttt{TypeError}$ exception

In [None]:
my_text = "this is a cat."
my_text[0] = "T"
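
A sketch of what happens with the assignment above, and one possible workaround: build a new string instead of changing the old one. The workaround uses slicing, which is covered in the next section; the name `new_text` is illustrative.

```python
my_text = "this is a cat."

# Item assignment is not supported on strings, so Python raises TypeError.
try:
    my_text[0] = "T"
except TypeError as error:
    print("Cannot modify a string in place:", error)

# Build a new string: the new first character plus the rest of the old string.
new_text = "T" + my_text[1:]
print(new_text)   # This is a cat.
print(my_text)    # this is a cat.  (the original is unchanged)
```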

## String Slicing
* **Slice**: a span of items taken from a sequence; a slice of a string is also known as a substring

* Slicing format: $\texttt{string[start : end : step]}$

  * The expression returns a new string containing the characters from start up to, but not including, end

  * If start is not specified, $\texttt{0}$ is used for start index

  * If end is not specified, $\texttt{len(string)}$ is used for end index
  
  * If step is not specified, $\texttt{1}$ is used for step value

  * Negative indexes count from the end of the string



In [None]:
# Slicing Example
full_name = "Patty Lynn Smith"

middle_name = full_name[6:10]
print(middle_name)

first_name = full_name[:5]
print(first_name)

last_name = full_name[11:]
print(last_name)

In [None]:
full_name = "Patty Lynn Smith"

last_name = full_name[-5:]
print(last_name)

my_string = full_name[:]
print(my_string)

In [None]:
letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

print(letters[0:26:2])

In [None]:
print(letters[::2])

In [None]:
print(letters[::-1])

## Searching with $\texttt{in}$ operator

* Use the $\texttt{in}$ operator to determine whether one string is contained in another string
* Syntax: $\texttt{string1 in string2}$
  * $\texttt{string1}$ and $\texttt{string2}$ can be string literals or variables referencing strings
  * The test is case-sensitive
* Similarly, use the $\texttt{not in}$ operator to determine whether <br> one string is not contained in another string


In [None]:
# in example
text = "Four score and seven years ago"

print("seven" in text)
print("four" in text)
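
The second result above is $\texttt{False}$ because the comparison is case-sensitive. The sketch below shows one way to ignore case (using $\texttt{lower()}$, which is covered later in this chapter) and an example of $\texttt{not in}$:

```python
text = "Four score and seven years ago"

print("four" in text)           # False: "four" and "Four" differ in case
print("four" in text.lower())   # True: search a lowercase copy instead
print("eight" not in text)      # True: "eight" does not appear in text
```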

## The repetition operator
* Repetition operator: makes multiple copies of a string and joins them together
* The $\texttt{*}$ symbol is a repetition operator when applied to a string.
* Syntax: $\texttt{string_to_copy * n}$
* The result is a new string that contains $\texttt{n}$ copies of the original string joined together

In [None]:
# repetition ex1:
my_string = 'w' * 5
print(my_string)
print("Hello" * 5)

In [None]:
# repetition ex2:

def main():
  # print nine rows increasing in length.
  for count in range(1, 10):
    print('Z' * count)

  # print eight rows decreasing in length.
  for count in range(8, 0, -1):
    print('Z' * count)

main()

## String Testing Methods
* Syntax: $\texttt{string.method(arguments)}$
* String methods to test a string for specific characters
  * $\texttt{isalnum()}$  : return $\texttt{True}$  if the string contains **only alphabetic letters or digits**. $\texttt{False}$ otherwise.
  
  * $\texttt{isalpha()}$  : return $\texttt{True}$ if the string contains **only alphabetic letters**. $\texttt{False}$ otherwise.
  
  * $\texttt{isdigit()}$  : return $\texttt{True}$ if the string contains **only digits**. $\texttt{False}$ otherwise.
  
  * $\texttt{islower()}$  : return $\texttt{True}$ if **all alphabetic letters in the string are lowercase** and the string contains at least one alphabetic letter. $\texttt{False}$ otherwise.
  
  * $\texttt{isspace()}$  : return $\texttt{True}$ if the string contains **only whitespace characters** (space, $\texttt{\n}$, $\texttt{\t}$). $\texttt{False}$ otherwise.
  
  * $\texttt{isupper()}$  : return $\texttt{True}$ if **all alphabetic letters in the string are uppercase** and the string contains at least one alphabetic letter. $\texttt{False}$ otherwise.
  
  * If the string is empty (length 0), every one of these methods returns $\texttt{False}$.



In [None]:
# String testing method:

def main():
  # Get a string from the user
  user_string = input("Enter a string: ")

  print("This is what I found about the string: ")

  # Test the string
  if user_string.isalnum():
    print("The string is alphanumeric.")
  if user_string.isalpha():
    print("The string contains only alphabetic characters.")
  if user_string.isdigit():
    print("The string contains only digits.")
  if user_string.isspace():
    print("The string contains only whitespace characters.")
  if user_string.islower():
    print("The letters in the string are all lowercase.")
  if user_string.isupper():
    print("The letters in the string are all uppercase.")

main()
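
A non-interactive sketch of the same tests, run over a few sample strings chosen for illustration. It also shows that $\texttt{islower()}$ looks only at the letters in a string:

```python
samples = ["abc", "abc123", "123", "ABC", "   ", ""]

for s in samples:
    # repr() shows the quotes, which makes whitespace and the empty string visible.
    print(repr(s), s.isalnum(), s.isalpha(), s.isdigit(), s.isspace())

# islower() and isupper() look only at the letters, so digits are ignored.
print("abc123".islower())   # True
print("123".islower())      # False: there are no letters to test
```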

In [None]:
dir(str)

## String Methods that return a new string

* $\texttt{lower()}$ : Return a copy of the string with all alphabetic letters **converted to lowercase**.
* $\texttt{upper()}$ : Return a copy of the string with all alphabetic letters **converted to uppercase**.

* $\texttt{lstrip()}$ : Return a copy of the string with all **leading whitespace characters** <br> (spaces, $\texttt{\n}$, $\texttt{\t}$) **removed**.
  * strip left

* $\texttt{lstrip(char)}$ : Return a copy of the string with **all instances of <br> char that appear at the beginning of string removed**.

* $\texttt{rstrip()}$ : Return a copy of the string with **all trailing <br> whitespace characters removed**.
  * strip right

* $\texttt{rstrip(char)}$ : Return a copy of the string with **all instances of<br> char that appear at the end of string removed**.

* $\texttt{strip()}$ : Return a copy of the string with **all leading and <br> trailing whitespace characters removed**.
  * strip both ends

* $\texttt{strip(char)}$ : Return a copy of the string with **all instances of <br>char that appear at the beginning and the end of string removed**.

In [None]:
# lower ex: the original string is unchanged
letters = "WXYZ"
print(letters, letters.lower(), letters)

In [None]:
letters = "WXYZ1234wyxz"
print(letters.lower())

In [None]:
# upper ex:
letters = "abcd"
print(letters.upper())

print("VvvvV".upper())

In [None]:
# strip ex1:
letters = "  middle  "

print(f"lstrip: {letters.lstrip()}." )
print(f"rstrip: {letters.rstrip()}.")
print(f"strip: {letters.strip()}.")

In [None]:
# strip with char:
letters = "mmmmidleeee"

print(f"lstrip: {letters.lstrip('m')}." )
print(f"rstrip: {letters.rstrip('e')}.")
print(f"strip: {letters.strip('m').strip('e')}.") # can chain methods
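
As an alternative to chaining, the argument to the strip methods may contain more than one character: Python treats it as a set of characters to remove, not as a prefix or suffix. A small sketch (the second string is illustrative):

```python
letters = "mmmmidleeee"

# One call strips both 'm' and 'e' from the two ends.
print(letters.strip("me"))        # idl

# Stripping stops at the first character that is not in the set,
# so matching characters in the middle are kept.
print("xxhixxhixx".strip("x"))    # hixxhi
```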

## String searching method

* Programs commonly need to search for substrings

* $\texttt{endswith(substring)}$: return $\texttt{True}$ if the string ends with substring. <br> $\texttt{False}$ otherwise.

* $\texttt{startswith(substring)}$: return $\texttt{True}$ if the string starts with substring. <br> $\texttt{False}$ otherwise.

In [None]:
# endswith ex:
filename = input("Enter the filename: ")
if filename.endswith(".txt"):
  print("This is the name of a text file.")
elif filename.endswith(".py"):
  print("This is the name of a Python source file.")
elif filename.endswith(".doc"):
  print("That is the name of a word processing document.")
else:
  print("Unknown file type.")

In [None]:
# startswith ex:
## Count emails from a specific person

email_headers = ["From: Marlin. Date: 10/09/2012",
                "From: Adam. Date: 10/10/2012",
                "From: Smith. Date: 12/09/2013",
                "From: Adam. Date: 08/20/2014"]

target = "From: Adam"

count = 0
for email in email_headers:
  if email.startswith(target):
    count += 1

print(f"The number of emails from Adam is {count}")

## String searching methods

* $\texttt{find(substring)}$ : search for $\texttt{substring}$ within the string
  * Returns the lowest index of the substring. <br> If the $\texttt{substring}$ is not in the string, return -1.
* $\texttt{replace(substring, new_string)}$ :
  * Returns a copy of the string where **every instance** of $\texttt{substring}$ <br> is replaced with $\texttt{new_string}$

In [None]:
# find ex:
string = "Four score and seven years ago."
position = string.find("seven")

if position != -1:
  print("Found at", position)
else:
  print("Not Found")

In [None]:
# replace ex
string = "Four score and seven years ago."
new_string = string.replace("years", "days")
print(new_string)
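
Two details worth seeing in action, sketched on the same sentence: $\texttt{find}$ returns -1 when there is no match, and $\texttt{replace}$ changes every occurrence while leaving the original string untouched.

```python
string = "Four score and seven years ago."

print(string.find("e"))        # 9: index of the first "e"
print(string.find("eight"))    # -1: "eight" is not in the string

# replace() substitutes every occurrence and returns a new string.
shouted = string.replace("e", "E")
print(shouted)                 # Four scorE and sEvEn yEars ago.
print(string)                  # Four score and seven years ago.
```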

## String searching methods summary

* $\texttt{endswith(substring)}$: return $\texttt{True}$ if the string ends with substring. <br> $\texttt{False}$ otherwise.

* $\texttt{startswith(substring)}$: return $\texttt{True}$ if the string starts with substring. <br> $\texttt{False}$ otherwise.

* $\texttt{find(substring)}$ : search for $\texttt{substring}$ within the string
  * Returns the lowest index of the substring. <br> If the $\texttt{substring}$ is not in the string, return -1.

* $\texttt{replace(substring, new_string)}$ :
  * Returns a copy of the string where **every instance** of $\texttt{substring}$ <br> is replaced with $\texttt{new_string}$

## Splitting String
* $\texttt{split(separator)}$ method: returns a list of new strings produced by <br> splitting the string at each occurrence of a separator.
  * By default (no argument), splits on whitespace: spaces, tabs, and newlines
  
  * Can specify a different separator by passing it as an argument to the $\texttt{split}$ method

In [None]:
# split ex1:
date_string = "11/26/2018"
date_list = date_string.split("/")
print(date_list)

print("Month: ", date_list[0])
print("Day: ", date_list[1])
print("Year: ", date_list[2])

In [None]:
# split ex2:

def main():
  # create a string with multiple words
  my_string = "One two three four"

  word_list = my_string.split()

  print(word_list)

main()
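
Calling $\texttt{split()}$ with no argument is not the same as calling $\texttt{split(" ")}$. A small sketch of the difference, using an illustrative string that mixes several kinds of whitespace:

```python
spaced = "One  two\tthree\nfour"   # two spaces, a tab, and a newline

# No argument: any run of whitespace counts as a single separator.
print(spaced.split())       # ['One', 'two', 'three', 'four']

# Explicit " ": only single spaces separate, so empty strings can appear.
print(spaced.split(" "))    # ['One', '', 'two\tthree\nfour']
```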

In [None]:
# split ex3: URL parsing

## recover parameters ID, COND, SESSION

my_url_arg = "ID=5123sdf431%20COND=5%20SESSION=2%20" # %20 = space in url

params =  my_url_arg.split("%20")
user_id = params[0].split("=")[1]  # avoid shadowing the built-in id()
cond = params[1].split("=")[1]
session = params[2].split("=")[1]

print(f"ID = {user_id}, Condition = {cond}, Session = {session}.")

## String join

* $\texttt{join(iterable)}$ method: takes an iterable of strings (such as a list or tuple) and <br> returns a new string created by joining its elements.
  * The string on which the method is called is placed between the elements as a separator.

In [None]:
# join ex1

num_list = ['1','2','3','4']
separator = ','
print(separator.join(num_list))

In [None]:
# join ex2:
texts = ('A','B','C')
between = '-'
print(between.join(texts))
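
Every element passed to $\texttt{join}$ must already be a string; a list of numbers raises a $\texttt{TypeError}$. The sketch below converts each number with $\texttt{str()}$ first and then shows that $\texttt{split}$ reverses the operation (the names are illustrative):

```python
numbers = [1, 2, 3, 4]

# join() requires strings, so convert each number with str() first.
text = ",".join(str(n) for n in numbers)
print(text)                 # 1,2,3,4

# split() reverses the operation, giving back a list of strings.
print(text.split(","))      # ['1', '2', '3', '4']
```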

## Summary
* A string is an immutable sequence of characters
* Indexing: accessing a single character by its position; a $\texttt{for}$ loop iterates over the characters
* Operators: $\texttt{+}$ (concat), $\texttt{*}$ (repetition), $\texttt{in}$ (searching)
* Slicing: $\texttt{[start:end:step]}$ (indexes can be negative)
* String checking methods:
  * $\texttt{isalnum}$, $\texttt{isalpha}$, $\texttt{isdigit}$, $\texttt{isupper}$, $\texttt{islower}$, $\texttt{isspace}$
* String editing methods:
  * $\texttt{upper}$, $\texttt{lower}$, $\texttt{lstrip}$, $\texttt{rstrip}$, $\texttt{strip}$
* String searching methods:
  * $\texttt{startswith}$, $\texttt{endswith}$, $\texttt{find}$, $\texttt{replace}$
* Splitting and joining strings (string to list and list to string)
  * $\texttt{split}$ and $\texttt{join}$