In [None]:
#The first cell is just to align our markdown tables to the left vs. center

In [1]:
%%html
<style>
table {float:left}
</style>

# Manipulating Strings
***
## Learning Objectives
In this lesson you will: 

1. Learn the fundamentals of processing text stored in string values
2. Apply various methods to strings
>- Note: This lesson concludes our Python fundamentals section of this course and the material for the Midterm
>- After this, we should have enough of the basic understanding of Python to start working on applied business analytics problems!


## Links to topics and functions:
>- <a id='Lists'></a>[String Literals](#String-Literals)
>- <a id='methods'></a>[String Methods](#String-Methods)


### References:
>- Sweigart(2015, pp. 123-143)
>- w3Schools: https://www.w3schools.com/python/python_strings.asp

#### Don't forget about the Python visualizer tool: http://pythontutor.com/visualize.html#mode=display

## Table of String Methods:
|Methods/Functions  |Description    |
|:-----------:      |:-------------|
|upper()            |Returns a new string with all letters in UPPER CASE|
|lower()            |Returns a new string with all letters in lower case|
|isupper()          |Checks whether the string has at least one letter and all of its letters are UPPER CASE|
|islower()          |Checks whether the string has at least one letter and all of its letters are lower case|
|isalpha()          |Checks whether the string consists only of letters and is not blank|
|isalnum()          |Checks whether the string consists only of letters and numbers and is not blank|
|isdecimal()        |Checks whether the string consists only of decimal digit characters and is not blank|
|isspace()          |Checks whether the string consists only of spaces, tabs, and new lines and is not blank|
|istitle()          |Checks whether every word in the string starts with an upper case letter followed by only lower case letters|
|startswith()       |Checks whether the string value begins with the string passed to the method|
|endswith()         |Checks whether the string value ends with the string passed to the method|
|join()             |Concatenates a list of strings into one string|
|split()            |"Unconcatenates" a string into a list of strings|
|rjust()            |Right justifies a string by padding it to a total width given as an integer|
|ljust()            |Left justifies a string by padding it to a total width given as an integer|
|center()           |Centers a string by padding it to a total width given as an integer|
|strip()            |Removes whitespace characters at the beginning and end of the string|
|rstrip()           |Removes whitespace from the right end of the string|
|lstrip()           |Removes whitespace from the left end of the string|

# String Literals
>- Basically, this is telling Python where a string begins and ends
>- We have already used single `'` and double `"` quotes, but what if we want to mix them? 


### Using double quotes
>- One wrong way and one correct way to define a string that contains an apostrophe

In [2]:
ralphie = 'Ralphie is CU's mascot'

SyntaxError: invalid syntax (<ipython-input-2-6eec046b96be>, line 1)

In [4]:
ralphie = "Ralphie is CU's mascot"

print(ralphie)

Ralphie is CU's mascot


#### Another way using escape characters

In [5]:
ralphie = 'Ralphie is CU\'s mascot'

print(ralphie)

Ralphie is CU's mascot


### Escape characters allow us to put characters in a string that would otherwise be impossible

#### Here are some common escape characters

|Escape Character   | Prints as     |
|:-----------:      |:----------:   |
|\\'                 |Single quote   |
|\\"                 |Double quote   |
|\t                 |Tab            |
|\n                 |New line       |
|\\\                 |Backslash      |
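##### A quick check of the backslash escape
The cells that follow cover quotes, new lines, and tabs, so here is a small sketch for the backslash row of the table. The `path` value is a made-up example, not something from the course files. Each escape sequence counts as one character in the string.

```python
# Hypothetical Windows-style path: each \\ in the source is ONE backslash in the string
path = "C:\\Users\\ralphie"

print(path)        # the doubled backslashes print as single ones
print(len(path))   # length counts each escaped backslash once

# Every escape sequence below is a single character
print(len("\t"), len("\n"), len("\\"))
```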

In [6]:
ralphieSaid = "\"I'm the best mascot in all of college sports\""

print(ralphieSaid)

"I'm the best mascot in all of college sports"


In [7]:
ralphie2 = "\"I'm the best mascot \n in all of college sports\""

print(ralphie2)

"I'm the best mascot 
 in all of college sports"


In [9]:
ralphie3 = "\"I'm the best mascot\n \t in all of college sports\""

print(ralphie3)

"I'm the best mascot
 	 in all of college sports"


### Multi-line Strings 
>- Use triple quotes
>- All text within triple quotes is considered part of the string
>- Triple-quoted strings are often used as multi-line comments or docstrings, although Python still treats them as ordinary string values

In [10]:
'''
* Here is some text that can
    carry over to multiple lines
    and Python treats this
    as one string. We can have
    single quotes ' and
    double quotes "" within the triple quotes
    and this is still all part of the
    same block.

'''

'\n* Here is some text that can\n    carry over to multiple lines\n    and Python treats this\n    as one string. We can have\n    single quotes \' and\n    double quotes "" within the triple quotes\n    and this is still all part of the\n    same block.\n\n'
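##### Triple quotes as a docstring and as a variable
A minimal sketch of two common uses of triple quotes: a docstring at the top of a function, and a multi-line value stored in a variable. The function `greet` and the variable `note` are hypothetical names used only for illustration.

```python
def greet():
    """Return a short greeting (this triple-quoted string is the docstring)."""
    return "Go Buffs!"

# A triple-quoted string stored in a variable keeps its line break as \n
note = '''Line one
Line two'''

print(greet.__doc__)       # the docstring is stored on the function
print(note.count('\n'))    # counts the newline characters inside note
```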

### Indexing and Slicing Strings
>- Recall how we used indexes and slicing with lists: `list[1]`, `list[0:3]`, etc
>- Also recall how we said strings are "list-like" 
>- We can think of a string as a list with each character having an index

#### Let's slice up some strings

In [11]:
ralphie = "Go Buffs!"

In [12]:
ralphie[3]

'B'

In [13]:
ralphie[2:5]

' Bu'

In [14]:
team = ralphie[-6:-1]

print(team)

Buffs
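##### Strings are list-like, but they cannot be changed in place
A short sketch of two details that the slicing cells above do not show: leaving out one side of a slice, and what happens when we try to assign to an index. The variable `lowerFirst` is a name introduced here for illustration.

```python
ralphie = "Go Buffs!"

print(len(ralphie))     # number of characters, including the space and '!'
print(ralphie[3:])      # from index 3 through the end of the string
print(ralphie[:2])      # from the start up to (but not including) index 2

# Unlike a list, a string does not allow item assignment
try:
    ralphie[0] = 'g'
except TypeError as err:
    print("TypeError:", err)

# Build a new string from slices instead
lowerFirst = 'g' + ralphie[1:]
print(lowerFirst)
```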


### How many times does each character appear in `ralphie`? 

In [15]:
charCount = {}

for char in ralphie:
    
    charCount.setdefault(char,0)
    
    charCount[char] = charCount[char] + 1
    
charCount

{'G': 1, 'o': 1, ' ': 1, 'B': 1, 'u': 1, 'f': 2, 's': 1, '!': 1}

#### How many times does 'f' appear in our `ralphie` variable?

In [16]:
charCount['f']

2

#### Recall: get a sorted count of characters from `charCount` 

In [17]:
sorted(charCount.items(), key=lambda x: x[1], reverse=True)

[('f', 2),
 ('G', 1),
 ('o', 1),
 (' ', 1),
 ('B', 1),
 ('u', 1),
 ('s', 1),
 ('!', 1)]
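##### Another way to count characters
The loop with `setdefault()` above is the approach used in this lesson. As an alternative sketch, the standard library's `collections.Counter` and the string method `count()` give the same counts with less code.

```python
from collections import Counter

ralphie = "Go Buffs!"

# Counter builds the character-count dictionary in one step
charCount = Counter(ralphie)

print(charCount['f'])             # count for one character
print(charCount.most_common(1))   # the single most frequent character and its count

# For a single character, the string method count() is enough
print(ralphie.count('f'))
```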

## String Methods

### upper(), lower(), isupper(), islower()

In [18]:
ralphie.upper()

'GO BUFFS!'

In [19]:
ralphie.lower()

'go buffs!'

##### Are all the letters uppercase?

In [22]:
ralphie.isupper()

False

##### The `isX` methods return `True` or `False`

##### Are all the letters lowercase?

In [21]:
ralphie.islower()

False

#### We can also call these methods directly on a string value

In [23]:
'HELLO'.isupper()

True

In [24]:
'hello'.islower()

True

In [25]:
'hello1234'.islower()

True

In [26]:
'123'.islower()

False
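##### Why `'hello1234'.islower()` is `True` but `'123'.islower()` is `False`
A small sketch of the rule behind the two results above: `islower()` and `isupper()` look only at the letters, and they need at least one letter to return `True`. Digits, spaces, and punctuation are ignored.

```python
# Digits and punctuation are ignored as long as at least one letter is present
print('hello1234'.islower())
print('hello world!'.islower())

# No letters at all, so there is nothing to be lower or upper case
print('1234'.islower())
print(''.isupper())

# One upper case letter is enough to make islower() fail
print('Hello'.islower())
```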

### `isalpha()`, `isalnum()`, `isdecimal()`, `isspace()`, `istitle()`

>- These can be useful for data validation

##### Does the string only contain letters with no space characters?

In [27]:
ralphie.isalpha()

False

##### Does the string only contain letters or numbers with no spaces?

In [28]:
ralphie.isalnum()

False

##### Does the string only contain decimal digits?

In [29]:
ralphie.isdecimal()

False

In [30]:
'ralphie123'.isalpha()

False

In [31]:
'ralphie123'.isalnum()

True

In [32]:
'12345'.isalnum()

True

In [33]:
'12345'.isdecimal()

True
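##### What `isdecimal()` does not accept
When using `isdecimal()` for data validation, it helps to know that it checks for digit characters only. The sample strings in this sketch are made up for illustration.

```python
# Only digit characters pass the check
print('22'.isdecimal())

# A minus sign, a decimal point, or a space is not a digit
print('-5'.isdecimal())
print('3.14'.isdecimal())
print(' 22'.isdecimal())

# A blank string has no digits at all
print(''.isdecimal())
```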

##### Does the string contain only words that start with a capital followed by lowercase letters?

In [34]:
ralphie.istitle()

True

#### Example showing how the `isX` methods are useful
>- Task: create a program that will ask a user for their age and print their age to the screen
>>- Create data validation for age requiring only numbers for the input
>>- If the user does not enter a number, ask them to enter one. 

In [36]:
while True:
    
    age = input("What is your age? ")
    
    if age.isdecimal():
        break
        
    else:
        print("Please enter a number for your age")
        

print(age)

What is your age? fve
Please enter a number for your age
What is your age? 22
22
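##### The same validation as a reusable function
The loop above needs a person typing at the keyboard. As a sketch, the check can be moved into a function so it can be tried on several inputs at once. `isValidAge` is a hypothetical helper name, and calling `strip()` first is an added assumption so that stray spaces around the number are tolerated.

```python
def isValidAge(text):
    """Hypothetical helper: True when text contains only decimal digits."""
    # strip() removes spaces around the entry before the digit check (an assumption)
    return text.strip().isdecimal()

# Try the check on a few made-up entries
for entry in ['fve', '22', ' 22 ', '-3', '']:
    print(repr(entry), isValidAge(entry))
```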


### `startswith()` and `endswith()` methods

##### Does the string start/end with a particular string?

In [37]:
ralphie.startswith('Go')

True

In [38]:
'Hello'.startswith('He')

True

In [39]:
ralphie.startswith('Buffs')

False

In [40]:
ralphie.endswith('!')

True
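##### Using `endswith()` to filter a list
A sketch of one practical use: picking out file names by extension. The file names are made up. Both `startswith()` and `endswith()` also accept a tuple of strings and return `True` if any one of them matches.

```python
# Hypothetical file names for illustration
files = ['report.csv', 'notes.txt', 'data.CSV', 'summary.xlsx']

# lower() makes the check ignore the case of the extension
csvFiles = [name for name in files if name.lower().endswith('.csv')]
print(csvFiles)

# A tuple checks several possible beginnings at once
print('Go Buffs!'.startswith(('Go', 'Sko')))
```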

### `join()` and `split()` methods

#### `join()`
>- Take a list of strings and concatenate them into one string
>- The join method is called on a string value and is usually passed a list value

In [41]:
cuLeeds = ['marketing', 'finance', 'management', 'analytics']

In [45]:
cuLeedsJoin = ', '.join(cuLeeds)

cuLeedsJoin # this is now one long string with all the items from the list

'marketing, finance, management, analytics'

In [47]:
' and '.join(cuLeeds)

'marketing and finance and management and analytics'
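##### `join()` only works on strings
A sketch of a common stumbling block: `join()` raises a `TypeError` if the list holds numbers. Converting each item with `str()` first avoids the error. The `salaries` list here is made-up sample data.

```python
salaries = [50000, 55000, 57000]   # integers, not strings

# Joining integers directly fails
try:
    ', '.join(salaries)
except TypeError as err:
    print("TypeError:", err)

# Convert each item to a string first
joined = ', '.join(str(s) for s in salaries)
print(joined)
```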

#### `split()`
>- Commonly used to split a multi-line string along the newline characters
>- The split method is called on a string value and returns a list of strings

In [48]:
deanLetter = '''
Dear Dean Matusik:
   
    We have been working really hard
    to learn Python this semester. 
    The skills we are learning in 
    the analytics program will
    translate into highly demanded
    jobs and higher salaries than 
    those without anlaytics skills. 

'''

#### Split `deanLetter` based on the line breaks
>- Will result in a list of all the string values based on line breaks

In [49]:
deanLetter.split('\n')

['',
 'Dear Dean Matusik:',
 '   ',
 '    We have been working really hard',
 '    to learn Python this semester. ',
 '    The skills we are learning in ',
 '    the analytics program will',
 '    translate into highly demanded',
 '    jobs and higher salaries than ',
 '    those without anlaytics skills. ',
 '',
 '']
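##### Cleaning up the pieces with `strip()`, `lstrip()`, and `rstrip()`
The list above still carries leading spaces, trailing spaces, and empty strings. A minimal sketch of cleaning such a list with `strip()` from the table of string methods, using a shortened stand-in for the letter (the `letter` text is an assumption, written with `\n` so the spacing is visible).

```python
# Shortened stand-in for a multi-line letter
letter = "\nDear Dean:\n    We have been working really hard\n    to learn Python this semester. \n\n"

# strip() each line and keep only the lines that are not blank afterwards
cleanLines = [line.strip() for line in letter.split('\n') if line.strip()]
print(cleanLines)

# lstrip() and rstrip() remove whitespace from one side only
padded = '  Go Buffs!  '
print(repr(padded.lstrip()))
print(repr(padded.rstrip()))
```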

##### Splitting on another character

In [51]:
deanLetter.split(':')

['\nDear Dean Matusik',
 '\n   \n    We have been working really hard\n    to learn Python this semester. \n    The skills we are learning in \n    the analytics program will\n    translate into highly demanded\n    jobs and higher salaries than \n    those without anlaytics skills. \n\n']

##### The default separator is any white space (new lines, spaces, tabs, etc)

In [52]:
deanLetter.split()

['Dear',
 'Dean',
 'Matusik:',
 'We',
 'have',
 'been',
 'working',
 'really',
 'hard',
 'to',
 'learn',
 'Python',
 'this',
 'semester.',
 'The',
 'skills',
 'we',
 'are',
 'learning',
 'in',
 'the',
 'analytics',
 'program',
 'will',
 'translate',
 'into',
 'highly',
 'demanded',
 'jobs',
 'and',
 'higher',
 'salaries',
 'than',
 'those',
 'without',
 'anlaytics',
 'skills.']

##### We can limit the number of splits by passing a second argument

In [53]:
deanLetter.split(' ',3)

['\nDear',
 'Dean',
 'Matusik:\n',
 '  \n    We have been working really hard\n    to learn Python this semester. \n    The skills we are learning in \n    the analytics program will\n    translate into highly demanded\n    jobs and higher salaries than \n    those without anlaytics skills. \n\n']
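##### Limiting splits to separate a label from its value
A sketch of where a split limit is handy: the separator may also appear inside the value, and we only want to cut at the first one. The `line` text is a made-up example.

```python
# The separator ': ' appears twice, but we only want to split on the first one
line = 'major: Business Analytics: Leeds'

label, value = line.split(': ', 1)
print(label)
print(value)

# Without the limit we would get three pieces instead of two
print(len(line.split(': ')))
```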

### Justifying Text with `rjust()`, `ljust()`, and `center()`
>- General syntax: `string.rjust(length, character)` where:
>>- length is required and represents the total width of the resulting string, including the original text
>>- character is optional and is the single character used to fill the extra space (a space by default)

In [54]:
'Hello'.rjust(10)

'     Hello'

##### We can insert another character for the spaces

In [55]:
'Hello'.rjust(10,'-')

'-----Hello'

In [56]:
'Hello'.ljust(10)

'Hello     '

##### Insert another character for spaces

In [57]:
'Hello'.ljust(10,'!')

'Hello!!!!!'

In [58]:
'Hello'.center(10)

'  Hello   '

In [59]:
'Hello'.center(20,'*')

'*******Hello********'
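##### What if the width is smaller than the string?
A short sketch of two details: the width is the total length of the result, and a width shorter than the text leaves the string unchanged rather than cutting it off. The zero-padding line is an extra illustration, not part of the lesson's examples.

```python
# The result is exactly as long as the requested width
print(len('Hello'.rjust(10)))

# A width shorter than the text returns the text unchanged
print('Hello'.rjust(3))
print('Hello'.ljust(3))

# Numbers must be converted to strings first; here padded with zeros
print(str(42).rjust(6, '0'))
```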

### Justifying Text Example
>- Task: write a function that accepts 3 parameters: itemsDict, leftWidth, rightWidth and prints a table for majors and salaries
>>- itemsDict will be a dictionary variable storing salaries (the values) for majors (the keys)
>>- leftWidth is an integer parameter that will get passed to the ljust() method to define the column width of majors
>>- rightWidth is an integer parameter that will get passed to the rjust() method to define the column width of salaries

In [70]:
def printSalary(itemsDict, leftWidth, rightWidth):
    
    print('Major'.ljust(leftWidth) + 'Salary'.rjust(rightWidth))   # Header is justified the same way as the rows

    print('-' * (leftWidth + rightWidth))      # Replicates '-' based on the total width

    for key, value in itemsDict.items():

        print(key.ljust(leftWidth,'.') + str(value).rjust(rightWidth))

In [67]:
salaries = {'Marketing':50000,'Accounting':55000,'Analytics':57000,'Management':60000}

printSalary(salaries,15,7)

Major           Salary
----------------------
Marketing......  50000
Accounting.....  55000
Analytics......  57000
Management.....  60000
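##### Letting the data choose the column widths
In the call above the widths 15 and 7 were picked by hand. As a sketch, the widths can be computed from the longest major and the longest salary. The padding of 2 extra characters and the `rows` variable are assumptions made for this illustration.

```python
salaries = {'Marketing':50000,'Accounting':55000,'Analytics':57000,'Management':60000}

# Longest key and longest value (as text), plus 2 characters of padding (an assumption)
leftWidth = max(len(major) for major in salaries) + 2
rightWidth = max(len(str(salary)) for salary in salaries.values()) + 2

# Build each row as a string so every row has the same total length
rows = [major.ljust(leftWidth, '.') + str(salary).rjust(rightWidth)
        for major, salary in salaries.items()]

for row in rows:
    print(row)
```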


### Some basic analytics on our salary table
>- How many total majors were analyzed? Name the variable `sampSize`
>- What was the average salary of all majors? Name the variable `avgSal`

In [69]:
sampSize = 0
sumSal = 0

for key in salaries:
    
    sampSize += 1
    
    sumSal += salaries[key]
    
avgSal = round(sumSal/sampSize,2)

Hi Boss, here is a summary of the results of the salary study:

>- Total majors: {{sampSize}}
>- Average Salary: ${{avgSal}}
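##### Printing the same summary from a code cell
The `{{sampSize}}` and `{{avgSal}}` placeholders above only display values when the notebook extension described next is enabled. A sketch of an alternative that works in any code cell, using `len()`, `sum()`, and an f-string; the `summary` variable and the comma formatting are choices made for this illustration.

```python
salaries = {'Marketing':50000,'Accounting':55000,'Analytics':57000,'Management':60000}

# len() and sum() give the same results as the counting loop
sampSize = len(salaries)
avgSal = round(sum(salaries.values()) / sampSize, 2)

# An f-string places the values directly into the text
summary = f"Total majors: {sampSize}\nAverage Salary: ${avgSal:,.2f}"
print(summary)
```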

#### Recall: To print results in a markdown cell you need to do the following:
Install some notebook extensions using the Anaconda shell (a new terminal on a Mac):
1. If you have installed Anaconda on your machine, then:
2. Search for "Anaconda Powershell Prompt"
3. Open the Anaconda Powershell Prompt and type the following commands
>- `pip install jupyter_contrib_nbextensions`
>- `jupyter contrib nbextension install --user`
>- `jupyter nbextension enable python-markdown/main`
4. After everything has installed on your machine, you will need to restart Anaconda and Jupyter

<a id='top'></a>[TopPage](#Teaching-Notes)