# Chapter 6: Manipulating Strings
Text is one of the most common forms of data your programs will handle. You already know how to concatenate two string values together with the + operator, but you can do much more than that. You can extract partial strings from string values, add or remove spacing, convert letters to lowercase or uppercase, and check that strings are formatted correctly. You can even write Python code to access the clipboard for copying and pasting text.

In this chapter, you’ll learn all this and more. Then you’ll work through two different programming projects: a simple clipboard that stores multiple strings of text and a program to automate the boring chore of formatting pieces of text.

## Working with Strings

Let’s look at some of the ways Python lets you write, print, and access strings in your code.

### String Literals

Typing string values in Python code is fairly straightforward: they begin and end with a single quote. But then how can you use a quote inside a string? Typing 'That is Alice's cat.' won’t work, because Python thinks the string ends after Alice, and the rest (s cat.') is invalid Python code. Fortunately, there are multiple ways to type strings.

### Double Quotes

Strings can begin and end with double quotes, just as they do with single quotes. One benefit of using double quotes is that the string can have a single quote character in it.

In [3]:
spam = "That's Alice's cat"
print(spam)

That's Alice's cat


Since the string begins with a double quote, Python knows that the single quote is part of the string and not marking the end of the string. However, if you need to use both single quotes and double quotes in the string, you’ll need to use escape characters.

### Escape Characters

An escape character lets you use characters that are otherwise impossible to put into a string. An escape character consists of a backslash (\) followed by the character you want to add to the string. (Despite consisting of two characters, it is commonly referred to as a singular escape character.) For example, the escape character for a single quote is \'. You can use this inside a string that begins and ends with single quotes.

In [4]:
spam = 'Say hi to Bob\'s mother'
print(spam)

Say hi to Bob's mother


Python knows that since the single quote in Bob\'s has a backslash, it is not a single quote meant to end the string value. The escape characters \' and \" let you put single quotes and double quotes inside your strings, respectively.

In [None]:
# \' Single quote
# \" Double quote
# \t Tab
# \n Newline (line break)
# \\ Backslash

In [5]:
print("Hello there!\nHow are you?\nI\'m doing fine.")

Hello there!
How are you?
I'm doing fine.
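
As a quick check of the point that an escape character counts as a single character even though you type two, the following sketch measures a few strings with len(). The variable names are only illustrative.

```python
# Each escape sequence is typed as two characters but stored as one.
newline = '\n'
tab = '\t'
quote = '\''
backslash = '\\'

print(len(newline))    # 1
print(len(tab))        # 1
print(len(quote))      # 1
print(len(backslash))  # 1

# 'a\nb' holds three characters: a, a newline, and b.
print(len('a\nb'))     # 3
```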


### Raw Strings
You can place an r before the beginning quotation mark of a string to make it a raw string. A raw string ignores all escape characters: Python treats every backslash as part of the string rather than as the start of an escape character, so the backslash is printed exactly as typed. Raw strings are helpful if you are typing string values that contain many backslashes, such as Windows file paths like r'C:\Users\Al\Desktop' or the regular expressions described in the next chapter.

In [6]:
print(r'That is Carol\'s cat.')

That is Carol\'s cat.
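
To make the difference concrete, here is a small sketch that writes the same Windows path both ways and compares the results. The names regular_path and raw_path are made up for this example.

```python
# A regular string needs doubled backslashes; a raw string does not.
regular_path = 'C:\\Users\\Al\\Desktop'
raw_path = r'C:\Users\Al\Desktop'

print(regular_path == raw_path)  # True

# In a raw string, \n stays as two characters: a backslash and the letter n.
print(len('\n'))   # 1
print(len(r'\n'))  # 2
```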


### Multiline Strings with Triple Quotes
While you can use the \n escape character to put a newline into a string, it is often easier to use multiline strings. A multiline string in Python begins and ends with either three single quotes or three double quotes. Any quotes, tabs, or newlines in between the “triple quotes” are considered part of the string. Python’s indentation rules for blocks do not apply to lines inside a multiline string.

In [7]:
print('''Dear Alice,

Eve's cat has been arrested for catnapping, cat burglary, and extortion.

Sincerely,
Bob''')

Dear Alice,

Eve's cat has been arrested for catnapping, cat burglary, and extortion.

Sincerely,
Bob
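
A multiline string is an ordinary string value; the line breaks you type simply become newline characters. The sketch below, which uses a shortened version of the letter and illustrative variable names, compares a triple-quoted string with the same text written using \n.

```python
# The same text written two ways: with triple quotes and with \n escapes.
multiline = '''Dear Alice,

Sincerely,
Bob'''
escaped = 'Dear Alice,\n\nSincerely,\nBob'

print(multiline == escaped)   # True
print(multiline.count('\n'))  # 3
```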


### Multiline Comments

While the hash character (#) marks the beginning of a comment for the rest of the line, a multiline string is often used for comments that span multiple lines. The following is perfectly valid Python code:

In [8]:
"""This is a test Python program.
Written by Al Sweigart al@inventwithpython.com

This program was designed for Python 3, not Python 2.
"""

def spam():
    """This is a multiline comment to help
    explain what the spam() function does."""
    print('Hello!')
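
One difference from a # comment is worth a brief illustration: a string placed first in a function body is kept by Python as that function’s documentation. The sketch below reads it back through the __doc__ attribute, which is not otherwise covered in this chapter.

```python
def spam():
    """This is a multiline comment to help
    explain what the spam() function does."""
    print('Hello!')

# The string at the top of the function body is stored on the function.
print(spam.__doc__)
```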

### Indexing and Slicing Strings

Strings use indexes and slices the same way lists do. You can think of the string 'Hello, world!' as a list and each character in the string as an item with a corresponding index.

'   H   e   l   l   o   ,       w   o   r   l    d    !   '
    0   1   2   3   4   5   6   7   8   9   10   11   12

The space and exclamation point are included in the character count, so 'Hello, world!' is 13 characters long, from H at index 0 to ! at index 12.

In [9]:
spam = 'Hello, world!'
print(spam[0])
print(spam[4])
print(spam[-1])
print(spam[0:5])
print(spam[:5])
print(spam[7:])

H
o
!
Hello
Hello
world!


Note that slicing a string does not modify the original string. You can capture a slice from one variable in a separate variable.

In [10]:
spam = 'Hello, world!'
fizz = spam[0:5]
print(fizz)

Hello
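
The following sketch confirms that the original string is untouched after slicing and that its length matches the index diagram above. The names first_word and last_word are only illustrative.

```python
spam = 'Hello, world!'

# Slicing builds a new string; spam itself is left untouched.
first_word = spam[:5]
last_word = spam[-6:]

print(first_word)  # Hello
print(last_word)   # world!
print(spam)        # Hello, world!

# The number of characters matches the index diagram: 0 through 12.
print(len(spam))   # 13
```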


### The in and not in Operators with Strings

The in and not in operators can be used with strings just like with list values. An expression with two strings joined using in or not in will evaluate to a Boolean True or False. These expressions test whether the first string (the exact string, case-sensitive) can be found within the second string.

In [12]:
print('Hello' in 'Hello, World')
print('HELLO' in 'Hello, World')
print('' in 'spam')
print('cats' not in 'cats and dogs')

True
False
True
False
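
Because these expressions evaluate to Boolean values, they fit naturally into conditions and return statements. Here is a minimal sketch; the function mentions_cats() is a hypothetical helper written for this example.

```python
def mentions_cats(sentence):
    """Return True if the exact, case-sensitive text 'cats' is in sentence."""
    return 'cats' in sentence

print(mentions_cats('cats and dogs'))  # True
print(mentions_cats('Cats and dogs'))  # False (capital C does not match)
print('fish' not in 'cats and dogs')   # True
```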


### Putting Strings Inside Other Strings (f-strings and interpolation)

Putting strings inside other strings is a common operation in programming. So far, we’ve been using the + operator and string concatenation to do this:

In [13]:
name = 'Al'
age = 4000
print('Hello, my name is ' + name + '. I am ' + str(age) + ' years old.')

Hello, my name is Al. I am 4000 years old.


However, this requires a lot of tedious typing. A simpler approach is to use string interpolation, in which each %s format specifier inside the string acts as a marker to be replaced by the values following the string. One benefit of string interpolation is that str() doesn’t have to be called to convert values to strings.

In [14]:
name = 'Al'
age = 4000
print('My name is %s. I am %s years old.' % (name, age)) # Note: no str() call is needed for age

My name is Al. I am 4000 years old.


Python 3.6 introduced f-strings, which are similar to string interpolation except that braces are used instead of %s, with the expressions placed directly inside the braces. Like raw strings, f-strings have an f prefix before the starting quotation mark.

In [16]:
name = 'Al'
age = 4000
print(f'My name is {name}. Next year I will be {age + 1}.') # don't forget the f prefix in front

My name is Al. Next year I will be 4001.
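
To compare the three approaches side by side, the sketch below builds the same sentence with concatenation, with %s interpolation, and with an f-string, and then shows what happens when the f prefix is left off. The variable names are only illustrative.

```python
name = 'Al'
age = 4000

by_concatenation = 'My name is ' + name + '. I am ' + str(age) + ' years old.'
by_interpolation = 'My name is %s. I am %s years old.' % (name, age)
by_f_string = f'My name is {name}. I am {age} years old.'

print(by_concatenation == by_interpolation == by_f_string)  # True
print(by_f_string)  # My name is Al. I am 4000 years old.

# Without the f prefix, the braces are ordinary characters in the string.
without_prefix = 'My name is {name}.'
print(without_prefix)  # My name is {name}.
```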


## Useful String Methods
Several string methods analyze strings or create transformed string values. This section describes the methods you’ll be using most often.

### The upper(), lower(), isupper(), and islower() Methods

The upper() and lower() string methods return a new string where all the letters in the original string have been converted to uppercase or lowercase, respectively. Nonletter characters in the string remain unchanged.

In [20]:
spam = 'Hello, world!'
spam = spam.upper()
print(spam)

spam = spam.lower()
print(spam)

HELLO, WORLD!
hello, world!


Note that these methods do not change the string itself but return new string values. If you want to change the original string, you have to call upper() or lower() on the string and then assign the new string to the variable where the original was stored. This is why you must use spam = spam.upper() to change the string in spam instead of simply spam.upper(). (This is just like if a variable eggs contains the value 10. Writing eggs + 3 does not change the value of eggs, but eggs = eggs + 3 does.)
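
A short sketch of this behavior: calling upper() without assigning the result leaves the variable unchanged, while assigning it back replaces the stored value.

```python
spam = 'Hello, world!'

spam.upper()         # The new string is created and then discarded.
print(spam)          # Hello, world!

spam = spam.upper()  # Assigning the result back replaces the old value.
print(spam)          # HELLO, WORLD!
```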

The upper() and lower() methods are helpful if you need to make a case-insensitive comparison. For example, the strings 'great' and 'GREat' are not equal to each other. But in the following small program, it does not matter whether the user types Great, GREAT, or grEAT, because the string is first converted to lowercase.

When you run this program, the question is displayed, and entering any variation on great, such as GREat, still produces the output I feel great too. Handling variations or mistakes in user input, such as inconsistent capitalization, makes your programs easier to use and less likely to fail.

In [21]:
print('How are you?')
feeling = input()
if feeling.lower() == 'great':
    print('I feel great too.')
else:
    print('I hope the rest of your day is good.')

How are you?
I feel great too.


The isupper() and islower() methods will return a Boolean True value if the string has at least one letter and all the letters are uppercase or lowercase, respectively. Otherwise, the method returns False.

In [None]:
spam = 'Hello world!'
print(spam.islower()) # Will return False
print(spam.isupper()) # Will return False

print('HELLO'.isupper()) # Will return True
print('abc12345'.islower()) # Will return True (all letters are lowercase)
print('12345'.isupper()) # Will return False (no letters present)


False
False
True
True
False


Since the upper() and lower() string methods themselves return strings, you can call string methods on those returned string values as well. Expressions that do this will look like a chain of method calls.

In [31]:
print('Hello'.upper())
print('Hello'.upper().lower())
print('Hello'.upper().lower().upper())
print('HELLO'.lower().islower())

HELLO
hello
HELLO
True


### The isX() Methods
Along with islower() and isupper(), there are several other string methods that have names beginning with the word is. These methods return a Boolean value that describes the nature of the string. Here are some common isX() string methods:

isalpha() Returns True if the string consists only of letters and isn’t blank

isalnum() Returns True if the string consists only of letters and numbers and is not blank

isdecimal() Returns True if the string consists only of numeric characters and is not blank

isspace() Returns True if the string consists only of spaces, tabs, and newlines and is not blank

istitle() Returns True if the string consists only of words that begin with an uppercase letter followed by only lowercase letters


In [33]:
print('hello'.isalpha()) # True because consists of only letters and isn't blank

print('hello123'.isalpha()) # False because is not only letters

print('hello123'.isalnum()) # True

print('hello'.isalnum()) # True because a string of only letters still counts as letters and numbers

print('123'.isdecimal()) # True

print('    '.isspace()) # True

print('This Is Title Case'.istitle()) # True

print('This Is Title Case 123'.istitle()) # True

print('This Is not Title Case'.istitle()) # False

print('This Is NOT Title Case Either'.istitle()) # False

True
False
True
True
True
True
True
True
False
False
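
A few edge cases are easy to get wrong, so here is a short sketch of them: a blank string fails every one of these checks, and characters such as spaces, minus signs, and decimal points are neither letters nor digits.

```python
print(''.isalpha())             # False (blank string)
print(''.isdecimal())           # False (blank string)
print('hello world'.isalpha())  # False (the space is not a letter)
print('-5'.isdecimal())         # False (the minus sign is not a digit)
print('3.14'.isdecimal())       # False (the period is not a digit)
print(' \t\n'.isspace())        # True (space, tab, and newline)
```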


### Validating User Input
The isX() string methods are helpful when you need to validate user input. For example, the following program repeatedly asks users for their age and a password until they provide valid input.

In [34]:
while True:
    print('Enter your age:')
    age = input()
    if age.isdecimal(): # If this is False, the loop repeats and asks again
        break
    print('Please enter a number for your age.')

while True:
    print('Select a new password (letters and numbers only):')
    password = input()
    if password.isalnum(): # If this is False, the loop repeats and asks again
        break
    print('Passwords can only have letters and numbers.')

Enter your age:
Please enter a number for your age.
Enter your age:
Select a new password (letters and numbers only):
Passwords can only have letters and numbers.
Select a new password (letters and numbers only):
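
The same checks can also be packaged as small functions, which makes them easy to try out without typing at a prompt. This is only a sketch; is_valid_age() and is_valid_password() are hypothetical helper names, not part of the program above.

```python
def is_valid_age(text):
    """Return True if text consists only of decimal digits."""
    return text.isdecimal()

def is_valid_password(text):
    """Return True if text consists only of letters and numbers."""
    return text.isalnum()

print(is_valid_age('forty two'))     # False
print(is_valid_age('42'))            # True
print(is_valid_password('secr3t!'))  # False
print(is_valid_password('secr3t'))   # True
```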


### The startswith() and endswith() Methods
The startswith() and endswith() methods return True if the string value they are called on begins or ends (respectively) with the string passed to the method; otherwise, they return False.

These methods are useful alternatives to the == equals operator if you need to check only whether the first or last part of the string, rather than the whole thing, is equal to another string.

In [36]:
print('Hello, world!'.startswith('Hello')) # True

print('Hello, world!'.endswith('world!')) # True

print('abc123'.startswith('abcdef')) # False

print('abc123'.endswith('12')) # False

print('Hello, world!'.startswith('Hello, world!')) # True

print('Hello, world!'.endswith('Hello, world!')) # True

True
True
False
False
True
True
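
One place these methods are handy is picking items out of a list by their beginning or ending. The sketch below keeps only the filenames that end with .txt; the filenames are invented for this example.

```python
filenames = ['notes.txt', 'photo.png', 'report.txt', 'draft_notes.doc']

# Collect only the names that end with the .txt extension.
text_files = []
for filename in filenames:
    if filename.endswith('.txt'):
        text_files.append(filename)

print(text_files)  # ['notes.txt', 'report.txt']

# startswith() checks only the beginning; == compares the whole string.
print('draft_notes.doc'.startswith('draft'))  # True
print('draft_notes.doc' == 'draft')           # False
```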
