# Strings

<font color="red">Strings</font> are collections of <font color="red">characters</font>.  
A character is any single letter, digit, or symbol: "a", "6", and "%" are all characters, for instance.

In [None]:
s = "Hello World"

In [None]:
# You can use either single or double quotes, but don't mix them!
s = "Hello World"
t = 'Hello World'

The plus sign adds two numbers together.  For strings, it <font color="red">concatenates</font> them, or sticks one on the end of the other.

In [None]:
a = "Hello"
b = "World"
c = a + b
print(c)

# or without variables:
print("Hello" + "World")

# Notice that there isn't a space between these words.
# Play along!  Why not?

In [None]:
# Play along!  Use concatenation (with or without variables) to create/print 5 strings.

In [None]:
# In addition to addition, you can use multiplication to add multiple "copies" of a string.

a = "yes "
b = 3*a
print(b)
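
As a small sketch of how the two operators combine, concatenation and repetition can be used together to underline a line of text. The names `greeting` and `underline` are made up for this illustration, and `len()` (covered further down) simply counts the characters.

```python
# Build a greeting with "+", adding the space by hand
greeting = "Hello" + " " + "World"

# Repeat a dash once for every character in the greeting
underline = "-" * len(greeting)

print(greeting)
print(underline)
```

Because the underline is built from the length of the greeting, it should stay the right width if you change the greeting.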

In [7]:
# ***
# There are a number of special characters in Python, and you have to indicate these in a special way.
#
# One way is to use a backslash before the character like this:
stanza = "I think that I shall never see\nA poem as lovely as a tree."
print(stanza)

# In this case the "\n" indicates a "newline character" and acts to "break" the string
# onto the next line.  Sometimes "\r" is used in addition to, or instead of, the newline character:
# https://en.wikipedia.org/wiki/Newline

I think that I shall never see
A poem as lovely as a tree.
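
The newline is only one of several backslash escapes. The sketch below shows two others, the tab (`\t`) and the literal backslash (`\\`); the variable names `row` and `path` are invented for the example.

```python
# "\t" is a tab and "\\" is a single literal backslash
row = "name\tvalue"
path = "C:\\temp"

print(row)
print(path)

# Each escape sequence counts as ONE character in the string
print(len("\n"), len("\t"), len("\\"))
```

Even though an escape takes two keystrokes to type, Python stores it as a single character.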


In [9]:
# Python strings can hold any Unicode character (UTF-8, https://en.wikipedia.org/wiki/UTF-8, is a
# common way of encoding them), making it possible to write characters in different languages,
# mathematical formulae, etc.

# This is the character for superscript 2 (like an exponent)
ss2 = u"\u00B2"
print(ss2)

²


In [11]:
# Notice that the embedding is similar.  A "u" precedes the string definition (in Python 3 this prefix
# is optional, since every string is already Unicode), and the backslash-u is used to "escape" to a
# Unicode code point written as 4 hexadecimal digits.  Here's how you could use it:
formula = u"2\u00B2 = 4"
print(formula)

2² = 4
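
A short sketch, assuming Python 3, of how the escape relates to the character's number. The built-in functions `ord()` and `chr()` are not used elsewhere in this notebook; they convert between a character and its Unicode code point.

```python
# In Python 3 the "u" prefix is optional: both spellings give the same string
with_prefix = u"\u00B2"
without_prefix = "\u00B2"
print(with_prefix == without_prefix)

# ord() gives the code point of a character, chr() goes the other way
code_point = ord(without_prefix)
print(hex(code_point))
print(chr(0x00B2))
```

The hexadecimal digits after `\u` are the same number that `ord()` reports.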


In [None]:
# When you need special characters, you can Google for them.  I found this one here:
# http://www.fileformat.info/info/unicode/char/b2/index.htm
# Which includes "Python Source Code" that you can copy and paste

# Play along!
# Define a python string that includes:
# "enye" the Spanish n with a tilde over it: https://en.wikipedia.org/wiki/%C3%91

In [23]:
# ***
# You can even put quotation marks in your string.  Here the string is wrapped in single quotes,
# so the double quotes inside it are treated as ordinary characters:
quote = 'John Snow says, "Winter is Coming."'
print(quote)

John Snow says, "Winter is Coming."
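
There is more than one way to get quotation marks into a string. The sketch below compares switching the outer quotes with escaping the inner ones; the sentences and variable names are invented for the example.

```python
# Single quotes outside, double quotes inside
outer_single = 'She said, "Hello."'

# Double quotes outside, with each inner double quote escaped by a backslash
escaped = "She said, \"Hello.\""

print(outer_single)
print(outer_single == escaped)

# Double quotes outside make an apostrophe easy to include
print("It's cold.")
```

Both spellings produce exactly the same string, so the choice is mostly about readability.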


## Built-in String Methods
Strings are actually <font color="red">objects</font>, and have <font color="red">methods</font> attached to them.  Methods are a lot like functions - they perform a specific task.

You can access these methods by using the period (or <font color="red">dot</font>).  Some methods take variables as input (parameters) but some methods don't.

Calling a method usually looks like this:
<B>object.method(variable1,variable2)</b>
<BR>or like this, if it doesn't require any information:
<B>object.method()</b>

In [None]:
# Here's an example:
a = "Hello"
a.swapcase()

In [None]:
# And two more, each wrapped in a print statement
print(a.lower())
print(a.upper())
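
The three methods above take no input. As a small sketch of the other form, `object.method(variable1, variable2)`, here are two string methods that do take parameters. `count` and `center` are standard string methods, chosen here only as illustrations.

```python
a = "Hello"

# count takes one parameter: the substring to look for
print(a.count("l"))

# center takes two: the total width, and the character to pad with
print(a.center(11, "*"))
```

The values inside the parentheses tell the method what to look for or how to do its work.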

In [None]:
# Play along!  In your own words, what did each of these three string methods do?

In [None]:
# Play along!  Use each of these three string methods on a string you define.

# Indexing

You can grab (or <font color="red">slice</font>) parts of a string by using its <font color="red">index</font>.

Like most languages, Python is <font color="red">zero-based</font>, meaning the first item in a list, array, etc., has the number ZERO, not ONE.

H E L L O
<p>0 1 2 3 4

H is the 0th character here, O is the 4th, etc.

In [None]:
# You can find out how many characters are in a string by using the len() function.

x = "cat"
len(x)

In [None]:
x = "supercalifragilisticexpialidocious"
len(x)

In [None]:
# Play along!  Define three strings, each a short sentence, and find out how long they are.

In [None]:
my_string = "The lazy dog jumped over the quick brown fox."
print(len(my_string))

## How to index

Use <font color="red">square brackets</font> to index into a string:

In [None]:
x = "Hello"

# Grab the 0th character
x[0]

In [None]:
# Grab the 4th character
x[4]

In [None]:
# Keep in mind that while indexes are zero-based, lengths are not!
# "Hello" has five characters:
len("Hello")

In [None]:
# But you can't grab the "5th" character, because it doesn't exist!
"Hello"[5]

In [None]:
# Note the error you're getting: "IndexError: string index out of range"
# This is an easy mistake to make, and you'll probably make it often.

# Play along!  Make three of these errors on purpose.
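
One way to stay inside the valid range is to work out the last index from the length. This is only a sketch; `word` and `last_index` are names made up for the example.

```python
word = "Hello"

# Indexes run from 0 up to len(word) - 1
last_index = len(word) - 1
print(last_index)
print(word[last_index])
```

Any index from 0 through `last_index` is safe; the next one up triggers the error.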

In [None]:
# Play along!
# Define a string of 10-15 characters, and grab specific characters using zero or positive integer
# indices (0, 5, 7, etc.)
# Be sure you know which letter you're trying to grab!  If I define:
a = "Billy_Goat."
# I might say, I want to grab the "y"
print(a[5])
# Oops, I missed!  I forgot Python is zero based.  Let me try again:
print(a[4])
# Much better!


In [None]:
# Negative numbers can also be used to index, they just reference from the "back" of the string
# Negative one will grab the last character, for example.

x = "Hello"
x[-1]

In [None]:
# -5 will grab the fifth character from the back.  Note that from this end, Python IS effectively
# ONE-BASED.
x[-5]
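
A quick sketch of how the two numbering schemes line up: a negative index `-k` points at the same character as the positive index `len(x) - k`.

```python
x = "Hello"

# -1 and len(x) - 1 are both the last character
print(x[-1], x[len(x) - 1])

# -5 and len(x) - 5 (which is 0) are both the first character
print(x[-5], x[len(x) - 5])
```

This is why the most negative valid index is `-len(x)`, while the largest positive one is `len(x) - 1`.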

In [None]:
# Play along!  Define a string of 10-15 characters, and use negative indices five
# times to grab characters based on their distance from the end of the string.

In [None]:
# You can also grab RANGES of data, using the COLON operator
x = "Pickle_Rick"
print(x[2:6])

# This grabs from the 2nd character ('c') UP TO BUT NOT INCLUDING the 6th character ('_')
# in other words, the 2nd, 3rd, 4th, and 5th characters.
#
# The UP TO BUT NOT INCLUDING part is tricky, and easy to forget.  That's why we practice!
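
One consequence of the "up to but not including" rule, sketched below: the length of a slice is STOP minus START, and two slices that share a boundary fit together with nothing lost or repeated.

```python
x = "Pickle_Rick"

piece = x[2:6]
print(piece)
print(len(piece))  # 6 - 2 = 4 characters

# The first slice stops at 6 and the second starts at 6, so nothing overlaps
print(x[0:6] + x[6:11])
```

If the stop index were included, the character at index 6 would show up twice in the last line.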

In [None]:
# Play along!  Define a longish string, and print out five SLICES that you take from it.

In [None]:
# You don't have to fill in both sides of the colon.  If you leave the last one off,
# it'll SLICE to the end.  If you leave the first one blank, it'll SLICE from the beginning.

# Play along!  Define a string, and slice from the beginning and end.
# What happens if you don't fill in either side of the colon?

In [None]:
# STEPS
# You can also define a STEP value, to grab every second character, or every third, and so on:

# As before:
# the index to the left of the first colon is the START and
# the index after the first colon is the STOP
# The third (after the second colon) is the STEP

a = 'abcdefghijklmnopqrstuvwxyz'
print(a[0:-1:2])

In [None]:
# Play along!  Can you modify this code to grab every third character?  Every fifth?

In [13]:
# If you just want to grab every nth character, you don't need to fill in the START or STOP, just
# the STEP.
#
# It's a handy shortcut!

a = 'abcdefghijklmnopqrstuvwxyz'
print(a[::5])

afkpuz
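
The START can be combined with a STEP while leaving the STOP blank. As a sketch, the two slices below split the alphabet into alternating letters; the names `evens` and `odds` refer to the index positions and are made up for the example.

```python
a = 'abcdefghijklmnopqrstuvwxyz'

evens = a[::2]   # start at index 0, take every second character
odds = a[1::2]   # start at index 1, take every second character

print(evens)
print(odds)
```

Changing only the START shifts which characters the STEP lands on.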


In [None]:
# What happens if your second value (the STOP) is too large?
a = 'Pickle_Rick'
print(a[0:999])

In [None]:
# Play along!  
# How is this behavior different than what we saw before?  
# What happens if you try to grab ONLY the 999th character?

In [None]:
# So far we've grabbed regular patterns, but what if we want some specific characters?
x = "The lazy dog jumped over the quick brown fox."
x[4,15,16]

In [None]:
# This doesn't work, but we can use a built in tool to do it.

# First, we IMPORT the tool, which is found in the OPERATOR module
from operator import itemgetter

# Then we can use the tool to grab what we want
x = "The lazy dog jumped over the quick brown fox."
itemgetter(4,15,16)(x)

# This doesn't need to make a lot of sense yet!  This is just good to have seen once, and I'm leaving
# it here for when you might need it later.
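
For comparison, here is a sketch of the same selection done with tools already covered: index each character separately and concatenate the results. Note that `itemgetter` hands back a tuple (a fixed group of values shown in parentheses), not a string.

```python
from operator import itemgetter

x = "The lazy dog jumped over the quick brown fox."

# itemgetter returns the three characters grouped in a tuple
picked = itemgetter(4, 15, 16)(x)
print(picked)

# Plain indexing plus concatenation gives a string instead
by_hand = x[4] + x[15] + x[16]
print(by_hand)
```

For two or three characters the by-hand version is often easier to read.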

## Strings are immutable

Unlike <font color="red">lists</font>, which we'll see in a moment, strings cannot be changed directly. 

In [None]:
a = "heblo"
# This won't work, for example:
a[2] = "l"
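
Since the character can't be replaced in place, one workaround is to build a NEW string from slices of the old one. This is a sketch of one possible approach; `fixed` is a name made up for the example.

```python
a = "heblo"

# Everything before index 2, then the new letter, then everything after index 2
fixed = a[:2] + "l" + a[3:]

print(fixed)
print(a)  # the original string is untouched
```

The original string never changes; the repair lives in the new variable.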

In [None]:
# As we've seen, you can use the DOT OPERATOR to transform a string
# 

a = 'hello'
print(a.capitalize())

# But notice that it doesn't change the original string:
print(a)

# To make this change "stick", you need to reassign back to the original (or new) variable
a = a.capitalize()
print(a)

In [None]:
# Play along!  
# First take a look at this page:
# https://docs.python.org/3.6/library/stdtypes.html#textseq
# 
# What google search terms could you use to find that page?  (Try it out, don't just guess!)

# Now, define a string with 10-15 characters.  Use three methods (see below)
# you haven't yet used to transform
# the string using syntax like: x= x.method() and print it out.  See the example:
a = "My feet are cold."
a = a.title()
print(a)

# In three codeblocks, demonstrate "replace", "capitalize", and "upper"

In [None]:
# Play along!

# Many of the methods here are useful for returning a boolean, testing whether something is true:

print("5".isdigit())
print("a".isdigit())


In [None]:
print(" ".isspace())
print("q".isspace())

In [None]:
# Play along!  Demonstrate 2 more "is" methods here

In [None]:
# Many methods are also useful to search for substrings:

pets = "cat, dog, rat, bear"

# One easy one is the IN operator
"bear" in pets

In [None]:
# You can check at the end, or the beginning:
pets.endswith("bear")

In [None]:
pets.startswith("dog")
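
The checks above answer yes or no. When you also need to know WHERE or HOW OFTEN something appears, the standard string methods `find` and `count` can help, as in this sketch:

```python
pets = "cat, dog, rat, bear"

# find returns the index where the substring starts, or -1 if it is missing
print(pets.find("dog"))
print(pets.find("fish"))

# count returns how many times the substring appears
print(pets.count(", "))
```

Remember that the index reported by `find` is zero-based, like every other index.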

In [None]:
# Sometimes user input can be unreliable, especially with capitalization.  See this example:

x = input("What is your quest?")
print("You said your quest was to: ",x)

if "grail" in x.lower():
    print("Enter!")
else:
    print("You are thrown off the bridge.")
    
# Play along!  Why is it helpful to have the "lower" method here?


In [None]:
# You can import the "string" module to get access to some useful strings.
# These predefined strings are examples of CONSTANTS, or variables that have been defined but will
# not change.  There are many constants already defined for you!

import string
all_the_digits = string.digits
print(all_the_digits)
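
The `string` module holds several other constants of the same kind. A brief sketch showing two of them, plus the IN operator used to test a character against one:

```python
import string

print(string.ascii_lowercase)
print(string.ascii_uppercase)

# A constant like string.digits is handy for membership tests
print("7" in string.digits)
print("x" in string.digits)
```

Using the constant saves you from typing out (and possibly mistyping) the full set of characters.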

In [None]:
# Play along!  What kinds of constants (for example "pi") might you need at some point?
# Name three.  (You don't need to find them.)

# Lists

<Font color="red">Lists</font> are ordered sequences of values.  These are often numbers, but can be characters, strings, booleans, or any other objects.  <font color="red">Square brackets</font> mark them, and individual values are separated by commas.

In [None]:
# List examples:
x = [5,1,9,10,12]
print(x)

x = ['Sam',4,"cat",True]
print(x)

In [None]:
# Play along!  Define five lists.

In [None]:
# Strings and lists are related!  A string is just a sequence of characters, after all.
p = "Thomas"
q = ['T','h','o','m','a','s']
print(p)
print(q)

In [None]:
# It's easy to turn a string into a list:
name = 'Thomas'
name_as_list = list(name)
print(name)
print(name_as_list)

In [None]:
# Play along!
# What is casting?  Look back in your notes, and write code to cast "5" as a float.

# How is what we just saw with changing a string to a list like casting?

In [None]:
# You can go the other way, too, but the syntax is odd.
# You take an empty string, and join the list you want to turn into a string
x = ['c','a','t']
print(x)
x = ''.join(x)
print(x)

In [None]:
# This may not be doing what you think.  Try this:
x = ['c','a','t']
x = 'x'.join(x)
print(x)

# Play along!
# In your own words, what did this do?

In [15]:
# ***
# One particularly handy method to turn strings into lists is the "split" method.
# 
# In this case, you give the method a character (or string) you'd like to use as a 
# DELIMITER, and it will cut there like scissors, with each piece of the string becoming an element
# in a list.
# 
# For those of you with programming experience, this is a string tokenizer.
# (A TOKEN is a piece of the list.)

a = "My name is Luka. I live on the second floor."
a_in_pieces = a.split(" ")  # Using the space character (" ") as a delimiter
print(a_in_pieces)

['My', 'name', 'is', 'Luka.', 'I', 'live', 'on', 'the', 'second', 'floor.']


In [16]:
# Notice that here I've put in two spaces after the period
# Play along!
# What happens when you split it?  How is it different than the previous example?
a = "My name is Luka.  I live on the second floor."
print(a.split(" "))

['My', 'name', 'is', 'Luka.', '', 'I', 'live', 'on', 'the', 'second', 'floor.']
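
If the empty piece is unwanted, one option is to call `split` with no delimiter at all. In that form it splits on any run of whitespace, as this sketch shows:

```python
a = "My name is Luka.  I live on the second floor."

on_single_space = a.split(" ")  # splits at EVERY space, leaving an empty piece
on_whitespace = a.split()       # treats a run of spaces as one separator

print(len(on_single_space))
print(len(on_whitespace))
print(on_whitespace)
```

Which form you want depends on whether the empty pieces carry meaning in your data.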


In [19]:
# You can use other characters, or even multiple characters to split:

groceries = 'Milk, eggs, butter, bread'
grocery_list = groceries.split(', ')  # Here I split on a comma FOLLOWED BY a space
print(grocery_list)

['Milk', 'eggs', 'butter', 'bread']
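
`split` and `join` undo each other when they use the same delimiter. A short sketch of the round trip, reusing the grocery example (`rebuilt` is a name made up here):

```python
groceries = 'Milk, eggs, butter, bread'

# String -> list
grocery_list = groceries.split(', ')

# List -> string, using the same delimiter to glue the pieces back together
rebuilt = ', '.join(grocery_list)

print(rebuilt)
print(rebuilt == groceries)
```

The string in front of `.join` is whatever gets placed BETWEEN the pieces.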


In [None]:
# We don't typically create lists manually, though.  We'll often use other functions to make 
# a big list of numbers all at once.  We'll use NUMPY to do that soon, but for now, we'll use 
# range
x = list(range(0,10))
print(x)

In [None]:
# Range works much like slicing, but instead of colons, you use commas.  
# range(START, STOP, STEP).  With no step, 1 is assumed.
x = list(range(5,12))
print(x)
y = list(range(3,21,3))
print(y)
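
To make the comparison with slicing concrete, the sketch below uses the same START, STOP, and STEP both ways. It also shows a negative STEP, which counts downward; that goes slightly beyond the examples above.

```python
numbers = list(range(10))

# Same START, STOP, STEP: once as a slice, once as a range
print(numbers[2:8:2])
print(list(range(2, 8, 2)))

# A negative STEP counts down; the STOP is still not included
countdown = list(range(10, 0, -2))
print(countdown)
```

In both forms the STOP value itself is left out.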

In [None]:
# Play along!  
# Use range to define 5 lists of numbers, and play with start, stop, and step.
# Be sure to check to see if they match your expectation!

In [None]:
# Play along!
# What is the output of this code?  Write your answer before you run the codeblock

x = list(range(5,6,11))
print(x)

In [None]:
# Lists are easy to modify.  We can change specific values:

x = [0,1,99,3,4]
print(x)
x[2] = 2
print(x)

In [None]:
# We can also APPEND and EXTEND lists

x = [0,2,4]
print(x)
x.append(6)
print(x)

In [None]:
# What happens here?
x = [0,2,4]
print(x)
x.append([6,8,10])
print(x)

# Play along!  
# What is the length of this list?
# 
# What happened?

In [None]:
# In this case, you want to use EXTEND, not APPEND:
x = [0,2,4]
print(x)
x.extend([6,8,10])
print(x)
print(len(x))

In [None]:
# Notice that the format for extending strings and lists is different:

# For a list:
my_list = list(range(10))
# Since you're operating on the list itself, you just extend it.
my_list.extend(list(range(10,20)))
print(my_list)

# But for a string, it's different.  Remember, strings are IMMUTABLE so you have to do
# something like this
my_string = 'Super'
my_string = my_string + ' Duper'
print(my_string)

In [None]:
# What happens if you try that on a list?

# Compare this:
my_list = list(range(10))
my_list.extend(list(range(10,20)))
print('Example 1:')
print(my_list)

# With this:
my_list = list(range(10))
my_list = my_list.extend(list(range(10,20)))
print('Example 2:')
print(my_list)

# Play along!
# What's different about these two sections?  
# Did it work?

In [None]:
# As with strings, you can use IN to test membership

x = [6,3,9,11]

print(3 in x)
print(77 in x)

# Equality and Objects

In [None]:
# For simple variables I can do something like this:

x = 5
y = x  # y now has the value of x (5)

# If I add five to x:
x = x + 5

# x is now 10, but y is still 5
print(x)
print(y)

In [None]:
# But watch what happens when I do the same to a list:

x = [4,9,10]
y = x

x.append(15)

print(x)
print(y)

# If things worked as they did with simple variables, x would have "15" on the end
# but y wouldn't.
#
# The reason is that mutable objects like lists behave differently than simple values like numbers.
# When you use "equals", it doesn't COPY the list, it POINTS TO the same list, so a change made
# through x also shows up through y.
# 
# It isn't a copy, it's a REFERENCE.  ("What I do to Trump, I also do to The President.")

In [None]:
# If you want to make a copy, you have to use the copy() METHOD:

x = [1,2,3]
y = x.copy()

x.append(8)

print(x)
print(y)
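
One way to check whether two names point at the same list is the `is` operator, which is not used elsewhere in this notebook. It tests identity (the very same object), while `==` only tests whether the contents match. A small sketch:

```python
x = [4, 9, 10]
y = x          # a second name for the SAME list
z = x.copy()   # a separate list with the same contents

print(y is x)
print(z is x)
print(z == x)

# A change made through x shows up in y, but not in z
x.append(15)
print(y)
print(z)
```

If `is` reports that two names share one object, expect a change made through either name to appear in both.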

In [None]:
# Play along!
# Using the 11 methods below, demonstrate how each works, complete with a description.
# You'll need to look these up! https://docs.python.org/3/tutorial/datastructures.html as a start.
# Each should be in its own codeblock.
# append, extend, insert, remove, pop, clear, index, count, sort, reverse, copy

In [None]:
# Play along!
# Answer these questions:
# What is the difference between extend and append?
# When would you want to use copy?  (Why can’t you just say: list_b = list_a?)
# What happens when you use index and the item you’re looking for isn’t there?  