---

# Python Part 1. Strings 

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/RandyRDavila/Data_Science_and_Machine_Learning_Spring_2022/blob/main/Lecture_1/Python_Part_1_Strings.ipynb)



Python strings are sequences of characters surrounded by double quotes (``` " " ```) or single quotes (``` ' ' ```). They are used in many contexts, such as writing *docstrings*, printing descriptive output, and naming the columns of tabular data (*DataFrames*). Strings are also often the first representation of data read into a program before being converted to numerical values (if needed). To illustrate how strings work, consider typing the following code into the cell below:

```python
# This is one way to define a string
first_name = "John"

# This is another way to define a string
last_name = 'Doe'

```

**Tip 1.** When writing code in a professional manner, try to stay consistent with how you define your strings.

**Tip 2.** This code is an example of **variable assignment**. Variable names must begin with a letter or an underscore (never a digit) and, by convention, are written in lower case. It is also standard in Python to use underscores to separate words in variable names.
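
The naming rule can be checked with the built-in string method `str.isidentifier()`. The sketch below uses made-up candidate names purely for illustration. Note that `isidentifier()` only checks syntax: it does not enforce the lower-case convention, and it also returns `True` for reserved words such as `class`.

```python
# Candidate names are illustrative only
candidates = ["first_name", "_hidden", "lastName", "2nd_name", "last-name"]

# str.isidentifier() reports whether a string is a syntactically valid name
validity = {}
for name in candidates:
    validity[name] = name.isidentifier()

for name, is_valid in validity.items():
    print(name, is_valid)
```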

---

In [1]:
# This is one way to define a string
first_name = "John"

# This is another way to define a string
last_name = 'Doe'

---

You can check the values of the variables ```first_name``` and ```last_name``` by printing them with the ```print()``` function:
```python
print(first_name)

print(last_name)
```

---

In [2]:
print(first_name)

print(last_name)

John
Doe


---

We can print a blank line after a value by passing the *newline character* ```"\n"``` to the ```print()``` function as an extra argument. Compare the following code with that above:
```python
print(first_name, "\n")
print(last_name)

```
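
`print()` separates its arguments with a space and then appends a newline of its own, so the code here prints `John`, a space, and then a blank line. A minimal sketch of some alternatives, using the `sep` and `end` keyword arguments of `print()` (the names are redefined so the block runs on its own):

```python
first_name = "John"
last_name = "Doe"

# Put the newline inside a single string: no trailing space, no blank line
two_lines = first_name + "\n" + last_name
print(two_lines)

# end replaces the newline that print() normally appends
print(first_name, end="\n\n")  # "John" followed by one blank line
print(last_name)

# sep replaces the space that print() inserts between arguments
print(first_name, last_name, sep="\n")
```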

---

In [3]:
print(first_name, "\n")
print(last_name)

John 

Doe


---

**String concatenation** is the process of joining two or more strings together. In Python this is done with the ```+``` operator. For example, try running this in the following code cell:
```python
first_name + last_name

```

**Note.** If you only want to view the value of the last expression in a cell, you do not have to call the ```print()``` function. This only works in the Python REPL and in Jupyter notebooks.

---

In [4]:
first_name + last_name

'JohnDoe'

---

We can easily add a space (which is itself a character) between the ```first_name``` and ```last_name``` variables by running:
```python
first_name + " " + last_name
```
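
The `+` operator only concatenates strings with other strings. The sketch below shows what happens when a number is mixed in, and uses `str.join()` as one alternative for gluing several pieces together. The `age` value and the middle initial are assumptions made for the example.

```python
first_name = "John"
age = 34  # an integer, not a string

# Mixing str and int with + raises a TypeError
try:
    label = first_name + age
except TypeError as error:
    label = None
    print("TypeError:", error)

# Convert the number to a string first
label = first_name + " is " + str(age)
print(label)

# str.join() places a separator between the strings of a list
joined = " ".join(["John", "Q.", "Doe"])
print(joined)
```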


---

In [5]:
first_name + " " + last_name

'John Doe'

---

Often we need to place the value of a variable into a string. This **string interpolation** can be done using *f-strings*: prefix the string with the letter ```f``` and place variables inside ```{}```. For example, try running the following code in the cell below:
```python
# Define a new variable age 
age = 34

# String interpolation
f"{first_name} {last_name} is {age} years old"
```
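
The braces of an f-string accept any expression, and an optional format specification after a colon controls how the value is rendered. A brief sketch (the `height_m` variable and its value are made up for illustration):

```python
first_name = "John"
age = 34
height_m = 1.8288  # hypothetical value

# Expressions are evaluated inside the braces
next_year = f"{first_name} will be {age + 1} next year"

# :.2f formats the number with two digits after the decimal point
height = f"{first_name} is {height_m:.2f} m tall"

# Method calls work inside the braces too
shout = f"{first_name.upper()}!"

print(next_year)
print(height)
print(shout)
```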

---

In [6]:
# Define a new variable age 
age = 34

# String interpolation
f"{first_name} {last_name} is {age} years old"

'John Doe is 34 years old'

---

**Everything in Python is an object and objects have attributes and methods associated with them.** When encountering a new Python object, it is good practice to look up the attributes and methods associated with it. For a list of all string methods please click [here](https://www.w3schools.com/python/python_ref_string.asp), or try out the following abbreviated list chosen at random:

1. ```capitalize()```:  Converts the first character to upper case
2. ```title()```:  Capitalizes the first letter of each word
3. ```count()```:  Returns the number of times a specified value occurs in a string
4. ```find()```:  Searches the string for a specified value and returns the position of where it was found

**The attributes and methods of an object are accessed with:** 
* ```object.attribute```
* ```object.method()``` 

For example, try running the following code in the cell below:

```python
test_string = "hello world"
test_string.title()

```
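
As a rough sketch, the four methods listed above can be tried side by side on one string. The `sample` string is an assumption chosen so that each method has something to do.

```python
sample = "hello world, hello python"

capitalized = sample.capitalize()     # only the first character is upper-cased
titled = sample.title()               # first letter of every word is upper-cased
hello_count = sample.count("hello")   # number of non-overlapping occurrences
python_at = sample.find("python")     # index where the first match starts
missing_at = sample.find("java")      # find() returns -1 when nothing matches

print(capitalized)
print(titled)
print(hello_count, python_at, missing_at)
```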


---

In [7]:
test_string = "hello world"
test_string.title()

'Hello World'

---

Notice that the first letter of each word in the output is capitalized. This output was *returned* by the method ```title()```; the variable itself was not altered. In particular, it is important to note that **strings are immutable**. This can be verified by running the following code in the cell below:
```python
print(test_string)
```
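
Immutability can also be seen more directly. The sketch below tries to overwrite a single character, which Python rejects, and then keeps the changed version by binding the returned string to a new name (`title_string` is a name invented for this example).

```python
test_string = "hello world"

# Assigning to a position inside a string is not allowed
try:
    test_string[0] = "H"
    mutated = True
except TypeError as error:
    mutated = False
    print("TypeError:", error)

# To keep a modified version, store the string that the method returns
title_string = test_string.title()

print(test_string)   # the original is unchanged
print(title_string)
```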



---

In [8]:
print(test_string)

hello world


---

Next, let's try the ```count()``` string method. Run the following code in the cell below:
```python
test_string.count()
```


---

In [9]:
test_string.count()

TypeError: count() takes at least 1 argument (0 given)

---
Running the code above should produce the following message:

---

```python
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-32-a6597d389766> in <module>
----> 1 test_string.count()

TypeError: count() takes at least 1 argument (0 given)

```

---


**Tip.** Whenever you encounter an error, read the last line of the error output first; it usually gives the quickest diagnosis of what is going wrong.

In our case we see the line ```TypeError: count() takes at least 1 argument (0 given)```. This tells us that the ```count()``` string method takes at least one argument. We can get help with this problem by calling the ```help()``` function built into base Python. Run the following code in the cell below:
```python
help(test_string.count)
```


---

In [10]:
help(test_string.count)

Help on built-in function count:

count(...) method of builtins.str instance
    S.count(sub[, start[, end]]) -> int
    
    Return the number of non-overlapping occurrences of substring sub in
    string S[start:end].  Optional arguments start and end are
    interpreted as in slice notation.



---

According to the help text, the ```count()``` string method requires one substring argument and returns the number of non-overlapping occurrences of this substring. For example, try running the following code in the cell below:
```python
print(test_string, "\n")
test_string.count("l")
```
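
The help text also mentions the optional `start` and `end` arguments and the phrase "non-overlapping". A small sketch of both, under the assumption that `test_string` still holds `"hello world"`:

```python
test_string = "hello world"

total = test_string.count("l")              # search the whole string
in_first_word = test_string.count("l", 0, 5)  # search only test_string[0:5], i.e. "hello"
after_first_word = test_string.count("l", 5)  # search from index 5 to the end

# Matches may not overlap: "aaaa" contains "aa" twice, not three times
non_overlapping = "aaaa".count("aa")

print(total, in_first_word, after_first_word, non_overlapping)
```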

---

In [11]:
print(test_string, "\n")
test_string.count("l")

hello world 



3

---

There are many more string methods, which I hope you explore on your own since we do not have time to cover them all here. Before moving to the next string topic, try running one of my favorite string methods in the cell below:

```python
txt = "I like bananas"
x = txt.replace("bananas", "apples")
print(x)
```
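
By default `replace()` substitutes every occurrence, and an optional third argument limits how many are replaced. A minimal sketch with a longer, made-up sentence:

```python
txt = "I like bananas, bananas, and more bananas"

# Replace every occurrence
all_replaced = txt.replace("bananas", "apples")

# Replace only the first occurrence
first_only = txt.replace("bananas", "apples", 1)

print(all_replaced)
print(first_only)
print(txt)  # the original string is unchanged
```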


---

In [12]:
txt = "I like bananas"
x = txt.replace("bananas", "apples")
print(x)

I like apples


---

## Strings are Sequences of Characters

Next, recall that **strings are sequences of characters** and hence must have a length! We can find out how many characters are in a string by calling the ```len()``` function. For example, try:
```python

full_name = first_name + " " + last_name 
print(f"{full_name} contains {len(full_name)} characters")
```
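
Note that `len()` counts every character, including spaces, and that an escape sequence such as `"\n"` is a single character. A quick sketch with a few illustrative strings:

```python
full_name = "John Doe"

name_length = len(full_name)      # the space between the names is counted
empty_length = len("")            # the empty string has length 0
newline_length = len("a\nb")      # "\n" counts as one character

print(name_length, empty_length, newline_length)
```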


---

In [13]:
full_name = first_name + " " + last_name 
print(f"{full_name} contains {len(full_name)} characters")

John Doe contains 8 characters


---

## String Indexing

Since sequences are *ordered* and since strings are sequences of characters, it is natural to assume that each character in a string has a *location* specified by an index. In Python (and most other programming languages) indexing starts at 0. For example, try running the following code in the cell below:
```python
test_string = "abcdefg"
print(f"The character at the 0 index of {test_string} is {test_string[0]} \n")
print(f"The character at the 1 index of {test_string} is {test_string[1]} \n")
print(f"The character at the 2 index of {test_string} is {test_string[2]} \n")
```
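
Because indexing starts at 0, the last valid index is one less than the length of the string. The sketch below shows the `IndexError` raised by going one step too far, and uses the built-in `enumerate()` to list every index together with its character.

```python
test_string = "abcdefg"

# The last valid index is len(test_string) - 1
last_index = len(test_string) - 1
last_character = test_string[last_index]

# Indexing at len(test_string) is out of range
try:
    test_string[len(test_string)]
    in_range = True
except IndexError as error:
    in_range = False
    print("IndexError:", error)

# enumerate() pairs each index with its character
for index, character in enumerate(test_string):
    print(index, character)
```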


---

In [14]:
test_string = "abcdefg"
print(f"The character at the 0 index of {test_string} is {test_string[0]} \n")
print(f"The character at the 1 index of {test_string} is {test_string[1]} \n")
print(f"The character at the 2 index of {test_string} is {test_string[2]} \n")

The character at the 0 index of abcdefg is a 

The character at the 1 index of abcdefg is b 

The character at the 2 index of abcdefg is c 



---

**We can also index from the end of a string using negative indices**:

```python
test_string = "abcdefg"
print(f"The character at the -1 index of {test_string} is {test_string[-1]} \n")
print(f"The character at the -2 index of {test_string} is {test_string[-2]} \n")
print(f"The character at the -3 index of {test_string} is {test_string[-3]} \n")
```


What did you notice?
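
One way to check the pattern: a negative index `-k` refers to the same character as the index `len(test_string) - k`. A short sketch that compares the two for every valid `k`:

```python
test_string = "abcdefg"
n = len(test_string)

# Collect (character at -k, character at n - k) for k = 1, ..., n
pairs = []
for k in range(1, n + 1):
    pairs.append((test_string[-k], test_string[n - k]))

all_match = all(a == b for a, b in pairs)

print(pairs)
print(all_match)
```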

---

In [15]:
test_string = "abcdefg"
print(f"The character at the -1 index of {test_string} is {test_string[-1]} \n")
print(f"The character at the -2 index of {test_string} is {test_string[-2]} \n")
print(f"The character at the -3 index of {test_string} is {test_string[-3]} \n")

The character at the -1 index of abcdefg is g 

The character at the -2 index of abcdefg is f 

The character at the -3 index of abcdefg is e 



---

## String Slicing


Any Python sequence that you can index, such as a string, you can also *slice*. For example, try running the following code in the cell below:
```python
print(test_string[1])
print(test_string[2])
print(test_string[3])
print(test_string[1:4])
```
**Note.** The syntax ```object[a:b]``` selects every entry of ```object``` starting at index ```a``` and ending at index ```b-1```.
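
Either bound of a slice can be omitted, and an optional third value sets the step. A brief sketch of these variations, assuming `test_string` is still `"abcdefg"`:

```python
test_string = "abcdefg"

first_three = test_string[:3]        # from the start up to index 2
from_three = test_string[3:]         # from index 3 to the end
last_two = test_string[-2:]          # negative indices work in slices too
every_other = test_string[::2]       # step of 2
reversed_string = test_string[::-1]  # step of -1 walks backwards

print(first_three, from_three, last_two, every_other, reversed_string)
```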

---

In [16]:
print(test_string[1])

b
