# CSS 201.5 - CSS Bootcamp

## Python Programming

## Strings in Python

## What is a string?

> A **string** is a *sequence* of characters. It belongs to the `str` type in Python.

A string stores characters as text, and is created using either single (`''`) or double (`""`) quotes.

Note that although strings are often used to store *words*, this isn't necessarily the case. A string could be:

In [None]:
"dog\tand\tcat"

In [None]:
"abcdef"

In [None]:
"1 + 4"

There are many more possibilities. Basically, *any* character that you wrap in quotes becomes part of a `str` in Python.

### Multi-line strings

Multi-line strings can be defined using `""" """`, as below.

In [None]:
long_str = """
This string spans multiple lines.
    This is the second line.
\"This is the third line.\"
Umberto's
{} and {}.
"""
print(long_str.format('{}','house'))

### Side note: a `str` is a kind of sequence

> A **sequence** is a collection of items (e.g., numbers, characters, etc.) with some *determined order*.  

A `list` and `str` are both kinds of sequences. 

We'll discuss **sequences** more when we talk about `list`s, but there are a couple of important properties to remember:

- Sequences have a particular *order*.  
- You can **index** into a sequence to obtain the item at a particular position.  

### Checking whether something is a `str`

Recall that you can check the **type** of a variable using `type`.

In [None]:
type("This is a sentence.")

In [None]:
type("1 + 4")

In [None]:
type(1 + 4)

### Check-in

Which of the following variables would evaluate to a `str`?

In [None]:
x1 = 1.5
x2 = True
x3 = "2 * 100"

## Why care about strings?

**Strings** are incredibly useful and versatile, so it's important to understand how they work and how to manipulate them.

Common uses of strings:

- Pretty much all text data is stored as a `str` (e.g., a text corpus, a word, etc.).  
- Storing information that can't be represented as an `int` or `bool`, such as a **password**.  
- Declaring **features** of an object in Python that can't be represented as `int` or `bool`. 
- Representing a **filename**.

Strings are so useful that virtually all programming languages have something like a `str` type.

## Working with strings: basic operations

Today, we're going to focus on a few **basic operations** we can use with strings. In a future lecture, we'll talk about more complex operations.

The basic operations include:

1. Getting the length (`len`) of a string.  
2. Indexing into a string (`string_name[0]`).  
3. Looping through a string (`for ch in string_name...`).  

You'll note that each of these operations can also be applied to a `list` type!

### Calculating string length with `len`

> The `len` function calculates the number of characters in a `str` (or the number of items in a `list`).  

In [None]:
x1 = "CSS 201 jhdfkjahsdjfh kfjdhsfkjhasdkjhf. fkjdhsaklfjasdf"
print(len(x1))

In [None]:
x2 = "class"
print(len(x2))

#### Check-in

How many characters are in the string `"2 + 2"`?

Try answering before you try typing in the expression.

#### Spaces count as characters!

An empty space (`" "`) counts as a character in Python.

Thus, the `str` `"big dog"` has one more character than the `str` `"bigdog"`. 

In [None]:
len("big dog")

In [None]:
len("bigdog")

#### Check-in

How many characters are in the `str` below?

In [None]:
str_test = "Computational Social Science is fun."

#### Putting quotes into a string

Certain characters, like quotes, require an **escape** character (the backslash, `\`) if you want to put them into a string. Otherwise they'll simply *end* the string.

In [None]:
quote_str = "Then he said, \"I love CSS!\""
print(quote_str)
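
As a small sketch of an alternative (the variable names below are made up for illustration): since a string can be created with either single or double quotes, wrapping the text in *single* quotes lets you use double quotes inside it without escaping. Both versions should produce the same string.

```python
# Option 1: escape the inner double quotes with a backslash.
escaped = "Then he said, \"I love CSS!\""

# Option 2: wrap the string in single quotes, so the inner
# double quotes don't end the string.
single_wrapped = 'Then he said, "I love CSS!"'

print(single_wrapped)
print(escaped == single_wrapped)  # True: same characters either way
```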

### Indexing into a `str`

> In programming, **indexing** into a sequence means retrieving the item at a particular position.

Because a `str` is a kind of sequence, we can retrieve the character at a particular position.

We can index into a `str` (or `list`) using the `string_name[...]` notation, where `...` would be replaced with the **index** of the character we want to retrieve.

In [None]:
test_var = "computer"
test_var[0]

#### Note on indexing

Python uses **zero-indexing**: the first element in a sequence is assigned the index `0`, the second is assigned `1`, and so on.

- This can be hard to get used to at first!  
- But over time, it'll start to seem more natural.  

#### Check-in

Which of the indexing operations below would return the letter `"S"`?

In [None]:
s = "CSS"
x1 = s[0]
x2 = s[1]
x3 = s[2]

#### Check-in

Why does the code below return an **error**?

In [None]:
s = "CSS"
s[4]

### Slicing into a `str`

> **Slicing** is like indexing, but allows you to return a *subset* within a sequence.

For example, rather than getting the *n-th* character of a `str`, you can return the characters between index `0` and index `2`.

- To **slice**, use the syntax `[start_index:end_index]`.  
- `start_index` is the index of the first character you want to return.  
- `end_index` is the index of the final character you want to return, plus one.
   - Like `range`, the final index is not "inclusive".  

In [None]:
s = "programming"
s[0:4]
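
One way to check your understanding of the "plus one" rule: because `end_index` is not included, the number of characters a slice returns is `end_index - start_index` (as long as both indices fall within the string). The sketch below uses illustrative variable names that aren't part of the lecture code.

```python
s = "programming"

# Illustrative indices: start at 0, stop *before* index 4.
start_index, end_index = 0, 4
subset = s[start_index:end_index]

print(subset)       # prog (the characters at indices 0, 1, 2, and 3)
print(len(subset))  # 4, i.e., end_index - start_index
```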

#### Check-in

How many characters would the following **slice** return? *Which* characters would they be?

In [None]:
s = "programming"
subset = s[5:7] ## how many characters is this?

#### Check-in

Write a **slice** operation to return the `str` `"humid"` within the string `"dehumidify"`.

In [None]:
original_str = "dehumidify"
### Your code here

### Looping through strings

> **Looping** through a `str` means repeating some piece of code for each character (or a subset of the characters) within a string.

We've already discussed [loops in previous lectures](06-loops), so this will be a brief review:

- A `for` loop **iterates** through each item in a sequence (like a `str`), repeating some piece of code.  
- A `while` loop **continues** as long as some condition is met, and can also be used to iterate through a sequence.

#### Looping with a `for` loop

In [None]:
seq = "CSS"
for i in seq:
    print(i)

#### Looping with a `while` loop

In [None]:
i = 0
seq = "CSS"
while i < len(seq):
    print(seq[i])
    i += 1

## Modifying case

Often, you'll need to modify the **case** of a `str` (i.e., make it either *upper* or *lower* case). 

- One use-case for this is needing to *compare* two strings, but not caring about whether they have identical case. 
- E.g., "APplE" is the same *word* as "apple", but these strings wouldn't evaluate as equal.

In [None]:
"appLe" == "apple"

In [None]:
"apple" == "apple"

In [None]:
'2 * 2' == '4'

### `upper` and `lower`

As the names imply, `upper` and `lower` are both *functions* that you can use on a `str`.  

In [None]:
"APPLE".lower()

In [None]:
"apple".upper()

In [None]:
"APPLE".lower() == "apple"

### `title`

The `title` function is a variant of `upper`/`lower`, which just capitalizes the *first* letter of each word.

In [None]:
og_string = "my name is umberto"
og_string.title()

Note that if you have capital letters *after* the first letter of a word, these will now become lowercase!

In [None]:
og_string = "DNA"
og_string.title()

### Evaluating case

Just as you can **modify** the case of these strings, you can also evaluate it:

- `isupper()` 
- `islower()` 
- `istitle()`

These functions all check whether a string conforms to those patterns.

In [None]:
"CSS".isupper()

In [None]:
"CSS".islower()

In [None]:
"I Love Programming".istitle()

### Check-in

If you called `istitle()` on the following string, would it evaluate to `True` or `False`?

In [None]:
test_str = "I love CSS"
### Your answer/code here

### Other helpful evaluation methods

There are a few other helpful methods for **evaluating** properties of a string:

- `isdigit`: checks if the characters are entirely digits (e.g., $0, 1, ..., 9$)  
- `isalpha`: checks if the characters are entirely alphabetic characters (e.g., `abcd...`). 
- `isspace`: checks if the characters are entirely whitespace (e.g., spaces, tabs, or newlines). 
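
Here is a quick sketch of these three methods in action. The example strings are made up for illustration; note that a single character of the "wrong" kind is enough to make the check return `False`.

```python
# Hypothetical example strings, chosen only for illustration.
all_digits = "2023".isdigit()       # every character is a digit
has_decimal = "20.5".isdigit()      # the "." is not a digit
all_letters = "apple".isalpha()     # every character is a letter
has_space = "apple pie".isalpha()   # the space is not a letter
only_spaces = "   ".isspace()       # every character is whitespace

print(all_digits, has_decimal)  # True False
print(all_letters, has_space)   # True False
print(only_spaces)              # True
```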

## Replacing characters

Another common operation is [**replacing** elements of a string](https://www.w3schools.com/python/ref_string_replace.asp). 

Examples:

- In a `list` of filenames, replacing every `-` with a `_`. 
- Removing certain words or characters, e.g., replacing every instance of a word with a ` `.  

This can be done with the `replace` function.

In [None]:
## Replace "-" with "_"
og_filename = "css-lecture-06"
og_filename.replace("-", "_")

### Replacing the first $N$ instances

`replace` can also be used to replace only the first $N$ instances of a string. 

In [None]:
## Replace only the first instance of "bananas"
og_string = "bananas, apples, bananas, grapes"
og_string.replace("bananas", "oranges", 1)

### Check-in

Use the `replace` function to replace the **first 2 instances** of `-` with `_`.

In [None]:
original_filename = "css-l06-su23-test.py"
### Your code here

### `replace` is case-sensitive

Note that `replace` attempts an **exact match** of the `str` you're looking to replace.

- This includes exact **case match**. 
- `"apple" != "APPLE"`. 

In [None]:
case_mismatch = "I like Apples"
### replace won't do anything here
case_mismatch.replace("apples", "bananas")

In [None]:
case_mismatch = "I like Apples"
### replace will replace it here
case_mismatch.replace("Apples", "bananas")

## Concatenating strings

> String **concatenation** simply means *combining* multiple strings.

Often, you'll need to *combine* the characters in multiple strings.

- Combining the **directory path** and a **filename** to get the full path of a file.
- Combining parts of strings to get a valid **URL**.  
- Combining the first and last name of a client to `print` out the **full name**.

### Approach 1: the `+` operator

The `+` operator can be used to **combine** multiple `str` objects.

In [None]:
"Comput" + "ational"

In [None]:
"css201/" + "lec06/" + "file.py"

#### Check-in

What do you notice about how these strings are combined? Is a space added between each constituent `str` or not?

#### Watch out for spaces (and lack thereof)!

By default, `+` will just combine two different string objects directly.

That is, `"Hello" + "World"` will become `"HelloWorld"`.

If you want to add a space *between* these objects, make sure to add a space character in your concatenation operation.

In [None]:
p1 = "Hello"
p2 = "World"
p1 + " " + p2

#### Check-in

Why does the code below throw an error? 

**Bonus**: What would you need to do to make it *not* throw an error?

In [None]:
2 + " cats"

#### Concatenating an `int` to a `str`

When one operand is a `str`, the `+` operator expects the other operand to be a `str` as well. Thus, trying to combine an `int` with a `str` this way will throw an error (a `TypeError`).

However, you can use **type-casting** to turn the `int` into a `str`, and then combine them.

In [None]:
str(2) + " cats"

#### Check-in

Use the `+` operator to combine the variables below into a single string (in order, i.e., `var1` followed by `var2`, etc.). 
- Add a space between each variable. 
- Watch out for conflicting types!

In [None]:
var1 = "This"
var2 = "Is"
var3 = "CSS"
var4 = 202
#### Your code here

### Approach 2: using `format`

The `format` method can also be used to merge multiple strings together.

- This approach is less intuitive at first, but is very flexible.  
- I use this approach when I'm `print`ing out lots of custom variable values, e.g., as in an output message.

With `format`, you can declare "variables" within a `str` using the `{x}` syntax. 

In [None]:
first = "Smarty"
last = "Student"
print("Hello, {f} {l}".format(f = first, l = last))
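
As a further sketch (with made-up variable names), `format` also accepts empty `{}` placeholders, which are filled in order by the arguments you pass. A handy side effect is that `format` converts non-string values like an `int` for you, so no type-casting is needed.

```python
# Hypothetical values, chosen only for illustration.
count = 2          # an int, not a str
animal = "cats"

# Empty {} placeholders are filled left to right.
message = "I have {} {}".format(count, animal)
print(message)  # I have 2 cats
```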

#### Check-in

Use `format` to `print` out a message that reads: 

`"Welcome to CSS 201"`.

In [None]:
department = "CSS"
number = "201"
#### Your code here

### Approach 3: using `join`

Another somewhat common use-case is **joining** strings that are currently stored as elements of a list.

The `join` syntax starts with the *character* (or character*s*) you'll be using to **join** each `str` together.

- This could be a space character, an underscore, or anything you want.  
- It then makes a call to `.join(list_name)`. 
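
To make the syntax concrete, here is a minimal sketch with a made-up list of words. The same list is joined with three different separators, including an empty string (which glues the items together with nothing in between).

```python
# Hypothetical list, chosen only for illustration.
words = ["css", "lecture", "06"]

with_underscores = "_".join(words)
with_commas = ", ".join(words)
with_nothing = "".join(words)

print(with_underscores)  # css_lecture_06
print(with_commas)       # css, lecture, 06
print(with_nothing)      # csslecture06
```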

In [None]:
separate_str = ['The', 'quick', 'brown', 'fox', 'jumped']
separate_str

In [None]:
" ".join(separate_str)

#### Check-in

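Use `join` to turn the following list of directory and sub-directory names into a full file path, connected by the `\` symbol. Note that the backslash is the escape character, so inside a Python string it must itself be escaped: the separator is written as `"\\"`. 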

In [None]:
dirs = ["css", "201", "lectures", "lec06"]
#### Your code here

### Other approaches

There are a number of [other approaches](https://www.pythontutorial.net/python-string-methods/python-string-concatenation/) to concatenating strings. 

Personally, I primarily use:

- The `format` method when I'm `print`ing out complicated strings. 
- The `+` operator for everything else.  

## `split`ting a string

Just as you can `join` parts of a `list` into a `str`, you can also `split` a `str` into a `list`!

Common use cases:

- Extracting directories and sub-directories of a file path.  
- **Tokenizing** a sentence, i.e., retrieving all the distinct *words* (e.g., in English, written words are typically separated by spaces).  
- Extracting different **hash-tags** from a tweet (e.g., `"#CSS#Programming"`). 

In [None]:
example_sentence = "The quick brown fox jumped over the lazy dog"
example_sentence.split(" ")
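
`split` isn't limited to spaces: you can pass in whatever separator your data uses. The sketch below uses a made-up date string separated by `-`, and also shows that `join` with the same separator reverses the operation.

```python
# Hypothetical date string, chosen only for illustration.
date_str = "2023-07-15"

parts = date_str.split("-")
print(parts)       # ['2023', '07', '15']
print(len(parts))  # 3

# Joining with the same separator rebuilds the original string.
rebuilt = "-".join(parts)
print(rebuilt == date_str)  # True
```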

#### Check-in

How many **words** (i.e., character-sequences separated by spaces) are in the sentence below?

Hint: use a combination of `split` and `len` to solve this question.

In [None]:
test_sentence = "This sentence has a number of different words and your goal is to count them"
### Your code here

## Combining lists

Two or more lists can be combined using the `+` operator.

In [None]:
list1 = [1, 2, 3]
list2 = ['4', 5, '6']
list1 + list2

These lists do *not* have to contain the same `type` of objects, or the same number of objects.

In [None]:
list3 = ["a", "b"]
list1 + list3

### Check-in

Use the `+` operator to combine the lists below, then use `join` to join the words into a complete sentence (with each word separated by a `" "`).

In [None]:
l1 = ['CSS', '201']
l2 = ['is', 'fun']
### Your code here

## Adding items to a `list`

In addition to using the `+` operator, you can add individual *items* to a list using the `append` function.

- Note that this modifies the list "in place", i.e., it doesn't *return* a value, but rather it mutates the existing `list` object.

In [None]:
fruits = ['apple', 'banana']
fruits.append('orange')
print(fruits)
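
A common pitfall follows from the "in place" behavior: because `append` doesn't return the list, assigning its result to a variable gives you `None`. The short sketch below (with an illustrative `result` variable) shows what happens.

```python
fruits = ['apple', 'banana']

# append mutates fruits, but the call itself evaluates to None.
result = fruits.append('orange')

print(result)  # None
print(fruits)  # ['apple', 'banana', 'orange']
```

So writing something like `fruits = fruits.append('orange')` would replace your list with `None`.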

### Filling up an empty `list`

The `append` function is often used to **fill up** a `list` with items, such as during a `for` loop.

For example, you might:

- Initialize an *empty* list.  
- Loop through numbers between `1` and `100`.
- Add those numbers to the empty list if they're odd.

In [None]:
new_list = [] ### Initialize empty list
for num in range(1, 101): ### Loop through range
    if num % 2 == 1: ### If number is odd...
        new_list.append(num) ### Append it to list
new_list[0:3] ### Get the first three elements of new list

### Check-in

Add the number `4` to the list below using `append`.

In [None]:
sample_list = [1, 2, 3]
### Your code here

### Check-in

The code cell below contains two lists: one contains a list of foods, the other contains a list of words with the letter "a". 

Using `append` and a `for` loop, add the items from `foods` to `a_words` if:

- they contain the letter "a".
- they don't already appear in `a_words`. 

In [None]:
foods = ['apple', 'banana', 'orange', 'kiwi', 'strawberry', 'mango', 'pineapple', 'berry']
a_words = ['board', 'table', 'apple', 'human']
### Your code here
for f in foods:
    if 'a' in f and f not in a_words:
        a_words.append(f)

print(a_words)

### Using `insert`

- The `append` function always adds items to the **end** of a list.  
- Instead, you can use `insert` to insert items at a specific location, such as the start.
- Syntax: `list_name.insert(position, item)`

In [None]:
sample_list = [2, 3, 4]
sample_list.insert(0, 1) ### insert a 1 at the zero-th position
print(sample_list)

## Removing items from a `list`

There are two primary ways to **remove** an item from a list.

- `pop`: this removes the item at a given index (by default, this is the *last* item), and also **returns** that item. 
- `remove`: this removes the first occurrence of a particular *value* from a `list`.

So, roughly:

- `pop` removes by *position*.  
- `remove` removes by *value*.  

### `pop`ping in action

The syntax for `pop` is straightforward: `list_name.pop()`

In [None]:
sample_list = [1, 2, 5, 7]
sample_list.pop() ### by default, removes and returns the final element

Now, if we look back at `sample_list`, we see that the final element has indeed been removed.

In [None]:
sample_list
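
Since `pop` removes by *position*, you can also pass in an index to remove an item other than the last one. A minimal sketch, using an illustrative `removed` variable to hold the returned item:

```python
sample_list = [1, 2, 5, 7]

# Remove (and return) the item at index 0 instead of the last item.
removed = sample_list.pop(0)

print(removed)      # 1
print(sample_list)  # [2, 5, 7]
```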

### Check-in

What do you think would happen if we `pop` from an empty list?

In [None]:
empty_list = []
### what would happen if we call empty_list.pop()

### `remov`ing in action

The syntax for `remove` is also straightforward: `list_name.remove(value)`

- Where `value` is the value that you want to remove.  
- Note that unlike `pop`, `remove` does *not* return a particular value, but it does modify the list in place.

In [None]:
sample_list = [1, 2, 5, 7]
sample_list.remove(5)
print(sample_list)

### Check-in

What would happen to `test_list` if we call `test_list.remove("apple")`?

1. `['bread', 'apple', 'cheese', 'apple']`
2. `['bread', 'cheese', 'apple']`
3. `['bread', 'cheese']`

In [None]:
test_list = ['bread', 'apple', 'cheese', 'apple']
### Your code here

## Finding the index of a particular value

The `index` function allows you to return the index corresponding to the *first occurrence* of a particular value.

**Basic syntax**: `list_name.index(value)`

- Note that you can also (optionally) parameterize the `start` and `end` of this search: 
   - `list_name.index(value, start, end)`

In [None]:
test_list = ['bread', 'apple', 'cheese', 'apple', 'house', 'car', 'yard', 'apple']
test_list.index("bread")

In [None]:
### Returns *first* occurrence of "apple"
test_list.index("apple")

In [None]:
### Returns first occurrence of "apple" between index 4 and index 8 (end not included)
test_list.index("apple", 4, 8)

### Check-in

Use the `index` function to retrieve the index of the first occurrence of the number `10` in the list below.

In [None]:
number_list = [1, 10, 15, 20, 10, 55]
### Your code here

### Check-in

Use the `index` function to retrieve the index of the first occurrence of the number `10` between the indices `2` and `5` in the list below.

In [None]:
number_list = [1, 10, 15, 20, 10, 55, 10]
### Your code here

## `sort`ing a list

> **Sorting** a `list` means rearranging its elements according to some measure of "least" and "greatest".

There are many different [**algorithms** for sorting a list](https://en.wikipedia.org/wiki/Sorting_algorithm), which we won't cover in detail here.

However, in Python, there are two main *functions*:

- `sorted(list)`: returns a sorted version of a `list`.  
- `list.sort()`: sorts a particular `list` **in place**. 

In [None]:
number_list = [2, 1, 9, 5, 3, 4]
sorted_list = sorted(number_list)
sorted_list

In [None]:
number_list = [2, 1, 9, 5, 3, 4]
number_list.sort()
number_list
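
The difference between the two is easiest to see side by side. In the sketch below (variable names are illustrative), `sorted` leaves the original list untouched and returns a new one, whereas `sort` changes the original list and returns `None`.

```python
number_list = [2, 1, 9, 5, 3, 4]

# sorted() returns a new list; the original is unchanged.
new_list = sorted(number_list)
print(new_list)     # [1, 2, 3, 4, 5, 9]
print(number_list)  # [2, 1, 9, 5, 3, 4]

# .sort() reorders the original list and returns None.
result = number_list.sort()
print(result)       # None
print(number_list)  # [1, 2, 3, 4, 5, 9]
```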

### Ascending vs. descending?

- By default, `sorted` will sort a list in **ascending** order.
- The `reverse` keyword argument allows you to instead sort that list in **descending** order (i.e., largest elements first).



In [None]:
number_list = [2, 1, 9, 5, 3, 4]
sorted_list = sorted(number_list, reverse = True)
sorted_list

### Check-in

The list `names` below is unsorted. Use the `sorted` function to return a new list with the names sorted, in **descending** order.

In [None]:
names = ['Umberto', 'Will', 'Sean', 'Eileen', 'Sam']
sorted_names = sorted(names, reverse = True) ### new list, descending order
sorted_names

## Nested lists

A `list` can contain many different `type`s of objects: `str`, `int`, and even other `list`s!

- Each **nested list** can contain further nested lists, or other types of objects.  
- Nested lists do not have to be the same length.

In [None]:
nested_list = [[1, 2, 3],
              ['css', 'poli', 'econ'],
              ['tea', 'coffee'],
              'text',
              [1, 2, 3, 4]]
nested_list[4]
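
To reach an item *inside* one of the nested lists, you can index twice: the first index picks the inner list, and the second picks an item within it. A small sketch using the same `nested_list` (the `inner` variable is just for illustration):

```python
nested_list = [[1, 2, 3],
              ['css', 'poli', 'econ'],
              ['tea', 'coffee'],
              'text',
              [1, 2, 3, 4]]

inner = nested_list[1]    # ['css', 'poli', 'econ']
print(inner[0])           # css

# The same thing in one step:
print(nested_list[1][0])  # css

# A str is also a sequence, so double-indexing works on 'text' too.
print(nested_list[3][0])  # t
```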

### Check-in

What would `len(nested_list)` return?

In [None]:
nested_list = [[1, 2, 3],
              ['css', 'poli', 'econ'],
              ['tea', 'coffee']]
## Your answer here

### Check-in

Write a `for` loop that iterates through each item in `nested_list`, and prints its length.

In [None]:
nested_list = [[1, 2, 3],
              ['css', 'poli', 'econ'],
              ['tea', 'coffee']]
### Your code here
for elem in nested_list:
    print(len(elem))

## Lists vs. tuples

So far, we've focused on **lists**.

A **tuple** is another type of ordered sequence. They share several similarities with lists:

- You can index into both a **tuple** and a **list**.  
- You can loop through both a **tuple** and a **list**. 

However, there are also a couple key differences:

- Tuples are declared using `()`, not `[]`.  
- Unlike `list`s, a `tuple` is not mutable (i.e., it can't be changed in place).

In [None]:
example_tuple = (1, 2, 3)
example_tuple
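
Here is a rough sketch of what "not mutable" means in practice. Indexing works just like a `list`, but trying to assign to a position raises a `TypeError`. The `try`/`except` block is only there so the cell keeps running after the error; the `mutation_failed` variable is made up for illustration.

```python
example_tuple = (1, 2, 3)

# Indexing works the same way as with a list.
first_item = example_tuple[0]
print(first_item)  # 1

# Assigning to a position is not allowed for a tuple.
mutation_failed = False
try:
    example_tuple[0] = 10
except TypeError as err:
    mutation_failed = True
    print("TypeError:", err)

print(example_tuple)  # (1, 2, 3), unchanged
```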

### Tuples (continued)

- We won't focus *too much* on tuples for now.  
- However, I wanted to highlight some of those similarities and differences. 
- It's likely that at some point in your journey with Python, you'll end up using or encountering tuples.

In [None]:
for i in example_tuple:
    print(i)

## Conclusion

In this lecture, we covered:

1. Strings (`str`).
1. Lists (`list`).
1. How to operate on these objects.

Next lecture:

1. `dictionaries`
1. `functions`