# Strings

- A string is an ordered sequence of characters, enclosed in either single (`'`) or double (`"`) quotes.

- For example, `'hello'`, `"Mario Luigi Ken"`, `"I don't want to talk about it!"` are some valid strings.

- Strings are **immutable**, i.e., once a string is created, its individual characters cannot be modified.

- Strings support two major operations: **indexing** and **slicing**.
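
A quick check of the immutability claim: assigning to an index raises a `TypeError`, so the usual pattern is to build a new string instead. The variable names are illustrative.

```python
s = 'hello'

try:
    s[0] = 'H'  # item assignment is not allowed on str
except TypeError as err:
    print('TypeError:', err)

# Build a new string instead; s[1:] is everything after the first character
s_new = 'H' + s[1:]
print(s_new)  # Hello
print(s)      # hello (the original is untouched)
```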


```python
s = 'hello'
print(s)
print(type(s))
```

```
hello
<class 'str'>
```


```python
s = 'Mario is famous!'  # This is variable reassignment
print(s)
print(type(s))
```

```
Mario is famous!
<class 'str'>
```


```python
s = 'I don't wanna talk about it'
# Invalid: the apostrophe in "don't" ends the single-quoted string early
```

```
SyntaxError: invalid syntax (3110106720.py, line 1)
```

```python
s = "I don't wanna talk about it"
print(s)
print(type(s))
```

```
I don't wanna talk about it
<class 'str'>
```
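
Besides switching to double quotes, the apostrophe can also be escaped with a backslash inside a single-quoted string. Both spellings produce the same string.

```python
s1 = "I don't wanna talk about it"   # double quotes around a single quote
s2 = 'I don\'t wanna talk about it'  # backslash-escaped single quote
print(s1 == s2)  # True
```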


## The `len()` function

- The built-in `len()` function returns the **length of the string**, i.e., the number of characters it contains. More generally, it works on any sized container, such as lists, tuples, and dicts.
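
A few illustrative calls: spaces count as characters, the empty string has length `0`, and the same function works on other containers such as lists.

```python
print(len("Hello World"))  # 11: the space counts as a character
print(len(""))             # 0: the empty string
print(len([1, 2, 3]))      # 3: len() also works on lists, tuples, dicts, ...
```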


```python
s = "Hello World"

len(s)
```

```
11
```

## String indexing

- Each character of a string is assigned an integer index.

- Indexing is 0-based: the first character has index `0`, the second character has index `1`, and in general the i<sup>th</sup> character has index `i - 1`.

- Strings use the `[]` (square-bracket) notation for indexing: `s[i]` grabs the character at index `i`.

- Strings also support **reverse (negative) indexing**, where the last character has index `-1`, the second-to-last character has index `-2`, and so on.
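
A sketch of how positive and negative indices line up for `"Hello World"`: for a string of length `n`, `s[-k]` is the same character as `s[n - k]`, and any index outside the range `-n` to `n - 1` raises an `IndexError`.

```python
s = "Hello World"
n = len(s)

# Print each character with its positive and negative index
for i in range(n):
    print(i, i - n, repr(s[i]))

# s[-k] and s[n - k] refer to the same character
assert all(s[-k] == s[n - k] for k in range(1, n + 1))

try:
    s[n]  # one past the last valid index
except IndexError as err:
    print('IndexError:', err)
```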


```python
s = "Hello World"
```

```python
print(s[1])    # e
print(s[-10])  # e
```

```
e
e
```


```python
print(s[6])   # W
print(s[-5])  # W
```

```
W
W
```


```python
print(s[10])  # d
print(s[-1])  # d
```

```
d
d
```


## String Slicing

- String slicing grabs a subsection of the string (a substring, or a subsequence when `step` is not `1`).

- It uses the `[]` notation with 3 parameters: `start`, `stop`, `step`.

- All three parameters are optional: `start` defaults to the beginning of the string, `stop` to the end, and `step` to `1`.

- `start` specifies the index at which the subsection begins (this index is included).

- `stop` specifies the index at which the subsection ends. It is exclusive: the subsection goes up to, but does not include, this index.

- `step` specifies the size of the jump between consecutive characters of the subsection. A negative `step` walks the string backwards, and then the defaults flip: `start` defaults to the last character and `stop` to just before the first.

- The syntax is `str_variable_name[start:stop:step]`, which returns the subsection as a new string.
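
A few checks of the slicing rules: omitted parameters fall back to their defaults, `stop` is excluded, and, unlike indexing, slicing past the end of the string does not raise an error; the slice is simply clipped to the string.

```python
s = "ABCDEFGHIJKLMN"

# Omitted start/stop/step take their defaults (0, len(s), 1 for a positive step)
print(s[:] == s[0:len(s):1])   # True

# stop is excluded: s[2:5] holds indices 2, 3, 4
print(s[2:5])                  # CDE

# Out-of-range bounds are clipped instead of raising IndexError
print(s[10:100])               # KLMN
print(repr(s[100:]))           # '' (empty string)
```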


```python
s = "ABCDEFGHIJKLMN"
```

```python
# Grab the entire string, method 1: don't specify any parameter
s[::]
```

```
'ABCDEFGHIJKLMN'
```

```python
# Grab the entire string, method 2: spell out the defaults
s[0:len(s):1]
```

```
'ABCDEFGHIJKLMN'
```

```python
# Grab everything from index 3 up to the end
s[3:]
```

```
'DEFGHIJKLMN'
```

```python
# Grab everything from the start up to, but not including, index 5
s[:5]
```

```
'ABCDE'
```

```python
# Grab all characters at even indices
s[::2]
```

```
'ACEGIKM'
```

```python
# Grab the subsection from index 2 to 8 (inclusive), so stop is 9
s[2:9]
```

```
'CDEFGHI'
```

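```python
# Grab the subsection from index 2 to 8 (inclusive) with jump size 3: indices 2, 5, 8
print(s[2:9:3])
s[2:8+1:3]  # same slice; writing 8+1 makes the inclusive end index explicit
```

```
CFI
'CFI'
```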

```python
# Reverse of a string trick
# With a negative step and start/stop omitted, the slice walks from the last
# character (index -1) back to the first (index 0)
s[::-1]
```

```
'NMLKJIHGFEDCBA'
```
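
One common use of the reversal trick is a palindrome check. `is_palindrome` is a hypothetical helper name used only for illustration, and the comparison is case-sensitive.

```python
def is_palindrome(word):
    """Return True if word reads the same forwards and backwards (case-sensitive)."""
    return word == word[::-1]

print(is_palindrome('racecar'))  # True
print(is_palindrome('mario'))    # False
```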

## String concatenation & multiplication

- String concatenation joins two or more strings end to end, using the `+` operator. The `+` operator is overloaded: with strings it performs concatenation, while with numbers it performs addition.

- String multiplication repeats a string: `s * N` concatenates `N` copies of `s`, using the `*` operator (also overloaded).
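
A short sketch of both operators, including two edge cases: multiplying by `0` gives the empty string, and `+` does not mix strings with numbers, so a number must be converted with `str()` first. The variable names are illustrative.

```python
greeting = 'Hi' + ', ' + 'Mario'
print(greeting)            # Hi, Mario

line = '-' * 10
print(line)                # ----------
print(repr('abc' * 0))     # '' (zero copies gives the empty string)

age = 35
try:
    'Age: ' + age          # str + int is not defined
except TypeError as err:
    print('TypeError:', err)

print('Age: ' + str(age))  # convert the number first: Age: 35
```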


```python
first_name = 'Tucker'
last_name = 'Budzyn'
```

```python
# concatenation
name = first_name + last_name
print(name)
```

```
TuckerBudzyn
```


```python
# join the two names with a space in between
name = first_name + ' ' + last_name
print(name)
```

```
Tucker Budzyn
```


```python
# string multiplication
first_name_1 = 'Todd'
lol = first_name_1 * 5  # Concatenated 5 times
print(lol)
```

```
ToddToddToddToddTodd
```


## Iterating through a string

- We can use a `for` or a `while` loop to iterate through a string.

Here are some useful built-in functions:

- `len(s)` returns the length of string `s`.

- `range(n)` generates the integers from `0` to `n - 1` (inclusive), i.e., every valid index of a string of length `n`.

```python
name = 'sheldon'

for ch in name:
    print(ch, end=' ')
    ch = 'a'  # rebinds the loop variable only; doesn't modify the original string

print()  # end the line of characters
print(name)
```

```
s h e l d o n 
sheldon
```


```python
n = len(name)

for i in range(n):
    print(name[i], end=' ')
    # name[i] = 'a'  # would raise TypeError: 'str' object does not support item assignment
```

```
s h e l d o n 
```
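
The text above also mentions a `while` loop. A minimal equivalent of the index-based `for` loop, managing the index by hand, might look like this:

```python
name = 'sheldon'
i = 0

while i < len(name):
    print(name[i], end=' ')
    i += 1  # advance the index; forgetting this gives an infinite loop

print()  # finish the line
```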

## `chr()` and `ord()` functions

- `chr(n)` returns the character whose Unicode code point is `n`. For example, `chr(97)` returns `'a'`.

- `ord(c)` returns the Unicode code point of the given character `c`. For example, `ord('a')` returns `97`.
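
Because `ord()` and `chr()` convert between characters and integers, they allow arithmetic on letters. The sketch below shifts lowercase ASCII letters by a fixed amount (a Caesar shift). `shift_letter` is a hypothetical helper, and it assumes its input is a single character from `'a'` to `'z'`.

```python
def shift_letter(ch, k):
    """Shift a lowercase ASCII letter k places, wrapping around from 'z' to 'a'."""
    offset = ord(ch) - ord('a')              # 0 for 'a', 25 for 'z'
    return chr((offset + k) % 26 + ord('a'))

print(shift_letter('a', 1))   # b
print(shift_letter('z', 1))   # a (wraps around)
print(''.join(shift_letter(ch, 3) for ch in 'mario'))  # pdulr
```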


```python
name = 'tucker'
for ch in name:
    code_point = ord(ch)    # character -> integer code point
    char = chr(code_point)  # integer -> back to the same character
    print(ch, code_point, char)
```

```
t 116 t
u 117 u
c 99 c
k 107 k
e 101 e
r 114 r
```


## `upper()` and `lower()` methods

- Strings are objects of the `str` class, which provides many methods.

- The syntax for calling a method is: `<variable_name>.<method_name>(argument_list)`

- `upper()` returns a new string with all characters converted to uppercase.

- `lower()` returns a new string with all characters converted to lowercase.
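
A typical use of these methods is a case-insensitive comparison: compare lowercase copies rather than the original strings. The variable names are illustrative.

```python
answer = 'Penny'
guess = 'PENNY'

print(answer == guess)                  # False: comparison is case-sensitive
print(answer.lower() == guess.lower())  # True: compare lowercase copies
print(answer, guess)                    # Penny PENNY (originals unchanged)
```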

```python
s = 'Sheldon Cooper'

# upper()
s_upper = s.upper()
print(s_upper)
print(s)  # unchanged: upper() returns a new string
```

```
SHELDON COOPER
Sheldon Cooper
```


```python
# lower()
s_lower = s.lower()
print(s_lower)
print(s)
```

```
sheldon cooper
Sheldon Cooper
```


## `split()` method

- The `split()` method splits the string at each occurrence of a **delimiter** and returns a list of the resulting pieces.

- The syntax is `.split(delimiter)`, where the `delimiter` argument is optional.

- If `delimiter` is not specified, the string is split on runs of whitespace (spaces, tabs, newlines), and leading or trailing whitespace is ignored.

- Otherwise, the `delimiter` can be any non-empty string of one or more characters.


```python
s = 'Sheldon Leonard Howard Raj Penny'
names_1 = s.split(' ')
names_2 = s.split()
print(names_1)
print(names_2)
```

```
['Sheldon', 'Leonard', 'Howard', 'Raj', 'Penny']
['Sheldon', 'Leonard', 'Howard', 'Raj', 'Penny']
```
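
The two calls above agree only because the names are separated by single spaces. With irregular whitespace they differ: `split()` with no argument treats any run of whitespace as one separator and drops leading and trailing whitespace, while `split(' ')` splits at every single space and can produce empty strings.

```python
s = '  Sheldon   Leonard\tPenny '

print(s.split())
# ['Sheldon', 'Leonard', 'Penny']

print(s.split(' '))
# ['', '', 'Sheldon', '', '', 'Leonard\tPenny', '']
```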


```python
# if you specify an empty separator, you get an error
s = 'ABCDEFGHIJK LMNOPQRSTUV'
arr = s.split('')
print(arr)
```

```
ValueError: empty separator
```

```python
s = 'dsalgo algodsa dsalgoalgoalgo dsalgo algodsa'
arr = s.split('go')
print(arr)
```

```
['dsal', ' al', 'dsa dsal', 'al', 'al', ' dsal', ' al', 'dsa']
```


```python
s = 'shawn_spencer@sbpd.com'
name = s.split('@')[0]
print(name)
```

```
shawn_spencer
```
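
Instead of indexing with `[0]`, when the delimiter is known to appear exactly once the result can be unpacked into two variables. This sketch assumes a well-formed address containing a single `@`; otherwise the unpacking raises a `ValueError`.

```python
s = 'shawn_spencer@sbpd.com'

# split('@') returns a two-element list here, so it can be unpacked directly
user, domain = s.split('@')
print(user)               # shawn_spencer
print(domain)             # sbpd.com
print(domain.split('.'))  # ['sbpd', 'com']
```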
