<img src="http://hilpisch.com/tpq_logo.png" alt="The Python Quants" width="35%" align="right" border="0"><br>

# NLP Basics

**Working with `str` Objects**

&copy; Dr. Yves J. Hilpisch

<a href="http://tpq.io" target="_blank">http://tpq.io</a> | <a href="http://twitter.com/dyjh" target="_blank">@dyjh</a> | <a href="mailto:team@tpq.io">team@tpq.io</a>

## The `string` Module

The `string` module in Python provides various functionalities related to string operations. Here's a systematic list of its capabilities along with code examples:

1. **Constants**:
   - `string.ascii_letters`: A string containing all ASCII letters (both lowercase and uppercase).
   - `string.ascii_lowercase`: A string containing all lowercase ASCII letters.
   - `string.ascii_uppercase`: A string containing all uppercase ASCII letters.
   - `string.digits`: A string containing all ASCII decimal digits.
   - `string.hexdigits`: A string containing all hexadecimal digits.
   - `string.octdigits`: A string containing all octal digits.
   - `string.punctuation`: A string containing ASCII punctuation characters.
   - `string.printable`: A string containing all printable ASCII characters.
   - `string.whitespace`: A string containing all ASCII whitespace characters.

   **Example**:
   ```python
   import string
   print(string.ascii_letters)  # abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
   ```

2. **Custom String Formatting** (`Formatter` class):
   - Exposes the same formatting machinery that `str.format()` uses. This is an advanced feature: by subclassing `Formatter` you can change how replacement fields are parsed, looked up, and converted.

   **Example**:
   ```python
   formatter = string.Formatter()
   formatted_string = formatter.format("Hello {0}", "World")
   print(formatted_string)  # Hello World
   ```

3. **Utility Function and `Template` Class**:
   - `string.capwords(s)`: Splits `s` into words, capitalizes each word, and joins the words with single spaces.
   - `string.Template`: A class offering a simpler and less powerful mechanism for string formatting, useful for user-facing messages that need to be translated (among other uses).

   **Example**:
   ```python
   print(string.capwords("hello world"))  # Hello World

   t = string.Template('Hello $name!')
   print(t.substitute(name='Alice'))  # Hello Alice!
   ```
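
As a rough sketch of what customizing `Formatter` can look like, the subclass below overrides `get_value()` so that missing named fields produce a placeholder instead of a `KeyError`. The class name `DefaultFormatter` and the `<missing>` marker are hypothetical choices for illustration. `Template.safe_substitute()` is shown next to it because it gives a comparable safety net without subclassing.

```python
import string


class DefaultFormatter(string.Formatter):
    """Hypothetical subclass: use a placeholder for missing named fields."""

    def get_value(self, key, args, kwargs):
        # Named fields arrive as str keys, positional fields as int keys.
        if isinstance(key, str):
            return kwargs.get(key, "<missing>")
        return super().get_value(key, args, kwargs)


fmt = DefaultFormatter()
greeting = fmt.format("Hello {name}, welcome to {place}!", name="Alice")
print(greeting)  # Hello Alice, welcome to <missing>!

# safe_substitute() leaves unknown placeholders untouched
t = string.Template("Hello $name, welcome to $place!")
safe = t.safe_substitute(name="Alice")
print(safe)  # Hello Alice, welcome to $place!
```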

These capabilities of the `string` module support common string manipulation and formatting tasks in Python. The module is especially useful when you need predefined character constants or basic string transformations.

Remember that while the `string` module provides specific functionalities, many common string operations can also be performed directly with built-in methods on string objects. For example, to convert a string to uppercase, you can call the `.upper()` method on the string itself, without needing the `string` module.
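
The interplay between built-in `str` methods and the module's constants can be sketched as follows. The sample sentence is made up for illustration; the punctuation removal relies on `str.maketrans()` and `str.translate()`, which is one common approach among several.

```python
import string

sentence = "We do NLP with Python, don't we?"

# Built-in str methods handle case conversion
lowered = sentence.lower()

# string.punctuation supplies the characters to delete via str.translate()
table = str.maketrans("", "", string.punctuation)
cleaned = lowered.translate(table)
tokens = cleaned.split()

print(cleaned)  # we do nlp with python dont we
print(tokens)   # ['we', 'do', 'nlp', 'with', 'python', 'dont', 'we']

# string.capwords() and str.title() differ around apostrophes
capwords_result = string.capwords("don't stop")
title_result = "don't stop".title()
print(capwords_result)  # Don't Stop
print(title_result)     # Don'T Stop
```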

In [None]:
!git clone https://github.com/tpq-classes/natural_language_processing.git
import sys
sys.path.append('natural_language_processing')


In [None]:
lc = 'abcdefg'

In [None]:
lc[:4]

In [None]:
lc[4:]

In [None]:
import string

In [None]:
string.ascii_lowercase

In [None]:
string.ascii_uppercase

In [None]:
string.ascii_letters

In [None]:
string.ascii_lowercase + string.ascii_uppercase

In [None]:
string.digits

In [None]:
string.hexdigits

In [None]:
lc = string.ascii_lowercase

In [None]:
lc

In [None]:
lc[:6]

In [None]:
list(lc[:6])

In [None]:
list('We do NLP with Python.')

In [None]:
len(lc)

In [None]:
string.punctuation

In [None]:
import random

In [None]:
c = string.ascii_lowercase
c += string.ascii_uppercase
c += string.digits
c += string.punctuation
c

In [None]:
len(c)

In [None]:
random.choice(c)

In [None]:
pwl = [random.choice(c) for _ in range(15)]
pwl

In [None]:
''.join(pwl)

In [None]:
len('')

## IO of Text Files

Reading from and writing to text files is a common task in Python, and it's typically done using the built-in `open()` function. Here are several examples demonstrating how to read from and write to text files:

### 1. Writing a String to a Text File

```python
text_to_write = "Hello, Python!"
with open("example.txt", "w") as file:
    file.write(text_to_write)
```
This code writes the string `"Hello, Python!"` to a file named `example.txt`. The `"w"` mode opens the file for writing, creating it if it does not exist and overwriting any existing content.
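
Two details of write mode are easy to miss: `write()` returns the number of characters written, and reopening a file with `"w"` empties it first. The sketch below illustrates both. It works in a temporary directory (an assumption made here so that no existing file is touched).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")

    with open(path, "w") as file:
        n_first = file.write("Hello, Python!")  # returns the character count

    with open(path, "w") as file:  # "w" truncates the existing file
        n_second = file.write("Bye!")

    with open(path, "r") as file:
        content = file.read()

print(n_first, n_second)  # 14 4
print(content)            # Bye!
```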

### 2. Reading a String from a Text File

```python
with open("example.txt", "r") as file:
    text_from_file = file.read()
print(text_from_file)
```
This reads the entire content of `example.txt` into a string and prints it.

### 3. Writing Multiple Lines to a Text File

```python
lines_to_write = ["First line", "Second line", "Third line"]
with open("multiple_lines.txt", "w") as file:
    for line in lines_to_write:
        file.write(line + "\n")
```
This writes each string from the list `lines_to_write` to a new line in `multiple_lines.txt`.

### 4. Reading Lines from a Text File into a List

```python
with open("multiple_lines.txt", "r") as file:
    lines_from_file = file.readlines()
print(lines_from_file)
```
This reads all lines from `multiple_lines.txt` into a list, where each element is a line from the file.
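
Note that `readlines()` keeps the trailing newline character of each line. If the newlines are not wanted, `read().splitlines()` is one alternative, as this minimal sketch shows (again using a temporary directory as an assumption, so the example is self-contained).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "multiple_lines.txt")
    with open(path, "w") as file:
        file.write("First line\nSecond line\nThird line\n")

    with open(path, "r") as file:
        with_newlines = file.readlines()  # newline characters are kept

    with open(path, "r") as file:
        without_newlines = file.read().splitlines()  # newlines are dropped

print(with_newlines)     # ['First line\n', 'Second line\n', 'Third line\n']
print(without_newlines)  # ['First line', 'Second line', 'Third line']
```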

### 5. Appending a String to an Existing File

```python
text_to_append = "Appended line"
with open("example.txt", "a") as file:
    file.write("\n" + text_to_append)
```
This appends `"Appended line"` to the existing `example.txt` file.

### 6. Reading and Printing Each Line of a File

```python
with open("multiple_lines.txt", "r") as file:
    for line in file:
        print(line.strip())
```
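This reads each line from `multiple_lines.txt` one by one and prints it. Note that `strip()` removes all leading and trailing whitespace, not only the newline character; use `rstrip("\n")` to remove just the trailing newline.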
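
The difference between the stripping variants matters when indentation or trailing spaces carry meaning. A small sketch with a made-up sample line:

```python
line = "    indented text  \n"

stripped = line.strip()           # removes whitespace on both sides
no_newline = line.rstrip("\n")    # removes only the trailing newline
no_trailing = line.rstrip()       # removes all trailing whitespace

print(repr(stripped))     # 'indented text'
print(repr(no_newline))   # '    indented text  '
print(repr(no_trailing))  # '    indented text'
```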

Each of these examples shows basic file operations, a crucial skill for many Python applications, especially those involving data processing or text manipulation.
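
The examples above rely on the platform's default text encoding. When a text contains non-ASCII characters, passing `encoding="utf-8"` explicitly is a common way to make reading and writing reproducible across systems. The following sketch assumes UTF-8 is the desired encoding and uses a made-up sample string and a temporary directory.

```python
import os
import tempfile

text = "Café, naïve, 日本語"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "unicode.txt")

    with open(path, "w", encoding="utf-8") as file:
        file.write(text)

    with open(path, "r", encoding="utf-8") as file:
        round_trip = file.read()

    size_in_bytes = os.path.getsize(path)

print(round_trip == text)        # True
# Non-ASCII characters occupy more than one byte in UTF-8,
# so the file size exceeds the number of characters.
print(len(text), size_in_bytes)
```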

In [None]:
s = 'This is a simple string object.'
s

In [None]:
len(s)

In [None]:
f = open('example.txt', 'w')

In [None]:
!cat example.txt

In [None]:
f.write(s)

In [None]:
!cat example.txt

In [None]:
f.close()

In [None]:
!cat example.txt

In [None]:
!rm example.txt

In [None]:
with open('example.txt', 'w') as f:
    f.write(s)

In [None]:
!cat example.txt

In [None]:
!rm example.txt

In [None]:
l = ['word1', 'word2', 'word3']

In [None]:
' '.join(l)

In [None]:
with open('example.txt', 'w') as f:
    for w in l:
        f.write(w + '\n')

In [None]:
!cat example.txt

In [None]:
ml = '''
This is a multiline
string object.
Over many lines.
'''

In [None]:
print(ml)

In [None]:
with open('example.txt', 'w') as f:  # write mode
    f.write(ml)

In [None]:
!cat example.txt

In [None]:
with open('example.txt', 'r') as f:  # read mode
    from_disk = f.read()

In [None]:
from_disk

In [None]:
print(from_disk)

In [None]:
with open('example.txt', 'a') as f:  # append mode
    f.write('\n\nSome additional text.')

In [None]:
!cat example.txt

In [None]:
url = 'https://hilpisch.com/walden.txt'

In [None]:
!wget $url

In [None]:
%%time
with open('walden.txt', 'r') as f:
    text = f.read()

In [None]:
len(text)

In [None]:
print(text[:1000])

In [None]:
%time text_s = text.split()

In [None]:
len(text_s)

In [None]:
text_s[:10]

## `str` Objects as Iterables

Strings in Python are iterable, meaning you can loop over them character by character. Here are several examples demonstrating different ways to use strings as iterables:

### 1. Iterating Through Each Character

```python
text = "Hello"
for char in text:
    print(char)
```
This will print each character in the string `"Hello"` on a new line.

### 2. Enumerating Through a String

```python
text = "Python"
for index, char in enumerate(text):
    print(f"Index {index}: {char}")
```
This prints each character along with its index in the string `"Python"`.

### 3. List Comprehension with Strings

```python
text = "World"
char_list = [char for char in text]
print(char_list)
```
This creates a list of characters from the string `"World"`.

### 4. Filtering Characters in a String

```python
text = "Hello123"
letters_only = ''.join(filter(str.isalpha, text))
print(letters_only)
```
This filters out non-alphabetic characters, resulting in `"Hello"`.

### 5. Creating a Dictionary from a String

```python
text = "example"
char_count = {char: text.count(char) for char in text}
print(char_count)
```
This creates a dictionary with each character as a key and its count as the value from the string `"example"`.
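
For longer texts, calling `text.count()` once per character rescans the string each time. As an alternative sketch, `collections.Counter` from the standard library counts all characters in a single pass and yields the same counts.

```python
from collections import Counter

text = "example"
char_count = Counter(text)  # one pass over the string

print(char_count["e"])            # 2
print(char_count.most_common(1))  # [('e', 2)]

# Same result as the dictionary comprehension
same = dict(char_count) == {char: text.count(char) for char in text}
print(same)  # True
```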

### 6. Using `map` with a String

```python
text = "12345"
numbers = list(map(int, text))
print(numbers)
```
This converts each character in the string `"12345"` to an integer, resulting in a list of numbers.

### 7. Zipping Two Strings

```python
text1 = "ABC"
text2 = "123"
combined = zip(text1, text2)
for pair in combined:
    print(pair)
```
This creates pairs of characters from `"ABC"` and `"123"`.
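
When the two strings differ in length, `zip()` stops at the end of the shorter one. If the remaining characters should be kept, `itertools.zip_longest()` is one option. The fill value `"-"` below is an arbitrary choice for illustration.

```python
from itertools import zip_longest

text1 = "ABCD"
text2 = "12"

short = list(zip(text1, text2))  # stops after the shorter string
padded = list(zip_longest(text1, text2, fillvalue="-"))

print(short)   # [('A', '1'), ('B', '2')]
print(padded)  # [('A', '1'), ('B', '2'), ('C', '-'), ('D', '-')]
```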

### 8. Using `itertools` with Strings

```python
import itertools

text = "AB"
combinations = itertools.product(text, repeat=2)
for combo in combinations:
    print(''.join(combo))
```
This prints the Cartesian product of the string `"AB"` with itself, that is, every ordered pair of characters with repetition allowed: `AA`, `AB`, `BA`, and `BB`.
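
The `itertools` module distinguishes between several related notions. The sketch below contrasts `product()` with `permutations()` and `combinations()` on the same two-character string to make the difference visible.

```python
import itertools

text = "AB"

# Ordered pairs, repetition allowed
as_product = ["".join(p) for p in itertools.product(text, repeat=2)]
# Ordered pairs, no repetition
as_permutations = ["".join(p) for p in itertools.permutations(text, 2)]
# Unordered pairs, no repetition
as_combinations = ["".join(p) for p in itertools.combinations(text, 2)]

print(as_product)       # ['AA', 'AB', 'BA', 'BB']
print(as_permutations)  # ['AB', 'BA']
print(as_combinations)  # ['AB']
```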

Each of these examples demonstrates a different way of treating strings as iterables in Python, showcasing the versatility and ease of string manipulation in the language.

In [None]:
text = "Hello"
for i in range(len(text)):  # not Pythonic (!)
    print(text[i], end='|')

In [None]:
text = "Hello"
for char in text:  # Pythonic way
    print(char, end='|')

In [None]:
text = "Hello"
for i, char in enumerate(text):  # Pythonic way, if index values are needed
    print(i, char)

In [None]:
import itertools

In [None]:
text = "ABCD1234"
combinations = itertools.product(text, repeat=2)
for combo in combinations:
    print(''.join(combo), end=' ')

## `ord` and `chr`

The `ord()` and `chr()` functions in Python are inverses of each other and are used for character and ordinal value conversions. Here are simple use cases and examples for both:

### 1. Using `ord()` to Get the Ordinal Value of a Character
The `ord()` function takes a character (a string of length 1) and returns its corresponding Unicode code point (an integer).

**Example**:
```python
# Get the ordinal value of a character
char = 'A'
ord_value = ord(char)
print(f"The ordinal value of '{char}' is {ord_value}")
```
This code will output the ordinal value of `'A'`, which is `65`.

### 2. Using `chr()` to Convert an Ordinal Value to its Character
The `chr()` function takes an integer (a Unicode code point) and returns the corresponding character.

**Example**:
```python
# Convert an ordinal value to its corresponding character
ord_value = 97
char = chr(ord_value)
print(f"The character for ordinal value {ord_value} is '{char}'")
```
This code will output the character `'a'`, which is the character for the ordinal value `97`.
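
Both functions work beyond ASCII and both reject invalid input. The following sketch uses the euro sign as an arbitrary non-ASCII example and shows the exceptions raised for a string of length 2 and for an integer outside the Unicode range.

```python
euro_code = ord("€")
print(euro_code)       # 8364
print(hex(euro_code))  # 0x20ac
print(chr(euro_code))  # €

# ord() accepts only strings of length 1
try:
    ord("ab")
except TypeError as err:
    ord_error = type(err).__name__
print(ord_error)  # TypeError

# chr() accepts only integers from 0 to 0x10FFFF
try:
    chr(0x110000)
except ValueError as err:
    chr_error = type(err).__name__
print(chr_error)  # ValueError
```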

### Combined Use of `ord()` and `chr()`
These functions are often used together in scenarios like encoding and decoding, or when performing operations that require conversion between characters and their corresponding Unicode code points.

**Example**:
```python
# Encrypting a character by shifting its ordinal value
def encrypt(char, shift):
    return chr(ord(char) + shift)

# Decrypting the character
def decrypt(char, shift):
    return chr(ord(char) - shift)

original_char = 'B'
shift = 3

encrypted_char = encrypt(original_char, shift)
decrypted_char = decrypt(encrypted_char, shift)

print(f"Original: {original_char}, Encrypted: {encrypted_char}, Decrypted: {decrypted_char}")
```
This code shifts the character `'B'` by 3 positions in the Unicode table, then shifts back to get the original character.
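
A plain shift can leave the alphabet: shifting `'z'` by 3 gives `'}'`. As a minimal sketch, the hypothetical `caesar()` function below wraps around within the lowercase ASCII letters using the modulo operator and leaves all other characters unchanged. It illustrates `ord()` and `chr()` only and is not a secure form of encryption.

```python
import string

ALPHABET = string.ascii_lowercase


def caesar(text, shift):
    """Shift lowercase ASCII letters by `shift`, wrapping around at 'z'."""
    result = []
    for char in text:
        if char in ALPHABET:
            index = (ord(char) - ord("a") + shift) % len(ALPHABET)
            result.append(chr(ord("a") + index))
        else:
            result.append(char)  # leave other characters unchanged
    return "".join(result)


secret = caesar("xyz and abc", 3)
print(secret)              # abc dqg def
print(caesar(secret, -3))  # xyz and abc
```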

These examples demonstrate the simplicity and usefulness of `ord()` and `chr()` for character and number conversions in Python.

In [None]:
ord('a')

In [None]:
ord('Q')

In [None]:
ord('6')

In [None]:
ord('!')

In [None]:
chr(55)

In [None]:
chr(125)

In [None]:
chr(3)

In [None]:
chr(2000)

In [None]:
chr(20000)

In [None]:
s = 'This text is transformed into a list of numbers.'

In [None]:
n = [ord(c) for c in s]
n[:10]

In [None]:
n[10]

In [None]:
''.join([chr(v) for v in n])
