# Text files

- Text files are a critically important file type: they store data as plain text.

- Source code, web pages, and LaTeX documents are text files.

- Also commonly used for configuration files, logs, and simple data storage (e.g., CSV, JSON, XML).

# Characteristics

- A text file is a sequence of characters.

- For computer manipulation, each character is assigned a specific code based on a character encoding standard (e.g., ASCII, UTF-8).

- The simplest encoding is ASCII, which uses 7 bits to represent 128 characters.
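
The mapping between characters and codes can be inspected directly. The following is a minimal sketch that uses only the built-ins `ord`, `chr`, and `str.encode`; the sample strings are chosen purely for illustration.

```python
# Map characters to their numeric codes and show the 7-bit pattern of each code.
for ch in "Hi!":
    print(ch, ord(ch), format(ord(ch), "07b"))

# chr() is the inverse of ord().
letter = chr(72)

# ASCII characters take one byte in UTF-8; characters outside ASCII take more.
ascii_bytes = "A".encode("utf-8")
accented_bytes = "é".encode("utf-8")
print(ascii_bytes, len(ascii_bytes))
print(accented_bytes, len(accented_bytes))
```

UTF-8 is backward compatible with ASCII, which is why a pure ASCII file reads the same under both encodings.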

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

# Python Text File Operations

In [None]:
# Open text file for writing.
# function open:
#  - file: str - The path to the file.
#  - mode: str - The mode in which the file is opened ('r' for reading, 'w' for writing, etc.).
#  - encoding: str - The encoding used to decode or encode the file. 
# Note that it's good to specify encoding to avoid issues with different default encodings on different systems.
f = open('my_text_file.txt', 'w', encoding = "utf-8")


In [None]:
# Write two lines to the file.
f.write('Hello, World!\n')
f.write('Welcome to text file handling in Python.\n')

# Close the file to ensure all data is written and resources are freed.
f.close()

### Problems

In [None]:
# Create your own text file named 'file_0.txt':
#  - Open the file in write mode.
#  - Specify "utf-8" encoding.
#  - Write two lines of some text to it.
#  - Close the file.

In [None]:
# Create a text file named 'file_1.txt':
#  - In the same way as above, but use the default encoding of your system (i.e., do not specify the encoding parameter).

# Reading/Writing mode

```python
f = open('my_file', 'r')   # for reading
f = open('my_file', 'w')   # for writing (creates the file, or truncates an existing one)
f = open('my_file', 'r+')  # for reading and writing (the file must already exist)
f = open('my_file', 'a')   # for appending to the end
```
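
The difference between `'w'` and `'a'` is easiest to see by writing to the same file several times. This is a small sketch, assuming a throwaway file in a temporary directory (the name `modes_demo.txt` is made up for the example):

```python
import os
import shutil
import tempfile

# Work in a temporary directory so no existing files are touched.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical file name

f = open(path, "w", encoding="utf-8")  # creates the file
f.write("first\n")
f.close()

f = open(path, "a", encoding="utf-8")  # keeps the content, writes at the end
f.write("second\n")
f.close()

f = open(path, "r", encoding="utf-8")
after_append = f.read()
f.close()

f = open(path, "w", encoding="utf-8")  # discards the old content
f.write("third\n")
f.close()

f = open(path, "r", encoding="utf-8")
after_rewrite = f.read()
f.close()

shutil.rmtree(tmp_dir)  # clean up the temporary directory

print(repr(after_append))   # 'first\nsecond\n'
print(repr(after_rewrite))  # 'third\n'
```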

In [None]:
# Read the file `my_text_file.txt` and print its contents.
f = open('my_text_file.txt', 'r', encoding = "utf-8")
contents = f.read()
print(contents)
f.close()

In [None]:
# Another method is `readlines()`, which reads all the contents into a list of lines.
f = open('my_text_file.txt', 'r', encoding = "utf-8")
lines = f.readlines()
for line in lines:
    print(line.strip())  # strip() removes leading/trailing whitespace, including the trailing '\n'; print() adds its own newline
f.close()

### Problems

In [None]:
# Read the text files you created in the previous exercise and print their contents to the console.
# If you wanted to do things properly, you would need to specify the encoding you used when writing the files.
# However, for this exercise, use the default encoding and see if it works correctly.
# The result will depend on your system's default encoding.
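
To see which default applies on a given machine, the standard library can be queried. The sketch below assumes that `open` falls back to the locale's preferred encoding when `encoding` is omitted, which holds for most current Python versions but may differ with UTF-8 mode or in future releases. The sample word is illustrative only.

```python
import locale

# Encoding that open() typically uses when no encoding argument is given.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)

# The same text becomes different bytes under different encodings ...
text = "café"
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")
print(utf8_bytes, latin1_bytes)

# ... so decoding with the wrong encoding garbles non-ASCII characters.
garbled = utf8_bytes.decode("latin-1")
print(garbled)
```

Pure ASCII text survives such a mismatch in most common encodings, which is why the problem often goes unnoticed until accented characters appear.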

# File manipulation using with statement

- Using the `with` statement is a best practice for file operations in Python because it ensures proper resource management.

- You do not need to close the file explicitly; it is closed automatically when the block inside `with` is exited, even if an error occurs.

- Any object that supports the context management protocol (i.e., has `__enter__` and `__exit__` methods) can be used with `with`.
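
As an illustration of the protocol, here is a minimal sketch of a user-defined context manager. The class `Announcer` is hypothetical and exists only to record the order in which its methods are called.

```python
class Announcer:
    """Hypothetical class that records when a with block is entered and exited."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self  # this value is bound to the name after 'as'

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit")
        return False  # do not suppress exceptions raised inside the block


with Announcer() as a:
    a.events.append("body")

print(a.events)  # ['enter', 'body', 'exit']
```

File objects follow the same pattern: their `__exit__` method closes the file.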


In [None]:
with open('my_text_file.txt', 'r', encoding = "utf-8") as f:
    contents = f.read()
    print(contents)
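
The claim that the file is closed even when an error occurs can be checked through the `closed` attribute of the file object. The sketch below raises an exception on purpose inside the block; the file name is made up and lives in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "closed_demo.txt")  # hypothetical file name
    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write("partial line\n")
            raise ValueError("simulated failure")  # error inside the with block
    except ValueError as err:
        error_message = str(err)

    # The with statement closed the file before the exception propagated.
    closed_after_error = f.closed

print(error_message)
print(closed_after_error)  # True
```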

### Problems

In [None]:
# Use the with statement to open a text file named 'file_2.txt' in write mode.
# Write the line "This is an example." to it.
# Note that there is no need to close the file explicitly when using the with statement.

# More on reading

- When reading a file, imagine a cursor that points to the current position in the file.

- Each read operation advances the cursor by the number of characters read.

- Several methods are available for reading from a file object:

```python
s = f.read()      # reads the whole file
s = f.read(3)     # reads the next 3 characters
s = f.readline()  # reads the next line 
s = f.readlines() # reads all lines and returns a list of them
```
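
The cursor behaviour can be observed by combining these calls on one open file. This is a small sketch with made-up file contents; `seek(0)`, which moves the cursor back to the start, is not covered above and is included only to show that the position can be reset.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cursor_demo.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write("abcdef\nsecond line\nthird line\n")

    with open(path, "r", encoding="utf-8") as f:
        first_three = f.read(3)      # 'abc'
        rest_of_line = f.readline()  # 'def\n' (continues from the cursor)
        remaining = f.readlines()    # ['second line\n', 'third line\n']
        at_end = f.read()            # '' (nothing left to read)
        f.seek(0)                    # move the cursor back to the start
        again = f.read(3)            # 'abc'

print(repr(first_three), repr(rest_of_line), remaining, repr(at_end), repr(again))
```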

### Problems

In [None]:
# Read the file 'file_2.txt' character by character and print each character to the console.

In [None]:
# Read the file 'file_2.txt' in groups of 5 characters and print each group to the console.
# Note that the last chunk may contain fewer than 5 characters if the total number of characters is not a multiple of 5.
# After the last chunk, the read method will return an empty string, indicating the end of the file.

In [None]:
# Read the file 'file_2.txt' as a list of lines and print each line to the console.

# CSV files

- Comma-Separated Values (CSV) is a common text file format for storing/transferring tabular data.

- Each line in a CSV file represents a row, and values within a row are separated by commas (a different delimiter can also be used).
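
A value that itself contains a comma needs special handling, which is one reason to use the standard `csv` module rather than joining strings by hand. The sketch below writes the same made-up rows with the default comma delimiter and with a semicolon, using an in-memory `io.StringIO` buffer instead of a real file.

```python
import csv
import io

rows = [["Name", "City"], ["Alice", "Paris, France"], ["Bob", "Oslo"]]

# Default delimiter: the field containing a comma is quoted automatically.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
comma_text = buffer.getvalue()
print(comma_text)

# Semicolon delimiter: the comma is now ordinary text and needs no quoting.
buffer = io.StringIO()
csv.writer(buffer, delimiter=";").writerows(rows)
semicolon_text = buffer.getvalue()
print(semicolon_text)

# Reading the comma-separated text back restores the original rows.
round_trip = list(csv.reader(io.StringIO(comma_text)))
```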

### Problems

In [2]:
# Create a CSV file named 'data.csv' and write some tabular data to it,
# e.g. two columns (Name, Age) and two rows of data.
# The first line of the file should be the header row.
with open('data.csv', 'w', encoding = "utf-8") as f:
    f.write("Name,Age\n")
    f.write("Alice,30\n")
    f.write("Bob,25\n")

In [3]:
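# Use the csv module to read the CSV file you created above.
# The csv documentation recommends opening the file with newline='' so that the module handles line endings itself.
import csv
with open('data.csv', 'r', encoding = "utf-8", newline = '') as f: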
    reader = csv.reader(f)
    for row in reader:
        print(row)
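
`csv.reader` returns each row as a list of strings, header included. When the first line is a header, `csv.DictReader` may be more convenient, since it uses that line as the keys. A minimal sketch, reading the same content from an in-memory string rather than from `data.csv`:

```python
import csv
import io

csv_text = "Name,Age\nAlice,30\nBob,25\n"

# DictReader consumes the header line and yields one dict per data row.
records = list(csv.DictReader(io.StringIO(csv_text)))
print(records[0])

# All values are read as strings; convert them explicitly when numbers are needed.
ages = [int(record["Age"]) for record in records]
print(ages)  # [30, 25]
```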

# Advertisement: pandas

- The pandas library provides powerful tools for handling tabular data.

- In a sense, pandas can be seen as "Excel for Python".

    - Being pythonic, pandas is a powerful tool, unlike Excel...

- Pandas can read from and write to various file formats, including CSV, Excel, JSON, and SQL databases.

- More information will come in a dedicated lesson and Jupyter notebook.

In [None]:
# Read the file `data.csv` using pandas and print its contents.
import pandas as pd
df = pd.read_csv('data.csv') # This creates a DataFrame class instance. More on DataFrames later.
print(df)

# Remark: binary files

- Binary files store data in a format that is not human-readable.

- Compiled programs are one example of binary files.

- Python can read and write binary files using the same `open` function, but with a 'b' added to the mode (e.g., 'rb' for reading binary, 'wb' for writing binary).

- The pickle module is commonly used for serializing and deserializing Python objects to and from binary files.

  - Pickle is Python-specific and not suitable for data exchange with other programming languages.

  - Pickle files store data as a byte stream: a serialized description of Python objects (lists, dictionaries, instances of custom classes) from which `pickle.load` rebuilds them. It is not a raw copy of the objects' memory.
  
  - Pickle files do not have a text encoding!
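
The difference between text mode and binary mode shows up in the type of the data returned: text mode yields `str`, binary mode yields `bytes`. The sketch below writes a short made-up string as UTF-8 text, reads it back in `'rb'` mode, and then shows that `pickle.dumps` also produces `bytes`.

```python
import os
import pickle
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "bytes_demo.txt")  # hypothetical file name

    # newline="" disables newline translation, so the bytes are the same on every platform.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write("né\n")

    # Binary mode returns the raw bytes; no decoding takes place.
    with open(path, "rb") as f:
        raw = f.read()

print(raw)                  # b'n\xc3\xa9\n'
print(raw.decode("utf-8"))  # decoding by hand recovers the text

# pickle works on bytes as well, which is why pickle files are opened with 'wb'/'rb'.
payload = pickle.dumps({"Alice": 30, "Bob": 25})
restored = pickle.loads(payload)
print(type(payload), restored)
```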

In [4]:
# Create a dictionary and pickle it to a file named 'data.pkl'.
import pickle

d = {'Alice': 30, 'Bob': 25}
with open('data.pkl', 'wb') as pkl_file:
    pickle.dump(d, pkl_file)

In [5]:
# Read the pickled data back.
with open('data.pkl', 'rb') as pkl_file:
    data = pickle.load(pkl_file)
print(data)

### Problems

In [None]:
# Take the `data.csv` file you created earlier and read it using the csv module.
# Experiment with the output:
#  - print the names of all individuals (first column)
#  - print the ages of all individuals (second column)

In [None]:
# Take the `data.csv` file you created earlier and read it using pandas.
# Experiment with the output DataFrame:
#  - print df['Name']
#  - print df['Age']
#  - print df.iloc[0]
#  - print df.iloc[1]

In [None]:
# Pickle the DataFrame you just read to a file named 'data_df.pkl'.
# Don't forget to open the file in binary write mode, 'wb'.