
# Python File I/O: Reading and Writing Files with `open`

In this notebook you will learn:
- Using `open()` correctly (modes: `r`, `w`, `a`, `x`, `b`, `t`)
- Safe file handling with context managers (`with`)
- Reading strategies: whole file, line by line, and in chunks
- Writing and appending text and CSV
- JSON and JSON Lines
- Basic TSV and simple YAML
- Binary files and `pathlib` helpers



## 1) Paths and Encodings

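- Pass `encoding="utf-8"` explicitly when reading or writing text files. Without it, Python uses the platform's default encoding, which is not UTF-8 everywhere (e.g. cp1252 on many Windows setups), so accented characters and emoji can come back garbled or raise `UnicodeDecodeError`.
- Relative paths are resolved from the current working directory; absolute paths work regardless of where Python was started.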
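A quick way to see why the encoding matters: the sketch below writes accented text and an emoji (taken from `notes.txt`) as UTF-8 to a temporary file, then reads it back once with the matching encoding and once with `latin-1`, which here stands in for a mismatched platform default. The file and variable names are illustrative only.

```python
import tempfile
from pathlib import Path

sample_text = "corazón 😀"
# Characters vs. bytes: 'ó' takes 2 bytes in UTF-8 and the emoji takes 4
print(len(sample_text), len(sample_text.encode("utf-8")))  # 9 13

with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "accents.txt"
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write(sample_text)

    # Matching encoding: the text round-trips unchanged
    with open(sample_path, "r", encoding="utf-8") as f:
        good = f.read()

    # Mismatched encoding: no error is raised, but the characters come back garbled
    with open(sample_path, "r", encoding="latin-1") as f:
        garbled = f.read()

print(good == sample_text)  # True
print(garbled[:8])          # corazÃ³n
```

The wrong encoding did not raise an error here because `latin-1` maps every byte to some character, so the damage is silent; a stricter codec such as `ascii` would raise `UnicodeDecodeError` instead.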


In [7]:
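# Paths can be relative (resolved from the current working directory) or absolute
data_dir = 'sample_files/'
# Absolute-path example (machine-specific; adjust it to your own system before uncommenting):
# data_dir = '/Users/vanessalopera/study/python_data_course/Python_fundamentals/sample_files/'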


## 2) The `open()` function and modes

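Common modes:
- `'r'` read (default); raises `FileNotFoundError` if the file does not exist
- `'w'` write; truncates an existing file or creates a new one
- `'a'` append; writes at the end of the file and creates it if missing
- `'x'` exclusive create; raises `FileExistsError` if the file already exists
- Add `'b'` for **binary** (`bytes`) or `'t'` for **text** (`str`, the default), e.g. `'rb'`, `'wt'`.

Always prefer a **context manager** (`with`): it closes the file automatically when the block ends, even if an exception is raised inside it.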
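The sketch below exercises a few of these modes on a throwaway file in a temporary directory (the file name `demo.txt` is hypothetical): `'x'` refuses to touch an existing file, `'w'` replaces the old content, and `'rb'` hands back `bytes` instead of `str`. It also checks that leaving the `with` block really closed the file.

```python
import tempfile
from pathlib import Path

refused = False
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.txt"

    # 'x' creates a new file and raises FileExistsError if it already exists
    with open(demo_path, "x", encoding="utf-8") as f:
        f.write("first line\n")
    closed_after_with = f.closed  # True: leaving the with block closed the file

    try:
        with open(demo_path, "x", encoding="utf-8") as f:
            f.write("never written\n")
    except FileExistsError:
        refused = True

    # 'w' truncates: the earlier content is discarded
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("replaced\n")

    # 'b' switches to binary: read() returns bytes instead of str
    with open(demo_path, "rb") as f:
        raw = f.read()

print(closed_after_with, refused)                   # True True
print(type(raw).__name__, raw.startswith(b"replaced"))  # bytes True
```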


In [12]:
# Reading a whole file
with open(data_dir + "notes.txt", "r", encoding="utf-8") as f:
    text = f.read()
print(text)

This is a plain text file.
It has multiple lines.
We will read it line by line and count words.
¡Incluye caracteres con acentos como: corazón! 😀



In [29]:
# Reading line-by-line
with open(data_dir + "notes.txt", "r", encoding="utf-8") as f:
    for i, line in enumerate(f, start=1):
        print(f"Line {i}: {line.strip()}")

Line 1: This is a plain text file.
Line 2: It has multiple lines.
Line 3: We will read it line by line and count words.
Line 4: ¡Incluye caracteres con acentos como: corazón! 😀
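Chunked reading, the third strategy from the list at the top, matters for files too large to load with a single `read()` call. Here is a minimal sketch using a hypothetical generator `read_in_chunks` that calls `f.read(size)` until it gets an empty string; the chunk size of 5 is deliberately tiny so the pieces are visible, and the temporary file is only for the demo.

```python
import tempfile
from pathlib import Path

def read_in_chunks(path, chunk_size=5):
    """Yield a text file's content chunk_size characters at a time (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as f:
        while True:
            chunk = f.read(chunk_size)  # in text mode, the size counts characters
            if not chunk:               # an empty string means end of file
                break
            yield chunk

with tempfile.TemporaryDirectory() as tmp:
    chunk_path = Path(tmp) / "chunks.txt"
    chunk_path.write_text("Hello, file I/O!\n", encoding="utf-8")
    chunks = list(read_in_chunks(chunk_path))  # consume while the file still exists

print(chunks)  # ['Hello', ', fil', 'e I/O', '!\n']
```

In binary mode (`'rb'`) the same loop works with `size` counted in bytes, which is the usual way to copy or hash large files without loading them whole.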


In [30]:
# Now that we know functions, let's write one that reads and prints a whole file so we can keep reusing it.
# This function returns nothing (implicitly None); it only prints. Note that it has no `return` statement.
def read_and_print_file(path):
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    print(text)

## 3) Writing and appending text


In [31]:
# Writing (w) overwrites or creates a file
out_path = data_dir + "output.txt"
with open(out_path, "w", encoding="utf-8") as f:
    f.write("Hello from Python file I/O!\n")
    f.write("Second line.\n")


In [32]:
read_and_print_file(out_path)

Hello from Python file I/O!
Second line.



In [33]:
# Append (a) adds to the end
with open(out_path, "a", encoding="utf-8") as f:
    f.write("Appended line.\n")

In [34]:
read_and_print_file(out_path)

Hello from Python file I/O!
Second line.
Appended line.
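Two other ways to write text are `writelines()` and `print(..., file=f)`. A short sketch in a temporary file (names are illustrative) showing the difference that most often trips people up: `writelines()` does not add line breaks, while `print()` does.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    lines_path = Path(tmp) / "lines.txt"
    with open(lines_path, "w", encoding="utf-8") as f:
        f.writelines(word + "\n" for word in ["alpha", "beta"])  # writelines adds no newlines itself
        print("gamma", file=f)                                     # print() appends "\n" for you
    with open(lines_path, "a", encoding="utf-8") as f:
        f.write("delta\n")                                         # 'a' keeps the earlier lines
    written_lines = lines_path.read_text(encoding="utf-8").splitlines()

print(written_lines)  # ['alpha', 'beta', 'gamma', 'delta']
```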




## 4) Working with CSV (comma-separated values)

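Use the built-in `csv` module rather than splitting lines on commas by hand: it correctly handles quoted fields that contain commas. Open CSV files with `newline=""` so the module can manage line endings itself.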
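Why not just `line.split(",")`? A field can legitimately contain a comma, and the `csv` module quotes it on write and unquotes it on read. This sketch uses an in-memory `io.StringIO` buffer instead of a real file; the `comment` column and its text are made up for the example.

```python
import csv
import io

table = [["id", "name", "comment"], ["1", "Ana", "Clear, concise notes"]]

buffer = io.StringIO(newline="")  # an in-memory "file", so no disk access is needed
csv.writer(buffer).writerows(table)
csv_text = buffer.getvalue()
print(repr(csv_text))  # 'id,name,comment\r\n1,Ana,"Clear, concise notes"\r\n'

# Naive splitting breaks the quoted field in two; csv.reader keeps it whole
naive = csv_text.splitlines()[1].split(",")
parsed = list(csv.reader(io.StringIO(csv_text, newline="")))
print(len(naive), parsed[1])  # 4 ['1', 'Ana', 'Clear, concise notes']
```

`csv.writer` ends rows with `\r\n` by default (its `lineterminator`), which is one reason the files in this section are opened with `newline=""`: it stops Python from translating those line endings a second time.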


In [36]:
import csv

csv_path = data_dir + "students.csv"

# Read CSV as dictionaries
with open(csv_path, "r", encoding="utf-8", newline="") as f:
    reader = csv.DictReader(f)
    rows = list(reader)

rows

[{'id': '1', 'name': 'Ana', 'age': '22', 'grade': '4.5'},
 {'id': '2', 'name': 'Luis', 'age': '24', 'grade': '3.9'},
 {'id': '3', 'name': 'Sofía', 'age': '21', 'grade': '4.2'},
 {'id': '4', 'name': 'Karla', 'age': '23', 'grade': '4.8'},
 {'id': '5', 'name': 'Marcos', 'age': '25', 'grade': '3.7'}]
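Notice that every value in `rows` is a string (`'22'`, `'4.5'`), because CSV has no types. A small sketch of converting columns after reading, using two rows copied from the output above and a hypothetical `converters` mapping from column name to type:

```python
import csv
import io

# Two rows copied from students.csv above; DictReader yields every value as a str
sample_csv = "id,name,age,grade\n1,Ana,22,4.5\n2,Luis,24,3.9\n"
raw_rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Which columns to convert, and how (any column not listed stays a str)
converters = {"id": int, "age": int, "grade": float}
typed_rows = [
    {column: converters.get(column, str)(value) for column, value in row.items()}
    for row in raw_rows
]

print(repr(raw_rows[0]["grade"]), typed_rows[0]["grade"])  # '4.5' 4.5
print(typed_rows[0])  # {'id': 1, 'name': 'Ana', 'age': 22, 'grade': 4.5}
```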

In [37]:
# Write a filtered CSV: keep only students with grade >= 4.0 (CSV values are strings, hence float())
passed_path = data_dir + "passed_students.csv"
with open(passed_path, "w", encoding="utf-8", newline="") as f:
    fieldnames = ["id", "name", "age", "grade"]
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for r in rows:
        if float(r["grade"]) >= 4.0:
            writer.writerow(r)


In [38]:
read_and_print_file(passed_path)

id,name,age,grade
1,Ana,22,4.5
3,Sofía,21,4.2
4,Karla,23,4.8




## 5) JSON and JSON Lines

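- Use the `json` module to read and write JSON: `json.load()`/`json.dump()` work with open file objects, `json.loads()`/`json.dumps()` with strings.
- JSON Lines (`.jsonl`) stores one JSON object per line, so a file can be read or appended to one record at a time.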
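The string functions are handy for seeing exactly what `dump` would write. This sketch serializes one record from `data.json` twice to show what the `ensure_ascii=False` argument used in the export cell changes: by default, non-ASCII characters such as `í` are escaped as `\u00ed`.

```python
import json

record = {"id": 3, "name": "Sofía", "passed": True, "grade": 4.2}  # one student from data.json

escaped = json.dumps(record)                       # default ensure_ascii=True escapes non-ASCII
readable = json.dumps(record, ensure_ascii=False)  # keeps "í" as-is
print(escaped)   # {"id": 3, "name": "Sof\u00eda", "passed": true, "grade": 4.2}
print(readable)  # {"id": 3, "name": "Sofía", "passed": true, "grade": 4.2}

# Both spellings decode back to the same Python dict
print(json.loads(escaped) == json.loads(readable) == record)  # True
```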


In [41]:
import json

# Read JSON
with open(data_dir + "data.json", "r", encoding="utf-8") as f:
    data = json.load(f)

data

{'project': 'file-io-demo',
 'version': 1,
 'students': [{'id': 1, 'name': 'Ana', 'passed': True, 'grade': 4.5},
  {'id': 2, 'name': 'Luis', 'passed': True, 'grade': 3.9},
  {'id': 3, 'name': 'Sofía', 'passed': True, 'grade': 4.2},
  {'id': 4, 'name': 'Karla', 'passed': True, 'grade': 4.8},
  {'id': 5, 'name': 'Marcos', 'passed': False, 'grade': 3.7}]}

In [42]:
data["project"], len(data["students"])

('file-io-demo', 5)

In [43]:
# Write JSON (pretty)
export_path = data_dir + "export.json"
with open(export_path, "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)


In [45]:
read_and_print_file(export_path)

{
  "project": "file-io-demo",
  "version": 1,
  "students": [
    {
      "id": 1,
      "name": "Ana",
      "passed": true,
      "grade": 4.5
    },
    {
      "id": 2,
      "name": "Luis",
      "passed": true,
      "grade": 3.9
    },
    {
      "id": 3,
      "name": "Sofía",
      "passed": true,
      "grade": 4.2
    },
    {
      "id": 4,
      "name": "Karla",
      "passed": true,
      "grade": 4.8
    },
    {
      "id": 5,
      "name": "Marcos",
      "passed": false,
      "grade": 3.7
    }
  ]
}


In [46]:
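# Read JSON Lines: parse each non-empty line as its own JSON object
jsonl_path = data_dir + "reviews.jsonl"
with open(jsonl_path, "r", encoding="utf-8") as f:
    reviews = [json.loads(line) for line in f if line.strip()]  # skip blank lines (e.g. a trailing one)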

reviews[:2]

[{'user': 'u1', 'stars': 5, 'comment': 'Great course!'},
 {'user': 'u2', 'stars': 4, 'comment': 'Very helpful'}]
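Writing JSON Lines is the mirror image: `json.dumps()` each record without `indent` and add a newline. A sketch that writes the two reviews shown above to a temporary file (the file name is hypothetical) and reads them back:

```python
import json
import tempfile
from pathlib import Path

# The two reviews shown above
new_reviews = [
    {"user": "u1", "stars": 5, "comment": "Great course!"},
    {"user": "u2", "stars": 4, "comment": "Very helpful"},
]

with tempfile.TemporaryDirectory() as tmp:
    jsonl_out = Path(tmp) / "reviews_copy.jsonl"
    with open(jsonl_out, "w", encoding="utf-8") as f:
        for review in new_reviews:
            # Without indent, dumps() produces a single line (newlines inside strings are escaped)
            f.write(json.dumps(review, ensure_ascii=False) + "\n")
    file_lines = jsonl_out.read_text(encoding="utf-8").splitlines()

print(file_lines[0])  # {"user": "u1", "stars": 5, "comment": "Great course!"}
print([json.loads(line) for line in file_lines] == new_reviews)  # True
```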


## 6) TSV (tab-separated values)

Same as CSV but with a tab delimiter (`\t`).


In [50]:
import csv

tsv_path = data_dir + "table.tsv"
with open(tsv_path, "r", encoding="utf-8", newline="") as f:
    reader = csv.reader(f, delimiter="\t")
    tsv_rows = list(reader)

tsv_rows

[['colA', 'colB', 'colC'], ['1', '2', '3'], ['4', '5', '6']]
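Because only the delimiter changes, `csv.DictReader` works on TSV as well and returns one dictionary per row, keyed by the header. The sketch below inlines the same content as `table.tsv` in a string so it runs on its own; the printed form assumes Python 3.8+, where `DictReader` yields plain `dict`s.

```python
import csv
import io

# Same content as table.tsv above, inlined so the example is self-contained
sample_tsv = "colA\tcolB\tcolC\n1\t2\t3\n4\t5\t6\n"

# DictReader accepts the same delimiter argument as csv.reader
tsv_dicts = list(csv.DictReader(io.StringIO(sample_tsv), delimiter="\t"))
print(tsv_dicts[0])  # {'colA': '1', 'colB': '2', 'colC': '3'}
```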


## 7) YAML (quick peek without external libs)

Python's standard library has no YAML parser, and the third-party `PyYAML` package is not guaranteed to be installed, so here we simply read the file as plain text.


In [52]:
yaml_path = data_dir + "config.yaml"
read_and_print_file(yaml_path)

app:
  name: file-io-demo
  debug: true
database:
  host: localhost
  port: 5432
  user: admin
  password: secret
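When `PyYAML` is available, `yaml.safe_load` is the usual way to get the values. Without it, a very small subset like `config.yaml` (nested `key: value` maps, no lists or quoting) can be parsed by hand. The helpers `parse_simple_yaml` and `convert_scalar` below are hypothetical and only a teaching sketch, not a YAML parser: lists, quoted strings, multi-line values, and most of the spec are ignored.

```python
def convert_scalar(value):
    """Turn a YAML-looking scalar into bool or int, or leave it as a str."""
    if value in ("true", "false"):
        return value == "true"
    try:
        return int(value)
    except ValueError:
        return value


def parse_simple_yaml(text):
    """Parse nested 'key: value' maps indented with spaces (hypothetical helper).

    Not a real YAML parser: lists, quotes, multi-line strings, and inline
    comments are not supported.
    """
    root = {}
    stack = [(-1, root)]  # (indentation, mapping) pairs; the innermost mapping is last
    for raw_line in text.splitlines():
        stripped = raw_line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and whole-line comments
        indent = len(raw_line) - len(raw_line.lstrip(" "))
        key, _, value = stripped.partition(":")
        # Close any mappings indented at least as deep as this line
        while indent <= stack[-1][0]:
            stack.pop()
        parent = stack[-1][1]
        if value.strip():
            parent[key.strip()] = convert_scalar(value.strip())
        else:  # "key:" with nothing after it opens a nested mapping
            child = {}
            parent[key.strip()] = child
            stack.append((indent, child))
    return root


# Same content as config.yaml above
config_text = """\
app:
  name: file-io-demo
  debug: true
database:
  host: localhost
  port: 5432
  user: admin
  password: secret
"""
config = parse_simple_yaml(config_text)
print(config["app"]["debug"], config["database"]["port"] + 1)  # True 5433
```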




## 8) Binary files (brief)

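Use mode `'rb'` or `'wb'` for binary read/write (e.g., images, audio). Binary mode works with `bytes` instead of `str`, does no newline translation, and takes no `encoding` argument. Here we just write some bytes.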


In [56]:
binary_path = data_dir + "bytes.bin"
with open(binary_path, "wb") as f:
    f.write(b"\x00\x01\x02\x03\xff")
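Reading the bytes back shows what binary mode returns: a `bytes` object whose items are integers from 0 to 255. The sketch writes to a temporary copy (hypothetical name) so it does not touch `sample_files/`.

```python
import tempfile
from pathlib import Path

payload = b"\x00\x01\x02\x03\xff"  # the same bytes written above

with tempfile.TemporaryDirectory() as tmp:
    bin_path = Path(tmp) / "bytes_copy.bin"
    with open(bin_path, "wb") as f:  # binary mode: no encoding argument, no newline translation
        f.write(payload)
    with open(bin_path, "rb") as f:
        loaded = f.read()

print(type(loaded).__name__)  # bytes
print(loaded[0], loaded[-1])  # 0 255  (indexing a bytes object gives ints)
print(loaded.hex())           # 00010203ff
```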


## 9) `pathlib` tips

- `Path.exists()`, `is_file()`, `is_dir()`
- `Path.read_text()/write_text()` and `read_bytes()/write_bytes()`
- `glob()` and `rglob()` to find files


In [60]:
from pathlib import Path
data_dir = Path("sample_files")  # note: data_dir is now a Path, so join with "/" instead of "+"
# List all .csv files directly inside sample_files/
print(list(data_dir.glob("*.csv")))

# Read/write helpers
(data_dir / "scratch.txt").write_text("hello", encoding="utf-8")
(data_dir / "scratch.txt").read_text(encoding="utf-8")


[PosixPath('sample_files/passed_students.csv'), PosixPath('sample_files/students.csv')]


'hello'
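The `exists()`/`is_file()`/`is_dir()` checks and `rglob()` from the list above are not used in that cell, so here is a small sketch in a temporary directory with made-up file names. It also shows building paths with the `/` operator and the difference between `glob()` (one folder) and `rglob()` (recursive).

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "nested").mkdir()
    (base / "top.csv").write_text("a,b\n", encoding="utf-8")
    (base / "nested" / "inner.csv").write_text("c,d\n", encoding="utf-8")

    checks = ((base / "top.csv").is_file(), (base / "nested").is_dir(), (base / "nope.txt").exists())
    top_level = sorted(p.name for p in base.glob("*.csv"))  # this folder only
    recursive = sorted(p.relative_to(base).as_posix() for p in base.rglob("*.csv"))  # subfolders too

print(checks)     # (True, True, False)
print(top_level)  # ['top.csv']
print(recursive)  # ['nested/inner.csv', 'top.csv']
```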


## 10) Practice Exercises

1) **Read Lines & Count**: Open `notes.txt` and count the number of lines and total words. Print both.
2) **Copy File**: Read `notes.txt` and write a copy to `notes_copy.txt` (same content).
3) **Filter CSV**: From `students.csv`, write a new CSV `honor_roll.csv` containing only students with grade ≥ 4.2.
4) **CSV Stats**: Read `students.csv` and print the average grade.
5) **Append Log**: Append a line to `output.txt` with the current run number (e.g., `Run #N`), where N increments each time.
6) **JSON Transform**: Load `data.json`, add a new field `exported_at` with any timestamp string, and save to `data_with_ts.json` (pretty).
7) **JSONL Filter**: From `reviews.jsonl`, create `positive_reviews.jsonl` keeping only objects where `stars >= 4`.
8) **TSV to CSV**: Read `table.tsv` and save it as `table_converted.csv`.
9) **Word Replace**: Read `notes.txt`, replace the word `file` with `document`, and save to `notes_replaced.txt`.
10) **Safe Reader**: Write a function `safe_read(path)` that returns file content or `None` if the file does not exist (handle exceptions).
11) **Binary Size**: Open `bytes.bin` in binary mode and print how many bytes it contains.
12) **Glob**: Use `pathlib` to list all files in `sample_files/` whose name contains `students`.
