# Topic 18: File Handling and I/O Operations

## Overview
File handling is essential for data persistence and processing. Learn to read, write, and manipulate files effectively in Python.

### What You'll Learn:
- Opening and closing files
- Reading and writing text and binary files
- File modes and encoding
- Context managers and the 'with' statement
- File system operations
- Working with different file formats

---

## 1. Basic File Operations

Opening, reading, writing, and closing files:

In [None]:
# Basic file operations
print("Basic File Operations:")
print("=" * 22)

# Writing to a file
print("1. Writing to a file:")
file_content = """Hello, World!
This is a sample text file.
It contains multiple lines.
Python file handling is powerful!

Numbers: 1, 2, 3, 4, 5
Special characters: @#$%^&*()
"""

# Method 1: Manual open/close (not recommended: if write() raises, close() is never called)
print("   Method 1 - Traditional (not recommended):")
file = open('sample.txt', 'w')
file.write(file_content)
file.close()
print("   File written and closed manually")

# Method 2: Using with statement (recommended)
print("   Method 2 - With statement (recommended):")
with open('sample_with.txt', 'w') as file:
    file.write(file_content)
print("   File written using with statement (auto-closed)")

# Reading from a file
print("\n2. Reading from a file:")

# Read entire file
with open('sample.txt', 'r') as file:
    content = file.read()
print(f"   Entire file content ({len(content)} characters):")
print(f"   First 50 chars: {repr(content[:50])}...")

# Read line by line
print("\n   Reading line by line:")
with open('sample.txt', 'r') as file:
    for line_number, line in enumerate(file, start=1):
        print(f"     Line {line_number}: {repr(line)}")
        if line_number == 3:  # Show only the first 3 lines
            print("     ...")
            break

# Read all lines into a list
with open('sample.txt', 'r') as file:
    all_lines = file.readlines()
print(f"\n   Total lines read: {len(all_lines)}")
print(f"   Last line: {repr(all_lines[-1])}")

# Read specific number of characters
with open('sample.txt', 'r') as file:
    first_20_chars = file.read(20)
    next_20_chars = file.read(20)
print(f"\n   First 20 chars: {repr(first_20_chars)}")
print(f"   Next 20 chars: {repr(next_20_chars)}")

# File position and seeking
print("\n3. File position and seeking:")
with open('sample.txt', 'r') as file:
    print(f"   Initial position: {file.tell()}")
    data = file.read(10)
    print(f"   After reading 10 chars: {file.tell()}")
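    file.seek(0)  # Go back to beginning
    print(f"   After seek(0): {file.tell()}")
    # In text mode, only 0 or a value returned by tell() is a guaranteed-valid
    # offset; seek(5) is safe here only because the file is pure ASCII
    file.seek(5)  # Go to position 5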
    remaining = file.read(15)
    print(f"   Reading from position 5: {repr(remaining)}")

# Appending to a file
print("\n4. Appending to a file:")
append_content = "\nThis line was appended later.\nAnother appended line.\n"
with open('sample.txt', 'a') as file:
    file.write(append_content)
print("   Content appended successfully")

# Verify the append
with open('sample.txt', 'r') as file:
    lines = file.readlines()
print(f"   Total lines after append: {len(lines)}")
print(f"   Last two lines: {[line.strip() for line in lines[-2:]]}")
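
Method 1 is discouraged because nothing closes the file if `write()` raises. The sketch below shows what the `with` statement guarantees: the first half writes out the equivalent `try`/`finally` by hand, and the second half simulates a failure inside a `with` block (the `RuntimeError` is artificial) and checks that the handle was still closed. The temporary path only keeps the example self-contained.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# Hand-written equivalent of `with open(...) as f:`. close() sits in a
# finally clause, so it runs even if write() raises.
f = open(path, 'w', encoding='utf-8')
try:
    f.write("Hello, World!\n")
finally:
    f.close()

# The with statement gives the same guarantee when its body fails.
try:
    with open(path, 'a', encoding='utf-8') as g:
        g.write("partial line")
        raise RuntimeError("simulated failure mid-write")
except RuntimeError as exc:
    print(f"Caught: {exc}")

print(f"Handle closed after error: {g.closed}")  # True

# Closing flushed the buffered text, so the partial write reached the file.
with open(path, encoding='utf-8') as h:
    content = h.read()
print(repr(content))  # 'Hello, World!\npartial line'
```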

## 2. File Modes and Encoding

Understanding different file modes and character encoding:

In [None]:
# File modes and encoding
print("File Modes and Encoding:")
print("=" * 25)

# Different file modes
print("1. File modes:")
file_modes = {
    'r': 'Read only (default)',
    'w': 'Write only (truncates existing file)',
    'a': 'Append only (creates the file if missing)',
    'x': 'Create new file (fails if exists)',
    'r+': 'Read and write (file must exist; not truncated)',
    'w+': 'Read and write (truncates)',
    'a+': 'Read and append',
    'rb': 'Read binary',
    'wb': 'Write binary',
    'ab': 'Append binary'
}

for mode, description in file_modes.items():
    print(f"   '{mode}': {description}")

# Demonstrate different modes
print("\n2. Mode demonstrations:")

# Write mode (creates new or overwrites)
with open('mode_test.txt', 'w') as file:
    file.write("Original content\n")
print("   'w' mode: File created with original content")

# Append mode
with open('mode_test.txt', 'a') as file:
    file.write("Appended content\n")
print("   'a' mode: Content appended")

# Read mode
with open('mode_test.txt', 'r') as file:
    content = file.read()
print(f"   'r' mode: Read content:\n{repr(content)}")

# r+ mode (read and write)
with open('mode_test.txt', 'r+') as file:
    original = file.read()
    file.seek(0, 2)  # Go to end of file (whence=2 is os.SEEK_END)
    file.write("Added with r+\n")
    file.seek(0)  # Go to beginning
    modified = file.read()
print(f"   'r+' mode: Added content at end; file now reads:\n{repr(modified)}")

# x mode (exclusive creation)
try:
    with open('exclusive_test.txt', 'x') as file:
        file.write("Created exclusively\n")
    print("   'x' mode: File created exclusively")
except FileExistsError:
    print("   'x' mode: File already exists")

# Try x mode again (should fail)
try:
    with open('exclusive_test.txt', 'x') as file:
        file.write("This won't work\n")
except FileExistsError:
    print("   'x' mode: Failed as expected (file exists)")

# Encoding examples
print("\n3. Character encoding:")

# UTF-8 encoding: pass it explicitly, because open()'s default depends on the platform
text_with_unicode = "Hello! 🐍 Python supports Unicode: αβγδε 中文 العربية 🌍"

with open('unicode_test.txt', 'w', encoding='utf-8') as file:
    file.write(text_with_unicode)
print("   UTF-8: Unicode text written")

# Read with UTF-8
with open('unicode_test.txt', 'r', encoding='utf-8') as file:
    read_unicode = file.read()
print(f"   UTF-8: Read back: {read_unicode}")

# Different encodings
encodings_to_test = ['utf-8', 'latin-1', 'ascii']
test_text = "Hello, café!"

for encoding in encodings_to_test:
    filename = f'encoding_{encoding.replace("-", "_")}.txt'
    try:
        with open(filename, 'w', encoding=encoding) as file:
            file.write(test_text)
        with open(filename, 'r', encoding=encoding) as file:
            read_back = file.read()
        print(f"   {encoding}: Success - {read_back}")
    except UnicodeEncodeError as e:
        print(f"   {encoding}: Encode error - {e}")
    except UnicodeDecodeError as e:
        print(f"   {encoding}: Decode error - {e}")

# Handle encoding errors
print("\n4. Encoding error handling:")
problematic_text = "Café naïve résumé 中文"

# Strict (default) - raises exception
try:
    with open('ascii_test.txt', 'w', encoding='ascii') as file:
        file.write(problematic_text)
except UnicodeEncodeError as e:
    print(f"   Strict mode: {e}")

# Ignore errors
with open('ascii_ignore.txt', 'w', encoding='ascii', errors='ignore') as file:
    file.write(problematic_text)
with open('ascii_ignore.txt', 'r', encoding='ascii') as file:
    ignored_result = file.read()
print(f"   Ignore errors: '{ignored_result}'")

# Replace errors
with open('ascii_replace.txt', 'w', encoding='ascii', errors='replace') as file:
    file.write(problematic_text)
with open('ascii_replace.txt', 'r', encoding='ascii') as file:
    replaced_result = file.read()
print(f"   Replace errors: '{replaced_result}'")

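# Default encodings: open() does not detect a file's encoding; it uses
# encoding= if given, otherwise a platform-dependent default
print("\n5. Default encodings:")
import locale
import sys
# open() falls back to the locale's preferred encoding when encoding= is omitted
print(f"   open() default (locale) encoding: {locale.getpreferredencoding(False)}")
# str.encode() and bytes.decode() default to UTF-8 on every platform
print(f"   str.encode()/bytes.decode() default: {sys.getdefaultencoding()}")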

# File object properties
with open('unicode_test.txt', 'r', encoding='utf-8') as file:
    print(f"   File encoding: {file.encoding}")
    print(f"   File mode: {file.mode}")
    print(f"   File name: {file.name}")
    print(f"   File closed: {file.closed}")

print(f"   File closed after context: {file.closed}")
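
Because `open(..., errors=...)` passes the handler through to the codec, the same behavior can be checked in memory with `str.encode()` and `bytes.decode()`, without writing any files. The sketch below reuses the string from the `problematic_text` example above and also shows the decoding direction, where UTF-8 bytes are misread as ASCII.

```python
text = "Café naïve résumé 中文"

# errors='strict' (the default) raises on the first unencodable character
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    print(f"strict: {exc.reason} (first bad index: {exc.start})")

ignored = text.encode('ascii', errors='ignore')    # drop unencodable characters
replaced = text.encode('ascii', errors='replace')  # substitute b'?'
print(ignored)   # b'Caf nave rsum '
print(replaced)  # b'Caf? na?ve r?sum? ??'

# Decoding has the same switch; 'replace' inserts U+FFFD for each bad byte
utf8_bytes = "café".encode('utf-8')                # b'caf\xc3\xa9'
as_ascii = utf8_bytes.decode('ascii', errors='replace')
print(as_ascii == 'caf\ufffd\ufffd')               # True
print(utf8_bytes.decode('utf-8'))                  # café
```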

## 3. Binary Files and Advanced Operations

Working with binary data and advanced file operations:

In [None]:
# Binary files and advanced operations
print("Binary Files and Advanced Operations:")
print("=" * 37)

# Working with binary files
print("1. Binary file operations:")

# Create some binary data
binary_data = bytes([0, 1, 2, 3, 255, 254, 253, 65, 66, 67])  # Mix of values and ASCII
print(f"   Binary data: {binary_data}")
print(f"   As list: {list(binary_data)}")

# Write binary data
with open('binary_test.bin', 'wb') as file:
    file.write(binary_data)
print("   Binary data written to file")

# Read binary data
with open('binary_test.bin', 'rb') as file:
    read_binary = file.read()
print(f"   Binary data read back: {read_binary}")
print(f"   Data matches: {binary_data == read_binary}")

# Working with different binary data types
print("\n2. Different binary data types:")

# Bytes from string
text = "Hello, World!"
text_bytes = text.encode('utf-8')
print(f"   Text: '{text}'")
print(f"   As bytes: {text_bytes}")
print(f"   Back to text: '{text_bytes.decode('utf-8')}'")

# Bytearray (mutable bytes)
ba = bytearray(b"mutable")
print(f"   Bytearray: {ba}")
ba[0] = ord('M')  # Change first character to 'M'
print(f"   Modified: {ba}")
print(f"   As string: '{ba.decode()}'")

# Working with integers and bytes
print("\n3. Integer and bytes conversion:")
number = 1000
number_bytes = number.to_bytes(4, byteorder='big')
print(f"   Number: {number}")
print(f"   As bytes (big-endian): {number_bytes}")
print(f"   Back to int: {int.from_bytes(number_bytes, byteorder='big')}")

# Little-endian
number_bytes_little = number.to_bytes(4, byteorder='little')
print(f"   As bytes (little-endian): {number_bytes_little}")
print(f"   Back to int: {int.from_bytes(number_bytes_little, byteorder='little')}")

# File copying (binary)
print("\n4. File copying:")

# Create a source file that mixes text with raw bytes that are not valid UTF-8
test_content = b"Mixed content: text and binary data\n" + bytes([0x00, 0x01, 0x02, 0xFF, 0xFE, 0xFD])
with open('source.bin', 'wb') as file:
    file.write(test_content)

# Copy a file in fixed-size chunks (shutil.copyfile does the same job in practice)
def copy_file(source, destination, chunk_size=1024):
    """Copy a file in binary mode, chunk_size bytes at a time"""
    with open(source, 'rb') as src, open(destination, 'wb') as dst:
        while True:
            chunk = src.read(chunk_size)  # Returns b'' at end of file
            if not chunk:
                break
            dst.write(chunk)
    print(f"   Copied {source} to {destination}")

copy_file('source.bin', 'destination.bin')

# Verify copy
with open('source.bin', 'rb') as f1, open('destination.bin', 'rb') as f2:
    original = f1.read()
    copied = f2.read()
    print(f"   Copy successful: {original == copied}")

# Working with file chunks
print("\n5. Processing large files in chunks:")

# Create a larger test file
with open('large_file.txt', 'w') as file:
    for i in range(1000):
        file.write(f"Line {i+1}: This is line number {i+1} with some content.\n")
print("   Large file created (1000 lines)")

# Process file in chunks
def process_file_in_chunks(filename, chunk_size=1024):
    """Process file in chunks to handle large files"""
    line_count = 0
    char_count = 0
    
    with open(filename, 'r') as file:
        while True:
            chunk = file.read(chunk_size)
            if not chunk:
                break
            char_count += len(chunk)
            line_count += chunk.count('\n')
    
    return line_count, char_count

lines, chars = process_file_in_chunks('large_file.txt')
print(f"   Processed: {lines} lines, {chars} characters")

# File seeking and random access
print("\n6. Random file access:")

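# Create a file with numbered, fixed-size records
record_size = 20  # Fixed record size in bytes (19 characters + newline)
num_records = 10
# newline='' stops Windows from turning '\n' into '\r\n', which would
# change the byte size of every record
with open('records.txt', 'w', encoding='ascii', newline='') as file:
    for i in range(num_records):
        record = f"Record {i:02d}:".ljust(record_size - 1) + "\n"  # Pad to fixed size
        file.write(record)

print(f"   Created file with {num_records} fixed-size records")

# Random access to specific record
def read_record(filename, record_number, record_size):
    """Read a specific record by number"""
    # Binary mode makes seek() offsets plain byte counts; in text mode only
    # 0 or values returned by tell() are guaranteed-valid offsets
    with open(filename, 'rb') as file:
        file.seek(record_number * record_size)
        return file.read(record_size).decode('ascii').strip()

for record_num in [0, 5, 9]:
    record_data = read_record('records.txt', record_num, record_size)
    print(f"   Record {record_num}: '{record_data}'")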

# Memory-mapped files (for very large files)
print("\n7. Memory-mapped files:")
import mmap

# Create a test file
with open('mmap_test.txt', 'w') as file:
    file.write("0123456789" * 100)  # 1000 characters

# Use memory mapping
with open('mmap_test.txt', 'r+b') as file:
    with mmap.mmap(file.fileno(), 0) as mmapped_file:
        print(f"   File size: {len(mmapped_file)} bytes")
        print(f"   First 20 bytes: {mmapped_file[:20]}")
        
        # Modify through memory map
        mmapped_file[0:5] = b'HELLO'
        mmapped_file.flush()
        
        print(f"   After modification: {mmapped_file[:20]}")

# Verify modification
with open('mmap_test.txt', 'r') as file:
    content = file.read(20)
    print(f"   File content after mmap: '{content}'")

print("\nBinary file operations completed!")
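
The `int.to_bytes()` calls above convert one integer at a time. For records with several fields, the standard library's `struct` module packs them with a single format string. A minimal sketch; the `'<Id'` layout (a 32-bit ID followed by a 64-bit float), the helper names, and the sample readings are all made up for illustration:

```python
import os
import struct
import tempfile

# Hypothetical record layout: '<' = little-endian with no padding,
# 'I' = unsigned 32-bit int (4 bytes), 'd' = 64-bit float (8 bytes)
RECORD = struct.Struct('<Id')

def write_readings(path, readings):
    """Write (sensor_id, value) pairs as fixed-size binary records."""
    with open(path, 'wb') as f:
        for sensor_id, value in readings:
            f.write(RECORD.pack(sensor_id, value))

def read_readings(path):
    """Read every record; iter_unpack walks the buffer RECORD.size bytes at a time."""
    with open(path, 'rb') as f:
        data = f.read()
    return list(RECORD.iter_unpack(data))

def read_reading(path, index):
    """Random access: seek straight to record `index` by byte offset."""
    with open(path, 'rb') as f:
        f.seek(index * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))

readings = [(1, 20.5), (2, 21.25), (3, 19.75)]
path = os.path.join(tempfile.mkdtemp(), 'readings.bin')
write_readings(path, readings)

print(RECORD.size)            # 12
print(os.path.getsize(path))  # 36
print(read_readings(path))    # [(1, 20.5), (2, 21.25), (3, 19.75)]
print(read_reading(path, 2))  # (3, 19.75)
```

A fixed `Struct` makes every record the same size, which is what lets `read_reading()` seek directly to a record, just like the fixed-width text records above.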

## 4. File System Operations

Working with directories, paths, and file metadata:

In [None]:
# File system operations
import os
import pathlib
import glob
import shutil
import stat
import time
from datetime import datetime

print("File System Operations:")
print("=" * 23)

# Working with paths
print("1. Path operations:")

# Using os.path (traditional)
current_dir = os.getcwd()
parent_dir = os.path.dirname(current_dir)
print(f"   Current directory: {os.path.basename(current_dir)}")
print(f"   Parent directory: {os.path.basename(parent_dir)}")

# Join paths safely
test_path = os.path.join(current_dir, 'test_folder', 'subdir', 'file.txt')
print(f"   Joined path: {test_path}")

# Path components
print(f"   Directory: {os.path.dirname(test_path)}")
print(f"   Filename: {os.path.basename(test_path)}")
print(f"   Extension: {os.path.splitext(test_path)[1]}")
print(f"   Without extension: {os.path.splitext(test_path)[0]}")

# Using pathlib (modern approach)
print("\n2. Pathlib (modern approach):")
path = pathlib.Path.cwd()
print(f"   Current path: {path.name}")
print(f"   Parent: {path.parent.name}")
print(f"   Home directory: {pathlib.Path.home()}")

# Path operations with pathlib
test_pathlib = pathlib.Path('test_folder') / 'subdir' / 'file.txt'
print(f"   Pathlib joined: {test_pathlib}")
print(f"   Parts: {test_pathlib.parts}")
print(f"   Suffix: {test_pathlib.suffix}")
print(f"   Stem: {test_pathlib.stem}")
print(f"   Is absolute: {test_pathlib.is_absolute()}")

# Directory operations
print("\n3. Directory operations:")

# Create directory structure
test_dir = pathlib.Path('test_directory_structure')
subdirs = ['subdir1', 'subdir2', 'subdir1/nested']

for subdir in subdirs:
    full_path = test_dir / subdir
    full_path.mkdir(parents=True, exist_ok=True)
    print(f"   Created: {full_path}")

# Create some test files
test_files = [
    test_dir / 'file1.txt',
    test_dir / 'file2.py',
    test_dir / 'subdir1' / 'nested_file.txt',
    test_dir / 'subdir2' / 'another_file.py'
]

for file_path in test_files:
    file_path.write_text(f"Content of {file_path.name}")
    print(f"   Created file: {file_path}")

# List directory contents
print("\n4. Directory listing:")

# Using os.listdir
contents = os.listdir(test_dir)
print(f"   os.listdir: {contents}")

# Using pathlib
print("   pathlib contents:")
for item in test_dir.iterdir():
    item_type = "DIR" if item.is_dir() else "FILE"
    print(f"     {item_type}: {item.name}")

# Recursive directory walking
print("\n5. Recursive directory walking:")

# Using os.walk
print("   os.walk:")
for root, dirs, files in os.walk(test_dir):
    level = root.replace(str(test_dir), '').count(os.sep)
    indent = '  ' * (level + 1)
    print(f"{indent}{os.path.basename(root)}/")
    subindent = '  ' * (level + 2)
    for file in files:
        print(f"{subindent}{file}")

# Using pathlib recursively
print("\n   pathlib recursive:")
for item in test_dir.rglob('*'):
    if item.is_file():
        relative = item.relative_to(test_dir)
        print(f"     FILE: {relative}")

# Pattern matching with glob
print("\n6. Pattern matching:")

# Find all .py files
py_files = list(test_dir.glob('**/*.py'))
print(f"   Python files: {[f.name for f in py_files]}")

# Find all .txt files
txt_files = list(test_dir.rglob('*.txt'))
print(f"   Text files: {[f.name for f in txt_files]}")

# Using the glob module (imported at the top of this cell)
all_py_files = glob.glob('**/*.py', recursive=True)
print(f"   .py files under the current directory (recursive): {len(all_py_files)}")

# File metadata and properties
print("\n7. File metadata:")

test_file = test_dir / 'file1.txt'
if test_file.exists():
    # Using pathlib
    stat_info = test_file.stat()
    print(f"   File: {test_file.name}")
    print(f"   Size: {stat_info.st_size} bytes")
    print(f"   Modified: {datetime.fromtimestamp(stat_info.st_mtime)}")
    print(f"   Is file: {test_file.is_file()}")
    print(f"   Is directory: {test_file.is_dir()}")
    print(f"   Permissions: {oct(stat_info.st_mode)} ({stat.filemode(stat_info.st_mode)})")
    
    # Using os.path
    print(f"   Exists (os.path): {os.path.exists(test_file)}")
    print(f"   Size (os.path): {os.path.getsize(test_file)} bytes")
    print(f"   Modified (os.path): {datetime.fromtimestamp(os.path.getmtime(test_file))}")

# File permissions
print("\n8. File permissions:")
if test_file.exists():
    # Get current permissions
    current_mode = test_file.stat().st_mode
    print(f"   Current permissions: {oct(current_mode)}")
    
    # Check specific permissions
    print(f"   Is readable: {os.access(test_file, os.R_OK)}")
    print(f"   Is writable: {os.access(test_file, os.W_OK)}")
    print(f"   Is executable: {os.access(test_file, os.X_OK)}")
    
    # Change permissions (on Windows, chmod() can only toggle the read-only flag)
    try:
        test_file.chmod(0o644)  # rw-r--r--
        print("   Permissions changed to 644")
    except PermissionError:
        print("   Permission denied changing file permissions")

# File operations
print("\n9. File operations:")

# Copy files
source_file = test_dir / 'file1.txt'
dest_file = test_dir / 'file1_copy.txt'

# Using shutil
shutil.copy2(source_file, dest_file)  # copy2 preserves metadata
print(f"   Copied {source_file.name} to {dest_file.name}")

# Move/rename files
old_name = dest_file
new_name = test_dir / 'renamed_file.txt'
shutil.move(old_name, new_name)
print(f"   Moved {old_name.name} to {new_name.name}")

# Delete files and directories
print("\n10. Cleanup operations:")

# Delete a file
if new_name.exists():
    new_name.unlink()  # pathlib way
    print(f"   Deleted file: {new_name.name}")

# Delete directory and contents
if test_dir.exists():
    shutil.rmtree(test_dir)  # Recursive delete
    print(f"   Deleted directory tree: {test_dir.name}")

# Temporary files and directories
print("\n11. Temporary files:")
import tempfile

# Temporary file
with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.txt') as tmp:
    tmp.write("Temporary content")
    tmp_name = tmp.name
print(f"   Created temporary file: {os.path.basename(tmp_name)}")

# Read from temporary file
with open(tmp_name, 'r') as tmp:
    content = tmp.read()
print(f"   Temporary file content: '{content}'")

# Clean up temporary file
os.unlink(tmp_name)
print(f"   Temporary file deleted")

# Temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    tmp_path = pathlib.Path(tmp_dir)
    test_file = tmp_path / 'test.txt'
    test_file.write_text('Test content')
    print(f"   Temporary directory: {tmp_path.name}")
    print(f"   Contents: {list(tmp_path.iterdir())}")

print(f"   Temporary directory automatically deleted")

print("\nFile system operations completed!")
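
The pathlib calls used above (`rglob()`, `suffix`, `stat()`) compose into small reporting tools. A minimal sketch of a hypothetical `summarize_by_suffix()` helper that counts files and total bytes per extension; the sample files are made up and exist only inside a `TemporaryDirectory`:

```python
import pathlib
import tempfile
from collections import defaultdict

def summarize_by_suffix(root):
    """Return {suffix: (file_count, total_bytes)} for every file under root."""
    counts = defaultdict(int)
    sizes = defaultdict(int)
    for path in pathlib.Path(root).rglob('*'):
        if path.is_file():
            key = path.suffix or '(none)'  # Files like README have no suffix
            counts[key] += 1
            sizes[key] += path.stat().st_size
    return {key: (counts[key], sizes[key]) for key in sorted(counts)}

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / 'nested').mkdir()
    (root / 'a.txt').write_text('abc', encoding='utf-8')               # 3 bytes
    (root / 'nested' / 'b.txt').write_text('hello', encoding='utf-8')  # 5 bytes
    (root / 'tool.py').write_text('print(1)', encoding='utf-8')        # 8 bytes
    (root / 'README').write_text('hi', encoding='utf-8')               # 2 bytes
    summary = summarize_by_suffix(root)

print(summary)  # {'(none)': (1, 2), '.py': (1, 8), '.txt': (2, 8)}
```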

## 5. Working with Different File Formats

Handling CSV, JSON, and other common file formats:

In [None]:
# Working with different file formats
import csv
import json
import configparser
import pickle
from datetime import datetime, date

print("Working with Different File Formats:")
print("=" * 36)

# 1. CSV Files
print("1. CSV (Comma-Separated Values):")

# Sample data
employees = [
    ['Name', 'Age', 'Department', 'Salary'],
    ['Alice Johnson', 28, 'Engineering', 75000],
    ['Bob Smith', 35, 'Marketing', 65000],
    ['Charlie Brown', 42, 'Sales', 70000],
    ['Diana Wilson', 31, 'Engineering', 80000],
    ['Eve Davis', 29, 'HR', 60000]
]

# Write CSV file
with open('employees.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(employees)
print("   CSV file written with employee data")

# Read CSV file
with open('employees.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)
    csv_data = list(reader)
    print(f"   Read {len(csv_data)} rows from CSV")
    print(f"   Header: {csv_data[0]}")
    print(f"   First employee: {csv_data[1]}")

# Working with CSV DictReader/DictWriter
print("\n   Using DictReader/DictWriter:")

# Write using DictWriter
employee_dicts = [
    {'name': 'Frank Miller', 'age': 33, 'dept': 'IT', 'salary': 72000},
    {'name': 'Grace Lee', 'age': 27, 'dept': 'Design', 'salary': 68000},
    {'name': 'Henry Kim', 'age': 39, 'dept': 'Finance', 'salary': 75000}
]

with open('employees_dict.csv', 'w', newline='') as csvfile:
    fieldnames = ['name', 'age', 'dept', 'salary']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(employee_dicts)
print("   Dictionary-based CSV written")

# Read using DictReader
with open('employees_dict.csv', 'r', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        # DictReader yields every value as a str, e.g. row['age'] == '33'
        print(f"     {row['name']}: {row['age']} years, {row['dept']}, ${row['salary']}")

# 2. JSON Files
print("\n2. JSON (JavaScript Object Notation):")

# Sample data structure
company_data = {
    'company': 'Tech Corp',
    'founded': 2010,
    'employees': [
        {
            'id': 1,
            'name': 'Alice Johnson',
            'position': 'Senior Developer',
            'skills': ['Python', 'JavaScript', 'SQL'],
            'active': True,
            'hire_date': '2020-01-15'
        },
        {
            'id': 2,
            'name': 'Bob Smith',
            'position': 'Product Manager',
            'skills': ['Project Management', 'Analytics'],
            'active': True,
            'hire_date': '2019-03-22'
        }
    ],
    'locations': {
        'headquarters': 'San Francisco',
        'offices': ['New York', 'London', 'Tokyo']
    }
}

# Write JSON file
with open('company.json', 'w') as jsonfile:
    json.dump(company_data, jsonfile, indent=2)
print("   JSON file written with company data")

# Read JSON file
with open('company.json', 'r') as jsonfile:
    loaded_data = json.load(jsonfile)
    print(f"   Company: {loaded_data['company']}")
    print(f"   Number of employees: {len(loaded_data['employees'])}")
    print(f"   First employee: {loaded_data['employees'][0]['name']}")

# JSON with custom encoder (for dates)
class DateTimeEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (datetime, date)):
            return obj.isoformat()
        return super().default(obj)

data_with_dates = {
    'timestamp': datetime.now(),
    'date': date.today(),
    'info': 'Data with dates'
}

with open('data_with_dates.json', 'w') as jsonfile:
    json.dump(data_with_dates, jsonfile, cls=DateTimeEncoder, indent=2)
print("   JSON with custom date encoding written")

# Pretty print JSON
print("   Pretty-printed JSON sample:")
print(json.dumps(loaded_data['employees'][0], indent=2))

# 3. Configuration Files (INI format)
print("\n3. Configuration Files (INI format):")

# Create configuration
config = configparser.ConfigParser()
config['DEFAULT'] = {
    'debug': 'False',
    'timeout': '30'
}
config['database'] = {
    'host': 'localhost',
    'port': '5432',
    'name': 'myapp',
    'user': 'admin'
}
config['api'] = {
    'base_url': 'https://api.example.com',
    'version': 'v2',
    'timeout': '60'  # Overrides DEFAULT
}

# Write configuration file
with open('config.ini', 'w') as configfile:
    config.write(configfile)
print("   Configuration file written")

# Read configuration file
config_read = configparser.ConfigParser()
config_read.read('config.ini')

print(f"   Database host: {config_read['database']['host']}")
print(f"   API timeout: {config_read['api']['timeout']}")
print(f"   Default timeout: {config_read['DEFAULT']['timeout']}")

# List all sections and options
print(f"   Sections: {config_read.sections()}")
for section in config_read.sections():
    # A section's keys() also include options inherited from DEFAULT
    print(f"   {section} options: {list(config_read[section].keys())}")

# 4. Pickle (Python object serialization)
print("\n4. Pickle (Python object serialization):")

# Complex Python object
class Person:
    def __init__(self, name, age, hobbies):
        self.name = name
        self.age = age
        self.hobbies = hobbies
    
    def __repr__(self):
        return f"Person('{self.name}', {self.age}, {self.hobbies})"

complex_data = {
    'people': [
        Person('Alice', 28, ['reading', 'coding']),
        Person('Bob', 35, ['music', 'sports'])
    ],
    'numbers': [1, 2, 3, 4, 5],
    'nested': {'inner': {'deep': 'value'}},
    'timestamp': datetime.now()
}

# Write pickle file
with open('complex_data.pkl', 'wb') as picklefile:
    pickle.dump(complex_data, picklefile)
print("   Complex data pickled")

# Read pickle file
with open('complex_data.pkl', 'rb') as picklefile:
    loaded_complex = pickle.load(picklefile)
    print(f"   Loaded people: {loaded_complex['people']}")
    print(f"   Loaded timestamp: {loaded_complex['timestamp']}")
    print(f"   Object types preserved: {type(loaded_complex['people'][0])}")

print("   ⚠️  Warning: Only unpickle data from trusted sources; loading a pickle can execute arbitrary code!")

# 5. Working with large files efficiently
print("\n5. Efficient handling of large files:")

# Create a large CSV for demonstration
with open('large_data.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['id', 'value', 'category'])
    for i in range(10000):
        writer.writerow([i, f'value_{i}', f'cat_{i % 10}'])
print("   Created large CSV file (10,000 rows)")

# Process large file in chunks
def process_large_csv(filename, chunk_size=1000):
    """Process large CSV file in chunks"""
    total_rows = 0
    categories = set()
    
    with open(filename, 'r', newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        chunk = []
        
        for row in reader:
            chunk.append(row)
            categories.add(row['category'])
            
            if len(chunk) >= chunk_size:
                # Process chunk
                total_rows += len(chunk)
                chunk = []
        
        # Process remaining chunk
        if chunk:
            total_rows += len(chunk)
    
    return total_rows, len(categories)

rows, unique_cats = process_large_csv('large_data.csv')
print(f"   Processed {rows} rows with {unique_cats} unique categories")

# File format summary
print("\nFile Format Summary:")
formats = {
    'CSV': 'Tabular data, widely supported, human-readable',
    'JSON': 'Structured data, web APIs, human-readable',
    'INI': 'Configuration files, simple key-value pairs',
    'Pickle': 'Python objects, not human-readable, Python-specific',
    'Binary': 'Raw data, images, executables, most efficient',
    'Text': 'Plain text, logs, documents, human-readable'
}

for fmt, description in formats.items():
    print(f"   {fmt}: {description}")

print("\nFile format operations completed!")
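
The `DateTimeEncoder` above covers only the writing side: `json.load()` returns dates as plain strings. A minimal round-trip sketch, assuming the reader knows which keys hold dates (`DATE_KEYS` and `DATETIME_KEYS` are hypothetical names), converts them back with json's `object_hook`. It uses a `default=` function instead of a `JSONEncoder` subclass; `json.dumps` accepts either.

```python
import json
from datetime import date, datetime

def encode_dates(obj):
    """default= hook: json.dumps calls this only for objects it cannot serialize."""
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

# Hypothetical convention: the reader knows which keys hold dates
DATE_KEYS = {'hire_date'}
DATETIME_KEYS = {'timestamp'}

def decode_dates(obj):
    """object_hook: json.loads calls this for every decoded JSON object (dict)."""
    for key in obj.keys() & DATE_KEYS:
        obj[key] = date.fromisoformat(obj[key])
    for key in obj.keys() & DATETIME_KEYS:
        obj[key] = datetime.fromisoformat(obj[key])
    return obj

# Sample values are made up for illustration
record = {'name': 'Alice Johnson', 'hire_date': date(2020, 1, 15),
          'timestamp': datetime(2024, 5, 1, 9, 30)}

text = json.dumps(record, default=encode_dates)
print(text)
# {"name": "Alice Johnson", "hire_date": "2020-01-15", "timestamp": "2024-05-01T09:30:00"}

restored = json.loads(text, object_hook=decode_dates)
print(restored == record)  # True
```

JSON has no date type, so the knowledge of which fields are dates has to live in your code or schema. That is why this sketch keys the conversion on field names.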

## Summary

In this notebook, you learned about:

✅ **Basic File Operations**: Opening, reading, writing, and closing files  
✅ **File Modes**: Different modes and their purposes, encoding handling  
✅ **Binary Files**: Working with binary data and advanced operations  
✅ **File System**: Directory operations, paths, and metadata  
✅ **File Formats**: CSV, JSON, INI, Pickle, and format-specific operations  
✅ **Best Practices**: Context managers, error handling, and efficiency  

### Key Takeaways:
1. Always use context managers (with statement) for file operations
2. Specify encoding explicitly when working with text files
3. Use pathlib for modern path operations
4. Process large files in chunks to manage memory
5. Choose appropriate file formats for your data
6. Handle file operations errors gracefully
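
Takeaways 1, 2, 3, and 6 can be combined in one small helper. Here is a sketch that assumes a hypothetical `read_text_safely()` returning `None` on failure (a real program might log or re-raise instead). `Path.read_text()` opens and closes the file itself.

```python
import pathlib
import tempfile

def read_text_safely(path, encoding='utf-8'):
    """Return the file's text, or None if it is missing, unreadable, or mis-encoded."""
    try:
        # read_text() opens the file, reads it, and closes it again
        return pathlib.Path(path).read_text(encoding=encoding)
    except FileNotFoundError:
        print(f"Not found: {path}")
    except PermissionError:
        print(f"No permission to read: {path}")
    except UnicodeDecodeError as exc:
        print(f"Not valid {encoding}: {path} ({exc.reason})")
    return None

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    good = root / 'notes.txt'
    good.write_text('café', encoding='utf-8')
    bad = root / 'latin1.txt'
    bad.write_bytes('café'.encode('latin-1'))  # b'caf\xe9' is not valid UTF-8

    ok = read_text_safely(good)                       # 'café'
    missing = read_text_safely(root / 'missing.txt')  # prints "Not found: ..."
    broken = read_text_safely(bad)                    # prints "Not valid utf-8: ..."
```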

### Next Topic: 19_oop_basics.ipynb
Learn about Object-Oriented Programming fundamentals in Python.