# Introduction

First of all: **a file object and the file's content are NOT the same**. A [file object][File] is the Pythonic way of "communicating" with a file, e.g. querying its properties or managing its attributes. Reading and writing the file's content is only one of the many actions a file object supports. The file object itself is created by the built-in function [open()][open], which also sets some of the object's initial properties, such as the mode in which the file is opened.

This distinction will become more intuitive once we are better acquainted with the object-oriented approach.

[File]: https://docs.python.org/2/library/stdtypes.html#file-objects "File object"
[open]: https://docs.python.org/2/library/functions.html#open "open() documentation"

## Open and close

_File_ objects are created by the _open(name[, mode])_ built-in function, where _name_ is the path to the file (absolute, or relative to the current working directory) and _mode_ is the mode in which the file is opened. Several modes are available, but the most common ones are **'r'** for reading (default), **'w'** for writing and **'a'** for appending.

It is not a healthy habit to leave open _File_ objects "hanging", so we make sure to close them once we are done with them. The following three scripts illustrate progressively better syntaxes for working with a file.

In [3]:
fname = "example.abc"

#### open() 1

In [4]:
my_file = open(fname, 'r')
# Here do something with the file...

my_file.close()    # close the file explicitly when we are done
# my_file.closed   # evaluates to True after close()

#### open() 2

To make sure one does not forget to close the file, Python provides the **_with_** block, which **automatically closes the corresponding file** when the block ends. It is highly recommended to use it.

In [5]:
my_file = open(fname, 'r')
with my_file:
    # Here do something with the file...
    pass

#### open() 3

Finally, Python supports the following syntax to wrap it all compactly. **This is how it is usually done.**

In [6]:
with open(fname, 'r') as my_file:
    # Here do something with the file...
    pass
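To convince ourselves that the _with_ block really closes the file, we can inspect the `closed` attribute inside and after the block. The following is a minimal sketch, assuming Python 3; the temporary file `with_demo.txt` is made up for this illustration and is not one of the course files.

```python
import os
import shutil
import tempfile

# Throwaway directory and file, so the sketch does not touch real data
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "with_demo.txt")

with open(demo_path, 'w') as f:
    f.write("hello\n")
    closed_inside = f.closed    # False: the file is still open here

closed_after = f.closed         # True: leaving the block closed the file

# The file is closed even if the block is left because of an error
try:
    with open(demo_path, 'r') as g:
        raise ValueError("something went wrong")
except ValueError:
    pass

closed_after_error = g.closed   # True as well

print(closed_inside, closed_after, closed_after_error)

shutil.rmtree(tmp_dir)          # clean up the throwaway directory
```

The last part is the main reason the _with_ syntax is preferred over calling `close()` by hand: an explicit `close()` is skipped if an error occurs before it is reached.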

# Reading

There are several ways to read the data of a file, and we will see two of them:
* Iteratively with a `for`-loop
* As a whole with the `read()` method

## Read with a `for`-loop

_File_ objects are their own iterators, and their "elements" are their lines. Iterating over a _File_ object with a _for_ loop therefore iterates over the lines of the file. Note that each line includes the "\n" at its end, and `print()` adds a newline of its own, which is why the output below is double-spaced.

In [8]:
fname = "example.abc"

with open(fname) as f:
    for line in f:
        print(line)

This is the first line.

This is the second line.

This is the third and last line.


> **Note:** Why are there blank lines between the lines of the output in the example above?
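If the extra blank lines are unwanted, two common options are to strip the trailing "\n" from each line, or to tell `print()` not to add its own newline. The sketch below assumes Python 3 and builds its own small sample file (`lines_demo.txt`, a made-up name) so that it can be run on its own.

```python
import os
import shutil
import tempfile

# Build a small sample file holding the same three lines as above
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "lines_demo.txt")
with open(demo_path, 'w') as f:
    f.write("This is the first line.\n"
            "This is the second line.\n"
            "This is the third and last line.\n")

clean_lines = []
with open(demo_path) as f:
    for line in f:
        # Option 1: remove the trailing newline before storing the line
        clean_lines.append(line.rstrip('\n'))
        # Option 2: keep the line as it is, but suppress print()'s own newline
        print(line, end='')

shutil.rmtree(tmp_dir)  # clean up the throwaway directory
```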

## Read with _read()_

This method is the simplest one, as it reads the entire content of the file into a single string.

In [26]:
fname = "example.txt"

with open(fname) as f:
    print(f.read())

This is the first line.
This is the second line.
This is the third and last line.


> **Your turn:** Read the file "christmas.txt". Can you tell how many lines it has?

# Writing

### Writing methods

Similarly to `read()`, there is `write()` for writing. `write()` expects a single string and writes it to the file. Note that it is opening the file in 'w' mode, not `write()` itself, that creates a new file if required and erases the content of the file if it already exists.

In [0]:
fname = "example.txt"

str1 = "This is the first line."
str2 = "This is the second line."
str3 = "This is the third and last line."

In [0]:
with open(fname, 'w') as f:
    f.write(str1 + '\n')
    f.write(str2 + '\n')
    f.write(str3)

In [29]:
with open(fname, 'r') as f:
    print(f.read())

This is the first line.
This is the second line.
This is the third and last line.
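Since `write()` accepts only strings, other values (numbers, for example) must be converted first. The following minimal sketch assumes Python 3, where `write()` on a text file raises a `TypeError` for a non-string argument and returns the number of characters written; the file name `write_demo.txt` is made up for the illustration.

```python
import os
import shutil
import tempfile

tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "write_demo.txt")

age = 35
with open(demo_path, 'w') as f:
    try:
        f.write(age)               # an int is not a string
        raised = False
    except TypeError:
        raised = True              # this branch is the one that runs

    # Convert explicitly; write() adds no newline, so we add '\n' ourselves
    n_written = f.write("Gidi " + str(age) + "\n")

with open(demo_path, 'r') as f:
    content = f.read()

print(raised, n_written, repr(content))

shutil.rmtree(tmp_dir)             # clean up the throwaway directory
```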


### Writing modes

In standard writing mode, indicated by 'w', a new file is created if none exists, and the content of an existing file is erased as soon as the file is opened.

Compare the example above with the following:

In [0]:
with open(fname, 'w') as f:
    f.write(str1 + '\n')
   
with open(fname, 'w') as f:
    f.write(str2 + '\n')
    
with open(fname, 'w') as f:
    f.write(str3)

Testing...

In [31]:
with open(fname, 'r') as f:
    print(f.read())

This is the third and last line.


If we want to append the data to what is already in the file, then we should use the append mode, indicated by 'a'.

In [0]:
with open(fname, 'w') as f:
    f.write(str1 + '\n')
   
with open(fname, 'a') as f:
    f.write(str2 + '\n')
    
with open(fname, 'a') as f:
    f.write(str3)

Testing...

In [33]:
with open(fname, 'r') as f:
    print(f.read())

This is the first line.
This is the second line.
This is the third and last line.
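Two details of these modes are easy to miss: 'a' creates the file if it does not exist yet, and 'w' erases an existing file at the moment it is opened, even if nothing is written afterwards. A minimal sketch, assuming Python 3 and a made-up file name `append_demo.txt`:

```python
import os
import shutil
import tempfile

tmp_dir = tempfile.mkdtemp()
log_path = os.path.join(tmp_dir, "append_demo.txt")

existed_before = os.path.exists(log_path)    # False: nothing created yet

# 'a' creates the file on the first pass and appends on the following ones
for word in ["one", "two", "three"]:
    with open(log_path, 'a') as f:
        f.write(word + "\n")

with open(log_path, 'r') as f:
    appended = f.read()

# Merely opening with 'w' truncates the file, without any write() call
with open(log_path, 'w') as f:
    pass

with open(log_path, 'r') as f:
    truncated = f.read()

print(repr(appended), repr(truncated))

shutil.rmtree(tmp_dir)                       # clean up the throwaway directory
```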


## Example

The file "players.txt" contains the names and ages of seven band members. Use the data of the file to create a new file called "sorted_players.txt", in which the members are listed in alphabetical order by name.

We note that sorting is easier when the entire data is available at once, so we read the whole file into memory first.

### Solution

In [0]:
# Get the data
with open("players.txt", 'r') as f:
    data = f.read()

# Manipulate the data
data = data.splitlines()  # unlike split('\n'), a trailing newline yields no empty entry
sorted_data = sorted(data)
sorted_data = '\n'.join(sorted_data)

# Create the new file
with open("sorted_players.txt", 'w') as f:
    f.write(sorted_data)
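As a variation on the same idea, the members could also be ordered by age. The sketch below is only an illustration, assuming each line has the form `name age` as in the listing of "players.txt" shown later in this notebook; the helper `age_of` is hypothetical, and the sample lines are copied from that listing so the code runs without the file.

```python
# Sample lines copied from the "players.txt" listing in this notebook
lines = [
    "Gidi 35",
    "Dani 32",
    "Efraim 39",
    "Yitzhak 32",
    "Meir 36",
    "Yoni 31",
    "Alon 36",
]

def age_of(line):
    """Return the age, assumed to be the last whitespace-separated token."""
    return int(line.split()[-1])

by_name = sorted(lines)              # plain string comparison, as in the solution
by_age = sorted(lines, key=age_of)   # compare by the numeric age instead

print('\n'.join(by_age))
```

`sorted()` is stable, so members with the same age keep their original relative order.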

# Pathlib
Working with paths is usually done with Python's `pathlib` module.

In [9]:
from pathlib import Path
base_dir = Path('.')
list(base_dir.glob("*.*"))

[PosixPath('players.txt'),
 PosixPath('customers.txt'),
 PosixPath('file_objects_exercises.ipynb'),
 PosixPath('.ipynb_checkpoints'),
 PosixPath('File objects.ipynb'),
 PosixPath('example.abc'),
 PosixPath('queue.txt')]
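Besides globbing, a `Path` object gives direct access to the parts of a path and offers shortcuts for reading and writing small files. The following is a minimal sketch using only standard `pathlib` calls; the directory is a temporary one and the file name `notes.txt` is made up for the illustration.

```python
import shutil
import tempfile
from pathlib import Path

base_dir = Path(tempfile.mkdtemp())

# The '/' operator joins path components in a platform-independent way
p = base_dir / "notes.txt"

# write_text() opens the file, writes the string and closes the file
p.write_text("first\nsecond\n")

# Parts of the path, and simple queries about the file
info = (p.name, p.stem, p.suffix, p.exists(), p.is_file())

# read_text() is the counterpart of open(...).read()
text = p.read_text()

print(info)
print(text)

shutil.rmtree(base_dir)  # clean up the throwaway directory
```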

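Methods such as `glob()` and `iterdir()` return lazy iterators rather than lists, for memory efficiency. This is why the result was wrapped in `list()` above; in a loop we can iterate over it directly.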

In [16]:
for p in base_dir.glob("*.*"):
    filename = p.name  # already a string, no str() needed
    print("=" * 10 + filename + "=" * 10)
    if filename.endswith(".txt"):
        with p.open('r') as f:
            txt = f.read()
        print(txt)

Gidi 35
Dani 32
Efraim 39
Yitzhak 32
Meir 36
Yoni 31
Alon 36

name: Yul Brynner         , street: Hertzel   , house:   3, appartment:   1, floor:   1
name: Julie Christie      , street: Hertzel   , house:   8, appartment:   9, floor:   4
name: Reese Witherspoon   , street: Weizmann  , house:  10, appartment:   8, floor:   3
name: Russell Crowe       , street: Dizengoff , house:   1, appartment:   3, floor:   2
name: Charlton Heston     , street: Hertzel   , house:   7, appartment:   7, floor:   3
name: Burt Lancaster      , street: Dizengoff , house:   9, appartment:   6, floor:   3
name: Paul Scofield       , street: Basel     , house:   7, appartment:   5, floor:   2
name: Louise Fletcher     , street: Hertzel   , house:   5, appartment:   5, floor:   2
name: Kathy Bates         , street: Basel     , house:   7, appartment:   2, floor:   1
name: Adrien Brody        , street: Dizengoff , house:   1, appartment:   6, floor:   3
name: Vivien Leigh        , street: Hertzel   , house:  10
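One practical consequence of `glob()` being lazy is that its result can be consumed only once. The sketch below illustrates this under the assumption of a throwaway directory with three made-up files; `sorted()` is used because the order in which `glob()` yields paths is not guaranteed.

```python
import shutil
import tempfile
from pathlib import Path

base_dir = Path(tempfile.mkdtemp())
for name in ["b.txt", "a.txt", "c.csv"]:
    (base_dir / name).write_text("")   # create empty sample files

matches = base_dir.glob("*.txt")       # nothing is searched yet

# The first pass consumes the iterator...
first_pass = sorted(p.name for p in matches)
# ...so a second pass over the same object finds nothing
second_pass = sorted(p.name for p in matches)

print(first_pass)
print(second_pass)

shutil.rmtree(base_dir)                # clean up the throwaway directory
```

To go over the results more than once, either store them in a list first or call `glob()` again.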

## S3Path
We will be working with remote S3 files, which can be reached with the `S3Path` class of the `s3path` package. It offers a `pathlib`-like interface.

In [21]:
from s3path import S3Path
list(S3Path("/uatt-data/").iterdir())

[S3Path('/ugoren/recipe_schduler'), S3Path('/ugoren/recipe_scheduler')]