![Py4Eng](img/logo.png)

# I/O: `input`, files, filesystem
## Yoav Ram

# User prompt

The `input` function displays a prompt and reads a line of text from the user. It works in the notebook, as well as when running scripts in the console.

In [1]:
name = input("What's your name?\n")

print("Hi", name)

What's your name?
 Yoav


Hi Yoav


In [2]:
n_icecreams = input("How many icecreams would you like?")
price = input("How much does an icecream cost?")
print("That would be", price * n_icecreams)

How many icecreams would you like? 2
How much does an icecream cost? 3.5


TypeError: can't multiply sequence by non-int of type 'str'

For security reasons, `input` always returns a string and never evaluates what the user typed. It is the program's responsibility to convert the string to the desired type:

In [3]:
n_icecreams = int(input("How many icecreams would you like?"))
price = float(input("How much does an icecream cost?"))
print("That would be", price * n_icecreams)

How many icecreams would you like? 2
How much does an icecream cost? 3.5


That would be 7.0
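
A conversion such as `int(...)` raises a `ValueError` when the text is not a valid number. One possible way to handle that is sketched here. The helper name `ask` and its `read` parameter are hypothetical and not part of the course material; `read` defaults to `input` and is replaced by canned replies so the example runs without a keyboard.

```python
def ask(prompt, convert, read=input):
    """Ask until `convert` accepts the reply, then return the converted value.

    `ask` and `read` are hypothetical names used for illustration only.
    """
    while True:
        reply = read(prompt)
        try:
            return convert(reply)
        except ValueError:
            # int() and float() raise ValueError for text they cannot parse
            print(repr(reply), "is not a valid", convert.__name__)

# Simulate a user who first types "two" and then "2"
replies = iter(["two", "2"])
n_icecreams = ask("How many icecreams would you like? ", int,
                  read=lambda prompt: next(replies))
print(n_icecreams)
```

In an interactive session, `ask("How many icecreams would you like? ", int)` would keep prompting until the reply can be converted.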


You can use `eval` to evaluate the input string as a Python expression, but **don't do it** if you don't trust the user: `eval` runs whatever code the user types, which can lead to strange behaviour and side effects.

Let's see what happens when we give valid input (`2` and `1.5`) and when we give invalid input (`2` and `[1,2,3]`). Try it with `eval` and with the above code (`int` and `float`).

In [4]:
n_icecreams = eval(input("How many icecreams would you like?"))
price = eval(input("How much does an icecream cost?"))
print("That would be", price * n_icecreams)

How many icecreams would you like? 2
How much does an icecream cost? 3.5


That would be 7.0
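
If the goal is only to accept numbers written as Python literals, the standard library's `ast.literal_eval` may be a safer alternative to `eval`, because it parses literals and refuses to run code. The sketch below assumes a hypothetical helper named `parse_number`; it is an illustration, not the course's method.

```python
import ast

def parse_number(text):
    """Parse text as an int or float literal; raise ValueError otherwise.

    `parse_number` is a hypothetical helper used for illustration only.
    """
    try:
        value = ast.literal_eval(text.strip())
    except (ValueError, SyntaxError):
        raise ValueError(f"not a number: {text!r}") from None
    # literal_eval also accepts lists, strings, booleans, etc., so check the type
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"not a number: {text!r}")
    return value

print(parse_number("2") * parse_number("3.5"))

for bad in ["[1,2,3]", "__import__('os')"]:
    try:
        parse_number(bad)
    except ValueError as error:
        print("rejected:", error)
```

Both the list and the function call are rejected with a `ValueError`, whereas `eval` would have accepted them.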


## Exercise

Ask the user for a number between 1 and 10; if the number is not within that range, let them know and ask again.

# Files

We'll start with simple text files and proceed to more complex formats.  
Let's read the list of crop plants located in `data/crops.txt`; you can also download it from [GitHub](https://github.com/yoavram/Py4Eng/blob/master/data/crops.txt).

## Reading files

Whenever we want to work with a file, we first need to _open_ it using the `open` function.  
This function returns an IO object which we can then use for reading or writing.

In [5]:
f = open('../data/crops.txt', 'rt') # rt = read text
print(type(f))

<class '_io.TextIOWrapper'>


In [6]:
crops = f.read()
f.close()
print(crops[:100])

Abelmoschus caillei
Abelmoschus esculentus
Acacia mearnsii
Acacia senegal
Acacia seyal
Acca sellowia


Here we pass two arguments to `open`:

- the path to the file you want to open.
- the mode of opening: `r` for reading, `w` for writing, `a` for appending, combined with `t` for text (the default) or `b` for binary, as in `'rt'`.

`read` returns *all* the text from the file as a string. 

`close` then closes the file handle.

A more idiomatic way to do this, in which Python takes care of closing the file, is using a [context manager](https://docs.python.org/3/reference/datamodel.html#with-statement-context-managers):

In [7]:
with open('../data/crops.txt','r') as f:
    crops = f.read()
print(crops[:100])

Abelmoschus caillei
Abelmoschus esculentus
Acacia mearnsii
Acacia senegal
Acacia seyal
Acca sellowia


The [context manager](https://docs.python.org/3.5/library/stdtypes.html?highlight=context%20manager) closes the file handle `f` when the `with` block ends, even if it ends due to an error.
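
The claim that the file is closed even after an error can be checked with the `closed` attribute of the file object. This is a small sketch that writes its own scratch file (a hypothetical stand-in for `crops.txt`) and simulates a failure inside the `with` block.

```python
import os
import tempfile

# Create a small scratch file so the example does not depend on crops.txt
fd, path = tempfile.mkstemp(suffix='.txt')
with open(fd, 'w') as f:
    print("Musa spp.", file=f)

try:
    with open(path, 'r') as f:
        first = f.readline()
        raise RuntimeError("simulated failure inside the with block")
except RuntimeError:
    pass

print(f.closed)   # True: the file was closed although the block raised an error
os.remove(path)
```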

## Iterating over files

### Using a `for` loop

We can simply use a _for_ loop to go over all lines in a text file. 
This is the _best practice_, and also very simple to use:

In [8]:
with open('../data/crops.txt','r') as f:
    for line in f:
        if line.startswith('Musa'):   # check if line starts with a given string
            print(line.strip())       # strip removes surrounding whitespace, including the newline at the end of the line

Musa balbisiana
Musa spp.
Musa textilis


### Reading line by line with `readline`

The `readline()` method allows us to read a single line each time. 
It works well when combined with a `while` loop, giving us control of the program flow.

In [9]:
with open('../data/crops.txt','r') as f:
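    line = f.readline()            # read first line
    print(line.strip())
    while line:                    # readline returns '' only at the end of the file
        line = f.readline()        # don't strip here: a blank line would end the loop early
        if line.startswith('Triticum'):
            print(line.strip())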

Abelmoschus caillei
Triticum aestivum
Triticum dicoccum
Triticum durum
Triticum monococcum
Triticum spelta
Triticum turanicum


There are other methods you can use to read files. For example, the `readlines()` method returns all the lines as a list of strings.
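
As a rough illustration of `readlines()`, the sketch below uses a three-line scratch file (hypothetical sample data, not the real `crops.txt`) and compares it with `read().splitlines()`, which drops the newline characters.

```python
import os
import tempfile

# Hypothetical sample data written to a scratch file
fd, path = tempfile.mkstemp(suffix='.txt')
with open(fd, 'w') as f:
    f.write("Musa balbisiana\nMusa textilis\nTriticum durum\n")

with open(path, 'r') as f:
    lines = f.readlines()              # each string keeps its trailing newline

with open(path, 'r') as f:
    stripped = f.read().splitlines()   # same lines, without the newlines

print(lines)
print(stripped)
os.remove(path)
```

Both approaches load the whole file into memory, so for large files the `for` loop is usually the better choice.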

## Exercise

1) Print the last line in the file.  
2) Find out how many _Garcinia_ species are in the file (use the `startswith()` string method).

## Writing to a file

To write to a file, we first have to open it for writing. This is done using one of two modes: 'w' or 'a'.

'w', for write, will let you write into the file. If the file doesn't exist, it is created automatically. If it exists and already has some content, __the content will be overwritten__.

'a', for append, is very similar, except that it does not overwrite: your text is appended to the end of the existing file.

Here we write using `print()` with the argument `file=<file object>`; file objects also have a `write` method.

In [10]:
with open(r'tmp.txt','w') as f:
    print('This is the first line', file=f)
    line = 'Another line'
    print(line, file=f)
    msg1 = 'Hello '
    msg2 = 'World!'
    print(msg1 + msg2, file=f)

In [16]:
%less tmp.txt

This is the first line
Another line
Hello World!
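
To make the difference between 'w' and 'a' concrete, here is a small sketch that writes to a scratch file (created with `tempfile` so that no existing file is touched), appends to it, and then overwrites it. It also uses the file object's `write` method, which, unlike `print`, does not add a newline.

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix='.txt')   # scratch file for this example

with open(fd, 'w') as f:          # 'w' starts from an empty file
    f.write('first line\n')       # write() does not add a newline by itself
with open(path, 'a') as f:        # 'a' keeps the existing content
    print('appended line', file=f)
with open(path, 'r') as f:
    after_append = f.read()

with open(path, 'w') as f:        # opening with 'w' again discards the old content
    f.write('only line\n')
with open(path, 'r') as f:
    after_overwrite = f.read()

print(repr(after_append))
print(repr(after_overwrite))
os.remove(path)
```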


# Temporary files

Temporary files can be created using the _tempfile_ module:

In [1]:
import tempfile

In [2]:
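fd, fname = tempfile.mkstemp()   # mkstemp returns an open file descriptor and the file's path
print("Writing to temp file", fname)
with open(fd, 'w') as f:         # wrap the descriptor so it is closed together with the file
    print("This is a temporary file", file=f)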

Writing to temp file /var/folders/qn/3hj7mcx56k19b_09n6dymw8h0000gn/T/tmp5mvzb1dr


In [3]:
%less $fname

See other methods in *tempfile* on how to create temporary directories, named temporary files, etc.
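
As one example of those other methods, `tempfile.TemporaryDirectory` can be used as a context manager that removes the directory and its contents when the block ends. The file name `notes.txt` below is a hypothetical choice for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as dirname:
    path = os.path.join(dirname, 'notes.txt')   # hypothetical file name
    with open(path, 'w') as f:
        print("This is a temporary file", file=f)
    existed_inside = os.path.exists(path)

# After the block, the directory and everything in it are gone
print(existed_inside, os.path.exists(dirname))
```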

## Exercise

In the last example we wrote to a temporary file. In this exercise we will copy that file contents to a new temporary file that has an extension `.txt` (use the `suffix` keyword when creating the temporary file). Copy the contents by reading from the existing file and writing to a new file (this is not the efficient way to do it, but it's just an exercise!). Don't forget to close the files and print the new temporary filename so that you can check that the writing was successful.

C:\Users\yoavram\AppData\Local\Temp\tmpk6qauup4.txt


# Filesystem

Python offers plenty of ways to interact with the filesystem through the `os` and `os.path` modules.

Let's import `os`:

In [17]:
import os

Here are some of the capabilities of `os`:

In [18]:
files = os.listdir()
for fname in files:
    if os.path.isdir(fname):
        print(fname, "is a folder")
    elif os.path.isfile(fname):
        size = os.path.getsize(fname)
        print(fname, "is a file with size", size, "bytes")

tmp.txt is a file with size 49 bytes
ctypes.ipynb is a file with size 15106 bytes
update-colophon.ipynb is a file with size 3491 bytes
strings-lists-loops.ipynb is a file with size 49935 bytes
Untitled1.ipynb is a file with size 2611 bytes
.DS_Store is a file with size 6148 bytes
datetime.ipynb is a file with size 10617 bytes
iteration.ipynb is a file with size 45313 bytes
matlab.ipynb is a file with size 28650 bytes
conda-env.ipynb is a file with size 41304 bytes
image-processing.ipynb is a file with size 1178114 bytes
Untitled.ipynb is a file with size 1027 bytes
pid.py is a file with size 23 bytes
DSP.ipynb is a file with size 1894769 bytes
matplotlib.ipynb is a file with size 463205 bytes
modules.ipynb is a file with size 25325 bytes
exceptions.ipynb is a file with size 28720 bytes
io.ipynb is a file with size 30258 bytes
gui.ipynb is a file with size 37090 bytes
idioms.ipynb is a file with size 44801 bytes
oop.ipynb is a file with size 92288 bytes
cython-numba.ipynb is a file with

Here's a combination of functions to get the current directory (`os.getcwd`), change the directory (`os.chdir`), check if a file exists (`os.path.exists`), and split a filename from its extension (`os.path.splitext`):

In [19]:
curdir = os.getcwd()
os.chdir('../data')
fname = 'crops.txt'
print(fname, 'exists?', os.path.exists(fname))
fname = os.path.splitext('crops.txt')[0] + '.csv'
print(fname, 'exists?', os.path.exists(fname))
os.chdir(curdir)

crops.txt exists? True
crops.csv exists? False


See the [os](https://docs.python.org/3.5/library/os.html) and [os.path](https://docs.python.org/3.5/library/os.path.html#module-os.path) modules for more functions.
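
Changing the working directory is not always necessary: paths can also be assembled and taken apart with `os.path` functions. The sketch below only manipulates path strings, so it runs whether or not the file exists; the separator in the printed output depends on the operating system.

```python
import os

path = os.path.join('..', 'data', 'crops.txt')   # build a path in a portable way
folder, fname = os.path.split(path)               # directory part and file name
root, ext = os.path.splitext(fname)               # file name without and with extension
csv_path = os.path.join(folder, root + '.csv')    # sibling file with another extension

print(folder, fname, root, ext)
print(csv_path)
```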

# Serializing objects

The `json` module allows us to encode Python objects to text and decode them back again. It implements the [JSON](http://json.org/) (JavaScript Object Notation) format, a lightweight data interchange format inspired by JavaScript object literal syntax, which is interoperable and widely used outside of the Python ecosystem. The format is also human-readable, so a developer can inspect the data in a file without deserializing it.

We start by importing the module and creating an example data dictionary:

In [1]:
import json

data = { 
    'a_string': 'Hello JSON', 
    'ints_in_a_tuple': (5, 6, 7, 2, 3, 5, 6), 
    'some_number': 5768.4454,
    'list_as_well': [True, False, 'This', 'That']
} 
data

{'a_string': 'Hello JSON',
 'ints_in_a_tuple': (5, 6, 7, 2, 3, 5, 6),
 'list_as_well': [True, False, 'This', 'That'],
 'some_number': 5768.4454}

We **dump** the dictionary into a string:

In [2]:
data_string = json.dumps(data)
data_string

'{"a_string": "Hello JSON", "ints_in_a_tuple": [5, 6, 7, 2, 3, 5, 6], "some_number": 5768.4454, "list_as_well": [true, false, "This", "That"]}'
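
Decoding the string back is done with `json.loads`. Note in the output above that the tuple was written as a JSON array, so it comes back as a list. A short sketch with a reduced version of the dictionary:

```python
import json

data = {'a_string': 'Hello JSON', 'ints_in_a_tuple': (5, 6, 7)}

restored = json.loads(json.dumps(data))

print(restored['ints_in_a_tuple'])   # a list: JSON has arrays but no tuples
print(restored == data)              # False, because a list is not equal to a tuple
```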

If we want to save this to a file, we can either write the string to a file or dump directly to a file:

In [3]:
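import tempfile

fd, fname = tempfile.mkstemp(suffix='.json')   # mkstemp replaces the deprecated, race-prone mktemp

with open(fd, 'w') as f:                       # wrap the descriptor so it is closed with the file
    json.dump(data, f)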

In [4]:
%less $fname

We can make the file more readable with some configuration:

In [5]:
fd, fname = tempfile.mkstemp(suffix='.json')

with open(fd, 'w') as f:   # wrap the descriptor so it is closed with the file
    json.dump(data, f, sort_keys=True, indent=4, separators=(',', ': '))

In [6]:
%less $fname
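
Reading the file back is the mirror image: `json.load` takes an open file object. The sketch below uses a reduced dictionary and its own temporary file so that it can run independently of the cells above.

```python
import json
import os
import tempfile

data = {'a_string': 'Hello JSON', 'some_number': 5768.4454}

fd, fname = tempfile.mkstemp(suffix='.json')
with open(fd, 'w') as f:
    json.dump(data, f, sort_keys=True, indent=4)

with open(fname, 'r') as f:
    text = f.read()          # the indented, human-readable text
with open(fname, 'r') as f:
    loaded = json.load(f)    # the decoded Python object

print(text)
print(loaded == data)
os.remove(fname)
```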

Not everything is supported by `json`, for example, `complex` numbers:

In [7]:
json.dumps([1 + 2j, 4 + 5j])

TypeError: Object of type 'complex' is not JSON serializable

In [8]:
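def encode_complex(obj):
    # passed to json.dumps as `default`, which is called for objects json cannot serialize
    if isinstance(obj, complex):
        return {'real': obj.real, 'imag': obj.imag}
    # anything else is still unsupported: raise instead of silently encoding None
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")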

In [17]:
data = [1 + 2j, 4 + 5j, 5]
dump = json.dumps(data, default=encode_complex)
dump

'[{"real": 1.0, "imag": 2.0}, {"real": 4.0, "imag": 5.0}, 5]'

And to decode:

In [18]:
def decode_complex(o):
    if 'real' in o and 'imag' in o: # no need for isinstance(o, dict) as o is always dict, see docstring
        return complex(o['real'], o['imag'])
    return o

data2 = json.loads(dump, object_hook=decode_complex)
print(data2, data2 == data)

[(1+2j), (4+5j), 5] True
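
One caveat of this decoder is that any dictionary that happens to have `real` and `imag` keys is turned into a complex number. A possible variation, sketched below, adds a marker key to the encoded dictionary; the key name `__complex__` is an assumption chosen for illustration, and the function names are hypothetical.

```python
import json

def encode_tagged(obj):
    """`default` hook: encode complex numbers with a marker key."""
    if isinstance(obj, complex):
        return {'__complex__': True, 'real': obj.real, 'imag': obj.imag}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

def decode_tagged(o):
    """`object_hook`: only dictionaries carrying the marker become complex."""
    if o.get('__complex__'):
        return complex(o['real'], o['imag'])
    return o

# The second item is an ordinary dictionary that must stay a dictionary
data = [1 + 2j, {'real': 1, 'imag': 2}]
dump = json.dumps(data, default=encode_tagged)
restored = json.loads(dump, object_hook=decode_tagged)

print(restored, restored == data)
```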


# References

- The [pickle](https://docs.python.org/3.5/library/pickle.html?highlight=pickle#module-pickle) module implements binary protocols for serializing and de-serializing a Python object structure. It can deal with (almost) any Python object, but produces binary rather than text files, and is Python specific. The `pickle` API is similar to that of `json`.
- The [io](https://docs.python.org/3.5/library/io.html) module provides facilities for dealing with various types of I/O.
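
To illustrate the similarity between the `pickle` and `json` APIs mentioned above, here is a minimal round trip with `pickle.dumps` and `pickle.loads`. The dictionary is made-up sample data; unlike with `json`, the tuple and the complex number survive unchanged.

```python
import pickle

data = {'ints_in_a_tuple': (5, 6, 7), 'numbers': [1 + 2j, 5]}

blob = pickle.dumps(data)        # bytes, not text
restored = pickle.loads(blob)

print(type(blob).__name__)
print(restored == data)
```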

## Colophon
This notebook was written by [Yoav Ram](http://python.yoavram.com) and is part of the [_Python for Engineers_](https://github.com/yoavram/Py4Eng) course.

The notebook was written using [Python](http://python.org/) 3.7.
Dependencies listed in [environment.yml](../environment.yml), full versions in [environment_full.yml](../environment_full.yml).

This work is licensed under a CC BY-NC-SA 4.0 International License.

![Python logo](https://www.python.org/static/community_logos/python-logo.png)