![Py4Eng](img/logo.png)

# I/O: input, files, filesystem

# User prompt

The `input` function displays a prompt and returns the user's reply as a string. It works in the notebook, as well as when running scripts in the console.

In [2]:
name = input("What's your name?\n")

print("Hi", name)

What's your name?
 Yoav


Hi Yoav


In [3]:
n_icecreams = input("How many icecreams would you like?")
price = input("How much does an icecream cost?")
print("That would be", price * n_icecreams)

How many icecreams would you like? 3
How much does an icecream cost? 2.5


TypeError: can't multiply sequence by non-int of type 'str'

`input` always returns a string: it never evaluates what the user typed, which is the safer behavior.

It is the program's responsibility to convert the string to the desired type:

In [4]:
n_icecreams = int(input("How many icecreams would you like?"))
price = float(input("How much does an icecream cost?"))
print("That would be", price * n_icecreams)

How many icecreams would you like? 3
How much does an icecream cost? 2.5


That would be 7.5
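
To see why the conversion matters, here is a small sketch of what `*` does with strings and with numbers. The two `_text` variables are stand-ins for what `input` would return; they are not part of the notebook.

```python
price_text = "2.5"   # stand-in for what input() would return
count_text = "3"

# str * int repeats the string instead of doing arithmetic
repeated = price_text * int(count_text)

# converting both values first gives numeric multiplication
total = float(price_text) * int(count_text)

print(repeated)
print(total)
```

Multiplying two strings, as in the failing cell above, is not defined at all, which is why Python raises a `TypeError`.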


But what if we give an invalid input?

Try replying 'two'.

In [5]:
n_icecreams = int(input("How many icecreams would you like?"))
price = float(input("How much does an icecream cost?"))
print("That would be", price * n_icecreams)

How many icecreams would you like? two


ValueError: invalid literal for int() with base 10: 'two'
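
One possible way to deal with invalid replies is to catch the `ValueError`. The sketch below assumes a hypothetical helper, `parse_count`, and uses a fixed list of replies instead of calling `input`, so that it runs without user interaction.

```python
def parse_count(text):
    """Return text converted to int, or None if it is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return None

# made-up replies standing in for what a user might type
replies = ["3", "two", " 4 "]
for reply in replies:
    print(repr(reply), "->", parse_count(reply))
```

Note that `int` ignores surrounding whitespace, so `" 4 "` is accepted.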

## Exercise: pick a number

Ask the user to pick a number between 1 and 10; if the number is not within that range, let them know and ask again.

# Files

We'll start with simple text files and proceed to more complex formats.  
Let's read the list of crop plants in `data/crops.txt`; if you don't have the file, you can download it from [GitHub](https://github.com/yoavram/Py4Eng/blob/master/data/crops.txt).

## Reading files

Whenever we want to work with a file, we first need to _open_ it using the `open` function.  
This function returns an IO object which we can then use for reading or writing.

In [6]:
#f = open(r'C:\Users\Owner\Desktop\DataSciPy\data\crops.txt') # using full path
f = open('../data/crops.txt') # using relative path (relative to the current working directory)
print(type(f))

<class '_io.TextIOWrapper'>


In [7]:
crops = f.read()
f.close()
print(crops[:100])

Abelmoschus caillei
Abelmoschus esculentus
Acacia mearnsii
Acacia senegal
Acacia seyal
Acca sellowia


The `open` function is usually called with two arguments:

- the path to the file you want to open.
- the mode of opening: `r` for reading (the default when no mode is given), `w` for writing, `a` for appending.

`read` returns *all* the text from the file as a string. 

`close` then closes the file handle.
#### __Important: don't forget to close the file object__

A more idiomatic way to do this, in which Python takes care of closing the file, is using a [context manager](https://docs.python.org/3/reference/datamodel.html#with-statement-context-managers):

In [8]:
with open('../data/crops.txt') as f:
    crops = f.read()
print(crops[:100])

Abelmoschus caillei
Abelmoschus esculentus
Acacia mearnsii
Acacia senegal
Acacia seyal
Acca sellowia


When the [context manager](https://docs.python.org/3.5/library/stdtypes.html?highlight=context%20manager) block ends, the file handle `f` is closed, even if the block ends due to an error.
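
The following sketch checks this behavior. It assumes a throwaway temporary file with made-up content, and raises an error on purpose inside the `with` block.

```python
import os
import tempfile

# create a small throwaway file to read from (made-up content)
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('Musa textilis\n')
    path = tmp.name

try:
    with open(path) as f:
        first_line = f.readline()
        raise RuntimeError('something went wrong inside the block')
except RuntimeError:
    pass

# the file was closed when the with block ended, despite the error
closed_after_error = f.closed
print(closed_after_error)

os.remove(path)  # clean up the throwaway file
```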

## Iterating over files

### Using a __for__ loop

We can simply use a _for_ loop to go over all lines in a text file. 
This is the _best practice_, and also very simple to use:

In [9]:
with open('../data/crops.txt') as f: 
    for line in f:
        if line.startswith('Musa'):   # check if line starts with a given string
            print(line)

Musa balbisiana

Musa spp.

Musa textilis



The blank lines between the names appear because each line read from the file ends with a newline character `\n`, and `print` adds another one.

We can remove such characters using the `strip` method:

In [10]:
with open('../data/crops.txt') as f:
    for line in f:
        if line.startswith('Musa'):   # check if line starts with a given string
            print(line.strip())       # strip removes surrounding whitespace, including the trailing newline

Musa balbisiana
Musa spp.
Musa textilis
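
As a small illustration, `strip` removes whitespace from both ends of the string, while `rstrip('\n')` removes only the trailing newline. The line below is made up for the example.

```python
line = '  Musa spp. \n'          # a made-up line with extra spaces

stripped = line.strip()          # removes whitespace from both ends
no_newline = line.rstrip('\n')   # removes only the trailing newline

print(repr(stripped))
print(repr(no_newline))
```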


### Reading line by line with __readline__

The `readline()` method allows us to read a single line each time. 
It works well when combined with a `while` loop, giving us control of the program flow.

In [11]:
with open('../data/crops.txt') as f:
    line = f.readline()            # read the first line
    print(line.strip())
    while line:                    # readline returns '' only at the end of the file
        line = f.readline()
        if line.startswith('Triticum'):
            print(line.strip())

Abelmoschus caillei
Triticum aestivum
Triticum dicoccum
Triticum durum
Triticum monococcum
Triticum spelta
Triticum turanicum


There are other methods you can use to read files. For example, the `readlines()` method returns all the lines as a list of strings.

In [12]:
with open('../data/crops.txt') as f:
    lines = f.readlines()
print(lines[:5])

['Abelmoschus caillei\n', 'Abelmoschus esculentus\n', 'Acacia mearnsii\n', 'Acacia senegal\n', 'Acacia seyal\n']
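
If the trailing `\n` characters are not wanted, one option is a list comprehension over the file object. In this sketch an `io.StringIO` object stands in for the opened file, so the example runs without `crops.txt`.

```python
import io

# io.StringIO behaves like a text file opened for reading
f = io.StringIO('Acacia mearnsii\nAcacia senegal\nAcacia seyal\n')

# iterate over the lines and strip each one
lines = [line.strip() for line in f]
print(lines)
```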


## Exercise: file iteration

1) Print the last line in the file. 

2) Find out how many _Garcinia_ species are in the file (use the `startswith()` string method).

## Writing to a file

To write to a file, we first have to open it for writing. This is done using one of two modes: 'w' or 'a'.

'w', for write, will let you write into the file. If it doesn't exist, it'll be automatically created. 
#### NOTE: __If the file exists and already has some content, the content will be overwritten.__

'a', for append, is very similar, only it will not overwrite, but append your text to the end of an existing file. 

One way to write is with `print()`, passing the argument `file=<file object>`.

In [13]:
with open(r'tmp.txt','w') as f:
    print('This is the first line', file=f)
    line = 'Another line'
    print(line, file=f)
    msg1 = 'Hello '
    msg2 = 'World!'
    print(msg1 + msg2, file=f)

In [14]:
%less tmp.txt

This is the first line
Another line
Hello World!
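
The difference between the `'w'` and `'a'` modes can be seen in the following sketch. It writes to a hypothetical file, `modes_demo.txt`, inside a temporary folder, so no existing file is touched.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'modes_demo.txt')  # hypothetical file name

    with open(path, 'w') as f:      # 'w' creates the file
        print('first', file=f)
    with open(path, 'a') as f:      # 'a' adds to the existing content
        print('second', file=f)
    with open(path) as f:
        after_append = f.read()

    with open(path, 'w') as f:      # 'w' again discards the previous content
        print('third', file=f)
    with open(path) as f:
        after_overwrite = f.read()

print(repr(after_append))
print(repr(after_overwrite))
```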


## Exercise: copy content

Copy the `tmp.txt` file content to a new file `new.txt`. 

Copy the contents by reading from the existing file and writing to a new file (this is not the efficient way to do it, but it's just an exercise!). 

Don't forget to close the files and check that the writing was successful.

# Filesystem

Python offers plenty of ways to interact with the filesystem through the `os` and `os.path` modules.

Let's import `os`:

In [15]:
import os

Here are some of the capabilities of `os`:

In [16]:
files = os.listdir()
print(files)

['tmp.txt', 'movies.ipynb', 'strings-lists-loops.ipynb', '.DS_Store', 'iteration.ipynb', 'conda-env.ipynb', 'image-processing.ipynb', 'matplotlib.ipynb', 'modules.ipynb', 'exceptions.ipynb', 'io.ipynb', 'oop.ipynb', 'regression.ipynb', 'img', 'classification.ipynb', 'functions.ipynb', 'numpy.ipynb', 'if-while.ipynb', 'dictionaries.ipynb', 'PCA.ipynb', '.ipynb_checkpoints', 'pandas-seaborn.ipynb', 'types-operators.ipynb', 'memory-model.ipynb']


In [17]:
for fname in files:
    if os.path.isdir(fname):
        print(fname, "is a folder")
    elif os.path.isfile(fname):
        size = os.path.getsize(fname)
        print(fname, "is a file with size", size, "bytes")

tmp.txt is a file with size 49 bytes
movies.ipynb is a file with size 166901 bytes
strings-lists-loops.ipynb is a file with size 60175 bytes
.DS_Store is a file with size 6148 bytes
iteration.ipynb is a file with size 41770 bytes
conda-env.ipynb is a file with size 41617 bytes
image-processing.ipynb is a file with size 3936643 bytes
matplotlib.ipynb is a file with size 16611 bytes
modules.ipynb is a file with size 11628 bytes
exceptions.ipynb is a file with size 28020 bytes
io.ipynb is a file with size 20963 bytes
oop.ipynb is a file with size 74287 bytes
regression.ipynb is a file with size 5743759 bytes
img is a folder
classification.ipynb is a file with size 484851 bytes
functions.ipynb is a file with size 21046 bytes
numpy.ipynb is a file with size 42134 bytes
if-while.ipynb is a file with size 12684 bytes
dictionaries.ipynb is a file with size 24486 bytes
PCA.ipynb is a file with size 2763089 bytes
.ipynb_checkpoints is a folder
pandas-seaborn.ipynb is a file with size 31697 bytes

Here's a combination of functions to get the current directory (`os.getcwd`), change the directory (`os.chdir`), split a path into its folder and last component (`os.path.split`), check if a file exists (`os.path.exists`), and split a filename from its extension (`os.path.splitext`):

In [18]:
curdir = os.getcwd()
print(curdir)
print(os.path.split(curdir))
os.chdir('../data')
fname = 'crops.txt'
print(fname, 'exists?', os.path.exists(fname))
fname = os.path.splitext('crops.txt')[0] + '.csv'
print(fname, 'exists?', os.path.exists(fname))
os.chdir(curdir)

/Users/yoavram/Work/Teaching/DataSciPy/sessions
('/Users/yoavram/Work/Teaching/DataSciPy', 'sessions')
crops.txt exists? True
crops.csv exists? False
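
The path functions only manipulate strings, so they can be tried without touching the filesystem. A minimal sketch, using `os.path.join` to build the path so that it works with any path separator:

```python
import os

path = os.path.join('data', 'crops.txt')

folder, fname = os.path.split(path)       # folder and last component
root, ext = os.path.splitext(fname)       # file name and extension

# build the path of a .csv file next to the original one
csv_path = os.path.join(folder, root + '.csv')

print(folder, fname)
print(root, ext)
print(csv_path)
```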


See the [os](https://docs.python.org/3.5/library/os.html) and [os.path](https://docs.python.org/3.5/library/os.path.html#module-os.path) modules for more functions.

# Bonus: Serializing objects

The `json` module encodes Python objects to text and decodes them back again. It implements the [JSON](http://json.org/) (JavaScript Object Notation) format, a lightweight data interchange format inspired by JavaScript object literal syntax, and is therefore interoperable and widely used outside of the Python ecosystem. The format is also human-readable, so developers can inspect the data in a file without deserializing it.

We start by importing the module and creating an example data dictionary:

In [19]:
import json

data = { 
    'a_string': 'Hello JSON', 
    'ints_in_a_tuple': (5, 6, 7, 2, 3, 5, 6), 
    'some_number': 5768.4454,
    'list_as_well': [True, False, 'This', 'That']
} 
data

{'a_string': 'Hello JSON',
 'ints_in_a_tuple': (5, 6, 7, 2, 3, 5, 6),
 'some_number': 5768.4454,
 'list_as_well': [True, False, 'This', 'That']}

We **dump** the dictionary into a string:

In [20]:
data_string = json.dumps(data)
data_string

'{"a_string": "Hello JSON", "ints_in_a_tuple": [5, 6, 7, 2, 3, 5, 6], "some_number": 5768.4454, "list_as_well": [true, false, "This", "That"]}'

If we want to save this to a file, we can either write the string to a file or dump directly to a file:

In [21]:
fname = 'json_example.json'

with open(fname, 'w') as f:
    json.dump(data, f)

In [22]:
%less $fname

{"a_string": "Hello JSON", "ints_in_a_tuple": [5, 6, 7, 2, 3, 5, 6], "some_number": 5768.4454, "list_as_well": [true, false, "This", "That"]}

In [23]:
with open(fname, 'r') as myfile:
    data_str = myfile.read()
    obj = json.loads(data_str)

In [24]:
obj

{'a_string': 'Hello JSON',
 'ints_in_a_tuple': [5, 6, 7, 2, 3, 5, 6],
 'some_number': 5768.4454,
 'list_as_well': [True, False, 'This', 'That']}
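
Note that the tuple came back as a list: JSON has no tuple type, so the round trip does not always return an identical object. The sketch below shows this with a made-up dictionary and a hypothetical file name, and reads the file with `json.load`, which parses directly from the file object.

```python
import json
import os
import tempfile

original = {'ints_in_a_tuple': (5, 6, 7), 'flag': True, 'nothing': None}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'roundtrip.json')   # hypothetical file name
    with open(path, 'w') as f:
        json.dump(original, f)
    with open(path) as f:
        text = f.read()              # the raw JSON text
    with open(path) as f:
        restored = json.load(f)      # parse straight from the file object

print(text)
print(restored)
print(restored == original)          # the tuple became a list
```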

We can make the file more readable with some configuration:

In [25]:
fname = 'json_example.json'

with open(fname, 'w') as f:
    json.dump(data, f, sort_keys=True, indent=4)  # optionally: separators=(',', ': ')

In [26]:
%less $fname

{
    "a_string": "Hello JSON",
    "ints_in_a_tuple": [
        5,
        6,
        7,
        2,
        3,
        5,
        6
    ],
    "list_as_well": [
        true,
        false,
        "This",
        "That"
    ],
    "some_number": 5768.4454
}

# References

- The [pickle](https://docs.python.org/3.5/library/pickle.html?highlight=pickle#module-pickle) module implements binary protocols for serializing and de-serializing a Python object structure. It can deal with (almost) any Python object, but produces binary rather than text files, and is Python specific. The `pickle` API is similar to that of `json`.
- The [io](https://docs.python.org/3.5/library/io.html) module provides facilities for dealing with various types of I/O.
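
As a rough illustration of how similar the `pickle` API is to `json`, here is a minimal sketch with a made-up dictionary. Unlike JSON, `pickle` keeps Python-specific types such as tuples and sets. Only unpickle data that comes from a source you trust.

```python
import pickle

data = {'ints_in_a_tuple': (5, 6, 7), 'a_set': {1, 2, 3}}

blob = pickle.dumps(data)        # bytes, not text
restored = pickle.loads(blob)

print(type(blob))
print(restored == data)
```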

# Solutions

## Solution: pick a number

In [None]:
n = 0

while not 1 <= n <= 10:
    user_input = input("Pick a number between 1 and 10: ")
    try:
        n = int(user_input)
    except ValueError:
        print(user_input, "is not a number...")
        n = 0
print("You picked the number", n)

## Solution: file iteration

In [None]:
with open('../data/crops.txt','r') as f:
    for line in f:
        pass
print(line)

In [None]:
count = 0
with open('../data/crops.txt','r') as f:
    for line in f:
        count += line.startswith('Garcinia')
print(count)

## Colophon
This notebook was written by [Yoav Ram](http://python.yoavram.com).

The notebook was written using [Python](http://python.org/) 3.7.
Dependencies listed in [environment.yml](../environment.yml).

This work is licensed under a CC BY-NC-SA 4.0 International License.

![Python logo](https://www.python.org/static/community_logos/python-logo.png)