# Reading from and Writing to Files using Python

![](https://i.imgur.com/rv8wZ7l.png)



This tutorial covers the following topics:

- Interacting with the filesystem using the `os` module
- Downloading files from the internet using the `urllib` module
- Reading and processing data from text files
- Parsing data from CSV files into dictionaries & lists
- Writing formatted data back to text files

## Interacting with the OS and filesystem

The `os` module in Python provides many functions for interacting with the OS and the filesystem. Let's import it and try out some examples.

In [1]:
import os

We can check the present working directory using the `os.getcwd` function.

In [2]:
os.getcwd()

'/content'

To get the list of files in a directory, use `os.listdir`. You pass an absolute or relative path of a directory as the argument to the function.

In [3]:
help(os.listdir)

Help on built-in function listdir in module posix:

listdir(path=None)
    Return a list containing the names of the files in the directory.
    
    path can be specified as either str, bytes, or a path-like object.  If path is bytes,
      the filenames returned will also be bytes; in all other circumstances
      the filenames returned will be str.
    If path is None, uses the path='.'.
    On some platforms, path may also be specified as an open file descriptor;\
      the file descriptor must refer to a directory.
      If this functionality is unavailable, using it raises NotImplementedError.
    
    The list is in arbitrary order.  It does not include the special
    entries '.' and '..' even if they are present in the directory.



In [4]:
os.listdir('.') # relative path

['.config', '.ipynb_checkpoints', 'ianalyst', 'sample_data']

In [5]:
os.listdir('/content') # absolute path

['.config', '.ipynb_checkpoints', 'ianalyst', 'sample_data']
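Since `os.listdir` returns bare names in arbitrary order, a common follow-up is to sort the names and join them with the directory path. Here's a minimal sketch of that idea. The helper `list_files`, the file names `a.txt` and `b.txt`, and the folder `sub` are made up for illustration, and the demo runs in a temporary directory so that it doesn't depend on `/content`.

```python
import os
import tempfile

def list_files(directory):
    """Return the sorted full paths of the regular files in `directory`."""
    paths = []
    for name in sorted(os.listdir(directory)):   # sort: listdir order is arbitrary
        full_path = os.path.join(directory, name)
        if os.path.isfile(full_path):            # skip subdirectories
            paths.append(full_path)
    return paths

# Demo in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ['b.txt', 'a.txt']:
        with open(os.path.join(tmp, name), 'w') as f:
            f.write('hello')
    os.makedirs(os.path.join(tmp, 'sub'))
    file_names = [os.path.basename(p) for p in list_files(tmp)]

print(file_names)
```

The `sub` folder is left out of the result because `os.path.isfile` returns `False` for directories.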

You can create a new directory using `os.makedirs`. Let's create a new directory called `ianalyst` inside `/content`, where we'll later download some files.

In [7]:
os.makedirs('/content/ianalyst/', exist_ok=True)

Can you figure out what the argument `exist_ok` does? Try using the `help` function or [read the documentation](https://docs.python.org/3/library/os.html#os.makedirs).
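One way to explore the question above is to experiment in a temporary directory. The sketch below is only an illustration: the nested path `data/raw` is a made-up example, and `tempfile` is used so that nothing is left behind on disk.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'data', 'raw')   # hypothetical nested path

    os.makedirs(target)                  # creates 'data' and 'data/raw'
    os.makedirs(target, exist_ok=True)   # directory already exists: no error

    try:
        os.makedirs(target)              # exist_ok defaults to False
        raised = False
    except FileExistsError:
        raised = True

print(raised)
```

Passing `exist_ok=True` makes a cell safe to run more than once, which is handy in a notebook.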

Let's verify that the directory was created and is currently empty.

In [7]:
os.getcwd()

'/content'

In [None]:
'ianalyst' in os.listdir('.')

True

In [None]:
os.listdir('./ianalyst')

[]

Let us download some files into the `ianalyst` directory using the `urllib` module.

In [8]:
url1 = 'https://gist.githubusercontent.com/aakashns/257f6e6c8719c17d0e498ea287d1a386/raw/7def9ef4234ddf0bc82f855ad67dac8b971852ef/loans1.txt'
url2 = 'https://gist.githubusercontent.com/aakashns/257f6e6c8719c17d0e498ea287d1a386/raw/7def9ef4234ddf0bc82f855ad67dac8b971852ef/loans2.txt'
url3 = 'https://gist.githubusercontent.com/aakashns/257f6e6c8719c17d0e498ea287d1a386/raw/7def9ef4234ddf0bc82f855ad67dac8b971852ef/loans3.txt'

In [18]:
from urllib.request import urlretrieve
import urllib

In [21]:
urllib.request.urlretrieve(url1, '/content/ianalyst/loans1.txt')


('/content/ianalyst/loans1.txt', <http.client.HTTPMessage at 0x7f02a137bf10>)

In [22]:
urllib.request.urlretrieve(url2, '/content/ianalyst/loans2.txt')
urllib.request.urlretrieve(url3, '/content/ianalyst/loans3.txt')


('/content/ianalyst/loans3.txt', <http.client.HTTPMessage at 0x7f02a1351c90>)

Let's verify that the files were downloaded.

In [23]:
os.listdir('/content/ianalyst')

['loans3.txt', 'loans2.txt', 'loans1.txt']
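The three downloads above follow the same pattern, so they could be folded into a small helper. The function below is a sketch, not part of `urllib`: the name `download_all` is made up, and it assumes that the last component of each URL's path is a suitable local file name.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve

def download_all(urls, target_dir):
    """Download each URL into target_dir and return the saved paths."""
    os.makedirs(target_dir, exist_ok=True)
    saved = []
    for url in urls:
        # Assumption: the last path component of the URL is the file name
        filename = os.path.basename(urlparse(url).path)
        path = os.path.join(target_dir, filename)
        urlretrieve(url, path)
        saved.append(path)
    return saved

# Usage (requires network access):
# download_all([url1, url2, url3], '/content/ianalyst')
```

The usage line is left commented out because it needs an internet connection.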

You can also use the [`requests`](https://docs.python-requests.org/en/master/) library to download URLs, although you'll need to [write some additional code](https://stackoverflow.com/questions/44699682/how-to-save-a-file-downloaded-from-requests-to-another-directory) to save the contents of the page to a file.

## Reading from a file 

To read the contents of a file, we first need to open the file using the built-in `open` function. The `open` function returns a file object, which provides several methods for interacting with the file's contents.

In [24]:
file1 = open('/content/ianalyst/loans1.txt', mode='r')

The `open` function also accepts a `mode` argument that specifies how we can interact with the file. The following options are supported:

```
    ========= ===============================================================
    Character Meaning
    --------- ---------------------------------------------------------------
    'r'       open for reading (default)
    'w'       open for writing, truncating the file first
    'x'       create a new file and open it for writing
    'a'       open for writing, appending to the end of the file if it exists
    'b'       binary mode
    't'       text mode (default)
    '+'       open a disk file for updating (reading and writing)
    'U'       universal newline mode (deprecated)
    ========= ===============================================================
```
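To get a feel for these modes, here's a small sketch that exercises a few of them on a throwaway file. The file name `notes.txt` is made up, and the temporary directory is only there to keep the example self-contained. Note that mode characters can be combined, as in `'rb'`.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'notes.txt')   # hypothetical file name

    with open(path, 'w') as f:      # 'w' creates the file (or truncates it)
        f.write('first\n')

    with open(path, 'a') as f:      # 'a' appends to the end
        f.write('second\n')

    with open(path, 'r') as f:      # 'r' reads text; same as open(path)
        text = f.read()

    try:
        open(path, 'x')             # 'x' refuses to overwrite an existing file
        x_failed = False
    except FileExistsError:
        x_failed = True

    with open(path, 'rb') as f:     # adding 'b' returns bytes instead of str
        raw = f.read()

print(repr(text), x_failed, type(raw).__name__)
```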

To view the contents of the file, we can use the `read` method of the file object.

In [25]:
file1_contents = file1.read()

In [26]:
print(file1_contents)

amount,duration,rate,down_payment
100000,36,0.08,20000
200000,12,0.1,
628400,120,0.12,100000
4637400,240,0.06,
42900,90,0.07,8900
916000,16,0.13,
45230,48,0.08,4300
991360,99,0.08,
423000,27,0.09,47200


The file contains information about loans. It is a set of comma-separated values (CSV). 

> **CSVs**: A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. A CSV file typically stores tabular data (numbers and text) in plain text, in which case each line will have the same number of fields. (Wikipedia)

The first line of the file is the header, indicating what each of the numbers on the remaining lines represents. Each of the remaining lines provides information about a loan. Thus, the second line `100000,36,0.08,20000` represents a loan with:

* an *amount* of `$100000`, 
* *duration* of `36` months, 
* *rate of interest* of `8%` per annum, and 
* a down payment of `$20000`
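The mapping between the header and a data line can be sketched in a few lines of Python. The helper `parse_loan` below is hypothetical, and treating an empty field (such as a missing down payment) as `0` is an assumption made for this illustration.

```python
def parse_loan(header_line, data_line):
    """Turn one CSV line into a dictionary keyed by the header names."""
    headers = header_line.strip().split(',')
    values = data_line.strip().split(',')
    loan = {}
    for name, value in zip(headers, values):
        # Assumption: an empty field (e.g. a missing down payment) means 0
        loan[name] = float(value) if value != '' else 0.0
    return loan

loan = parse_loan('amount,duration,rate,down_payment', '100000,36,0.08,20000')
print(loan)
# {'amount': 100000.0, 'duration': 36.0, 'rate': 0.08, 'down_payment': 20000.0}
```

This simple approach only works when the values themselves contain no commas.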

The CSV is a standard file format used for sharing data for analysis and visualization. Over the course of this tutorial, we will read the data from these CSV files, process it, and write the results back to files. Before we continue, let's close the file using the `close` method (otherwise, Python keeps the file open and continues to hold the operating system resources associated with it).

In [27]:
file1.close()

Once a file is closed, you can no longer read from it.

In [28]:
file1.read()

ValueError: I/O operation on closed file.
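File objects also have a `closed` attribute, which lets us check the state of a file without triggering an error. The sketch below uses a temporary file (an assumption, so that it runs anywhere) and catches the `ValueError` instead of letting it stop the program.

```python
import os
import tempfile

# Create a small throwaway file to experiment with
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as out:
    out.write('amount,duration\n')

handle = open(path)
print(handle.closed)       # False: the file is open
handle.close()
print(handle.closed)       # True: the file is closed

try:
    handle.read()
    message = None
except ValueError as error:
    message = str(error)   # reading a closed file raises ValueError

print(message)
os.remove(path)            # clean up the temporary file
```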

## Closing files automatically using `with`

To close a file automatically after you've processed it, you can open it using the `with` statement.

In [29]:
with open('/content/ianalyst/loans1.txt') as file2:
    file2_contents = file2.read()
    print(file2_contents)

amount,duration,rate,down_payment
100000,36,0.08,20000
200000,12,0.1,
628400,120,0.12,100000
4637400,240,0.06,
42900,90,0.07,8900
916000,16,0.13,
45230,48,0.08,4300
991360,99,0.08,
423000,27,0.09,47200


Once the statements within the `with` block are executed, the `.close` method on `file2` is automatically invoked. Let's verify this by trying to read from the file object again.

In [30]:
file2.read()

ValueError: I/O operation on closed file.
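It can help to see roughly what `with` saves us from writing. The sketch below compares a manual `try`/`finally` with the `with` statement, and also checks that the file gets closed when the block raises an error. The temporary file and the `RuntimeError` are stand-ins for a real file and a real processing failure.

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as out:
    out.write('100000,36,0.08,20000\n')

# Roughly what `with open(path) as f:` does for us
f = open(path)
try:
    first = f.read()
finally:
    f.close()              # runs whether or not read() raised an error

# The `with` statement also closes the file when the block raises an error
try:
    with open(path) as g:
        raise RuntimeError('something went wrong while processing')
except RuntimeError:
    pass

print(f.closed, g.closed)
os.remove(path)            # clean up the temporary file
```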

## Reading a file line by line


File objects provide a `readlines` method, which reads the whole file and returns its lines as a list of strings.

In [31]:
with open('/content/ianalyst/loans1.txt', 'r') as file3:
    file3_lines = file3.readlines()

In [32]:
file3_lines

['amount,duration,rate,down_payment\n',
 '100000,36,0.08,20000\n',
 '200000,12,0.1,\n',
 '628400,120,0.12,100000\n',
 '4637400,240,0.06,\n',
 '42900,90,0.07,8900\n',
 '916000,16,0.13,\n',
 '45230,48,0.08,4300\n',
 '991360,99,0.08,\n',
 '423000,27,0.09,47200']
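Notice that every line except the last one ends with a newline character `\n`. A file object can also be iterated over directly, one line at a time, which avoids building the full list first. Here's a small sketch that strips the newline and splits each line on commas. It uses `io.StringIO` with the first three lines of `loans1.txt` as a stand-in for an open file.

```python
import io

# Stand-in for an open file, using the first lines of loans1.txt
sample = io.StringIO('amount,duration,rate,down_payment\n'
                     '100000,36,0.08,20000\n'
                     '200000,12,0.1,\n')

rows = []
for line in sample:                        # yields one line at a time
    rows.append(line.strip().split(','))   # drop '\n', then split on commas

print(rows)
```

The last row ends with an empty string, because the down payment is missing on that line.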

## Using Pandas to Read and Write CSVs

There are some limitations to the `read_csv` and `write_csv` functions we've defined above:

* The `read_csv` function fails to create a proper dictionary if any of the values in the CSV files contains commas
* The `write_csv` function fails to create a proper CSV if any of the values to be written contains commas

When a value in a CSV file contains a comma (`,`), the value is generally placed within double quotes. Double quotes (`"`) in values are converted into two double quotes (`""`). Here's an example:

```
title,description
Fast & Furious,"A movie, a race, a franchise"
The Dark Knight,"Gotham, the ""Batman"", and the Joker"
Memento,A guy forgets everything every 15 minutes

```
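To see why quoting matters, we can compare naive splitting on commas with a parser that understands these rules. The sketch below uses `csv.DictReader` from Python's standard library as one such parser, with the example above pasted into a string instead of being read from a file.

```python
import csv
import io

text = '''title,description
Fast & Furious,"A movie, a race, a franchise"
The Dark Knight,"Gotham, the ""Batman"", and the Joker"
Memento,A guy forgets everything every 15 minutes
'''

# Naive splitting breaks the quoted description into several pieces
naive = text.splitlines()[1].split(',')

# csv.DictReader follows the quoting rules described above
records = list(csv.DictReader(io.StringIO(text)))

print(len(naive))                     # 4 pieces instead of 2 fields
print(records[1]['description'])      # Gotham, the "Batman", and the Joker
```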

Let's try it out.

In [33]:
movies_url = "https://gist.githubusercontent.com/aakashns/afee0a407d44bbc02321993548021af9/raw/6d7473f0ac4c54aca65fc4b06ed831b8a4840190/movies.csv"

In [34]:
urlretrieve(movies_url, '/content/ianalyst/movies.csv')

('/content/ianalyst/movies.csv', <http.client.HTTPMessage at 0x7f02a1370510>)

To read this CSV properly, we can use the `pandas` library.

In [None]:
!pip install pandas --upgrade --quiet

In [38]:
import pandas as pd
movies = pd.read_csv('/content/ianalyst/movies.csv')

The `pd.read_csv` function reads the CSV file into a pandas data frame: a spreadsheet-like object for analyzing and processing data. We'll learn more about data frames in a future lesson.

In [39]:
movies

| | title | description |
|---|---|---|
| 0 | Fast & Furious | A movie, a race, a franchise |
| 1 | The Dark Knight | Gotham, the "Batman", and the Joker |
| 2 | Memento | A guy forgets everything every 15 minutes |

As you can see above, the movie descriptions were parsed properly: commas inside quoted values are kept, and each doubled double quote (`""`) becomes a single double quote.


## Summary and Further Reading

With this, we complete our discussion of reading from and writing to files in Python. We've covered the following topics in this tutorial:

* Interacting with the file system using the `os` module
* Downloading files from URLs using the `urllib` module
* Opening files using the `open` built-in function
* Reading the contents of a file using `.read`
* Closing a file automatically using `with`
* Reading a file line by line using `readlines`
* Processing data from a CSV file by defining functions
* Using helper functions to build more complex functions
* Writing data to a file using `.write`

This tutorial on working with files in Python is by no means exhaustive. Here are some more resources worth checking out:

* Python Tutorial at W3Schools: https://www.w3schools.com/python/
* Practical Python Programming: https://dabeaz-course.github.io/practical-python/Notes/Contents.html
* Python official documentation: https://docs.python.org/3/tutorial/index.html



## Questions for Revision

Try answering the following questions to test your understanding of the topics covered in this notebook:

1. What is the purpose of the `os` module in Python?
2. How do you identify the current working directory in a Jupyter notebook?
3. How do you retrieve the list of files within a directory using Python?
4. How do you create a directory using Python?
5. How do you check whether a file or directory exists on the filesystem? Hint: `os.path.exists`.
6. Where can you find the full list of functions contained in the `os` module?
7. Give examples of 5 useful functions from the `os` and `os.path` modules.
8. How do you download a file from a URL using Python?
9. How do you open a file using Python? Give an example.
10. What are the different modes for opening a file in Python?
11. Can you open a file in multiple modes? Illustrate with an example.
12. What is the file object? How is it useful?
13. How do you read the contents of a file into a string?
14. What is a CSV file? Give an example.
15. How do you close an open file?
16. Why is it essential to close a file after processing it?
17. How do you ensure that files are closed automatically after processing? Give an example.
18. How is the `with` statement useful for working with files?
19. What happens if you try to read from a closed file?
20. How do you read the contents of a file line by line?
21. Write a function to convert the contents of a CSV file into a list of dictionaries (one dictionary for each row of the file).
22. Write a function to convert the contents of a CSV file into a dictionary of lists (one list for each column of the file).
23. How do you write to a file using Python?
24. How is the string `.format` method useful for writing data to a file in CSV format?
25. Write a function to write data from a list of dictionaries into a CSV file.
26. Write a function to write data from a dictionary of lists into a CSV file.
27. Where can you learn about the methods supported by the file object in Python?
28. How can you read from and write to CSV files using Pandas?
