# File I/O
Discusses techniques for storing and retrieving data from files.

## open()
`open()` returns a file object for reading or writing a file. Its required parameter, `file`, is the path of the file to open (usually a string). Its optional parameter, `mode`, determines whether we get read or write access to the file. By default (`mode='r'`), the file is opened for reading only, in text mode.

Some common `mode` values:
- `r`: Read the file as text (the default)
- `w`: Write text to the file, creating it if needed and erasing any existing contents
- `w+`: Read and write text, erasing any existing contents first
- `a`: Write text to the end of the file, creating it if needed
- `a+`: Read and append text; writes always go to the end of the file
- Adding `b` to a mode (e.g. `rb`, `wb`) reads and writes `bytes` instead of strings
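A minimal sketch of how the modes listed above differ, assuming a throwaway temporary directory; the file name `notes.txt` is hypothetical and only used for illustration.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched
# (the file name 'notes.txt' is purely illustrative)
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'notes.txt')

# 'w' creates the file (or truncates an existing one) and writes strings
with open(path, 'w') as f:
    f.write('first line\n')

# 'w' again: the previous contents are discarded
with open(path, 'w') as f:
    f.write('second line\n')

# 'a' keeps the existing contents and writes at the end
with open(path, 'a') as f:
    f.write('third line\n')

# 'r' (the default) opens for reading only
with open(path, 'r') as f:
    contents = f.read()

print(repr(contents))  # 'second line\nthird line\n'

# 'w+' truncates, then allows both writing and reading;
# the position sits after the write, so rewind before reading
with open(path, 'w+') as f:
    f.write('replaced\n')
    f.seek(0)
    rewritten = f.read()
```

Note that the first `'w'` write is lost entirely, while `'a'` only adds to what is already there.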

More details can be found in Python's documentation:

https://docs.python.org/3/library/functions.html#open

**If you are not using a `with` statement, always call `close()` when you are done with the file. Closing it flushes any buffered writes and releases the file handle so other programs can use the file.**

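```python
# Manual handling: open the file, then close it explicitly
readme_file = open('README.md')
readme_file.close()

# Preferred: the `with` statement closes the file automatically when the block ends
with open('README.md') as readme_file:
    pass
```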
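The `with` statement is roughly equivalent to wrapping the work in `try`/`finally` and calling `close()` in the `finally` clause. The sketch below shows both forms side by side; it writes a hypothetical `example.txt` into a temporary directory instead of using `README.md`, so it runs anywhere.

```python
import os
import tempfile

# Hypothetical file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'example.txt')
with open(path, 'w') as f:
    f.write('hello\n')

# Manual handling: try/finally guarantees close() runs even if an error occurs
f = open(path)
try:
    manual_text = f.read()
finally:
    f.close()

# The `with` statement does the same thing automatically
with open(path) as f:
    with_text = f.read()

print(f.closed)  # True: the file was closed when the `with` block ended
```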

## read()
`read()` reads the file from the current position to the end and returns the contents. It returns a `str`, or `bytes` if the mode contains `b`. This is useful when the file needs to be read and processed all at once, as with **JSON** files.

Each call continues from where the previous one stopped. Once the end of the file is reached, `read()` returns an empty string (or empty `bytes`) instead of raising an error; `seek(0)` moves back to the start.
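To see this position behaviour without the sample file, here is a minimal sketch that uses `io.StringIO` as an in-memory stand-in for a file opened in text mode; the JSON snippet is illustrative only.

```python
import io

# io.StringIO behaves like a file opened in text mode, so no file on disk is needed
f = io.StringIO('{"author": "Bob"}')

first_five = f.read(5)    # read(size) returns at most `size` characters
rest = f.read()           # continues from the current position to the end
at_end = f.read()         # nothing left: returns an empty string, not None

f.seek(0)                 # rewind to the start
again = f.read()          # the full contents again
```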

```python
with open('author_patents.json') as json_file:
    print(json_file.read())

    # NOTE: The first read() left the position at the end of the file,
    # so the next read() returns an empty string.
    # Uncomment seek(0) to rewind and re-read the file. If the file is small,
    # storing its contents in a variable is faster than reading it again.
    # json_file.seek(0)

    print(json_file.read())
```

```
{
    "author": "Bob",
    "email": "bob@gmail.com",
    "patents": [
        "computer",
        "electronics",
        "website"
    ],
    "state": "California"
}
```
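Because `read()` returns the whole document as one string, it pairs naturally with the `json` module. A minimal sketch, assuming a copy of the data shown above is written to a temporary `author_patents.json` first so the example is self-contained:

```python
import json
import os
import tempfile

# Recreate the sample data in a temporary directory (the location is illustrative)
path = os.path.join(tempfile.mkdtemp(), 'author_patents.json')
with open(path, 'w') as f:
    json.dump({'author': 'Bob', 'email': 'bob@gmail.com',
               'patents': ['computer', 'electronics', 'website'],
               'state': 'California'}, f, indent=4)

with open(path) as json_file:
    text = json_file.read()      # the whole document as one string
    data = json.loads(text)      # parse it in one go

# json.load() combines both steps by reading from the file object directly
with open(path) as json_file:
    same_data = json.load(json_file)

print(data['patents'][0])  # computer
```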
## readlines()
`readlines()` reads the file from the current position and returns a list with one element per line. Each element keeps its trailing newline (`\n`), and the elements are strings, or `bytes` if the mode contains `b`. This is useful when the file needs to be processed line by line, as with **CSV** files.

Like `read()`, `readlines()` continues from the current position; once the end of the file is reached, it returns an empty list.
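A minimal sketch of these details, again using `io.StringIO` as an in-memory stand-in for the first rows of a CSV file (the data is illustrative). It also shows that iterating over the file object directly yields the same lines one at a time, without building the whole list first.

```python
import io

# In-memory stand-in for a small CSV file
csv_file = io.StringIO('author,email\nAna,ana@yahoo.com\nBob,bob@gmail.com\n')

lines = csv_file.readlines()
# Each element keeps its trailing '\n'
print(lines)  # ['author,email\n', 'Ana,ana@yahoo.com\n', 'Bob,bob@gmail.com\n']

leftover = csv_file.readlines()  # already at the end: returns an empty list

csv_file.seek(0)
# Iterating the file object yields one line at a time; strip the newline if unwanted
stripped = [line.rstrip('\n') for line in csv_file]
```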

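```python
with open('author_patents.csv') as csv_file:
    for line in csv_file.readlines():
        # Each line already ends in '\n' and print() adds another,
        # which produces the blank lines in the output
        print(line)

    # NOTE: readlines() left the position at the end of the file,
    # so read() below returns an empty string.
    # Uncomment seek(0) to rewind and re-read the file. If the file is small,
    # storing its contents in a variable is faster than reading it again.
    # csv_file.seek(0)

    print(csv_file.read())
```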

```
author,email,patents,state

Ana,ana@yahoo.com,"[""computer"",""electronics"",""light bulb""]",California

Bob,bob@gmail.com,"[""computer"",""electronics"",""website""]",California

Kay,kay@hotmail.com,"[""aviation""]","New York"

```

## csv module
The `csv` module reads and writes CSV files, which store rows of named values (one column per name). CSV is handy as temporary storage before moving data into a database, or for viewing data in a spreadsheet.
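Besides reading, the module can also write CSV files. A minimal sketch of a round trip with `csv.DictWriter` and `csv.DictReader`, assuming a hypothetical `authors.csv` in a temporary directory and a reduced set of columns:

```python
import csv
import os
import tempfile

# Hypothetical output file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'authors.csv')
rows = [
    {'author': 'Ana', 'state': 'California'},
    {'author': 'Kay', 'state': 'New York'},
]

# The csv documentation recommends newline='' so the module controls line endings
with open(path, 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['author', 'state'])
    writer.writeheader()      # first row: the column names
    writer.writerows(rows)    # one CSV row per dictionary

with open(path, newline='') as f:
    read_back = list(csv.DictReader(f))  # one dictionary per data row
```

Every value comes back as a string, since CSV has no notion of types.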

```python
import json
from csv import DictReader

# newline='' lets the csv module handle line endings itself
with open('author_patents.csv', newline='') as csv_file:
    # DictReader yields one dictionary per row (not a list), keyed by the header row
    csv_reader = DictReader(csv_file)
    for row in csv_reader:
        # Uncomment to convert the `patents` JSON string into a Python list
        # row['patents'] = json.loads(row['patents'])
        print(
            # Pretty print the row as JSON
            json.dumps(row, indent=2)
        )
```

```
{
  "author": "Ana",
  "email": "ana@yahoo.com",
  "patents": "[\"computer\",\"electronics\",\"light bulb\"]",
  "state": "California"
}
{
  "author": "Bob",
  "email": "bob@gmail.com",
  "patents": "[\"computer\",\"electronics\",\"website\"]",
  "state": "California"
}
{
  "author": "Kay",
  "email": "kay@hotmail.com",
  "patents": "[\"aviation\"]",
  "state": "New York"
}
```
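In the output, `patents` is still a JSON string, because the `csv` module only produces strings. A minimal sketch of decoding that column with `json.loads`, using an in-memory copy of two rows with the same quoting as `author_patents.csv`:

```python
import csv
import io
import json

# In-memory copy of two rows from the sample file
data = io.StringIO(
    'author,email,patents,state\n'
    'Ana,ana@yahoo.com,"[""computer"",""electronics"",""light bulb""]",California\n'
    'Kay,kay@hotmail.com,"[""aviation""]","New York"\n'
)

authors = []
for row in csv.DictReader(data):
    # Decode the JSON array stored in the `patents` column into a Python list
    row['patents'] = json.loads(row['patents'])
    authors.append(row)

print(authors[0]['patents'])  # ['computer', 'electronics', 'light bulb']
```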
