# Reading and Writing Files in Python

We will learn how to read and write data in files, such as CSV, JSON, and plain text files, in Python using the `io` and `os` modules.<br/>
Docs: https://docs.python.org/3/library/io.html

File formats can be divided into two groups: 1) flat files and 2) non-flat files.<br/>
#### Flat File: 
In a flat file, records follow a uniform format, and there are no structures for indexing or recognizing relationships between records.<br/>
A flat file can be a plain text file in TSV (tab-separated values) or CSV (comma-separated values) format, or a binary file.<br/>
CSV files contain data values that are separated by commas (`,`).

In [None]:
ID,COUNTRY,POP,AREA,GDP,CONT,IND_DAY
CHN,China,1398.72,9596.96,12234.8,Asia,NaN
IND,India,1351.16,3287.26,2575.67,Asia,8/15/1947
USA,US,329.74,9833.52,19485.4,N.America,1776-07-04
IDN,Indonesia,268.07,1910.93,1015.54,Asia,8/17/1945
BRA,Brazil,210.32,8515.77,2055.51,S.America,1822-09-07
PAK,Pakistan,205.71,881.91,302.14,Asia,8/14/1947
NGA,Nigeria,200.96,923.77,375.77,Africa,10/1/1960
BGD,Bangladesh,167.09,147.57,245.63,Asia,3/26/1971
RUS,Russia,146.79,17098.2,1530.75,NaN,6/12/1992
MEX,Mexico,126.58,1964.38,1158.23,N.America,1810-09-16
JPN,Japan,126.22,377.97,4872.42,Asia,NaN
DEU,Germany,83.02,357.11,3693.2,Europe,NaN
FRA,France,67.02,640.68,2582.49,Europe,1789-07-14
GBR,UK,66.44,242.5,2631.23,Europe,NaN
ITA,Italy,60.36,301.34,1943.84,Europe,NaN
ARG,Argentina,44.94,2780.4,637.49,S.America,1816-07-09
DZA,Algeria,43.38,2381.74,167.56,Africa,7/5/1962
CAN,Canada,37.59,9984.67,1647.12,N.America,1867-07-01
AUS,Australia,25.47,7692.02,1408.68,Oceania,NaN
KAZ,Kazakhstan,18.53,2724.9,159.41,Asia,12/16/1991

The records above are stored in a delimited file, here with a comma as the delimiter. Another common delimited format is TSV, which uses the tab character (`\t`) between values.
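
Reading a delimited file mostly comes down to telling the parser which delimiter to split on. The following is a minimal sketch using the standard `csv` module; the three-column sample is a shortened, in-memory stand-in for the table above rather than a real file on disk.

```python
import csv
import io

# Hypothetical in-memory stand-in for a tab-separated file.
tsv_text = "ID\tCOUNTRY\tPOP\nCHN\tChina\t1398.72\nIND\tIndia\t1351.16\n"

# csv.reader splits every line on the delimiter we pass in.
reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
header = next(reader)   # the first row holds the column names
rows = list(reader)     # the remaining rows hold the data

print(header)   # ['ID', 'COUNTRY', 'POP']
print(rows[0])  # ['CHN', 'China', '1398.72']
```

Note that every value comes back as a string; numeric columns still need to be converted explicitly.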

#### Non-Flat File:
A non-flat file is a file where an index is assigned to every record, so the exact location of a record can be found from its index. You normally need an application such as a database management system to read this type of file.

`XML` and `JSON` are examples of non-flat files.

Example `XML` file format:

In [None]:
<?xml version="1.0"?>
<table>
<row>
<ID>CHN</ID>
<COUNTRY>China</COUNTRY>
<POP>1398.72</POP>
<AREA>9596.96</AREA>
<GDP>12234.8</GDP>
<CONT>Asia</CONT>
<IND_DAY>null</IND_DAY>
</row>
<row>
#second row starts

Example `JSON` file format:

In [None]:
[{"ID":"CHN","COUNTRY":"China","POP":1398.72,"AREA":9596.96,"GDP":12234.8,"CONT":"Asia","IND_DAY":null},
{"ID":"IND","COUNTRY":"India","POP":1351.16,"AREA":3287.26,"GDP":2575.67,"CONT":"Asia","IND_DAY":"8\/15\/1947"},
 #...
 {"ID":"KAZ","COUNTRY":"Kazakhstan","POP":18.53,"AREA":2724.9,"GDP":159.41,"CONT":"Asia","IND_DAY":"12\/16\/1991"}]

In [3]:
import pandas as pd
csv_file = pd.read_csv("countries.csv", sep = ",", header = 0, index_col = False) #read_csv already returns a DataFrame
csv_file.to_json("countries.json", orient = "records", date_format = "epoch", double_precision = 10, force_ascii = True, date_unit = "ms", default_handler = None)
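
For comparison, the same CSV-to-JSON conversion can be sketched with the standard library alone. This is only an illustration under simplifying assumptions: the sample text is a shortened, in-memory stand-in for `countries.csv`, and the `POP` column is converted to a number by hand.

```python
import csv
import io
import json

# Hypothetical, shortened stand-in for the contents of countries.csv.
csv_text = "ID,COUNTRY,POP\nCHN,China,1398.72\nIND,India,1351.16\n"

# DictReader uses the header row as keys and yields one dict per record.
records = list(csv.DictReader(io.StringIO(csv_text)))

# csv returns every field as a string, so numeric fields are converted by hand.
for record in records:
    record["POP"] = float(record["POP"])

# Serialize the list of dicts as a JSON array of objects.
json_text = json.dumps(records)
print(json_text)
```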

## Python File Objects


Python has built-in functions to create, read, write, and manipulate files. The `io` module provides the default file-handling machinery, and its `open()` function is available as a built-in, so you can use it without importing anything. Before you read, write, or manipulate a file, you call the built-in function `open(filename, access_mode)`, which returns a file object, often called a "handle". You then use this handle to read from or write to the file. Like everything else in Python, a file is an object with its own attributes and methods.

An `OSError` (`IOError` is an alias for it in Python 3) is raised if opening the file fails, for example because the file does not exist or you do not have permission to access it. Other mistakes raise different exceptions: reading from a file that was opened in write-only mode raises `io.UnsupportedOperation`, and using a file that is already closed raises `ValueError`.
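
The different failure cases can be told apart by the exception type. Below is a small sketch, assuming a temporary directory and hypothetical file names, that triggers three common errors and records which exception each one raises.

```python
import io
import os
import tempfile

# 1) Opening a path that does not exist raises FileNotFoundError,
#    which is a subclass of OSError.
missing = os.path.join(tempfile.gettempdir(), "no_such_dir_for_demo", "missing.txt")
try:
    open(missing)
except OSError as exc:
    open_error = type(exc).__name__

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")  # hypothetical file name
    handle = open(path, "w")

    # 2) Reading from a handle opened in write-only mode.
    try:
        handle.read()
    except io.UnsupportedOperation as exc:
        read_error = type(exc).__name__

    # 3) Using a handle after it has been closed.
    handle.close()
    try:
        handle.write("x")
    except ValueError as exc:
        closed_error = type(exc).__name__

print(open_error, read_error, closed_error)
```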

As you might have expected from reading the previous section, text files have an End-Of-Line (EOL) character to indicate each line's termination. In Python, the new line character (`\n`) is the default EOL terminator.

#### Open()
The built-in Python function `open()` has the following arguments: `open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)`<br/> 
The `open()` function has eight parameters, each shown above with its default value.

We will focus on the first two parameters for now, since they are essential for reading and writing files, and go through the other parameters as the tutorial progresses.

Let's understand the first argument, i.e., file.

#### file
`file` is the only mandatory argument of the `open` function; all other arguments are optional and fall back to their default values. The `file` argument is the path to the file on your system.

If the file is in the current working directory, you can just provide the filename, as in the following example: `my_file=open("countries.txt")`<br/> 
If the file resides in a different directory, you have to provide a path together with the file name, either an absolute path or a path relative to the current working directory.
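
Paths can also be built in a way that does not depend on the operating system's separator. The sketch below uses `pathlib` and `os.path` from the standard library; the `data` folder and the file name are assumptions taken from the examples in this notebook, and nothing is opened.

```python
import os
from pathlib import Path

# The current working directory, as an absolute path.
cwd = Path.cwd()

# A relative path: interpreted relative to the current working directory.
relative = Path("data") / "countries - comma.txt"   # hypothetical location

# Joining it onto the working directory gives the matching absolute path.
absolute = cwd / relative

print(relative.name)            # countries - comma.txt
print(absolute.is_absolute())   # True

# os.path.join does the same job with plain strings.
joined = os.path.join("data", "countries - comma.txt")
print(joined)
```

Either form can be passed directly to `open()`.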

In [21]:
help(open)

Help on built-in function open in module io:

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
    Open file and return a stream.  Raise OSError upon failure.
    
    file is either a text or byte string giving the name (and the path
    if the file isn't in the current working directory) of the file to
    be opened or an integer file descriptor of the file to be
    wrapped. (If a file descriptor is given, it is closed when the
    returned I/O object is closed, unless closefd is set to False.)
    
    mode is an optional string that specifies the mode in which the file
    is opened. It defaults to 'r' which means open for reading in text
    mode.  Other common values are 'w' for writing (truncating the file if
    it already exists), 'x' for creating and writing to a new file, and
    'a' for appending (which on some Unix systems, means that all writes
    append to the end of the file regardless of the current seek position

We will check the current Python working directory and then set it to the directory where the file is located.

In [1]:
%pwd

'C:\\Users\\nvanb\\Documents\\NathanPC\\USF\\python4DS'

In [6]:
# %cd "C:\\Users\\yasin.unlu\\Documents\\Original Docs\\Documents1\\Docs\\Teaching\\PythonForDataScienceSummer2020\\Week-6"

C:\Users\yasin.unlu\Documents\Original Docs\Documents1\Docs\Teaching\PythonForDataScienceSummer2020\Week-6


In [9]:
my_file = open("data\\countries - comma.txt") #open the file
my_file.read() #read a comma separated values file

'ID,COUNTRY,POP,AREA,GDP,CONT,IND_DAY\nCHN,China,1398.72,9596.96,12234.8,Asia,NaN\nIND,India,1351.16,3287.26,2575.67,Asia,8/15/1947\nUSA,US,329.74,9833.52,19485.4,N.America,1776-07-04\nIDN,Indonesia,268.07,1910.93,1015.54,Asia,8/17/1945\nBRA,Brazil,210.32,8515.77,2055.51,S.America,1822-09-07\nPAK,Pakistan,205.71,881.91,302.14,Asia,8/14/1947\nNGA,Nigeria,200.96,923.77,375.77,Africa,10/1/1960\nBGD,Bangladesh,167.09,147.57,245.63,Asia,3/26/1971\nRUS,Russia,146.79,17098.2,1530.75,NaN,6/12/1992\nMEX,Mexico,126.58,1964.38,1158.23,N.America,1810-09-16\nJPN,Japan,126.22,377.97,4872.42,Asia,NaN\nDEU,Germany,83.02,357.11,3693.2,Europe,NaN\nFRA,France,67.02,640.68,2582.49,Europe,1789-07-14\nGBR,UK,66.44,242.5,2631.23,Europe,NaN\nITA,Italy,60.36,301.34,1943.84,Europe,NaN\nARG,Argentina,44.94,2780.4,637.49,S.America,1816-07-09\nDZA,Algeria,43.38,2381.74,167.56,Africa,7/5/1962\nCAN,Canada,37.59,9984.67,1647.12,N.America,1867-07-01\nAUS,Australia,25.47,7692.02,1408.68,Oceania,NaN\nKAZ,Kazakhstan,18.5

In [12]:
my_file.close()

In [14]:
my_file.closed

True

We could also provide the full path of the file. Then we can read it from any working directory.

In [11]:
# my_file = open("C:\\Users\\yasin.unlu\\Documents\\Original Docs\\Documents1\\Docs\\Teaching\\PythonForDataScienceSummer2020\\Week-6\\countries - comma.txt") #open the file
my_file.read()

''

If we provide a wrong directory, it will throw an error.

In [19]:
my_file = open("SomeDirectory\\countries - comma.txt") #This will throw an error.

FileNotFoundError: [Errno 2] No such file or directory: 'SomeDirectory\\countries - comma.txt'

In [17]:
my_file = open("countries - tab.txt") #open the file
my_file.read() #read a tab separated values file

'ID\tCOUNTRY\tPOP\tAREA\tGDP\tCONT\tIND_DAY\nCHN\tChina\t1398.72\t9596.96\t12234.8\tAsia\tNaN\nIND\tIndia\t1351.16\t3287.26\t2575.67\tAsia\t8/15/1947\nUSA\tUS\t329.74\t9833.52\t19485.4\tN.America\t1776-07-04\nIDN\tIndonesia\t268.07\t1910.93\t1015.54\tAsia\t8/17/1945\nBRA\tBrazil\t210.32\t8515.77\t2055.51\tS.America\t1822-09-07\nPAK\tPakistan\t205.71\t881.91\t302.14\tAsia\t8/14/1947\nNGA\tNigeria\t200.96\t923.77\t375.77\tAfrica\t10/1/1960\nBGD\tBangladesh\t167.09\t147.57\t245.63\tAsia\t3/26/1971\nRUS\tRussia\t146.79\t17098.2\t1530.75\tNaN\t6/12/1992\nMEX\tMexico\t126.58\t1964.38\t1158.23\tN.America\t1810-09-16\nJPN\tJapan\t126.22\t377.97\t4872.42\tAsia\tNaN\nDEU\tGermany\t83.02\t357.11\t3693.2\tEurope\tNaN\nFRA\tFrance\t67.02\t640.68\t2582.49\tEurope\t1789-07-14\nGBR\tUK\t66.44\t242.5\t2631.23\tEurope\tNaN\nITA\tItaly\t60.36\t301.34\t1943.84\tEurope\tNaN\nARG\tArgentina\t44.94\t2780.4\t637.49\tS.America\t1816-07-09\nDZA\tAlgeria\t43.38\t2381.74\t167.56\tAfrica\t7/5/1962\nCAN\tCanada\t37

Let's understand the second argument of the open function, i.e., access modes.

#### Access Modes
Access modes define how you want to open a file, that is, whether you want to open it in:

* read-only mode
* write-only mode
* append mode
* both read and write mode

Although many access modes exist, the most commonly used ones are read and write. The mode also determines where in the file reading or writing starts.

You use `'r'`, the default mode, to read the file. In other cases where you want to write or append, you use `'w'` or `'a'`, respectively.
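
The effect of the most common modes can be seen in a short experiment. This is a sketch that works in a temporary directory with a hypothetical file name, so it does not touch the files used elsewhere in this notebook.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "modes_demo.txt")  # hypothetical file name

    with open(path, "w") as f:   # 'w' creates the file
        f.write("first\n")
    with open(path, "w") as f:   # 'w' again truncates it, so "first" is lost
        f.write("second\n")
    with open(path, "a") as f:   # 'a' keeps the content and writes at the end
        f.write("third\n")
    with open(path, "r") as f:   # 'r' reads from the beginning
        content = f.read()

    # 'x' only succeeds if the file does not exist yet.
    try:
        open(path, "x")
    except FileExistsError:
        exclusive_failed = True
    else:
        exclusive_failed = False

print(repr(content))      # 'second\nthird\n'
print(exclusive_failed)   # True
```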

#### Reading From a File
Let's try out all the reading methods for reading from a file, and you will also explore the access modes along with it! There are three ways to read from a file.

`read([n])`<br/>
`readline([n])`<br/>
`readlines()`<br/>

Here n is the number of characters to read in text mode (or bytes in binary mode). If n is omitted, `read()` returns the rest of the file and `readline()` returns the rest of the current line.

Create a file as below:<br/>
1st line<br/>
2nd line<br/>
3rd line<br/>
4th line<br/>
5th line<br/>
Let's understand what each read method does:

In [15]:
my_file=open("data\\countries - comma.txt","r") #default mode is already "r"; the backslash is escaped
print(my_file.read()) #read all lines

ID,COUNTRY,POP,AREA,GDP,CONT,IND_DAY
CHN,China,1398.72,9596.96,12234.8,Asia,NaN
IND,India,1351.16,3287.26,2575.67,Asia,8/15/1947
USA,US,329.74,9833.52,19485.4,N.America,1776-07-04
IDN,Indonesia,268.07,1910.93,1015.54,Asia,8/17/1945
BRA,Brazil,210.32,8515.77,2055.51,S.America,1822-09-07
PAK,Pakistan,205.71,881.91,302.14,Asia,8/14/1947
NGA,Nigeria,200.96,923.77,375.77,Africa,10/1/1960
BGD,Bangladesh,167.09,147.57,245.63,Asia,3/26/1971
RUS,Russia,146.79,17098.2,1530.75,NaN,6/12/1992
MEX,Mexico,126.58,1964.38,1158.23,N.America,1810-09-16
JPN,Japan,126.22,377.97,4872.42,Asia,NaN
DEU,Germany,83.02,357.11,3693.2,Europe,NaN
FRA,France,67.02,640.68,2582.49,Europe,1789-07-14
GBR,UK,66.44,242.5,2631.23,Europe,NaN
ITA,Italy,60.36,301.34,1943.84,Europe,NaN
ARG,Argentina,44.94,2780.4,637.49,S.America,1816-07-09
DZA,Algeria,43.38,2381.74,167.56,Africa,7/5/1962
CAN,Canada,37.59,9984.67,1647.12,N.America,1867-07-01
AUS,Australia,25.47,7692.02,1408.68,Oceania,NaN
KAZ,Kazakhstan,18.53,2724.9,159.41,Asia,

In [24]:
my_file.seek(0)

0

In [20]:
# my_file=open("countries - comma.txt","r")
my_file.seek(0)
print(my_file.read(3)) #first 3 characters

ID,


`readline(n)` returns at most n characters of a single line of a file. It does not read more than one line.

In [23]:
# my_file=open("countries - comma.txt","r")
my_file.seek(0)
print(my_file.readline(3)) #first 3 characters

ID,


In [38]:
# my_file=open("countries - comma.txt","r")
print(my_file.readline(100000)) #It does not read more than one line.

ID,COUNTRY,POP,AREA,GDP,CONT,IND_DAY
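
The read methods can be compared side by side because they all share one cursor: each call continues where the previous one stopped. The sketch below uses the five-line sample described earlier, held in an `io.StringIO` object as an in-memory stand-in for a file opened in text mode.

```python
import io

# In-memory stand-in for the five-line text file.
text = "1st line\n2nd line\n3rd line\n4th line\n5th line\n"
f = io.StringIO(text)

first_three = f.read(3)       # '1st'      -> the first three characters
rest_of_line = f.readline()   # ' line\n'  -> the remainder of line 1
partial = f.readline(3)       # '2nd'      -> at most 3 characters of line 2
remaining = f.readlines()     # everything left, as a list of lines
at_end = f.read()             # ''         -> the cursor is at the end

print(repr(first_three), repr(rest_of_line), repr(partial))
print(remaining)   # [' line\n', '3rd line\n', '4th line\n', '5th line\n']
print(repr(at_end))
```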



#### Closing Python Files with `close()`
Call the `close()` method on the file handle to close the file. Closing flushes any buffered data to the file and releases the handle; after that, the file object can no longer be read or written.

In [39]:
my_file.close()

You can use a `for` loop to read the file line by line:

In [41]:
my_file=open("countries - comma.txt","r")

for line in my_file:
    print(line)
my_file.close()

ID,COUNTRY,POP,AREA,GDP,CONT,IND_DAY

CHN,China,1398.72,9596.96,12234.8,Asia,NaN

IND,India,1351.16,3287.26,2575.67,Asia,8/15/1947

USA,US,329.74,9833.52,19485.4,N.America,1776-07-04

IDN,Indonesia,268.07,1910.93,1015.54,Asia,8/17/1945

BRA,Brazil,210.32,8515.77,2055.51,S.America,1822-09-07

PAK,Pakistan,205.71,881.91,302.14,Asia,8/14/1947

NGA,Nigeria,200.96,923.77,375.77,Africa,10/1/1960

BGD,Bangladesh,167.09,147.57,245.63,Asia,3/26/1971

RUS,Russia,146.79,17098.2,1530.75,NaN,6/12/1992

MEX,Mexico,126.58,1964.38,1158.23,N.America,1810-09-16

JPN,Japan,126.22,377.97,4872.42,Asia,NaN

DEU,Germany,83.02,357.11,3693.2,Europe,NaN

FRA,France,67.02,640.68,2582.49,Europe,1789-07-14

GBR,UK,66.44,242.5,2631.23,Europe,NaN

ITA,Italy,60.36,301.34,1943.84,Europe,NaN

ARG,Argentina,44.94,2780.4,637.49,S.America,1816-07-09

DZA,Algeria,43.38,2381.74,167.56,Africa,7/5/1962

CAN,Canada,37.59,9984.67,1647.12,N.America,1867-07-01

AUS,Australia,25.47,7692.02,1408.68,Oceania,NaN

KAZ,Kazakhstan,18.53
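
The blank lines in the output above appear because every line read from the file already ends with `\n`, and `print()` adds a second newline. A minimal sketch of one way to avoid this, using a short in-memory sample instead of the countries file:

```python
import io

# Short in-memory stand-in for the countries file.
sample = io.StringIO("ID,COUNTRY\nCHN,China\nIND,India\n")

# Strip the trailing newline from each line before printing it.
cleaned = [line.rstrip("\n") for line in sample]

for line in cleaned:
    print(line)
```

Calling `print(line, end="")` on the unstripped lines gives the same single-spaced output.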

The `readlines()` method returns a list containing each line of the file, which can be iterated with a for loop:

In [25]:
# my_file=open("countries - comma.txt","r")
my_file.seek(0)
my_file.readlines() # each row is a string in a list

['ID,COUNTRY,POP,AREA,GDP,CONT,IND_DAY\n',
 'CHN,China,1398.72,9596.96,12234.8,Asia,NaN\n',
 'IND,India,1351.16,3287.26,2575.67,Asia,8/15/1947\n',
 'USA,US,329.74,9833.52,19485.4,N.America,1776-07-04\n',
 'IDN,Indonesia,268.07,1910.93,1015.54,Asia,8/17/1945\n',
 'BRA,Brazil,210.32,8515.77,2055.51,S.America,1822-09-07\n',
 'PAK,Pakistan,205.71,881.91,302.14,Asia,8/14/1947\n',
 'NGA,Nigeria,200.96,923.77,375.77,Africa,10/1/1960\n',
 'BGD,Bangladesh,167.09,147.57,245.63,Asia,3/26/1971\n',
 'RUS,Russia,146.79,17098.2,1530.75,NaN,6/12/1992\n',
 'MEX,Mexico,126.58,1964.38,1158.23,N.America,1810-09-16\n',
 'JPN,Japan,126.22,377.97,4872.42,Asia,NaN\n',
 'DEU,Germany,83.02,357.11,3693.2,Europe,NaN\n',
 'FRA,France,67.02,640.68,2582.49,Europe,1789-07-14\n',
 'GBR,UK,66.44,242.5,2631.23,Europe,NaN\n',
 'ITA,Italy,60.36,301.34,1943.84,Europe,NaN\n',
 'ARG,Argentina,44.94,2780.4,637.49,S.America,1816-07-09\n',
 'DZA,Algeria,43.38,2381.74,167.56,Africa,7/5/1962\n',
 'CAN,Canada,37.59,9984.67,1647.12

In [27]:
my_file.close()
my_file.closed

True

#### Writing to a File
You can use two methods to write to a file in Python:

`write(string)` (for text)<br/>
`writelines(list)`<br/>

Let's create a new file. Opening a file that does not exist in `w` mode creates it in the specified folder. Remember to give the correct path with the correct filename; if the folder does not exist, you will get an error.

Alternatively, create a text file in a text editor such as Notepad and write some text in it. Make sure to save the file as .txt in Python's working directory.

In [40]:
new_file=open("my_file.txt", mode="w") #creates the file, or empties it if it already exists.

In [41]:
new_file.write("This is the first line.\n")
new_file.write("This is the second line.\n")

# new_file.close()

25

In [42]:
new_file.write("This is the 3rd line.\n")
new_file.write("This is the 4th line.\n")

new_file.close()

In [45]:
new_file.closed

True

### alternative method: `with`
It will automatically close the file when finished! (by calling the dunder method `__exit__()` behind the scenes at the end)   
Also, since a file must be opened to read or write, we might as well put the operations together in an indented block.


In [38]:
with open("my_file.txt", "w") as file:
    file.write("Here is a new line.\n")
    file.write("The old lines were overwritten.")

In [44]:
file.closed

True
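
To make the behavior of `with` more concrete, the sketch below compares it with an explicit `try`/`finally` block, which is roughly what the statement does for file objects. It works in a temporary directory with a hypothetical file name, and the `RuntimeError` is raised on purpose to show that the file is closed even when the block fails.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "with_demo.txt")  # hypothetical file name

    # Manual version: close() runs in finally, whether or not write() fails.
    handle = open(path, "w")
    try:
        handle.write("Here is a new line.\n")
    finally:
        handle.close()
    closed_after_finally = handle.closed

    # with version: the file is closed even though the block raises.
    try:
        with open(path) as reader:
            raise RuntimeError("simulated failure")
    except RuntimeError:
        closed_after_error = reader.closed

print(closed_after_finally, closed_after_error)   # True True
```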

#### Append Mode
Now let's write a list to this file with `a+` mode:

In [47]:
new_file.closed

True

In [13]:
new_lines=["This is the third line.\n","This is the fourth line\n"] #we want to write each of elements in a new line
new_file=open("my_file.txt", mode = "a+") #append the existing file
new_file.writelines(new_lines)
for line in new_file:
    print(line)
new_file.close()

#### `seek` Method
After writing in `a+` mode, the file is still open, but reading from it does not print anything because the file cursor is at the end of the file. To set the cursor at the beginning, you can use the `seek()` method of the file object:


In [14]:
new_lines = ["This is the fifth line.\n","This is the sixth line.\n"]
new_file = open("my_file.txt", mode = "a+")
for line in new_lines:
    new_file.write(line)
print("Current byte at which the file cursor is:", new_file.tell()) #the end of file.

Current byte at which the file cursor is: 150


In [15]:
new_file.seek(0) #bring the cursor at the beginning of the file.
for line in new_file:
    print(line)

This is the first line.

This is the second line.

This is the third line

This is the fourth line

This is the fifth line.

This is the sixth line.



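We can use `seek(offset, reference_point)` to set the cursor anywhere we want in the file.<br/>
The reference points are 0 (the beginning of the file, which is the default), 1 (the current position in the file), and 2 (the end of the file). In text mode, only offsets relative to the beginning are allowed, apart from `seek(0, 1)` and `seek(0, 2)`; nonzero offsets from the current position or the end require binary mode.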

In [16]:
#Notice that the file is still open.
new_file.seek(4, 0) #place the cursor 4 bytes from the beginning
print(new_file.readline())
new_file.close()

 is the first line.
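
All three reference points can be tried out on a file opened in binary mode, where offsets are plain byte counts. The following is a sketch with a hypothetical ten-byte file in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "seek_demo.bin")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"0123456789")

    with open(path, "rb") as f:
        f.seek(4, 0)               # 4 bytes from the beginning
        from_start = f.read(2)     # b'45', cursor is now at byte 6

        f.seek(2, 1)               # 2 bytes forward from the current position
        from_current = f.read(1)   # b'8'

        f.seek(-3, 2)              # 3 bytes before the end of the file
        from_end = f.read()        # b'789'
        position = f.tell()        # 10, the end of the file

print(from_start, from_current, from_end, position)
```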



#### `next` Method
The built-in `next()` function returns the next line of the file each time it is called, which lets us step through the file one line at a time.

In [19]:
new_file = open("my_file.txt","r")
for index in range(6):
    line = next(new_file)
    print(line)
new_file.close()

This is the first line.

This is the second line.

This is the third line

This is the fourth line

This is the fifth line.

This is the sixth line.



In [20]:
new_file = open("my_file.txt","r") #we can accomplish the same via readline()
for index in range(6):
    line = new_file.readline()
    print(line)
new_file.close()

This is the first line.

This is the second line.

This is the third line

This is the fourth line

This is the fifth line.

This is the sixth line.
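
The two approaches differ once the end of the file is reached: `next()` raises `StopIteration`, while `readline()` returns an empty string. The sketch below, using an in-memory two-line sample, passes a default value to `next()` so the loop can stop without knowing the number of lines in advance.

```python
import io

# In-memory stand-in for a two-line text file.
f = io.StringIO("first\nsecond\n")

lines = []
while True:
    # The second argument is returned instead of raising StopIteration.
    line = next(f, None)
    if line is None:
        break
    lines.append(line)

# readline() signals the end of the file with an empty string.
after_end = f.readline()

print(lines)             # ['first\n', 'second\n']
print(repr(after_end))   # ''
```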



#### `flush` Method
Data passed to `write()` is buffered, so it may not appear in the file on disk until `close()` is called. If we want to keep the file open and still make sure the data has been written, we need to empty the buffer with `flush()`.

In [44]:
new_file = open("my_file.txt", mode="a+") #open the file

In [47]:
new_file.write("This is a new line.............................\n") #this may stay in the buffer until we call flush() or close().

48

In [51]:
new_file.close() #now check to see the new line is written. 

In [52]:
new_file = open("my_file.txt", mode="a+") #open the file
new_file.write("This is a new line.............................\n") #this may stay in the buffer until we call flush() or close().
new_file.flush() #when we call flush() method, it will write while the file is open.
new_file.closed #will return false

False

In [53]:
new_file.close()
new_file.closed #will return true now

True
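
One way to observe the effect of `flush()` is to read the file through a second handle while the first one is still open. This is a sketch with a hypothetical file in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "flush_demo.txt")  # hypothetical file name

    writer = open(path, "w")
    writer.write("buffered line\n")   # may still sit in Python's buffer
    writer.flush()                    # hand the buffered data to the operating system

    # A second, independent handle now sees the data although writer is open.
    with open(path) as reader:
        seen_after_flush = reader.read()

    still_open = not writer.closed
    writer.close()

print(repr(seen_after_flush))   # 'buffered line\n'
print(still_open)               # True
```

`flush()` passes the data to the operating system; if it must also be forced onto the physical disk, `os.fsync(writer.fileno())` can be called after flushing.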

#### Reading and Writing to a JSON File 
You can also write your data to `.json` files.<br/>

JSON stands for **J**ava**S**cript **O**bject **N**otation.<br/> 

JSON has become a popular format for delivering structured information over a network or between different platforms.<br/> 

It is basically text with some structure; saving it as `.json` signals how the structure should be read, but otherwise it is just a plain text file. It stores data as key: value pairs, and the structure can range from simple to complex.

Take a look at the following simple JSON for countries and their features:

In [None]:
json_data = r'''[
    {"ID":"CHN","COUNTRY":"China","POP":1398.72,"AREA":9596.96,"GDP":12234.8,"CONT":"Asia","IND_DAY":null},
    {"ID":"IND","COUNTRY":"India","POP":1351.16,"AREA":3287.26,"GDP":2575.67,"CONT":"Asia","IND_DAY":"8\/15\/1947"}
]'''

This JSON consists of an array of objects, each made of key: value pairs.<br/> 
Anything before the : is called the key and anything after it is called the value. This is very similar to Python dictionaries.

You can see that the data is separated by commas and that curly braces define objects. Square brackets define arrays, which appear more often in more complex JSON files.

In [None]:
json_data = '''[
{
  "colors": 
    [
    {
      "color": "black",
      "category": "hue",
      "type": "primary",
      "code": {
        "rgba": [0,0,0,1],
        "hex": "#000"
      }
    },
    {
      "color": "white",
      "category": "value",
      "code": {
        "rgba": [255,255,255,1],
        "hex": "#FFF"
      }
    }
    ]
}
]'''

Note that JSON files can hold different data types in one object as well.

When you read the file with `read()`, you get strings. That means that when you read numbers, you need to convert them with data type conversion functions like `int()` or `float()`. For more complex use cases, you can use the `json` module.

If you have a JSON string, you can parse it into Python objects and view its formatted JSON representation with a few lines of code:

In [48]:
import json as JSON

my_data = '[{"ID":"CHN","COUNTRY":"China","POP":1398.72,"AREA":9596.96,"GDP":12234.8,"CONT":"Asia","IND_DAY":null},\
{"ID":"IND","COUNTRY":"India","POP":1351.16,"AREA":3287.26,"GDP":2575.67,"CONT":"Asia","IND_DAY":"8\/15\/1947"}]'

In [51]:
json_object = JSON.loads(my_data)
json_object

[{'ID': 'CHN',
  'COUNTRY': 'China',
  'POP': 1398.72,
  'AREA': 9596.96,
  'GDP': 12234.8,
  'CONT': 'Asia',
  'IND_DAY': None},
 {'ID': 'IND',
  'COUNTRY': 'India',
  'POP': 1351.16,
  'AREA': 3287.26,
  'GDP': 2575.67,
  'CONT': 'Asia',
  'IND_DAY': '8/15/1947'}]

In [52]:
json_formatted_str = JSON.dumps(json_object, indent = 4)

print(json_formatted_str) #we can print it in a nice format

[
    {
        "ID": "CHN",
        "COUNTRY": "China",
        "POP": 1398.72,
        "AREA": 9596.96,
        "GDP": 12234.8,
        "CONT": "Asia",
        "IND_DAY": null
    },
    {
        "ID": "IND",
        "COUNTRY": "India",
        "POP": 1351.16,
        "AREA": 3287.26,
        "GDP": 2575.67,
        "CONT": "Asia",
        "IND_DAY": "8/15/1947"
    }
]


In [91]:
# my_file = open("countries - JSON.txt", "r")
# my_data = my_file.read()
# my_file.close()

In [53]:
with open("data\\countries - JSON.txt", "r") as json_file:
    json_data = json_file.read()

In [54]:
json_file.closed

True

In [55]:
json_data

'[{"ID":"CHN","COUNTRY":"China","POP":1398.72,"AREA":9596.96,"GDP":12234.8,"CONT":"Asia","IND_DAY":null},\n{"ID":"IND","COUNTRY":"India","POP":1351.16,"AREA":3287.26,"GDP":2575.67,"CONT":"Asia","IND_DAY":"8\\/15\\/1947"},\n{"ID":"USA","COUNTRY":"US","POP":329.74,"AREA":9833.52,"GDP":19485.4,"CONT":"N.America","IND_DAY":"1776-07-04"},\n{"ID":"IDN","COUNTRY":"Indonesia","POP":268.07,"AREA":1910.93,"GDP":1015.54,"CONT":"Asia","IND_DAY":"8\\/17\\/1945"},\n{"ID":"BRA","COUNTRY":"Brazil","POP":210.32,"AREA":8515.77,"GDP":2055.51,"CONT":"S.America","IND_DAY":"1822-09-07"},\n{"ID":"PAK","COUNTRY":"Pakistan","POP":205.71,"AREA":881.91,"GDP":302.14,"CONT":"Asia","IND_DAY":"8\\/14\\/1947"},\n{"ID":"NGA","COUNTRY":"Nigeria","POP":200.96,"AREA":923.77,"GDP":375.77,"CONT":"Africa","IND_DAY":"10\\/1\\/1960"},\n{"ID":"BGD","COUNTRY":"Bangladesh","POP":167.09,"AREA":147.57,"GDP":245.63,"CONT":"Asia","IND_DAY":"3\\/26\\/1971"},\n{"ID":"RUS","COUNTRY":"Russia","POP":146.79,"AREA":17098.2,"GDP":1530.75,"C

In [93]:
json_object = JSON.loads(my_data)
print(json_object) #not pretty

[{'ID': 'CHN', 'COUNTRY': 'China', 'POP': 1398.72, 'AREA': 9596.96, 'GDP': 12234.8, 'CONT': 'Asia', 'IND_DAY': None}, {'ID': 'IND', 'COUNTRY': 'India', 'POP': 1351.16, 'AREA': 3287.26, 'GDP': 2575.67, 'CONT': 'Asia', 'IND_DAY': '8/15/1947'}, {'ID': 'USA', 'COUNTRY': 'US', 'POP': 329.74, 'AREA': 9833.52, 'GDP': 19485.4, 'CONT': 'N.America', 'IND_DAY': '1776-07-04'}, {'ID': 'IDN', 'COUNTRY': 'Indonesia', 'POP': 268.07, 'AREA': 1910.93, 'GDP': 1015.54, 'CONT': 'Asia', 'IND_DAY': '8/17/1945'}, {'ID': 'BRA', 'COUNTRY': 'Brazil', 'POP': 210.32, 'AREA': 8515.77, 'GDP': 2055.51, 'CONT': 'S.America', 'IND_DAY': '1822-09-07'}, {'ID': 'PAK', 'COUNTRY': 'Pakistan', 'POP': 205.71, 'AREA': 881.91, 'GDP': 302.14, 'CONT': 'Asia', 'IND_DAY': '8/14/1947'}, {'ID': 'NGA', 'COUNTRY': 'Nigeria', 'POP': 200.96, 'AREA': 923.77, 'GDP': 375.77, 'CONT': 'Africa', 'IND_DAY': '10/1/1960'}, {'ID': 'BGD', 'COUNTRY': 'Bangladesh', 'POP': 167.09, 'AREA': 147.57, 'GDP': 245.63, 'CONT': 'Asia', 'IND_DAY': '3/26/1971'}, 

In [92]:
json_object = JSON.loads(my_data)

json_formatted_str = JSON.dumps(json_object, indent = 2) #we can print it in a nice format

print(json_formatted_str) 

[
  {
    "ID": "CHN",
    "COUNTRY": "China",
    "POP": 1398.72,
    "AREA": 9596.96,
    "GDP": 12234.8,
    "CONT": "Asia",
    "IND_DAY": null
  },
  {
    "ID": "IND",
    "COUNTRY": "India",
    "POP": 1351.16,
    "AREA": 3287.26,
    "GDP": 2575.67,
    "CONT": "Asia",
    "IND_DAY": "8/15/1947"
  },
  {
    "ID": "USA",
    "COUNTRY": "US",
    "POP": 329.74,
    "AREA": 9833.52,
    "GDP": 19485.4,
    "CONT": "N.America",
    "IND_DAY": "1776-07-04"
  },
  {
    "ID": "IDN",
    "COUNTRY": "Indonesia",
    "POP": 268.07,
    "AREA": 1910.93,
    "GDP": 1015.54,
    "CONT": "Asia",
    "IND_DAY": "8/17/1945"
  },
  {
    "ID": "BRA",
    "COUNTRY": "Brazil",
    "POP": 210.32,
    "AREA": 8515.77,
    "GDP": 2055.51,
    "CONT": "S.America",
    "IND_DAY": "1822-09-07"
  },
  {
    "ID": "PAK",
    "COUNTRY": "Pakistan",
    "POP": 205.71,
    "AREA": 881.91,
    "GDP": 302.14,
    "CONT": "Asia",
    "IND_DAY": "8/14/1947"
  },
  {
    "ID": "NGA",
    "COUNTRY": "Nigeria",

### Question?
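Why do we need to use the `.loads()` method?

It seems like `.dumps()` will work without doing `.loads()` first, but it does not do the same thing.

* `json.dumps()` formats a Python object as a **string** of JSON
* If the object passed in is already a string, the result is a single quoted and escaped JSON string, not a pretty-printed array, so the text has to be parsed with `.loads()` first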

In [56]:
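# json_object = JSON.loads(json_data)

json_strformat = JSON.dumps(json_data, indent = 2) #json_data is a str, so this produces one escaped JSON string, not a formatted array

print(json_formatted_str) #note: this prints the string built earlier from json_object, not json_strformat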

[
    {
        "ID": "CHN",
        "COUNTRY": "China",
        "POP": 1398.72,
        "AREA": 9596.96,
        "GDP": 12234.8,
        "CONT": "Asia",
        "IND_DAY": null
    },
    {
        "ID": "IND",
        "COUNTRY": "India",
        "POP": 1351.16,
        "AREA": 3287.26,
        "GDP": 2575.67,
        "CONT": "Asia",
        "IND_DAY": "8/15/1947"
    }
]
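
The difference between the two calls is easier to see on a small input. The sketch below uses a shortened, hypothetical JSON text and compares `dumps()` applied to the raw string with `dumps()` applied to the parsed object.

```python
import json

# Shortened, hypothetical JSON text, as it would come back from read().
json_text = '[{"ID": "CHN", "POP": 1398.72}]'

# dumps() on a str: the whole text becomes ONE JSON string literal,
# wrapped in quotes and with the inner quotes escaped. indent has no effect.
as_string = json.dumps(json_text, indent=2)

# loads() first, then dumps(): the parsed list of dicts is pretty-printed.
as_object = json.dumps(json.loads(json_text), indent=2)

print(as_string)
print(as_object)
```

Parsing `as_string` with `loads()` simply gives back the original text, which shows that nothing was interpreted as structure.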


***

To write the JSON to a file, you can pass the formatted string to the `write()` method. 

In [101]:
import json as JSON
my_file = open("countries - JSON.txt", "r")
my_data = my_file.read()
my_file.close()

json_object = JSON.loads(my_data)
json_formatted_str = JSON.dumps(json_object, indent = 2) #we can print it in a nice format
my_file = open("countries.json","w") # create and open a new file to write to
my_file.write(json_formatted_str)
my_file.close()
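
The `json` module can also write to a file object directly, which saves the intermediate string. Below is a minimal sketch with `json.dump()` and `json.load()`; the single record and the file name are hypothetical, and a temporary directory is used so that `countries.json` is left untouched.

```python
import json
import os
import tempfile

# Hypothetical, shortened data in the same shape as the countries records.
records = [{"ID": "CHN", "COUNTRY": "China", "IND_DAY": None}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "countries_demo.json")  # hypothetical file name

    # dump() serializes the object straight into the open file.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)

    # load() parses the file back into Python objects.
    with open(path, "r", encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded == records)   # True
```

Python's `None` is written as `null` in the file and comes back as `None` when loaded.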

Now let's read a JSON file.

In [103]:
my_file = open('countries.json', 'r') 
json_object = JSON.load(my_file)  #
print(JSON.dumps(json_object, indent=2))
my_file.close()

[
  {
    "ID": "CHN",
    "COUNTRY": "China",
    "POP": 1398.72,
    "AREA": 9596.96,
    "GDP": 12234.8,
    "CONT": "Asia",
    "IND_DAY": null
  },
  {
    "ID": "IND",
    "COUNTRY": "India",
    "POP": 1351.16,
    "AREA": 3287.26,
    "GDP": 2575.67,
    "CONT": "Asia",
    "IND_DAY": "8/15/1947"
  },
  {
    "ID": "USA",
    "COUNTRY": "US",
    "POP": 329.74,
    "AREA": 9833.52,
    "GDP": 19485.4,
    "CONT": "N.America",
    "IND_DAY": "1776-07-04"
  },
  {
    "ID": "IDN",
    "COUNTRY": "Indonesia",
    "POP": 268.07,
    "AREA": 1910.93,
    "GDP": 1015.54,
    "CONT": "Asia",
    "IND_DAY": "8/17/1945"
  },
  {
    "ID": "BRA",
    "COUNTRY": "Brazil",
    "POP": 210.32,
    "AREA": 8515.77,
    "GDP": 2055.51,
    "CONT": "S.America",
    "IND_DAY": "1822-09-07"
  },
  {
    "ID": "PAK",
    "COUNTRY": "Pakistan",
    "POP": 205.71,
    "AREA": 881.91,
    "GDP": 302.14,
    "CONT": "Asia",
    "IND_DAY": "8/14/1947"
  },
  {
    "ID": "NGA",
    "COUNTRY": "Nigeria",