# Data Has To Go Somewhere

Now that we know how to read a generic text file into our Python modules, we can look at the different kinds of structured text files that you will encounter regularly. The benefit of structured text formats that everyone agrees upon is that different applications can share and consume information easily. Let's look into some of these formats and how we can access them in Python.

## CSV: Comma-Separated Values

CSV files represent a table: each line is a row, and the items within a row are typically separated by commas. To get a good visual, think of Microsoft Excel: Excel can open CSV files and display them to you as a table.

However, even though the "C" in CSV stands for comma, other delimiters such as ' ', '|', and '\t' are also used. Sometimes the first row is a header row that names the columns rather than holding data. Luckily, Python has a module that makes reading and writing CSV files easy:

In [1]:
import csv

grades = [
    ['John', 88],
    ['Kate', 93],
    ['Harry', 93],
    ['Linda', 87],
    ['Harriet', 91]
]

# newline='' lets the csv module control line endings (avoids blank rows on Windows)
grades_csv_write = open('grades.csv', 'wt', newline='')
csvout = csv.writer(grades_csv_write)
csvout.writerows(grades)

grades_csv_write.close()

Now let's take a look at the file that was just created:

In [2]:
!cat grades.csv

John,88
Kate,93
Harry,93
Linda,87
Harriet,91


We have just created a comma separated file of a 5 by 2 table of names and grades. You could open this file in Excel and see the data in tabular format, but the data itself is just a plain text file.

Notice that csvout is a writer object created by csv.writer(). This is how Python makes it simple to write (and read) CSV files: you just use the objects and functions the csv module provides.

We can read these files back in just as easily:

In [3]:
import csv

grades_csv_read = open('grades.csv', 'rt', newline='')
csvin = csv.reader(grades_csv_read)

for row in csvin:
    print(row)

# Close the file once all rows have been read
grades_csv_read.close()

['John', '88']
['Kate', '93']
['Harry', '93']
['Linda', '87']
['Harriet', '91']


Here you can see that once you create a csv reader object, you can iterate row by row and get each row in the CSV file as a list that you can use in your application. Note that every item comes back as a string, so '88' must be converted with int() if you need a number. There is no need to write a special parser to split and manipulate the original text file: this is all done for you.
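The header rows and alternative delimiters mentioned earlier can be handled by the same module. The following is a minimal sketch, assuming a hypothetical pipe-delimited table with `name` and `grade` columns; `io.StringIO` stands in for an open file so the example does not depend on anything on disk.

```python
import csv
import io

# Hypothetical pipe-delimited data whose first row is a header row.
# io.StringIO stands in for a file opened with open(..., newline='').
pipe_text = "name|grade\nJohn|88\nKate|93\n"

# DictReader treats the first row as column names and yields one dict per row.
reader = csv.DictReader(io.StringIO(pipe_text), delimiter='|')

grades_by_name = {}
for row in reader:
    # Every value is read as a string, so convert the grade explicitly.
    grades_by_name[row['name']] = int(row['grade'])

print(grades_by_name)  # {'John': 88, 'Kate': 93}
```

The `delimiter` argument works the same way with `csv.reader` and `csv.writer`.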

## XML: eXtensible Markup Language

When data is more structured, XML can represent hierarchical relationships between items. Python gives us several ways to read it.

Let's first create our XML text file:

In [4]:
xml_data = '''<?xml version="1.0"?>
<students>
	<student name="John">
		<grade value="88" />
	</student>
	<student name="Kate">
		<grade value="93" />
	</student>
	<student name="Harry">
		<grade value="93" />
	</student>
	<student name="Linda">
		<grade value="87" />
	</student>
	<student name="Harriet">
		<grade value="91" />
	</student>
</students>'''

xml_data_file = open('grades.xml', 'wt')
xml_data_file.write(xml_data)
xml_data_file.close()

In [5]:
!cat grades.xml

<?xml version="1.0"?>
<students>
	<student name="John">
		<grade value="88" />
	</student>
	<student name="Kate">
		<grade value="93" />
	</student>
	<student name="Harry">
		<grade value="93" />
	</student>
	<student name="Linda">
		<grade value="87" />
	</student>
	<student name="Harriet">
		<grade value="91" />
	</student>
</students>

So this is just a regular text file that happens to contain XML. Let's now read it in as XML so that we can access its contents programmatically.

In [6]:
from xml.etree import ElementTree  
tree = ElementTree.ElementTree(file='grades.xml')
root = tree.getroot()
print(root.tag)

students


So we use the ElementTree module from the xml.etree package, which gives us the ability to traverse the XML tree programmatically. By creating a tree and retrieving the root, we can get the name of the root tag by accessing the tag property.

Let's now traverse the tree to demonstrate how you can access XML data:

In [7]:
for child in root:
    print(' tag:', child.tag, 'attributes:', child.attrib)

    for grandchild in child:
        print('\ttag:', grandchild.tag, 'attributes:', grandchild.attrib)

 tag: student attributes: {'name': 'John'}
	tag: grade attributes: {'value': '88'}
 tag: student attributes: {'name': 'Kate'}
	tag: grade attributes: {'value': '93'}
 tag: student attributes: {'name': 'Harry'}
	tag: grade attributes: {'value': '93'}
 tag: student attributes: {'name': 'Linda'}
	tag: grade attributes: {'value': '87'}
 tag: student attributes: {'name': 'Harriet'}
	tag: grade attributes: {'value': '91'}


Element objects have tag and attrib properties that give you access to the tag name and attributes. Elements also behave like lists, giving you access to their children by iteration or by index. There are a number of other operations you can use for XML files: check out the [documentation](https://docs.python.org/3.3/library/xml.etree.elementtree.html) for details.
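Instead of looping over every level by hand, you can also search for elements by tag name. The sketch below parses a shortened copy of the grades document from a string (so it does not rely on grades.xml being on disk) and collects the grades into a dictionary; the variable names are chosen here for illustration only.

```python
from xml.etree import ElementTree

# A shortened copy of the grades document, parsed from a string.
xml_text = '''<students>
    <student name="John"><grade value="88" /></student>
    <student name="Kate"><grade value="93" /></student>
</students>'''

root = ElementTree.fromstring(xml_text)

grades_from_xml = {}
# findall() returns the direct children of root with the given tag.
for student in root.findall('student'):
    grade = student.find('grade')  # first matching child element
    # Attribute values are strings; convert them where a number is needed.
    grades_from_xml[student.get('name')] = int(grade.get('value'))

print(grades_from_xml)  # {'John': 88, 'Kate': 93}
```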

You can also try [xml.dom](https://docs.python.org/3/library/xml.dom.html) and [xml.sax](https://docs.python.org/3/library/xml.sax.html) for alternative XML processing libraries.

## JSON: JavaScript Object Notation

JSON has increasingly become the data format of choice for a number of applications, well beyond the front-end web world. The good news is that JSON syntax is very similar to Python's, so it will be easy to understand.

Python's support for JSON is very straightforward: there is one library that handles JSON, and it's conveniently called "json".

Let's first write our test JSON file:

In [8]:
json_data = '''{
	"students": {
		
		"John": {
			"grades": [88]
		},

		"Kate": {
			"grades": [93]
		},

		"Harry": {
			"grades": [93]
		},

		"Linda": {
			"grades": [87]
		},

		"Harriet": {
			"grades": [91]
		}
	}
}'''

json_data_file = open('grades.json', 'wt')
json_data_file.write(json_data)
json_data_file.close()

In [9]:
!cat grades.json

{
	"students": {
		
		"John": {
			"grades": [88]
		},

		"Kate": {
			"grades": [93]
		},

		"Harry": {
			"grades": [93]
		},

		"Linda": {
			"grades": [87]
		},

		"Harriet": {
			"grades": [91]
		}
	}
}

Please note: JSON looks a lot like how you would define a dictionary in Python. Objects in braces are converted to dictionaries, anything in quotes is converted to a string, numbers are converted to ints or floats, and brackets are interpreted as lists.

Let's see this in action:

In [10]:
import json
json_data_file = open("grades.json", "rt")
json_data = json.loads(json_data_file.read())
json_data_file.close()

print("root:", json_data)
print()
print("students:", json_data["students"])

root: {'students': {'Harry': {'grades': [93]}, 'Harriet': {'grades': [91]}, 'Linda': {'grades': [87]}, 'John': {'grades': [88]}, 'Kate': {'grades': [93]}}}

students: {'Harry': {'grades': [93]}, 'Harriet': {'grades': [91]}, 'Linda': {'grades': [87]}, 'John': {'grades': [88]}, 'Kate': {'grades': [93]}}


We have now loaded our JSON file from the file system and used the json.loads() function to convert its text into a Python object. As we print both the root json_data and the "students" property, we can see how Python has converted that JSON file into a Python dictionary.
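As a small illustration of how JSON values map to Python types, here is a sketch that parses a short JSON string directly. The `active` and `nickname` fields are hypothetical additions, included only to show how JSON's `true` and `null` are converted.

```python
import json

# "active" and "nickname" are hypothetical fields added for illustration.
json_text = '{"students": {"John": {"grades": [88, 91.5], "active": true, "nickname": null}}}'

data = json.loads(json_text)
john = data["students"]["John"]

print(type(data).__name__)               # dict
print(type(john["grades"]).__name__)     # list
print(type(john["grades"][0]).__name__)  # int
print(type(john["grades"][1]).__name__)  # float
print(john["active"], john["nickname"])  # True None
```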

We can also convert Python dictionaries into JSON format and write them to disk:

In [11]:
python_dict = {'students': 
                   {'Harriet': {'grades': [91]}, 
                    'John': {'grades': [88]}, 
                    'Kate': {'grades': [93]}, 
                    'Linda': {'grades': [87]}, 
                    'Harry': {'grades': [93]}
                   }
              }
python_dict_json = json.dumps(python_dict)

python_dict_json_file = open("grades_python.json", "wt")
python_dict_json_file.write(python_dict_json)
python_dict_json_file.close()

In [12]:
!cat grades_python.json

{"students": {"Kate": {"grades": [93]}, "Harry": {"grades": [93]}, "Linda": {"grades": [87]}, "Harriet": {"grades": [91]}, "John": {"grades": [88]}}}

We use the `dumps()` function to convert a Python dictionary into a JSON-compliant string. We then write the result to grades_python.json.
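The json module also provides `dump()` and `load()`, which work on file objects directly rather than on strings. The following sketch writes a smaller dictionary to a temporary directory and reads it back; the file name `grades_pretty.json` is a hypothetical choice, and `indent` and `sort_keys` are optional arguments that make the output easier to read.

```python
import json
import os
import tempfile

python_dict = {'students': {'John': {'grades': [88]}, 'Kate': {'grades': [93]}}}

# Work in a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'grades_pretty.json')  # hypothetical file name

    with open(path, 'wt') as json_file:
        # dump() serializes straight to the file; indent and sort_keys
        # produce readable output with a stable key order.
        json.dump(python_dict, json_file, indent=4, sort_keys=True)

    with open(path, 'rt') as json_file:
        # load() parses the file contents back into Python objects.
        round_trip = json.load(json_file)

print(round_trip == python_dict)  # True
```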

There are a number of other structured file formats, such as HTML, YAML, and INI files, that all have specialized modules to handle reading and writing them. A rule of thumb is to search for an existing module before you try to write such a parser yourself! You will save yourself a lot of time.
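As one example of such a module, the standard library's configparser reads INI files. This is only a sketch: the `[grades]` section and its contents are hypothetical, and the text is parsed from a string rather than from a file.

```python
import configparser

# Hypothetical INI content, parsed from a string for illustration.
ini_text = """
[grades]
John = 88
Kate = 93
"""

config = configparser.ConfigParser()
config.read_string(ini_text)

# By default option names are lower-cased and values are strings.
ini_grades = {name: int(value) for name, value in config['grades'].items()}

print(ini_grades)  # {'john': 88, 'kate': 93}
```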