# Worksheet 6: Smorgasbord

Remember: Google is your friend! Functions are explained in more detail on (amongst others) the Python website, and
Stack Overflow (stackoverflow.com) probably has the answer to all of life's (programming-related) issues. So if you add "site:stackoverflow.com" to your
Google query, you should be able to find someone who ran into a similar issue, and the way to solve it.

I tried to kinda fit a voting theme into this week's Worksheet, as the Provincial States elections are coming up here in the Netherlands at the time that I'm making this. However, at a certain point, my inspiration for exercises on that topic faded away. So there's not really a theme this week. Just a bunch of random exercises. :)

## Opening, modifying, and saving CSV (.csv) files

As explained in the video lecture, CSV files are closely related to Excel files. CSV is still the standard format for tabular data when you work with a programming language: the files are small and quick to process, so that is nice. But, annoyingly, Excel (.xls(x)) and .csv files require a different approach when we use Python. Last week, we looked into opening, modifying, and saving the former. Now we will look into doing the same with the latter. First off: how do we open these files? For this code block, download `Areas_in_the_Netherlands.csv` from Canvas and place this file in the same folder as this worksheet.

In [None]:
## Code block 1: Opening CSV files
with open('./Areas_in_the_Netherlands.csv', mode='r') as f:
    area_list = f.read()

print(area_list)

We have now loaded the CSV file the same way you would load a text file. We get to see all the text, but it is still completely without structure: we get one long string in which the records, and each record's attributes, are not loaded as separate items. So we want to load it in a slightly nicer way. For that we have to use the csv library.

In [None]:
## Code block 2: Opening CSV files (part 2)

import csv

with open('./Areas_in_the_Netherlands.csv', mode='r') as f:
    area_csv = csv.reader(f)

print(area_csv)

This still doesn't quite work: all we get to see is something like `<_csv.reader object at 0x...>`. The reason is that `csv.reader()` does not read the whole file at once. It hands out the rows one at a time, and only while the file is still open. So you have to loop over the reader and store the rows in another variable before the `with` block ends (and the file is closed). That means you have to write a little more code.
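If you want to see for yourself why the rows have to be collected while the file is open, here is a minimal sketch. It does not use the Canvas file: `demo.csv` and its contents are made up for this illustration and are written to a temporary folder. The `list(csv.reader(f))` at the end is simply a shorter way of writing the loop-and-append approach.

```python
import csv
import os
import tempfile

# Made-up demo file in a temporary folder (NOT the file from Canvas).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(demo_path, mode="w", newline="") as f:
    f.write("name,province\nArnhem,Gelderland\n")

# The reader only fetches rows from the file while you loop over it.
with open(demo_path, mode="r", newline="") as f:
    reader = csv.reader(f)
# The with block has ended, so the file is closed now.

try:
    rows_too_late = list(reader)  # tries to read from the closed file
    error_message = None
except ValueError as error:
    rows_too_late = None
    error_message = str(error)
print(error_message)

# Turning the reader into a list while the file is still open does work.
with open(demo_path, mode="r", newline="") as f:
    rows = list(csv.reader(f))
print(rows)
```

The first `print` shows the complaint Python raises about the closed file; the second shows the rows as a list of lists.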

In [None]:
## Code block 3: Opening CSV files (part 3)

import csv

area_list = []

with open('./Areas_in_the_Netherlands.csv', mode='r') as f:
    area_csv = csv.reader(f)
    for record in area_csv:
        area_list.append(record)

for record in area_list:
    print(record)

Now it works the way we want it to! Remember from last week, a CSV (and therefore also an Excel file) is just a kind of list-of-lists: you have one big list that represents the entire file. And in that big list you will find individual lists that each represent a record. Those record lists consist of several items that represent the attributes.
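As a small illustration of that list-of-lists idea, the sketch below uses a made-up table (the `table` variable and its contents are invented for this example, they are not in any of the Canvas files). It has the same shape as what `csv.reader` gives you, so the same indexing works on `area_list`.

```python
# Made-up rows, in the same shape that csv.reader produces.
table = [
    ["name", "colour", "legs"],  # the header is simply the first list
    ["cat", "black", "4"],
    ["duck", "white", "2"],
]

header = table[0]        # the whole header row
first_record = table[1]  # the first real record

print(header)
print(first_record[1])   # second attribute of the first record
print(table[2][0])       # first attribute of the second record

# Everything that comes out of a CSV file is a string, numbers included.
print(type(table[1][2]))
```

So the first index picks the record and the second index picks the attribute within that record.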

**[Exercise 1]** Suppose you are the world's biggest fan of the municipality of *Montferland*. You really want to know everything about that---in your opinion---fantastic municipality, including the region name ("region_name" in the header of the CSV file). With the knowledge of the structure of CSV files described above, can you extend code block 3 so that you can find and print the region name of Montferland?

In [None]:
## Exercise 1: Printing the region name of the municipality of Montferland.

import csv

area_list = []

with open('./Areas_in_the_Netherlands.csv', mode='r') as f:
    area_csv = csv.reader(f)
    for record in area_csv:
        area_list.append(record)



**[Exercise 2]** We are going to play *Word Chain*. Word Chain is a word game
in which players come up with words that begin with the letter or letters that the previous word ended with. In this case,
you give the first word using the `input()` function. The script then finds the last letter of that word and searches the
`Random_Words.csv` file for a word that starts with this last letter.

Take a look at the CSV file: it contains 150 words, in 15 rows of 10 columns each. Every "cell" is a random word. The output should look something like this:

`Input: chris`
`Chained word: sail`

**[Bonus exercise 1]** Try to find the longest word in the csv file that starts with the last letter of the input word.

In [None]:
import csv

## Exercise 2: CSV Word Chain



Okay, so much for opening CSV files. In addition to opening, you sometimes also want to save files as a CSV file. How does saving work then?

In [None]:
## Code block 4: Saving CSV files

import csv

# First, we load Areas_in_the_Netherlands.csv
area_list = []

with open('./Areas_in_the_Netherlands.csv', mode='r') as f:
    area_csv = csv.reader(f)
    for record in area_csv:
        area_list.append(record)

#Then, let's make a variable of the header and the first record (and print it)
area_short = area_list[0:2]
print(area_short)

#And now, let's save our shortened information
with open('./Areas_in_the_Netherlands_Short.csv', mode='w', newline='') as f:
    writer = csv.writer(f)
    #Iterate over every row in the area_short variable
    for row in area_short:
        writer.writerow(row)

When we use `open()` in a `with` statement, two things deserve extra attention. When reading files, we pass the argument `mode='r'`. That tells Python to open the file in read-only mode, in which no changes can be made to the file. To be able to create a file, we use `mode='w'`. This mode lets us save files, but be careful: if a file with that name already exists, it is overwritten. There are also other modes that let us both read and save files, but I never really use them, because if I make a mistake and accidentally save over my original file, I'll be sad. So it is best to avoid that risk as much as possible by using a separate mode for each.
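To make the difference between the two modes a bit more concrete, here is a minimal sketch. It uses a throwaway text file in a temporary folder (`modes_demo.txt` is a made-up name), so nothing in your worksheet folder is touched.

```python
import io
import os
import tempfile

# Throwaway file in a temporary folder (made-up name).
demo_path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

# mode='w' creates the file.
with open(demo_path, mode="w") as f:
    f.write("first version\n")

# mode='r' is read-only: trying to write raises an error.
with open(demo_path, mode="r") as f:
    try:
        f.write("sneaky change")
        write_blocked = False
    except io.UnsupportedOperation:
        write_blocked = True
print(write_blocked)

# mode='w' on an existing file throws away the old contents first.
with open(demo_path, mode="w") as f:
    f.write("second version\n")

with open(demo_path, mode="r") as f:
    contents = f.read()
print(contents)
```

The "first version" line is gone after the second write, which is exactly why it pays off to save to a new file name rather than the one you loaded.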

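We also use `newline=''` when saving a CSV file. This is not really a bug, but a clash between two helpful features: the csv library already ends every row with its own line ending (`\r\n`), and on Windows, Python normally turns every `\n` into `\r\n` when it writes a text file. Without `newline=''`, each row would end in `\r\r\n` on those computers, which shows up as an empty row between each record. The Python documentation recommends `newline=''` whenever you open a file for the csv library.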
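The sketch below shows where those empty rows come from. Instead of a real file it writes to an in-memory `io.StringIO` buffer, so we can look at the exact characters the csv library produces. The `replace()` call only imitates what a Windows text file would do without `newline=''`; that imitation is an assumption for the sake of the demo, not what Python runs internally.

```python
import csv
import io

# Write one row to an in-memory buffer instead of a real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "party"])

written = buffer.getvalue()
print(repr(written))  # the csv library adds its own \r\n at the end

# Imitation of a Windows text file without newline='':
# every \n is turned into \r\n, so the row ends in \r\r\n.
translated = written.replace("\n", "\r\n")
print(repr(translated))
```

That doubled `\r` is what Excel and other programs display as an extra blank row.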

You will now be able to find the saved file in the same folder where you opened this Worksheet. You can also open it in Excel to check whether everything worked properly. If you have opened the file in Excel, select all cells (Ctrl+A). Then go to `Data` at the top of the screen, where you will see `Text to Columns` in the buttons drawer, on the right side. Click on that. Choose `Delimited`, then `Next`. Under `Delimiters`, choose `Comma`, and then `Finish`. Now the CSV data is converted to an Excel-friendly format.

**[Exercise 3]** Suppose you are going around asking all of your friends which party they are going to vote for during the elections. Because you keep going around asking boring questions like that, you only have three friends left. They gave you the following answers:

| Name    | Date asked | Party           |
|---------|------------|-----------------|
| Alfred  | March 9th  | OPA             |
| Barbara | March 9th  | 50PLUS          |
| Roland  | March 7th  | Senioren Belang |

Save this information as a .csv file in the same folder as the Worksheet.

In [None]:
## Exercise 3: Saving your friends' party preferences



## Opening, modifying, and saving JSON (.json) files

Okay, we're almost there. One more data structure to go and it should look familiar to you. We have already worked with dictionaries and lists and JSON files are just that, saved as a file. First, loading the files. For this we use `indy.json`, which you can find on Canvas. Put this file in the same folder as this Worksheet.

In [None]:
## Code block 5: Opening a JSON file
import json

with open('./indy.json', 'r') as infile:
    indy_data = json.load(infile)

The file is now loaded as a dictionary, so we can use everything we know about dictionaries to work with it. As said in the video lecture, JSON files are just lists and/or dictionaries in disguise.
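To show the "in disguise" part without needing a file, the sketch below uses `json.loads()` (with an extra s, for "string"), which reads JSON from a piece of text instead of from a file. The JSON text itself is made up for this example and is not the content of `indy.json`.

```python
import json

# Made-up JSON text (NOT the content of indy.json).
text = """
{
    "series": "Example",
    "films": [
        {"title": "Part One", "year": 2001},
        {"title": "Part Two", "year": 2004}
    ]
}
"""

data = json.loads(text)

print(type(data))                 # the outer {} becomes a dictionary
print(type(data["films"]))        # the [] becomes a list
print(data["films"][1]["title"])  # so the usual indexing just works
print(data["films"][0]["year"] + 1)  # numbers stay numbers
```

Unlike CSV, where every value arrives as a string, JSON keeps numbers as numbers.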

If we want to save a JSON file, we can do that as follows.

In [None]:
## Code block 6: Saving a JSON file

# We load the file just like in code block 5
import json

with open('./indy.json', 'r') as infile:
    indy_data = json.load(infile)

# We take the first movie from our JSON-file
indy_first = indy_data['indy movies'][0]
print(indy_first)

# And save this as a new JSON file
with open('./firstindy.json', 'w') as outfile:
    json.dump(indy_first, outfile, separators=(',', ': '), indent=4)

We pass some extra information to the `json.dump()` function. First, we specify the data we want to write (the `indy_first` variable), then the file object we want to write it to (`outfile`, which refers to the opened file `'./firstindy.json'`). The `separators` indicate how the information is separated: the first string goes between items, the second between a key and its value. You will mostly use the `(',', ': ')` that you see here, which keeps the file looking like a dictionary. Finally, we use `indent=4`. This is the number of spaces used for indentation, so that it all remains a little clearer for you as a human being if you want to read the JSON file in Notepad (or a variant thereof).
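To see what `separators` and `indent` actually change, the sketch below uses `json.dumps()` (again with an s, for "string"), which returns the JSON text instead of writing it to a file. It accepts the same `separators` and `indent` arguments as `json.dump()`. The `data` dictionary is made up for this example.

```python
import json

# Made-up data, just to compare the different layouts.
data = {"a": 1, "b": [1, 2]}

# No extra arguments: everything on one line.
default = json.dumps(data)
print(default)

# Separators without spaces: as compact as possible.
compact = json.dumps(data, separators=(",", ":"))
print(compact)

# The settings from code block 6: one item per line, indented by 4 spaces.
pretty = json.dumps(data, separators=(",", ": "), indent=4)
print(pretty)
```

All three versions contain exactly the same data; only the layout for human readers differs.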

-------------------------------

**[Exercise 4]** To get in the mood to vote (for the people who are allowed to vote in the Provincial States, if you feel left out, I'm sorry :( ), I have collected the municipality election results from 2018 and have created two JSON files with voter turnout numbers. These JSON files contain the number of votes for 10 random municipalities in the Gelderland province (_Gelderland.json_) and the Noord-Brabant province (*Noord_Brabant.json*). Furthermore, I have only included the results for 4 of the biggest parties in the Netherlands to keep the file somewhat comprehensible: *D66*, *VVD*, *CDA* and *PvdA*.

Write a script that loads both JSON files and prints the lowest, highest, and average number of votes per party for each province.
The output should look something like this:

`Gelderland`
`CDA: min 1026, max 7421, average 3022.4`
`VVD: min 955, max 8019, average 3020.6`
`D66: min 710, max 8013, average 2378.7`
`PvdA: min 524, max 5401, average 1844.6`
`Noord-Brabant`
`CDA: min 1065, max 10631, average 4181.0`
`VVD: min 995, max 17503, average 5154.7`
`D66: min 635, max 9918, average 3568.2`
`PvdA: min 563, max 10180, average 2748.5`

**[Bonus exercise 2]** Save the output as a nicely formatted CSV file.

In [None]:
## Exercise 4: Voter turnout numbers
import os
import csv
import json



**[Exercise 5]** For Exercise 2, you have used a CSV file that contains random words (`Random_Words.csv`). Create and save a JSON file where all these words are ordered by their first letter (in alphabetical order).
You can find an example of what the output should look like on Canvas (`Worksheet_6_Exercise_5.json`)

**Hint**: You may start with a dictionary that already contains all the letters in the alphabet, as is shown in the code below (the alphabetdict variable that is commented out).

**[Bonus Exercise 3]** Do not use the alphabetdict, but try to find a clever way to create this dictionary structure from an empty dictionary (and sort it alphabetically). Also, order the words per letter from smallest to largest. You can find an example of what the output should look like on Canvas (`Worksheet_6_Bonus_3.json`)

In [None]:
## Exercise 5: Ordering the Word CSV
import os
import json
import csv

#alphabetdict = {'a': [], 'b': [], 'c': [], 'd': [], 'e': [], 'f': [], 'g': [], 'h': [], 'i': [], 'j': [], 'k': [], 'l': [], 'm': [], 'n': [], 'o': [], 'p': [], 'q': [], 'r': [], 's': [], 't': [], 'u': [], 'v': [], 'w': [], 'x': [], 'y': [], 'z': []}


That is all! If you run into any issues, do not forget to ask about this on the Discussion board, or during the Practical session!

## Saving & Submitting

Jupyter Notebook files save your work automatically. So you can hand in the file that you are currently looking at. If you don't want to take any risks, you can also use "Save As" to save a copy of the notebook. In any case, submit the Worksheet **via Canvas, Assignments**. Submission date: **22 March (23:59)**.

## Skills & Further Resources

This worksheet was about opening, modifying, and saving CSV/JSON files using Python.

After working through it, you should be able to:

1. understand how to open, modify, and save a CSV file
2. understand how to open, modify, and save a JSON file

If you want a bit more information on some of these topics, besides the chapters in Automate the Boring Stuff, I recommend:

- How to Read and Write With CSV Files in Python: Ultimate Guide to Working With CSV Files in Data Science, by Harika Bonthu on [https://www.analyticsvidhya.com/blog/2021/08/python-tutorial-working-with-csv-file-for-data-science/](https://www.analyticsvidhya.com/blog/2021/08/python-tutorial-working-with-csv-file-for-data-science/)
- The easy way to work with CSV, JSON, and XML in Python, by George Seif on: [https://towardsdatascience.com/the-easy-way-to-work-with-csv-json-and-xml-in-python-5056f9325ca9](https://towardsdatascience.com/the-easy-way-to-work-with-csv-json-and-xml-in-python-5056f9325ca9) (if you can't read the text on this website, try to open it in an incognito window).

## Overview of new information

| Python code                |   What it does    |
|----------------------------|:-----------------:|
| `csv.reader()`             |  Read a CSV file  |
| `csv.writer()`             | Write a CSV file  |
| `json.load()`              | Read a JSON file  |
| `json.dump()`              | Write a JSON file |