## Introduction

The objective of this exercise is to check your ability to use basic Python data structures, control program flow, and define and use functions.


We will be using the [EU referendum results data](https://www.electoralcommission.org.uk/who-we-are-and-what-we-do/elections-and-referendums/past-elections-and-referendums/eu-referendum/results-and-turnout-eu-referendum). 

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/c/c4/2016_EU_Referendum_Ballot_Paper.svg/1280px-2016_EU_Referendum_Ballot_Paper.svg.png" width="450"/>



For this exercise, we have split the full data into three files. 

```uk_demo.json```

| Field       | Description                    |
|-------------|--------------------------------|
| ```Area_Code```   | Unique identifier for the area  |
| ```Area```        | Area Name               |
| ```Region_Code``` | Unique identifier for the region |
| ```Region```      | Region Name             |
| ```Electorate```  | Number of registered voters in the area   |

```uk_results.json```

| Field       | Description                            |
|-------------|----------------------------------------|
| ```Area_Code```   | Unique identifier for an area          |
| ```Votes_Cast```  | Number of Votes Cast in the area       |
| ```Valid_Votes``` | Number of Valid Votes cast in the area |
| ```Remain```      | Number of votes for Remain             |
| ```Leave```       | Number of votes for Leave              |

```uk_rejected_ballots.json```

| Field                         | Description                                                            |
|-------------------------------|------------------------------------------------------------------------|
| ```Area_Code```               | Unique identifier for an area                                          |
| ```Rejected_Ballots```        | Number of Rejected Ballots in the Area                                 |
| ```No_official_mark```        | Number of ballots rejected because they did not have any official mark |
| ```Voting_for_both_answers``` | Number of ballots rejected because they voted for both answers         |
| ```Writing_or_mark```         | Number of ballots rejected because they had writing instead of a check |
| ```Unmarked_or_void```        | Number of ballots rejected because they were unmarked or void          |
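
The field descriptions suggest a few arithmetic identities that each area's counts could be checked against once the three files are combined. The sketch below assumes those identities (they are inferred from the field names, not stated by the data source) and reuses the Peterborough figures from the Q1 output, including the ```Unmarked_or_void``` field that appears in the loaded records. The helper name `check_area_totals` is illustrative only.

```python
def check_area_totals(record):
    """Return a list of identities that do not hold for one combined area record."""
    problems = []
    # Assumed identity: every ballot cast is either valid or rejected
    if record['Votes_Cast'] != record['Valid_Votes'] + record['Rejected_Ballots']:
        problems.append('Votes_Cast != Valid_Votes + Rejected_Ballots')
    # Assumed identity: every valid vote is for Remain or Leave
    if record['Valid_Votes'] != record['Remain'] + record['Leave']:
        problems.append('Valid_Votes != Remain + Leave')
    # Assumed identity: the rejection reasons add up to the rejected total
    reasons = ('No_official_mark', 'Voting_for_both_answers',
               'Writing_or_mark', 'Unmarked_or_void')
    if record['Rejected_Ballots'] != sum(record[reason] for reason in reasons):
        problems.append('Rejected_Ballots != sum of rejection reasons')
    return problems


peterborough = {
    'Votes_Cast': 87469, 'Valid_Votes': 87392, 'Remain': 34176, 'Leave': 53216,
    'Rejected_Ballots': 77, 'No_official_mark': 0, 'Voting_for_both_answers': 32,
    'Writing_or_mark': 7, 'Unmarked_or_void': 38,
}
print(check_area_totals(peterborough))  # → []
```

An empty list means all three checks passed for that record.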

### Without using the Pandas / NumPy Libraries

### Q1. Load the data

In [1]:
import requests

In [2]:
uk_demo_src = r"https://raw.githubusercontent.com/viveknest/statascratch-solutions/main/UK%20Referendum%20Data/uk_demo.json"
f = requests.get(uk_demo_src)
uk_demo = f.json()
uk_demo[:5]

[{'Area_Code': 'E06000031',
  'Area': 'Peterborough',
  'Region_Code': 'E12000006',
  'Region': 'East',
  'Electorate': 120892},
 {'Area_Code': 'E06000032',
  'Area': 'Luton',
  'Region_Code': 'E12000006',
  'Region': 'East',
  'Electorate': 127612},
 {'Area_Code': 'E06000033',
  'Area': 'Southend-on-Sea',
  'Region_Code': 'E12000006',
  'Region': 'East',
  'Electorate': 128856},
 {'Area_Code': 'E06000034',
  'Area': 'Thurrock',
  'Region_Code': 'E12000006',
  'Region': 'East',
  'Electorate': 109897},
 {'Area_Code': 'E06000055',
  'Area': 'Bedford',
  'Region_Code': 'E12000006',
  'Region': 'East',
  'Electorate': 119530}]

In [3]:
uk_results_src = r"https://raw.githubusercontent.com/viveknest/statascratch-solutions/main/UK%20Referendum%20Data/uk_results.json"
f = requests.get(uk_results_src)
uk_results = f.json()
uk_results[:5]

[{'Area_Code': 'E06000031',
  'Votes_Cast': 87469,
  'Valid_Votes': 87392,
  'Remain': 34176,
  'Leave': 53216},
 {'Area_Code': 'E06000032',
  'Votes_Cast': 84616,
  'Valid_Votes': 84481,
  'Remain': 36708,
  'Leave': 47773},
 {'Area_Code': 'E06000033',
  'Votes_Cast': 93939,
  'Valid_Votes': 93870,
  'Remain': 39348,
  'Leave': 54522},
 {'Area_Code': 'E06000034',
  'Votes_Cast': 79950,
  'Valid_Votes': 79916,
  'Remain': 22151,
  'Leave': 57765},
 {'Area_Code': 'E06000055',
  'Votes_Cast': 86135,
  'Valid_Votes': 86066,
  'Remain': 41497,
  'Leave': 44569}]

In [4]:
uk_rejected_src = r"https://github.com/viveknest/statascratch-solutions/raw/main/UK%20Referendum%20Data/uk_rejected_ballots.json"
f = requests.get(uk_rejected_src)
uk_rejected = f.json()
uk_rejected[:5]

[{'Area_Code': 'E06000031',
  'Rejected_Ballots': 77,
  'No_official_mark': 0,
  'Voting_for_both_answers': 32,
  'Writing_or_mark': 7,
  'Unmarked_or_void': 38},
 {'Area_Code': 'E06000032',
  'Rejected_Ballots': 135,
  'No_official_mark': 0,
  'Voting_for_both_answers': 85,
  'Writing_or_mark': 0,
  'Unmarked_or_void': 50},
 {'Area_Code': 'E06000033',
  'Rejected_Ballots': 69,
  'No_official_mark': 0,
  'Voting_for_both_answers': 21,
  'Writing_or_mark': 0,
  'Unmarked_or_void': 48},
 {'Area_Code': 'E06000034',
  'Rejected_Ballots': 34,
  'No_official_mark': 0,
  'Voting_for_both_answers': 8,
  'Writing_or_mark': 3,
  'Unmarked_or_void': 23},
 {'Area_Code': 'E06000055',
  'Rejected_Ballots': 69,
  'No_official_mark': 0,
  'Voting_for_both_answers': 26,
  'Writing_or_mark': 1,
  'Unmarked_or_void': 42}]
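
Before joining, it may be worth confirming that each loaded list has the expected fields and that no ```Area_Code``` repeats, since a repeated code would be silently overwritten when the list is converted to a dictionary. A minimal sketch follows; `validate_records` is a hypothetical helper, and the two sample rows are copied from the ```uk_demo``` output above.

```python
def validate_records(records, required_fields, key='Area_Code'):
    """Raise ValueError on a missing field or repeated key; return the record count.

    `required_fields` is expected to include `key` itself.
    """
    seen = set()
    for position, row in enumerate(records):
        missing = [field for field in required_fields if field not in row]
        if missing:
            raise ValueError(f'record {position} is missing fields: {missing}')
        if row[key] in seen:
            raise ValueError(f'duplicate {key}: {row[key]}')
        seen.add(row[key])
    return len(seen)


demo_fields = ['Area_Code', 'Area', 'Region_Code', 'Region', 'Electorate']
sample = [
    {'Area_Code': 'E06000031', 'Area': 'Peterborough',
     'Region_Code': 'E12000006', 'Region': 'East', 'Electorate': 120892},
    {'Area_Code': 'E06000032', 'Area': 'Luton',
     'Region_Code': 'E12000006', 'Region': 'East', 'Electorate': 127612},
]
print(validate_records(sample, demo_fields))  # → 2
```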

### Q2. Join the data

Consolidate the three lists into a single list by joining on ```Area_Code```.

In [5]:
# Convert a list of records into a dictionary of records keyed by `key`
def dict_convert(data, key='Area_Code'):
    return {row[key]: row for row in data}


In [6]:
uk_demo_dict = dict_convert(uk_demo)
uk_demo_dict

{'E06000031': {'Area_Code': 'E06000031',
  'Area': 'Peterborough',
  'Region_Code': 'E12000006',
  'Region': 'East',
  'Electorate': 120892},
 'E06000032': {'Area_Code': 'E06000032',
  'Area': 'Luton',
  'Region_Code': 'E12000006',
  'Region': 'East',
  'Electorate': 127612},
 'E06000033': {'Area_Code': 'E06000033',
  'Area': 'Southend-on-Sea',
  'Region_Code': 'E12000006',
  'Region': 'East',
  'Electorate': 128856},
 'E06000034': {'Area_Code': 'E06000034',
  'Area': 'Thurrock',
  'Region_Code': 'E12000006',
  'Region': 'East',
  'Electorate': 109897},
 'E06000055': {'Area_Code': 'E06000055',
  'Area': 'Bedford',
  'Region_Code': 'E12000006',
  'Region': 'East',
  'Electorate': 119530},
 'E06000056': {'Area_Code': 'E06000056',
  'Area': 'Central Bedfordshire',
  'Region_Code': 'E12000006',
  'Region': 'East',
  'Electorate': 204004},
 'E07000008': {'Area_Code': 'E07000008',
  'Area': 'Cambridge',
  'Region_Code': 'E12000006',
  'Region': 'East',
  'Electorate': 80108},
 'E07000009': {

In [7]:
uk_results_dict = dict_convert(uk_results)
uk_rejected_dict = dict_convert(uk_rejected)

In [8]:
# Merge the three records for each area into one dictionary per Area_Code
merged_dict = {}

for key in uk_demo_dict:
    merged_dict[key] = {**uk_demo_dict[key], **uk_results_dict[key], **uk_rejected_dict[key]}
merged_dict

{'E06000031': {'Area_Code': 'E06000031',
  'Area': 'Peterborough',
  'Region_Code': 'E12000006',
  'Region': 'East',
  'Electorate': 120892,
  'Votes_Cast': 87469,
  'Valid_Votes': 87392,
  'Remain': 34176,
  'Leave': 53216,
  'Rejected_Ballots': 77,
  'No_official_mark': 0,
  'Voting_for_both_answers': 32,
  'Writing_or_mark': 7,
  'Unmarked_or_void': 38},
 'E06000032': {'Area_Code': 'E06000032',
  'Area': 'Luton',
  'Region_Code': 'E12000006',
  'Region': 'East',
  'Electorate': 127612,
  'Votes_Cast': 84616,
  'Valid_Votes': 84481,
  'Remain': 36708,
  'Leave': 47773,
  'Rejected_Ballots': 135,
  'No_official_mark': 0,
  'Voting_for_both_answers': 85,
  'Writing_or_mark': 0,
  'Unmarked_or_void': 50},
 'E06000033': {'Area_Code': 'E06000033',
  'Area': 'Southend-on-Sea',
  'Region_Code': 'E12000006',
  'Region': 'East',
  'Electorate': 128856,
  'Votes_Cast': 93939,
  'Valid_Votes': 93870,
  'Remain': 39348,
  'Leave': 54522,
  'Rejected_Ballots': 69,
  'No_official_mark': 0,
  'Voti

In [9]:
{**uk_demo_dict, **uk_results_dict}

{'E06000031': {'Area_Code': 'E06000031',
  'Votes_Cast': 87469,
  'Valid_Votes': 87392,
  'Remain': 34176,
  'Leave': 53216},
 'E06000032': {'Area_Code': 'E06000032',
  'Votes_Cast': 84616,
  'Valid_Votes': 84481,
  'Remain': 36708,
  'Leave': 47773},
 'E06000033': {'Area_Code': 'E06000033',
  'Votes_Cast': 93939,
  'Valid_Votes': 93870,
  'Remain': 39348,
  'Leave': 54522},
 'E06000034': {'Area_Code': 'E06000034',
  'Votes_Cast': 79950,
  'Valid_Votes': 79916,
  'Remain': 22151,
  'Leave': 57765},
 'E06000055': {'Area_Code': 'E06000055',
  'Votes_Cast': 86135,
  'Valid_Votes': 86066,
  'Remain': 41497,
  'Leave': 44569},
 'E06000056': {'Area_Code': 'E06000056',
  'Votes_Cast': 158894,
  'Valid_Votes': 158804,
  'Remain': 69670,
  'Leave': 89134},
 'E07000008': {'Area_Code': 'E07000008',
  'Votes_Cast': 57852,
  'Valid_Votes': 57799,
  'Remain': 42682,
  'Leave': 15117},
 'E07000009': {'Area_Code': 'E07000009',
  'Votes_Cast': 48124,
  'Valid_Votes': 48086,
  'Remain': 23599,
  'Leave'
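
The output of this cell contains only the results fields. When two dictionaries share the same keys, unpacking them together keeps the right-hand value for each key, so each area's demographic record is replaced rather than combined. A small sketch with made-up values (`left` and `right` are illustrative names) contrasts unpacking at the top level with unpacking per key:

```python
left = {'A1': {'Area_Code': 'A1', 'Area': 'Example Town'}}
right = {'A1': {'Area_Code': 'A1', 'Votes_Cast': 10}}

# Top-level unpacking: the whole inner record from `right` wins
shallow = {**left, **right}
print(shallow)  # → {'A1': {'Area_Code': 'A1', 'Votes_Cast': 10}}

# Per-key unpacking: the inner records are combined field by field
nested = {key: {**left[key], **right[key]} for key in left}
print(nested)  # → {'A1': {'Area_Code': 'A1', 'Area': 'Example Town', 'Votes_Cast': 10}}
```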

In [10]:
merged_list = list(merged_dict.values())
merged_list

[{'Area_Code': 'E06000031',
  'Area': 'Peterborough',
  'Region_Code': 'E12000006',
  'Region': 'East',
  'Electorate': 120892,
  'Votes_Cast': 87469,
  'Valid_Votes': 87392,
  'Remain': 34176,
  'Leave': 53216,
  'Rejected_Ballots': 77,
  'No_official_mark': 0,
  'Voting_for_both_answers': 32,
  'Writing_or_mark': 7,
  'Unmarked_or_void': 38},
 {'Area_Code': 'E06000032',
  'Area': 'Luton',
  'Region_Code': 'E12000006',
  'Region': 'East',
  'Electorate': 127612,
  'Votes_Cast': 84616,
  'Valid_Votes': 84481,
  'Remain': 36708,
  'Leave': 47773,
  'Rejected_Ballots': 135,
  'No_official_mark': 0,
  'Voting_for_both_answers': 85,
  'Writing_or_mark': 0,
  'Unmarked_or_void': 50},
 {'Area_Code': 'E06000033',
  'Area': 'Southend-on-Sea',
  'Region_Code': 'E12000006',
  'Region': 'East',
  'Electorate': 128856,
  'Votes_Cast': 93939,
  'Valid_Votes': 93870,
  'Remain': 39348,
  'Leave': 54522,
  'Rejected_Ballots': 69,
  'No_official_mark': 0,
  'Voting_for_both_answers': 21,
  'Writing_or
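
The join in this section assumes every ```Area_Code``` appears in all three files; a code missing from one of them would raise a `KeyError`. One possible variant, sketched here with the hypothetical helper `inner_join` and made-up records, keeps only the codes present in every input, which corresponds to an inner join:

```python
def inner_join(*tables, key='Area_Code'):
    """Join lists of dicts on `key`, keeping only keys found in every list."""
    indexed = [{row[key]: row for row in table} for table in tables]
    if not indexed:
        return []
    # Keys present in every table
    common = set(indexed[0]).intersection(*indexed[1:])
    joined = []
    # Walk the first table so the result keeps its order
    for code in indexed[0]:
        if code in common:
            merged = {}
            for index in indexed:
                merged.update(index[code])
            joined.append(merged)
    return joined


demo = [{'Area_Code': 'A1', 'Area': 'Example Town'},
        {'Area_Code': 'A2', 'Area': 'Sample City'}]
results = [{'Area_Code': 'A1', 'Remain': 3, 'Leave': 4}]
print(inner_join(demo, results))
# → [{'Area_Code': 'A1', 'Area': 'Example Town', 'Remain': 3, 'Leave': 4}]
```

Area `A2` is dropped because it has no matching results record.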

### Q3. Summary Statistics

Find the maximum, minimum, median, and 25th and 75th percentile values for:
- Electorate
- Number of Votes Cast
- Number of Valid Votes
- Number of Remain Votes
- Number of Leave Votes
- Number of Rejected Votes


In [11]:
import math

def summ_stats(source_data, variable):
    # Collect the variable's value for every area and sort ascending
    out_list = sorted(v[variable] for v in source_data.values())
    num_items = len(out_list)
    min_value = out_list[0]
    max_value = out_list[-1]

    # Median: average the two middle values when the count is even
    mid = num_items // 2
    if num_items % 2 == 0:
        median_value = (out_list[mid - 1] + out_list[mid]) / 2
    else:
        median_value = out_list[mid]

    # Quartiles, using the nearest rank method (1-based rank = ceil(p * N))
    q1 = out_list[math.ceil(num_items * 0.25) - 1]
    q3 = out_list[math.ceil(num_items * 0.75) - 1]

    return {
        'N': num_items,
        'Max': max_value,
        'Min': min_value,
        'Median': median_value,
        'q1': q1,
        'q3': q3,
    }

summ_stats(merged_dict, 'Electorate')

{'N': 382,
 'Max': 1260955,
 'Min': 1799,
 'Median': 96425.5,
 'q1': 72487,
 'q3': 141486}

In [12]:
summ_stats(merged_dict, 'Votes_Cast')

{'N': 382,
 'Max': 790523,
 'Min': 1424,
 'Median': 72544.5,
 'q1': 54864,
 'q3': 104809}

In [13]:
summ_stats(merged_dict, 'Valid_Votes')

{'N': 382,
 'Max': 790149,
 'Min': 1424,
 'Median': 72511.5,
 'q1': 54833,
 'q3': 104699}

In [14]:
summ_stats(merged_dict, 'Remain')

{'N': 382,
 'Max': 440707,
 'Min': 803,
 'Median': 33475.0,
 'q1': 23515,
 'q3': 48257}

In [15]:
summ_stats(merged_dict, 'Leave')

{'N': 382,
 'Max': 349442,
 'Min': 621,
 'Median': 37573.5,
 'q1': 28631,
 'q3': 54198}

In [16]:
summ_stats(merged_dict, 'Rejected_Ballots')

{'N': 382, 'Max': 614, 'Min': 0, 'Median': 46.5, 'q1': 33, 'q3': 74}
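
The nearest rank rule used for the quartiles extends to any percentile: for a sorted list of n values, the p-th percentile is the value at 1-based rank ceil(p / 100 * n). A minimal sketch follows, using the hypothetical helper `nearest_rank` and made-up data. The median is cross-checked with `statistics.median` from the standard library, which is neither Pandas nor NumPy.

```python
import math
import statistics


def nearest_rank(values, pct):
    """Return the pct-th percentile (0 < pct <= 100) by the nearest rank method."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]


data = [15, 20, 35, 40, 50]
print(nearest_rank(data, 25))   # → 20
print(nearest_rank(data, 75))   # → 40
print(statistics.median(data))  # → 35
```

With an even number of values, the nearest rank 50th percentile returns one of the two middle values, whereas the median averages them, so the two can differ.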

### Q4. Top and Bottom Areas

- Find the Area with the highest and lowest Electorates
- Find the Area with the highest and lowest Remain Voters
- Find the Area with the highest and lowest Leave Voters


In [17]:
min(merged_list, key=lambda x: x['Electorate'])['Area']

'Isles of Scilly'

In [18]:
max(merged_list, key=lambda x: x['Electorate'])['Area']

'Northern Ireland'
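
The same lookup applies to the Remain and Leave counts. A minimal sketch, assuming records shaped like those in `merged_list`: the helper `extremes` is illustrative, and the three sample records are taken from the outputs shown earlier, so the printed areas are the extremes of this sample only, not of the full data.

```python
def extremes(records, field, label='Area'):
    """Return (label of the lowest record, label of the highest record) for `field`."""
    lowest = min(records, key=lambda rec: rec[field])
    highest = max(records, key=lambda rec: rec[field])
    return lowest[label], highest[label]


sample = [
    {'Area': 'Peterborough', 'Remain': 34176, 'Leave': 53216},
    {'Area': 'Luton', 'Remain': 36708, 'Leave': 47773},
    {'Area': 'Thurrock', 'Remain': 22151, 'Leave': 57765},
]
for field in ('Remain', 'Leave'):
    print(field, extremes(sample, field))
# → Remain ('Thurrock', 'Luton')
# → Leave ('Luton', 'Thurrock')
```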

### Q5. Region-wise Totals

Aggregate the values at the region level. The solution totals the electorate for each region.


In [19]:
regions = {area['Region'] for area in merged_list}
regions

{'East',
 'East Midlands',
 'London',
 'North East',
 'North West',
 'Northern Ireland',
 'Scotland',
 'South East',
 'South West',
 'Wales',
 'West Midlands',
 'Yorkshire and The Humber'}

In [20]:
region_dict = {}
for region in regions:
    # Total electorate across all areas belonging to this region
    region_dict[region] = sum(area['Electorate'] for area in merged_list if area['Region'] == region)

region_dict

{'London': 5424768,
 'Wales': 2270272,
 'West Midlands': 4116572,
 'North West': 5241568,
 'Scotland': 3987112,
 'Northern Ireland': 1260955,
 'South East': 6465404,
 'East': 4398796,
 'East Midlands': 3384299,
 'South West': 4138134,
 'Yorkshire and The Humber': 3877780,
 'North East': 1934341}
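
The loop in this section totals only the electorate and rescans the full list once per region. A possible single-pass variant that sums several fields at once is sketched here; the helper `totals_by` is hypothetical and the sample values are made up for illustration.

```python
def totals_by(records, group_field, sum_fields):
    """Sum each field in `sum_fields` per distinct value of `group_field`."""
    totals = {}
    for rec in records:
        # Create a zeroed entry the first time a group is seen
        group = totals.setdefault(rec[group_field], {field: 0 for field in sum_fields})
        for field in sum_fields:
            group[field] += rec[field]
    return totals


sample = [
    {'Region': 'North', 'Electorate': 100, 'Votes_Cast': 70},
    {'Region': 'North', 'Electorate': 50, 'Votes_Cast': 30},
    {'Region': 'South', 'Electorate': 80, 'Votes_Cast': 60},
]
print(totals_by(sample, 'Region', ['Electorate', 'Votes_Cast']))
# → {'North': {'Electorate': 150, 'Votes_Cast': 100}, 'South': {'Electorate': 80, 'Votes_Cast': 60}}
```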

### Q6. Contribution

Find the percentage contribution of each region to the total electorate.

In [21]:
total_val = sum(region_dict.values())
# Each region's share of the total electorate, as a percentage
cont_dict = {k: v / total_val * 100 for k, v in region_dict.items()}
cont_dict

{'London': 11.66616749105016,
 'Wales': 4.882305271348273,
 'West Midlands': 8.852842820368972,
 'North West': 11.272189004899161,
 'Scotland': 8.574434224205715,
 'Northern Ireland': 2.711731124478901,
 'South East': 13.904094324643133,
 'East': 9.459776140649975,
 'East Midlands': 7.278062209073931,
 'South West': 8.899212711844887,
 'Yorkshire and The Humber': 8.339311648616954,
 'North East': 4.159873028819935}
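
As a sanity check, the contributions should add up to 100, allowing for floating-point rounding. A short sketch using the region totals printed in Q5; `contributions` is an illustrative helper name.

```python
import math


def contributions(totals):
    """Return each value's percentage share of the overall total."""
    grand_total = sum(totals.values())
    return {name: value / grand_total * 100 for name, value in totals.items()}


region_totals = {
    'London': 5424768, 'Wales': 2270272, 'West Midlands': 4116572,
    'North West': 5241568, 'Scotland': 3987112, 'Northern Ireland': 1260955,
    'South East': 6465404, 'East': 4398796, 'East Midlands': 3384299,
    'South West': 4138134, 'Yorkshire and The Humber': 3877780,
    'North East': 1934341,
}
shares = contributions(region_totals)
print(round(shares['London'], 2))               # → 11.67
print(math.isclose(sum(shares.values()), 100))  # → True
```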