In [1]:
%logstop
%logstart -rtq ~/.logs/pw.py append
import seaborn as sns
sns.set()

In [49]:
from static_grader import grader

# PW Miniproject
## Introduction

The objective of this miniproject is to exercise your ability to use basic Python data structures, define functions, and control program flow. We will be using these concepts to perform some fundamental data wrangling tasks such as joining data sets together, splitting data into groups, and aggregating data into summary statistics.
**Please do not use `pandas` or `numpy` to answer these questions.**

We will be working with medical data from the British NHS on prescription drugs. Since this is real data, it contains many ambiguities that we will need to confront in our analysis. This is commonplace in data science, and is one of the lessons you will learn in this miniproject.

## Downloading the data

We first need to download the data we'll be using from Amazon S3:

In [2]:
!pwd

/home/jovyan/datacourse/data-wrangling/miniprojects


In [123]:
%%bash
mkdir pw-data
wget http://dataincubator-wqu.s3.amazonaws.com/pwdata/201701scripts_sample.json.gz -nc -P ./pw-data
wget http://dataincubator-wqu.s3.amazonaws.com/pwdata/practices.json.gz -nc -P ./pw-data

mkdir: cannot create directory ‘pw-data’: File exists
File ‘./pw-data/201701scripts_sample.json.gz’ already there; not retrieving.

File ‘./pw-data/practices.json.gz’ already there; not retrieving.



## Loading the data

The first step of the project is to read in the data. We will discuss reading and writing various kinds of files later in the course, but the code below should get you started.

In [8]:
import gzip
import simplejson as json

In [9]:
with gzip.open('./pw-data/201701scripts_sample.json.gz', 'rb') as f:
    scripts = json.load(f)

with gzip.open('./pw-data/practices.json.gz', 'rb') as f:
    practices = json.load(f)


This data set comes from Britain's National Health Service. The `scripts` variable is a list of prescriptions issued by NHS doctors. Each prescription is represented by a dictionary with various data fields: `'practice'`, `'bnf_code'`, `'bnf_name'`, `'quantity'`, `'items'`, `'nic'`, and `'act_cost'`. 

In [129]:
scripts[8000:8500]

[{'bnf_code': '0601022B0BBAEAV',
  'items': 2,
  'practice': 'N81632',
  'bnf_name': 'Glucophage SR_Tab 1000mg',
  'nic': 17.04,
  'act_cost': 15.8,
  'quantity': 112},
 {'bnf_code': '0601023ABAAAAAA',
  'items': 3,
  'practice': 'N81632',
  'bnf_name': 'Liraglutide_Inj 6mg/ml 3ml PF Pen',
  'nic': 274.68,
  'act_cost': 254.43,
  'quantity': 7},
 {'bnf_code': '0601023ACAAAAAA',
  'items': 2,
  'practice': 'N81632',
  'bnf_name': 'Saxagliptin_Tab 5mg',
  'nic': 63.2,
  'act_cost': 58.54,
  'quantity': 56},
 {'bnf_code': '0601023AEAAAAAA',
  'items': 12,
  'practice': 'N81632',
  'bnf_name': 'Linagliptin_Tab 5mg',
  'nic': 399.12,
  'act_cost': 369.64,
  'quantity': 336},
 {'bnf_code': '0601023AGAAAAAA',
  'items': 1,
  'practice': 'N81632',
  'bnf_name': 'Dapagliflozin_Tab 5mg',
  'nic': 36.59,
  'act_cost': 33.89,
  'quantity': 28},
 {'bnf_code': '0601023AGAAABAB',
  'items': 32,
  'practice': 'N81632',
  'bnf_name': 'Dapagliflozin_Tab 10mg',
  'nic': 1170.88,
  'act_cost': 1084.34,
  

A [glossary of terms](http://webarchive.nationalarchives.gov.uk/20180328130852tf_/http://content.digital.nhs.uk/media/10686/Download-glossary-of-terms-for-GP-prescribing---presentation-level/pdf/PLP_Presentation_Level_Glossary_April_2015.pdf/) and [FAQ](http://webarchive.nationalarchives.gov.uk/20180328130852tf_/http://content.digital.nhs.uk/media/10048/FAQs-Practice-Level-Prescribingpdf/pdf/PLP_FAQs_April_2015.pdf/) are available from the NHS regarding the data. Below we supply a data dictionary briefly describing what these fields mean.

| Data field |Description|
|:----------:|-----------|
|`'practice'`|Code designating the medical practice issuing the prescription|
|`'bnf_code'`|British National Formulary drug code|
|`'bnf_name'`|British National Formulary drug name|
|`'quantity'`|Number of capsules/quantity of liquid/grams of powder prescribed|
| `'items'`  |Number of refills (e.g. if `'quantity'` is 30 capsules, 3 `'items'` means 3 bottles of 30 capsules)|
|  `'nic'`   |Net ingredient cost|
|`'act_cost'`|Total cost including containers, fees, and discounts|

The `practices` variable is a list of member medical practices of the NHS. Each practice is represented by a dictionary containing identifying information for the medical practice. Most of the data fields are self-explanatory. Notice the values in the `'code'` field of `practices` match the values in the `'practice'` field of `scripts`.

In [6]:
practices[:4]

[{'code': 'A81001',
  'name': 'THE DENSHAM SURGERY',
  'addr_1': 'THE HEALTH CENTRE',
  'addr_2': 'LAWSON STREET',
  'borough': 'STOCKTON ON TEES',
  'village': 'CLEVELAND',
  'post_code': 'TS18 1HU'},
 {'code': 'A81002',
  'name': 'QUEENS PARK MEDICAL CENTRE',
  'addr_1': 'QUEENS PARK MEDICAL CTR',
  'addr_2': 'FARRER STREET',
  'borough': 'STOCKTON ON TEES',
  'village': 'CLEVELAND',
  'post_code': 'TS18 2AW'},
 {'code': 'A81003',
  'name': 'VICTORIA MEDICAL PRACTICE',
  'addr_1': 'THE HEALTH CENTRE',
  'addr_2': 'VICTORIA ROAD',
  'borough': 'HARTLEPOOL',
  'village': 'CLEVELAND',
  'post_code': 'TS26 8DB'},
 {'code': 'A81004',
  'name': 'WOODLANDS ROAD SURGERY',
  'addr_1': '6 WOODLANDS ROAD',
  'addr_2': None,
  'borough': 'MIDDLESBROUGH',
  'village': 'CLEVELAND',
  'post_code': 'TS1 3BE'}]

In the following questions we will ask you to explore this data set. You may need to combine pieces of the data set together in order to answer some questions. Not every element of the data set will be used in answering the questions.

## Question 1: summary_statistics

Our beneficiary data (`scripts`) contains quantitative data on the number of items dispensed (`'items'`), the total quantity dispensed (`'quantity'`), the net cost of the ingredients (`'nic'`), and the actual cost to the patient (`'act_cost'`). Whenever working with a new data set, it can be useful to calculate summary statistics to develop a feeling for the volume and character of the data. This makes it easier to spot trends and significant features during further stages of analysis.

Calculate the sum, mean, standard deviation, and quartile statistics for each of these quantities. Format your results for each quantity as a list: `[sum, mean, standard deviation, 1st quartile, median, 3rd quartile]`. We'll create a `tuple` with these lists for each quantity as a final result.
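
As a rough illustration of the statistics requested above, here is a minimal sketch on a made-up list of numbers. It uses the standard library's `statistics` module rather than `pandas` or `numpy`. The helper name `summarize`, the use of the population standard deviation, and the `'inclusive'` quartile method are all assumptions made for this sketch; the grader may expect a different quartile convention.

```python
import statistics

def summarize(values):
    """Return [sum, mean, std, q25, median, q75] for a list of numbers."""
    total = sum(values)
    mean = total / len(values)
    # Population standard deviation (divides by n); an assumption for this sketch.
    std = statistics.pstdev(values)
    # quantiles(n=4) returns the three cut points between the four quarters.
    q25, median, q75 = statistics.quantiles(values, n=4, method='inclusive')
    return [total, mean, std, q25, median, q75]

toy = [1, 2, 3, 4, 5, 6, 7, 8]
print(summarize(toy))
```

Different quartile conventions give slightly different answers on small samples, so it is worth checking which one a given grader or library uses.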

In [22]:

def describe(key):
    """Return (sum, mean, std, q25, median, q75) of `key` over all of `scripts`."""
    values = []
    total = 0
    for script in scripts:
        values.append(script[key])
        total += script[key]
    n = len(values)
    avg = total / n

    # Population standard deviation: divide by n rather than n - 1.
    s = 0
    for script in scripts:
        s += (script[key] - avg) ** 2
    s = (s / n) ** (1 / 2)

    # Quartiles are read straight from the sorted values by index,
    # without interpolating between neighbouring values.
    values.sort()
    med = values[n // 2]
    q25 = values[(n // 2) // 2 + 1]
    q75 = values[n // 2 + n // 4 + 1]

    return (total, avg, s, q25, med, q75)
print(describe('items'))

(4410054, 11.522744731217633, 33.11216633980368, 1, 3, 8)


In [23]:
summary = [('items', describe('items')),
           ('quantity', describe('quantity')),
           ('nic', describe('nic')),
           ('act_cost', describe('act_cost'))]

In [24]:
grader.score.pw__summary_statistics(summary)

Your score: 1.000


## Question 2: most_common_item

Often we are not interested only in how the data is distributed in our entire data set, but within particular groups -- for example, how many items of each drug (i.e. `'bnf_name'`) were prescribed? Calculate the total items prescribed for each `'bnf_name'`. What is the most commonly prescribed `'bnf_name'` in our data?

To calculate this, we first need to split our data set into groups corresponding to the different values of `'bnf_name'`. Then we can sum the number of items dispensed within each group. Finally we can find the largest sum.
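
The split, sum, and maximum steps can be sketched on a tiny made-up data set. The records and the names `toy_scripts`, `item_totals`, and `toy_max_item` are hypothetical and only serve the illustration; this version also folds the split and sum into a single pass instead of building the groups first.

```python
# Made-up prescriptions; only the fields needed for the illustration.
toy_scripts = [
    {'bnf_name': 'Drug A', 'items': 2},
    {'bnf_name': 'Drug B', 'items': 5},
    {'bnf_name': 'Drug A', 'items': 4},
]

# Split and sum in one pass: accumulate the item count per name.
item_totals = {}
for script in toy_scripts:
    name = script['bnf_name']
    item_totals[name] = item_totals.get(name, 0) + script['items']

# Find the name with the largest total.
top_name = max(item_totals, key=item_totals.get)
toy_max_item = [(top_name, item_totals[top_name])]
print(toy_max_item)  # [('Drug A', 6)]
```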

We'll use `'bnf_name'` to construct our groups. You should have *5619* unique values for `'bnf_name'`.

In [28]:
bnf_names = []
for i in range(len(scripts)):
    if scripts[i]['bnf_name'] not in bnf_names:
        bnf_names.append(scripts[i]['bnf_name'])
assert(len(bnf_names) == 5619)

We want to construct "groups" identified by `'bnf_name'`, where each group is a collection of prescriptions (i.e. dictionaries from `scripts`). We'll construct a dictionary called `groups`, using `bnf_names` as the keys. We'll represent a group with a `list`, since we can easily append new members to the group. To split our `scripts` into groups by `'bnf_name'`, we should iterate over `scripts`, appending prescription dictionaries to each group as we encounter them.

In [49]:
groups = {name: [] for name in bnf_names}
for script in scripts:
    groups[script['bnf_name']].append(script)


Now that we've constructed our groups we should sum up `'items'` in each group and find the `'bnf_name'` with the largest sum. The result, `max_item`, should have the form `[(bnf_name, item total)]`, e.g. `[('Foobar', 2000)]`.

In [37]:
max_item = [("", 0)]
for key in groups:
    temp=0
    for i in range(len(groups[key])):
        temp+=groups[key][i]['items']
        
    item=[(key,temp)]
    #print(item)
    if temp>max_item[0][1]:
        max_item=item
        
#print(max_item)

**TIP:** If you are getting an error from the grader below, please make sure your answer conforms to the correct format of `[(bnf_name, item total)]`.

In [61]:
grader.score.pw__most_common_item(max_item)

Your score: 1.000


**Challenge:** Write a function that constructs groups as we did above. The function should accept a list of dictionaries (e.g. `scripts` or `practices`) and a tuple of fields to `groupby` (e.g. `('bnf_name',)` or `('bnf_name', 'post_code')`) and returns a dictionary of groups. The following questions will require you to aggregate data in groups, so this could be a useful function for the rest of the miniproject.

In [6]:
def group_by_field(data,fields):
    names={tuple(dict_[field] for field in fields)
           for dict_ in data}
    groups= {name: [] for name in names}
    for dict_ in data:
        name = tuple(dict_[field] for field in fields)
        groups[name].append(dict_)
    return groups
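
For comparison, here is one possible variant of the same idea built on `collections.defaultdict`, which avoids the first pass that collects the key names. The function name `group_by` and the toy records are hypothetical. Note that the keys are tuples even when grouping by a single field, which matters when comparing against results keyed by plain strings.

```python
from collections import defaultdict

def group_by(data, fields):
    """Group a list of dicts by the tuple of values found under `fields`."""
    groups = defaultdict(list)
    for record in data:
        key = tuple(record[field] for field in fields)
        groups[key].append(record)
    return dict(groups)

# Made-up records for illustration only.
toy = [
    {'bnf_name': 'Drug A', 'post_code': 'AB1 2CD', 'items': 2},
    {'bnf_name': 'Drug B', 'post_code': 'AB1 2CD', 'items': 5},
    {'bnf_name': 'Drug A', 'post_code': 'EF3 4GH', 'items': 4},
]
by_name = group_by(toy, ('bnf_name',))
print(sorted(by_name))  # [('Drug A',), ('Drug B',)]
```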

In [38]:
groups = group_by_field(scripts, ('bnf_name',))
most_groups_by_bnf = []
test_max_item = [(('',), 0)]
for key in groups:
    temp = 0
    for script in groups[key]:
        temp += script['items']
    most_groups_by_bnf.append((key, temp))
    if temp > test_max_item[0][1]:
        test_max_item = [(key, temp)]
print(test_max_item)
most_groups_by_bnf.sort()
# Keys from group_by_field are 1-tuples, so unpack the name before comparing.
(name,), total = test_max_item[0]
assert [(name, total)] == max_item

[(('Omeprazole_Cap E/C 20mg',), 113826)]


## Question 3: postal_totals

Our data set is broken up among different files. This is typical for tabular data to reduce redundancy. Each table typically contains data about a particular type of event, process, or physical object. Data on prescriptions and medical practices are in separate files in our case. If we want to find the total items prescribed in each postal code, we will have to _join_ our prescription data (`scripts`) to our clinic data (`practices`).

Find the total items prescribed in each postal code, representing the results as a list of tuples `(post code, total items prescribed)`. Sort your results ascending alphabetically by post code and take only results from the first 100 post codes. Only include post codes if there is at least one prescription from a practice in that post code.

**NOTE:** Some practices have multiple postal codes associated with them. Use the alphabetically first postal code.

We can join `scripts` and `practices` based on the fact that `'practice'` in `scripts` matches `'code'` in `practices`. However, we must first deal with the repeated values of `'code'` in `practices`. We want the alphabetically first postal codes.
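
A minimal sketch of both steps on made-up data: first pick the alphabetically first post code per practice code, then attach it to each prescription. All records and the names `toy_practices`, `toy_scripts`, `first_post_code`, and `toy_joined` are hypothetical. Unlike an in-place update, this version builds new dictionaries, which costs extra memory on a large data set.

```python
toy_practices = [
    {'code': 'P1', 'post_code': 'ZZ9 9ZZ'},
    {'code': 'P1', 'post_code': 'AA1 1AA'},  # same practice, second address
    {'code': 'P2', 'post_code': 'MM5 5MM'},
]
toy_scripts = [
    {'practice': 'P1', 'items': 3},
    {'practice': 'P2', 'items': 1},
]

# Keep the alphabetically first post code seen for each practice code.
first_post_code = {}
for practice in toy_practices:
    code, post_code = practice['code'], practice['post_code']
    if code not in first_post_code or post_code < first_post_code[code]:
        first_post_code[code] = post_code

# Join: build new dicts so the original prescriptions are left unmodified.
toy_joined = [dict(script, post_code=first_post_code[script['practice']])
              for script in toy_scripts]
print(toy_joined[0])
```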

In [12]:
practice_postal = {}
for practice in practices:
    code = practice['code']
    post_code = practice['post_code']
    # Keep the alphabetically first post code for each practice code.
    if code not in practice_postal or post_code < practice_postal[code]:
        practice_postal[code] = post_code

print(len(practice_postal))

In [10]:
practice_postal_=group_by_field(practices,('code',))
print(len(practice_postal_))

10843


**Challenge:** This is an aggregation of the practice data grouped by practice codes. Write an alternative implementation of the above cell using the `group_by_field` function you defined previously.
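
One way this could look, sketched on made-up practices: group by `'code'`, then take the minimum post code within each group. The grouping function is repeated in a compact form so the block runs on its own, and `toy_practices` and `toy_practice_postal` are hypothetical names.

```python
def group_by_field(data, fields):
    """Compact version of the grouping function, repeated so this runs alone."""
    groups = {}
    for dict_ in data:
        name = tuple(dict_[field] for field in fields)
        groups.setdefault(name, []).append(dict_)
    return groups

toy_practices = [
    {'code': 'P1', 'post_code': 'ZZ9 9ZZ'},
    {'code': 'P1', 'post_code': 'AA1 1AA'},
    {'code': 'P2', 'post_code': 'MM5 5MM'},
]

# Each key is a 1-tuple such as ('P1',), so unpack it in the loop target.
toy_practice_postal = {
    code: min(p['post_code'] for p in group)
    for (code,), group in group_by_field(toy_practices, ('code',)).items()
}
print(toy_practice_postal)  # {'P1': 'AA1 1AA', 'P2': 'MM5 5MM'}
```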

In [13]:
assert practice_postal['K82019'] == 'HP21 8TR'

Now we can join `practice_postal` to `scripts`.

In [14]:
# `joined` is the same list object as `scripts`, so this loop adds
# 'post_code' to every prescription dictionary in place.
joined = scripts
for script in joined:
    script['post_code'] = practice_postal[script['practice']]
joined[9900]

{'bnf_code': '21010900703',
 'items': 1,
 'practice': 'N81010',
 'bnf_name': 'CarePoint Needles Pen Inj Screw On 4mm',
 'nic': 9.95,
 'act_cost': 9.22,
 'quantity': 100,
 'post_code': 'CW5 5NX'}

Finally we'll group the prescription dictionaries in `joined` by `'post_code'` and sum up the items prescribed in each group, as we did in the previous question.

In [22]:
#items_by_post = ...
items_by_post=group_by_field(joined,('post_code',))

In [26]:
postal_total = []

for postitems in items_by_post:
    temp=0
    for i in range(len(items_by_post[postitems])):
        temp+=items_by_post[postitems][i]['items']
    
    postal_total.append((postitems[0],temp))
postal_total.sort()
print(postal_total[:10])
postal_totals=postal_total[:100]
grader.score.pw__postal_totals(postal_totals)

[('B11 4BW', 20673), ('B18 7AL', 19001), ('B21 9RY', 29103), ('B23 6DJ', 24859), ('B70 7AW', 36531), ('BB11 2DL', 34356), ('BB2 1AX', 28254), ('BB3 1PY', 54514), ('BB4 5SL', 29388), ('BB7 2JG', 44585)]
Your score: 1.000


## Question 4: items_by_region

Now we'll combine the techniques we've developed to answer a more complex question. Find the most commonly dispensed item in each postal code, representing the results as a list of tuples (`post_code`, `bnf_name`, amount dispensed as proportion of total). Sort your results ascending alphabetically by post code and take only results from the first 100 post codes.

**NOTE:** We'll continue to use the `joined` variable we created before, where we've chosen the alphabetically first postal code for each practice. Additionally, some postal codes will have multiple `'bnf_name'` with the same number of items prescribed for the maximum. In this case, we'll take the alphabetically first `'bnf_name'`.

There are several approaches to solve this problem but we will guide you through one of them. Feel free to solve it your own way if it is easier for you to understand and implement. If your kernel keeps on dying, it's probably an indication that you are running out of memory. Consider deleting objects you don't need anymore using the `del` statement and shutting down any other running notebooks. For example:
```Python
del some_object_not_needed
```

The first step is to calculate the total items for each `'post_code'` and `'bnf_name'`. Let's call that result `total_items_by_post_bnf`. Consider which data structure(s) would best represent `total_items_by_post_bnf`. It should have 141196 `('post_code', 'bnf_name')` groups.
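
One candidate structure is a dictionary keyed by `(post_code, bnf_name)` tuples. The sketch below shows the idea on made-up records; `toy_joined` and `toy_totals` are hypothetical names, and other structures (for example nested dictionaries) would work as well.

```python
# Made-up joined prescriptions for illustration only.
toy_joined = [
    {'post_code': 'AA1 1AA', 'bnf_name': 'Drug A', 'items': 2},
    {'post_code': 'AA1 1AA', 'bnf_name': 'Drug A', 'items': 3},
    {'post_code': 'AA1 1AA', 'bnf_name': 'Drug B', 'items': 4},
    {'post_code': 'MM5 5MM', 'bnf_name': 'Drug A', 'items': 1},
]

# Tuples are hashable, so a (post_code, bnf_name) pair can be a dict key.
toy_totals = {}
for script in toy_joined:
    key = (script['post_code'], script['bnf_name'])
    toy_totals[key] = toy_totals.get(key, 0) + script['items']

print(toy_totals[('AA1 1AA', 'Drug A')])  # 5
```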

In [30]:
total_items_by_post_bnf = {}
for key, group in list(group_by_field(joined, ('post_code', 'bnf_name')).items()):
    items_total=sum(d['items'] for d in group)
    total_items_by_post_bnf[key]=items_total 
assert len(total_items_by_post_bnf) == 141196
#print(total_items_by_post_bnf)

In [31]:
total_items = []
for (post_code, bnf_name), total in list(total_items_by_post_bnf.items()):
        new_dict = {'post_code' : post_code,
                    'bnf_name' : bnf_name,
                    'total' : total}
        total_items.append(new_dict)

Next, let's take `total_items_by_post_bnf` and group it by `'post_code'`. In other words, we want a data structure that maps a `'post_code'` to a list of all records that belong to that `'post_code'`. There should be 118 groups.

In [35]:
total_items_by_post=group_by_field(total_items, ('post_code',))
assert len(total_items_by_post) == 118

In [36]:
from operator import itemgetter
get_total = itemgetter('total')
max_item_by_post = []
for group in total_items_by_post.values():
    # Largest total first; ties go to the alphabetically first 'bnf_name'.
    max_total = sorted(group, key=lambda d: (-d['total'], d['bnf_name']))[0]
    max_item_by_post.append(max_total)

Now with the groups in `total_items_by_post`, let's iterate over each group and calculate the following fields for each `'post_code'`:
1. the sum of total items for all `'bnf_name'`
1. the most total items
1. the `'bnf_name'` that had the most total items

Once again, consider the best data structure(s) to use to represent the result. It may help to write and use a function when developing your solution.
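
A small helper along these lines might compute all three fields for one group. This is only a sketch: the name `summarize_post_code` and the toy records are hypothetical, and the record layout (`'post_code'`, `'bnf_name'`, `'total'`) is assumed to match the dictionaries built earlier in this question.

```python
def summarize_post_code(records):
    """Return (sum of totals, largest total, bnf_name with the largest total)."""
    post_total = sum(r['total'] for r in records)
    # Highest total wins; among equal totals, the alphabetically first name wins.
    best = min(records, key=lambda r: (-r['total'], r['bnf_name']))
    return post_total, best['total'], best['bnf_name']

# Made-up group with a tie between 'Drug A' and 'Drug B'.
toy_group = [
    {'post_code': 'AA1 1AA', 'bnf_name': 'Drug B', 'total': 5},
    {'post_code': 'AA1 1AA', 'bnf_name': 'Drug A', 'total': 5},
    {'post_code': 'AA1 1AA', 'bnf_name': 'Drug C', 'total': 2},
]
print(summarize_post_code(toy_group))  # (12, 5, 'Drug A')
```

Sorting on the pair `(-total, bnf_name)` handles the tie-breaking rule from the note above in a single key.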

In [150]:
# Largest total first; ties go to the alphabetically first 'bnf_name'.
max_item_by_post = [min(group, key=lambda d: (-d['total'], d['bnf_name']))
                    for group in total_items_by_post.values()]
print(max_item_by_post[1])

{'post_code': 'BB2 1AX', 'bnf_name': 'Omeprazole_Cap E/C 20mg', 'total': 1030}


Now, we are ready to:
1. calculate the ratio (the amount dispensed as proportion of total)
1. [sort](https://docs.python.org/3/howto/sorting.html) alphabetically by the post code
1. format the answer as a list of tuples
1. take only the first 100 tuples
1. submit to the grader
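
The first four steps can be sketched on made-up per-post-code summaries. The tuple layout of `toy_summaries` (post code, top `bnf_name`, top total, overall total) is an assumption made for this illustration, not a structure defined elsewhere in the notebook.

```python
# Hypothetical summaries: (post code, top bnf_name, top total, overall total).
toy_summaries = [
    ('MM5 5MM', 'Drug A', 1, 4),
    ('AA1 1AA', 'Drug A', 5, 20),
]

# Compute the ratio, sort by post code (the first tuple element), keep 100.
toy_items_by_region = sorted(
    (post_code, bnf_name, top_total / post_total)
    for post_code, bnf_name, top_total, post_total in toy_summaries
)[:100]
print(toy_items_by_region)
# [('AA1 1AA', 'Drug A', 0.25), ('MM5 5MM', 'Drug A', 0.25)]
```

The denominator is the total over every item in the post code, so each ratio falls between 0 and 1.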

In [48]:
items_by_region = []
for item in max_item_by_post:
    post_code = item['post_code']
    # Denominator: every item prescribed in this post code, summed over all
    # of its records (the first record's 'items' alone is not the total).
    post_total = sum(d['items'] for d in items_by_post[(post_code,)])
    proportion = item['total'] / post_total
    items_by_region.append((post_code, item['bnf_name'], proportion))

# Sort alphabetically by post code and keep the first 100 results.
items_by_region.sort()
items_by_region = items_by_region[:100]
items_by_region

In [50]:
grader.score.pw__items_by_region(items_by_region)


*Copyright &copy; 2021 WorldQuant University. This content is licensed solely for personal use. Redistribution or publication of this material is strictly prohibited.*