# Workbook 7.0 - Pandas Dataframes

Pandas is a package for working with tabular data (tables of rows and columns) in Python. We import it along with NumPy and Python's built-in random module:

In [201]:
import pandas as pd
import numpy as np
import random as rn

The basic object in Pandas is the **Dataframe** (basically a table).

There are various ways to create a dataframe, but the simplest is to pass a dictionary whose keys are the column names and whose values are the column entries:

In [202]:
tutors          = ['Dennett','Kandt','Arcaute','Wise']
modules         = ['BENVGSA3','BENVGSC1','BENVGSC5','BENVGACH']
student_numbers = [rn.randint(20,90) for i in range(4)]
is_optional     = [False,False,False,True]
terms           = [1,1,2,1]

casa_modules = pd.DataFrame({'tutor' : tutors,
                             'module' : modules,
                             'num_students' : student_numbers,
                             'is_optional' : is_optional,
                             'term' : terms})

casa_modules

,is_optional,module,num_students,term,tutor
0,False,BENVGSA3,89,1,Dennett
1,False,BENVGSC1,59,1,Kandt
2,False,BENVGSC5,32,2,Arcaute
3,True,BENVGACH,86,1,Wise
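Another common way to build a dataframe, sketched here with a subset of the module data above, is to pass a list of dictionaries, one per row. Passing `columns=` also pins the column order explicitly. The alphabetical column order in the output above is what older pandas versions produced from a dictionary; more recent versions keep the dictionary's insertion order, so the exact order may depend on your version.

```python
import pandas as pd

# Each dictionary is one row; its keys become the column names
rows = [
    {'tutor': 'Dennett', 'module': 'BENVGSA3', 'term': 1},
    {'tutor': 'Kandt',   'module': 'BENVGSC1', 'term': 1},
    {'tutor': 'Arcaute', 'module': 'BENVGSC5', 'term': 2},
    {'tutor': 'Wise',    'module': 'BENVGACH', 'term': 1},
]

# columns= fixes the column order regardless of how the rows were written
modules_by_row = pd.DataFrame(rows, columns=['tutor', 'module', 'term'])
print(modules_by_row)
```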


Most of the time, though, we will want to import data from elsewhere. For example, let's read in the coursework data ('coursework_1_data_2017.csv'):

In [203]:
df = pd.read_csv('coursework_1_data_2017.csv')

Once you have imported the data, you will want to get an idea of what it looks like. The .head() method shows the first five rows:

In [204]:
df.head()

,local_authority_area,2008_KSI,2009_KSI,2010_KSI,2008_pop,2009_pop,2010_pop,Total_Budget,Cyclist_Safety,Child_Safety,Motorcycle_Safety,Drink_Drive_Campaigns,Promote_Cycling,Promote_Car_Sharing,local_authority_type
0,Barking and Dagenham,63,64,56,171500,175600,179500,139000,21000,22000,18000,41000,18000,19000,london_borough
1,Barnet,136,128,127,338100,343100,347200,220000,50000,17000,23000,74000,30000,26000,london_borough
2,Barnsley,112,116,100,225200,226300,227000,160000,41000,45000,5000,26000,29000,14000,metropolitan_borough
3,Bath and North East Somerset,34,20,20,177400,177700,177200,161000,33000,42000,19000,50000,10000,7000,unitary_authority
4,Bedford,80,86,65,157100,158000,158200,130000,37000,52000,17000,8000,3000,13000,unitary_authority


You will notice a column of numbers on the left-hand side that does not appear in the file. These are the row labels (the **index** of the dataframe), which pandas numbers 0, 1, 2, ... by default when no index is specified.

Since our rows represent local authority areas, we may prefer to use those as the row labels instead:

In [205]:
df = df.set_index('local_authority_area')
df.head()

local_authority_area,2008_KSI,2009_KSI,2010_KSI,2008_pop,2009_pop,2010_pop,Total_Budget,Cyclist_Safety,Child_Safety,Motorcycle_Safety,Drink_Drive_Campaigns,Promote_Cycling,Promote_Car_Sharing,local_authority_type
Barking and Dagenham,63,64,56,171500,175600,179500,139000,21000,22000,18000,41000,18000,19000,london_borough
Barnet,136,128,127,338100,343100,347200,220000,50000,17000,23000,74000,30000,26000,london_borough
Barnsley,112,116,100,225200,226300,227000,160000,41000,45000,5000,26000,29000,14000,metropolitan_borough
Bath and North East Somerset,34,20,20,177400,177700,177200,161000,33000,42000,19000,50000,10000,7000,unitary_authority
Bedford,80,86,65,157100,158000,158200,130000,37000,52000,17000,8000,3000,13000,unitary_authority
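Note that .set_index() returns a new dataframe rather than changing the original, which is why the result is assigned back to df above. The sketch below uses a three-row excerpt of the data (values copied from the output above) to show this, and how .reset_index() turns the index back into an ordinary column. The index can also be set at import time with `pd.read_csv(..., index_col='local_authority_area')`.

```python
import pandas as pd

# A small excerpt of the coursework data, typed in by hand
df_small = pd.DataFrame({
    'local_authority_area': ['Barking and Dagenham', 'Barnet', 'Barnsley'],
    '2008_KSI': [63, 136, 112],
    'local_authority_type': ['london_borough', 'london_borough', 'metropolitan_borough'],
})

# set_index returns a new dataframe; df_small itself is left unchanged
indexed = df_small.set_index('local_authority_area')

# reset_index moves the index back into a regular column
restored = indexed.reset_index()

print(indexed.index.name)        # local_authority_area
print(list(restored.columns))
```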


To get just the column headers, just the index, or just the body data (as a NumPy array):

In [206]:
df.columns

Index(['2008_KSI', '2009_KSI', '2010_KSI', '2008_pop', '2009_pop', '2010_pop',
       'Total_Budget', 'Cyclist_Safety', 'Child_Safety', 'Motorcycle_Safety',
       'Drink_Drive_Campaigns', 'Promote_Cycling', 'Promote_Car_Sharing',
       'local_authority_type'],
      dtype='object')

In [207]:
df.index

Index(['Barking and Dagenham', 'Barnet', 'Barnsley',
       'Bath and North East Somerset', 'Bedford', 'Bexley', 'Birmingham',
       'Blackburn with Darwen', 'Blackpool', 'Bolton',
       ...
       'West Sussex', 'Westminster', 'Wigan', 'Wiltshire',
       'Windsor and Maidenhead', 'Wirral', 'Wokingham', 'Wolverhampton',
       'Worcestershire', 'York'],
      dtype='object', name='local_authority_area', length=151)

In [208]:
df.values

array([[63, 64, 56, ..., 18000, 19000, 'london_borough'],
       [136, 128, 127, ..., 30000, 26000, 'london_borough'],
       [112, 116, 100, ..., 29000, 14000, 'metropolitan_borough'],
       ..., 
       [79, 80, 70, ..., 50000, 28000, 'metropolitan_borough'],
       [249, 250, 227, ..., 67000, 30000, 'non_metropolitan_county'],
       [95, 92, 93, ..., 37000, 19000, 'unitary_authority']], dtype=object)
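The array above has dtype=object because a single NumPy array can hold only one type, and the dataframe mixes numbers with text (local_authority_type). A minimal sketch on a hand-typed excerpt of the data, using .to_numpy() (the method recent pandas versions recommend over .values) and .select_dtypes() to keep only numeric columns:

```python
import pandas as pd

# Excerpt of the coursework data, values copied from the output above
df_small = pd.DataFrame({
    '2008_KSI': [63, 136, 112],
    '2008_pop': [171500, 338100, 225200],
    'local_authority_type': ['london_borough', 'london_borough', 'metropolitan_borough'],
}, index=['Barking and Dagenham', 'Barnet', 'Barnsley'])

# Mixed column types force the whole array to fall back to dtype=object
mixed = df_small.to_numpy()
print(mixed.dtype)  # object

# Keeping only the numeric columns gives a proper integer array
numeric = df_small.select_dtypes(include='number').to_numpy()
print(numeric.dtype)
```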

Slicing and indexing work a little differently in pandas. You can choose between two methods, depending on whether you want to select data by row and column labels (.loc[ ]) or by integer position (.iloc[ ]):

In [209]:
df.loc[['Barnsley','City of London'],['2008_KSI']] # Returns a dataframe

local_authority_area,2008_KSI
Barnsley,112
City of London,51


In [210]:
df.iloc[0:3,5:8] # Returns a dataframe

local_authority_area,2010_pop,Total_Budget,Cyclist_Safety
Barking and Dagenham,179500,139000,21000
Barnet,347200,220000,50000
Barnsley,227000,160000,41000


In [211]:
df.loc['Barnsley','2008_KSI'] # Returns an entry

112

In [212]:
df.iloc[17,13] # Returns an entry

'non_metropolitan_county'
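One difference that often catches people out: a label slice with .loc includes both endpoints, whereas a position slice with .iloc excludes the end position, like an ordinary Python slice. A small sketch on a hand-typed four-row excerpt of the data:

```python
import pandas as pd

# Excerpt of the coursework data, values copied from the output above
df_small = pd.DataFrame(
    {'2008_KSI': [63, 136, 112, 34],
     '2009_KSI': [64, 128, 116, 20]},
    index=['Barking and Dagenham', 'Barnet', 'Barnsley', 'Bath and North East Somerset'],
)

# Label-based slice: 'Barnsley' IS included
by_label = df_small.loc['Barnet':'Barnsley', '2008_KSI']

# Position-based slice: position 2 is NOT included
by_position = df_small.iloc[1:2, 0]

print(len(by_label), len(by_position))  # 2 1
```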

The .describe() method gives summary statistics for each numeric column of the dataframe (non-numeric columns such as local_authority_type are left out, and the statistics themselves are returned as a dataframe):

In [213]:
summary_stats = df.describe()
summary_stats

,2008_KSI,2009_KSI,2010_KSI,2008_pop,2009_pop,2010_pop,Total_Budget,Cyclist_Safety,Child_Safety,Motorcycle_Safety,Drink_Drive_Campaigns,Promote_Cycling,Promote_Car_Sharing
count,151.0,151.0,151.0,151.0,151.0,151.0,151.0,151.0,151.0,151.0,151.0,151.0,151.0
mean,161.350993,157.02649,145.403974,340811.9,343095.4,345422.5,272086.1,59761.589404,68880.794702,25278.145695,63033.112583,33185.430464,21947.019868
std,144.221367,143.273489,135.096666,259746.2,261310.4,262871.9,210015.1,53941.784498,62069.093453,22798.584442,55015.563521,29908.171453,19818.102189
min,21.0,14.0,10.0,11300.0,11500.0,11700.0,50000.0,4000.0,4000.0,1000.0,3000.0,2000.0,1000.0
25%,71.5,65.0,61.5,189000.0,190950.0,192000.0,140500.0,26000.0,32000.0,12500.0,32000.0,14000.0,9000.0
50%,102.0,99.0,94.0,252900.0,253900.0,254800.0,200000.0,42000.0,48000.0,18000.0,45000.0,25000.0,16000.0
75%,195.5,191.5,164.5,382900.0,383350.0,383650.0,304500.0,70000.0,79000.0,27000.0,70500.0,40500.0,27500.0
max,801.0,773.0,752.0,1401700.0,1411100.0,1427500.0,1170000.0,297000.0,333000.0,130000.0,340000.0,203000.0,99000.0
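Because the summary is itself a dataframe, individual statistics can be picked out with .loc, and `include='all'` also summarises the text columns (count, unique, top, freq). A minimal sketch on a hand-typed three-row excerpt of the data:

```python
import pandas as pd

# Excerpt of the coursework data, values copied from earlier output
df_small = pd.DataFrame({
    '2008_KSI': [63, 136, 112],
    'local_authority_type': ['london_borough', 'london_borough', 'metropolitan_borough'],
})

# By default only numeric columns are summarised
summary = df_small.describe()
print('local_authority_type' in summary.columns)  # False

# Pick a single statistic out of the summary dataframe
print(summary.loc['mean', '2008_KSI'])

# include='all' adds count/unique/top/freq for non-numeric columns
print(df_small.describe(include='all').loc['top', 'local_authority_type'])  # london_borough
```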


One area was missing from the coursework data: road deaths and serious injuries for Heathrow Airport are recorded separately from those for the London borough of Hillingdon, where Heathrow is located.

Assigning to a new row label with .loc adds an extra row to our data:

In [214]:
extra_row = ['Heathrow Airport',5,2,3,0,0,0,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,'other_london']
df.loc[extra_row[0]] = extra_row[1:]
df.tail()

local_authority_area,2008_KSI,2009_KSI,2010_KSI,2008_pop,2009_pop,2010_pop,Total_Budget,Cyclist_Safety,Child_Safety,Motorcycle_Safety,Drink_Drive_Campaigns,Promote_Cycling,Promote_Car_Sharing,local_authority_type
Wokingham,46,44,36,159700,161900,164200,109000.0,20000.0,41000.0,13000.0,19000.0,10000.0,6000.0,unitary_authority
Wolverhampton,79,80,70,238100,238500,238500,201000.0,18000.0,27000.0,23000.0,55000.0,50000.0,28000.0,metropolitan_borough
Worcestershire,249,250,227,555300,556500,559300,470000.0,128000.0,112000.0,57000.0,76000.0,67000.0,30000.0,non_metropolitan_county
York,95,92,93,194900,198800,203100,140000.0,5000.0,39000.0,8000.0,32000.0,37000.0,19000.0,unitary_authority
Heathrow Airport,5,2,3,0,0,0,,,,,,,,other_london
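Notice that the budget columns are now shown with decimals (109000.0 rather than 109000). NumPy integer columns cannot hold NaN, so adding a row with missing values upcasts those columns to float. A minimal sketch of this on a hand-typed excerpt of the data:

```python
import numpy as np
import pandas as pd

# Excerpt of the coursework data, values copied from the output above
df_small = pd.DataFrame(
    {'2008_KSI': [46, 95], 'Total_Budget': [109000, 140000]},
    index=['Wokingham', 'York'],
)

# Assigning to a new label with .loc appends a row
df_small.loc['Heathrow Airport'] = [5, np.nan]

# The NaN forces Total_Budget to become a float column
print(df_small['Total_Budget'].dtype)  # float64
```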


As we can see, Heathrow has been added at the bottom of the table, but perhaps we want to re-sort the index alphabetically:

In [215]:
df = df.sort_index(axis=0,ascending=True)
df.tail()

local_authority_area,2008_KSI,2009_KSI,2010_KSI,2008_pop,2009_pop,2010_pop,Total_Budget,Cyclist_Safety,Child_Safety,Motorcycle_Safety,Drink_Drive_Campaigns,Promote_Cycling,Promote_Car_Sharing,local_authority_type
Wirral,145,137,129,308500,308500,308200,209000.0,69000.0,47000.0,30000.0,41000.0,14000.0,8000.0,metropolitan_borough
Wokingham,46,44,36,159700,161900,164200,109000.0,20000.0,41000.0,13000.0,19000.0,10000.0,6000.0,unitary_authority
Wolverhampton,79,80,70,238100,238500,238500,201000.0,18000.0,27000.0,23000.0,55000.0,50000.0,28000.0,metropolitan_borough
Worcestershire,249,250,227,555300,556500,559300,470000.0,128000.0,112000.0,57000.0,76000.0,67000.0,30000.0,non_metropolitan_county
York,95,92,93,194900,198800,203100,140000.0,5000.0,39000.0,8000.0,32000.0,37000.0,19000.0,unitary_authority


Or perhaps we want to sort the rows by total budget, smallest first:

In [216]:
df = df.sort_values(by="Total_Budget",ascending=True)
df.head()

local_authority_area,2008_KSI,2009_KSI,2010_KSI,2008_pop,2009_pop,2010_pop,Total_Budget,Cyclist_Safety,Child_Safety,Motorcycle_Safety,Drink_Drive_Campaigns,Promote_Cycling,Promote_Car_Sharing,local_authority_type
City of London,51,52,48,11300,11500,11700,50000.0,12000.0,9000.0,6000.0,20000.0,2000.0,1000.0,other_london
Rutland,30,25,31,38100,38400,38400,51000.0,17000.0,5000.0,8000.0,14000.0,5000.0,2000.0,unitary_authority
Darlington,34,27,27,100100,100400,100500,79000.0,18000.0,33000.0,3000.0,12000.0,10000.0,3000.0,unitary_authority
Hartlepool,29,24,21,90800,90900,91300,89000.0,37000.0,14000.0,7000.0,17000.0,7000.0,7000.0,unitary_authority
Bracknell Forest,21,14,10,114000,115100,116200,90000.0,35000.0,25000.0,9000.0,4000.0,13000.0,4000.0,unitary_authority
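Heathrow, whose Total_Budget is NaN, does not appear at the top because .sort_values() puts missing values last by default (`na_position='last'`). Several sort keys can also be given at once, each with its own direction. A minimal sketch on a hand-typed excerpt of the data:

```python
import numpy as np
import pandas as pd

# Excerpt of the coursework data, values copied from earlier output
df_small = pd.DataFrame(
    {'Total_Budget': [90000, np.nan, 50000, 51000],
     'local_authority_type': ['unitary_authority', 'other_london',
                              'other_london', 'unitary_authority']},
    index=['Bracknell Forest', 'Heathrow Airport', 'City of London', 'Rutland'],
)

# Missing budgets go to the end by default
by_budget = df_small.sort_values(by='Total_Budget', ascending=True)
print(by_budget.index[-1])  # Heathrow Airport

# Sort by type (A to Z), then by budget (largest first) within each type
by_type_then_budget = df_small.sort_values(
    by=['local_authority_type', 'Total_Budget'], ascending=[True, False]
)
print(list(by_type_then_budget.index))
```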


The manipulations you might have wanted to do for the previous piece of coursework are straightforward in pandas. For example, suppose you want to add a column to the dataframe for 2008_KSI per hundred thousand people:

In [217]:
df['2008_KSI_pht'] = 100000 * df["2008_KSI"] / df["2008_pop"]
df.head()

local_authority_area,2008_KSI,2009_KSI,2010_KSI,2008_pop,2009_pop,2010_pop,Total_Budget,Cyclist_Safety,Child_Safety,Motorcycle_Safety,Drink_Drive_Campaigns,Promote_Cycling,Promote_Car_Sharing,local_authority_type,2008_KSI_pht
City of London,51,52,48,11300,11500,11700,50000.0,12000.0,9000.0,6000.0,20000.0,2000.0,1000.0,other_london,451.327434
Rutland,30,25,31,38100,38400,38400,51000.0,17000.0,5000.0,8000.0,14000.0,5000.0,2000.0,unitary_authority,78.740157
Darlington,34,27,27,100100,100400,100500,79000.0,18000.0,33000.0,3000.0,12000.0,10000.0,3000.0,unitary_authority,33.966034
Hartlepool,29,24,21,90800,90900,91300,89000.0,37000.0,14000.0,7000.0,17000.0,7000.0,7000.0,unitary_authority,31.938326
Bracknell Forest,21,14,10,114000,115100,116200,90000.0,35000.0,25000.0,9000.0,4000.0,13000.0,4000.0,unitary_authority,18.421053
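The arithmetic on columns is element-wise: each row's KSI is divided by the same row's population, with no loop needed. If you would rather not modify the dataframe in place, .assign() returns a new dataframe with the extra column. A sketch on a two-row excerpt (values copied from the output above):

```python
import pandas as pd

# Excerpt of the coursework data
df_small = pd.DataFrame(
    {'2008_KSI': [51, 30], '2008_pop': [11300, 38100]},
    index=['City of London', 'Rutland'],
)

# Element-wise: row by row, KSI / population, scaled to per 100,000
rate = 100000 * df_small['2008_KSI'] / df_small['2008_pop']

# assign() returns a new dataframe; ** is used because the column name starts with a digit
with_rate = df_small.assign(**{'2008_KSI_pht': rate})
print(with_rate.round(2))
```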


One of the most useful things to do with a dataframe is to search it for rows that fulfil particular conditions. This is usually done by creating a Boolean series (one True/False value per row) and using it to index the dataframe; only the rows marked True are kept.

For example, to select the rows relating to areas in London:

In [218]:
bool_series = df['local_authority_type'].isin(['london_borough','other_london'])
bool_series

local_authority_area
City of London              True
Rutland                    False
Darlington                 False
Hartlepool                 False
Bracknell Forest           False
Blackburn with Darwen      False
Blackpool                  False
Halton                     False
Poole                      False
Middlesbrough              False
Wokingham                  False
North East Lincolnshire    False
Reading                    False
Slough                     False
Telford and Wrekin         False
Windsor and Maidenhead     False
Southend on Sea            False
Redcar and Cleveland       False
Thurrock                   False
Herefordshire              False
Knowsley                   False
Isle of Wight              False
Hammersmith and Fulham      True
Torbay                     False
Kensington and Chelsea      True
Bournemouth                False
Richmond-upon-Thames        True
Peterborough               False
Bedford                    False
South Tyneside        

In [219]:
df3 = df[bool_series]
df3.head()

local_authority_area,2008_KSI,2009_KSI,2010_KSI,2008_pop,2009_pop,2010_pop,Total_Budget,Cyclist_Safety,Child_Safety,Motorcycle_Safety,Drink_Drive_Campaigns,Promote_Cycling,Promote_Car_Sharing,local_authority_type,2008_KSI_pht
City of London,51,52,48,11300,11500,11700,50000.0,12000.0,9000.0,6000.0,20000.0,2000.0,1000.0,other_london,451.327434
Hammersmith and Fulham,94,89,89,168600,169700,170300,121000.0,28000.0,26000.0,13000.0,36000.0,14000.0,4000.0,london_borough,55.753262
Kensington and Chelsea,113,102,99,171100,169900,167900,130000.0,33000.0,42000.0,10000.0,20000.0,19000.0,6000.0,london_borough,66.04325
Richmond-upon-Thames,64,64,62,187200,189000,190500,130000.0,24000.0,39000.0,16000.0,29000.0,12000.0,10000.0,london_borough,34.188034
Kingston-upon-Thames,65,65,59,164600,166700,168900,139000.0,32000.0,39000.0,13000.0,21000.0,28000.0,6000.0,london_borough,39.489672
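The same Boolean series can be inverted with the ~ operator, which flips every True to False and vice versa, to select everything outside London. A minimal sketch on a hand-typed excerpt of the data:

```python
import pandas as pd

# Excerpt of the coursework data (type column only)
df_small = pd.DataFrame(
    {'local_authority_type': ['other_london', 'unitary_authority',
                              'london_borough', 'metropolitan_borough']},
    index=['City of London', 'Rutland', 'Barnet', 'Barnsley'],
)

is_london = df_small['local_authority_type'].isin(['london_borough', 'other_london'])

london = df_small[is_london]
outside_london = df_small[~is_london]  # ~ negates the mask

print(list(outside_london.index))  # ['Rutland', 'Barnsley']
```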


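What about all the unitary authorities with more than 70 KSI in 2010? Conditions can be combined with & (and) or | (or), with each condition wrapped in its own parentheses: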

In [220]:
bool_series = (df['2010_KSI'] > 70) & (df['local_authority_type'] == 'unitary_authority')
df4 = df[bool_series]
df4.head()

local_authority_area,2008_KSI,2009_KSI,2010_KSI,2008_pop,2009_pop,2010_pop,Total_Budget,Cyclist_Safety,Child_Safety,Motorcycle_Safety,Drink_Drive_Campaigns,Promote_Cycling,Promote_Car_Sharing,local_authority_type,2008_KSI_pht
North East Lincolnshire,102,99,97,157200,157100,157300,110000.0,25000.0,14000.0,7000.0,32000.0,25000.0,7000.0,unitary_authority,64.885496
Herefordshire,93,91,85,179100,179100,179400,120000.0,4000.0,51000.0,13000.0,42000.0,6000.0,4000.0,unitary_authority,51.926298
Isle of Wight,98,89,91,140200,140200,140900,120000.0,24000.0,33000.0,9000.0,24000.0,22000.0,8000.0,unitary_authority,69.900143
Peterborough,101,98,91,169800,171200,173200,130000.0,18000.0,53000.0,11000.0,22000.0,19000.0,7000.0,unitary_authority,59.481743
York,95,92,93,194900,198800,203100,140000.0,5000.0,39000.0,8000.0,32000.0,37000.0,19000.0,unitary_authority,48.742945
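The parentheses matter because & and | bind more tightly than > and ==, so without them Python would try to evaluate `70 & df[...]` first. The sketch below, on a hand-typed excerpt of the data, combines conditions with | and also shows .query(), which takes the condition as a string; plain column names such as local_authority_type can be used in it directly.

```python
import pandas as pd

# Excerpt of the coursework data, values copied from earlier output
df_small = pd.DataFrame(
    {'2010_KSI': [97, 31, 93, 227],
     'local_authority_type': ['unitary_authority', 'unitary_authority',
                              'unitary_authority', 'non_metropolitan_county']},
    index=['North East Lincolnshire', 'Rutland', 'York', 'Worcestershire'],
)

# Either condition may hold; each comparison is bracketed
big_or_county = df_small[(df_small['2010_KSI'] > 90) |
                         (df_small['local_authority_type'] == 'non_metropolitan_county')]

# The same kind of filter written as a query string
unitary = df_small.query("local_authority_type == 'unitary_authority'")

print(list(big_or_county.index))
print(len(unitary))  # 3
```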


The data for Heathrow is also incomplete: no information is available on how much road safety budget was spent there (probably none). There are two simple ways we could deal with this:

By dropping rows with missing data (NaN); .dropna() returns a new dataframe, and because the result is not assigned here, df itself still contains Heathrow...

In [221]:
df.dropna(how='any')

local_authority_area,2008_KSI,2009_KSI,2010_KSI,2008_pop,2009_pop,2010_pop,Total_Budget,Cyclist_Safety,Child_Safety,Motorcycle_Safety,Drink_Drive_Campaigns,Promote_Cycling,Promote_Car_Sharing,local_authority_type,2008_KSI_pht
City of London,51,52,48,11300,11500,11700,50000.0,12000.0,9000.0,6000.0,20000.0,2000.0,1000.0,other_london,451.327434
Rutland,30,25,31,38100,38400,38400,51000.0,17000.0,5000.0,8000.0,14000.0,5000.0,2000.0,unitary_authority,78.740157
Darlington,34,27,27,100100,100400,100500,79000.0,18000.0,33000.0,3000.0,12000.0,10000.0,3000.0,unitary_authority,33.966034
Hartlepool,29,24,21,90800,90900,91300,89000.0,37000.0,14000.0,7000.0,17000.0,7000.0,7000.0,unitary_authority,31.938326
Bracknell Forest,21,14,10,114000,115100,116200,90000.0,35000.0,25000.0,9000.0,4000.0,13000.0,4000.0,unitary_authority,18.421053
Blackburn with Darwen,66,66,63,139400,139900,140100,91000.0,15000.0,16000.0,13000.0,21000.0,18000.0,8000.0,unitary_authority,47.345768
Blackpool,62,57,52,140600,140000,138800,99000.0,24000.0,19000.0,9000.0,28000.0,10000.0,9000.0,unitary_authority,44.096728
Halton,59,60,55,118500,118700,119000,99000.0,31000.0,20000.0,7000.0,21000.0,10000.0,10000.0,unitary_authority,49.789030
Poole,63,64,55,140700,141200,141800,100000.0,16000.0,20000.0,15000.0,38000.0,6000.0,5000.0,unitary_authority,44.776119
Middlesbrough,47,44,40,140100,140500,141500,101000.0,19000.0,33000.0,10000.0,17000.0,13000.0,9000.0,unitary_authority,33.547466
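A minimal sketch of .dropna() on a hand-typed two-row excerpt, confirming that the original dataframe is unchanged and showing the `subset=` parameter, which only looks for NaN in the listed columns:

```python
import numpy as np
import pandas as pd

# Excerpt of the data: York is complete, Heathrow has no budget figures
df_small = pd.DataFrame(
    {'2008_KSI': [95, 5],
     'Total_Budget': [140000, np.nan],
     'Cyclist_Safety': [5000, np.nan]},
    index=['York', 'Heathrow Airport'],
)

complete = df_small.dropna(how='any')           # drop rows with any NaN
print(len(df_small), len(complete))             # 2 1

# Only treat a row as incomplete if 2008_KSI is missing
ksi_known = df_small.dropna(subset=['2008_KSI'])
print(len(ksi_known))                           # 2
```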


... or by filling the NaN values with zero (no money spent). This time we assign the result back to df:

In [222]:
df = df.fillna(value=0)
df.loc['Heathrow Airport']

2008_KSI                            5
2009_KSI                            2
2010_KSI                            3
2008_pop                            0
2009_pop                            0
2010_pop                            0
Total_Budget                        0
Cyclist_Safety                      0
Child_Safety                        0
Motorcycle_Safety                   0
Drink_Drive_Campaigns               0
Promote_Cycling                     0
Promote_Car_Sharing                 0
local_authority_type     other_london
2008_KSI_pht                      inf
Name: Heathrow Airport, dtype: object
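Heathrow's 2008_KSI_pht is inf (infinity) rather than 0, because its population is recorded as 0 and dividing by zero in pandas gives inf instead of NaN, so .fillna() leaves it alone. A minimal sketch, on a hand-typed excerpt, of converting infinities to NaN first if they should count as missing:

```python
import numpy as np
import pandas as pd

# Excerpt of the data: Heathrow has a recorded population of 0
df_small = pd.DataFrame(
    {'2008_KSI': [95, 5], '2008_pop': [194900, 0]},
    index=['York', 'Heathrow Airport'],
)

# 5 / 0 produces inf, not an error and not NaN
df_small['2008_KSI_pht'] = 100000 * df_small['2008_KSI'] / df_small['2008_pop']

# Replace +/- infinity with NaN so that dropna()/fillna() will see it
df_small['2008_KSI_pht'] = df_small['2008_KSI_pht'].replace([np.inf, -np.inf], np.nan)
print(df_small['2008_KSI_pht'].isna().tolist())  # [False, True]
```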

Finally, let's remove the column we added and save the file:

In [223]:
df = df.drop(['2008_KSI_pht'],axis=1)
df.head()

local_authority_area,2008_KSI,2009_KSI,2010_KSI,2008_pop,2009_pop,2010_pop,Total_Budget,Cyclist_Safety,Child_Safety,Motorcycle_Safety,Drink_Drive_Campaigns,Promote_Cycling,Promote_Car_Sharing,local_authority_type
City of London,51,52,48,11300,11500,11700,50000.0,12000.0,9000.0,6000.0,20000.0,2000.0,1000.0,other_london
Rutland,30,25,31,38100,38400,38400,51000.0,17000.0,5000.0,8000.0,14000.0,5000.0,2000.0,unitary_authority
Darlington,34,27,27,100100,100400,100500,79000.0,18000.0,33000.0,3000.0,12000.0,10000.0,3000.0,unitary_authority
Hartlepool,29,24,21,90800,90900,91300,89000.0,37000.0,14000.0,7000.0,17000.0,7000.0,7000.0,unitary_authority
Bracknell Forest,21,14,10,114000,115100,116200,90000.0,35000.0,25000.0,9000.0,4000.0,13000.0,4000.0,unitary_authority
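An equivalent and arguably more readable way to drop columns is the `columns=` keyword, which does the same as passing the labels with `axis=1`. A short sketch on a hand-typed excerpt:

```python
import pandas as pd

# Excerpt of the data including the computed column
df_small = pd.DataFrame(
    {'2008_KSI': [51, 30], '2008_KSI_pht': [451.327434, 78.740157]},
    index=['City of London', 'Rutland'],
)

# columns= is equivalent to df_small.drop(['2008_KSI_pht'], axis=1)
trimmed = df_small.drop(columns=['2008_KSI_pht'])
print(list(trimmed.columns))  # ['2008_KSI']
```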


In [224]:
df.to_csv('new_road_safety_data.csv')
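By default .to_csv() writes the index as the first column, so when reading the file back you can restore it with `index_col=`. The sketch below checks this round trip without touching the disk: called with no path, .to_csv() returns the CSV text as a string, which `io.StringIO` lets .read_csv() treat like a file. The two-row dataframe is a hand-typed excerpt.

```python
import io

import pandas as pd

# Excerpt of the data with the named index used in this workbook
df_small = pd.DataFrame(
    {'2008_KSI': [51, 30]},
    index=pd.Index(['City of London', 'Rutland'], name='local_authority_area'),
)

# With no path, to_csv returns the CSV text instead of writing a file
csv_text = df_small.to_csv()

# index_col makes local_authority_area the row labels again
reloaded = pd.read_csv(io.StringIO(csv_text), index_col='local_authority_area')
print(reloaded)
```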

For more information, see the "10 Minutes to pandas" tutorial:

https://pandas.pydata.org/pandas-docs/stable/10min.html