# Pandas Tutorial II

### Botswana 2014 General Election Results

##### What is pandas? 
"pandas is an open source, BSD-licensed library providing high-performance, 
easy-to-use data structures and data analysis tools for the Python programming language." - https://pandas.pydata.org/pandas-docs/stable/overview.html

##### And our tutorial? 

This is an introductory tutorial. We will be using data from Botswana's 2014 General Elections. The data is available in CSV files that we will load into pandas for analysis.

Have fun!

### NOTE

The objective of this tutorial is to learn pandas as we answer questions. We have already seen a few pandas features and methods so far. We will incrementally build on the methods we have seen to answer more complex questions.

Additionally, we will manipulate dataframes to be able to answer some questions.

**Task:** Import the pandas library as pd.

In [1]:
#Import pandas
import pandas as pd

### Creating dataframes from CSV files.

**Task:** Create a **constituency_stats** dataframe from the **constituency_stats.csv** file 

In [2]:
constituency_stats = pd.read_csv('data/constituency_stats.csv')

In [3]:
constituency_stats.head()

Unnamed: 0,constituency_name,registered_voters,cast_votes,rejected_votes,valid_votes
0,Chobe,8942,7354,74,7280
1,Maun East,16774,13607,151,13456
2,Maun West,18329,15100,137,14963
3,Ngami,18159,15055,175,14880
4,Okavango,15243,12726,174,12552


In [4]:
constituency_stats.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 57 entries, 0 to 56
Data columns (total 5 columns):
constituency_name    57 non-null object
registered_voters    57 non-null int64
cast_votes           57 non-null int64
rejected_votes       57 non-null int64
valid_votes          57 non-null int64
dtypes: int64(4), object(1)
memory usage: 2.3+ KB


**Task:** Create the **candidate_votes** dataframe from the **candidate_votes.csv** file

In [5]:
candidate_votes = pd.read_csv('data/candidate_votes.csv')

In [6]:
candidate_votes.head()

Unnamed: 0,constituency_name,candidate_name,party_name,party_votes
0,Chobe,Ronald Machana Shamukuni,BDP,4114
1,Chobe,Gibson M.R Nshimwe,BCP,3166
2,Maun East,Konstantinos Markus,BDP,6046
3,Maun East,Goretetse Kekgonegile,BCP,5304
4,Maun East,Osimilwe O. Fish,UDC,2062


In [7]:
candidate_votes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 192 entries, 0 to 191
Data columns (total 4 columns):
constituency_name    192 non-null object
candidate_name       192 non-null object
party_name           192 non-null object
party_votes          192 non-null int64
dtypes: int64(1), object(3)
memory usage: 6.1+ KB


### Slicing and Subsetting

Often we want to work with subsets of a dataframe. We can select these subsets by:

- location-based indexing
- label-based indexing
- conditional indexing

We will use these features to answer many questions in this tutorial.
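
As a minimal sketch of the three styles, the block below rebuilds the first five rows of **constituency_stats** by hand and selects from it with `iloc`, `loc`, and a Boolean mask. The inline `sample` dataframe is an illustration only and is not part of the tutorial's data files.

```python
import pandas as pd

# Hypothetical stand-in for constituency_stats: the first five rows shown above.
sample = pd.DataFrame({
    'constituency_name': ['Chobe', 'Maun East', 'Maun West', 'Ngami', 'Okavango'],
    'registered_voters': [8942, 16774, 18329, 18159, 15243],
    'cast_votes': [7354, 13607, 15100, 15055, 12726],
})

# Location-based indexing: row at position 1, column at position 0.
by_position = sample.iloc[1, 0]

# Label-based indexing: row labelled 1, column labelled 'cast_votes'.
by_label = sample.loc[1, 'cast_votes']

# Conditional indexing: keep the rows where the Boolean condition is True.
large = sample[sample['registered_voters'] > 18000]

print(by_position)                        # Maun East
print(by_label)                           # 13607
print(list(large['constituency_name']))   # ['Maun West', 'Ngami']
```

With the default integer index, position 1 and label 1 refer to the same row. The two stop coinciding once the index is changed, for example after **set_index()**.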

**Question 1:** What is the maximum number of 'registered_voters', 'cast_votes', 'rejected_votes', and 'valid_votes' across all constituencies?

**Approach:** 

1. Select the required columns from the dataframe.
2. Use the **max()** method along the appropriate axis.

In [9]:
#We can get the answers all at once.
constituency_stats[['registered_voters', 'cast_votes', 'rejected_votes', 'valid_votes']].max(axis=0)

registered_voters    21146
cast_votes           18626
rejected_votes         411
valid_votes          18499
dtype: int64

**OR** we can find the maximum of each column (a Series) separately.

In [10]:
constituency_stats['registered_voters'].max()

21146

In [11]:
constituency_stats['cast_votes'].max()

18626

**Question 2:** What are the corresponding constituencies for the maximums above?

**Approach:**
- Use label-based indexing to select the rows and columns of interest.
- Use the **loc** indexer together with **idxmax()**, which returns the index label of the maximum value.
- Select the 'constituency_name' column **ONLY.**

In [12]:
#Using label-based indexing: idxmax() returns the row label, which loc then looks up
constituency_stats.loc[constituency_stats['registered_voters'].idxmax(), 'constituency_name']

'Mochudi West'

In [13]:
constituency_stats.loc[constituency_stats['cast_votes'].idxmax(), 'constituency_name']

'Mochudi West'

In [14]:
constituency_stats.loc[constituency_stats['rejected_votes'].idxmax(), 'constituency_name']

'Nata-Gweta'

In [15]:
constituency_stats.loc[constituency_stats['valid_votes'].idxmax(), 'constituency_name']

'Mochudi West'
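
The four lookups above can also be done in one step. The sketch below assumes a three-row `sample` dataframe built by hand from rows shown in this tutorial; `numeric_cols`, `max_labels` and `max_names` are names introduced here for illustration.

```python
import pandas as pd

# Hypothetical subset of constituency_stats, using rows shown in this tutorial.
sample = pd.DataFrame({
    'constituency_name': ['Chobe', 'Nata-Gweta', 'Mochudi West'],
    'registered_voters': [8942, 11009, 21146],
    'cast_votes': [7354, 9720, 18626],
    'rejected_votes': [74, 411, 127],
    'valid_votes': [7280, 9309, 18499],
})

numeric_cols = ['registered_voters', 'cast_votes', 'rejected_votes', 'valid_votes']

# idxmax() on a dataframe returns, for each column, the index label of its maximum.
max_labels = sample[numeric_cols].idxmax()

# Look up the constituency name stored at each of those labels.
max_names = max_labels.map(sample['constituency_name'])
print(max_names)
```

The result is a Series indexed by column name, so `max_names['rejected_votes']` gives the constituency with the most rejected votes.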

**OR** As an alternative solution we can use conditional indexing, i.e. selecting rows that match a particular criterion.

In [16]:
constituency_stats.rejected_votes == 411

0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10     True
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19    False
20    False
21    False
22    False
23    False
24    False
25    False
26    False
27    False
28    False
29    False
30    False
31    False
32    False
33    False
34    False
35    False
36    False
37    False
38    False
39    False
40    False
41    False
42    False
43    False
44    False
45    False
46    False
47    False
48    False
49    False
50    False
51    False
52    False
53    False
54    False
55    False
56    False
Name: rejected_votes, dtype: bool

The idea with conditional indexing is to return the rows for which a condition of interest is **True**. Here the condition is True only at index label 10, so we will use this Boolean series to select that row ONLY.

In [17]:
constituency_stats.loc[constituency_stats.rejected_votes == 411, 'constituency_name']

10    Nata-Gweta
Name: constituency_name, dtype: object

In [18]:
constituency_stats[constituency_stats.valid_votes == 18499]['constituency_name']

30    Mochudi West
Name: constituency_name, dtype: object

In [19]:
constituency_stats[constituency_stats.registered_voters == 21146]['constituency_name']

30    Mochudi West
Name: constituency_name, dtype: object
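
The conditions above compare against maximum values typed in by hand. A small variation, sketched here on a hand-built `sample` dataframe (an assumption for illustration), computes the maximum inside the condition so the number never has to be copied.

```python
import pandas as pd

# Hypothetical subset of constituency_stats, using rows shown in this tutorial.
sample = pd.DataFrame({
    'constituency_name': ['Chobe', 'Nata-Gweta', 'Mochudi West'],
    'rejected_votes': [74, 411, 127],
})

# Build the Boolean mask from the computed maximum rather than a typed-in number.
is_max = sample['rejected_votes'] == sample['rejected_votes'].max()
top = sample.loc[is_max, 'constituency_name']
print(top)
```

Unlike **idxmax()**, which returns a single label, this form returns every row that ties for the maximum.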

### Applying dataframe methods to dataframe slices and subsets

**Question 3:** What are the totals for 'registered_voters', 'cast_votes', 'rejected_votes' and 'valid_votes' for all constituencies?

In [20]:
constituency_stats_totals = constituency_stats[['registered_voters',
                'cast_votes', 'rejected_votes', 'valid_votes']].sum()
constituency_stats_totals

registered_voters    824073
cast_votes           698409
rejected_votes         8167
valid_votes          690242
dtype: int64

In [21]:
constituency_stats_totals['cast_votes']

698409

**Question 4:** Which constituencies have fewer than 10000 registered voters? Return all columns of the rows that meet this condition.

In [22]:
constituency_stats[constituency_stats.registered_voters < 10000]

Unnamed: 0,constituency_name,registered_voters,cast_votes,rejected_votes,valid_votes
0,Chobe,8942,7354,74,7280
16,Selibe Phikwe East,9732,8361,54,8307
25,Serowe West,8500,6962,88,6874
55,Ghanzi North,9156,7772,88,7684


**OR**

In [23]:
constituency_stats.loc[constituency_stats.registered_voters < 10000, :]

Unnamed: 0,constituency_name,registered_voters,cast_votes,rejected_votes,valid_votes
0,Chobe,8942,7354,74,7280
16,Selibe Phikwe East,9732,8361,54,8307
25,Serowe West,8500,6962,88,6874
55,Ghanzi North,9156,7772,88,7684


**Question 5:** Which constituencies had more than 200 spoilt votes/rejected votes?

In [24]:
constituency_stats[constituency_stats.rejected_votes > 200]

Unnamed: 0,constituency_name,registered_voters,cast_votes,rejected_votes,valid_votes
10,Nata-Gweta,11009,9720,411,9309
11,Nkange,15078,12907,214,12693
15,Mmadinare,13108,10670,228,10442
20,Sefhare-Ramokgonami,15960,13705,339,13366
21,Mahalapye East,10993,9316,231,9085
23,Shoshong,12400,10689,226,10463
37,Ramotswa,20246,18087,281,17806
48,Mmathethe-Molapowabojang,19073,16392,290,16102


There is a useful dataframe method **describe()** that returns statistics of all numeric columns of a dataframe.

**Task:** Apply the **describe()** method on the dataframe.

In [25]:
constituency_stats.describe()

Unnamed: 0,registered_voters,cast_votes,rejected_votes,valid_votes
count,57.0,57.0,57.0,57.0
mean,14457.421053,12252.789474,143.280702,12109.508772
std,3176.2241,2786.323555,78.921922,2758.211294
min,8500.0,6962.0,21.0,6874.0
25%,12400.0,10670.0,86.0,10442.0
50%,14054.0,11875.0,147.0,11747.0
75%,16698.0,13730.0,189.0,13530.0
max,21146.0,18626.0,411.0,18499.0


In [26]:
#Setting the index of a dataframe
constituency_stats.set_index('constituency_name', inplace=True)

In [27]:
constituency_stats.loc[:, ['registered_voters', 'rejected_votes']].head()

constituency_name,registered_voters,rejected_votes
Chobe,8942,74
Maun East,16774,151
Maun West,18329,137
Ngami,18159,175
Okavango,15243,174


In [28]:
constituency_stats.index

Index(['Chobe', 'Maun East', 'Maun West', 'Ngami', 'Okavango', 'Tati East',
       'Tati West', 'Francistown East', 'Francistown South',
       'Francistown West', 'Nata-Gweta', 'Nkange', 'Shashe West', 'Tonota',
       'Bobonong', 'Mmadinare', 'Selibe Phikwe East', 'Selibe Phikwe West',
       'Lerala-Maunatlala', 'Palapye', 'Sefhare-Ramokgonami', 'Mahalapye East',
       'Mahalapye West', 'Shoshong', 'Serowe North', 'Serowe West',
       'Serowe South', 'Boteti East', 'Boteti West', 'Mochudi East',
       'Mochudi West', 'Gaborone Central', 'Gaborone North', 'Gaborone South',
       'Gaborone Bonnington North', 'Gaborone Bonnington South', 'Tlokweng',
       'Ramotswa', 'Mogoditshane', 'Gabane-Mmankgodi', 'Thamaga-Kumakwane',
       'Molepolole North', 'Molepolole South', 'Lentsweletau-Mmopane',
       'Letlhakeng-Lephephe', 'Takatokwane', 'Lobatse', 'Goodhope-Mabule',
       'Mmathethe-Molapowabojang', 'Kanye North', 'Kanye South',
       'Moshupa-Manyana', 'Jwaneng-Mabutsane', 'Kga

**Task:** Create a column named **rejected_pct** computing the rejected_votes as a % of cast_votes.

In [29]:
constituency_stats= constituency_stats.assign(rejected_pct = constituency_stats.rejected_votes/
                          constituency_stats.cast_votes * 100)
constituency_stats.head()

constituency_name,registered_voters,cast_votes,rejected_votes,valid_votes,rejected_pct
Chobe,8942,7354,74,7280,1.006255
Maun East,16774,13607,151,13456,1.109723
Maun West,18329,15100,137,14963,0.907285
Ngami,18159,15055,175,14880,1.162405
Okavango,15243,12726,174,12552,1.36728


**Question 5:** Which constituencies have the lowest percentage of rejected votes? Return 10 entries only.

**Approach**

- Use the **sort_values()** method to sort the dataframe.
- Use the **head()** method to return the top 10 entries.

In [30]:
constituency_stats.sort_values(by='rejected_pct', ascending=True).head(10)

constituency_name,registered_voters,cast_votes,rejected_votes,valid_votes,rejected_pct
Tlokweng,13980,11525,21,11504,0.182213
Gaborone Bonnington South,13811,11595,34,11561,0.29323
Francistown East,10236,8483,25,8458,0.294707
Gaborone Bonnington North,16698,14352,49,14303,0.341416
Gaborone North,15178,13049,45,13004,0.344854
Francistown South,12333,10097,36,10061,0.356542
Francistown West,12511,9919,46,9873,0.463756
Gaborone Central,14054,11609,55,11554,0.47377
Mogoditshane,15451,11875,62,11813,0.522105
Selibe Phikwe West,10196,8695,46,8649,0.52904
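
Sorting and then taking the head can also be written as a single call. The sketch below uses **nsmallest()** on a hand-built `sample` dataframe (an assumption for illustration, with values taken from rows shown in this tutorial).

```python
import pandas as pd

# Hypothetical subset built from rows shown above, indexed by constituency name.
sample = pd.DataFrame(
    {'cast_votes': [7354, 11525, 8483, 9720],
     'rejected_votes': [74, 21, 25, 411]},
    index=['Chobe', 'Tlokweng', 'Francistown East', 'Nata-Gweta'],
)
sample['rejected_pct'] = sample['rejected_votes'] / sample['cast_votes'] * 100

# nsmallest(n, column) returns the n rows with the smallest values in that column.
lowest_two = sample.nsmallest(2, 'rejected_pct')
print(lowest_two)
```

There is a matching **nlargest()** method for questions that ask for the highest values.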


**Question 6:** Which constituency has the highest voter turnout?

In [31]:
(constituency_stats.cast_votes/constituency_stats.registered_voters).idxmax()

'Kgalagadi North'

**Question 7:** Which constituency has the lowest voter turnout?

In [32]:
(constituency_stats.cast_votes/constituency_stats.registered_voters).idxmin()

'Mogoditshane'

**Question 8:** What is the average number of registered voters per constituency?

In [33]:
constituency_stats['registered_voters'].mean()

14457.421052631578

**Task:** Create a **turnout_pct** column holding the percentage turnout, i.e. **cast_votes** as a percentage of **registered_voters**, for each constituency.

In [34]:
constituency_stats = constituency_stats.assign(turnout_pct = constituency_stats.cast_votes/
                                               constituency_stats.registered_voters * 100)
constituency_stats.head()

constituency_name,registered_voters,cast_votes,rejected_votes,valid_votes,rejected_pct,turnout_pct
Chobe,8942,7354,74,7280,1.006255,82.241109
Maun East,16774,13607,151,13456,1.109723,81.11959
Maun West,18329,15100,137,14963,0.907285,82.383109
Ngami,18159,15055,175,14880,1.162405,82.906548
Okavango,15243,12726,174,12552,1.36728,83.487502


**Question 9:** Which constituencies had the highest voter turnout? Return all columns of the top 20 constituencies.

In [35]:
constituency_stats.sort_values(by='turnout_pct', ascending=False).head(20)

constituency_name,registered_voters,cast_votes,rejected_votes,valid_votes,rejected_pct,turnout_pct
Kgalagadi North,10102,9307,72,9235,0.773611,92.130271
Takatokwane,11746,10698,169,10529,1.579735,91.077814
Ramotswa,20246,18087,281,17806,1.553602,89.336165
Molepolole North,17022,15108,143,14965,0.946518,88.755728
Nata-Gweta,11009,9720,411,9309,4.228395,88.291398
Mochudi West,21146,18626,127,18499,0.681843,88.082853
Kgalagadi South,14280,12574,147,12427,1.169079,88.053221
Molepolole South,13556,11893,146,11747,1.227613,87.732369
Letlhakeng-Lephephe,12772,11104,190,10914,1.711095,86.940182
Thamaga-Kumakwane,16856,14617,200,14417,1.36827,86.716896


**Question 10:** What was the national voter turnout percentage?

In [36]:
constituency_stats.cast_votes.sum()/constituency_stats.registered_voters.sum() * 100

84.750865517982021
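
The national figure is computed from the column totals, not by averaging the **turnout_pct** column. A small sketch of why the two differ, using only two constituencies in a hand-built `sample` dataframe (an assumption for illustration):

```python
import pandas as pd

# Hypothetical two-row subset, with values from rows shown in this tutorial.
sample = pd.DataFrame(
    {'registered_voters': [8942, 21146],
     'cast_votes': [7354, 18626]},
    index=['Chobe', 'Mochudi West'],
)

# Overall turnout: total votes cast divided by total registered voters.
overall = sample['cast_votes'].sum() / sample['registered_voters'].sum() * 100

# Mean of per-constituency turnouts: every constituency counts equally,
# regardless of how many registered voters it has.
mean_of_rates = (sample['cast_votes'] / sample['registered_voters'] * 100).mean()

print(round(overall, 2), round(mean_of_rates, 2))  # 86.35 85.16
```

The overall figure sits closer to the turnout of the larger constituency because it weights each constituency by its number of registered voters.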

In [37]:
candidate_votes.head()

Unnamed: 0,constituency_name,candidate_name,party_name,party_votes
0,Chobe,Ronald Machana Shamukuni,BDP,4114
1,Chobe,Gibson M.R Nshimwe,BCP,3166
2,Maun East,Konstantinos Markus,BDP,6046
3,Maun East,Goretetse Kekgonegile,BCP,5304
4,Maun East,Osimilwe O. Fish,UDC,2062


We will use methods from the numpy library in the next few questions.

**Task:** Import the **numpy** library as np

In [38]:
import numpy as np

**Question 11:** How many votes did each party receive? Create a dataframe called votes_per_party with the results.

**Approach:**

- Pivot the dataframe on **party_name**
- Sum all votes for the same party.
- Use the **pivot_table()** method 

In [39]:
votes_per_party = candidate_votes.pivot_table(values='party_votes', index='party_name',aggfunc=np.sum)

In [40]:
votes_per_party.sort_values(by='party_votes', ascending=False, inplace=True)

In [41]:
votes_per_party

party_name,party_votes
BDP,320647
UDC,207116
BCP,141005
IND,21484
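
The same totals can be obtained with **groupby()**, which some readers find easier to follow than a pivot. The sketch below assumes a five-row `sample` dataframe copied from the head of **candidate_votes**.

```python
import pandas as pd

# Hypothetical stand-in for candidate_votes: the first five rows shown above.
sample = pd.DataFrame({
    'constituency_name': ['Chobe', 'Chobe', 'Maun East', 'Maun East', 'Maun East'],
    'party_name': ['BDP', 'BCP', 'BDP', 'BCP', 'UDC'],
    'party_votes': [4114, 3166, 6046, 5304, 2062],
})

# Group rows by party and add up the votes in each group.
totals = sample.groupby('party_name')['party_votes'].sum()
totals = totals.sort_values(ascending=False)
print(totals)
```

Note that this returns a Series, whereas **pivot_table()** returns a dataframe with a **party_votes** column.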


**Question 12:** What is the total number of opposition votes, i.e. votes for ['UDC', 'BCP', 'IND']?


In [42]:
votes_per_party[votes_per_party.index != 'BDP'].sum()

party_votes    369605
dtype: int64

In [43]:
#Set index
candidate_votes.set_index('constituency_name', inplace=True)
candidate_votes.head()

constituency_name,candidate_name,party_name,party_votes
Chobe,Ronald Machana Shamukuni,BDP,4114
Chobe,Gibson M.R Nshimwe,BCP,3166
Maun East,Konstantinos Markus,BDP,6046
Maun East,Goretetse Kekgonegile,BCP,5304
Maun East,Osimilwe O. Fish,UDC,2062


In [44]:
#Reset index
candidate_votes.reset_index(inplace=True)
candidate_votes.head()

Unnamed: 0,constituency_name,candidate_name,party_name,party_votes
0,Chobe,Ronald Machana Shamukuni,BDP,4114
1,Chobe,Gibson M.R Nshimwe,BCP,3166
2,Maun East,Konstantinos Markus,BDP,6046
3,Maun East,Goretetse Kekgonegile,BCP,5304
4,Maun East,Osimilwe O. Fish,UDC,2062


**Question 13:** Which candidates contested elections in Gaborone South?

In [45]:
candidate_votes[candidate_votes.constituency_name == 'Gaborone South']

Unnamed: 0,constituency_name,candidate_name,party_name,party_votes
112,Gaborone South,Kagiso Patrick Molatlhegi,BDP,3872
113,Gaborone South,Murray Moemedi Dipate,UDC,3629
114,Gaborone South,Akanyang Magama,BCP,2318
115,Gaborone South,Dumezweni M. Mthimkhulu,IND,1475


**Question 14:** Which candidates and parties contested elections in Mochudi West constituency?

In [46]:
candidate_votes.loc[candidate_votes.constituency_name == 'Mochudi West', ['candidate_name', 'party_name']]

Unnamed: 0,candidate_name,party_name
102,Gilbert Shimane Mangole,UDC
103,Unity Dow,BDP
104,Alfred Ramono Pilane,BCP


**Question 15:** In how many constituencies was BCP represented?

In [47]:
candidate_votes.loc[candidate_votes.party_name == 'BCP', 'constituency_name'].count()

54

**Question 16:** In how many constituencies was BDP represented?

In [48]:
candidate_votes.loc[candidate_votes.party_name == 'BDP', 'constituency_name'].count()

57

**Question 17:** In how many constituencies was UDC represented?

In [49]:
candidate_votes.loc[candidate_votes.party_name == 'UDC', 'constituency_name'].count()

52

**Question 18:** Which constituencies had no BCP candidate contesting elections?

**Set Approach:**

- Create a set of constituencies contested by BCP
- Create a set of ALL constituencies
- Find a set difference 

In [50]:
bcp_contested_consts = set(candidate_votes.loc[candidate_votes.party_name == 'BCP', 'constituency_name'])

In [51]:
all_consts = set(candidate_votes.loc[:,'constituency_name'])

In [52]:
all_consts - bcp_contested_consts

{'Ghanzi North', 'Molepolole North', 'Molepolole South'}
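
The set difference can also be expressed with pandas alone, using **isin()** and the `~` (not) operator. The sketch below runs on a five-row `sample` dataframe copied from the head of **candidate_votes**, and filters on UDC because that is the party missing from one of the sample constituencies.

```python
import pandas as pd

# Hypothetical stand-in for candidate_votes: the first five rows shown above.
sample = pd.DataFrame({
    'constituency_name': ['Chobe', 'Chobe', 'Maun East', 'Maun East', 'Maun East'],
    'party_name': ['BDP', 'BCP', 'BDP', 'BCP', 'UDC'],
})

# One entry per constituency.
names = pd.Series(sample['constituency_name'].unique())

# Constituencies that have a UDC row.
udc_names = sample.loc[sample['party_name'] == 'UDC', 'constituency_name']

# Keep the names that are NOT among the UDC constituencies.
no_udc = names[~names.isin(udc_names)]
print(list(no_udc))  # ['Chobe']
```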

**Question 19:** Which constituencies had no UDC candidate contesting elections?

**Set Approach**
- Same idea as in **Question 18**

In [53]:
udc_contested_consts = set(candidate_votes.loc[candidate_votes.party_name == 'UDC', 'constituency_name'])

In [54]:
all_consts - udc_contested_consts

{'Chobe',
 'Lerala-Maunatlala',
 'Mmadinare',
 'Nata-Gweta',
 'Sefhare-Ramokgonami'}

**Question 20:** How many independent candidates contested elections?

In [55]:
candidate_votes.loc[candidate_votes.party_name == 'IND', 'party_name'].count()

29

**Question 21:** Which candidate won elections in each constituency? Create a dataframe of winning candidates.

In [56]:
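#head(1) keeps the first row of each constituency. This identifies the winner only
#if the rows are ordered by votes, highest first, within each constituency.
winners = candidate_votes[['constituency_name','candidate_name', 'party_name', 
                           'party_votes']].groupby('constituency_name').head(1)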

In [57]:
winners.head()

Unnamed: 0,constituency_name,candidate_name,party_name,party_votes
0,Chobe,Ronald Machana Shamukuni,BDP,4114
2,Maun East,Konstantinos Markus,BDP,6046
6,Maun West,Tawana Moremi,UDC,7271
9,Ngami,Thato Kwerepe,BDP,7063
12,Okavango,Bagalatia Aron,BCP,6864
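
Taking the first row of each group depends on the order of the rows in the file. A sketch of an order-independent alternative is shown below, using **groupby()** with **idxmax()** on a hand-built `sample` dataframe whose rows are deliberately shuffled; `winner_labels` and `winners_sample` are names introduced here for illustration.

```python
import pandas as pd

# Hypothetical rows from candidate_votes, deliberately NOT sorted by votes.
sample = pd.DataFrame({
    'constituency_name': ['Chobe', 'Chobe', 'Maun West', 'Maun West', 'Maun West'],
    'candidate_name': ['Gibson M.R Nshimwe', 'Ronald Machana Shamukuni',
                       'George Lubinda', 'Tawana Moremi', 'Reaboke Mbulawa'],
    'party_name': ['BCP', 'BDP', 'BCP', 'UDC', 'BDP'],
    'party_votes': [3166, 4114, 2357, 7271, 5335],
})

# For each constituency, find the row label holding the largest vote count.
winner_labels = sample.groupby('constituency_name')['party_votes'].idxmax()

# Use those labels to pull out the full winning rows.
winners_sample = sample.loc[winner_labels]
print(winners_sample)
```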


**Question 22:** How many constituencies did each party win?

In [58]:
#Histogram Approach
winners.party_name.value_counts()

BDP    37
UDC    17
BCP     3
Name: party_name, dtype: int64

In [59]:
#Group-by approach
winners[['party_votes', 'party_name']].groupby(by='party_name').count()

party_name,party_votes
BCP,3
BDP,37
UDC,17


**BONUS EXERCISE** 

After the last general elections there was a media consensus that the opposition could have won the elections if they had contested as a united front.

**IS THAT A LEGITIMATE CLAIM?**

In [60]:
candidate_votes.head()

Unnamed: 0,constituency_name,candidate_name,party_name,party_votes
0,Chobe,Ronald Machana Shamukuni,BDP,4114
1,Chobe,Gibson M.R Nshimwe,BCP,3166
2,Maun East,Konstantinos Markus,BDP,6046
3,Maun East,Goretetse Kekgonegile,BCP,5304
4,Maun East,Osimilwe O. Fish,UDC,2062


**Task:** Select all columns and rows where the party name is BDP.

In [61]:
bdp_votes = candidate_votes[candidate_votes.party_name == 'BDP']
bdp_votes.head()

Unnamed: 0,constituency_name,candidate_name,party_name,party_votes
0,Chobe,Ronald Machana Shamukuni,BDP,4114
2,Maun East,Konstantinos Markus,BDP,6046
7,Maun West,Reaboke Mbulawa,BDP,5335
9,Ngami,Thato Kwerepe,BDP,7063
13,Okavango,Mbahahauka A. Kambimba,BDP,5473


**Task:** Create a dataframe called **non_bdp_votes** with all columns and rows where the party voted for is not BDP.

In [62]:
non_bdp_votes = candidate_votes[candidate_votes.party_name != 'BDP']
non_bdp_votes.head()

Unnamed: 0,constituency_name,candidate_name,party_name,party_votes
1,Chobe,Gibson M.R Nshimwe,BCP,3166
3,Maun East,Goretetse Kekgonegile,BCP,5304
4,Maun East,Osimilwe O. Fish,UDC,2062
5,Maun East,Simon Lethake,IND,44
6,Maun West,Tawana Moremi,UDC,7271


In [63]:
non_bdp_votes.head(10)

Unnamed: 0,constituency_name,candidate_name,party_name,party_votes
1,Chobe,Gibson M.R Nshimwe,BCP,3166
3,Maun East,Goretetse Kekgonegile,BCP,5304
4,Maun East,Osimilwe O. Fish,UDC,2062
5,Maun East,Simon Lethake,IND,44
6,Maun West,Tawana Moremi,UDC,7271
8,Maun West,George Lubinda,BCP,2357
10,Ngami,Goyamang Taolo Habano,BCP,7015
11,Ngami,Kebinang Cosmos Moenga,UDC,802
12,Okavango,Bagalatia Aron,BCP,6864
14,Okavango,Vister M. Moruti,UDC,215


**Task:** Pivot the **non_bdp_votes** on **constituency_name** and sum the party_votes. Re-assign this dataframe to **non_bdp_votes**

In [64]:
non_bdp_votes = non_bdp_votes.pivot_table(values='party_votes', index='constituency_name',aggfunc=np.sum)
non_bdp_votes.head(10)

constituency_name,party_votes
Bobonong,7392
Boteti East,3198
Boteti West,6171
Chobe,3166
Francistown East,4640
Francistown South,6772
Francistown West,4568
Gabane-Mmankgodi,10211
Gaborone Bonnington North,10081
Gaborone Bonnington South,7967


In [65]:
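#reset_index() returns a new dataframe. The result is not assigned here, so
#non_bdp_votes keeps constituency_name as its index, which the later concat step relies on.
non_bdp_votes.reset_index()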
non_bdp_votes.head(10)

constituency_name,party_votes
Bobonong,7392
Boteti East,3198
Boteti West,6171
Chobe,3166
Francistown East,4640
Francistown South,6772
Francistown West,4568
Gabane-Mmankgodi,10211
Gaborone Bonnington North,10081
Gaborone Bonnington South,7967


**Task:** Rename the **party_votes** column to  **opposition_votes**.

In [66]:
non_bdp_votes.rename(columns = {'party_votes':'opposition_votes'}, inplace=True)

**Task:** From the **bdp_votes** dataframe select only the **constituency_name** and **party_votes** columns. Assign the resulting dataframe to bdp_votes.

In [67]:
bdp_votes = bdp_votes[['constituency_name', 'party_votes']]

**Task:** Rename the **party_votes** column of the **bdp_votes** dataframe to **bdp_votes**

In [68]:
#Assign the result instead of renaming in place, since bdp_votes is a selection from candidate_votes
bdp_votes = bdp_votes.rename(columns={'party_votes':'bdp_votes'})
bdp_votes.head(10)

Unnamed: 0,constituency_name,bdp_votes
0,Chobe,4114
2,Maun East,6046
7,Maun West,5335
9,Ngami,7063
13,Okavango,5473
15,Tati East,5864
18,Tati West,4510
22,Francistown East,3818
26,Francistown South,3289
28,Francistown West,5305


In [69]:
bdp_votes.set_index('constituency_name', inplace=True)

In [70]:
non_bdp_votes.head(10)

constituency_name,opposition_votes
Bobonong,7392
Boteti East,3198
Boteti West,6171
Chobe,3166
Francistown East,4640
Francistown South,6772
Francistown West,4568
Gabane-Mmankgodi,10211
Gaborone Bonnington North,10081
Gaborone Bonnington South,7967


We will now combine the **bdp_votes** and **non_bdp_votes** dataframes. Once they are combined, we will be in a good position to support or refute the claim in the **BONUS EXERCISE**.

### Combining & Joining Dataframes

- concat() method
- merge() method


In [71]:
hypothetical_resuls = pd.concat([bdp_votes, non_bdp_votes], axis=1)

In [72]:
hypothetical_resuls.head()

Unnamed: 0,bdp_votes,opposition_votes
Bobonong,7350,7392
Boteti East,5530,3198
Boteti West,5790,6171
Chobe,4114,3166
Francistown East,3818,4640


**Task:** Create a column **opposition_win** in the **hypothetical_resuls** dataframe that indicates whether opposition votes exceed BDP votes.

In [73]:
hypothetical_resuls['opposition_win'] = (hypothetical_resuls.opposition_votes - hypothetical_resuls.bdp_votes) > 0

In [74]:
hypothetical_resuls.head()

Unnamed: 0,bdp_votes,opposition_votes,opposition_win
Bobonong,7350,7392,True
Boteti East,5530,3198,False
Boteti West,5790,6171,True
Chobe,4114,3166,False
Francistown East,3818,4640,True


In [75]:
len(hypothetical_resuls[hypothetical_resuls.opposition_win== True])

36

In [76]:
len(hypothetical_resuls[hypothetical_resuls.opposition_win == False])

21
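
Counting the rows of a filtered dataframe works, but a Boolean column can also be counted directly. A short sketch, assuming a hand-built `sample` dataframe with the five rows shown at the head of **hypothetical_resuls**:

```python
import pandas as pd

# Hypothetical stand-in for hypothetical_resuls: the first five rows shown above.
sample = pd.DataFrame(
    {'bdp_votes': [7350, 5530, 5790, 4114, 3818],
     'opposition_votes': [7392, 3198, 6171, 3166, 4640]},
    index=['Bobonong', 'Boteti East', 'Boteti West', 'Chobe', 'Francistown East'],
)
sample['opposition_win'] = sample['opposition_votes'] > sample['bdp_votes']

# True counts as 1 and False as 0, so sum() counts the True rows.
n_true = sample['opposition_win'].sum()

# value_counts() tallies both outcomes at once.
tally = sample['opposition_win'].value_counts()

print(n_true)  # 3
print(tally)
```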

In [77]:
hypothetical_resuls[hypothetical_resuls.opposition_win == True]

Unnamed: 0,bdp_votes,opposition_votes,opposition_win
Bobonong,7350,7392,True
Boteti West,5790,6171,True
Francistown East,3818,4640,True
Francistown South,3289,6772,True
Gabane-Mmankgodi,6833,10211,True
Gaborone Bonnington North,4222,10081,True
Gaborone Bonnington South,3597,7967,True
Gaborone Central,3191,8363,True
Gaborone North,4109,8895,True
Gaborone South,3872,7422,True


**Task:** Create a column **votes_difference** in the **hypothetical_resuls** dataframe that holds the difference between **bdp_votes** and **opposition_votes** (bdp_votes minus opposition_votes).

In [78]:
hypothetical_resuls['votes_difference'] = (hypothetical_resuls.bdp_votes - hypothetical_resuls.opposition_votes)

In [79]:
hypothetical_resuls.head(10)

Unnamed: 0,bdp_votes,opposition_votes,opposition_win,votes_difference
Bobonong,7350,7392,True,-42
Boteti East,5530,3198,False,2332
Boteti West,5790,6171,True,-381
Chobe,4114,3166,False,948
Francistown East,3818,4640,True,-822
Francistown South,3289,6772,True,-3483
Francistown West,5305,4568,False,737
Gabane-Mmankgodi,6833,10211,True,-3378
Gaborone Bonnington North,4222,10081,True,-5859
Gaborone Bonnington South,3597,7967,True,-4370


The **hypothetical_resuls** dataframe includes constituencies that the opposition actually won. These should be excluded.

This will allow us to identify the additional constituencies the opposition might have won with combined votes.



**Task:** Select **constituency_name**, **party_name** columns from the winners dataframe into a
dataframe named **winning_party**

In [80]:
winning_party = winners[['constituency_name', 'party_name']]
winning_party.head()

Unnamed: 0,constituency_name,party_name
0,Chobe,BDP
2,Maun East,BDP
6,Maun West,UDC
9,Ngami,BDP
12,Okavango,BCP


In [81]:
#Assign the renamed dataframe instead of using inplace=True: winning_party is a
#selection from winners, and renaming it in place triggers a SettingWithCopyWarning.
winning_party = winning_party.rename(columns={'party_name':'winning_party'})


**Task:** Set the **constituency_name** column as the index.

In [82]:
winning_party.set_index('constituency_name', inplace=True)
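
Selecting columns from one dataframe and then modifying the selection in place is the pattern that can trigger a SettingWithCopyWarning in pandas versions that emit it. One common way to make the intent explicit is to take a copy first. The sketch below assumes a two-row `winners_sample` dataframe built by hand; `party_sample` is a name introduced here for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the winners dataframe.
winners_sample = pd.DataFrame({
    'constituency_name': ['Chobe', 'Maun West'],
    'candidate_name': ['Ronald Machana Shamukuni', 'Tawana Moremi'],
    'party_name': ['BDP', 'UDC'],
})

# .copy() makes the selection an independent dataframe, so later in-place
# changes clearly apply to it and not to winners_sample.
party_sample = winners_sample[['constituency_name', 'party_name']].copy()
party_sample.rename(columns={'party_name': 'winning_party'}, inplace=True)
party_sample.set_index('constituency_name', inplace=True)

print(party_sample)
print(list(winners_sample.columns))  # the original dataframe is unchanged
```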

**Task:** Combine the **hypothetical_resuls** and **winning_party** dataframes.

In [83]:
united_could_win = pd.concat([hypothetical_resuls, winning_party], axis=1)

In [84]:
united_could_win.head()

Unnamed: 0,bdp_votes,opposition_votes,opposition_win,votes_difference,winning_party
Bobonong,7350,7392,True,-42,BDP
Boteti East,5530,3198,False,2332,BDP
Boteti West,5790,6171,True,-381,BDP
Chobe,4114,3166,False,948,BDP
Francistown East,3818,4640,True,-822,BDP


**Question 22:** Which constituencies won by the ruling party could have been won by a united opposition?

In [85]:
united_could_win[(united_could_win.winning_party == 'BDP') &
                (united_could_win.opposition_win == True)]

Unnamed: 0,bdp_votes,opposition_votes,opposition_win,votes_difference,winning_party
Bobonong,7350,7392,True,-42,BDP
Boteti West,5790,6171,True,-381,BDP
Francistown East,3818,4640,True,-822,BDP
Gaborone South,3872,7422,True,-3550,BDP
Kanye North,5726,10157,True,-4431,BDP
Lentsweletau-Mmopane,7170,9269,True,-2099,BDP
Letlhakeng-Lephephe,5265,5649,True,-384,BDP
Lobatse,5485,5530,True,-45,BDP
Mahalapye East,4406,4679,True,-273,BDP
Maun East,6046,7410,True,-1364,BDP


**Question 23:** How many constituencies won by the ruling party could have been won by a united opposition?

In [86]:
len(united_could_win[(united_could_win.winning_party == 'BDP') &
                (united_could_win.opposition_win == True)])

16
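
To relate this count back to the bonus question, the arithmetic can be worked through with the seat counts found earlier. This is only a sketch: it assumes every opposition and independent voter would have backed a single joint candidate, which the data cannot confirm.

```python
# Seats won by each party, as reported by winners.party_name.value_counts().
seats_won = {'BDP': 37, 'UDC': 17, 'BCP': 3}

# BDP-held constituencies where combined opposition votes exceed BDP votes.
flippable = 16

total_seats = sum(seats_won.values())                    # 57
opposition_actual = total_seats - seats_won['BDP']       # 20
opposition_hypothetical = opposition_actual + flippable  # 36
bdp_hypothetical = seats_won['BDP'] - flippable          # 21

print(opposition_hypothetical, bdp_hypothetical)  # 36 21
```

These totals agree with the 36 True and 21 False values counted in the **opposition_win** column.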

In [87]:
united_could_win[['bdp_votes','opposition_votes']].sum()

bdp_votes           320647
opposition_votes    369605
dtype: int64

**Question 25:** Which constituencies lost by the opposition could have been won by more than 500 votes?

In [88]:
united_could_win[(united_could_win.winning_party == 'BDP') &
                (united_could_win.votes_difference < -500)]

Unnamed: 0,bdp_votes,opposition_votes,opposition_win,votes_difference,winning_party
Francistown East,3818,4640,True,-822,BDP
Gaborone South,3872,7422,True,-3550,BDP
Kanye North,5726,10157,True,-4431,BDP
Lentsweletau-Mmopane,7170,9269,True,-2099,BDP
Maun East,6046,7410,True,-1364,BDP
Nata-Gweta,3424,5885,True,-2461,BDP
Ngami,7063,7817,True,-754,BDP
Selibe Phikwe East,3376,4931,True,-1555,BDP
Tati West,4510,5996,True,-1486,BDP


**Question 26:** Which constituencies lost by the opposition could have been won by fewer than 100 votes? These would have been closely contested constituencies.

In [89]:
united_could_win[(united_could_win.winning_party == 'BDP') &
                (united_could_win.votes_difference > -100)&
                (united_could_win.votes_difference < 0)]

Unnamed: 0,bdp_votes,opposition_votes,opposition_win,votes_difference,winning_party
Bobonong,7350,7392,True,-42,BDP
Lobatse,5485,5530,True,-45,BDP


**Question 27:** In which constituencies won by the ruling party would the opposition still have lost by fewer than 100 votes, even with combined opposition votes?

In [90]:
united_could_win[(united_could_win.winning_party == 'BDP') &
                (united_could_win.votes_difference < 100)&
                (united_could_win.votes_difference > 0)]

Unnamed: 0,bdp_votes,opposition_votes,opposition_win,votes_difference,winning_party
Kgalagadi North,4648,4587,False,61,BDP
