## Finding outliers in election results

In [1]:
import numpy as np, pandas as pd
pd.set_option('display.max_rows', 500)
df = pd.read_csv(r"elections_final.csv",encoding = "cp1255")


In [2]:
print("num of kalpis: {}".format(len(df)))

num of kalpis: 10765


### Assign Latin names to columns

In [3]:
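# Hebrew ballot-letter columns -> Latin variable names (by position in the CSV header)
tav = df.columns[16]       # טב
naz = df.columns[27]       # נז
meretz = df.columns[25]    # מרצ
likud = df.columns[24]     # מחל
liberman = df.columns[23]  # ל
benet = df.columns[26]     # נ (New Right)
nun_cols = df.columns[[27,28,29,31]].tolist()  # look-alike letters: נז, נך, נץ, ן
emet = df.columns[7]       # אמת
gimel = df.columns[8]      # ג
elementar_cols = df.columns.tolist()[:7]  # ballot-box metadata: town, codes, voter counts
general_cols = df.columns.tolist()[:7]+[tav,benet,naz,emet,gimel]
parties_cols = df.columns[7:]             # one vote-count column per party
# split parties by their national vote total
sums=df.loc[:,parties_cols].sum()
mask = sums>50000
bigs = sums[mask].index.tolist()    # parties with more than 50,000 votes
smalls = sums[~mask].index.tolist()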
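The big/small split at the end of the cell can be read as a small reusable helper. A minimal sketch, assuming a DataFrame with one vote-count column per party; the function name `split_parties` and the toy data are hypothetical, and the 50,000 threshold is the one used above.

```python
import pandas as pd


def split_parties(votes: pd.DataFrame, threshold: int = 50_000):
    """Split party columns into (big, small) lists by their total votes over all ballot boxes."""
    totals = votes.sum()          # column sums: national total per party
    is_big = totals > threshold   # boolean Series indexed by party column
    return totals[is_big].index.tolist(), totals[~is_big].index.tolist()


# Hypothetical toy data: two ballot boxes, three parties.
votes = pd.DataFrame({"A": [30_000, 25_000], "B": [100, 200], "C": [40_000, 20_000]})
bigs, smalls = split_parties(votes)
print(bigs, smalls)  # ['A', 'C'] ['B']
```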



### First try: were votes for Bennett mistakenly assigned to the neighbouring column "naz"?
The condition: naz has more than 10 votes and more votes than Bennett's party (benet).
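The same condition can be applied to every column whose ballot letters resemble the New Right's. `nun_cols` is defined in the column-setup cell but not used later; the sketch below is one possible use of it. The helper name and the toy columns `benet`, `naz`, `nak` are hypothetical stand-ins for the real Hebrew column names.

```python
import pandas as pd


def flag_lookalikes(df: pd.DataFrame, target: str, lookalikes, min_votes: int = 10) -> pd.DataFrame:
    """Boolean table: True where a look-alike column may have absorbed votes meant for `target`.

    Same rule as the naz check: more than `min_votes` votes and more votes than `target`.
    """
    return pd.DataFrame(
        {col: (df[col] > min_votes) & (df[col] > df[target]) for col in lookalikes if col != target}
    )


# Hypothetical toy data: three ballot boxes.
toy = pd.DataFrame({"benet": [50, 0, 3], "naz": [2, 17, 20], "nak": [0, 12, 1]})
flags = flag_lookalikes(toy, "benet", ["naz", "nak"])
print(flags.any(axis=1).tolist())  # [False, True, True]
```

On the real data this would be called as `flag_lookalikes(df, benet, nun_cols)`, and `df.loc[flags.any(axis=1), general_cols]` would list the suspects.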

In [4]:
df = pd.read_csv(r"elections_final.csv",encoding = "cp1255")
mask = (df[naz]>10)&(df[naz]>df[benet])
df.loc[mask,general_cols]


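| | town | town code | ballot box | eligible voters | voters | invalid | valid | טב | נ | נז | אמת | ג |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2146 | ג'ולס | 485 | 6.0 | 756 | 474 | 4 | 470 | 0 | 13 | 17 | 5 | 0 |
| 5593 | מגאר | 481 | 20.0 | 680 | 363 | 6 | 357 | 0 | 3 | 20 | 1 | 0 |
| 7257 | עין הוד | 74 | 1.0 | 496 | 369 | 0 | 369 | 0 | 5 | 11 | 58 | 1 |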


#### Very few: 3 of 10,765. The analyst's verdict: "Only kibbutzim and Arab villages. Interesting, but no potential for the New Right."
#### Let's repeat this with the results published on the first day after the election

In [5]:
df = pd.read_csv(r"elections_20190410.csv",encoding = "cp1255")
mask = (df[naz]>10)&(df[naz]>df[benet])
df.loc[mask,general_cols]
#df = pd.read_csv(r"elections_final.csv",encoding = "cp1255")

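| | town | town code | ballot box | eligible voters | voters | invalid | valid | טב | נ | נז | אמת | ג |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2139 | ג'ולס | 485 | 6.0 | 756 | 474 | 4 | 470 | 0 | 13 | 17 | 5 | 0 |
| 2195 | גבעות בר | 1344 | 1.0 | 550 | 437 | 0 | 437 | 3 | 0 | 37 | 30 | 1 |
| 5574 | מגאר | 481 | 20.0 | 680 | 363 | 6 | 357 | 0 | 3 | 20 | 1 | 0 |
| 5618 | מגידו | 586 | 1.0 | 621 | 469 | 1 | 468 | 0 | 4 | 14 | 69 | 2 |
| 6422 | נצרת עילית | 1061 | 47.0 | 465 | 338 | 1 | 337 | 8 | 3 | 11 | 9 | 2 |
| 6928 | עין הוד | 74 | 1.0 | 496 | 369 | 0 | 369 | 0 | 5 | 11 | 58 | 1 |
| 7090 | עפולה | 7700 | 22.0 | 657 | 475 | 0 | 475 | 25 | 0 | 17 | 2 | 14 |
| 9543 | שדרות | 1031 | 28.0 | 780 | 549 | 2 | 547 | 76 | 22 | 25 | 22 | 8 |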


###### The Afula and Givot Bar entries were probably mistakes (Sdom, not Helem). Either way, they were corrected after 10/04/2019 and no longer appear in the final results.
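Rather than rerunning each check on both files, the two releases can be compared directly to list the ballot boxes whose counts changed. A sketch, assuming (not verified here) that a ballot box is uniquely identified by its town code and ballot-box number (`סמל ישוב`, `מספר קלפי` in the real files); the function name and the toy column names are hypothetical.

```python
import pandas as pd


def changed_boxes(before: pd.DataFrame, after: pd.DataFrame, cols, keys) -> pd.DataFrame:
    """Return the key columns of ballot boxes whose values in `cols` differ between two releases.

    Inner join: boxes present in only one release are ignored.
    """
    merged = before.merge(after, on=keys, suffixes=("_before", "_after"))
    changed = pd.Series(False, index=merged.index)
    for col in cols:
        changed |= merged[f"{col}_before"] != merged[f"{col}_after"]
    return merged.loc[changed, keys]


# Hypothetical toy releases: votes moved from naz to benet in the first two boxes.
before = pd.DataFrame({"town": [1, 2, 3], "box": [1.0, 1.0, 1.0], "naz": [17, 37, 11], "benet": [0, 0, 5]})
after = pd.DataFrame({"town": [1, 2, 3], "box": [1.0, 1.0, 1.0], "naz": [0, 0, 11], "benet": [17, 37, 5]})
print(changed_boxes(before, after, ["naz", "benet"], ["town", "box"]))
```

With the real files loaded into two DataFrames (say `first_day` and `final`, hypothetical names), the call would be `changed_boxes(first_day, final, parties_cols, ["סמל ישוב", "מספר קלפי"])`.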

## Second try: correlations
Let's use the linear correlation between "tav" and "benet". More generally, let's find, for each big party, the party it is most correlated with.
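The next cell hides the diagonal with `corr != 1`, which would also hide any pair of parties whose correlation happened to be exactly 1. A sketch of a variant that masks only the diagonal, by position; the function name and toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def strongest_partner(votes: pd.DataFrame) -> pd.DataFrame:
    """For each party, the other party with the highest Pearson correlation across ballot boxes."""
    corr = votes.corr()
    off_diag = ~np.eye(len(corr), dtype=bool)  # True everywhere except the diagonal
    corr = corr.where(off_diag)                # diagonal -> NaN, skipped by idxmax/max
    return pd.DataFrame({"max_corr_party": corr.idxmax(), "max_corr_val": corr.max()})


# Hypothetical toy data: five ballot boxes, three parties.
votes = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [2, 1, 4, 3, 6], "C": [5, 3, 4, 1, 2]})
print(strongest_partner(votes))
```

On the real data, `strongest_partner(df.loc[:, bigs])` should reproduce the table below unless some pair is perfectly correlated.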

In [6]:
df = pd.read_csv(r"elections_final.csv",encoding = "cp1255")
corr=df.loc[:,bigs].corr()
mask = corr!=1
max_corr_party =corr[mask].idxmax()
max_corr_val = corr[mask].max()
corr_maxis = pd.concat([max_corr_party,max_corr_val],axis=1)
corr_maxis.columns = ["max_corr_party","max_corr_val"]
corr_maxis

| | max_corr_party | max_corr_val |
|---|---|---|
| אמת | פה | 0.757783 |
| ג | שס | 0.561259 |
| דעם | ום | 0.629882 |
| ום | דעם | 0.629882 |
| ז | מחל | 0.584611 |
| טב | נ | 0.610961 |
| כ | מחל | 0.570481 |
| ל | מחל | 0.225134 |
| מחל | ז | 0.584611 |
| מרצ | אמת | 0.459741 |


#### OK: tav and the New Right are clearly correlated (r ≈ 0.61, tav's highest correlation).
#### Let's find ballot boxes where tav is large and the New Right is small; these may be mistakes.
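The next cell expresses "tav large, New Right small" with fixed thresholds (tav > 30, benet < 15% of tav). A different technique, swapped in here and not taken from the notebook, builds on the correlation instead: fit a least-squares line benet ≈ slope·tav + intercept and flag ballot boxes whose residual lies far below the others. A sketch; the function name, the z cut-off of -2 and the toy data are assumptions.

```python
import numpy as np
import pandas as pd


def low_residual_boxes(df: pd.DataFrame, x: str, y: str, z: float = -2.0) -> pd.Index:
    """Index of rows where `y` falls far below the least-squares line fitted on `x`."""
    slope, intercept = np.polyfit(df[x], df[y], deg=1)  # ordinary least squares, degree 1
    resid = df[y] - (slope * df[x] + intercept)
    score = (resid - resid.mean()) / resid.std()        # standardised residuals
    return score[score < z].index


# Hypothetical toy data: benet is half of tav everywhere except one box.
tav = np.arange(10, 101, 10)
toy = pd.DataFrame({"tav": tav, "benet": tav // 2})
toy.loc[4, "benet"] = 0
print(low_residual_boxes(toy, "tav", "benet").tolist())  # [4]
```

Two caveats: residuals grow with ballot-box size, so fitting on vote shares may be fairer, and an ordinary least-squares fit is itself pulled toward the outliers it is meant to find.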

In [7]:
df = pd.read_csv(r"elections_final.csv",encoding = "cp1255")
mask = (df[tav]>30)&(df[benet]<0.15*df[tav])
pd.set_option('display.max_rows', 500)
print("num of suspected calpis: {}".format(mask.sum()))
df.loc[mask,general_cols]

num of suspected calpis: 151


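| | town | town code | ballot box | eligible voters | voters | invalid | valid | טב | נ | נז | אמת | ג |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 420 | אלון מורה | 3579 | 1.0 | 499 | 384 | 1 | 383 | 316 | 13 | 0 | 0 | 3 |
| 421 | אלון מורה | 3579 | 2.0 | 417 | 349 | 0 | 349 | 273 | 26 | 0 | 0 | 1 |
| 443 | אלעד | 1309 | 1.0 | 776 | 642 | 3 | 639 | 94 | 8 | 0 | 0 | 109 |
| 444 | אלעד | 1309 | 2.0 | 553 | 464 | 5 | 459 | 72 | 8 | 0 | 1 | 82 |
| 445 | אלעד | 1309 | 3.0 | 790 | 649 | 1 | 648 | 48 | 5 | 0 | 0 | 112 |
| 447 | אלעד | 1309 | 5.0 | 724 | 642 | 2 | 640 | 62 | 8 | 0 | 0 | 226 |
| 448 | אלעד | 1309 | 6.0 | 476 | 405 | 1 | 404 | 70 | 6 | 0 | 1 | 155 |
| 449 | אלעד | 1309 | 7.0 | 520 | 427 | 5 | 422 | 107 | 9 | 0 | 2 | 69 |
| 451 | אלעד | 1309 | 9.0 | 787 | 676 | 9 | 667 | 47 | 7 | 0 | 0 | 285 |
| 453 | אלעד | 1309 | 11.0 | 680 | 572 | 2 | 570 | 62 | 2 | 0 | 0 | 111 |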


##### 151 ballot boxes! However, it is easy to see that most of them, probably all, are in neighbourhoods with concentrations of ultra-Orthodox voters (Hardal, Haredi or Chabad). Grouping the suspected ballot boxes by town gives:

In [8]:
df.loc[mask,general_cols[0]].value_counts()

ירושלים             26
אלעד                14
ביתר עילית          12
צפת                 10
קרית מלאכי           9
כפר חב"ד             7
רחובות               5
קרית ארבע            4
בית אל               4
בת ים                4
מצפה רמון            4
מגדל העמק            4
לוד                  4
שדרות                3
בית שמש              3
קרית גת              2
כוכב יעקב            2
עמנואל               2
אלון מורה            2
ייט"ב                1
ברקת                 1
עמיחי                1
אריאל                1
חברון                1
רמת גן               1
כפר סבא              1
חולון                1
פני חבר              1
נווה                 1
טלמון                1
בני ברק              1
בני דקלים            1
ערד                  1
בני נצרים            1
יצהר                 1
באר שבע              1
יד בנימין            1
נצרת עילית           1
קרית אתא             1
נתניה                1
קרית יערים           1
נחליאל               1
כרם יבנה (ישיבה)     1
מעלות-תרשיח
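Raw counts favour towns with many ballot boxes. Normalising by the number of boxes per town shows what fraction of each town's boxes were flagged. A sketch; the function name and toy data are hypothetical.

```python
import pandas as pd


def flagged_share(towns: pd.Series, flagged: pd.Series) -> pd.DataFrame:
    """Per town: number of flagged boxes, total boxes, and the flagged fraction."""
    stats = flagged.astype(int).groupby(towns).agg(["sum", "count", "mean"])
    stats.columns = ["flagged", "boxes", "share"]
    return stats.sort_values("share", ascending=False)


# Hypothetical toy data: town X has 1 of 3 boxes flagged, town Y has 2 of 2.
towns = pd.Series(["X", "X", "X", "Y", "Y"])
flagged = pd.Series([True, False, False, True, True])
print(flagged_share(towns, flagged))
```

On the real data this would be `flagged_share(df[general_cols[0]], mask)`, using the `mask` from the tav/benet cell.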

#### Third try: check whether any small party (not only naz) gets a significant share of the votes

### Create a vote-share DataFrame

In [9]:
# Row-wise vote shares: each party's votes divided by the ballot box's total over all parties
total = df.loc[:,parties_cols].sum(axis=1)
fractions_df = df.copy()
# assign whole columns (not .loc[:, cols] = ...) so the integer vote columns can become float
fractions_df[parties_cols] = df.loc[:,parties_cols].div(total, axis=0)
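Two quick checks on a share table: each ballot box's shares should sum to 1, and a box with no party votes at all produces 0/0 = NaN rather than an error. A toy sketch; the data are hypothetical, and whether the real files contain empty boxes is not checked here.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: three ballot boxes, the last one empty.
votes = pd.DataFrame({"A": [6, 2, 0], "B": [3, 2, 0], "C": [1, 0, 0]})
shares = votes.div(votes.sum(axis=1), axis=0)  # 0/0 -> NaN for the empty box

empty = shares.isna().all(axis=1)              # boxes with no votes for any party
row_sums = shares[~empty].sum(axis=1)
assert np.allclose(row_sums, 1.0)              # every non-empty box sums to 100%
print(shares)
print("empty boxes:", empty[empty].index.tolist())  # empty boxes: [2]
```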

#### Looking for outliers: a small party with a high vote share
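A fixed 5% cut-off treats every small party alike. An alternative measure, not used in the notebook, compares a party's share in each ballot box with its national share, so a value of 1 means "the same as in the whole country". A sketch with hypothetical names and toy data.

```python
import pandas as pd


def share_lift(votes: pd.DataFrame) -> pd.DataFrame:
    """Each party's share in each ballot box divided by its national share (1.0 = national average)."""
    box_share = votes.div(votes.sum(axis=1), axis=0)       # row-wise shares
    national_share = votes.sum() / votes.to_numpy().sum()   # each party's share of all votes
    return box_share / national_share                       # aligned on party columns


# Hypothetical toy data: three boxes of 100 votes each.
votes = pd.DataFrame({"Big": [90, 95, 100], "Tiny": [10, 5, 0]})
print(share_lift(votes)["Tiny"].round(6).tolist())  # [2.0, 1.0, 0.0]
```

For a party with very few votes nationally, a handful of votes gives a huge lift, so in practice this would be combined with a minimum vote count, e.g. `share_lift(df.loc[:, parties_cols])[smalls]` filtered by the raw counts.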

In [10]:
# Ballot boxes where some small party won more than 5% of the votes
mask = fractions_df.loc[:, smalls].max(axis=1) > 0.05
argmaxis = fractions_df.loc[mask, smalls].idxmax(axis=1)  # which small party
maxis = fractions_df.loc[mask, smalls].max(axis=1)        # its vote share
maxis2 = df.loc[mask, smalls].max(axis=1)                 # its vote count
pd.concat([df.loc[mask, elementar_cols], argmaxis, maxis, maxis2, df.loc[mask, likud]], axis=1)

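| | town | town code | ballot box | eligible voters | voters | invalid | valid | 0 | 1 | 2 | מחל |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 24 | אבו עבדון (שבט) | 958 | 1.0 | 138 | 12 | 0 | 12 | ףי | 0.083333 | 1 | 0 |
| 414 | אל-עריאן | 1316 | 1.0 | 128 | 64 | 4 | 60 | ק | 0.066667 | 4 | 2 |
| 438 | אלישיב | 204 | 1.0 | 473 | 367 | 0 | 367 | נץ | 0.054496 | 20 | 162 |
| 1375 | בועיינה-נוג'ידאת | 482 | 2.0 | 681 | 370 | 6 | 364 | נץ | 0.052198 | 19 | 0 |
| 1376 | בועיינה-נוג'ידאת | 482 | 3.0 | 690 | 339 | 6 | 333 | נץ | 0.075075 | 25 | 14 |
| 1378 | בועיינה-נוג'ידאת | 482 | 5.0 | 701 | 333 | 4 | 329 | נץ | 0.079027 | 26 | 6 |
| 1874 | בענה | 483 | 2.0 | 702 | 468 | 9 | 459 | ר | 0.165577 | 76 | 0 |
| 2123 | ג'דיידה-מכר | 1292 | 3.0 | 710 | 315 | 2 | 313 | צק | 0.073482 | 23 | 1 |
| 2127 | ג'דיידה-מכר | 1292 | 7.0 | 721 | 336 | 1 | 335 | ר | 0.083582 | 28 | 1 |
| 2135 | ג'דיידה-מכר | 1292 | 15.0 | 723 | 300 | 3 | 297 | ר | 0.053872 | 16 | 5 |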


#### None of these are New Right strongholds. Let's look at the interesting ballot box with index 6058

In [11]:
df.iloc[6058,:]

שם ישוב      מעטפות חיצוניות
סמל ישוב               99999
מספר קלפי                168
בזב                        0
מצביעים                  450
פסולים                    10
כשרים                    440
אמת                       17
ג                         31
דעם                       11
ום                         8
ז                         17
זי                         0
זך                         1
זנ                         0
זץ                         1
טב                        24
י                          0
יז                         0
ין                         0
יץ                         0
כ                         16
ךק                         0
ל                          7
מחל                      141
מרצ                        9
נ                         17
נז                         0
נך                         0
נץ                         0
נר                         9
ן                          1
ןך                        91
ןנ                         0
פה            

#### Blue and White (פה) got zero votes here, while an obscure list whose ballot letters are final Nun and final Kaf (ןך) got 91. That is clearly a mistake, but not an important one: Blue and White cannot gain or lose a seat over it in any case.
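The zero for Blue and White in box 6058 suggests a general check: ballot boxes with many valid votes in which a big party received nothing. A sketch; the function name, the toy columns and the 200-vote minimum are hypothetical choices.

```python
import pandas as pd


def zero_big_party_boxes(df: pd.DataFrame, bigs, valid_col: str, min_valid: int = 200) -> pd.DataFrame:
    """Rows with at least `min_valid` valid votes where at least one big party got 0 votes.

    The returned boolean table shows which big parties were zero in each flagged box.
    """
    enough = df[valid_col] >= min_valid
    zeros = df.loc[enough, bigs].eq(0)
    return zeros[zeros.any(axis=1)]


# Hypothetical toy data: box 0 resembles the case above (big party "BW" at zero).
toy = pd.DataFrame({"valid": [440, 300, 50], "BW": [0, 120, 0], "Likud": [140, 90, 0]})
print(zero_big_party_boxes(toy, ["BW", "Likud"], "valid"))
```

On the real data the valid-votes column is `df.columns[6]` (כשרים), so the call would be `zero_big_party_boxes(df, bigs, df.columns[6])`.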