# Battle of the Neighborhoods week 2

## Table of contents

* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a id="introduction"></a>


The objective of this project is to find the best location to live in the Abingdon-on-Thames area.
I am currently looking to move to Abingdon-on-Thames and would like to use this project to identify the best and the worst areas to live in.

The first step is to choose the safest borough by analysing **police crime data**. To understand the boroughs, I also need the **Lower Layer Super Output Areas (LSOA)** for the area of interest. LSOAs are a geographic hierarchy designed to improve the reporting of small area statistics in England and Wales.  
Finally, once the area is defined, data from **FourSquare** will help us choose the best area for supermarkets, leisure centres, good schools, parks, restaurants, etc.
 
Data science tools will help us analyse the data, focus on the safest borough, and explore its neighbourhoods and the common venues in each of them.
 
The project succeeds if, after analysing these factors, we are able to make the best choice for the family.

## Data <a id="data"></a>

Having defined our problem, these are the factors that will help us make our decision:

 * finding the safest area using crime data statistics
 * finding the nearby venues around the preferred areas
 * choosing the right neighbourhood within the borough
 
 We will use the geographical coordinates of Abingdon to plot the neighbourhoods of a borough that is safe and in the city's vicinity, then cluster the neighbourhoods, plot the crime data, get venues and present our findings.
 
 The following data sources are needed to get the required information:
 
 - [**Step 1**: Using a real-world data set from Police Data UK, get crime data for last year](#step1): A dataset consisting of the crime statistics of each neighbourhood in Thames Valley, along with the type of crime.
 
 - [**Step 2**: Gathering LSOA and UK boundary information from Ordnance Survey for the boroughs around Oxfordshire](#step2): Borough information will be used to map the crime data and identify the best and worst boroughs.
 
 - [**Step 3**: Creating a dataset of the boroughs' locations and the crime data](#step3): This data will be consolidated so that we can explore each neighbourhood and the crime around it by plotting it on maps using Folium and performing exploratory data analysis.
 
 - [**Step 4**: Adding the FourSquare dataset of the most common venues in each neighbourhood, along with coordinates](#step4): This data will be fetched using the FourSquare API to explore the neighbourhood venues, apply a machine learning algorithm to cluster the neighbourhoods, and present the findings on maps using Folium.

### **Step 1:** Using a real world data set from Police Data UK, get crime data for last year <a id="step1"></a>


####  Thames Valley Crime Report 

Properties of the Crime Report

*   CRIME ID - unique identifier of the crime record
*   MONTH - month in which the crime was recorded
*   REPORTED BY - authority that reported the crime
*   FALLS WITHIN - authority responsible for the area
*   LONGITUDE - GPS longitude
*   LATITUDE - GPS latitude
*   LOCATION - approximate location of the crime
*   LSOA code - code of the LSOA (borough) in which it falls
*   LSOA name - name of the LSOA (borough) in which it falls
*   CRIME TYPE - type of crime

Data set URL: https://data.police.uk

### Import all the libraries that are needed beforehand

In [4]:
import time
import numpy as np  # data vectors
import pandas as pd # data analysis
from collections import Counter
from pandas import json_normalize # transform json into pandas dataframe (top-level import since pandas 1.0)
import matplotlib.cm as cm #plotting
import matplotlib.colors as colors
import matplotlib.pyplot as py
import json
import requests
from geopy.geocoders import Nominatim #get lat and long
from sklearn.cluster import KMeans #clustering
import folium #visualise map
import os
import geopandas as gpd
import earthpy as et
from folium.plugins import HeatMap
import geopandas
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
from matplotlib.colors import ListedColormap
from shapely.geometry import box # Load the box module from shapely to create box objects
from shapely.geometry import shape
import seaborn as sns

## Reading from the Dataset

Because of the volume of data, this project is limited to a single year of records, 2019.
The crime data CSV file comes from the police database at data.police.uk.

In [5]:
CSV = pd.read_csv("ThamesValleyCRIMEDATA.csv")
CSV.head()

Unnamed: 0,Crime ID,Month,Reported by,Falls within,Longitude,Latitude,Location,LSOA code,LSOA name,Crime type,Count
0,3d294010dbca88ade8b95964f03d79dae86cbece156e32...,2019-08,Thames Valley Police,Thames Valley Police,-1.327884,51.753488,On or near Eynsham Road,E01028708,Vale of White Horse 001A,Violence and sexual offences,1
1,dbcbf4f976221e1e202c7019f2803f9ba80a8e1c8881d9...,2019-08,Thames Valley Police,Thames Valley Police,-1.308601,51.748808,On or near Third Acre Rise,E01028709,Vale of White Horse 001B,Burglary,1
2,95569239a93eb375ef1a30f147975303c0aaa322755be2...,2019-08,Thames Valley Police,Thames Valley Police,-1.312464,51.750027,On or near Grange Court,E01028709,Vale of White Horse 001B,Violence and sexual offences,1
3,cdb82cca5ab21305455295afad2e04a4f2b4b2066d2709...,2019-08,Thames Valley Police,Thames Valley Police,-1.307831,51.741359,On or near Barn Close,E01028710,Vale of White Horse 001C,Other theft,1
4,f94e1c275753292c47a748c7241c75beb667817009ef96...,2019-08,Thames Valley Police,Thames Valley Police,-1.30848,51.742487,On or near Delamare Way,E01028710,Vale of White Horse 001C,Public order,1
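
The file read above already holds the whole of 2019. If the source data instead arrives as one CSV per month, which is how I understand the data.police.uk downloads to be organised, the yearly table could be assembled roughly as follows. This is only a sketch under that assumption: the helper name `combine_monthly` and the two in-memory "files" are made up for illustration, and only the column names come from the crime report.

```python
import io
import pandas as pd

def combine_monthly(sources, year="2019"):
    """Concatenate monthly crime CSVs and keep only the rows recorded in `year`."""
    frames = [pd.read_csv(src, dtype={"Month": str}) for src in sources]
    combined = pd.concat(frames, ignore_index=True)
    # The Month column is formatted as YYYY-MM, so a prefix test selects the year
    in_year = combined["Month"].str.startswith(year)
    return combined[in_year].reset_index(drop=True)

# Hypothetical stand-ins for two monthly files (real ones would be file paths)
aug = io.StringIO("Month,LSOA code,Crime type\n2019-08,E01028708,Burglary\n")
jan = io.StringIO("Month,LSOA code,Crime type\n2020-01,E01028709,Public order\n")

crimes_2019 = combine_monthly([aug, jan])
print(crimes_2019)
```

Filtering on the `Month` prefix keeps the year restriction explicit, so a stray file from another year does not silently inflate the counts.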


### Total Crimes by Location

In [6]:
CSV['Location'].value_counts()

On or near Supermarket               267
On or near Police Station            235
On or near Parking Area              187
On or near Petrol Station            101
On or near Sports/Recreation Area     90
                                    ... 
On or near Bakery Lane                 1
On or near Corn Avill Close            1
On or near Mably Way                   1
On or near Conifer Close               1
On or near Woolstone Road              1
Name: Location, Length: 1069, dtype: int64
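
The counts above are per street-level location, not per borough. A rough way to get totals per LSOA and per district is to count on `LSOA name`, stripping the trailing area code (such as `001A`) to recover the district. The sketch below assumes every LSOA name ends in three digits and a letter, as in the rows shown earlier. The `District` column is my own addition and the `Oxford 002A` row is a made-up example.

```python
import pandas as pd

# Small illustrative sample; the last row is hypothetical
sample = pd.DataFrame({
    "LSOA name": [
        "Vale of White Horse 001A",
        "Vale of White Horse 001B",
        "Vale of White Horse 001B",
        "Oxford 002A",
    ],
    "Crime type": ["Violence and sexual offences", "Burglary",
                   "Violence and sexual offences", "Other theft"],
})

# Drop the trailing code, e.g. " 001A", to leave the district name
sample["District"] = sample["LSOA name"].str.replace(r"\s+\d{3}[A-Z]$", "", regex=True)

per_lsoa = sample["LSOA name"].value_counts()      # crimes per LSOA
per_district = sample["District"].value_counts()   # crimes per district
print(per_district)
```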

### **Step 2:** Gathering LSOA and UK boundary information from Ordnance Survey for the boroughs around Oxfordshire <a name="step2"></a>

Because the data is organised by borough, we first get the shapefile for the UK boundary from the Ordnance Survey website.

In [8]:
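data = gpd.read_file('Sectors.shp')
# Sector names look like "AB10 1", so an exact match on "OX" selects nothing;
# match on the prefix to pick out the OX postcode sectors instead
ox_index = data[data.name.str.startswith("OX", na=False)].index
ox_geom = data.loc[ox_index,'geometry']
ox_geom.head()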

In [10]:
shapefile = gpd.read_file("Sectors.shp")
shapefile.head()

Unnamed: 0,name,geometry
0,AB10 1,"POLYGON ((-2.11645 57.14656, -2.11655 57.14663..."
1,AB10 6,"MULTIPOLYGON (((-2.12239 57.12887, -2.12279 57..."
2,AB10 7,"POLYGON ((-2.12239 57.12887, -2.12119 57.12972..."
3,AB11 5,"POLYGON ((-2.05528 57.14547, -2.05841 57.14103..."
4,AB11 6,"POLYGON ((-2.09818 57.13769, -2.09803 57.13852..."
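
The `name` column holds full postcode sectors (`AB10 1`, `AB10 6`, ...), so comparing it with a bare area code such as `OX` never matches. One option, sketched below, is to extract the leading letters as the postcode area and filter on that. This works on plain strings, so it applies equally to the GeoDataFrame column. The sample names other than `AB10 1` are illustrative, not taken from the shapefile.

```python
import pandas as pd

# Illustrative sector names; only "AB10 1" appears in the output above
names = pd.Series(["AB10 1", "OX14 1", "OX1 4", "B1 1"], name="name")

# The postcode area is the one or two letters before the first digit
area = names.str.extract(r"^([A-Z]{1,2})\d", expand=False)

is_ox = area == "OX"
print(names[is_ox].tolist())
```

Extracting the area first is a little safer than a plain prefix test for single-letter areas: a prefix test on `B` would also pick up `BA`, `BB` and so on.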


Get the coordinates for our desired location and save latitude and longitude

In [11]:
address = 'Abingdon-on-Thames, United Kingdom'
geolocator = Nominatim(user_agent="abi_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Abingdon-on-Thames, United Kingdom are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Abingdon-on-Thames, United Kingdom are 51.6714842, -1.2779715.
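
With the town centre coordinates in hand, a simple way to judge how close a crime record or an LSOA is to Abingdon is the great-circle (haversine) distance. The helper below is a sketch of that idea and is not part of the original analysis. `haversine_km` is a name I made up. The second point is the Curtis Avenue record from the crime data.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

abingdon = (51.6714842, -1.2779715)      # from the geocoder output above
curtis_avenue = (51.673533, -1.269892)   # one record from the crime data

distance = haversine_km(*abingdon, *curtis_avenue)
print(f"{distance:.2f} km from the centre of Abingdon")
```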


## Methodology<a name="methodology"></a>

- [**Exploratory Data Analysis**:](#eda) Visualise the crime reports in the different Oxfordshire boroughs to identify the safest borough and normalise the neighbourhoods of that borough. We will use the resulting data to find the 10 most common venues in each neighbourhood.
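
To make the "most common venues" step concrete, here is a minimal count-based ranking. It is only a sketch of one way to do it and not necessarily the method used later in the notebook. The column names `Neighbourhood` and `Venue Category`, the helper `top_venues` and the sample rows are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical venue table, one row per venue returned for a neighbourhood
venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "B", "B"],
    "Venue Category": ["Pub", "Pub", "Park", "Supermarket", "Pub"],
})

def top_venues(df, n=10):
    """Return the n most frequent venue categories for each neighbourhood."""
    counts = (
        df.groupby(["Neighbourhood", "Venue Category"])
        .size()
        .reset_index(name="Count")
        # Most frequent first; ties are broken alphabetically by category
        .sort_values(["Neighbourhood", "Count", "Venue Category"],
                     ascending=[True, False, True])
    )
    return counts.groupby("Neighbourhood").head(n).reset_index(drop=True)

top = top_venues(venues, n=1)
print(top)
```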

The crime records are grouped by LSOA to count the crimes within each area, and each area is given a representative longitude and latitude.

In [13]:
# Count the crimes recorded in each LSOA
Grouped = CSV.groupby('LSOA code').size().reset_index(name='Value')
newdf = Grouped.copy()
# Keep the first record per LSOA: it supplies the LSOA name and one 'central'
# lat/long (the first location seen, not a true centroid)
df3 = CSV.drop_duplicates(subset='LSOA code', keep="first")
# Right merge so that every LSOA in df3 gets its crime count attached
Temp = newdf.merge(df3, on='LSOA code', how='right')
Temp['Latitude'] = Temp['Latitude'].astype(float)
Temp['Longitude'] = Temp['Longitude'].astype(float)
Temp['Value'] = Temp['Value'].astype(int)
Temp.head()

Unnamed: 0,LSOA code,Value,Crime ID,Month,Reported by,Falls within,Longitude,Latitude,Location,LSOA name,Crime type,Count
0,E01028688,34,,2019-08,Thames Valley Police,Thames Valley Police,-1.269892,51.673533,On or near Curtis Avenue,Vale of White Horse 005A,Anti-social behaviour,1
1,E01028689,108,b4ebf9704f6345bf13bae104f70440a696b3cdca7efa7f...,2019-08,Thames Valley Police,Thames Valley Police,-1.260667,51.675202,On or near Nyatt Road,Vale of White Horse 005B,Bicycle theft,1
2,E01028690,100,,2019-08,Thames Valley Police,Thames Valley Police,-1.284287,51.661178,On or near Townsend,Vale of White Horse 008A,Anti-social behaviour,1
3,E01028691,102,17d8923859afa39df21101abe4be0a7cea4c60e3d5e588...,2019-08,Thames Valley Police,Thames Valley Police,-1.28992,51.659692,On or near Pudsey Close,Vale of White Horse 008B,Violence and sexual offences,1
4,E01028692,130,,2019-08,Thames Valley Police,Thames Valley Police,-1.290685,51.663491,On or near Saxton Road,Vale of White Horse 008C,Anti-social behaviour,1
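
The table above takes the coordinates of the first record in each LSOA as its 'central' point. An alternative, sketched here on a tiny sample, is to average the coordinates of all records in the LSOA while counting them in the same pass. This is an option and not what the cell above does. The second 005A coordinate pair is made up for illustration.

```python
import pandas as pd

# Tiny sample; the second 005A row uses invented coordinates
crimes = pd.DataFrame({
    "LSOA code": ["E01028688", "E01028688", "E01028689"],
    "LSOA name": ["Vale of White Horse 005A", "Vale of White Horse 005A",
                  "Vale of White Horse 005B"],
    "Latitude": [51.673533, 51.673733, 51.675202],
    "Longitude": [-1.269892, -1.270092, -1.260667],
})

# One row per LSOA: number of records plus the mean position of those records
summary = (
    crimes.groupby(["LSOA code", "LSOA name"])
    .agg(
        Value=("Latitude", "size"),
        Latitude=("Latitude", "mean"),
        Longitude=("Longitude", "mean"),
    )
    .reset_index()
)
print(summary)
```

The mean position is less sensitive to which record happens to come first in the file, at the cost of possibly landing on a point where no crime was actually recorded.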


In [14]:
# Build a simple table of values, since folium was not working with the full Temp dataframe
Mapme = Temp[['Value','Latitude','Longitude']].copy()