<a id='top_of_doc'></a>

# <a href="http://wiki.openstreetmap.org/wiki/Main_Page">OpenStreetMap</a> Data Case
#### Completed By: Trenton J. McKinney
#### Date: 2017/08/10
***

### Table of Contents
***
* <a href="#osm_map_area">OSM Map Area</a>
* <a href="#osm_file_issues">Corrected OSM File Issues</a>
* <a href="#file_db_overview">File & Database Overview</a>
* <a href="#db_exploration">Database Exploration</a>
* <a href="#int_notes">Interesting Explorations</a>
* <a href="#other_ideas">Other Ideas About the Dataset</a>
* <a href="#conclusion">Conclusion</a>

<a href="#top_of_doc">Back to Top</a>
<a id='osm_map_area'></a>

## OSM Map Area
***
Portland OR, United States (Portland Metro Area)

* <a href="https://mapzen.com/data/metro-extracts/metro/portland_oregon/">Portland at Mapzen</a>

I live in the Portland metropolitan area and am interested in what kinds of information can be gleaned from its OSM file.  The map below depicts the area encompassed by the OSM file (black dots); each purple dot represents a unique zip code found in ways_tags and nodes_tags.

#### Black dots outline the area of the OSM data & Purple dots are postcodes from ways_tags and nodes_tags

<a href="https://github.com/trenton3983/UDACITY/blob/master/01_Data_Analyst/03_Data_Wrangling/Project%20-%20Data%20Wrangling/portland_osm_map.ipynb">Notebook to generate map</a>

<img style="float: left;" src="https://raw.githubusercontent.com/trenton3983/UDACITY/master/01_Data_Analyst/03_Data_Wrangling/Images/project_zips_on_map.PNG">

<a href="#top_of_doc">Back To Top</a>
<a id='osm_file_issues'></a>

## Corrected OSM File Issues
***

* <a href="#comp_city_name">Before / After Comparison of Corrected City Names</a>
* <a href="#comp_zips">Before / After Comparison of Corrected Zip Codes</a>
* <a href="#st_names">Before / After Comparison of Street Names</a>
* <a href="#add_clean">Additional Cleaning</a>

<a id='comp_city_name'></a>

### Before / After Comparison of Corrected City Names

* <a href="https://github.com/trenton3983/UDACITY/blob/master/01_Data_Analyst/03_Data_Wrangling/Project%20-%20Data%20Wrangling/project_fix_city_name.py">project_fix_city_name.py</a>
* The table shows the types of errors associated with the city names and the result of correction.

```python
def fix_city_name(name, mapping=MAPPING):
    """Checks the full tag.attrib['v'] string against MAPPING.
    If there's a key match, the mapped value is returned;
    otherwise the name is returned unchanged."""

    # dict.get falls back to the original name when there is no mapping entry
    return mapping.get(name, name)
```

<img style="float: left;" src="https://raw.githubusercontent.com/trenton3983/UDACITY/master/01_Data_Analyst/03_Data_Wrangling/Images/project_fixed_city_names.PNG">

<a href="#osm_file_issues">Back to Corrected OSM File Issues</a>
<a id='comp_zips'></a>

### Before / After Comparison of Corrected Zip Codes

* <a href="https://github.com/trenton3983/UDACITY/blob/master/01_Data_Analyst/03_Data_Wrangling/Project%20-%20Data%20Wrangling/project_fix_zip_code.py">project_fix_zip_code.py</a>
* The table shows the types of errors associated with the zip codes and the result of correction.

```python
import re

# Raw string avoids the invalid-escape warning for '\d'
ZIP_CODE_RE = re.compile(r'\d{5}')


def fix_zip_codes(zip_codes):
    """Expects a string.  Searches the string for 5 consecutive digits and
    returns the first match as the zip code, or '' if there's no match."""

    match = ZIP_CODE_RE.search(zip_codes)
    return match.group() if match else ''
```

<img style="float: left;" src="https://raw.githubusercontent.com/trenton3983/UDACITY/master/01_Data_Analyst/03_Data_Wrangling/Images/project_fixed_zips.PNG">
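Before fixing, it helps to see which postcode values deviate from the plain five-digit form. The following is a minimal sketch of such an audit; `audit_zip_codes` is a hypothetical helper, not a function from the project scripts, and the sample values in the usage comment are illustrative rather than taken from the Portland data.

```python
import re
from collections import Counter

# Hypothetical helper for illustration; not part of the project scripts.
FIVE_DIGITS = re.compile(r'\d{5}')


def audit_zip_codes(values):
    """Count postcode strings that are not exactly five digits."""
    return Counter(v for v in values if not FIVE_DIGITS.fullmatch(v))


# Illustrative usage with made-up values:
# audit_zip_codes(['97201', '97201-1234', 'OR 97005', 'Portland'])
```

Using `fullmatch` rather than `search` means that values such as ZIP+4 codes or codes with a state prefix are reported, even though they contain a valid five-digit run.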

<a href="#osm_file_issues">Back to Corrected OSM File Issues</a>
<a id='st_names'></a>

### Before / After Comparison of Street Names

* <a href="https://github.com/trenton3983/UDACITY/blob/master/01_Data_Analyst/03_Data_Wrangling/Project%20-%20Data%20Wrangling/audit_street_names.py">audit_street_names.py</a>
* <a href="https://github.com/trenton3983/UDACITY/blob/master/01_Data_Analyst/03_Data_Wrangling/Project%20-%20Data%20Wrangling/project_fix_street_name.py">project_fix_street_name.py</a>
* The table shows a non-exhaustive sample of street name corrections and a link to the full list of corrections is included below.

```python
def fix_street_name(name, mapping=MAPPING):
    """Splits tag.attrib['v'] on whitespace and checks each word against MAPPING.
    If there's a key match, the word is changed to the new value."""
    # Replace whole words only; str.replace on the full name would also
    # rewrite matching substrings inside other words (e.g. 'St' in 'Stark').
    words = name.strip().split()
    return ' '.join(mapping.get(word, word) for word in words)
```

* <a href="https://github.com/trenton3983/UDACITY/blob/master/01_Data_Analyst/03_Data_Wrangling/Project%20-%20Data%20Wrangling/project_abbreviated_street_names.py">List of Street Types - Excluding Expected</a>
* <a href="https://github.com/trenton3983/UDACITY/blob/master/01_Data_Analyst/03_Data_Wrangling/Project%20-%20Data%20Wrangling/audited_street_names_full.txt">Full list of corrected street names</a>

#### Sample of Corrected Street Names
<img style="float: left;" src="https://raw.githubusercontent.com/trenton3983/UDACITY/master/01_Data_Analyst/03_Data_Wrangling/Images/project_fixed_street_names.PNG">
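The audit step that finds candidate street types can be sketched as follows. This is an assumption about the general approach, not a copy of audit_street_names.py: the `EXPECTED` set, the regular expression, and the function name `audit_street_types` are all hypothetical, and the project's actual list of expected types may differ.

```python
import re
from collections import defaultdict

# Assumed list of acceptable street types; the project's list may differ.
EXPECTED = {'Street', 'Avenue', 'Boulevard', 'Drive', 'Road',
            'Court', 'Lane', 'Place', 'Way'}

# Matches the last whitespace-delimited token of a street name.
LAST_TOKEN_RE = re.compile(r'\S+$')


def audit_street_types(street_names, expected=EXPECTED):
    """Group street names by their final token when it is not an expected type."""
    unexpected = defaultdict(set)
    for name in street_names:
        match = LAST_TOKEN_RE.search(name.strip())
        if match and match.group() not in expected:
            unexpected[match.group()].add(name)
    return dict(unexpected)


# Illustrative usage with made-up names:
# audit_street_types(['SE Stark St', 'NW 23rd Avenue', 'Main St.'])
```

The keys of the returned dictionary are the candidates for a correction mapping; reviewing them by hand avoids mapping tokens that only look like abbreviations.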

<a href="#osm_file_issues">Back to Corrected OSM File Issues</a>
<a id='add_clean'></a>

### Additional Cleaning

```sql
SELECT value
FROM (SELECT * FROM nodes_tags UNION ALL
	SELECT * FROM ways_tags) tags
WHERE key='phone'
GROUP BY value;
```

The table below shows the various formats in which phone numbers appear.  They should be converted to a single standard format for consistency.

<img style="float: left;" src="https://raw.githubusercontent.com/trenton3983/UDACITY/master/01_Data_Analyst/03_Data_Wrangling/Images/project_phone_numbers.PNG">
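One way the phone numbers could be standardized is sketched below. The target format `+1-AAA-PPP-NNNN` is an assumption chosen for illustration, and `normalize_phone` is a hypothetical function that is not part of the project scripts. It follows the same convention as `fix_zip_codes` by returning an empty string when the value cannot be parsed.

```python
import re


def normalize_phone(raw):
    """Return a US number as '+1-AAA-PPP-NNNN', or '' if it cannot be parsed."""
    # Keep digits only, dropping spaces, dots, dashes, parentheses and '+'
    digits = re.sub(r'\D', '', raw)
    # Drop a leading US country code
    if len(digits) == 11 and digits.startswith('1'):
        digits = digits[1:]
    # Anything other than 10 digits is treated as unparseable
    if len(digits) != 10:
        return ''
    return '+1-{}-{}-{}'.format(digits[:3], digits[3:6], digits[6:])


print(normalize_phone('(503) 555-0100'))   # +1-503-555-0100
print(normalize_phone('+1 503.555.0100'))  # +1-503-555-0100
```

Values with extensions or multiple numbers in one field would come back empty under this sketch and would need separate handling.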

<a href="#top_of_doc">Back to Top</a>
<a id='file_db_overview'></a>

## File & Database Overview
***
* This section contains basic statistics about the Portland Metro OSM dataset and the SQLite queries used.
* <a href="https://github.com/trenton3983/UDACITY/raw/master/01_Data_Analyst/03_Data_Wrangling/Project%20-%20Data%20Wrangling/portland_oregon_95_sample.7z">Sample OSM</a>
* <a href="https://1drv.ms/f/s!As2Kq3LjVaCGaUsnX6Dftgp0eb8">Full Fixed DB - link will expire 2017/10/08</a>

### File Stats

<img style="float: left;" src="https://raw.githubusercontent.com/trenton3983/UDACITY/master/01_Data_Analyst/03_Data_Wrangling/Images/project_filename_stats.PNG">

### Number of Nodes

```sql
SELECT COUNT(*) FROM nodes;
```

6,627,751

### Number of Ways

```sql
SELECT COUNT(*) FROM ways;
```

865,354

### Number of Distinct Contributors

```sql
SELECT COUNT(DISTINCT(users.uid))
FROM (SELECT uid FROM nodes UNION ALL
	SELECT uid FROM ways) users;
```

1,392
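The three queries above can also be run from Python with the standard `sqlite3` module. This is a minimal sketch: `table_counts` is a hypothetical helper, and the database file name in the usage comment is a placeholder rather than the project's actual file.

```python
import sqlite3


def table_counts(conn):
    """Return node, way and distinct contributor counts from an open connection."""
    cur = conn.cursor()
    nodes = cur.execute('SELECT COUNT(*) FROM nodes;').fetchone()[0]
    ways = cur.execute('SELECT COUNT(*) FROM ways;').fetchone()[0]
    contributors = cur.execute(
        'SELECT COUNT(DISTINCT users.uid) '
        'FROM (SELECT uid FROM nodes UNION ALL SELECT uid FROM ways) users;'
    ).fetchone()[0]
    return {'nodes': nodes, 'ways': ways, 'contributors': contributors}


# Usage (placeholder file name):
# conn = sqlite3.connect('portland.db')
# print(table_counts(conn))
```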

<a href="#top_of_doc">Back To Top</a>
<a id='db_exploration'></a>

## Database Exploration
***

* This section highlights the basic topics of exploration from the dataset and the associated SQLite queries.

<a id='city_name_count'></a>

### City Name Count

* The OSM file encompasses 74 cities.

```sql
SELECT tags.value, COUNT(*) as count
FROM (SELECT * FROM nodes_tags UNION ALL
	SELECT * FROM ways_tags) tags
WHERE tags.key LIKE 'city'
GROUP BY tags.value
ORDER BY count DESC;
```

<img style="float: left;" src="https://raw.githubusercontent.com/trenton3983/UDACITY/master/01_Data_Analyst/03_Data_Wrangling/Images/project_city_count.PNG">

<a id='zip_code_count'></a>

### Zip Code Count

* The OSM file encompasses 116 zip codes.

```sql
SELECT tags.value, COUNT(*) as count
FROM (SELECT * FROM nodes_tags UNION ALL
	SELECT * FROM ways_tags) tags
WHERE tags.key='postcode'
GROUP BY tags.value
ORDER BY count DESC;
```

<img style="float: left;" src="https://raw.githubusercontent.com/trenton3983/UDACITY/master/01_Data_Analyst/03_Data_Wrangling/Images/project_zip_code_count.PNG">

<a id='top_contributers'></a>

### Top 10 Contributors

* Total user contributions: 7,493,105 (nodes plus ways) by 1,392 users.
* The top 2 contributors account for 51.5% of the entries, and the top 11 for 88.7%.

```sql
SELECT contrib.user, COUNT(*) as count
FROM (SELECT user FROM nodes
	UNION ALL SELECT user FROM ways) contrib
GROUP BY contrib.user
ORDER BY count DESC
LIMIT 10;
```
<img style="float: left;" src="https://raw.githubusercontent.com/trenton3983/UDACITY/master/01_Data_Analyst/03_Data_Wrangling/Images/project_top_contributers.PNG">
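The percentage shares quoted above can be derived from the same per-user counts. The sketch below shows one way to do that with `sqlite3`; `top_share` is a hypothetical helper and was not necessarily how the figures in this report were computed.

```python
import sqlite3

# Same grouping as the query above, without the LIMIT so the total is available
USER_COUNTS_SQL = """
SELECT contrib.user, COUNT(*) AS count
FROM (SELECT user FROM nodes
    UNION ALL SELECT user FROM ways) contrib
GROUP BY contrib.user
ORDER BY count DESC;
"""


def top_share(conn, n):
    """Percentage of all node and way entries made by the n most active users."""
    counts = [row[1] for row in conn.execute(USER_COUNTS_SQL)]
    total = sum(counts)
    if total == 0:
        return 0.0
    return 100.0 * sum(counts[:n]) / total


# Usage (placeholder file name):
# conn = sqlite3.connect('portland.db')
# print(round(top_share(conn, 2), 1))
```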


<a href="#top_of_doc">Back To Top</a>
<a id='int_notes'></a>

## Interesting Explorations
***

* Delving into the data shows how much Portland appreciates parking, biking, and coffee.  Apparently we like swimming too, even though it's only sunny for 3 months of the year.

### Top Amenities

```sql
SELECT tags.value, COUNT(*) as count
FROM (SELECT * FROM nodes_tags UNION ALL
	SELECT * FROM ways_tags) tags
WHERE tags.key='amenity'
GROUP BY tags.value
ORDER BY count DESC;
```
<img style="float: left;" src="https://raw.githubusercontent.com/trenton3983/UDACITY/master/01_Data_Analyst/03_Data_Wrangling/Images/project_amenity.PNG">


### Top Cuisine

```sql
SELECT value, COUNT(*) as count
FROM (SELECT * FROM nodes_tags UNION ALL
	SELECT * FROM ways_tags) tags
WHERE key='cuisine'
GROUP BY value
ORDER BY count DESC;
```
<img style="float: left;" src="https://raw.githubusercontent.com/trenton3983/UDACITY/master/01_Data_Analyst/03_Data_Wrangling/Images/project_cuisines.PNG">

### Sports Facilities

```sql
SELECT value, COUNT(*) as count
FROM (SELECT * FROM nodes_tags UNION ALL
	SELECT * FROM ways_tags) tags
WHERE key='sport'
GROUP BY value
ORDER BY count DESC;
```
<img style="float: left;" src="https://raw.githubusercontent.com/trenton3983/UDACITY/master/01_Data_Analyst/03_Data_Wrangling/Images/project_sports.PNG">


<a href="#top_of_doc">Back To Top</a>
<a id='other_ideas'></a>

## Other Ideas About the Dataset
***



#### Improving the Dataset
* Increase the number of contributors, particularly in rural or less frequented locations.  Based upon <a href="#top_contributers">Top 10 Contributors</a>, most of the data comes from the top 11 users, and <a href="#city_name_count">City Name Count</a> shows that, of the 74 cities in the dataset, the vast majority of the data is for Portland, while some of the smaller cities have a count of only 1.  OSM describes itself as "... a map of the world, created by people like you and free to use under an open license."  I had never heard of OSM prior to this project, so local outreach such as <a href="https://www.meetup.com/OpenStreetMap-Portland/">Meetup: OpenStreetMap Portland</a>, extended to other communities, might increase the user base.
* Another idea for improving OSM is to import large datasets from other applications that have many users and geospatial data, such as Google Maps, Apple Maps, or Pokemon Go.

#### Benefits
* The most obvious benefit is that more users means more data.

#### Potential Issues
* The main issue with attracting more users is probably the process of reaching people who may be interested.
    * Meetups are mostly free, but the volume is low.
    * People have a tendency to ignore website ads.
    * Commercials cost money.
* Once a potential new user is found, there are additional roadblocks:
    * Monetary constraints with <a href="http://wiki.openstreetmap.org/wiki/Recording_GPS_tracks">GPS equipment</a> acquisition
    * Personal time constraints
    * Technical hurdles:
        * <a href="http://wiki.openstreetmap.org/wiki/How_to_contribute">How to Contribute</a>
        * <a href="http://wiki.openstreetmap.org/wiki/Contribute_map_data">Contribute Map Data</a>
* Large data imports from outside sources:
    * Goes against the idea of a community based map
    * "We are only interested in 'free' data. We must be able to release the data with our OpenStreetMap License"
    * There are additional technical hurdles related to importing data
        * <a href="http://wiki.openstreetmap.org/wiki/Import/Guidelines">OSM Import Guidelines</a>
        * The <a href="http://wiki.openstreetmap.org/wiki/TIGER">TIGER import</a> had to be spread over several months to prevent overloading the OSM servers
        
#### If You're Interested in Contributing to OpenStreetMap
* <a href="http://wiki.openstreetmap.org/wiki/Beginners%27_guide">Beginner's Guide</a>


<a href="#top_of_doc">Back To Top</a>
<a id='conclusion'></a>

## Conclusion
***

Based upon the collected data, as shown in <a href="#osm_file_issues">Corrected OSM File Issues</a>, there is a relatively small number of issues.  Specifically, only 40 city names and 50 zip codes required standardization.  Additionally, fewer than 240 street names were transformed from short form to long form.

As mentioned in <a href="#other_ideas">Other Ideas About the Dataset</a>, the Portland data is very thorough, but the more rural communities surrounding Portland would benefit from more users and data.  Bringing awareness of the OSM project and its benefits in terms of data availability to potential new users seems to be an integral component of the continued success of OSM.