# GameZone

### Console Wars

### Author: Wenyi Xu

Discuss the epic console war amongst **Sony**, **Microsoft** and **Nintendo**.

Scrape the web to collect datasets for analyzing and visualizing sales and other statistics across platforms.

## Background Information

### PlayStation Release Timeline

* PS1 Release Date - Saturday, December 3, 1994

* PS2 Release Date - Saturday, March 4, 2000

* PS3 Release Date - Saturday, November 11, 2006

* PS4 Release Date - Friday, November 15, 2013

* PS4 Pro Release Date - Thursday, November 10, 2016

### Xbox Release Timeline

* Xbox Release Date - November 15, 2001

* Xbox 360 Release Date - November 22, 2005

* Xbox One Release Date - November 22, 2013

* Xbox One X Release Date - November 7, 2017

### Nintendo Platforms Release Timeline

* NES Release Date - July 15, 1983

* Nintendo 64 Release Date - June 23, 1996

* GameCube Release Date - September 14, 2001

* Wii Release Date - November 19, 2006

* Wii U Release Date - November 18, 2012

* Switch Release Date - March 3, 2017


## Part 0: Scrape VGChartz

Scrape VGChartz to get useful datasets about game and hardware sales for different platforms.

I could only find sales data for home video game consoles on VGChartz starting from 25th Nov 2006 (PS3, Xbox 360 and Wii), so the data for the PS2 and earlier consoles needs to be extracted from other sources.
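To make the coverage gap concrete, here is a minimal sketch that encodes the launch dates from the timelines above and lists which consoles launched before the first VGChartz weekly page. `RELEASE_DATES`, `VGCHARTZ_START` and `launched_before` are hypothetical names introduced for illustration, and the mid-generation refreshes (PS4 Pro, Xbox One X) are left out for brevity.

```python
from datetime import datetime

# Launch dates copied from the timelines above (hypothetical helper data;
# mid-generation refreshes such as PS4 Pro and Xbox One X are omitted)
RELEASE_DATES = {
    'PS1': datetime(1994, 12, 3),
    'PS2': datetime(2000, 3, 4),
    'PS3': datetime(2006, 11, 11),
    'PS4': datetime(2013, 11, 15),
    'Xbox': datetime(2001, 11, 15),
    'Xbox 360': datetime(2005, 11, 22),
    'Xbox One': datetime(2013, 11, 22),
    'NES': datetime(1983, 7, 15),
    'Nintendo 64': datetime(1996, 6, 23),
    'GameCube': datetime(2001, 9, 14),
    'Wii': datetime(2006, 11, 19),
    'Wii U': datetime(2012, 11, 18),
    'Switch': datetime(2017, 3, 3),
}

# First week covered by the VGChartz global weekly pages
VGCHARTZ_START = datetime(2006, 11, 25)


def launched_before(cutoff, release_dates=RELEASE_DATES):
    """Return the consoles released before `cutoff`, oldest first."""
    early = [name for name, day in release_dates.items() if day < cutoff]
    return sorted(early, key=release_dates.get)


print(launched_before(VGCHARTZ_START))
```

In this sketch the PS3, Xbox 360 and Wii also come out as launched before the cutoff, so their launch weeks appear to fall outside the weekly pages as well, even though the consoles themselves are covered.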

```
%%html
<style>
    table {
        display: inline-block
    }
</style>
```

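```python
import math                      # math.ceil, to round up to whole weeks
from datetime import datetime    # dates of the weekly pages

import urllib3                   # HTTP requests
from bs4 import BeautifulSoup    # HTML parsing (the "lxml" parser used below must also be installed)
```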

The VGChartz pages that contain the data we need:

```python
# Main page
vgchartz_main_url = 'http://www.vgchartz.com/'

# Global weekly sales pages:
# http://www.vgchartz.com/weekly/[date_id]/Global/
# date_id runs from 39047 (25th Nov 2006) to 43058 (18th Nov 2017)
date_id_start = 39047
date_start = datetime(2006, 11, 25)
date_id_end = 43058
date_end = datetime(2017, 11, 18)

# Weekly page used in the examples below (the first week)
date_id = date_id_start

vgchartz_weekly_url_head = 'http://www.vgchartz.com/weekly/'
vgchartz_weekly_url_tail = '/Global/'
vgchartz_weekly_url = vgchartz_weekly_url_head + str(date_id) + vgchartz_weekly_url_tail
```

Check that the difference between the two dates equals the difference between their date ids:

```python
print(date_end - date_start)
print(date_id_end - date_id_start)
```

```
4011 days, 0:00:00
4011
```
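Because the id difference equals the day difference, the mapping can also run backwards, turning a weekly page id into its calendar date. This is a minimal sketch, assuming the one-id-per-day relationship checked above holds across the whole range; `date_from_id` is a hypothetical helper, and the constants are restated so the snippet runs on its own.

```python
from datetime import datetime, timedelta

# Constants restated from the cell above
date_id_start = 39047
date_start = datetime(2006, 11, 25)


def date_from_id(date_id):
    """Map a VGChartz weekly id back to a date, assuming 1 id == 1 day."""
    return date_start + timedelta(days=date_id - date_id_start)


print(date_from_id(43058))  # 2017-11-18 00:00:00
```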


### Function get_weekly_url

We have:

* Start date: `date_start` (25th Nov 2006)

* Start date id: `date_id_start` (39047)

* End date: `date_end` (18th Nov 2017)

* End date id: `date_id_end` (43058)

Given any date between 25th Nov 2006 and 18th Nov 2017, we first compute the id of its week (`date_id`), rounding up to the first weekly page dated on or after it, and then build its `vgchartz_weekly_url`.
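Written as a formula, with $d_0$ = `date_start`, $i_0$ = `date_id_start`, and $\Delta$ the number of whole days from $d_0$ to the requested date $d$ (symbols introduced here for illustration), the rounding amounts to:

$$
i(d) = i_0 + 7 \left\lceil \frac{\Delta}{7} \right\rceil
$$

For example, 17th Feb 2007 is $\Delta = 84$ days after 25th Nov 2006, so $i = 39047 + 7 \cdot 12 = 39131$, while 26th Nov 2006 ($\Delta = 1$) rounds up to $39047 + 7 = 39054$.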


```python
# date: a datetime.datetime between date_start and date_end (inclusive)
def get_weekly_url(date):
    # Validate the input; errors are returned as strings so the tests below can check them
    if not isinstance(date, datetime):
        return 'Date must be type datetime'
    if date < date_start or date > date_end:
        return 'Date must be from 25th Nov 2006 to 18th Nov 2017'

    # Whole days between the requested date and the first weekly page
    day_diff = (date - date_start).days
    # Round up to the next weekly page (ids step by 7, one id per day)
    week_diff = math.ceil(day_diff / 7.0)
    date_id = date_id_start + week_diff * 7

    return vgchartz_weekly_url_head + str(date_id) + vgchartz_weekly_url_tail
```

```python
# Test the function get_weekly_url
print(get_weekly_url('25th Nov 2006') == 'Date must be type datetime')
print(get_weekly_url(datetime(2000, 11, 20)) == 'Date must be from 25th Nov 2006 to 18th Nov 2017')
print(get_weekly_url(datetime(2017, 11, 20)) == 'Date must be from 25th Nov 2006 to 18th Nov 2017')
print(get_weekly_url(datetime(2007, 2, 17)) == 'http://www.vgchartz.com/weekly/39131/Global/')
```

```
True
True
True
True
```
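Since the weekly ids step by 7 from `date_id_start` to `date_id_end`, the full list of weekly pages to scrape can be generated directly. A sketch, assuming every 7th id in the range has a page (not verified here); `all_weekly_urls` is a hypothetical name and the constants are restated from above.

```python
# Constants restated from the earlier cell
date_id_start = 39047
date_id_end = 43058
vgchartz_weekly_url_head = 'http://www.vgchartz.com/weekly/'
vgchartz_weekly_url_tail = '/Global/'


def all_weekly_urls(start_id=date_id_start, end_id=date_id_end):
    """One URL per week, from start_id to end_id inclusive (ids step by 7)."""
    return [vgchartz_weekly_url_head + str(i) + vgchartz_weekly_url_tail
            for i in range(start_id, end_id + 1, 7)]


urls = all_weekly_urls()
print(len(urls))  # 574
print(urls[-1])   # http://www.vgchartz.com/weekly/43058/Global/
```

Generating the ids this way keeps them on the same 7-day grid that `get_weekly_url` rounds to.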


Create a urllib3 `PoolManager`, which manages the connection pools used for the requests:

```python
http = urllib3.PoolManager()
```

Request the main page and the first weekly page, then parse the HTML of each:

```python
# Main page
vgchartz_main_response = http.request('GET', vgchartz_main_url)
vgchartz_main_soup = BeautifulSoup(vgchartz_main_response.data, "lxml")

# Weekly page for date_id (the first week, 25th Nov 2006)
vgchartz_week_response = http.request('GET', vgchartz_weekly_url)
vgchartz_week_soup = BeautifulSoup(vgchartz_week_response.data, "lxml")
```

Now we have the **soup**, i.e. the parsed **HTML**, of the VGChartz pages.

Next, we try to extract some useful data.

Each weekly sales page has 2 useful tables: **Global Hardware by Platform** and **Global Software by Platform**.

Example of the **Global Hardware by Platform** table:

| Platform | Weekly (change) | Total |
|----------|-----------------|------------|
| DS | 905,597 (+88%) | 29,319,098 |
| Wii | 529,658 (N/A) | 529,658 |
| X360 | 361,561 (+87%) | 5,943,800 |
| PSP | 352,884 (+87%) | 17,124,390 |
| PS3 | 103,130 (-50%) | 394,937 |
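The "Weekly (change)" cells combine a unit count with a percentage change or `N/A`, and the "Total" cells use thousands separators. A minimal sketch for turning them into numbers, assuming the text format in the example table above (integer percentages only); `parse_weekly_cell` and `parse_total` are hypothetical helpers.

```python
import re

# A number with thousands separators, then "(+88%)", "(-50%)" or "(N/A)"
WEEKLY_CELL = re.compile(r'([\d,]+)\s*\((?:([+-]?\d+)%|N/A)\)')


def parse_weekly_cell(text):
    """Split a 'Weekly (change)' cell into (units, percent change or None)."""
    match = WEEKLY_CELL.search(text)
    if match is None:
        raise ValueError('Unrecognised cell: %r' % text)
    units = int(match.group(1).replace(',', ''))
    change = int(match.group(2)) if match.group(2) is not None else None
    return units, change


def parse_total(text):
    """Turn a 'Total' cell such as '29,319,098' into an int."""
    return int(text.strip().replace(',', ''))


print(parse_weekly_cell('905,597 (+88%)'))  # (905597, 88)
print(parse_weekly_cell('529,658 (N/A)'))   # (529658, None)
```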


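```python
# Each table appears to be introduced by an <h2 class="heading"> title; locate those first
table_headings = vgchartz_week_soup.find_all('h2', attrs={'class': 'heading'})
print(table_headings)
```

```
[<h2 class="heading">Global Hardware by Platform</h2>, <h2 class="heading">Global Software by Platform</h2>]
```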
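Only the headings have been located so far; the figures live in the table that follows each heading. With BeautifulSoup, one option is calling `find_next('table')` on each heading. To keep the idea self-contained, the sketch below swaps in the standard library's `html.parser` instead, and assumes each heading is an `<h2 class="heading">` followed by a plain `<table>` of `<tr>`/`<td>` rows with no nested tables; it has not been checked against the real VGChartz markup. `WeeklyTableParser` is a hypothetical name, and rows are attributed to the most recent heading.

```python
from html.parser import HTMLParser


class WeeklyTableParser(HTMLParser):
    """Collect table rows, grouped under the most recent <h2 class="heading"> title.

    Hypothetical sketch: assumes plain <tr>/<td>/<th> rows and no nested tables.
    """

    def __init__(self):
        super().__init__()
        self.tables = {}          # heading text -> list of rows (lists of cell text)
        self._heading = None      # current heading text, None until one is seen
        self._in_heading = False  # True while inside <h2 class="heading">
        self._row = None          # cells of the row being read
        self._cell = None         # text of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == 'h2' and ('class', 'heading') in attrs:
            self._in_heading = True
            self._heading = ''
        elif tag == 'tr' and self._heading is not None:
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = ''

    def handle_endtag(self, tag):
        if tag == 'h2' and self._in_heading:
            self._in_heading = False
            self._heading = self._heading.strip()
            self.tables[self._heading] = []
        elif tag in ('td', 'th') and self._cell is not None:
            self._row.append(self._cell.strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            if self._row:  # skip empty rows
                self.tables[self._heading].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_heading:
            self._heading += data
        elif self._cell is not None:
            self._cell += data


sample = ('<h2 class="heading">Global Hardware by Platform</h2>'
          '<table><tr><th>Platform</th><th>Weekly (change)</th><th>Total</th></tr>'
          '<tr><td>DS</td><td>905,597 (+88%)</td><td>29,319,098</td></tr></table>')
parser = WeeklyTableParser()
parser.feed(sample)
print(parser.tables)
```

Rows that appear before any heading are ignored, which keeps unrelated page tables out as long as they precede the first `<h2 class="heading">`.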


## Part 1: Analyze the Weekly Sales