### Video Games Dataset: EDA
#### 1. Describe Dataset
- **Who:** The data was acquired from Kaggle and supplied by the user Gregory Smith (https://www.kaggle.com/gregorut/videogamesales). The data was scraped from www.vgchartz.com. 
- **What:** The dataset lists video games with sales greater than 100,000 copies. For each game it records the platform it was released on, year of release, genre, publisher, sales in North America (NA), Japan (JP), Europe (EU) and the rest of the world, and global (total) sales. It also includes each game's rank by overall sales. The Year values in the file range from 1980 to 2020. **NOTE: Sales are in millions**
- **When:** The data set was last updated 4 years ago, and its Year values run from 1980 up to 2020.
- **Why:** The video game industry is very competitive yet profitable. Big companies with large amounts of resources have an edge over smaller ones, but many small companies have recently found huge success. This matters beyond game creation: for streamers, for example, playing a game before it becomes mainstream might give an edge against bigger-name streamers. With this data set, we can gain insight into general questions such as the performance of companies and the most popular titles and genres. We can also dive deeper and look at changing genre popularity over time, regional preferences in game genres and platforms, up-and-coming publishers, etc.
- **How:** The data set was scraped from the www.vgchartz.com website using BeautifulSoup. The scraping script can be found here (https://github.com/GregorUT/vgchartzScrape).

#### 2. Load Dataset

In [48]:
import pandas as pd
import numpy as np
import altair as alt
from altair_saver import save
alt.renderers.enable('mimetype')
alt.data_transformers.enable('data_server')

game = pd.read_csv("vgsales.csv")
game.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16598 entries, 0 to 16597
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Rank          16598 non-null  int64  
 1   Name          16598 non-null  object 
 2   Platform      16598 non-null  object 
 3   Year          16327 non-null  float64
 4   Genre         16598 non-null  object 
 5   Publisher     16540 non-null  object 
 6   NA_Sales      16598 non-null  float64
 7   EU_Sales      16598 non-null  float64
 8   JP_Sales      16598 non-null  float64
 9   Other_Sales   16598 non-null  float64
 10  Global_Sales  16598 non-null  float64
dtypes: float64(6), int64(1), object(4)
memory usage: 1.4+ MB
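
Since the description says `Global_Sales` is the total of the regional columns, one quick sanity check is to compare the two. The following is a minimal sketch, not part of the original notebook: the helper name `sales_mismatches`, the tolerance of 0.02 (to allow for rounding of values stored in millions), and the three sample rows are all assumptions made for illustration. On the real data it would be called as `sales_mismatches(game)`.

```python
import pandas as pd

REGION_COLS = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]

def sales_mismatches(df, tol=0.02):
    """Return rows where Global_Sales differs from the regional sum by more than tol (millions)."""
    regional_total = df[REGION_COLS].sum(axis=1)
    diff = (df["Global_Sales"] - regional_total).abs()
    return df[diff > tol]

# Made-up rows for illustration only; these are NOT values from vgsales.csv.
sample = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C"],
    "NA_Sales": [1.00, 0.50, 0.10],
    "EU_Sales": [0.50, 0.20, 0.05],
    "JP_Sales": [0.25, 0.10, 0.00],
    "Other_Sales": [0.05, 0.05, 0.01],
    "Global_Sales": [1.80, 0.85, 0.50],  # Game C is deliberately inconsistent
})

mismatched = sales_mismatches(sample)
print(mismatched[["Name", "Global_Sales"]])
```

Any rows returned would be worth inspecting before relying on `Global_Sales` for rankings.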


#### 3. Explore Dataset

In [49]:
# Unique values and their counts for the categorical columns
print("\nPlatform:\n",game.Platform.unique(),"\nCount: ",game.Platform.nunique())
print("\nYear\n",game.Year.unique(),"\nCount: ",game.Year.nunique())
print("\nGenre\n",game.Genre.unique(),"\nCount: ",game.Genre.nunique())
# Only the first 15 publishers are printed; the count covers all of them
print("\nPublishers\n",game.Publisher.unique()[0:15],"\nCount: ",game.Publisher.nunique())

# Top 5 games by sales in each region and globally (first five columns only)
print(game.sort_values("NA_Sales",ascending=False).head(5).iloc[:,0:5])
print(game.sort_values("EU_Sales",ascending=False).head(5).iloc[:,0:5])
print(game.sort_values("JP_Sales",ascending=False).head(5).iloc[:,0:5])
print(game.sort_values("Global_Sales",ascending=False).head(5).iloc[:,0:5])


Platform:
 ['Wii' 'NES' 'GB' 'DS' 'X360' 'PS3' 'PS2' 'SNES' 'GBA' '3DS' 'PS4' 'N64'
 'PS' 'XB' 'PC' '2600' 'PSP' 'XOne' 'GC' 'WiiU' 'GEN' 'DC' 'PSV' 'SAT'
 'SCD' 'WS' 'NG' 'TG16' '3DO' 'GG' 'PCFX'] 
Count:  31

Year
 [2006. 1985. 2008. 2009. 1996. 1989. 1984. 2005. 1999. 2007. 2010. 2013.
 2004. 1990. 1988. 2002. 2001. 2011. 1998. 2015. 2012. 2014. 1992. 1997.
 1993. 1994. 1982. 2003. 1986. 2000.   nan 1995. 2016. 1991. 1981. 1987.
 1980. 1983. 2020. 2017.] 
Count:  39

Genre
 ['Sports' 'Platform' 'Racing' 'Role-Playing' 'Puzzle' 'Misc' 'Shooter'
 'Simulation' 'Action' 'Fighting' 'Adventure' 'Strategy'] 
Count:  12

Publishers
 ['Nintendo' 'Microsoft Game Studios' 'Take-Two Interactive'
 'Sony Computer Entertainment' 'Activision' 'Ubisoft' 'Bethesda Softworks'
 'Electronic Arts' 'Sega' 'SquareSoft' 'Atari' '505 Games' 'Capcom'
 'GT Interactive' 'Konami Digital Entertainment'] 
Count:  578
   Rank               Name Platform    Year     Genre
0     1         Wii Sports      Wii  2006.0
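
The four top-5 listings above repeat the same sort-and-slice pattern. A possible way to factor that out is sketched below using `DataFrame.nlargest`. The helper name `top_games` and the sample rows are hypothetical and not from the original notebook. Unlike `iloc[:,0:5]`, this version also keeps the sales column being ranked, so the ordering is visible in the output.

```python
import pandas as pd

ID_COLS = ["Rank", "Name", "Platform", "Year", "Genre"]

def top_games(df, sales_col, n=5):
    """Return the n rows with the largest sales_col, keeping the ID columns plus that column."""
    return df.nlargest(n, sales_col)[ID_COLS + [sales_col]]

# Made-up rows for illustration only; these are NOT values from vgsales.csv.
sample = pd.DataFrame({
    "Rank": [1, 2, 3],
    "Name": ["Game A", "Game B", "Game C"],
    "Platform": ["Wii", "NES", "DS"],
    "Year": [2006.0, 1985.0, 2008.0],
    "Genre": ["Sports", "Platform", "Racing"],
    "NA_Sales": [3.0, 5.0, 1.0],
    "JP_Sales": [2.0, 0.5, 4.0],
})

top_na = top_games(sample, "NA_Sales", n=2)
top_jp = top_games(sample, "JP_Sales", n=2)
print(top_na)
print(top_jp)
```

On the real data this could be looped over `["NA_Sales", "EU_Sales", "JP_Sales", "Global_Sales"]` with `game` as the first argument.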

#### 4. Initial thoughts?
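- We have null values in Year (271 of 16,598 rows) and Publisher (58 rows)
- Year is stored as a float; we could convert it to an integer type to make it prettier, but the null values must be handled first (or a nullable integer type used)
- We have 31 unique Platforms
- We have 39 unique years, not counting nan (`nunique()` ignores nulls by default)
- We have 12 unique genres
- We have 578 unique publishers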
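
The points about null values and the float-typed Year column can be checked and addressed along these lines. This is only a sketch under assumptions: the helper name `summarize_and_fix_year` and the sample rows are made up for illustration, and the nullable `Int64` dtype is one option among several (dropping or imputing the missing years would be others).

```python
import numpy as np
import pandas as pd

def summarize_and_fix_year(df):
    """Count nulls per column and return a copy with Year as nullable Int64."""
    null_counts = df.isna().sum()
    cleaned = df.copy()
    # "Int64" (capital I) is pandas' nullable integer dtype; missing years become <NA>
    cleaned["Year"] = cleaned["Year"].astype("Int64")
    return null_counts, cleaned

# Made-up rows for illustration only; these are NOT values from vgsales.csv.
sample = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C"],
    "Year": [2006.0, np.nan, 1985.0],
    "Publisher": ["Nintendo", None, "Nintendo"],
})

null_counts, cleaned = summarize_and_fix_year(sample)
print(null_counts)
print(cleaned.dtypes)

# nunique() skips nulls unless dropna=False is passed
years_without_nan = sample["Year"].nunique()
years_with_nan = sample["Year"].nunique(dropna=False)
print(years_without_nan, years_with_nan)
```

Keeping the missing years as `<NA>` avoids silently dropping rows, at the cost of having to remember them in any later year-based aggregation.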

#### 5. Wrangling

#### 6. Research Questions/Visualization+Analysis

#### 7. Future Studies