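# Assignment 3: Video Game Sales Analysis
## Background

This code is a course assignment for the 2021 Data Mining course at the School of Computer Science, Beijing Institute of Technology.
+ Project home page: [DataMining-Course-of-BIT](https://github.com/tenkeyseven/DataMining-Course-of-BIT)
+ Detailed description of this assignment: [Assignments 2](https://github.com/tenkeyseven/DataMining-Course-of-BIT/tree/main/Assignments-2)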

## Dataset
+ [Video Game Sales](https://www.kaggle.com/gregorut/videogamesales): Analyze sales data from more than 16,500 games.

## Code
+ The notebook that processes the [Video Game Sales](https://www.kaggle.com/gregorut/videogamesales) dataset is [video-game-sales.ipynb](./video-game-sales.ipynb)

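## Assignment requirements
+ Video game market analysis: popular games, genres, platforms, publishers, and so on
+ Predict annual video game sales
+ Visualization: how to present the sales story completely and clearly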

## Procedure

### 1. Loading the data


In [2]:
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import tqdm
from rich.progress import track
from sklearn.linear_model import LinearRegression
import numpy as np
import warnings
warnings.filterwarnings('ignore')

In [3]:
data = pd.read_csv('../Datasets/vgsales.csv')
print('Number of columns:', len(data.columns))
print('Number of rows:', len(data))
print('Sample rows:')
data.head(5)

Number of columns: 11
Number of rows: 16598
Sample rows:


|   | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Wii Sports | Wii | 2006.0 | Sports | Nintendo | 41.49 | 29.02 | 3.77 | 8.46 | 82.74 |
| 1 | 2 | Super Mario Bros. | NES | 1985.0 | Platform | Nintendo | 29.08 | 3.58 | 6.81 | 0.77 | 40.24 |
| 2 | 3 | Mario Kart Wii | Wii | 2008.0 | Racing | Nintendo | 15.85 | 12.88 | 3.79 | 3.31 | 35.82 |
| 3 | 4 | Wii Sports Resort | Wii | 2009.0 | Sports | Nintendo | 15.75 | 11.01 | 3.28 | 2.96 | 33.0 |
| 4 | 5 | Pokemon Red/Pokemon Blue | GB | 1996.0 | Role-Playing | Nintendo | 11.27 | 8.89 | 10.22 | 1.0 | 31.37 |


### 2. Data cleaning

In [6]:
data.isnull().sum()[data.isnull().sum()!=0]


Year         271
Publisher     58
dtype: int64

In [7]:
data = data.dropna(how='any')
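
The cleaning step above drops every row that has at least one missing value. As a minimal sketch of what that does, the example below applies the same `dropna(how='any')` call to a small hand-made frame and then casts `Year` to an integer, which becomes possible once the missing years are gone. The toy rows are illustrative and not taken from `vgsales.csv`, and the integer cast is a suggestion that is not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for vgsales.csv; the values are made up for illustration.
toy = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C", "Game D"],
    "Year": [2006.0, np.nan, 2010.0, 2012.0],
    "Publisher": ["Pub X", "Pub Y", None, "Pub X"],
    "Global_Sales": [5.0, 1.2, 0.8, 2.5],
})

# Count missing values per column, keeping only the columns that have any.
missing = toy.isnull().sum()
missing = missing[missing != 0]

# Drop rows with at least one missing value (same call as in the notebook).
cleaned = toy.dropna(how="any").copy()

# Year is stored as float only because NaN forces it; cast once NaN is gone.
cleaned["Year"] = cleaned["Year"].astype(int)

rows_dropped = len(toy) - len(cleaned)
print(missing)
print("rows dropped:", rows_dropped)
```

On the real dataset the per-column counts are 271 and 58, so this step removes at most 329 rows; the exact number is lower if some rows are missing both fields.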

### 3.1 Most popular games

In [9]:
data_rank = data[['Rank', 'Name']].sort_values(by='Rank')
data_rank.head(10)

|   | Rank | Name |
|---|---|---|
| 0 | 1 | Wii Sports |
| 1 | 2 | Super Mario Bros. |
| 2 | 3 | Mario Kart Wii |
| 3 | 4 | Wii Sports Resort |
| 4 | 5 | Pokemon Red/Pokemon Blue |
| 5 | 6 | Tetris |
| 6 | 7 | New Super Mario Bros. |
| 7 | 8 | Wii Play |
| 8 | 9 | New Super Mario Bros. Wii |
| 9 | 10 | Duck Hunt |
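
The ranking above relies on the dataset's precomputed `Rank` column. In the sample rows, `Rank` follows `Global_Sales` in descending order, so a quick cross-check is to compare it with `DataFrame.nlargest`. The sketch below does this on toy rows (illustrative values), assuming `Rank` is derived from `Global_Sales`; that assumption is not confirmed by the source.

```python
import pandas as pd

# Toy rows, made up for illustration; Rank is assumed to follow Global_Sales.
toy = pd.DataFrame({
    "Rank": [1, 2, 3, 4],
    "Name": ["Game A", "Game B", "Game C", "Game D"],
    "Global_Sales": [9.0, 7.5, 7.0, 2.0],
})

# Top 3 according to the precomputed rank (the approach used in the notebook).
by_rank = toy.sort_values(by="Rank")["Name"].head(3).tolist()

# Top 3 according to the sales figure itself.
by_sales = toy.nlargest(3, "Global_Sales")["Name"].tolist()

rankings_agree = by_rank == by_sales
print("rankings agree:", rankings_agree)
```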


### 3.2 Most popular genres

In [12]:
data_genre=data[['Genre', 'Global_Sales']].groupby('Genre').sum().sort_values(by='Global_Sales', ascending=False)
data_genre.head(10)

| Genre | Global_Sales |
|---|---|
| Action | 1722.84 |
| Sports | 1309.24 |
| Shooter | 1026.2 |
| Role-Playing | 923.83 |
| Platform | 829.13 |
| Misc | 789.87 |
| Racing | 726.76 |
| Fighting | 444.05 |
| Simulation | 389.98 |
| Puzzle | 242.21 |
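
Absolute totals are easier to compare when expressed as a share of all sales. The following sketch, using toy rows with illustrative values, computes each genre's percentage of total `Global_Sales`; the variable names are hypothetical and this step is not part of the original notebook.

```python
import pandas as pd

# Toy rows, made up for illustration.
toy = pd.DataFrame({
    "Genre": ["Action", "Sports", "Action", "Puzzle"],
    "Global_Sales": [3.0, 4.0, 2.0, 1.0],
})

# Total sales per genre, largest first (same grouping as in the notebook).
genre_totals = (
    toy.groupby("Genre")["Global_Sales"].sum().sort_values(ascending=False)
)

# Each genre's share of all sales, in percent, rounded to one decimal place.
genre_share = (genre_totals / genre_totals.sum() * 100).round(1)
print(genre_share)
```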


### 3.3 Most popular platforms

In [13]:
data_platform=data[['Platform', 'Global_Sales']].groupby('Platform').sum().sort_values(by='Global_Sales', ascending=False)
data_platform.head(10)

| Platform | Global_Sales |
|---|---|
| PS2 | 1233.46 |
| X360 | 969.6 |
| PS3 | 949.35 |
| Wii | 909.81 |
| DS | 818.91 |
| PS | 727.39 |
| GBA | 305.62 |
| PSP | 291.71 |
| PS4 | 278.1 |
| PC | 254.7 |
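
Since the assignment asks for a clear visual presentation, a ranked table like the one above can be drawn as a horizontal bar chart. This is a minimal sketch with matplotlib on a toy series (illustrative values); the figure size and labels are arbitrary choices, not the notebook's own plotting code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy per-platform totals, made up for illustration.
toy_totals = pd.Series({"PS2": 5.0, "Wii": 4.0, "DS": 1.0}, name="Global_Sales")

# Sort ascending so the largest bar ends up at the top of the chart.
ordered = toy_totals.sort_values()

fig, ax = plt.subplots(figsize=(6, 3))
ax.barh(list(ordered.index), ordered.values)
ax.set_xlabel("Global_Sales")
ax.set_title("Total sales by platform (toy data)")
fig.tight_layout()
```

Inside a notebook with `%matplotlib inline`, the `matplotlib.use("Agg")` line can be dropped and the figure is displayed when the cell finishes.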


### 3.4 Most popular publishers

In [14]:
data_publisher=data[['Publisher', 'Global_Sales']].groupby('Publisher').sum().sort_values(by='Global_Sales', ascending=False)
data_publisher.head(10)

| Publisher | Global_Sales |
|---|---|
| Nintendo | 1784.43 |
| Electronic Arts | 1093.39 |
| Activision | 721.41 |
| Sony Computer Entertainment | 607.28 |
| Ubisoft | 473.54 |
| Take-Two Interactive | 399.3 |
| THQ | 340.44 |
| Konami Digital Entertainment | 278.56 |
| Sega | 270.7 |
| Namco Bandai Games | 253.65 |
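
Sections 3.2 to 3.4 repeat one pattern: group by a column, sum `Global_Sales`, and sort in descending order. A minimal sketch of that pattern as a reusable function follows. `top_by_sales` is a hypothetical helper name and the toy rows are illustrative; unlike the notebook cells, it returns a Series instead of a one-column DataFrame.

```python
import pandas as pd


def top_by_sales(df, key, n=10, value="Global_Sales"):
    """Return the n groups of `key` with the largest summed `value`."""
    totals = df.groupby(key)[value].sum()
    return totals.sort_values(ascending=False).head(n)


# Toy rows, made up for illustration.
toy = pd.DataFrame({
    "Genre": ["Action", "Sports", "Action", "Puzzle"],
    "Platform": ["PS2", "Wii", "PS2", "DS"],
    "Global_Sales": [3.0, 4.0, 2.0, 1.0],
})

top_genres = top_by_sales(toy, "Genre", n=2)
top_platforms = top_by_sales(toy, "Platform", n=2)
print(top_genres)
print(top_platforms)
```

With the real data, the three tables above would correspond to calls such as `top_by_sales(data, 'Genre')`, `top_by_sales(data, 'Platform')`, and `top_by_sales(data, 'Publisher')`.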


### 4. Predicting annual video game sales
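
As a rough illustration of the task, one possible approach is to sum `Global_Sales` per year and fit a straight line with the `LinearRegression` class imported in section 1. The sketch below does this on toy rows that are exactly linear by construction. It assumes a linear trend over time, which may not hold for the real data, and it is not necessarily the method the notebook uses.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy rows, made up for illustration; yearly totals are 2, 3, 4, 5.
toy = pd.DataFrame({
    "Year": [2000, 2000, 2001, 2002, 2002, 2003],
    "Global_Sales": [1.0, 1.0, 3.0, 2.0, 2.0, 5.0],
})

# Aggregate sales per year.
yearly = toy.groupby("Year")["Global_Sales"].sum()

# scikit-learn expects a 2-D feature matrix: one row per year, one column.
X = yearly.index.to_numpy().reshape(-1, 1)
y = yearly.to_numpy()

model = LinearRegression().fit(X, y)

# Extrapolate one year past the last observed year.
predicted_2004 = model.predict(np.array([[2004]]))[0]
print("predicted 2004 total:", round(predicted_2004, 2))
```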