# CMSC478 Machine Learning

# Project Proposal

## Project Title

Predicting Spending on "Fire Emblem: Heroes" Based on Interview Data

Author:
Tim Rice

Student ID: UX75258

## Problem Description

The goal of this project is to predict how much money players will spend on the game "Fire Emblem: Heroes" (FEH), based on a survey of over 4,500 people who played it. Because the survey recorded money spent as a set of ranges rather than exact amounts, this is a classification problem rather than a regression problem: each spending range serves as one class.

## Motivation

FEH is a mobile game that uses the "gacha" model, in which the game is free to download but players can make in-game transactions that cost real money. The term gacha comes from gachapon, which refers to machines that give a random reward for some amount of money. In a gacha game, in-game currency is exchanged for random rewards, and while that currency can be earned for free in the game, players can also purchase more with real money. Because the amount of free currency is limited, some players spend exorbitant amounts of money on in-game currency hoping for a reward that they want. This model is often classified as a form of gambling, which many countries ban; however, despite anti-gambling laws, many gacha games still find their way into the market and have become especially prominent on mobile platforms. The gacha model relies on the extremely small portion of players who spend extremely large amounts of money on the game in hopes of getting what they want. These players may come from different financial situations, but all exhibit the same dangerous gambling behavior that the gacha model preys on. By surveying people who play these games, one could use a machine learning approach to detect players who may be at risk for such behavior and try to intervene before potentially bad circumstances arise.

## Dataset

Min 1000 records (1 student), max 100MB<br>

Link to dataset source: https://www.kaggle.com/natalieytan/fire-emblem-heroes-survey/version/1

Dataset description: The data comes from a survey, conducted via Google Forms, of players of the mobile game FEH. It contains 104 features, each of which takes a discrete number of unique answers. The target feature will be "How much money have you spent on the game?" and the remaining features will be used as predictors. Because many of the features contain text entries, but the entries are selected from a list, the choices will be converted to numeric identifiers for computation.

## <font color="red"> Required Coding</font>

In [1]:
# Import necessary Python, sklearn and/or tensorflow/keras modules
import numpy as np
import sklearn
import pandas
import matplotlib

# Load the data
df = pandas.read_csv('FEHSurvey8All.csv')

# Get data shape via built-in methods of sklearn, pandas or tensorflow/keras
df.shape

(4677, 104)

In [2]:
# Build a dictionary mapping each feature to its answers, and convert the data to integer codes

featureDict = {}

answerDict = {}
featureName = ""

data = df.values
dataNums = np.zeros(data.shape)

for col in range(0, data.shape[1]):
    featureName = df.columns[col]
    n = 1
    
    for row in range(0,data.shape[0]):
        value = data[row][col]
        if(value not in answerDict):
            answerDict.update({value:n})
            n = n + 1
        dataNums[row][col] = answerDict[value]
            
    featureDict.update({featureName:answerDict.copy()})
    answerDict = {}

print(featureDict["What is your age?"])
print(dataNums[4][3])
print(dataNums.shape)

{'19 – 21 years old': 1, '16 – 18 years old': 2, '22 – 24 years old': 3, '12 – 15 years old': 4, '25 – 30 years old': 5, nan: 6, 'Prefer not to answer': 7, '31 – 40 years old': 8, '41 – 50 years old': 9, '51 – 60 years old': 10, '61+ years old': 11, 'Under 12 years old': 12}
3.0
(4677, 104)
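
The encoded matrix still holds the target column alongside the predictors. A minimal sketch of how the two might be separated is shown here. It assumes a toy stand-in for `dataNums` and a shortened column list; the helper `split_features_target`, the toy values, and the column "Another question" are hypothetical, and only the target question's name comes from the survey.

```python
import numpy as np

TARGET = "How much money have you spent on the game?"

def split_features_target(encoded, columns, target_name):
    """Return (X, y): all columns except the target, and the target column as ints."""
    target_idx = list(columns).index(target_name)
    y = encoded[:, target_idx].astype(int)
    X = np.delete(encoded, target_idx, axis=1)
    return X, y

# Toy stand-in for dataNums and df.columns (hypothetical values, not survey data).
toy_columns = ["What is your age?", TARGET, "Another question"]
toy_encoded = np.array([[1.0, 2.0, 1.0],
                        [3.0, 1.0, 2.0],
                        [2.0, 2.0, 1.0]])

X, y = split_features_target(toy_encoded, toy_columns, TARGET)
print(X.shape, y)
```

On the real data the same call would presumably take `dataNums` and `df.columns` in place of the toy inputs.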


## Methods

I plan to use random forest, SVM, and neural network models for this assignment.
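
As a rough sketch of how the three model families could be compared with sklearn, the example below fits each one on a synthetic stand-in for the encoded survey data. The data, the train/test split, and every hyperparameter shown are assumptions for illustration, not settings chosen for this project.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in: 300 respondents, 10 integer-coded answers, 3 spending classes.
rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(300, 10)).astype(float)
total = X[:, 0] + X[:, 1]
y = (total > 6).astype(int) + (total > 8).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Placeholder hyperparameters; none of these have been tuned.
models = {
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="rbf", C=1.0),
    "neural network": MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on the held-out split
    print(f"{name}: test accuracy = {scores[name]:.2f}")
```

One design point worth considering: integer codes imply an ordering that most survey answers do not have, which matters more for the SVM and the neural network than for the random forest, so one-hot encoding may be preferable for those two.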

## How to Submit and Due Date

Name your notebook ```Lastname-Proposal.ipynb```. Submit the notebook file with your dataset file in a zip file named EXACTLY as `Lastname-Proposal.zip` using the ```Project Proposal``` link on Blackboard. For groups, only one submission is required.

<font color=red><b>Project Proposal Due Date: Wednesday Nov 20th 11:59PM.</b></font>