# Success Prediction: Wargaming

## <span style="color:red">Problem</span>

An army is simulating multiple engagement scenarios and wants to know the probability of winning a scenario based on how well its units are performing against enemy units at any point during the scenario. Using the given wargaming dataset, your goal is to build a model that predicts the probability of winning a scenario from its current state.


## <span style="color:red">Data</span>

To build an AI that can answer this need, we provide a simulated dataset of multiple wargaming scenarios. The scenarios are generated from different initial parameters (initial number of friendly and opposing units, technological advantage), and the final outcome of each scenario (win or lose) is determined after a series of engagements.


### <span style="color:blue">Features</span>

- **time**: time point at which the scenario's current parameters are calculated (numeric)


- **friendly_start**: number of friendly units at the beginning of the scenario (numeric)


- **enemy_start**: number of enemy units at the beginning of the scenario (numeric)


- **tech_advantage**: indicator of how much more or less technologically advanced the friendly side is, from -5 (strongly disadvantaged) to 5 (strongly advantaged) (numeric)


- **strength_ratio**: the ratio of friendly to enemy units at the current time (numeric)


- **survival_rate**: percentage of friendly units still active for combat at the current time (numeric)


- **engagement_performance**: performance indicator of the friendly side at the current time (numeric)


### <span style="color:blue">Target variable</span>

- **final_outcome**: whether the scenario was successful (binary: 1 (yes), 0 (no))

### <span style="color:blue">Train/Test sets</span>

The train set contains 10,000 rows, each holding scenario parameters at a random time point together with the scenario's final outcome. The test set contains 1,000 rows.

## <span style="color:red">Before starting</span>

Given the problem and data:
- Which machine learning approach do you think is better suited to this problem: classification or regression?
- What range of values should your model be able to return?

Answer in the cell below.

## <span style="color:red">Coding starts here</span>

### Import packages

In [None]:
import pandas as pd
import sklearn
import seaborn

**Import Training Data**

In [None]:
train_data = pd.read_csv('https://github.com/youtalspectra/spectra_ml_example/raw/master/data/wargaming_train.csv')

## Exploratory Data Analysis

Explore, pre-process and/or clean the data here.
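As a possible starting point, the sketch below gathers a few basic facts about a data set: its size, missing values per column, the overall win rate, and the linear correlation of each numeric feature with the target. The helper name `summarize` is hypothetical and not part of the exercise; it only assumes a DataFrame with the `final_outcome` column described above.

```python
import pandas as pd


def summarize(df: pd.DataFrame, target: str = "final_outcome") -> dict:
    """Return a few basic facts about the data set (hypothetical helper)."""
    # Restrict to numeric columns so the correlation matrix is well defined.
    numeric = df.select_dtypes("number")
    correlations = numeric.corr()[target].drop(target)
    return {
        "n_rows": len(df),
        "missing_per_column": df.isna().sum().to_dict(),
        "win_rate": float(df[target].mean()),
        "target_correlation": correlations.to_dict(),
    }
```

Calling `summarize(train_data)` returns a plain dictionary that can be printed or inspected. Correlation only captures linear relationships, so plots (for example with `seaborn`) remain worth doing alongside it.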

## Model fitting

Fit and optimize your model here.
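One way to get a first baseline is sketched below, assuming a scaled logistic regression; this is an illustrative choice, not the expected solution, and other classifiers may well perform better. The names `FEATURES`, `TARGET`, `fit_baseline` and `cross_validated_auc` are hypothetical. Wrapping the scaler and the classifier in a single pipeline means the scaling learned on the train set is reapplied automatically to any data passed in later.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Column names taken from the feature list in the Data section.
FEATURES = [
    "time",
    "friendly_start",
    "enemy_start",
    "tech_advantage",
    "strength_ratio",
    "survival_rate",
    "engagement_performance",
]
TARGET = "final_outcome"


def make_model():
    """Build an unfitted baseline: standardize features, then logistic regression."""
    return make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))


def fit_baseline(train: pd.DataFrame):
    """Fit the baseline pipeline on the training frame and return it."""
    model = make_model()
    model.fit(train[FEATURES], train[TARGET])
    return model


def cross_validated_auc(train: pd.DataFrame, cv: int = 5) -> float:
    """Mean ROC AUC over `cv` folds, as a rough guide when comparing models."""
    scores = cross_val_score(
        make_model(), train[FEATURES], train[TARGET], cv=cv, scoring="roc_auc"
    )
    return float(scores.mean())
```

With the training frame loaded, `model = fit_baseline(train_data)` fits the pipeline, and `model.predict_proba(train_data[FEATURES])[:, 1]` gives the estimated probability of the positive class (`final_outcome == 1`) for each row.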

## Predictions

Make predictions on the following test set and get the model score here. Remember to apply the same pre-processing to the test set as was done on the train set!

In [None]:
test_data = pd.read_csv('https://github.com/youtalspectra/spectra_ml_example/raw/master/data/wargaming_test_1000.csv')
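The exercise does not say which score to report, so the sketch below computes three common ones from predicted win probabilities: accuracy at a chosen threshold, ROC AUC, and log loss. The function name `score_predictions` and the default threshold of 0.5 are assumptions made for illustration.

```python
from sklearn.metrics import accuracy_score, log_loss, roc_auc_score


def score_predictions(y_true, win_probability, threshold=0.5):
    """Score predicted win probabilities against the true outcomes (hypothetical helper)."""
    # Turn probabilities into hard 0/1 labels for the accuracy metric.
    labels = [int(p >= threshold) for p in win_probability]
    return {
        "accuracy": accuracy_score(y_true, labels),
        "roc_auc": roc_auc_score(y_true, win_probability),
        "log_loss": log_loss(y_true, win_probability),
    }
```

Given some fitted classifier `model` and a list `features` of the columns it was trained on (both hypothetical names here), the probabilities could be obtained with `model.predict_proba(test_data[features])[:, 1]` and scored with `score_predictions(test_data["final_outcome"], probabilities)`. Accuracy depends on the threshold, whereas ROC AUC and log loss evaluate the probabilities themselves, which fits the stated goal of predicting a winning probability.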