# Fundamentals of Machine Learning

## Evaluating machine-learning models

#### Simple hold-out validation

```
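import numpy as np

# `data`, `test_data`, and `get_model()` are assumed to be defined elsewhere.
num_validation_samples = 10000

# Shuffle in place so the split is not biased by the original ordering.
np.random.shuffle(data)

# Hold out the first block for validation and train on the rest.
validation_data = data[:num_validation_samples]
training_data = data[num_validation_samples:]

model = get_model()
model.train(training_data)
validation_score = model.evaluate(validation_data)

# Tune the model, retrain it, evaluate it ...

# Once the hyperparameters are fixed, train a fresh model on all non-test data.
model = get_model()
model.train(np.concatenate([training_data,
                            validation_data]))

test_score = model.evaluate(test_data)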
```

#### K-fold validation

```
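import numpy as np

# `data`, `test_data`, and `get_model()` are assumed to be defined elsewhere.
k = 4
num_validation_samples = len(data) // k

np.random.shuffle(data)

validation_scores = []

for fold in range(k):
    # Select the validation partition for this fold.
    start = num_validation_samples * fold
    stop = num_validation_samples * (fold + 1)
    validation_data = data[start:stop]
    # Use the remaining samples for training. NumPy arrays must be
    # concatenated; `+` would add them element-wise.
    training_data = np.concatenate([data[:start], data[stop:]])

    # Create a fresh, untrained model for each fold.
    model = get_model()
    model.train(training_data)
    validation_score = model.evaluate(validation_data)
    validation_scores.append(validation_score)

# The validation score is the average over the k folds.
validation_score = np.average(validation_scores)

# Train the final model on all non-test data.
model = get_model()
model.train(data)
test_score = model.evaluate(test_data)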
```

#### Iterated K-fold validation with shuffling

Iterated K-fold validation applies K-fold validation multiple times, shuffling the data each time before splitting it K ways. The final score is the average of the scores obtained at each run of K-fold validation.
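
The procedure can be sketched as follows. This is a minimal illustration, not a reference implementation: `k_fold_scores`, `iterated_k_fold`, and `mean_baseline_score` are hypothetical names, and the scoring function stands in for the `get_model()` / `model.train()` / `model.evaluate()` calls used in the K-fold snippet above.

```python
import numpy as np


def k_fold_scores(data, k, fit_and_score):
    """Return one validation score per fold for a single K-fold pass."""
    fold_size = len(data) // k
    scores = []
    for fold in range(k):
        start, stop = fold_size * fold, fold_size * (fold + 1)
        validation_data = data[start:stop]
        training_data = np.concatenate([data[:start], data[stop:]])
        scores.append(fit_and_score(training_data, validation_data))
    return scores


def iterated_k_fold(data, k, iterations, fit_and_score, seed=0):
    """Average the K-fold score over several independently shuffled passes."""
    rng = np.random.default_rng(seed)
    run_scores = []
    for _ in range(iterations):
        # Reshuffle before every pass so each pass sees different folds.
        shuffled = data[rng.permutation(len(data))]
        run_scores.append(np.mean(k_fold_scores(shuffled, k, fit_and_score)))
    return float(np.mean(run_scores))


def mean_baseline_score(training_data, validation_data):
    """Hypothetical stand-in for training and evaluating a model.

    Predicts the training mean of the last column and returns the negative
    mean absolute error on the validation rows (higher is better).
    """
    prediction = training_data[:, -1].mean()
    return -float(np.abs(validation_data[:, -1] - prediction).mean())


# Hypothetical data: 20 samples, last column used as the target.
data = np.arange(40, dtype="float64").reshape(20, 2)
score = iterated_k_fold(data, k=4, iterations=3,
                        fit_and_score=mean_baseline_score)
```

With P iterations and K folds, this trains and evaluates P × K models, so the extra stability of the estimate comes at a proportional compute cost.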

## Data preprocessing, feature engineering and feature learning

#### Vectorization

All inputs and targets in a neural network must be tensors of floating-point data. Whatever data you need to process, you must first turn it into tensors, a step called _data vectorization_.
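
As one possible illustration, the sketch below multi-hot encodes lists of integer indices into a `float32` matrix. The function name `vectorize_sequences` and the sample input are assumptions made for this example; other data types (images, text, tabular data) call for other encodings.

```python
import numpy as np


def vectorize_sequences(sequences, dimension):
    """Multi-hot encode lists of integer indices into a float32 matrix."""
    results = np.zeros((len(sequences), dimension), dtype="float32")
    for row, sequence in enumerate(sequences):
        # Set the columns named by this sequence to 1.0.
        results[row, sequence] = 1.0
    return results


# Hypothetical input: each sample is a list of integer token indices.
sequences = [[0, 2], [1, 3, 3], [4]]
x = vectorize_sequences(sequences, dimension=5)
```

Here `x` has shape `(3, 5)`, one row per sample, and repeated indices within a sample still produce a single 1.0 in that column.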

#### Value normalization

Before you feed data into your network, normalize each feature independently so that the values the network sees are small and on a comparable scale. The data should:

* Take small values (the 0-1 range)
* Be homogenous (all features in the same range)

Additionally:
* Normalize each feature independently to have a mean of 0.
* Normalize each feature independently to have a stddev of 1.

```
# Assumes x is a 2D float array of shape (samples, features).
x -= x.mean(axis=0)
x /= x.std(axis=0)
```
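
When the data is split into training and test sets, the mean and standard deviation should be computed on the training data and then reused for the test data, so that no information from the test set leaks into preprocessing. A minimal sketch, assuming hypothetical arrays `train_x` and `test_x` generated here purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 100 training and 20 test samples with 3 features
# on very different scales.
scales = np.array([1.0, 100.0, 0.01])
train_x = rng.normal(loc=5.0, size=(100, 3)) * scales
test_x = rng.normal(loc=5.0, size=(20, 3)) * scales

# Compute the statistics on the training data only.
mean = train_x.mean(axis=0)
std = train_x.std(axis=0)

# Apply the same transformation to both splits.
train_x = (train_x - mean) / std
test_x = (test_x - mean) / std
```

After this step each training feature has a mean of 0 and a standard deviation of 1. The test features will be close to, but not exactly at, those values.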

#### Handling missing values

If you're expecting missing values in the test data, but the network was trained on data without any missing values, the network won't have learned to ignore missing values. In this situation, you should artificially generate training samples with missing entries: copy some training samples several times, and drop some of the features that you expect are likely to be missing in the test data.
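
One way to sketch this augmentation is shown below. It assumes that a missing entry is encoded as 0 and that 0 is not otherwise a meaningful value in the data. The function name `augment_with_missing` and its parameters are hypothetical.

```python
import numpy as np


def augment_with_missing(x, feature_indices, copies=2, drop_prob=0.5,
                         missing_value=0.0, seed=0):
    """Append copies of `x` in which some listed features are blanked out."""
    rng = np.random.default_rng(seed)
    augmented = [x]
    for _ in range(copies):
        x_copy = x.copy()
        # Decide independently, per sample and per listed feature,
        # whether that entry is dropped.
        mask = rng.random((len(x), len(feature_indices))) < drop_prob
        block = x_copy[:, feature_indices]
        block[mask] = missing_value
        x_copy[:, feature_indices] = block
        augmented.append(x_copy)
    return np.concatenate(augmented, axis=0)


# Hypothetical training set: 4 samples, 3 features, all values nonzero.
x_train = np.arange(1.0, 13.0).reshape(4, 3)
x_augmented = augment_with_missing(x_train, feature_indices=[1, 2])
```

The original samples are kept unchanged at the top of `x_augmented`. The corresponding targets would need to be repeated in the same order, for example with `np.tile`.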

### Feature engineering 

Feature engineering is the process of using your own knowledge about the data and about the machine-learning algorithm at hand to make the algorithm work better by applying hardcoded transformations to the data before it goes into the model.
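
As a hypothetical illustration, suppose the task is to read the time from a clock and the raw input is the (x, y) position of each hand's tip relative to the clock center. Knowing that only the direction of a hand matters, you could hardcode a transformation from coordinates to an angle before the data reaches the model. The function name `hand_angle` and the coordinate convention are assumptions for this sketch.

```python
import numpy as np


def hand_angle(x, y):
    """Angle of a clock hand in radians, measured clockwise from 12 o'clock.

    (x, y) is the tip of the hand relative to the clock center, with
    x pointing right and y pointing up.
    """
    return np.arctan2(x, y) % (2 * np.pi)


# Hypothetical raw inputs: hand-tip coordinates for three readings.
tips = np.array([[0.0, 1.0],    # pointing at 12
                 [1.0, 0.0],    # pointing at 3
                 [0.0, -1.0]])  # pointing at 6
angles = hand_angle(tips[:, 0], tips[:, 1])
```

The engineered feature is a single number per hand that maps directly onto the quantity of interest, which leaves the model with a much simpler problem than the raw coordinates would.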

## Overfitting and underfitting 

