
Tutorial: Training using a data frame generated by EasyVVUQ

In this tutorial we will create a forward uncertainty-propagation surrogate using a standard artificial neural network (ANN), trained on a data frame generated by EasyVVUQ. Forward uncertainty propagation means computing the output distribution of a computational model, given assumed probability density functions for the input parameters of the model. A related problem is creating a cheap surrogate model for the input-output map, which can be evaluated at a fraction of the cost of the computational model. EasyVVUQ is VECMA's forward uncertainty-propagation toolkit. In this example we will show how to use an EasyVVUQ data frame to train a surrogate model in EasySurrogate.

Problem definition

For computational efficiency we will consider the analytical Sobol G function with 5 uncertain inputs; see this page for a description of this function. The model implementation is found in tests/easyvvuq_easysurrogate_coupling/model/ and the entire script for this tutorial is given here: tests/easyvvuq_easysurrogate_coupling/
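For reference, the G function itself is simple to write down. The sketch below is a generic implementation; the coefficient vector a is an assumption here, and the values used in the repository's model script may differ:

```python
import numpy as np

def g_func(x, a):
    """Sobol G function: prod_i (|4*x_i - 2| + a_i) / (1 + a_i)."""
    x = np.asarray(x, dtype=float)
    a = np.asarray(a, dtype=float)
    return np.prod((np.abs(4.0 * x - 2.0) + a) / (1.0 + a))

# with all a_i = 0 every input is equally important
print(g_func(np.ones(5), np.zeros(5)))  # prod |4*1 - 2| = 2**5 = 32.0
```

The smaller a coefficient a_i, the more influential the corresponding input x_i, which makes the G function a convenient analytic benchmark for sensitivity and surrogate studies.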

EasyVVUQ Monte Carlo campaign

We will assume you are familiar with EasyVVUQ; see this page for interactive tutorials and links to the documentation. The code below uses the Monte Carlo sampler to generate a data frame of input-output samples for the problem described above.

import os
import chaospy as cp
import easyvvuq as uq
from easyvvuq.actions import Actions, CreateRunDirectory, Encode, Decode, ExecuteLocal

# directory containing the model template (assumed to be the tutorial directory)
HOME = os.getcwd()

# number of uncertain parameters
D = 5

# Define parameter space
params = {}
for i in range(D):
    params["x%d" % (i + 1)] = {"type": "float",
                               "min": 0.0,
                               "max": 1.0,
                               "default": 0.5}
params["D"] = {"type": "integer", "default": D}
params["out_file"] = {"type": "string", "default": "output.csv"}
output_filename = params["out_file"]["default"]
output_columns = ["f"]

# create encoder, decoder, and execute locally
encoder = uq.encoders.GenericEncoder(template_fname=HOME + '/model/g_func.template',
                                     target_filename='in.json')
decoder = uq.decoders.SimpleCSV(target_filename=output_filename,
                                output_columns=output_columns)
execute = ExecuteLocal('{}/model/ in.json'.format(os.getcwd()))
actions = Actions(CreateRunDirectory('/tmp'),
                  Encode(encoder), execute, Decode(decoder))

# uncertain variables
vary = {}
for i in range(D):
    vary["x%d" % (i + 1)] = cp.Uniform(0, 1)

# MC sampler
my_sampler = uq.sampling.MCSampler(vary=vary, n_mc_samples=100)

# EasyVVUQ Campaign
campaign = uq.Campaign(name='g_func', params=params, actions=actions)

# Associate the sampler with the campaign
campaign.set_sampler(my_sampler)

# Execute runs
campaign.execute().collate()
# get the EasyVVUQ data frame
data_frame = campaign.get_collation_result()

Training on a EasyVVUQ data frame

The EasyVVUQ (pandas) data frame can be read into EasySurrogate via:

# Create an EasySurrogate campaign
import easysurrogate as es

surr_campaign = es.Campaign()

# This is the main point of this test: extract training data from EasyVVUQ data frame
features, samples = surr_campaign.load_easyvvuq_data(campaign, qoi_cols='f')

This will output:

Extracting features ['x1', 'x2', 'x3', 'x4', 'x5']
Extracting output data ['f'] 

indicating that it has read samples drawn from 5 input variables, which will be used as features. The output data is in this case a single column of the CSV output file of the G function, named 'f'. If you wish to read multiple columns, specify a list of names in the qoi_cols parameter of load_easyvvuq_data. Note that features is an array, whereas samples is a dictionary indexed by the qoi_cols names. In this case, samples['f'] will return an array with all output values of the G function. From this point onward we can simply use this data as training data for any surrogate. Here, we will use it to train an ANN surrogate:
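To make this layout concrete, the snippet below builds hypothetical stand-ins with the shapes load_easyvvuq_data would return for this example (100 Monte Carlo samples of 5 inputs); in the tutorial itself the actual arrays come from the call above:

```python
import numpy as np

# hypothetical stand-ins for the extracted training data:
# one row per sample, one column per input variable
features = np.random.rand(100, 5)
# a dictionary indexed by the qoi_cols names
samples = {'f': np.random.rand(100, 1)}

print(features.shape)      # (100, 5)
print(samples['f'].shape)  # (100, 1)
```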

# Create artificial neural network surrogate
surrogate = es.methods.ANN_Surrogate()

# Number of training iterations (number of mini batches)
N_ITER = 10000

# The latter fraction of the data to be kept apart for testing
TEST_FRAC = 0.3
# Train the ANN
surrogate.train(features, samples['f'], N_ITER,
                n_layers=4, n_neurons=50, test_frac=TEST_FRAC)

Note that here we reserved the latter 30 % of the data for testing the accuracy of the surrogate. We evaluate the training and test error via:

import numpy as np

# get some useful dimensions of the ANN surrogate
dims = surrogate.get_dimensions()

# evaluate the ANN surrogate on the training data
training_predictions = np.zeros([dims['n_train'], dims['n_out']])
for i in range(dims['n_train']):
    training_predictions[i] = surrogate.predict(features[i])

# print the relative training error
error_train = np.linalg.norm(training_predictions - samples['f'][0:dims['n_train']]) /\
    np.linalg.norm(samples['f'][0:dims['n_train']])
print("Relative error on training set = %.3f percent" % (error_train * 100))

# evaluate the ANN surrogate on the test data
test_predictions = np.zeros([dims['n_test'], dims['n_out']])
for count, i in enumerate(range(dims['n_train'], dims['n_samples'])):
    test_predictions[count] = surrogate.predict(features[i])

# print the relative test error
error_test = np.linalg.norm(test_predictions - samples['f'][dims['n_train']:]) /\
    np.linalg.norm(samples['f'][dims['n_train']:])
print("Relative error on test set = %.3f percent" % (error_test * 100))

Here dims is a dictionary containing the sizes of the training, test, and total data sets, as well as the number of input and output neurons of the ANN. The output will look something like:

Relative error on training set = 0.294 percent
Relative error on test set = 2.944 percent

indicating that we have training and test errors of roughly 0.3% and 3%, respectively. Again, the entire script can be found here: tests/easyvvuq_easysurrogate_coupling/
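With the surrogate trained, forward uncertainty propagation becomes cheap: draw fresh input samples from the assumed input distributions and push them through surrogate.predict instead of the full model. The sketch below illustrates this loop with a placeholder function standing in for the trained surrogate (here the exact G function with all a_i = 0), so that it runs standalone:

```python
import numpy as np

rng = np.random.default_rng(42)

def predict(x):
    # placeholder for surrogate.predict(x): the exact G function with a_i = 0
    return np.prod(np.abs(4.0 * x - 2.0))

# propagate 10000 fresh uniform input samples through the cheap model
X = rng.random((10000, 5))
f = np.array([predict(x) for x in X])

# estimated output statistics; the analytic mean of this G function is 1
print("mean = %.3f, std = %.3f" % (f.mean(), f.std()))
```

In the tutorial setting you would replace predict with the trained surrogate's predict method, turning output-statistics estimates that would require 10000 expensive model runs into a loop over cheap ANN evaluations.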