From 66cf069a3b4d905aa79d9c50abd610898b4283bf Mon Sep 17 00:00:00 2001
From: newtoncalegari
Date: Wed, 27 Apr 2016 11:08:14 -0300
Subject: [PATCH] Resolving comment 67

---
 bp.html | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/bp.html b/bp.html
index 3ba11dc..5e5852c 100644
--- a/bp.html
+++ b/bp.html
@@ -3895,13 +3895,13 @@

Data Enrichment

Enrich your data by generating new data from the raw data when doing so will enhance its value.

Why

-

Enrichment can greatly enhance processability, particularly for unstructured data. Missing values can be filled in, and new attributes and measures can be added. Publishing more complete datasets enhances trust. Deriving additional values that are of general utility saves users time and encourages more kinds of reuse. There are many intelligent techniques that can be used to enrich data, making the dataset an even more valuable asset.

+

Enrichment can greatly enhance processability, particularly for unstructured data. Under some circumstances, missing values can be filled in, and new attributes and measures can be added. Publishing more complete datasets can enhance trust, if done properly and ethically. Deriving additional values that are of general utility saves users time and encourages more kinds of reuse. There are many intelligent techniques that can be used to enrich data, making the dataset an even more valuable asset.

Intended Outcome

-

A dataset that has missing values is enhanced if it is possible to fill in those values. Additional relevant measures or attributes should be added if they enhance utility. Unstructured data can be given structure in this way as well.

-

Because inference-based enrichment may introduce errors into the data, values generated by such techniques should be labeled as such, and it should be possible to retrieve any original values replaced by enrichment.

-

Whenever licensing permits, the code used to enrich the data should be made available along with the dataset. Sharing such code is particularly important for scientific data.

+

Data that is unstructured should be given structure if possible. In structured data, missing values should be added if they enhance utility, but only if the addition does not distort analytical results, significance, or statistical power.

+

Values generated by inference-based techniques should be labeled as such, and it should be possible to retrieve any original values replaced by enrichment.

+

Whenever licensing permits, the code used to enrich the data should be made available along with the dataset.
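
As an illustration of the labeling requirement above, here is a minimal sketch (not part of the patch or of the document being patched) of one way to fill missing values while flagging them as inferred and keeping the originals retrievable. The function name `enrich_with_mean`, the `_original` and `_inferred` field suffixes, and the sample records are all hypothetical. Mean imputation is used only because it is short; it can distort variance and significance, so it would not satisfy the "does not distort analytical results" condition for every dataset.

```python
from statistics import mean


def enrich_with_mean(records, field):
    """Fill missing values of `field` with the mean of the observed values.

    Hypothetical helper: every returned record keeps the pre-enrichment
    value under `<field>_original` and carries a boolean `<field>_inferred`
    flag, so consumers can tell generated values apart and undo them.
    """
    observed = [r[field] for r in records if r.get(field) is not None]
    if not observed:
        raise ValueError(f"no observed values for {field!r}; nothing to infer from")
    fill = mean(observed)

    enriched = []
    for r in records:
        original = r.get(field)
        row = dict(r)  # copy, so the input records are left untouched
        row[f"{field}_original"] = original
        row[f"{field}_inferred"] = original is None
        row[field] = fill if original is None else original
        enriched.append(row)
    return enriched


# Made-up sample data, for illustration only.
rows = [
    {"station": "A", "temp_c": 20.0},
    {"station": "B", "temp_c": None},
    {"station": "C", "temp_c": 24.0},
]
result = enrich_with_mean(rows, "temp_c")
```

Keeping the flag and the original value next to the enriched value lets a consumer filter out inferred rows, or restore the raw dataset, without needing the enrichment code itself.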

Possible Approaches to Implementation