Merge pull request #84 from telefonicaid/feature/generate-collection-names-using-a-hash-function

Generate collection names using a hash function
Francisco Romero committed May 22, 2015
2 parents fe8e9d2 + 7f4824a commit b33dd71
Showing 10 changed files with 373 additions and 122 deletions.
1 change: 1 addition & 0 deletions CHANGES_NEXT_RELEASE
@@ -4,3 +4,4 @@
* [FEATURE] Version information provided using the /version URL path (#16)
* [FEATURE] Including attribute type information when retrieving raw data (#54)
* [FEATURE] dateFrom and dateTo as optional parameters in queries (#53)
* [FEATURE] Generate collection names using a hash function (#83)
30 changes: 28 additions & 2 deletions README.md
@@ -295,13 +295,17 @@ a counter used as the suffix for the log file name. Optional. Default value: "0"
- LOG_DIR: The path to a directory where the log file will be searched for or created if it does not exist. Optional. Default value: "./log".
- LOG_FILE_NAME: The name of the file where the logs will be stored. Optional. Default value: "sth_app.log".
- PROOF_OF_LIFE_INTERVAL: The time in seconds between proof of life logging messages informing that the server is up and running normally. Default value: "60".
- DB_PREFIX: The prefix to be added to the service for the creation of the databases. More information below. Optional. Default value: "sth".
- DB_PREFIX: The prefix to be added to the service for the creation of the databases. More information below. Optional. Default value: "sth_".
- DEFAULT_SERVICE: The service to be used if not sent by the Orion Context Broker in the notifications. Optional. Default value: "orion".
- COLLECTION_PREFIX: The prefix to be added to the collections in the databases. More information below. Optional. Default value: "sth".
- COLLECTION_PREFIX: The prefix to be added to the collections in the databases. More information below. Optional. Default value: "sth_".
- DEFAULT_SERVICE_PATH: The service path to be used if not sent by the Orion Context Broker in the notifications. Optional. Default value: "/".
- POOL_SIZE: The default MongoDB pool size of database connections. Optional. Default value: "5".
- WRITE_CONCERN: The <a href="http://docs.mongodb.org/manual/core/write-concern/" target="_blank">write concern policy</a> to apply when writing data to the MongoDB database. Default value: "1".
- SHOULD_STORE: Flag indicating if the raw and/or aggregated data should be persisted. Valid values are: "only-raw", "only-aggregated" and "both". Default value: "both".
- SHOULD_HASH: Flag indicating if the raw and/or aggregated data collection names should include a hash portion. This is mostly
due to MongoDB's limitation regarding the number of bytes a namespace may have (currently limited to 120 bytes). In case of hashing,
information about the final collection name and its correspondence to each concrete service path, entity and (if applicable) attribute
is stored in a collection named `COLLECTION_PREFIX + "collection_names"`. Default value: "true".
- DATA_MODEL: The data model to use. Currently 3 possible values are supported: collection-per-service-path (which creates a MongoDB collection
per service path to store the data), collection-per-entity (which creates a MongoDB collection per service path and entity to store the data)
and collection-per-attribute (which creates a collection per service path, entity and attribute to store the data). More information about these
@@ -345,6 +349,28 @@ the attribute type does not have any special semantic or effect currently.
As already mentioned, all this configuration parameters can also be adjusted using the
[`config.js`](https://github.com/telefonicaid/IoT-STH/blob/develop/config.js) file whose contents are self-explanatory.

It is important to note that there is a limitation of 120 bytes for the namespaces (concatenation of the database name and
collection names) in MongoDB (see <a href="http://docs.mongodb.org/manual/reference/limits/#namespaces" target="_blank">http://docs.mongodb.org/manual/reference/limits/#namespaces</a>
for further information). Related to this, the STH generates the collection names using 2 possible mechanisms:

1. <u>Plain text</u>: In case the `SHOULD_HASH` configuration parameter is set to 'false', the collection names are
generated as a concatenation of the `COLLECTION_PREFIX` plus the service path (in case of the collection-per-service-path
data model) plus the entity id plus the entity type (in case of the collection-per-entity data model) plus the attribute name
(in case of the collection-per-attribute data model) plus '.aggr' for the collections of the aggregated data. The length
of the collection name plus the `DB_PREFIX` plus the database name (or service) should not be more than 120 bytes using UTF-8
format or MongoDB will complain and will not create the collection, and consequently no data would be stored by the STH.

2. <u>Hash based</u>: In case the `SHOULD_HASH` option is set to something distinct from 'false' (the default option), the
collection names are generated as a concatenation of the `COLLECTION_PREFIX` plus a generated hash plus '.aggr' for the
collections of the aggregated data. To avoid collisions in the generation of these hashes, they are forced to be 20 bytes
long at least. Once again, the length of the collection name plus the `DB_PREFIX` plus the database name (or service) should not
be more than 120 bytes using UTF-8 or MongoDB will complain and will not create the collection, and consequently no data
would be stored by the STH. The hash function used is SHA-512.

In case of using hashes as part of the collection names and to let the user or developer easily recover this information,
a collection named ```DB_COLLECTION_PREFIX + _collection_name``` is created and fed with information regarding the mapping
of the collection names and the combination of concrete services, service paths, entities and attributes.
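
The hash-based mechanism described above can be sketched as follows. This is only an illustration under stated assumptions, not the code added by this commit: the helper name `hashedCollectionName`, its parameter names, the `|` separator used to build the hash input, and the choice to truncate the hex digest to whatever space is left are all hypothetical. It also assumes that the namespace is `<database>.<collection>`, so the dot counts towards the 120 bytes.

```javascript
'use strict';

var crypto = require('crypto');

var MAX_NAMESPACE_BYTES = 120; // MongoDB namespace limit quoted in the text above
var MIN_HASH_BYTES = 20;       // minimum hash length quoted in the text above
var AGGR_SUFFIX = '.aggr';     // suffix of the aggregated data collections

// Hypothetical helper: returns a hashed collection name for raw data, or null
// when fewer than MIN_HASH_BYTES remain for the hash inside the namespace limit.
function hashedCollectionName(params) {
  var databaseName = params.dbPrefix + params.service;
  // Assumption: reserve room for the '.aggr' suffix so that the raw and the
  // aggregated collections can share the same hash.
  var usedBytes = Buffer.byteLength(
    databaseName + '.' + params.collectionPrefix + AGGR_SUFFIX, 'utf8');
  var availableBytes = MAX_NAMESPACE_BYTES - usedBytes;
  if (availableBytes < MIN_HASH_BYTES) {
    return null;
  }
  // Assumption: the hash input combines service path, entity and attribute.
  var origin = [
    params.servicePath, params.entityId, params.entityType, params.attributeName
  ].join('|');
  var hash = crypto.createHash('sha512').update(origin, 'utf8').digest('hex');
  // Hex digits are ASCII, so one character takes exactly one UTF-8 byte.
  return params.collectionPrefix + hash.substring(0, availableBytes);
}

var rawName = hashedCollectionName({
  dbPrefix: 'sth_',
  service: 'orion',
  collectionPrefix: 'sth_',
  servicePath: '/',
  entityId: 'Room1',
  entityType: 'Room',
  attributeName: 'temperature'
});
console.log(rawName, rawName + AGGR_SUFFIX);
```

Returning `null` instead of a shortened hash keeps the 20-byte minimum explicit; a caller would then have to shorten the prefixes or the service name.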

[Top](#section0)

##<a id="section5"></a> Inserting data (random single events and its aggregated data) into the database
10 changes: 8 additions & 2 deletions config.js
@@ -33,10 +33,10 @@ config.database = {
// The name of the replica set to connect to, if any. Default value: "".
replicaSet: '',
// The prefix to be added to the service for the creation of the databases. Default value: "sth".
prefix: 'sth',
prefix: 'sth_',
// The prefix to be added to the collections in the databases. More information below.
// Default value: "sth".
collectionPrefix: 'sth',
collectionPrefix: 'sth_',
// The default MongoDB pool size of database connections. Optional. Default value: "5".
poolSize: '5',
// The write concern (see http://docs.mongodb.org/manual/core/write-concern/) to apply when
@@ -45,6 +45,12 @@ config.database = {
// Flag indicating if the raw and/or aggregated data should be persisted. Valid values are:
// "only-raw", "only-aggregated" and "both". Default value: "both".
shouldStore: 'both',
// Flag indicating if the raw and/or aggregated data collection names should include a hash portion.
// This is mostly due to MongoDB's limitation regarding the number of bytes a namespace may have
// (currently limited to 120 bytes). In case of hashing, information about the final collection name
// and its correspondence to each concrete service path, entity and (if applicable) attribute
// is stored in a collection named `COLLECTION_PREFIX + "collection_names"`. Default value: "true".
shouldHash: 'true',
// The data model to use. Currently 3 possible values are supported: collection-per-service-path
// (which creates a MongoDB collection per service path to store the data), collection-per-entity
// (which creates a MongoDB collection per service path and entity to store the data) and
1 change: 1 addition & 0 deletions package.json
@@ -13,6 +13,7 @@
},
"dependencies": {
"boom": "^2.7.1",
"bytes-counter": "^1.0.0",
"good": "^5.1.2",
"good-console": "^4.1.0",
"good-file": "^4.0.2",
12 changes: 10 additions & 2 deletions src/sth_configuration.js
@@ -40,13 +40,14 @@
NOT_AVAILABLE: 'NA',
SHUTDOWN: 'OPER_STH_SHUTDOWN',
DB_CONN_OPEN: 'OPER_STH_DB_CONN_OPEN',
DB_LOG: 'OPER_STH_DB_LOG',
DB_CONN_CLOSE: 'OPER_STH_DB_CONN_CLOSE',
SERVER_START: 'OPER_STH_SERVER_START',
SERVER_LOG: 'OPER_STH_SERVER_LOG',
SERVER_STOP: 'OPER_STH_SERVER_STOP'
},
DB_PREFIX: ENV.DB_PREFIX || config.database.prefix || 'sth',
COLLECTION_PREFIX: ENV.COLLECTION_PREFIX || config.database.collectionPrefix || 'sth',
DB_PREFIX: ENV.DB_PREFIX || config.database.prefix || 'sth_',
COLLECTION_PREFIX: ENV.COLLECTION_PREFIX || config.database.collectionPrefix || 'sth_',
DATA_MODELS: {
COLLECTIONS_PER_SERVICE_PATH: 'collection-per-service-path',
COLLECTIONS_PER_ENTITY: 'collection-per-entity',
@@ -140,6 +141,13 @@
} else {
module.exports.DATA_MODEL = 'collection-per-entity';
}
if (ENV.SHOULD_HASH) {
module.exports.SHOULD_HASH = ENV.SHOULD_HASH !== 'false';
} else if (config.database.shouldHash) {
module.exports.SHOULD_HASH = config.database.shouldHash !== 'false';
} else {
module.exports.SHOULD_HASH = true;
}
module.exports.DB_USERNAME = dbUsername;
module.exports.DB_PASSWORD = dbPassword;
module.exports.DB_AUTHENTICATION = (dbUsername && dbPassword) ?
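
The `SHOULD_HASH` resolution added in the hunk above follows a simple precedence: environment variable first, then `config.js`, then a default of `true`. A minimal sketch of that logic as a standalone function may make the behaviour easier to check. The function name `resolveShouldHash` and its two parameters are hypothetical and are not part of the commit.

```javascript
'use strict';

// Hypothetical helper mirroring the precedence used in the hunk above:
// environment variable, then configuration file, then the default (true).
function resolveShouldHash(env, databaseConfig) {
  if (env.SHOULD_HASH) {
    // Any value other than the string 'false' enables hashing.
    return env.SHOULD_HASH !== 'false';
  }
  if (databaseConfig.shouldHash) {
    return databaseConfig.shouldHash !== 'false';
  }
  return true;
}

console.log(resolveShouldHash({ SHOULD_HASH: 'false' }, { shouldHash: 'true' }));
console.log(resolveShouldHash({}, { shouldHash: 'false' }));
console.log(resolveShouldHash({}, {}));
```

One consequence of this shape is that only the string `'false'` disables hashing. A boolean `false` in the configuration object is falsy, so it would fall through to the default and leave hashing enabled.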