Generate collection names using a hash function #84

gtorodelvalle · 2015-05-14T12:49:40Z

100% tests passed
Implements Generate collection names using a hash function #83
Assigned to @frbattid
The collection names are generated this way:
- For raw data: COLLECTION_PREFIX + generatedHash
- For aggregated data: COLLECTION_PREFIX + generatedHash + '.aggr'

where generatedHash is generated using the SHA-512 algorithm and then truncated so that the full namespace stays within the 120 bytes accepted by MongoDB, after reserving the bytes needed for the database name, the COLLECTION_PREFIX and the .aggr suffix used for aggregates.
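
As a rough illustration of the naming scheme described above, the following sketch derives both names from a single truncated SHA-512 digest. The function name `generateCollectionNames`, the hex encoding of the digest, the sample key and the limit of 101 are assumptions made for the example and may differ from what `src/sth_database.js` actually does.

```javascript
'use strict';

const crypto = require('crypto');

// Assumed value of the COLLECTION_PREFIX configuration parameter.
const COLLECTION_PREFIX = 'sth_';

/**
 * Hypothetical helper: builds the raw and aggregated collection names
 * from a key (e.g. service path + entity data) and a maximum hash length.
 */
function generateCollectionNames(key, limit) {
  // A SHA-512 digest has 128 hexadecimal characters; keep only the first `limit`.
  // Hex characters are ASCII, so one character equals one byte.
  const generatedHash = crypto
    .createHash('sha512')
    .update(key)
    .digest('hex')
    .slice(0, limit);

  return {
    raw: COLLECTION_PREFIX + generatedHash,
    aggregated: COLLECTION_PREFIX + generatedHash + '.aggr'
  };
}

const names = generateCollectionNames('/entityIdentityType', 101);
console.log(names.raw);
console.log(names.aggregated);
```

Because both names share the same truncated hash, the aggregated collection can always be found by appending `.aggr` to the raw collection name.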

The relation between the collection names and the combination of service, service path, entity and attribute is stored in a new collection named COLLECTION_PREFIX + 'collection_names'.

Example entries of this collection are the following documents:

{ "_id" : "sth_35b3d41c92944eb243dfd1324a756961c6d4d21955ab003d8857994c38be159c13b2a9631b7971ccb9ab394da44e66c730aa1", "dataModel" : "collection-per-attribute", "isAggregated" : false, "service" : "orion", "servicePath" : "/", "entityId" : "entityId", "entityType" : "entityType", "attrName" : "attrName" }
{ "_id" : "sth_35b3d41c92944eb243dfd1324a756961c6d4d21955ab003d8857994c38be159c13b2a9631b7971ccb9ab394da44e66c730aa1.aggr", "dataModel" : "collection-per-attribute", "isAggregated" : true, "service" : "orion", "servicePath" : "/", "entityId" : "entityId", "entityType" : "entityType", "attrName" : "attrName" }
{ "_id" : "sth_5c86f0344ed249425c8aad3f272b72a2ce0ff388791f31497eaf257fa72629487faafd7fd27eb0dd6355d5509e2f3faf76711", "dataModel" : "collection-per-service-path", "isAggregated" : false, "service" : "orion", "servicePath" : "/", "entityId" : "entityId", "entityType" : "entityType", "attrName" : "attrName" }
{ "_id" : "sth_5c86f0344ed249425c8aad3f272b72a2ce0ff388791f31497eaf257fa72629487faafd7fd27eb0dd6355d5509e2f3faf76711.aggr", "dataModel" : "collection-per-service-path", "isAggregated" : true, "service" : "orion", "servicePath" : "/", "entityId" : "entityId", "entityType" : "entityType", "attrName" : "attrName" }
{ "_id" : "sth_509cdf604e8ea57462262bcb4f8454a2005a24f8b8d09377bc6de4ddc6f6b43120f694bbb0cce18a87318f26b6c4a87f689b8", "dataModel" : "collection-per-entity", "isAggregated" : false, "service" : "orion", "servicePath" : "/", "entityId" : "entityId", "entityType" : "entityType", "attrName" : "attrName" }
{ "_id" : "sth_509cdf604e8ea57462262bcb4f8454a2005a24f8b8d09377bc6de4ddc6f6b43120f694bbb0cce18a87318f26b6c4a87f689b8.aggr", "dataModel" : "collection-per-entity", "isAggregated" : true, "service" : "orion", "servicePath" : "/", "entityId" : "entityId", "entityType" : "entityType", "attrName" : "attrName" }
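
For reference, a minimal sketch of how one of these mapping documents could be assembled. The helper name `buildCollectionNameEntry` and the placeholder hash `abc123` are hypothetical; only the field names and the name of the mapping collection come from the description above.

```javascript
'use strict';

// Assumed value of the COLLECTION_PREFIX configuration parameter.
const COLLECTION_PREFIX = 'sth_';

// Collection that stores the relation between names and their origin.
const REGISTRY_COLLECTION = COLLECTION_PREFIX + 'collection_names';

/**
 * Hypothetical helper: builds the document that maps a generated
 * collection name back to the data it was generated from.
 */
function buildCollectionNameEntry(collectionName, dataModel, isAggregated, params) {
  return {
    _id: collectionName,
    dataModel: dataModel,
    isAggregated: isAggregated,
    service: params.service,
    servicePath: params.servicePath,
    entityId: params.entityId,
    entityType: params.entityType,
    attrName: params.attrName
  };
}

const params = {
  service: 'orion',
  servicePath: '/',
  entityId: 'entityId',
  entityType: 'entityType',
  attrName: 'attrName'
};

// 'abc123' is a placeholder, not a real truncated hash.
const rawEntry = buildCollectionNameEntry(
  'sth_abc123', 'collection-per-entity', false, params);
const aggrEntry = buildCollectionNameEntry(
  'sth_abc123.aggr', 'collection-per-entity', true, params);

console.log(REGISTRY_COLLECTION);
console.log(JSON.stringify(rawEntry));
console.log(JSON.stringify(aggrEntry));
```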

frbattid · 2015-05-18T14:38:24Z

src/sth_database.js

    }
+    // The maximum number of bytes accepted by MongoDB for namespaces is 120 bytes.
+    var limit = 119 - bytesCounter.count(databaseName) - bytesCounter.count(sthConfig.COLLECTION_PREFIX) -
+      bytesCounter.count('.aggr');


Although this is a method for obtaining the collection name for raw data, I guess it is necessary to take into account the number of bytes for ".aggr" since the hash-based collection name must be the same for both raw and aggregated data, right?

That's right :) That's the reason for the `- bytesCounter.count('.aggr')` part :) Am I missing anything here? :p

No, it is perfect, just wondering if I understood it well :) NTC
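
The arithmetic discussed in this thread can be sketched as follows. Here `Buffer.byteLength` stands in for the `bytesCounter` module used in the diff, and the database name `sth_orion` is a hypothetical value chosen for the example.

```javascript
'use strict';

// Assumed value of the COLLECTION_PREFIX configuration parameter.
const COLLECTION_PREFIX = 'sth_';
const AGGR_SUFFIX = '.aggr';

/**
 * Hypothetical helper: number of bytes left for the hash once the
 * database name, the collection prefix and the '.aggr' suffix are
 * reserved. The suffix is always reserved so that the raw and the
 * aggregated collection names share the same hash.
 */
function hashLimit(databaseName, collectionPrefix) {
  return 119 -
    Buffer.byteLength(databaseName, 'utf8') -
    Buffer.byteLength(collectionPrefix, 'utf8') -
    Buffer.byteLength(AGGR_SUFFIX, 'utf8');
}

console.log(hashLimit('sth_orion', COLLECTION_PREFIX)); // 119 - 9 - 4 - 5 = 101
```

Counting bytes rather than characters matters when the database name contains multi-byte UTF-8 characters, since each of them consumes more than one byte of the namespace.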

frbattid · 2015-05-20T10:47:38Z

README.md

@@ -345,6 +345,16 @@ the attribute type does not have any special semantic or effect currently.
 As already mentioned, all this configuration parameters can also be adjusted using the
 [`config.js`](https://github.com/telefonicaid/IoT-STH/blob/develop/config.js) file whose contents are self-explanatory.

+It is important to note that there is a limitation on the bytes a MongoDB namespace (concatenation of the database name and


By introducing this paragraph it should be assumed that the data model is fixed to collection-per-entity, right? I mean, we are still maintaining the possibility to switch to another data model, and AFAIK this hashing mechanism only addresses the collection-per-entity data model, doesn't it?

I've been thinking about fixing the data model and so on, and I think we can remove any reference to the data_model parameter from the documentation, defaulting it to collection-per-entity while keeping the code that allows changing the data model for internal testing purposes (I'm afraid we will fix the data model and the bosses will later decide on another one :)).

No, no, the included code can be used with any of the 3 supported data models. The hash is generated from the service path (if collection-per-service-path), the concatenation of service path, entityId and entityType (if collection-per-entity) or the concatenation of service path, entityId, entityType and attributeName (if collection-per-attribute). See https://github.com/telefonicaid/IoT-STH/pull/84/files#diff-db503fe45ab607476c902dd760bd0defR203 and how the collectionName4Events is obtained ;)

Consequently, we can keep the option to select the data model if we want to since it is supported by the hashing mechanism :)

You are right, I did not notice the switch, I only saw the last case... Tip: do not review PRs with a fever xD NTC

In addition, the last case is not about collection-per-entity but collection-per-attribute... so I got the review completely wrong xD
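
The switch mentioned in this thread can be pictured roughly as below. This is a sketch based only on the explanation given in the comments; the function name `hashInput` and the plain concatenation without separators are assumptions, and the real code in the PR may differ.

```javascript
'use strict';

/**
 * Hypothetical helper: returns the string that gets hashed,
 * depending on the configured data model.
 */
function hashInput(dataModel, params) {
  switch (dataModel) {
    case 'collection-per-service-path':
      return params.servicePath;
    case 'collection-per-entity':
      return params.servicePath + params.entityId + params.entityType;
    case 'collection-per-attribute':
      return params.servicePath + params.entityId + params.entityType +
        params.attrName;
    default:
      throw new Error('Unsupported data model: ' + dataModel);
  }
}

const sample = {
  servicePath: '/',
  entityId: 'entityId',
  entityType: 'entityType',
  attrName: 'attrName'
};

console.log(hashInput('collection-per-service-path', sample));
console.log(hashInput('collection-per-entity', sample));
console.log(hashInput('collection-per-attribute', sample));
```

Since each data model feeds a different string into the hash, all three remain usable with the hashing mechanism.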

iariasleon · 2015-05-21T19:31:41Z

README.md

+
+1. <u>Plain text</u>: In case the `SHOULD_HASH` configuration parameter is set to 'false', the collection names are
+generated as a concatenation of the `COLLECTION_PREFIX` plus the service path (in case of the collection-per-service-path
+data model) plus the entity id plus the entity type (in case of the collection-per-entity data model) plus the attribute name


We agreed to remove all the information related to data models

But not in this PR, right? :) I wanted to include everything needed and in a new PR delete all the information just to have that information together in case someone asks us to get it back... :p

+1 not in this PR

frbattid · 2015-05-22T08:29:46Z

LGTM, well done @gtorodelvalle !!

gtorodelvalle assigned frbattid May 14, 2015

frbattid reviewed May 18, 2015

frbattid mentioned this pull request May 20, 2015

Implement raw and aggregated MongoDB collections names based in hashes telefonicaid/fiware-cygnus#420

Closed

frbattid reviewed May 20, 2015

gtorodelvalle added 4 commits May 21, 2015 19:47

Generate collection names using a hash function

da9caa8

Confusing comment fixed

8455e39

New SHOULD_HASH configuration parameter to enable or to disable the hashing for the collection names

855a0ac

Hash size in bytes must be bigger than 20 bytes

63c914b

gtorodelvalle force-pushed the feature/generate-collection-names-using-a-hash-function branch from c3ff1bb to 63c914b Compare May 21, 2015 17:48

iariasleon reviewed May 21, 2015

gtorodelvalle added 2 commits May 22, 2015 09:11

Correctly informing of the size in bytes of the service

0b832b3

Informing the user that the size of namespaces is bigger than the limit imposed by MongoDB

7f4824a

frbattid pushed a commit that referenced this pull request May 22, 2015

Merge pull request #84 from telefonicaid/feature/generate-collection-names-using-a-hash-function Generate collection names using a hash function

b33dd71

frbattid merged commit b33dd71 into develop May 22, 2015

frbattid deleted the feature/generate-collection-names-using-a-hash-function branch May 22, 2015 08:51

gtorodelvalle mentioned this pull request May 22, 2015

Generate collection names using a hash function #83

Closed
