Generate collection names using a hash function #84

Merged

gtorodelvalle
Member

where generatedHash is generated using the SHA-512 algorithm and truncated so that the full namespace fits within MongoDB's 120-byte limit, after reserving the bytes needed for the database name, the COLLECTION_PREFIX and the .aggr suffix used for aggregates.
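
A minimal sketch of that truncation, not the code merged in this PR. It assumes a hex-encoded digest and uses `Buffer.byteLength` in place of the PR's `bytesCounter`. The function name `generateCollectionName` and the sample values `sth_orion` and `sth_` are hypothetical.

```javascript
const crypto = require('crypto');

// Assumption: the 120-byte namespace limit covers "<database>.<collection>".
const MAX_NAMESPACE_BYTES = 120;

// Hypothetical helper: builds a collection name from a prefix and a truncated SHA-512 hash.
function generateCollectionName(databaseName, collectionPrefix, hashInput) {
  // Bytes left for the hash: the limit minus the dot separator, the database name,
  // the prefix and the '.aggr' suffix needed by the aggregated collection.
  const limit = MAX_NAMESPACE_BYTES - 1 -
    Buffer.byteLength(databaseName, 'utf8') -
    Buffer.byteLength(collectionPrefix, 'utf8') -
    Buffer.byteLength('.aggr', 'utf8');
  if (limit <= 0) {
    throw new Error('No room left for the hash');
  }
  // A hex SHA-512 digest is 128 ASCII characters, so slicing characters equals slicing bytes.
  const hash = crypto.createHash('sha512').update(hashInput, 'utf8').digest('hex');
  return collectionPrefix + hash.slice(0, limit);
}

const rawName = generateCollectionName('sth_orion', 'sth_', '/entityIdentityType');
const aggregatedName = rawName + '.aggr';
console.log(rawName.length, aggregatedName.length); // 105 110
```

With these sample values the aggregated namespace `sth_orion.` + `aggregatedName` is exactly 120 bytes long.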

The relation between the collection names and the combination of service, service path, entity and attribute is stored in a new collection named COLLECTION_PREFIX + 'collection_names'.

Example entries of this collection are the following documents:

{ "_id" : "sth_35b3d41c92944eb243dfd1324a756961c6d4d21955ab003d8857994c38be159c13b2a9631b7971ccb9ab394da44e66c730aa1", "dataModel" : "collection-per-attribute", "isAggregated" : false, "service" : "orion", "servicePath" : "/", "entityId" : "entityId", "entityType" : "entityType", "attrName" : "attrName" }
{ "_id" : "sth_35b3d41c92944eb243dfd1324a756961c6d4d21955ab003d8857994c38be159c13b2a9631b7971ccb9ab394da44e66c730aa1.aggr", "dataModel" : "collection-per-attribute", "isAggregated" : true, "service" : "orion", "servicePath" : "/", "entityId" : "entityId", "entityType" : "entityType", "attrName" : "attrName" }
{ "_id" : "sth_5c86f0344ed249425c8aad3f272b72a2ce0ff388791f31497eaf257fa72629487faafd7fd27eb0dd6355d5509e2f3faf76711", "dataModel" : "collection-per-service-path", "isAggregated" : false, "service" : "orion", "servicePath" : "/", "entityId" : "entityId", "entityType" : "entityType", "attrName" : "attrName" }
{ "_id" : "sth_5c86f0344ed249425c8aad3f272b72a2ce0ff388791f31497eaf257fa72629487faafd7fd27eb0dd6355d5509e2f3faf76711.aggr", "dataModel" : "collection-per-service-path", "isAggregated" : true, "service" : "orion", "servicePath" : "/", "entityId" : "entityId", "entityType" : "entityType", "attrName" : "attrName" }
{ "_id" : "sth_509cdf604e8ea57462262bcb4f8454a2005a24f8b8d09377bc6de4ddc6f6b43120f694bbb0cce18a87318f26b6c4a87f689b8", "dataModel" : "collection-per-entity", "isAggregated" : false, "service" : "orion", "servicePath" : "/", "entityId" : "entityId", "entityType" : "entityType", "attrName" : "attrName" }
{ "_id" : "sth_509cdf604e8ea57462262bcb4f8454a2005a24f8b8d09377bc6de4ddc6f6b43120f694bbb0cce18a87318f26b6c4a87f689b8.aggr", "dataModel" : "collection-per-entity", "isAggregated" : true, "service" : "orion", "servicePath" : "/", "entityId" : "entityId", "entityType" : "entityType", "attrName" : "attrName" }
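
As an illustration only, documents shaped like the ones above could be built with a small helper such as the following. The helper name `buildCollectionNameDocs` and the shortened `_id` value are hypothetical; the field names are copied from the example entries.

```javascript
// Hypothetical helper: returns the raw and aggregated mapping documents
// for one generated collection name.
function buildCollectionNameDocs(collectionName, dataModel, params) {
  const buildDoc = (isAggregated) => ({
    // The aggregated collection reuses the same name plus the '.aggr' suffix.
    _id: isAggregated ? collectionName + '.aggr' : collectionName,
    dataModel: dataModel,
    isAggregated: isAggregated,
    service: params.service,
    servicePath: params.servicePath,
    entityId: params.entityId,
    entityType: params.entityType,
    attrName: params.attrName
  });
  return [buildDoc(false), buildDoc(true)];
}

// 'sth_0123abcd' stands in for a real, much longer hash-based name.
const docs = buildCollectionNameDocs('sth_0123abcd', 'collection-per-entity', {
  service: 'orion',
  servicePath: '/',
  entityId: 'entityId',
  entityType: 'entityType',
  attrName: 'attrName'
});
console.log(JSON.stringify(docs, null, 2));
```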

}
// The maximum number of bytes accepted by MongoDB for namespaces is 120 bytes.
var limit = 119 - bytesCounter.count(databaseName) - bytesCounter.count(sthConfig.COLLECTION_PREFIX) -
bytesCounter.count('.aggr');
Member

Although this is a method for obtaining the collection name for raw data, I guess it is necessary to take into account the number of bytes for ".aggr", since the hash-based collection name must match for both raw and aggregated data, right?

Member Author

That's right :) That's the reason for the `- bytesCounter.count('.aggr')` part :) Am I missing anything here? :p

Member

No, it is perfect, just wondering if I understood it well :) NTC
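
The byte budget discussed in this thread can be checked with a quick worked example. This is only a sketch: the 9-byte database name and 4-byte prefix are hypothetical values, `Buffer.byteLength` stands in for `bytesCounter.count`, and the 119 presumably reserves one byte of the 120-byte limit for the dot between database and collection name.

```javascript
// Hypothetical sample values, not taken from the PR.
const databaseName = 'sth_orion';
const collectionPrefix = 'sth_';
const bytes = (s) => Buffer.byteLength(s, 'utf8');

// Same budget as in the reviewed snippet.
const limit = 119 - bytes(databaseName) - bytes(collectionPrefix) - bytes('.aggr');

// Stand-in for a truncated hash of exactly `limit` characters.
const hash = 'a'.repeat(limit);
const rawNamespace = databaseName + '.' + collectionPrefix + hash;
const aggregatedNamespace = rawNamespace + '.aggr';

console.log(limit, bytes(rawNamespace), bytes(aggregatedNamespace)); // 101 115 120
```

Because the '.aggr' bytes are reserved up front, the raw and aggregated collections can share the same hash and both stay within the limit.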

@@ -345,6 +345,16 @@ the attribute type does not have any special semantic or effect currently.
As already mentioned, all these configuration parameters can also be adjusted using the
[`config.js`](https://github.com/telefonicaid/IoT-STH/blob/develop/config.js) file whose contents are self-explanatory.

It is important to note that there is a limitation on the bytes a MongoDB namespace (concatenation of the database name and
Member

By introducing this paragraph, it should be assumed that the data model is fixed to collection-per-entity, right? I mean, we are still keeping the possibility of switching to another data model, and AFAIK this hashing mechanism only addresses the collection-per-entity data model, doesn't it?

I've been thinking about fixing the data model and so on, and I think we can hide any reference to the data_model parameter from the documentation, defaulting it to collection-per-entity but keeping the code that allows changing the data model for internal testing purposes (I'm afraid that we fix the data model and then, in the future, the bosses decide on another one :)).

Member Author

No, no, the included code can be used with any of the 3 supported data models. The hash is generated from the service path (if collection-per-service-path), the concatenation of service path, entityId and entityType (if collection-per-entity) or the concatenation of service path, entityId, entityType and attributeName (if collection-per-attribute). See https://github.com/telefonicaid/IoT-STH/pull/84/files#diff-db503fe45ab607476c902dd760bd0defR203 and how the collectionName4Events is obtained ;)
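
Roughly, the selection described above could look like the sketch below. It is not the code from the linked diff: the function name `getHashInput`, the `params` object and the plain concatenation without a separator are assumptions made for illustration.

```javascript
// Hypothetical helper: picks the string to be hashed according to the data model.
function getHashInput(dataModel, params) {
  switch (dataModel) {
    case 'collection-per-service-path':
      return params.servicePath;
    case 'collection-per-entity':
      return params.servicePath + params.entityId + params.entityType;
    case 'collection-per-attribute':
      return params.servicePath + params.entityId + params.entityType + params.attrName;
    default:
      throw new Error('Unsupported data model: ' + dataModel);
  }
}

const params = {
  servicePath: '/',
  entityId: 'entityId',
  entityType: 'entityType',
  attrName: 'attrName'
};
console.log(getHashInput('collection-per-entity', params)); // "/entityIdentityType"
```

Note that concatenating without a separator means different combinations can yield the same input (for example, 'ab' + 'c' and 'a' + 'bc'), so a real implementation may want a delimiter.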

Member Author

Consequently, we can keep the option to select the data model if we want to since it is supported by the hashing mechanism :)

Member

You are right, I did not notice the switch, I only saw the last case... Tip: do not review PRs with fever xD NTC

Member

In addition, the last case is not about collection-per-entity but collection-per-attribute... so I reviewed it completely wrong xD

gtorodelvalle force-pushed the feature/generate-collection-names-using-a-hash-function branch from c3ff1bb to 63c914b on May 21, 2015 17:48

1. <u>Plain text</u>: In case the `SHOULD_HASH` configuration parameter is set to 'false', the collection names are
generated as a concatenation of the `COLLECTION_PREFIX` plus the service path (in case of the collection-per-service-path
data model) plus the entity id plus the entity type (in case of the collection-per-entity data model) plus the attribute name
Contributor

We agreed to remove all information related to data models

Member Author

But not in this PR, right? :) I wanted to include everything needed and in a new PR delete all the information just to have that information together in case someone asks us to get it back... :p

Member

+1 not in this PR

@frbattid
Member

LGTM, well done @gtorodelvalle !!

frbattid pushed a commit that referenced this pull request May 22, 2015
…names-using-a-hash-function

Generate collection names using a hash function
frbattid merged commit b33dd71 into develop May 22, 2015
frbattid deleted the feature/generate-collection-names-using-a-hash-function branch May 22, 2015 08:51