
Multitenancy support for all the sinks #338

Closed
frbattid opened this issue Mar 5, 2015 · 5 comments

frbattid commented Mar 5, 2015

Currently, all the available sinks are designed to work with a single user:

  • OrionHDFSSink
    • cygnusagent.sinks.hdfs-sink.cosmos_default_username
    • cygnusagent.sinks.hdfs-sink.cosmos_default_password
  • OrionMySQLSink
    • cygnusagent.sinks.mysql-sink.mysql_username
    • cygnusagent.sinks.mysql-sink.mysql_password
  • OrionCKANSink
    • cygnusagent.sinks.ckan-sink.api_key

This behaviour must be changed in order to support multiple users at each sink.

A simple approach is to replace the above parameters with pointers to files (or a single file) containing more than one entry, e.g. for HDFS we could have several (username, password) pairs, one per user that will receive a copy of the persisted data... nevertheless, is this the desired behaviour?

Orion attaches fiware-service and fiware-servicePath headers to its notifications, which in some way determine the owner of the notified data. Thus, it seems persisting all the notified data for all the configured users in the above files is not a good idea; instead, a mapping between the fiware-service and fiware-servicePath and the HDFS/MySQL/CKAN/whatever user must be done. For instance (CSV-like example):

# fiware-service,fiware-servicePath,hdfs_user,hdfs_pass,mysql_user,mysql_pass,ckan_user,ckan_pass
fuenlabrada,basuras,fuen_waste,o34erf8z  ...
fuenlabrada,parques,fuen_park,943uryhf  ...
leganes,basuras,lega_waste,lkj2wmd0 ...

Of course, such a CSV-like file must be cached in memory for performance reasons.
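As a sketch of what that cached mapping could look like (the file path, column order, and function name are assumptions, following the CSV example above):

```python
# Hypothetical sketch: load the per-tenant credentials CSV into an in-memory
# dict keyed by (fiware-service, fiware-servicePath). The column order follows
# the CSV example above; the file path and function name are assumptions.
import csv

def load_credentials(path="/etc/cygnus/users.csv"):
    creds = {}
    with open(path, newline="") as f:
        rows = csv.reader(line for line in f
                          if line.strip() and not line.startswith("#"))
        for row in rows:
            service, service_path = row[0], row[1]
            creds[(service, service_path)] = {
                "hdfs_user": row[2], "hdfs_pass": row[3],
                "mysql_user": row[4], "mysql_pass": row[5],
                "ckan_user": row[6], "ckan_pass": row[7],
            }
    return creds
```

Loading the whole file into a dict once makes each per-notification lookup O(1), which is the point of caching it in memory.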

NOTE 1: Once the HDFS/MySQL/CKAN/whatever users have been found through the above mechanism, not all the notified data will be sent to all the configured storages. The model-entities feature of Orion will be used to decide which storages will definitely persist the data. This is described at issue https://github.com/telefonicaid/fiware-connectors/issues/315.

NOTE 2: There is also a close relation between this new feature and the already developed pattern-based grouping. See this link for more details, but as a reminder, such pattern-based grouping was in charge of (1) deciding the final persistence destination within HDFS/MySQL/CKAN/whatever (file, table, resource, respectively), and (2) the destination dataset, i.e. a way of modifying the fiware-servicePath. (1) is irrelevant from this issue's point of view, but (2) has to be taken into account when putting it all together.

Conclusion: We can say the steps are:

  1. Decide the destination and the destination dataset (e.g. a new fiware-servicePath) by checking the configured patterns.
  2. Decide which is the owner of the notified data by checking the users CSV file.
  3. Decide which storages will hold the persisted data by checking the notified model-entity.
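The three steps above could be wired together roughly like this (all data structures and names are hypothetical simplifications; the real pattern table and model-entity checks live elsewhere in Cygnus):

```python
# Hypothetical sketch of the three routing steps. The pattern table is
# modelled as a dict rewriting (service, servicePath) pairs, the users table
# as a dict of owners, and the model-entity config as a dict of storage
# lists; the real Cygnus structures are richer.

def route_notification(notification, patterns, users, model_entities):
    service = notification["fiware-service"]
    # 1. Pattern-based grouping may rewrite the fiware-servicePath.
    service_path = patterns.get(
        (service, notification["fiware-servicePath"]),
        notification["fiware-servicePath"])
    # 2. The users table gives the data owner for the (service, path) pair.
    owner = users[(service, service_path)]
    # 3. The notified model-entity decides which storages persist the data.
    storages = model_entities.get(notification["entity_type"], [])
    return service_path, owner, storages
```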
@frbattid frbattid self-assigned this Mar 5, 2015
@frbattid frbattid added this to the release/0.8.0 milestone Mar 5, 2015

fgalan commented Mar 6, 2015

After a Skype talk with @frbattid, it seems the right procedure should be:

  • Cygnus gets the X-Auth-Token from the Orion notification.
  • Cygnus checks in the model entity to which backends the notification has to be persisted.
  • Depending on the backend, two situations may happen:
    • For backends based on FIWARE authentication, in which X-Auth-Token is meaningful, Cygnus propagates that X-Auth-Token (along with Fiware-Service and Fiware-ServicePath).
    • For backends in which X-Auth-Token is not meaningful (e.g. HDFS or MySQL):
      • Cygnus interacts with the IDM using that X-Auth-Token to get the corresponding user and the services and subservices it belongs to. Thus, Cygnus gets a user and a list of pairs associated to the user, each pair consisting of <fiware-service, fiware-servicepath>.
        • Some IDM expert should validate that the IDM works as suggested above, i.e. given an X-Auth-Token it returns the user name and the list of services and service paths to which that user belongs.
      • Cygnus searches its configuration (CSV or JSON) for the triple <user, fiware-service, fiware-servicepath> to get the credentials to use in the backend (e.g. HDFS-level user and password). Note that if the user belongs to more than one service, several persistence operations may be needed (is that correct?).

In the above sequence, the fiware-servicepath to use may come from an HTTP header from Orion or be the result of applying the patterns table.
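A rough sketch of that sequence (the backend names, the resolve_token callback, and the credentials table are all assumptions, not actual Cygnus APIs):

```python
# Hypothetical sketch: backends that understand FIWARE authentication get the
# X-Auth-Token forwarded; for the rest, the token is resolved against the IDM
# and the configured <user, service, servicePath> triples give the backend
# credentials. All names and data shapes here are assumptions.

TOKEN_AWARE_BACKENDS = {"ckan"}  # assumption: which backends use FIWARE auth

def persist(notification, backends, resolve_token, credentials):
    token = notification["x-auth-token"]
    ops = []
    for backend in backends:
        if backend in TOKEN_AWARE_BACKENDS:
            # Propagate the token along with Fiware-Service/ServicePath.
            ops.append((backend, {
                "x-auth-token": token,
                "fiware-service": notification["fiware-service"],
                "fiware-servicepath": notification["fiware-servicepath"],
            }))
        else:
            # Resolve the user and its (service, servicePath) pairs via the
            # IDM, then look up backend credentials for each triple. One
            # persistence operation per matching pair may be needed.
            user, pairs = resolve_token(token)
            for service, service_path in pairs:
                ops.append((backend,
                            credentials[(user, service, service_path)]))
    return ops
```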


mrutid commented Mar 6, 2015

"Cygnus interacts with the IDM using that X-Auth-Token to get the corresponding user and the services and subservices it belongs to. Thus, Cygnus gets a user and a list of pairs associated to the user, each pair consisting of <fiware-service, fiware-servicepath>."

I would prefer just a <fiware-service, fiware-servicepath> mapping, without even resolving the user. So the <fiware-service, fiware-servicepath> tuple determines which credentials we should use at the third party (we change the granularity, but I think it is enough). If Cygnus receives a notification, then the token is valid and corresponds to the given Service/Subservice headers (provided the source is the CB).

"Some IDM expert should validate that the IDM works as suggested above, i.e. given an X-Auth-Token it returns the user name and the list of services and service paths to which that user belongs."

I'm not an expert, but the IDM resolves the token and gives you that information (roles, services, subservices).

Agree on making conditional forwarding for the X-Auth-Token. For every <Service, Subservice, DestinationBE> we should be able to determine whether we need token forwarding (defaulting to false for security reasons).

frbattid (Member, Author) commented:

I hope I'm able to explain myself regarding the conversation I had with @fgalan at FIWARE's Developers Week in Brussels.

We think Cygnus already supports multitenancy :)

It already supports multitenancy because:

  • Each fiware-service maps (or may map with very little modifications) into an isolated data space within the final backend:
    • MySQL: each fiware-service maps into a database.
    • CKAN: each fiware-service maps into an organization.
    • HDFS: each fiware-service could map (with minor changes in the code) into an HDFS user space.
  • Each fiware-servicePath maps into a different data container:
    • CKAN: each fiware-servicePath maps into a dataset/package.
    • HDFS: each fiware-servicePath maps into an HDFS folder.
    • MySQL: in this case, there is no direct translation and we prefix all the table names with the fiware-servicePath.

In order for Cygnus to write in all the above isolated spaces and containers, only one superuser is required. That is currently the case for MySQL and CKAN. In the case of HDFS, such a superuser may already be configured, but all the data is put under the HDFS space belonging to that superuser (/user/<superuser>/<service>/<servicePath>/); this could easily be changed in order to have HDFS paths such as /user/<service>/<servicePath>/.
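A minimal sketch of that naming scheme, under the assumption that (fiware-service, fiware-servicePath) alone decide the backend namespace (sanitization and exact prefixing rules are assumptions):

```python
# Hypothetical sketch of the naming scheme described above: a single
# superuser writes everywhere, and (fiware-service, fiware-servicePath)
# alone decide the backend namespace.

def hdfs_path(service, service_path):
    # Proposed layout: /user/<service>/<servicePath>/ rather than
    # /user/<superuser>/<service>/<servicePath>/.
    return "/user/{}/{}/".format(service, service_path)

def mysql_table(service_path, entity):
    # MySQL has no container below the database, so table names are
    # prefixed with the fiware-servicePath.
    return "{}_{}".format(service_path, entity)

def ckan_container(service, service_path):
    # CKAN: the service maps to an organization, the servicePath to a
    # dataset/package.
    return {"organization": service, "package": service_path}
```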

Thus, from the writing point of view, which is the goal of Cygnus, nothing else needs to be done. Why would we need a different user for each write, if a single superuser can do everything?

A very different thing is the ownership of the data and which users are allowed to exploit it, but Cygnus should have nothing to say regarding that. I mean, that is clearly a provisioning task that is apart from Cygnus' goals.


elenatid commented Apr 9, 2015

The approach suggested by @frbattid looks good. I'd say that multi-tenancy is supported by the system configuration, not by Cygnus itself, but that is just a technicality ;-).

Since we're relying on the configuration of each flow to ensure that each piece of data is written to the proper sink, there is a slight chance that some error might happen when configuring the patterns and the data ends up being written to the wrong sink (as Cygnus is writing with a superuser). But this is easily detectable and quite unlikely to happen, so I think it is something we can live with.


frbattid commented May 7, 2015

Implemented in PR #379

@frbattid frbattid closed this as completed May 7, 2015