add Authentication to Federation #117
Comments
One possible solution would be to make it possible to pass one or more service-specific auth headers to the SPARQL endpoint.
The standard auth headers could be used as auth values. Side-note for related work: The |
Yes, authentication matters. Is there anything special to SPARQL Federation here though? If it is a general matter of HTTP connection handling, then using machinery available from libraries, and the systems that enterprises know how to manage and control, would seem better. The challenge is to answer why this is not the same as securing any HTTP connection and to identify what, if anything, SPARQL 1.2 the-standard has to do. Jena allows |
I believe this is the most difficult problem to actually do. Because, in this kind of federation you are passing on trust. |
@rubensworks wrote:
Promiscuous query
Isn't that generally available now (if you don't mind sharing your credentials with the executor of the query and any service they might pass a part of the query to)? If I paste this query into service0.example's SPARQL query interface:
SELECT ... { ...
  SERVICE <http://user1:password1@service1.example/> { ...
    SERVICE <http://user2:password2@service2.example/> {
    } ...
    SERVICE <http://user3:password3@service3.example/> {
    } ...
  } ...
}
service0 will see credentials for services 1-3 and it's likely that service1 will see credentials for 2-3. If they're using generic HTTP libraries to dereference the SERVICE URLs, this should work in SPARQL 1.1. If that's true, I think we don't get much benefit from having a syntactic extension that moves the credentials out of the URLs. I think in practice, when folks want to write such queries in a non-promiscuous way, they end up contacting services 0-3 and setting up lots of tedious custom interfaces and giving up on service 0 and service 1's implementation of SPARQL Federation. This indicates that there's some need to deal with this at the protocol level.
Bag o' credentials
A simple extension, which would move the credentials out of plain sight, would be to move them into a header so that when e.g. service 1 was calling service 2 or 3, it could pull the credentials out of a (hypothetical)
You could mitigate the last point a bit by dynamically creating short-duration, limited-use credentials with services 1-3 before initiating the query. Ultimately, it would be nice to have a token per SERVICE call, perhaps scoped to execute only that subquery. I fiddled around with how an OAuth2 Implicit pattern might enable each service to get a token when it first gets queried. Will add a comment if that coalesces. |
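To make the "move the credentials out of the URLs" idea concrete, here is a minimal Python sketch. It assumes credentials appear as userinfo in SERVICE IRIs, as in the promiscuous query above, and collects them into a per-endpoint "bag". The function names `strip_credentials` and `basic_auth_header` are hypothetical, the regular expression is a stand-in for a real SPARQL parser, and nothing here is part of any SPARQL specification.

```python
import base64
import re
from urllib.parse import urlsplit, urlunsplit

# Naive pattern for SERVICE <iri>; a real implementation would use a SPARQL parser.
SERVICE_URL = re.compile(r"SERVICE\s+<([^>]+)>", re.IGNORECASE)


def strip_credentials(query):
    """Return (clean_query, bag); bag maps credential-free endpoint -> (user, password)."""
    bag = {}

    def replace(match):
        parts = urlsplit(match.group(1))
        if parts.username is None:
            return match.group(0)  # no userinfo, leave the clause untouched
        # Rebuild the authority without userinfo (IPv6 hosts are not handled here).
        host = parts.hostname if parts.port is None else f"{parts.hostname}:{parts.port}"
        clean = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
        bag[clean] = (parts.username, parts.password or "")
        return f"SERVICE <{clean}>"

    return SERVICE_URL.sub(replace, query), bag


def basic_auth_header(user, password):
    """Build an HTTP Basic Authorization header for one endpoint."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

As the thread notes, this only takes the credentials out of the query text. Whoever receives the bag can still use them.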
while this removes them from the query text, this still entrusts the credentials to the intermediate sites. |
when placing remote requests in connection with service clauses, dydra handles just the first authentication level. the rbac graph which governs access to remote service location resources (see #121) can also include credentials, which are decrypted and incorporated into request headers, as appropriate. this does not address successive authentication stages, but i would not want the authority to do so. |
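@lisp, I'm guessing that if you were service1 in the Promiscuous query example above, you wouldn't want the initial querier to hand you a big bag of credentials for unlimited access to service2 and 3. Would you feel better about any of these approaches?
1. querier prearranges tokens with service2,3 which give time-boxed permission to execute the respective service clauses but nothing else.
2. querier prearranges tokens with service2,3 which give one-time permission to execute the respective service clauses but nothing else. This means you have to gather everything in a BINDINGs clause to make only one query, which could be difficult for non-trivial cases.
3. querier sends you temporary endpoints (or temporary tokens for more persistent endpoints) to get the querier (or some delegate) to execute the query. This moves more bits around the network, but you (service1) never see the querier's credentials for service2 or 3.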
3 is arguably easier than 1 because in 3, the querier gets to decide whether the service clause that you call e.g. service1 with is allowed. In 1, service1 would have to make that call, which would require more sophistication to register the token, store the approved service clause, and test equivalence between that and the query you pass to them. Since in 3 the querier does all that themselves, they can make that code more sophisticated when they need to, without any coordination with service1. (I guess in principle, folks could use 3 today by setting up proxy query endpoints and rewriting the query to use them instead of service1,2,3.) |
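A time-boxed token that is good for one service clause and nothing else could be sketched as below. This is only an illustration under stated assumptions: the token format, the `issue_token` and `verify_token` names, and the idea that the remote service signs tokens with its own secret are all hypothetical, and whitespace collapsing is a crude stand-in for real query equivalence.

```python
import hashlib
import hmac
import time


def normalize(clause):
    # Collapse whitespace only; this would also alter whitespace inside literals.
    return " ".join(clause.split())


def _message(expires, clause):
    digest = hashlib.sha256(normalize(clause).encode("utf-8")).hexdigest()
    return f"{expires}:{digest}".encode("ascii")


def issue_token(secret, clause, ttl_seconds, now=None):
    """Issue 'expiry:signature' bound to one service clause (hypothetical format)."""
    now = time.time() if now is None else now
    expires = int(now) + ttl_seconds
    signature = hmac.new(secret, _message(expires, clause), hashlib.sha256).hexdigest()
    return f"{expires}:{signature}"


def verify_token(secret, token, clause, now=None):
    """Accept the token only before expiry and only for the clause it was issued for."""
    now = time.time() if now is None else now
    try:
        expires_text, signature = token.split(":", 1)
        expires = int(expires_text)
    except ValueError:
        return False  # malformed token
    if now > expires:
        return False  # time box has closed
    expected = hmac.new(secret, _message(expires, clause), hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature.encode("utf-8"), expected.encode("ascii"))
```

A one-time variant would additionally need the verifying service to remember which tokens it has already accepted, which this sketch does not do.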
On 2020-10-07, at 13:14:34, ericprud ***@***.***> wrote:
@lisp, I'm guessing that if you were service1 in the Promiscuous query example above, you wouldn't want the initial querier to hand you a big bag of credentials for unlimited access to service2 and 3.
yes, we play the role of service1 only.
even if we are service2 or service3 in a larger context, our relation to the respective client is as a service1.
we do not now allow anyone to “hand us a big bag of credentials”.
as we operate publicly accessible network services, “backend” resources are all managed. just as we do not permit arbitrary account creation, we do not permit an existing account to create an arbitrary remote resource.
we control that process.
it’s a political, rather than a technology decision.
Would you feel better about any of these approaches?
• querier prearranges tokens with service2,3 which give time-boxed permission to execute the respective service clauses but nothing else.
as use of these tokens is governed strictly by relationships between the querier and service2,3, i have no opinion about it.
so long as i am responsible for the credentials for the process of authentication with service2 only, and not for action (beyond authentication) to be governed by either of service2,3, it does not matter to me.
even if the querier has decided that part of the protocol with services2,3 involves that i would convey such tokens in the text of a federation request, so long as that text is conformant, why should i feel anything about it?
• querier prearranges tokens with service2,3 which give one-time permission to execute the respective service clauses but nothing else. This means you have to gather everything in a BINDINGs clause to make only one query, which could be difficult for non-trivial cases.
i do not understand this mechanism - esp "gather everything in a BINDINGs clause to make only one query". please give a more detailed example.
if it means that i need to do more than the mechanics of introducing values clauses to effect sidewards-information-passing, that is, it is not inherent in the algebra, then i would be wary.
• querier sends you temporary endpoints (or temporary tokens for more persistent endpoints) to get the querier (or some delegate) to execute the query. This moves more bits around the network, but you (service1) never see the querier's credentials for service2 or 3.
this is analogous to scenario 1. that is, the service1 federation requests are opaque to service1.
so long as they are conformant, why should i feel anything about them?
if it is to be dynamic, we would have to relax the constraint on creating resources, to permit the querier to create the remote temporary endpoints and supply their respective credentials.
3 is arguably easier than 1 because in 3, the querier gets to decide whether the service clause that you call e.g. service1 with is allowed. In 1, service1 would have to make that call, which would require more sophistication to register the token, store the approved service clause, and test equivalence between that and the query you pass to them. Since in 3 the querier does all that themselves, they can make that code more sophisticated when they need to, without any coordination with service1.
(I guess in principle, folks could use 3 today by setting up proxy query endpoints and rewriting the query to use them instead of service1,2,3.)
modulo, that we restrict the use of remote endpoints to those which have been configured in advance.
|
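For readers unfamiliar with the "mechanics of introducing values clauses" mentioned in this reply, a minimal sketch follows. It assumes the bindings from an earlier service call arrive as a list of dictionaries of plain string values. The helper names `values_clause` and `constrain` are hypothetical, and only simple string literals are handled (no IRIs, datatypes, or language tags).

```python
def values_clause(variable, literals):
    """Render a single-variable VALUES block of plain string literals."""
    def quote(text):
        escaped = (text.replace("\\", "\\\\")
                       .replace('"', '\\"')
                       .replace("\n", "\\n")
                       .replace("\r", "\\r"))
        return f'"{escaped}"'

    rows = " ".join(quote(item) for item in literals)
    return f"VALUES ?{variable} {{ {rows} }}"


def constrain(subquery, variable, bindings):
    """Append a trailing VALUES clause built from earlier results to a subquery."""
    # De-duplicate and sort so the generated query text is deterministic.
    values = sorted({row[variable] for row in bindings if variable in row})
    return f"{subquery}\n{values_clause(variable, values)}"
```

Because the VALUES rows come from another service's answer, the resulting query text cannot be known in advance. That is the difficulty with pre-approving exact query strings.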
We need an encrypted, mutual trust handshake. Isn't this where WebID steps in? |
can anyone point me to the relevant bits of codebase? |
I'm glad people are gradually contributing bits of knowledge to this difficult problem. Here's how I understand the difficulties:
Because server0 has to decide what credentials to pass to server1, based on the user, server0 setup, the identity of server1, and perhaps even the query. @ericprud: @cto-troven
AFAIK there isn't any commonly accepted codebase, we're still gathering prior art. |
(insert general concern of custom security approaches) Let's set a baseline - what of this isn't OAuth/OpenID/VC/...? We have things like github-related authorization of apps to do things on behalf of the user. In these services, the user has authorized server0 to use server1 on its behalf - and trust server0 to do so only as appropriate. A solution focused on ABAC is preferable to RBAC.
Current: (The implementation will "soon" be significantly upgraded.) In outline: server0 has "if contacting endpoint E on behalf of user A, use credentials XYZ on the HTTP connection". This allows two cases:
|
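The rule "if contacting endpoint E on behalf of user A, use credentials XYZ on the HTTP connection" amounts to a lookup keyed on the user and the endpoint. The sketch below is a generic illustration of that shape only. It is not the API of any particular SPARQL implementation, and the `CredentialTable` type and `headers_for` function are hypothetical names.

```python
from typing import Dict, Tuple

# (user, endpoint) -> HTTP headers to attach; all entries are hypothetical.
CredentialTable = Dict[Tuple[str, str], Dict[str, str]]


def headers_for(table: CredentialTable, user: str, endpoint: str) -> Dict[str, str]:
    """Return a copy of the headers configured for this user and endpoint, or {}."""
    return dict(table.get((user, endpoint), {}))
```

With this shape, server0 makes the decision and the credentials never appear in the query text.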
Yep. I tweaked my comment: "out of a (hypothetical) |
Apologies for waiting almost a year to reply to @lisp's comprehensive comments above.
And a sensible one.
That makes sense. Your service is running in a pretty conservative mode, but since these arrangements are outside of your control, you don't have to care about them.
Nothing so exotic. I was just noting that if service 1 (who's acting as a sort of aggregator) were given OTPs for services 2 and 3, it would need to be able to formulate the federation as single queries (no getting 2's results and iteratively constructing queries to 3).
Exploring static and dynamic (or verbatim and derived) queries a bit here:
verbatim queries
Static (verbatim) sounds like it's limited to cases where the querier pre-constructed a query like (an example from 11 years ago):
SELECT ?symbol ?label
WHERE
{
SERVICE <service2>
{
[] uniprot:gene\#acc "P04637" ;
uniprot:gene\#val ?symbol .
}
SERVICE <service3>
{
[] ucsc:association\#gene_product_id [
ucsc:gene_product\#Symbol ?symbol
] ;
ucsc:association\#term_id [
ucsc:term\#name ?label
] .
}
}
In this case, the aggregator (service1) can almost parrot the exact query on to services 2 and 3, which means the querier could have pre-arranged with those services to accept specific queries from service1. Of course, whitespace tweaks are a problem, but the bigger problem is that service1 would probably not want to pose such an unconstrained query to service3 (which returns all of the gene symbol/label pairs). Instead it would want to constrain that query to return only the symbol for P04637, e.g.
SELECT ?symbol ?label
WHERE
{
[] ucsc:association\#gene_product_id [
ucsc:gene_product\#Symbol ?symbol
] ;
ucsc:association\#term_id [
ucsc:term\#name ?label
] .
}
VALUES ?symbol { "TP53" } # value returned from uniprot
That means that the querier would have to tell service3 to accept from service1 any query with an algebra of:
(base <http://example/base/>
(prefix ((ucsc: <>))
(project (?symbol ?label)
(join
(bgp
(triple ??0 <association#gene_product_id> ??1)
(triple ??1 <gene_product#Symbol> ?symbol)
(triple ??0 <association#term_id> ??2)
(triple ??2 <term#name> ?label)
)
(table (vars ?symbol)
(row [?symbol X]) *
)))))
That's not too tough but it does require inventing a bit of language for Are we over-engineering yet? But wait, there's more...
derived queries
If the aggregator were helpfully figuring out to go to uniprot and UCSC to execute a query like:
SELECT ?id ?gene_symbol WHERE {
?gene uniprot:id ?id ; skos:prefLabel ?gene_symbol
}
the aggregator would be inventing the whole orchestration, including constructing from whole cloth the queries of services 2 and 3. One way to do that would be to make it interactive so that the querier submits the above query to service1 and service1 says "for me to execute this, I need permission to execute template A on service2 and template B on service3." Those templates may be many pages of eye-gouging SPARQL algebra, but I'd expect that in most cases, the querier would say "yeah, sure, whatever" and tell services 2 and 3 to permit any queries from service1, maybe with some ulimits. Now service1 can go ahead and weave a bit of Semantic Web from those services.
Right, I guess in that case, the proxy could automatically configure the remote endpoints before issuing the query. |
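As a rough illustration of "accept any query with this shape, whatever the VALUES rows are", the sketch below compares query strings after collapsing whitespace and blanking out the rows of each VALUES block. It is a string-level stand-in for the algebra comparison discussed above, not an implementation of it. The names `shape` and `matches_template` are hypothetical, and literals containing `}` would defeat the regular expression.

```python
import re

# Matches VALUES ?var { ...rows... } for a single variable (no '}' inside literals).
VALUES_ROWS = re.compile(r"(VALUES\s+\?\w+\s*\{)[^}]*(\})", re.IGNORECASE)


def shape(query):
    """Reduce a query to a comparable shape: collapsed whitespace, VALUES rows erased."""
    collapsed = " ".join(query.split())
    return VALUES_ROWS.sub(r"\1 ... \2", collapsed)


def matches_template(query, template):
    """True if the query differs from the approved template only in VALUES rows or whitespace."""
    return shape(query) == shape(template)
```

A service policing incoming queries this way would still need to decide which VALUES rows are acceptable. This check deliberately ignores them.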
on the contrary, where we permit an external request
was the example not standard sparql?
yes, we have done that. that is, we have relied on schema definitions to determine how to deconstruct an aggregate query and delegate aspects to the respective sources and associated the requisite location and authentication information with the schema definitions. in this case, we accepted and managed the authentication credentials.
In the case at hand, everything was declared in relation to the schemas which drove the process to deconstruct the aggregate query. it was not difficult. remote service reliability was much more the issue. no interaction was necessary. |
There are two use cases that I'm juggling:
For use case 1 (alone), the user can pass an access token (password, timeboxed, OTP, etc). For 2, the user needs to associate that token with some parameterized access. One way to do that would be with verbatim queries, hence my point that if service3's query depends on service2's response, the querier would not be able to predict the query to pre-approve it with service3 (unless the answer is already known, but then, why issue the query). Below, I explore parameterizing the access by tying it to some templated query encompassing the query issued from service1 to service3.
Still standard SPARQL 1.1, so far, but with the query structure policed by service3 to match something pre-approved by the querier.
Sure, but even if you're passing a token, we also have to envision ways that the contract between the querier and service3 can be both:
Gotcha. I was exploring query templates, but you could also enumerate named graphs or permitted predicate paths or...
If service1 were mischievous, could it abuse the querier's credentials to ask for data beyond what the querier intended?
Sorry, are we talking about service1 or service3 parsing that description and policing the query? |
[ i elide the majority of the text, as i seek to avoid any juggling acts which involve external services. while there is adequate literature on access control via query rewriting, service2 would be negligent to rely on service1 to implement authorization constraints by delegating just approved requests. the interaction between service1 and service2 is in principle no different than that between service2 and any other client. that service1 uses credentials provided by querier does not change that. were service1 a reseller rather than an aggregator, that might change. |
Agreed, modulo the point that service1 has to stay inside the lines of the contract between querier and service 2 or 3.
By "juggling", i meant i was examining two use cases; not that service1 has to do any juggling. |
you wrote,
the protocols and apis among querier and service1 and service2 must ensure that the two use cases are the same.
yes, a literal "token" is likely not sufficient, but sparql 1.1 already permitted the equivalent of that. |
This is how Amazon Neptune (Blazegraph) handles it now: https://docs.aws.amazon.com/neptune/latest/userguide/sparql-service.html
# send to http://neptune-1:8182/sparql
SELECT * WHERE {
?person rdf:type foaf:Person .
SERVICE <http://neptune-2:8182/sparql> {
?person foaf:knows ?friend .
}
}
It is not possible to put |
I wonder how they keep your credentials from leaking out to non-neptune services. I suppose they check the IP against some list of masks for in-house IP addrs. I don't think we want to encourage a monolithic trust realm. That said, we can learn from the use cases this enables and re-envision them spread across services hosted by diverse institutions, enabled by bearer tokens and authorized templates and all the other mechanisms we dreamed up. |
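The guess above, checking a target address against a list of in-house masks, could look like the following sketch. It illustrates that speculation only and does not describe how Neptune actually works. The function name `in_trust_realm` and the example networks are hypothetical.

```python
import ipaddress


def in_trust_realm(address, masks):
    """True if the address falls inside any of the given CIDR masks."""
    ip = ipaddress.ip_address(address)
    return any(ip in ipaddress.ip_network(mask) for mask in masks)


# Hypothetical in-house ranges; credentials would be forwarded only when this is True.
IN_HOUSE = ["10.0.0.0/8", "192.168.0.0/16"]
```

A check like this ties trust to network location, which is the "monolithic trust realm" the comment cautions against encouraging.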
I suggest reordering the elements in the title of this issue — as the desire is to add Authentication to Federation. The current "Authentication with Federation" implies adding Federation to Authentication. |
Why?
SPARQL 1.1 Federation does not specify any way to authenticate.
This is a significant impediment in enterprise scenarios.
Previous work
Vendor-specific solutions:
Many vendors have additional authentication solutions (eg integrations with LDAP, SSO, etc) but afaik they are not exposed to Federation.
Proposed solution
Sorry, someone smarter than me should propose solutions :-)
Considerations for backward compatibility
None, this will be new functionality.