Merge 17f67cd into 987c8c1
Marko Obrovac committed Jan 14, 2015
2 parents 987c8c1 + 17f67cd commit a0224bb
Showing 14 changed files with 886 additions and 121 deletions.
142 changes: 132 additions & 10 deletions config.example.yaml
@@ -2,18 +2,140 @@

port: 7231

# System domain (used to store restbase metadata) in reverse DNS notation
# System domain (used to store restbase metadata)
sysdomain: restbase.local

storage:
  default:
    # module name
    type: restbase-cassandra
    hosts: [localhost]
    keyspace: system
    username: cassandra
    password: cassandra
    defaultConsistency: localQuorum # or 'one' for single-node testing
templates:

  wmf-content-1.0.0: &wp/content/1.0.0
    swagger: '2.0'
    # swagger options, overriding the shared ones from the merged specs (?)
    info:
      version: 1.0.0-abcd
      title: Standard Wikimedia content API
      description: All the content for this domain.
      termsOfService: http://wikimedia.org/terms/
      contact:
        name: The project maintainers
        url: http://mediawiki.org/wiki/RESTBase
      license:
        name: Creative Commons 4.0 International
        url: http://creativecommons.org/licenses/by/4.0/
    security:
      # ACLs for public *.wikipedia.org wikis
      - mediaWikiAuth:
          - user:read
    paths:
      /v1:
        x-restbase:
          interfaces:
            - mediawiki/v1/content
            # - mediawiki/v1/mobile
            # - mediawiki/v1/revision-scoring

    x-restbase-paths: # Internal paths. These use the same config structure as
                      # regular paths, but are restricted to internal use and
                      # don't show up in swagger.
                      #
                      # This stanza defines the /{domain}/sys/ hierarchy.

      /sys/table: &wp/sys/table # Can use this anchor to share the table
                                # backend even if other parts differ
        x-restbase:
          interfaces:
            - restbase/sys/table
          modules:
            # There can be multiple modules too per stanza, as long as the
            # exported symbols don't conflict. The operationIds from the spec
            # will be resolved against all of the modules.
            - name: restbase-cassandra
              version: 1.0.0
              type: npm
              options: # Passed to the module constructor
                hosts: [localhost]
                keyspace: system
                username: cassandra
                password: cassandra
                defaultConsistency: localQuorum # or 'one' for single-node testing

      /sys/page_revisions: &wp-page-revisions
        x-restbase:
          interfaces:
            - mediawiki/sys/page_revisions
          modules:
            - name: restbase-mod-page_revisions
              version: 1.0.0
              type: npm
              options:
                apiURL: http://{domain}/w/api.php

      /sys/key_rev_value: &wp/sys/key_rev_value
        x-restbase:
          interfaces:
            - restbase/sys/key_rev_value
          modules:
            - name: restbase-mod-key_rev_value
              version: 1.0.0
              type: npm

      /sys/parsoid:
        x-restbase:
          interfaces:
            - mediawiki/sys/parsoid
          modules:
            - name: restbase-mod-parsoid
              version: 1.0.0
              type: npm
              options:
                parsoidHost: http://parsoid-lb.wikimedia.org
                apiURL: http://{domain}/w/api.php
          resources:
            # Storage owned by this module. Created / checked after setting up
            # all modules (separate traversal).
            # Convention: Prefix each entry with the owning sys path to avoid
            # conflicts.
            - uri: /{domain}/sys/key_rev_value/parsoid.html
            - uri: /{domain}/sys/key_rev_value/parsoid.data-parsoid
            - uri: /{domain}/sys/key_rev_value/parsoid.data-mw
            - uri: /{domain}/sys/key_rev_value/parsoid.wikitext

      # /sys/revscore:
      #   title: Simple revscore service wrapper
      #   x-restbase:
      #     # Generic revision service interface; Expects requests of the form
      #     # /{title}/{revision}.
      #     # Specific interface documentation (content types etc) at public
      #     # entry point, although we might also want to enforce them
      #     # internally.
      #     interfaces:
      #       - restbase/sys/key_rev_service
      #     modules:
      #       - name: restbase-mod-service
      #         version: 1.0.0 # simple service module, to be shared
      #         options:
      #           storage:
      #             uri: /{domain}/sys/key_rev_value/revscore.scores/{title}/{revision}
      #           service:
      #             uri: http://revscore.wikimedia.org/{domain}/{title}/{revision}
      #     resources:
      #       - uri: /{domain}/sys/key_rev_value/revscore.scores

  wp-default-1.0.0: &wp/default/1.0.0
    x-restbase:
      interfaces:
        - *wp/content/1.0.0



spec:
  title: "The RESTBase root"
  # Some more general RESTBase info
  paths:
    /{domain:en.wikipedia.org}: *wp/default/1.0.0
    /{domain:de.wikipedia.org}: *wp/default/1.0.0
    /{domain:es.wikipedia.org}: *wp/default/1.0.0
    /{domain:nl.wikipedia.org}: *wp/default/1.0.0

logging:
name: restbase
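The config uses `{domain}` placeholders in module options such as `apiURL: http://{domain}/w/api.php` and in `resources` URIs (and `{title}` / `{revision}` in the commented revscore stanza). A minimal sketch of per-request expansion follows, assuming plain `{name}` substitution with URI-component encoding; `expandTemplate` is a hypothetical helper, not RESTBase's actual code, and it does not cover the `{domain:en.wikipedia.org}` form used in the `spec` paths.

```javascript
'use strict';

// Hypothetical helper: replace every `{name}` placeholder in a URI template
// with the URI-component-encoded value from `params`.
// Assumption: every placeholder must have a value; missing ones are an error.
function expandTemplate(template, params) {
    return template.replace(/\{([^{}]+)\}/g, function(match, name) {
        if (!Object.prototype.hasOwnProperty.call(params, name)) {
            throw new Error('Missing value for template parameter: ' + name);
        }
        return encodeURIComponent(String(params[name]));
    });
}

// The parsoid module's apiURL option, expanded for one domain.
const apiURL = expandTemplate('http://{domain}/w/api.php',
                              { domain: 'en.wikipedia.org' });
// apiURL === 'http://en.wikipedia.org/w/api.php'

// A storage resource URI from the /sys/parsoid stanza.
const resource = expandTemplate('/{domain}/sys/key_rev_value/parsoid.html',
                                { domain: 'de.wikipedia.org' });
// resource === '/de.wikipedia.org/sys/key_rev_value/parsoid.html'
```

Encoding each value keeps characters such as spaces in titles from producing an invalid URI.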
175 changes: 125 additions & 50 deletions doc/Implementation.md
@@ -1,70 +1,145 @@
# RESTBase Implementation

## Code structure
- storage backends in separate npm packages
- modules in separate npm packages
- `restbase-tables-cassandra`
- `restbase-queues-kafka`
- `restbase-mod-parsoid`

Tree:
```
restbase.js
lib/
storage.js
util.js
proxy_handlers/
global/
network.js
parsoid.js
buckets/
kv_rev/
wikipages/
# XXX: not quite final yet
config.yaml
conf.d
mediawiki
api/
bucket/
projects/
# projects enable grouping of restbase configs per project
someproject/
global/
buckets/
# kv:.pages.html.yaml -- kv bucket named 'html'
# pagecontent:.pages.yaml -- pagecontent buckets named 'pages'
interfaces/
restbase/
sys/
key_rev_value.yaml
key_rev_service.yaml
table.yaml # defining operationIds, which map to module exports
mediawiki/
v1/
content.yaml
sys/
parsoid.yaml
page_revision.yaml
doc/
test/
```

### Bucket & proxy handler config
- global & per domain
- FS: conf/global and conf/{domain}/
    - doesn't scale too well, but integrates with code review, deploy testing
      & typical development style
- later, maybe: distributed through storage

### Routing
- global (or per-domain, later) proxy handler routeswitch
    - if no or same match: forward to storage backend
    - checks domain & bucket
    - calls per-bucket-type routeswitch with global env object
- on request from handler:
    - if uri same (based on _origURI attribute): forward to table storage
        - need to select the right backend
    - else: route through proxy
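The handler-request rule above can be sketched as follows. This is a hypothetical illustration: `routeHandlerRequest` and the `storage` / `proxy` objects with a `request()` method are stand-ins invented here, and only the `_origURI` comparison comes from the notes.

```javascript
'use strict';

// Hypothetical routing rule for requests issued by a bucket handler:
// if the request URI equals the URI of the request that invoked the handler
// (recorded in `_origURI`), forward it to table storage; otherwise send it
// back through the proxy's routeswitch.
function routeHandlerRequest(req, storage, proxy) {
    if (req._origURI !== undefined && req.uri === req._origURI) {
        return storage.request(req);
    }
    return proxy.request(req);
}

// Stand-ins that record which path each request took.
const calls = [];
const storage = {
    request: function(req) { calls.push('storage ' + req.uri); return 'stored'; }
};
const proxy = {
    request: function(req) { calls.push('proxy ' + req.uri); return 'proxied'; }
};

// Same URI as the original request: goes to table storage.
const r1 = routeHandlerRequest({ uri: '/en.wikipedia.org/pages/Foo',
                                 _origURI: '/en.wikipedia.org/pages/Foo' },
                               storage, proxy);
// Different URI: routed through the proxy again.
const r2 = routeHandlerRequest({ uri: '/en.wikipedia.org/html/Foo',
                                 _origURI: '/en.wikipedia.org/pages/Foo' },
                               storage, proxy);
```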

#### Bucket / table -> storage backend mapping
- table registry
    - bucket type ('kv')
    - storage backend for table *with same name*
        - possibly no table storage associated - storage entry null
- flow through bucket to storage:
    1) call bucket routeswitch & handler
    2) on request with identical url, call underlying storage handler
        - need to know storage backend
        - hook that up on the proxy ahead of time (if not null), before
          calling bucket handler
    3) on requests to other tables, follow same procedure as above
        - lets us move each table to separate storage
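A table registry along the lines described above might look like this. It is a sketch under assumptions: the `TableRegistry` class, its method names and the `domain + '/' + table` key are hypothetical, while the bucket types and names (`kv` / `html`, `pagecontent` / `pages`) mirror the examples in the tree.

```javascript
'use strict';

// Hypothetical registry: for each table, record the bucket type and the
// storage backend of the table with the same name. The storage entry is
// null when no table storage is associated.
class TableRegistry {
    constructor() {
        this.tables = new Map(); // 'domain/table' -> { bucketType, storage }
    }

    register(domain, table, bucketType, storage) {
        this.tables.set(domain + '/' + table,
                        { bucketType: bucketType, storage: storage || null });
    }

    // Backend to hook up on the proxy before calling the bucket handler;
    // null if the table has no storage of its own or is unknown.
    backendFor(domain, table) {
        const entry = this.tables.get(domain + '/' + table);
        return entry ? entry.storage : null;
    }
}

const registry = new TableRegistry();
const cassandra = { name: 'cassandra' }; // stand-in storage backend
registry.register('en.wikipedia.org', 'html', 'kv', cassandra);
registry.register('en.wikipedia.org', 'pages', 'pagecontent', null);
```

Keying per table is what would let each table move to a separate backend independently.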
## Spec loading
Converts a spec tree into a route object tree, ready to be passed to
`swagger-router`. It can be passed to `Router.addSpec` as a handler.

- parameters:
    - spec

- check global nodeMap.get(spec)
    - if found, just use the existing sub-tree (`parentNode.set()`) and return
- specToTree: spec -> { children: []
    - look for
        - x-restbase-paths at top level
            - treat just like normal paths, but restrict access unconditionally
            - path-based ACL: `restbase:sys` with capability added for
              internal requests, but not external ones
            - bail out if prefix is not `/{domain}/sys/`
        - for each x-restbase directly inside of path entries (*not* inside of methods)
            - if `modules` is defined, load them and check for duplicate symbols
            - if `interfaces` is defined, load them and apply spec loader
              recursively, passing in modules and prefix path
            - if `resources` is defined, add them to a global list, with ref back
              to the original spec
                - call them later on complete tree (should we *only* do PUT?)
                - on error, complain really loudly and either bail out
                  completely or keep going (config)
                    - could also consider blacklisting modules / paths based
                      on this; perhaps re-build the tree unless we can
                      `.delSpec()` by then
        - for each x-restbase inside of methods inside of path entries
            - if `service` is defined, construct a method that resolves the
              backend path
            - else, check if `operationId` is defined in passed-in modules
            - in cases where we can be sure that the matching end point will
              be static, we can cache the result (with a method to map
              parameters, possibly inferred from a wildcard mapping or by
              passing in unique strings & looking for them in the final
              parameters)

Result: tree with spec nodes like this:
```javascript
{
path: new URI(pathFragment),
spec: specObj, // reference to the original spec object, for documentation
value: valueObject,
// optionally:
children: [childObj, childObj], // child specs, one for each interfaces:
// declaration
}
```

`valueObject` might look like this:
```javascript
{
acl: {}, // TODO: figure out
handler: handlerFn, // signature: f(restbase, req), as currently
// more properties extracted from the spec as needed (ex: content-types
// for sanitization)
}
```
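Two of the spec loader steps outlined earlier, checking a stanza's modules for duplicate symbols and resolving a method's `operationId` against them, could look roughly like this. The sketch assumes each loaded module is represented as `{ name, exports }`, with exports mapping operationIds to handler functions; the function names and the `getRevision` / `getHtml` operationIds are hypothetical.

```javascript
'use strict';

// Merge the exports of all modules in one x-restbase stanza.
// Throws if two modules export the same symbol, since operationIds are
// resolved against all modules of the stanza.
function mergeModuleExports(modules) {
    const symbols = new Map(); // symbol -> { moduleName, fn }
    modules.forEach(function(mod) {
        Object.keys(mod.exports).forEach(function(name) {
            if (symbols.has(name)) {
                throw new Error('Duplicate symbol "' + name + '" in modules '
                    + symbols.get(name).moduleName + ' and ' + mod.name);
            }
            symbols.set(name, { moduleName: mod.name, fn: mod.exports[name] });
        });
    });
    return symbols;
}

// Resolve an operationId from the spec to the exporting module's handler.
function resolveOperation(symbols, operationId) {
    const entry = symbols.get(operationId);
    if (!entry) {
        throw new Error('operationId not found in modules: ' + operationId);
    }
    return entry.fn;
}

const symbols = mergeModuleExports([
    { name: 'restbase-mod-key_rev_value',
      exports: { getRevision: function() { return 'revision'; } } },
    { name: 'restbase-mod-parsoid',
      exports: { getHtml: function() { return 'html'; } } }
]);
const getHtml = resolveOperation(symbols, 'getHtml');
```

Failing at load time on a conflict seems preferable to silently letting one module shadow another.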

For router setup, each path down the spec tree is passed to the router as an
array: `addSpecs([specRootNode, specNode2, specNode3])`. We *could* also pass
the entire tree, but that'd be less flexible for dynamic updates later.

In any case, passing in an array of spec nodes lets us check each spec node
for presence in the `_nodes` map before creating a subtree for it. This will
naturally establish sharing at the highest possible spec boundary. Dynamic
updates later without a full rebuild won't be trivial with sharing. A good
compromise could be to always rebuild an entire domain on any change. (So, back
to passing trees, except that they are not the root tree?)
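The path-array approach with a `_nodes` map could be sketched like this. `SpecTreeBuilder` and its node shape are hypothetical, and plain strings stand in for the `new URI(pathFragment)` paths. Sharing is keyed on spec node identity, so two domains mounting the same spec node end up pointing at one subtree.

```javascript
'use strict';

// Hypothetical builder for addSpecs([specRootNode, specNode2, ...]).
// Each spec node is looked up in `_nodes` before a new router node is
// created, which establishes sharing at the highest possible spec boundary.
class SpecTreeBuilder {
    constructor() {
        this._nodes = new Map(); // spec node object -> router node
        this.root = { children: new Map() };
    }

    addSpecs(specNodes) {
        let parent = this.root;
        for (const specNode of specNodes) {
            let node = this._nodes.get(specNode);
            if (!node) {
                node = { spec: specNode.spec, value: specNode.value, children: new Map() };
                this._nodes.set(specNode, node);
            }
            parent.children.set(specNode.path, node);
            parent = node;
        }
        return parent; // the node for the last spec in the path
    }
}

const content = { path: 'v1', spec: { title: 'content' }, value: {} };
const en = { path: 'en.wikipedia.org', spec: {}, value: {} };
const de = { path: 'de.wikipedia.org', spec: {}, value: {} };

const builder = new SpecTreeBuilder();
const enContent = builder.addSpecs([en, content]);
const deContent = builder.addSpecs([de, content]);
// enContent === deContent: both domains share one subtree for `content`.
```

This also shows why dynamic updates get harder: changing the shared node affects every domain that mounts it.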

For ACLs it *might* be useful to leverage the DAG structure by checking ACLs all
the way down the path. This would allow us to restrict access at the domain
level, for the entire domain, while still sharing sub-trees. To avoid tight
coupling of the router to the actual ACL implementation we can have
`lookup(path)` (optionally) return an array of all value objects encountered
in a successful lookup in addition to the actual lookup result / leaf
valueObject. We can then check each of those valueObjects for the presence of
an acl object (or whatever other info we stash in there), and run the
associated authorization or [insert here] logic. In the spec, an ACL for a
sub-path could look like this:

```yaml
paths:
  /{domain:en.wikipedia.org}:
    x-restbase:
      security: # basically as in https://github.com/swagger-api/swagger-spec/blob/master/versions/2.0.md#securityRequirementObject
        mediaWikiSecurity:
          # ACLs that apply to all *children* accessed through this point in
          # the tree
          - readContent
      interfaces:
        - mediawiki/v1/content
    get:
      # optional: spec for a GET to /en.wikipedia.org itself
      # can have its own security settings
```

The effective required capabilities (aka roles, scopes, ...) for a given route
are the union of the path-induced ones and those defined on the route handler
itself. This means that path-based ACLs can only add to the capabilities
required for subtree access, effectively locking them down further. The
resulting behavior should be fairly predictable.
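The union rule can be sketched as follows, assuming `lookup(path)` returns every value object met along the matched path. The `acl.capabilities` array is a hypothetical shape (the `acl` object is still a TODO above), and the two functions are illustrative only.

```javascript
'use strict';

// Union of all capabilities required along the looked-up path
// (root first, leaf handler last). Nodes without an acl add nothing.
function requiredCapabilities(valueObjects) {
    const required = new Set();
    valueObjects.forEach(function(value) {
        if (value && value.acl && Array.isArray(value.acl.capabilities)) {
            value.acl.capabilities.forEach(function(cap) { required.add(cap); });
        }
    });
    return required;
}

// Grant access only if the caller holds every required capability.
function isAuthorized(valueObjects, userCapabilities) {
    const held = new Set(userCapabilities);
    for (const cap of requiredCapabilities(valueObjects)) {
        if (!held.has(cap)) {
            return false;
        }
    }
    return true;
}

// Domain-level ACL, an intermediate node without ACL, and a leaf handler.
const lookedUp = [
    { acl: { capabilities: ['readContent'] } },
    {},
    { acl: { capabilities: ['user:read'] }, handler: function() {} }
];
```

Because path nodes can only add to the set, a shared subtree mounted under a more restrictive domain inherits that domain's extra requirements without any change to the subtree itself.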

Most of the ACL customizations between different wikis would happen at the
authorization level anyway (mapping of identity to capabilities), which means
that tree ACLs don't absolutely need to differ between public and private
wikis.

TODO: Actually think this through more thoroughly.

## Internal request & response objects
### Request
@@ -83,7 +158,7 @@ test/
}
```
#### `uri`
The URI of the resource. Required.
The URI of the resource. Required. Can be a string, or a `swagger-router.URI` object.

#### `method` [optional]
HTTP request method. Default `GET`. Examples: `GET`, `POST`, `PUT`, `DELETE`
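Normalizing the `uri` and `method` fields could look like the sketch below. `SimpleURI` is a stand-in defined here for illustration; it is not the `swagger-router` URI class, whose API this document does not show. `normalizeRequest` is likewise hypothetical, and all other request properties are copied through unchanged.

```javascript
'use strict';

// Stand-in URI wrapper (hypothetical; NOT swagger-router's URI class).
class SimpleURI {
    constructor(path) { this.path = path; }
    toString() { return this.path; }
}

// Hypothetical normalization: `uri` is required and may be a string or a
// URI object; `method` defaults to GET. Other properties are kept as-is.
function normalizeRequest(req) {
    if (req.uri === undefined || req.uri === null) {
        throw new Error('Request is missing the required `uri` property');
    }
    return Object.assign({}, req, {
        uri: typeof req.uri === 'string' ? new SimpleURI(req.uri) : req.uri,
        method: req.method || 'GET'
    });
}

const normalized = normalizeRequest({ uri: '/en.wikipedia.org/v1/page/Foo' });
// normalized.method === 'GET'
// normalized.uri.toString() === '/en.wikipedia.org/v1/page/Foo'
```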