/*
Package ndb provides a data management system built atop a key-value store of bytes.
The key-value store should provide the following:
- sorted key/value table (a la Bigtable)
- Get (with Iteration), Put, Delete
- batch writes
ndb layers on top of this to provide:
- data storage and retrieval
- querying off indexes
There are 2 categories of data stored:
- Entities
- Indexes
A single database is used to store all entities and indexes. This
affords us some level of transactionality. However, the design allows
sharding later.
Entities
Typically, there are a few top-level entities (e.g. user) and all other data pieces
hang off these. To allow entities which naturally go together to
exist around the same area in the datastore, applications will use the same localId
(only changing the entityType).
The key in this table is 8 bytes (uint64), which can be broken down into:
discrim:shardId:localid:entityType:entityShape:entryType
Each of these sub-components is stored big-endian for easy ordering,
where:
discrim: 4-bits [0...16), used to pre-discriminate e.g. index, entity, etc.
Ensure the discrim id for entity is low (e.g. 1).
This allows us to easily convert the key to an int64 and back
without losing precision.
shardId: 12-bits [0...4096), shard id
localId: 32-bits [0...4.3 billion), id local to the shard
0-8192 are reserved
entityType: 8-bits [0...256), used to store the entity type e.g. user, userprofile, etc
0-32 are reserved for internal use
entityShape: 5-bits [0...32), used to store entity shape
entryType: 3-bits [0...8), used to store entry type e.g. metadata, data, indexrow, etc
Because of this model of assigning ids, the id of an entity is "rich" and cannot
be randomly assigned (as it contains a lot of information about the entity
location and contents).
The intId as received by Entities is a subset of this, and includes just
the shardId and localId portions (i.e. 44 bits, with the 4 bits in front and
the 16 bits at the back zeroed). This still allows you to assign the same
"entity id" to different types.
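As an illustration, the layout above could be packed into a uint64, and the intId
extracted from it, roughly as in the sketch below. This is a minimal sketch: the
function names are illustrative and not necessarily the package's actual helpers.

    // Minimal sketch of the key layout described above (names are illustrative).
    func packKey(discrim, shardID, localID, entityType, entityShape, entryType uint64) uint64 {
        return discrim<<60 | shardID<<48 | localID<<16 |
            entityType<<8 | entityShape<<3 | entryType
    }

    // intIDOf keeps only the shardId and localId portions (44 bits),
    // zeroing the 4 discrim bits in front and the 16 bits at the back.
    func intIDOf(key uint64) uint64 {
        return key & (((1 << 44) - 1) << 16)
    }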
Configuration exists which maps entityType(string) to entityType(uint8).
There may be concerns with the limitations on the number of entityTypes (only 255 allowed)
or indexes (only 255 allowed) inherent in this design. This is acceptable: the database serves
only one application. Any other application can interoperate via a loosely coupled API.
Pinning Entities
Some entities can be pinned to shard id 1.
These are entities declared with the struct tag value pinned.
This makes it easy to get all entities of that type by checking a single shard.
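As a purely hypothetical illustration (the tag key and exact syntax below are made up,
not necessarily what the application uses), a pinned entity declaration might look like:

    // Hypothetical declaration of an entity type pinned to shard id 1.
    // The tag key "ndb" and the value "pinned" are illustrative only.
    type SiteSetting struct {
        Name  string `ndb:"pinned"`
        Value string
    }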
Auto Gen Entity Ids
These are also stored in the datastore.
Their format is similar to that listed above for entities, except that it
only contains:
discrim: IDGEN
shardid:
entityType:
Indexes
Each query maps to an index, and that index must be properly declared.
Simple queries are supported:
- Multiple Equality filters
- At most one inequality filter at the end
At write time, we examine the metadata against the configured indexes
and write the appropriate indexes.
An index must intrinsically know what fields are in it. This is obtained
from the index configuration file.
Only primitives can be indexed, and we store values appropriately to
enable easy ascending sorting.
The following primitives can be indexed:
- bool, int8...64, uint8...64, float32, float64, string
Internally, they are stored as uint8...64, or as []byte (for string).
We convert appropriately based on the type when storing.
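For example, the conversion to a byte form that sorts ascending could be done roughly
as below. This is a sketch under assumptions (big-endian encoding, sign-bit manipulation
so that numeric order matches byte order); it is not necessarily the exact encoding ndb uses.

    import (
        "encoding/binary"
        "math"
    )

    // encodeInt64Sortable encodes an int64 so that ascending numeric order
    // matches ascending byte order (flip the sign bit, store big-endian).
    func encodeInt64Sortable(v int64) []byte {
        b := make([]byte, 8)
        binary.BigEndian.PutUint64(b, uint64(v)^(1<<63))
        return b
    }

    // encodeFloat64Sortable does the same for float64: non-negative values
    // get the sign bit set; negative values have all bits inverted.
    func encodeFloat64Sortable(v float64) []byte {
        u := math.Float64bits(v)
        if u>>63 == 0 {
            u ^= 1 << 63
        } else {
            u = ^u
        }
        b := make([]byte, 8)
        binary.BigEndian.PutUint64(b, u)
        return b
    }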
Each index row has a prefix:
Prefix: (uint8 containing initial 4-bits discrim in front)
(1-byte typeId)(1-byte index id)
(... indexed field values ...)
Suffix: (sequence of uint64 which is the data key)
The value stored in the table for each index row is always a single byte:
false (0). The value is not used at all.
In the index corresponding to the query:
select from (type) where (bool)=true, (int8)=9, (float32)=32.0,
(string)="yes", (string)>="abc"
the index row will have a value of uint8(0) and a key of:
(index prefix)(uint8)(uint8)(uint32)(string)(nil-byte)(string)(nil-byte)(index-suffix)
Note the following rules which apply above to handle variable-length strings:
- Strings must come at the end
- Strings are always terminated by a nil byte
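Concretely, assembling such an index row key could look like the sketch below.
This is illustrative only: the discrim handling, the exact field encodings and the
helper name are assumptions, not the package's API.

    import "encoding/binary"

    // Sketch: assemble an index row key of the shape
    // (prefix)(uint8)(uint8)(uint32)(string)(nil)(string)(nil)(suffix).
    func indexRowKey(indexDiscrim, typeID, indexID uint8,
        boolVal, int8Val uint8, float32Bits uint32,
        str1, str2 string, dataKey uint64) []byte {

        var tmp [8]byte
        b := []byte{indexDiscrim << 4, typeID, indexID} // prefix
        b = append(b, boolVal, int8Val)                 // encoded bool and int8 values
        binary.BigEndian.PutUint32(tmp[:4], float32Bits)
        b = append(b, tmp[:4]...)
        b = append(b, str1...)
        b = append(b, 0) // strings are nil-byte terminated
        b = append(b, str2...)
        b = append(b, 0)
        binary.BigEndian.PutUint64(tmp[:], dataKey) // suffix: the data key
        return append(b, tmp[:]...)
    }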
From a user call site P.O.V., the following restrictions apply:
- All indexes must be configured and named
- A query runs against a specific index.
The index must be inferred from the parameters to the query call.
- If you need to sort on a set of fields, ensure they appear last in the properties,
in the sort order you want.
The API only allows you to say ascending or descending when you query.
Anatomy of a write
For each key that is written, we write rows for:
- data (codec encoded typed object)
- metadata (codec encoded db.PropertyList).
This is done so we can re-create the index later without needing the schema of the typed object.
- index rows.
This is necessary so that we can delete the current set of index rows before creating new ones.
A Put operation is like:
type Metadata struct { key string; value interface{} }
metadata values must be primitives
(bool, int8...64, uint8...64, float32, float64, string).
However, the type must match the registered index.
Put(key uint64, value interface{}, metadata []Metadata)
If key == 0, then a new id is assigned.
When a Put is performed, we will in one atomic update:
- Look up data and metadata for the key
- If found, figure out old set of composite keys
- Determine new set of index row keys from newly supplied metadata
- Determine the full set of new writes and updates to index rows, entity data and entity metadata
- Determine which index rows are no longer valid and should be removed
- Do all changes in one write batch
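The interesting step is determining which index rows to delete and which to write.
A minimal sketch of that diff is below, assuming index row keys are compared as raw
bytes; the helper name is illustrative, not the package's actual function.

    // diffIndexRows compares the old and new sets of index row keys and
    // returns the rows to delete and the rows to write, so both can go
    // into the same write batch as the entity data and metadata.
    func diffIndexRows(oldRows, newRows [][]byte) (toDelete, toPut [][]byte) {
        newSet := make(map[string]bool, len(newRows))
        for _, k := range newRows {
            newSet[string(k)] = true
        }
        oldSet := make(map[string]bool, len(oldRows))
        for _, k := range oldRows {
            oldSet[string(k)] = true
            if !newSet[string(k)] {
                toDelete = append(toDelete, k) // no longer valid
            }
        }
        for _, k := range newRows {
            if !oldSet[string(k)] {
                toPut = append(toPut, k) // newly required
            }
        }
        return
    }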
Note that we currently store indexes in different databases from the actual data. This gives us:
- faster queries
(as we don't have to distribute the query across the different servers)
- support for limit, cursors (for paging, etc), etc.
These are impossible with distributed queries, as all results must be returned and sorted in-line.
For a delete:
- Look up metadata
- Look up index rows
- Delete all in one write batch
Ensuring Atomic Updates at row-level
For all these to work, we need to implement key-level locks for batches of keys.
That is, you can get a lock for keys 7, 9 and 13, then batch write them all, and
release the locks. This only works for now because we use a single server.
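A minimal sketch of such key-level locking follows. Assumptions: in-process locking
only, and keys are locked in sorted order so two concurrent batches cannot deadlock;
this is not the package's actual locker.

    import (
        "sort"
        "sync"
    )

    // keyLocker hands out one mutex per key, and locks a batch of keys
    // in sorted order so two concurrent batches cannot deadlock.
    type keyLocker struct {
        mu    sync.Mutex
        locks map[uint64]*sync.Mutex
    }

    func (l *keyLocker) lockAll(keys []uint64) (unlock func()) {
        ks := append([]uint64(nil), keys...)
        sort.Slice(ks, func(i, j int) bool { return ks[i] < ks[j] })
        var held []*sync.Mutex
        for i, k := range ks {
            if i > 0 && k == ks[i-1] {
                continue // skip duplicate keys in the batch
            }
            l.mu.Lock()
            if l.locks == nil {
                l.locks = make(map[uint64]*sync.Mutex)
            }
            m := l.locks[k]
            if m == nil {
                m = new(sync.Mutex)
                l.locks[k] = m
            }
            l.mu.Unlock()
            m.Lock()
            held = append(held, m)
        }
        return func() {
            for i := len(held) - 1; i >= 0; i-- {
                held[i].Unlock()
            }
        }
    }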
Query Support
For starters, our query support will be limited, and will NOT include:
- multiple inequality filters on a single property
(only a single inequality filter is supported)
- sort order (implementing sorting does not make sense for ndb)
Configuration
A single configuration file called ndb.json is used. It lives in the root directory
from which the server is started. It includes:
- server location (defined later after sharding happens)
Integration with db package
Note the following:
- ndb.json keeps the mapping of kind to int
Caching
Some caching will be configured in the database. ndb will not do any extra
caching beyond what is performed by the app (based on caching configuration).
Sharding
A shard is the unit of consistency. All entities in a shard can be edited in
an atomic fashion. This is because a shard is always managed by a single server.
Entities can be sharded by shardid, while indexes can be sharded
by index name.
Note the following limitations once sharding is implemented:
- Writing to indexes becomes best-effort.
Indexes will not be written to the same datastore as the actual data,
and so there's a possibility that data was written but index writing failed.
We chose not to store indexes alongside entities to allow us to support:
- Limit on queries:
Limits are impractical if you have to hit 1000 servers
- Paging:
Paging is practical if we are only looking at a single index
- Non-Distributed queries:
To do a query, only check one place.
ndb.json will contain information on:
- indexes (and how/where to access them)
- servers and what data they serve (ie which servers serve which shards)
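A hypothetical Go-side view of that configuration is sketched below; all field names
and shapes are illustrative, not the actual ndb.json schema.

    // Hypothetical shape of the ndb.json configuration (illustrative only).
    type config struct {
        Kinds   map[string]uint8    // entityType(string) -> entityType(uint8)
        Indexes map[string][]string // index name -> indexed field names, in order
        Servers map[string][]uint16 // server address -> shard ids it serves
    }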
Package Contents
This package contains everything related to ndb and napp:
- Context implementation
- low level driver
- database management functionality as described above
- etc.
There is no extra napp package.
Starting
Initially, we will use a small shard range (shardid 1-16) and keep
everything on a single machine. In the future, we can change the range
to start creating entities on new shards.
Single Server simplification
In a single server, things are simplified:
- There might be no need for an external cache server, since the database
already caches msgpack-encoded byte streams.
- All shards are on a single server. There's much less need to distribute
configuration changes.
- Indexes and Data live on same server.
- Shards do not move between servers.
- Mostly no need to synchronize ndb.json changes across a collection of nodes.
In a distributed deployment, none of these hold.
Distributed Deployment
ndb must be developed in a way that can support multiple frontends and multiple
backends.
In a multi-server setup, we will use:
- 1 database process per group of shards
- n memcache processes
We need a way for changes to ndb.json to be auto-detected and the server
auto-reloaded, since it could hold information about cache servers, etc.
*/
package ndb
/*
Notes:
- Need a way to denote that a new id cannot be retrieved for a specific
type. Actually, leave that as an app-specific problem. For example,
the blackannex app knows that User and UserProfile should share the same id,
so it always creates them together.
- Considered keeping indexes with the data. It makes writes cheaper, since
writes will typically occur at one place.
However, it makes sharding harder, since indexes must stay on one machine.
NOTE:
=====
- When locking, always lock on either the bytes of the key (Key.Bytes()),
or the int32 value for "locking" the shard/type in the store so as to
make a New Key.
*/
/*
BUG(ugorji): The bugs are below:
- THE FOLLOWING ARE CURRENT NDB BUGS:
=====================================
- We should support the ability for two different entities to share the
same id e.g. User and UserProfile. It currently has issues with ndb,
but we should have a simple solution.
- THE FOLLOWING ARE UN-IMPLEMENTED NDB FEATURES (LET'S TRACK THEM):
===================================================================
- Rich queries are not supported (just basic queries)
Does not support: cursors, order, multiple inequality filters.
(not currently used in codebase)
To support sort orders, we need to
- Setup a reverse comparator
- Pass that to write options when writing reverse filters
- app.QueryFilter now has a sortDesc field (defaults to false)
Unfortunately, this will not work currently, because we keep everything in one
database. A database is opened with a comparator; a comparator cannot be passed
to a write operation.
- THE FOLLOWING NDB ISSUES ARE FIXED, BUT LET'S STILL TRACK THEM:
=================================================================
- Shape is supported (but not fully investigated).
(not currently used in codebase)
*/