-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
abstract out SQS and S3 #15
Conversation
tilequeue/queue/file.py -Implement the `read()` and `job_done()` methods. -Make `close()` remove all `read()` elements from the queue.
tilequeue/tile.py -In `tilequeue/worker.py:325`, the `sqs_handle` attribute of a `CoordMessage` instance is expected to have a bunch of properties specific to the messages received from SQS (ie the real thing). To stub it out properly, the timestamp should be passed to the `CoordMessage` constructor and made an instance variable instead.
tilequeue/queue/file.py -Use an `RLock` in all of the file manipulation functions, since multiple readers might be using the same file queue and would otherwise corrupt it.
tilequeue/(tile, queue/file).py -Since the stubbed out queues (`queue.memory`, `queue.file`) creates CoordMessage with simply `None` timestamps, it's a good idea to just make it optional.
tilequeue/command.py -If `log-queue-sizes` is disabled in the config, then `queue_printer_thread_stop` will be set to `None` and an exception will be thrown when `queue_printer_thread_stop.set()` is called.
tilequeue/worker.py -90b14d4 removed the expectation of `sqs_handle` containing a method that allows you to retrieve its "sent to queue" timestamp, to allow SQS to be stubbed out with something like `queue.file`, and added a timestamp instance variable to `CoordMessage` to allow it to be set on a per-queue basis. `SqsQueueWriter` is still getting the `SentTimestamp` via `sqs_handle`, rather than from the initially received `CoordMessage`'s `timestamp` attribute. Add it to the `metadata` dict inside `SqsQueueReader` instead, and use that inside `SqsQueueWriter`.
tilequeue/queue/file.py -The `lock` instance variable shouldn't be used outside of the `OutputFileQueue`'s methods.
tilequeue/queue/file.py -On second thought, the convention of prefixing private instance variables with an underscore isn't consistently adhered to elsewhere in the codebase, so don't bother using it, or we'll end up with a mix of classes that do and don't use it. This reverts commit 4417b3c.
tilequeue/queue/file.py -`readline()` will pick up new lines appended to the file, whereas `next()` will not since the iterator will just hit `StopIteration` and stop generating new lines. Use `readline()` instead, then, since it might be desirable to append something to the queue file and have tilequeue detect the new input without having to restart.
tilequeue/store.py -It's better to write the tiles to individual tiles, since that makes them easier to inspect and removes the need for locking the file pointer (even though there are multiple writer threads).
tilequeue/queue/file.py -01a8fcb made `OutputFileQueue.read()` use `readline()` instead of `next()`, but didn't update `OutputFileQueue.close()`, which uses a list comprehension to grab the rest of the file. Since `.read()` no longer uses the iteration protocol, `.close()` will start iterating from the beginning of the file. Use `.read()` instead of a list comprehension to only grab everything after what `.readline()` already picked up.
tilequeue/(queue/file, store).py -Add some API-style doc comments for clarity.
tilequeue/worker.py -It wasn't getting called before, which isn't a problem for `queue.sqs` (which has a simple `pass` method), but *needs* to be called for `queue.file` (which needs to close its file pointer before the process exits).
config.yaml.sample, tilequeue/config.py -Since `s3` and `sqs` now have drop-in file/directory based replacements for local development, the configuration for both `tilequeue`'s queue and store should be contained inside more generically named properties than `aws.s3` and `aws.sqs` in the config, like simply `queue` and `store`. -Change the sample config file and `config.py` to support them.
tilequeue/command.py -Most/all of the `tilequeue_*` methods create their own queue via `make_queue()`, even though `tilequeue_main()`, which calls them, creates a queue as well and passes it to them inside the `peripherals` object. Use that one instead of recreating one with `make_queue()`.
tilequeue/command.py -The past few commits finalized a file-based queue (`tilequeue.queue.file.OutputFileQueue`) and a directory/file-based store (`tilequeue.store.TileDirectory`). -Integrate them in `tilequeue.command`. -Define a `make_store()` function, similar to `make_queue()`, which is responsible for creating either an s3 or directory store. -Add some logic to `make_queue()` to ensure that the desired path for the file-queue does not exist/is a file.
tests/queue/test_file.py -Add a bunch of `unittest` unit tests for most/all of the functions in `tilequeue.queue.file`, as per all of the recent work there.
tests/store.py -Add a tests file for `tilequeue.store`. -Add a test suite for the `TileDirectory` class, with a single unit test.
Oh, woah, I broke literally all the tests. |
tilequeue/(config, queue/file, store).py -Use single quotes instead of double quotes for consistency with the rest of the code.
type: directory | ||
name: tiles | ||
path: osm | ||
reduced-redundancy: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have the sample config default to the types that we use in production, and have a comment for all the other types? Or maybe a separate commented line for each type?
queue_printer_thread_stop.set() | ||
|
||
if queue_printer_thread_stop: | ||
queue_printer_thread_stop.set() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch here too!
This looks good. ⚡ I put in some comments on what I noticed, which we can discuss further. Let's be careful around merging this into master. I'd like to have the necessary deployment changes surrounding the config file ready before we hit the green button. |
config.yaml.sample -Address https://github.com/mapzen/tilequeue/pull/15/files#r31184048 and revert the default queue/store to sqs/s3. Document the alternative options.
tests/(store, queue/test_file).py -The `Coordinate()` constructor has coordinate arguments in an order that doesn't match the format used throughout tilequeue, so address https://github.com/mapzen/tilequeue/pull/15/files#r31184405 and use keyword args to make things explicit.
tests/queue/test_file.py -Address https://github.com/mapzen/tilequeue/pull/15/files#r31184687 and make `_write_tiles_to_file()` accept a string parameter. Rename it to `_write_str_to_file()` for clarity.
tests/test_store.py -Use a temporary directory via the `tempfile` module for the tile store, as per https://github.com/mapzen/tilequeue/pull/15/files#r31184725. -Move directory creation/removal into `setUp()` and `tearDown()`, to make sure that the directory gets blown away even when an exception is raised in the test function(s).
tilequeue/command.py -Apparently, this isn't necessary if the file doesn't exist, as `a+` will go ahead and create it for you. Address https://github.com/mapzen/tilequeue/pull/15/files#r31184867.
tilequeue/config.py -The incorrect key in the config was getting mapped to `Configuration.s3_bucket`.
I addressed all your comments in a number of follow-up commits. Before we move forward, did you try running tilequeue off this branch against SQS and S3? I've only been able to test if with the on-disk alternatives that this PR introduces, so I don't know if I mistakenly broke anything on the SQS/S3 side (I don't have access to those, so can't use them). |
This pull requests adds (or improves on already existing) drop-in replacements for S3 and SQS that should make it easier for developers to work on tilequeue locally, since they won't have to bother setting up those AWS services.
tilequeue.queue.file.OutputFileQueue
, an on-disk, file-based alternative to SQS, adding locking, read functionality, etc.tilequeue.store.TileDirectory
(renamed fromTileFile
) write files out to a directory.aws.s3
andawd.sqs
properties into top-levelqueue
andstore
keys, since both the queue and tile-store are now generic and not tied to those services.CoordMessage
).I can currently
tilequeue seed
andtilequeue process
in parallel and generate all of the tiles to an on-disk directory, so it worketh. @rmarianski , @baldur , please look this over when you get a chance.