Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

abstract out SQS and S3 #15

Merged
merged 31 commits into from
Jun 2, 2015
Merged

abstract out SQS and S3 #15

merged 31 commits into from
Jun 2, 2015

Conversation

sevko
Copy link
Contributor

@sevko sevko commented May 27, 2015

This pull requests adds (or improves on already existing) drop-in replacements for S3 and SQS that should make it easier for developers to work on tilequeue locally, since they won't have to bother setting up those AWS services.

  • beef up tilequeue.queue.file.OutputFileQueue, an on-disk, file-based alternative to SQS, adding locking, read functionality, etc.
  • make tilequeue.store.TileDirectory (renamed from TileFile) write files out to a directory.
  • add tests for both of the above
  • turn the config's aws.s3 and awd.sqs properties into top-level queue and store keys, since both the queue and tile-store are now generic and not tied to those services.
  • some other small changes were involved in changing code that was hardcoded against SQS/S3 (like how we get timestamps from each CoordMessage).

I can currently tilequeue seed and tilequeue process in parallel and generate all of the tiles to an on-disk directory, so it worketh. @rmarianski , @baldur , please look this over when you get a chance.

sevko added 20 commits May 22, 2015 14:58
tilequeue/queue/file.py
        -Implement the `read()` and `job_done()` methods.
        -Make `close()` remove all `read()` elements from the queue.
tilequeue/tile.py
	-In `tilequeue/worker.py:325`, the `sqs_handle` attribute of a
	`CoordMessage` instance is expected to have a bunch of
	properties specific to the messages received from SQS (ie the
	real thing). To stub it out properly, the timestamp should be
	passed to the `CoordMessage` constructor and made an instance
	variable instead.
tilequeue/queue/file.py
	-Use an `RLock` in all of the file manipulation functions, since
	multiple readers might be using the same file queue and would
	otherwise corrupt it.
tilequeue/(tile, queue/file).py
	-Since the stubbed out queues (`queue.memory`, `queue.file`)
	creates CoordMessage with simply `None` timestamps, it's a good
	idea to just make it optional.
tilequeue/command.py
	-If `log-queue-sizes` is disabled in the config, then
	`queue_printer_thread_stop` will be set to `None` and an
	exception will be thrown when `queue_printer_thread_stop.set()`
	is called.
tilequeue/worker.py
	-90b14d4 removed the expectation of `sqs_handle` containing a
	method that allows you to retrieve its "sent to queue"
	timestamp, to allow SQS to be stubbed out with something like
	`queue.file`, and added a timestamp instance variable to
	`CoordMessage` to allow it to be set on a per-queue basis.
	`SqsQueueWriter` is still getting the `SentTimestamp` via
	`sqs_handle`, rather than from the initially received
	`CoordMessage`'s `timestamp` attribute. Add it to the `metadata`
	dict inside `SqsQueueReader` instead, and use that inside
	`SqsQueueWriter`.
tilequeue/queue/file.py
	-The `lock` instance variable shouldn't be used outside of the
	`OutputFileQueue`'s methods.
tilequeue/queue/file.py
	-On second thought, the convention of prefixing private instance
	variables with an underscore isn't consistently adhered to
	elsewhere in the codebase, so don't bother using it, or we'll
	end up with a mix of classes that do and don't use it.

This reverts commit 4417b3c.
tilequeue/queue/file.py
	-`readline()` will pick up new lines appended to the file,
	whereas `next()` will not since the iterator will just hit
	`StopIteration` and stop generating new lines. Use `readline()`
	instead, then, since it might be desirable to append something
	to the queue file and have tilequeue detect the new input
	without having to restart.
tilequeue/store.py
	-It's better to write the tiles to individual tiles, since
	that makes them easier to inspect and removes the need for
	locking the file pointer (even though there are multiple writer
	threads).
tilequeue/queue/file.py
	-01a8fcb made `OutputFileQueue.read()` use `readline()` instead
	of `next()`, but didn't update `OutputFileQueue.close()`, which
	uses a list comprehension to grab the rest of the file. Since
	`.read()` no longer uses the iteration protocol, `.close()` will
	start iterating from the beginning of the file. Use `.read()`
	instead of a list comprehension to only grab everything after
	what `.readline()` already picked up.
tilequeue/(queue/file, store).py
	-Add some API-style doc comments for clarity.
tilequeue/worker.py
	-It wasn't getting called before, which isn't a problem for
	`queue.sqs` (which has a simple `pass` method), but *needs* to
	be called for `queue.file` (which needs to close its file
	pointer before the process exits).
config.yaml.sample, tilequeue/config.py
	-Since `s3` and `sqs` now have drop-in file/directory based
	replacements for local development, the configuration for both
	`tilequeue`'s queue and store should be contained inside more
	generically named properties than `aws.s3` and `aws.sqs` in the
	config, like simply `queue` and `store`.
	-Change the sample config file and `config.py` to support them.
tilequeue/command.py
	-Most/all of the `tilequeue_*` methods create their own queue
	via `make_queue()`, even though `tilequeue_main()`, which calls
	them, creates a queue as well and passes it to them inside the
	`peripherals` object. Use that one instead of recreating one
	with `make_queue()`.
tilequeue/command.py
	-The past few commits finalized a file-based queue
	(`tilequeue.queue.file.OutputFileQueue`) and a
	directory/file-based store (`tilequeue.store.TileDirectory`).
	-Integrate them in `tilequeue.command`.
	-Define a `make_store()` function, similar to `make_queue()`,
	which is responsible for creating either an s3 or directory
	store.
	-Add some logic to `make_queue()` to ensure that the desired
	path for the file-queue does not exist/is a file.
tests/queue/test_file.py
	-Add a bunch of `unittest` unit tests for most/all of the
	functions in `tilequeue.queue.file`, as per all of the recent
	work there.
tests/store.py
	-Add a tests file for `tilequeue.store`.
	-Add a test suite for the `TileDirectory` class, with a single
	unit test.
@sevko
Copy link
Contributor Author

sevko commented May 27, 2015

Oh, woah, I broke literally all the tests.
...fixing...

sevko added 3 commits May 27, 2015 10:56
tilequeue/(config, queue/file, store).py
	-Use single quotes instead of double quotes for consistency with
	the rest of the code.
type: directory
name: tiles
path: osm
reduced-redundancy: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have the sample config default to the types that we use in production, and have a comment for all the other types? Or maybe a separate commented line for each type?

queue_printer_thread_stop.set()

if queue_printer_thread_stop:
queue_printer_thread_stop.set()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch here too!

@rmarianski
Copy link
Member

This looks good. ⚡ I put in some comments on what I noticed, which we can discuss further.

Let's be careful around merging this into master. I'd like to have the necessary deployment changes surrounding the config file ready before we hit the green button.

sevko added 7 commits May 28, 2015 11:02
config.yaml.sample
	-Address
	https://github.com/mapzen/tilequeue/pull/15/files#r31184048 and
	revert the default queue/store to sqs/s3. Document the
	alternative options.
tests/(store, queue/test_file).py
	-The `Coordinate()` constructor has coordinate arguments in an
	order that doesn't match the format used throughout tilequeue,
	so address https://github.com/mapzen/tilequeue/pull/15/files#r31184405
	and use keyword args to make things explicit.
tests/queue/test_file.py
	-Address https://github.com/mapzen/tilequeue/pull/15/files#r31184687
	and make `_write_tiles_to_file()` accept a string parameter.
	Rename it to `_write_str_to_file()` for clarity.
tests/test_store.py
	-Use a temporary directory via the `tempfile` module for the
	tile store, as per
	https://github.com/mapzen/tilequeue/pull/15/files#r31184725.
	-Move directory creation/removal into `setUp()` and
	`tearDown()`, to make sure that the directory gets blown away
	even when an exception is raised in the test function(s).
tilequeue/command.py
	-Apparently, this isn't necessary if the file doesn't exist, as
	`a+` will go ahead and create it for you. Address
	https://github.com/mapzen/tilequeue/pull/15/files#r31184867.
tilequeue/config.py
	-The incorrect key in the config was getting mapped to
	`Configuration.s3_bucket`.
@sevko
Copy link
Contributor Author

sevko commented May 28, 2015

I addressed all your comments in a number of follow-up commits.

Before we move forward, did you try running tilequeue off this branch against SQS and S3? I've only been able to test if with the on-disk alternatives that this PR introduces, so I don't know if I mistakenly broke anything on the SQS/S3 side (I don't have access to those, so can't use them).

rmarianski added a commit that referenced this pull request Jun 2, 2015
@rmarianski rmarianski merged commit a691e7c into master Jun 2, 2015
@rmarianski rmarianski deleted the abstract-out-sqs branch June 2, 2015 18:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants