Limit object reads to specific databases/collections #15
Comments
We can't seem to get the right permissions on Atlas, so just don't do that. See also zph#15
👋 @edanaher thank you for the issue. Apologies for the long-delayed response, I had notifications incorrectly set 😬. Hmm... I agree this would be a nice improvement to the library and would reduce load. I didn't hit that limitation with the two mongo hosts I've used at prior companies, but would happily incorporate those changes. Would you be interested in proposing it upstream to GTM, so we could then update moresql against that?
It turns out gtm has come a long way in the past few years - as rwynn noted, it now supports a NamespaceFilter function that allows for exactly what I'm asking for here. It also apparently now supports change streams, which are a more principled (and officially supported) way to watch the oplog that lets Mongo itself do the filtering.

I'm not sure how painful it would be to upgrade gtm, but assuming that's feasible, it seems like it should be relatively straightforward to add a NamespaceFilter to only grab events from the collections that moresql is watching. I'd expect switching to change streams to be a bit more work, but I haven't looked at it.

@zph, do you think there's a chance you'll have time for this in the near future? As much as I'd like to take a shot at this, realistically it's not likely to be a priority for me. (See, for example, the fact that it took me almost a month to take another look at this after your reply... ;) )
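For illustration, here is a rough sketch of what an allow-list namespace filter could look like. It deliberately does not import gtm: the `Op` struct and `newNamespaceFilter` are hypothetical stand-ins, assuming only that each oplog event carries a `database.collection` namespace string and that the filter is a predicate over events. The real gtm types and option names should be checked against the gtm source before wiring anything up.

```go
package main

import (
	"fmt"
	"strings"
)

// Op is a hypothetical stand-in for an oplog event. Only the namespace
// ("database.collection") matters for this sketch.
type Op struct {
	Namespace string
}

// newNamespaceFilter returns a predicate that accepts only ops whose
// namespace appears in the list of watched collections.
func newNamespaceFilter(watched []string) func(*Op) bool {
	allowed := make(map[string]struct{}, len(watched))
	for _, ns := range watched {
		allowed[strings.TrimSpace(ns)] = struct{}{}
	}
	return func(op *Op) bool {
		_, ok := allowed[op.Namespace]
		return ok
	}
}

func main() {
	// The watched namespaces here are made up; in practice they would be
	// derived from the moresql configuration file.
	keep := newNamespaceFilter([]string{"app.users", "app.orders"})
	for _, ns := range []string{"app.users", "config.system.sessions"} {
		fmt.Printf("%s kept=%v\n", ns, keep(&Op{Namespace: ns}))
	}
}
```

With a filter like this applied before any document fetch, a change to `config.system.sessions` would be dropped without ever being read, so the read permission on that collection would not be needed.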
@edanaher Good news, my notifications worked correctly this time 😆 . This isn't a high priority for me, so it would be just for the fun and relaxation of working on it.... which I can't promise any specific timeline for. But if I do work on it, I'll keep you posted here. So my questions are:
🤔 I do like the idea of upgrading and hewing to more officially supported mechanisms 🤔.
Fair enough; we'll see which of us gets to it first. To answer your questions:
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
It seems that, while moresql should only care about the collections in its configuration file, the underlying gtm library is actually trying to read every record updated in the oplog. It would be nice (and require fewer permissions and possibly be more efficient) if moresql could read only from the oplog and configured collections, and ignore all other changes.
In particular, it seems that gtm's Flush function is attempting to find all results from any oplog change to pass up to its user, and so when there's a change to system.sessions, we attempt to load that data and fail with a permissions error.

I suspect this would require adding additional functionality to gtm to limit which databases/collections it watches, but wanted to check here first to see if I'm missing anything.
Additional Details
After running for a bit (between a few seconds and a couple minutes), moresql is dying with
We're using a hosted mongo solution that doesn't seem to allow us to grant read access on the system collections, so just giving the moresql user the relevant permissions isn't an option.
By adding a line to gtm's Flush function that skips any update on the config DB, I was able to suppress this error. It would be nice to have a more robust solution, though.
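The workaround described above amounts to a deny-list check on the database part of the namespace. A minimal, self-contained sketch of that check follows. `databaseOf` and `shouldSkip` are hypothetical helper names, not part of gtm, and the sketch assumes namespaces have the usual `database.collection` form. It is not the actual patch.

```go
package main

import (
	"fmt"
	"strings"
)

// databaseOf returns the database part of a "database.collection"
// namespace. Collection names may themselves contain dots (for example
// "system.sessions"), so only the first dot is treated as the separator.
func databaseOf(namespace string) string {
	return strings.SplitN(namespace, ".", 2)[0]
}

// shouldSkip reports whether a change belongs to a database that the
// current user is assumed to be unable to read.
func shouldSkip(namespace string, unreadable map[string]bool) bool {
	return unreadable[databaseOf(namespace)]
}

func main() {
	unreadable := map[string]bool{"config": true}
	for _, ns := range []string{"config.system.sessions", "app.users"} {
		fmt.Printf("%s skip=%v\n", ns, shouldSkip(ns, unreadable))
	}
}
```

A deny-list like this only covers the databases known to cause trouble; an allow-list built from the configured collections would be the more robust variant.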