Skip to content
This repository has been archived by the owner on Apr 10, 2024. It is now read-only.

Limit object reads to specific databases/collections #15

Closed
edanaher opened this issue Sep 16, 2020 · 5 comments
Closed

Limit object reads to specific databases/collections #15

edanaher opened this issue Sep 16, 2020 · 5 comments

Comments

@edanaher
Copy link

It seems that, while moresql should only care about the collections in its configuration file, the underlying gtm library is actually trying to read every record updated in the oplog. It would be nice (and require fewer permissions and possibly be more efficient) if moresql could read only from the oplog and configured collections, and ignore all other changes.

In particular, it seems that gtm's Flush function is attempting to find all results from any oplog change to pass up to its user, and so when there's a change to system.sessions, we attempt to load that data and fail with a permissions error.

I suspect this would require adding additional functionality to gtm to limit which databases/collections it watches, but wanted to check here first to see if I'm missing anything.

Additional Details

After running for a bit (between a few seconds and a couple minutes), moresql is dying with

FATA[0052] Exiting: Mongo tailer returned error not authorized on config to execute command { find: "system.sessions", filter: { _id: { $in: [ { id: UUID("09f2d949-ddbd-410f-9f3c-ac6cd5ed609b"), uid: BinData(0, A692C2C7CEBFE5DFCBBE61D42E17B4B0617EDE8A36FFCFD7B4FCE10C9D322C4B) }, [snip] ) } ] } }, skip: 0, $db: "config" } 

We're using a hosted mongo solution that doesn't seem to allow us to grant read access on the system collections, so just giving the moresql user the relevant permissions isn't an option.

By adding a line to gtm's Flush function that skips any update on the config DB, I was able to suppress this error. It would be nice to have a more robust solution, though.

edanaher added a commit to AdmitHub/moresql that referenced this issue Sep 16, 2020
We can't seem to get the right permissions on Atlas, so just don't do
that.  See also zph#15
@zph
Copy link
Owner

zph commented Dec 10, 2020

👋 @edanaher thank you for the issue.

Apologies for the long delayed response, I had notifications incorrectly set 😬 .

Hmm... I agree this would be a nice improvement to the library and reduce load. I didn't hit that limitation with the two mongo hosts I've used at prior companies but would happily incorporate those changes.

Are you interested to propose it upstream to GTM and then we could update moresql against that?

@edanaher
Copy link
Author

edanaher commented Jan 8, 2021

It turns out gtm has come a long way in the past few years - as rwynn noted, it now supports a NamespaceFilter function that allows for exactly what I'm asking for here.

It also apparently now supports change streams, which are a more principled (and officially supported) way to watch the oplog that lets Mongo itself do the filtering.

I'm not sure how painful it would be to upgrade gtm, but assuming that's feasible, it seems like it should be relatively straightforward to add a NamespaceFilter to only grab events from the collections that moresql is watching. I'd expect switching to change streams to be a bit more work, but I haven't looked at it.

@zph, do you think there's a chance you'll have time for this in the near future? As much as I'd like to take a shot at this, realistically it's not likely to be a priority for me. (See, for example, the fact that it took me almost a month to take another look at this after your reply... ;) )

@zph
Copy link
Owner

zph commented Jan 8, 2021

@edanaher Good news, my notifications worked correctly this time 😆 .

This isn't a high priority for me, so it would be just for the fun and relaxation of working on it.... which I can't promise any specific timeline for. But if I do work on it, I'll keep you posted here.

So my questions are:

  • How useful or important is this feature to you given that you have a patch in place?
  • Would you be available or interested to test out changes if I do the upgrade? I'm no longer working in companies where we have this running in production and thus harder to do any beta testing in a stage environment.

🤔 I do like the idea of upgrading and hewing to more officially supported mechanisms 🤔 .

@edanaher
Copy link
Author

edanaher commented Jan 8, 2021

Fair enough; we'll see which of us gets to it first. To answer your questions:

  • It's not particularly important to us - if it was, I'd spend work time on this and likely get it done within the next week. But since the patch in place is good enough, this is a "nice to have". As such, I can't justify prioritizing it that highly from a work perspective.
  • I'd be happy to test out the changes - while it's not urgent, we would prefer to have a better solution in place in the medium-to-long term. If there's a patch in place, we'll happily test it out in staging and, assuming it works well, roll it out to production.

@github-actions
Copy link

github-actions bot commented Feb 8, 2021

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants