Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Setting application_name to current application user on global db object #418

Closed
lloydh opened this issue Oct 14, 2017 · 36 comments
Closed
Labels

Comments

@lloydh
Copy link

lloydh commented Oct 14, 2017

I'm trying to achieve something similar to #71, #100 —  setting the application_name to include the current application user's id for use in audit tables (via trigger functions).

I haven't been able to set the application_name on a configured db object (despite this SO post) and the furthest I've got is this example from Vitaly of using event extend:

// with extra parameter:
var options = {
    extend: obj=> {
        obj.myMethodWithParam = (query, values, context)=> {
            // append/pre-pend query parameters, using the context;
            return obj.query(query, values);
        }
    }
};

How can I set application_name and would use of extend mean having to use db.myMethodWithParam vs. db.any, db.one etc. or does it 'extend' the base methods?

Thanks,
Lloyd

@vitaly-t
Copy link
Owner

vitaly-t commented Oct 14, 2017

Do I understand it right that you want to extend the protocol according to what database it is?

If so, then you should use the Database Context, see the Database constructor.

var options = {
    extend: (obj, dc) => {
        // check the `dc` and apply the extension according to the context
    }
};

The dc is available everywhere in the protocol ;)

does it 'extend' the base methods?

You add new methods that can call an existing method + do something else.

@lloydh
Copy link
Author

lloydh commented Oct 14, 2017

Sorry if my question was misleading, or perhaps I've been looking at the wrong solutions.

In my case there's only 1 database and the goal is to be able to set the logged-in application user's id in some way so that this can be used for an audit trail.

@vitaly-t
Copy link
Owner

vitaly-t commented Oct 14, 2017

I just did a test against the latest pg-promise v7.0.1 + PostgreSQL 10, and it works as expected:

const pgp = require('pg-promise')();

const appName = 'my-app';

const db = pgp({
    database: 'pg_promise_test',
    port: 5432,
    user: 'postgres',
    application_name: appName
});

db.any('SELECT application_name FROM pg_stat_activity WHERE application_name = $1', appName)
    .then(data => {
        console.log(data);
        pgp.end();
    });

Outputs:

[ anonymous { application_name: 'my-app' } ]

@lloydh
Copy link
Author

lloydh commented Oct 14, 2017

That would only work if I created a new db object for each request with the user id from that request but my understanding is that's horrible for performance.

@vitaly-t
Copy link
Owner

vitaly-t commented Oct 14, 2017

You want the interactive user from the request? That's quite unrelated to all of this. If you have the user from the request, you log right there, so what's the problem?

And your question in this case is completely misleading. It got nothing to do with event extend or the application_name. It's just basic logging logic that you implement yourself.

@vitaly-t
Copy link
Owner

Also, what do you think your method myMethodWithParam should do? You cannot amend query like that.

You can either execute two queries, like log insert + the actual query, or you can use a template for the query, like from the external SQL file that would expect the extra parameters, so it can amend it.

@lloydh
Copy link
Author

lloydh commented Oct 14, 2017

I hadn't figured out what the myMethodWithParam example was going to do but I believe the author of that post was trying to accomplish the same thing as me.

I have the user id at the application layer so in the business logic I could insert into the foo_audit table whenever changing foo. This works but will add noise to the business logic and is more prone to errors should another dev update foo without manually updating foo_audit correctly.

I think I follow your suggestion of using a template but the audit for inserts will need the id returned, so two statements (or a common table expression) are required.

After insert/update trigger functions seem to solve things nicely however the issue is having access to the user id from the application layer — and pg-promise is that bridge.

If there's not an elegant way to achieve this then I'll go with one of the other options.

@vitaly-t
Copy link
Owner

I cannot advise you on how to template it into a single query, as it depends on the database architecture, and what it is exactly you are expecting.

At best, I can show you how you can do it via 2 queries and event extend:

var options = {
    extend: obj => {
        obj.logContext = context => {
            return obj.none('INSERT INTO logs(context) VALUES($1)', context);
        };
        obj.queryWithContext = (query, values, context) => {
            return obj.logContext(context).then(() => {
                return obj.query(query, values);
            });
        }
    }
};

@lloydh
Copy link
Author

lloydh commented Oct 14, 2017

Thanks for the help.
Lloyd

@lloydh lloydh closed this as completed Oct 14, 2017
@jmealo
Copy link

jmealo commented Oct 24, 2017

@vitaly-t: Hello again! Since this keeps coming up, for the record: what's the recommend way to prepend/append a single SET statement with a per-request unique identifier*?

*Allowing re-use of connections/pg-instances between users/requests and use of pg-monitor.

SET application_name = <per-request-unique-uidentifier>

I've been version locked with my nasty hack and am just now trying to fix this the correct way :-)

@vitaly-t
Copy link
Owner

vitaly-t commented Oct 25, 2017

@jmealo you want to execute SET application_name on every HTTP request? That doesn't sound right, because due to parallel HTTP on top of parallel query execution you are likely to keep overriding it, thus ending up with invalid current application.

The only way it won't override the setting is by using a transaction. But using a transaction for every HTTP request is a horrible idea, because transactions are blocking operations.

So I don't think you'd want to do it 😉

@vitaly-t
Copy link
Owner

vitaly-t commented Oct 25, 2017

Anyhow, as a pure exercise, you can play with it. One approach is to extend the protocol with your custom task method that will execute SET application_name = ... in the beginning of every task:

const initOptions = {
    extend: obj => {
        obj.appTask = (appName, cb) => {
            return obj.task('app-task', t => {
                cb = cb.bind(t, t);
                return t.none('SET application_name = $1', appName).then(cb);
            });
        }
    }
};

Then you can call:

db.appTask('app-name', t => {
    // do the task....
}).then().catch();

@jmealo
Copy link

jmealo commented Oct 27, 2017

@vitaly-t Use case: if you set the application name GUC to a combination of a session id and correlation id you can match any PostgreSQL log output to a given correlation id or web application session. The application_name GUC is the only way to influence the CSV logging at run-time. Extending Postgres to do this would be an issue for those using hosted solutions.

So to be clear, I do not need a transaction per query, I need a SET statement prepended to allow queries issued by pg-promise so that the query is attributed to the correct correlation id. It's strictly for correlating logs and can be used in RLS rules to reference the web application user.

If there is something I misunderstand about the internals of pg-promise that would break the above use case please let me know, I've had a shim in place in production for 2 years but I'd really like to get rid of it and use the latest version of pg-promise. The shim wraps each method and munges the input strings. I only have very limited use of the library because of this.

@vitaly-t
Copy link
Owner

vitaly-t commented Oct 28, 2017

@jmealo The example I showed above is a task, not a transaction.

And of course you can alternatively prepend a set to each query either manually or by extending the protocol.

Here, for example, we introduce a new special appQuery method that extends the existing query method by prepending each query with SET application_name = :

const initOptions = {
    extend: obj => {
        obj.appQuery = (appName, query, values, qrm) => {
            const sql = pgp.helpers.concat([
                {query: 'SET application_name = $1', values: appName},
                {query, values}
            ]);
            return obj.query(sql, [], qrm);
        }
    }
};

It would be well-performing, as the query is concatenated and executed only once.

See also method helper.concat ;)

@ffflabs
Copy link

ffflabs commented Apr 19, 2018

You can also set the PGAPPNAME environment variable. If you are using dotenv in yout project, then it will be sent by default when opening the connection. This also works natively on Heroku.

@jmealo
Copy link

jmealo commented Apr 19, 2018

@amenadiel @vitaly-t Thank you for your suggestions, however, they do not accomplish the stated goal, just to re-iterate:

  • The app name can be included in PostgreSQL logging.
  • App name is the only value that can be output via standard logging AND set per-query via GUC.
  • Setting the app name enables us to output an arbitrary value that will be associated with a given query in the logging facilities provided by PostgreSQL.
  • If we prepend a SET statement to each query that sets the application name to a correlation/request id/web application user name we're able to do application-level/user-level tracing without adding logging facilities to our web application.
  • To make absolutely sure that every query issued by pg-promise on behalf of a given request (or given user), we'd want to inject values into a SET statement along with each query.

Every suggestion that I've received does not accomplish the goals above. I had to wrap pg-promise and am locked into an older version which allowed such monkey patching.

When using something like Koa where we can have a middleware that presents pg-promise to the request context it would be hepful to set these state variables without entirely wrapping pg-promise.

@vitaly-t
Copy link
Owner

Another thought, if the number of apps is very limited, you can throw each app-specific stuff into its own schema, and then use separate database objects ;)

@jmealo
Copy link

jmealo commented Apr 19, 2018

@vitaly-t: This works with older versions of pg-promise but I cannot upgrade, hopefully this makes my user case clear: https://gist.github.com/jmealo/58c96de7add64ebbab86a637a46d3417

guc = {
    'spark.user_id': ctx.userId,
    'spark.role': ctx.role,
    'spark.request_id': requestId,
    application_name: `spark-api_${ctx.username}_${requestId}`
};

This populates a set of GUC variables that can be used both for logging and row-level security.

The goal here is to enforce that a given set of GUC variables be SET before issuing every query leaving pg-promise. The GUC variables depend on some state, so there would need to be a way to bind that state.

pg-promise has tons of facilities for modifying the query, however, I have been unable to find one that will work appropriately within a request/response workflow where we want to use state from the request in the GUC variables (see example above).

@abeluck
Copy link

abeluck commented Jan 24, 2020

@jmealo This is an old ticket now, but I want to do the same, did you ever land on a good solution? I'm using Koa2 and want to be able to set GUC variables within the request workflow. It seems a transaction per request is the way to go :?

@jmealo
Copy link

jmealo commented Feb 12, 2020

@abeluck: I just stayed locked at an older version. If a transaction per-request works for your application, that should be safe but may not meet your performance requirements. This gist may be a good starting point.

IIRC, @vitaly-t warned that this wrapper might break some guarantees with tasks. I encountered no issues by prepending the queries. It's worth noting that I limited most requests to a single round trip using a CTE where I would grab everything in one go.

If you design your app/schema/queries around the premise that you want end-to-end logging, multi-tenancy (using schemas) and RLS using application roles rather than PostgreSQL roles and are going to prepend each query with the correct application name you should be fine. You can enforce this pretty easily by requiring the GUC to be set via RLS and then any logic errors should be caught in development. YMMV. Good luck!

@vitaly-t
Copy link
Owner

I generally advise against wrapping base methods like that, because it won't work within tasks and transactions. If you want to extend the protocol, you should use event extend instead.

@jmealo
Copy link

jmealo commented Apr 12, 2020

@vitaly-t: I hope that you are healthy/well. Please consider the use case of row-level-security that uses web application roles instead of database roles:

https://www.graphile.org/postgraphile/security/#how-it-works
https://postgrest.org/zh/stable/auth.html

We need an API that allows passing in an application context (in our case, an http request) that will be used to populate settings (GUC) based upon that context by prepending SET commands to outgoing queries.

Here is an example of one such API. .

How can we implement this with pg-promise?

I still don't see a way to use extend for this use case (set guc values based on an incoming web request [[ or custom context]] prior to issuing each query)

I'm working on a new project and I'd really like to have this functionality as both postgraphile and postgrest support using GUC settings to enforce RLS rules based on application roles to great success.

@vitaly-t
Copy link
Owner

vitaly-t commented Apr 13, 2020

@jmealo Something like this might help?

prepending SET commands to outgoing queries.

Do you intend to pretend SET to every single query? That would be not a good idea. Especially, inside a task or transaction, where you would end up prepending and executing useless SET queries that just repeat each other.

It is better to do inside a task or transaction, because there you can check for a fresh connection, and execute SET only for fresh connections...

await db.task('my-task', async t => {
    const isFreshConnection = t.ctx.useCount === 0;

    if(isFreshConnection) {
        await t.none('SET ...');
        // etc.
     }

   // then your queries...

});

@jmealo
Copy link

jmealo commented Apr 13, 2020

@vitaly-t: Thanks for the quick reply. I looked over #710 and what you have here and I'm still not sure how to expose full-functionality of pg-promise while requiring every query to execute a "pre-query" first. For my use case it's critical that there's no way to issue a query without the search_path and set_config/current_setting values being set / bound to the user session making the request/query.

Here's what I get by prepending queries:

  • The settings are used in row-level security rules
  • The search_path is used to select the correct tenant
  • Setting the application_name to a correlation id for each web request allows tracing of any outgoing queries made while fulfilling that request (application_name is available in the postgresql logs)

My primary concern is that I want to make it impossible for an outgoing query to be made without setting the search_path and settings.

If at all possible, I don't want it to be on developers to only use a subset of pg-promise features, or through later advanced usage somehow make a query without the SET statement.

@csrapr
Copy link

csrapr commented Feb 2, 2021

this is killing me, I've been reading issues in circles and I still don't know how to solve this problem

@vitaly-t
Copy link
Owner

vitaly-t commented Feb 2, 2021

I do not believe that this library creates any kind of limitation by itself. If you can do it within PostgreSQL, then you can do it here. There hasn't been a clear formulation of the solution within the server architecture, which made it impossible to offer any usable guidance as to how to approach this within the library.

@jmealo
Copy link

jmealo commented Feb 2, 2021

@vitaly-t: Many projects are allowing for this, the server supports it just fine using GUC:
https://postgrest.org/en/v7.0.0/auth.html
https://www.graphile.org/postgraphile/security/

With the API surface exposed right now, there's no way to do this with pg-promise without using a very old version and wrapping it like I did years ago, which negates some of the best features (but was fine for my use case).

The only alternative I could see which seems like a reach would be to use async local storage: https://nodejs.org/api/async_hooks.html

@csrapr: You could store the user/session in async local storage and then pull it out using one of the supported means of extending pg-promise and hopefully get some per-request data in.

@jmealo
Copy link

jmealo commented Feb 2, 2021

@vitaly-t: Would it be possible to modify the outgoing query from a query event? If you can manipulate the context you could use async hooks to do this here probably.

const http = require('http');
const { AsyncLocalStorage } = require('async_hooks');

const asyncLocalStorage = new AsyncLocalStorage();

// Instead of logWithId, this is your query event handler, where you'd PREPEND your `SET` statement
function logWithId(msg) {
  const id = asyncLocalStorage.getStore();
  console.log(`${id !== undefined ? id : '-'}:`, msg);
}

let idSeq = 0;
http.createServer((req, res) => {
  asyncLocalStorage.run(idSeq++, () => {
    logWithId('start');
    // Imagine any chain of async operations here
    setImmediate(() => {
      logWithId('finish');
      res.end();
    });
  });
}).listen(8080);

I didn't see anything else in the docs that would accomplish this. It seems like extend just adds new methods rather than extending existing ones?

@jmealo
Copy link

jmealo commented Feb 2, 2021

@csrapr: If there's no pg-promise sanctioned way to pass request/user context; you can monkey patch query on node-postgres. You could use async-local-storage to get a hold of your request/user/session and prepend your SET statement above the query. There's a performance penalty and monkey patches should be used sparingly. Let me know how you make out! If there's no solution here I may provide a library just for this use case.

For context, I store my ctx object from koa in async local storage and use it for logging and various places that I can't pass around context; which effectively is the problem we have here. We have no way of passing context and rewriting queries globally.

@vitaly-t
Copy link
Owner

vitaly-t commented Feb 2, 2021

I didn't see anything else in the docs that would accomplish this. It seems like extend just adds new methods rather than extending existing ones?

When following the repository pattern, as shown pg-promise-demo, you can have repository methods extending existing methods, whereby any of your repository methods can invoke methods from any other repository. This would propagate the current task/transaction context automatically.

@jmealo
Copy link

jmealo commented Feb 2, 2021

@vitaly-t: thanks, for those not using the repository pattern who want to prepend a query with a dynamic string with request context from async local storage, should we be using the query event and mutating the context or just monkey patch node-postgres?

@vitaly-t
Copy link
Owner

vitaly-t commented Feb 2, 2021

Event query is just a notification, it does not support any query mutation.

If you want to prepend a query in front of each other query, you can use extend event to add your own method that does it, i.e. prepends the required clause, and then just use that method everywhere, in place of the existing methods.

@jmealo
Copy link

jmealo commented Feb 2, 2021

I want to use the existing methods and I want it to happen uniformly; I think that's what everyone wants. Anything else is a security risk if one developer uses the wrong method. It would bypass auth. You can be smart and set up things so that the query will fail with the role you use with pg-promise if no GUC is set but it's relying on a lot of moving pieces versus just being able to unilaterally make sure every query can be authenticated based on the application user so that authorization can be enforced in DB.

@vitaly-t
Copy link
Owner

vitaly-t commented Feb 3, 2021

I gave you one solution, via repositories, you didn't want it. I gave you another, with an extended method, you didn't like it. I'm not gonna suggest any more solutions here of how to circumvent things that you would like.

I want to use the existing methods and I want it to happen uniformly; I think that's what everyone wants

I do not believe that this is what everyone wants. And even if somebody needs it, I have provided two approaches of how this can be done. After all, what you are doing is very non-standard, trying to hack queries like that.

@jmealo
Copy link

jmealo commented Feb 3, 2021

@csrapr: for what it's worth, I'd like to find a solution and will give this a shot when I can, but do see if postgrest/postgraphile meet your needs. They provide an API either REST or GraphQL and both support using JWT/sessions for Row-Level Security and auditing by reflecting over your database.

@vitaly-t: this keeps coming up. I'm not sure why you've been so critical of this idea but I respect your choice for how you handle your library and appreciate your time proposing solutions but none of the solutions address the problems that we're trying to address.

I pioneered this pattern and other projects I collaborated with picked it up or discovered it independently. It's a good pattern that's been verified in production and your library does actively prevent it IMO. I think I'm qualified to make that assessment.

I don't think locking down the prototype breaking my original solution of wrapping the methods differs from what you're suggesting in any meaningful way. It seems like a control issue :(

Library authors don't need to save everyone from themselves when they're working on the next big thing. A library preventing extensibility forces circumventing those preventions to implement this pattern. I'm sorry that we weren't able to meet somewhere in the middle but I'm done arguing the merits of this approach on your repo out of respect for all your hard work on an otherwise brilliant library.

@csrapr
Copy link

csrapr commented Feb 3, 2021

@jmealo We've been investigating a solution combining AsyncLocalStorage to store the request and setting the user schema before a query as we go through a request. I'll comment here when we reach a solution we like, and thank you and @vitaly-t for the help

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants