Error detail regression #380
Comments
We have another handful of tests that I believe are failing for the same underlying reason. We have logic that handles unique key violations as follows (I've noticed the clumsy catch and now replaced it with
do {
    try await Build(id: buildId,
                    versionId: trigger.versionId,
                    jobUrl: jobUrl,
                    platform: pair.platform,
                    status: .triggered,
                    swiftVersion: pair.swiftVersion)
        .create(on: database)
} catch {
    if let error = error as? PostgresError,
       error.code == .uniqueViolation {
        if let oldBuild = try await Build.query(on: database,
                                                platform: pair.platform,
                                                swiftVersion: pair.swiftVersion,
                                                versionId: trigger.versionId) {
            // Fluent doesn't allow modification of the buildId of an existing
            // record, therefore we need to delete + create.
            let newBuild = Build(id: buildId,
                                 versionId: trigger.versionId,
                                 buildCommand: oldBuild.buildCommand,
                                 jobUrl: jobUrl,
                                 platform: pair.platform,
                                 status: .triggered,
                                 swiftVersion: pair.swiftVersion)
            try await database.transaction { tx in
                try await oldBuild.delete(on: tx)
                try await newBuild.create(on: tx)
            }
        }
    } else {
        throw error
    }
}
This seems to be falling through now to Is there another/better way to handle unique key violation errors in the latest fluent-postgres-driver/postgres-kit? (I could of course avoid the UK violation here by checking for existence first but the error case is very rare and I don't want to have to run the extra query on the happy path for every request.) |
I've just seen vapor/postgres-kit#244, which I'm guessing will address all this. |
@finestructure Are we good to close this with the linked PR? |
@0xTim I've just had a look and our errors are actually happening with that version (postgres-kit 2.11.2). It seems that all errors are now |
In particular the |
@gwynne ping |
There is an additional problem I've just run into while trying to switch over to
func attach(to build: Build, on database: Database) async throws {
    $build.id = try build.requireID()
    build.$docUpload.id = try requireID()
    try await database.transaction {
        try await self.save(on: $0)
        try await build.save(on: $0)
    }
}
It turns out that a
func attach(to build: Build, on database: Database) async throws {
    $build.id = try build.requireID()
    build.$docUpload.id = try requireID()
    do {
        try await database.transaction {
            do {
                try await self.save(on: $0)
                try await build.save(on: $0)
            } catch {
                print("in transaction: \(type(of: error))")
                throw error
            }
        }
    } catch {
        print("outside transaction: \(type(of: error))")
        throw error
    }
}
The data still seems to be there. In fact the error reports as
|
|
You can also explicitly check whether an error is a constraint violation (including a uniqueness violation) by using the |
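A minimal sketch of the kind of check described above. It assumes the property being referred to is `isConstraintFailure` on FluentKit's `DatabaseError` protocol; to keep the snippet self-contained, `DatabaseErrorLike`, `FakeDriverError`, `UnrelatedError` and `isConstraintViolation` are hypothetical stand-ins, not the real Fluent types.

```swift
// Hypothetical stand-in mirroring the shape of FluentKit's `DatabaseError`
// protocol (assumption: the real check goes through `isConstraintFailure`).
protocol DatabaseErrorLike: Error {
    var isSyntaxError: Bool { get }
    var isConstraintFailure: Bool { get }
    var isConnectionClosed: Bool { get }
}

// Hypothetical driver error, used only for illustration.
struct FakeDriverError: DatabaseErrorLike {
    var isSyntaxError = false
    var isConstraintFailure = false
    var isConnectionClosed = false
}

// An error that does not come from the database layer at all.
struct UnrelatedError: Error {}

// Returns true only for errors that report themselves as constraint failures.
// Errors that do not conform to the protocol are treated as "not a violation".
func isConstraintViolation(_ error: any Error) -> Bool {
    guard let dbError = error as? any DatabaseErrorLike else { return false }
    return dbError.isConstraintFailure
}
```

With real Fluent types, the cast target would presumably be `DatabaseError` itself; the point of the sketch is that the check does not depend on the concrete error type a driver throws.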
Thanks for taking a look, @gwynne ! Casting to
} catch {
    req.logger.critical("\(error)")
    dump(error)
    print("type(of: error)", type(of: error))
    print("error as? PSQLError:", error as? PSQLError)
    print("error as? DatabaseError:", error as? DatabaseError)
    throw error
}
|
|
Either: req.logger.critical("\(String(reflecting: error))") or req.logger.report(error) should at least get you the debug information. The difference between the error type inside versus outside the transaction is an unfortunate side effect of the limitations of the workaround, although I think I can fix that specific limitation at the same time as fixing the DatabaseError conformance. (of course, what we really should be doing is getting #360 taken care of, so I can ditch the workaround completely) |
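To illustrate why `String(reflecting:)` can surface more than plain interpolation, here is a small self-contained sketch. `MaskedError` is a hypothetical type invented for this example; it only assumes that an error type may give a terse `description` and a detailed `debugDescription`, which is how the behaviour in this thread appears to work.

```swift
// Hypothetical error type: terse `description`, detailed `debugDescription`.
struct MaskedError: Error, CustomStringConvertible, CustomDebugStringConvertible {
    let detail: String
    var description: String { "generic database error" }
    var debugDescription: String { "MaskedError(detail: \(detail))" }
}

let error: any Error = MaskedError(detail: "duplicate key value violates unique constraint")

// String interpolation prefers `CustomStringConvertible.description`.
let interpolated = "\(error)"

// `String(reflecting:)` prefers `CustomDebugStringConvertible.debugDescription`.
let reflected = String(reflecting: error)

print(interpolated)
print(reflected)
```

The same distinction would apply to a logger call: interpolating the error logs the short form, while passing `String(reflecting: error)` logs the detailed one.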
Thanks for the additional info, @gwynne ! I think I'll keep fluent-postgres-driver pinned to 2.6.2 for now until everything has settled down. If it helps, I can clean up and point you to the branch where I'm testing the migration. Maybe the additional context would be useful? |
@finestructure Thanks for reporting this. I have been struggling with a similar issue, seeing error |
Also had to do this, actually:
|
This has finally been addressed by #372; update your dependencies (and don't forget to un-pin A few critical notes:
|
Is there perhaps a way to restore the original error reporting behaviour with this new error type? The problem with For example, we have
This now fails with
This is from an error reported via This is having quite an impact on our server logs and on downstream systems looking at these logs. I'm sure there were good reasons to make this change, I'm just wondering if it's feasible to restore the original behaviour with these changes in place? It's going to be tricky for us to audit that we're reporting from everywhere in this new way and some of the behaviour we can't restore ( Plus the resulting errors would be much more verbose (but that's a change we can likely absorb ok, because in testing the original error is still contained within the bigger error message - that means matchers will still work 🤞). |
@finestructure |
Yeah we should never report critical errors in Fluent, though we used to allow the log level to be set for SQL logging - is that what you meant @finestructure ? |
@gwynne @0xTim It's entirely possible I'm messing that up somehow. Let me try and distill this into one test that's failing depending on what This is current main:
And this is current main when I unpin
What the test does is configure a
XCTAssertEqual(logs.count, 1)
let log = try XCTUnwrap(logs.first)
XCTAssertEqual(log.level, .critical)
XCTAssertEqual(log.message, #"server: duplicate key value violates unique constraint "idx_repositories_owner_name" (_bt_check_unique)"#)
There are no other code changes that could affect the change in assert level we're seeing, as far as I can see, just the Does that help to explain the issue? Is there something wrong with the way we're using/testing this? |
To be honest, I'm at a loss as far as the log level goes - even a code search across the entire Vapor org doesn't show anywhere that we were doing anything at the |
@fabianfett BTW, told you the new conformance was a little verbose... 😜 |
@finestructure Oh, I keep forgetting to mention - for a check of this nature you can leverage |
Oooh, thanks so much @gwynne , that was exactly it. We had
case let .failure(error) where error is PostgresError:
    // Escalate database errors to critical
    Current.logger().critical("\(error)")
    try await recordError(database: database, error: error, stage: stage)
and that was now falling through to a more generic logging statement. |
Yes, I've actually been using that already, and you're right - this test is just to ensure our logging assumptions are being met. I do however have a to do to come back to I'll have a look and open a PR, ok? |
It'd be quite welcome! But be aware you'll have to add the requirement to the EDIT: Come to think of it, if you added |
@fabianfett Did I miss anything in the above re: an easier way to check for an appropriate |
Ah missed this edit... I've got the four repos forked and with branches for the change. Is this worth PRing then? I'd mentioned this to Fabian earlier that I miss the granularity we had from Postgres with the Code enum. I wonder if there'd be a way to preserve these codes in a generic fashion such that you can still test them depending on your backend. That could avoid having to come up with types of errors that don't exist universally. |
Turns out I'd actually forgotten to do I think some sort of Perhaps some |
Another thought: if
enum State {
    case `true`
    case `false`
    case unknown // undetermined, undefined?
}
backends that can't resolve certain conditions would have a way to opt out. |
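A rough sketch of how such a tri-state answer could let a backend opt out. Everything here is hypothetical and not part of Fluent: `ConditionState`, `ConditionReportingError`, and the two example error types are invented for illustration, and the case names `yes`/`no` are used instead of `true`/`false` to avoid keyword escaping.

```swift
// Hypothetical tri-state answer to "is this error of kind X?".
enum ConditionState {
    case yes
    case no
    case unknown
}

// Hypothetical protocol; the names are illustrative only.
protocol ConditionReportingError: Error {
    var isUniqueViolation: ConditionState { get }
}

extension ConditionReportingError {
    // Default: a backend that cannot determine the condition opts out.
    var isUniqueViolation: ConditionState { .unknown }
}

// A backend that can resolve the condition, here via the SQLSTATE code
// (23505 is PostgreSQL's unique_violation).
struct DetailedBackendError: ConditionReportingError {
    let sqlState: String
    var isUniqueViolation: ConditionState { sqlState == "23505" ? .yes : .no }
}

// A backend that cannot resolve it and relies on the default.
struct OpaqueBackendError: ConditionReportingError {}
```

Callers would then have to decide explicitly how to treat `.unknown`, rather than silently reading it as "no".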
@gwynne @fabianfett I'm tempted to reopen this because I'm still struggling to retain the original behaviour but I wanted to check if I'm perhaps missing something obvious first. The problem is that this now obscures errors during development. For instance just now, in a test, I'm getting this error:
I suppose I could wrap the whole test each time this happens and do the casting but it feels like that's less than ideal. Would it be possible to restore the original test error reporting at least for |
Sorry, but I feel I need to reopen this issue to see if there's a better way of dealing with these error messages. I just got the following running the tests where I'd forgotten to start the docker container with the database:
In this case it's very easy to find the problem but imagine it's a more complex problem or you're a new contributor. Surely there has to be a better way to surface errors? The error message says the motivation for the initial change was to prevent leaking sensitive data. Would it not be possible to at least do this in release builds only, if it's not possible to restrict it only to errors that are actually sensitive? |
@finestructure We're still trying to come up with a good answer to this that's both safe and helpful. Our current thinking is to add an environment variable that switches on verbose-always errors (something like |
Thanks @gwynne , that would indeed be very helpful. Anything that's a one-off setup that we can also bake into the project so that contributors don't trip over this is great 🤗 As to naming, maybe even Although in effect we will enable this in general for |
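For illustration only, an environment-variable switch of the kind discussed could look roughly like this. The variable name `EXAMPLE_VERBOSE_DB_ERRORS` and both functions are hypothetical; the thread does not settle on a name, and this is not how any Vapor package implements it.

```swift
import Foundation

// Hypothetical variable name, chosen only for this sketch.
let verboseErrorsKey = "EXAMPLE_VERBOSE_DB_ERRORS"

// Reads the switch from an environment dictionary. The dictionary is a
// parameter so the function can be exercised without touching the process.
func verboseErrorsEnabled(
    in environment: [String: String] = ProcessInfo.processInfo.environment
) -> Bool {
    guard let raw = environment[verboseErrorsKey]?.lowercased() else { return false }
    return ["1", "true", "yes"].contains(raw)
}

// Picks the detailed or the terse rendering of an error.
func render(_ error: any Error, verbose: Bool) -> String {
    verbose ? String(reflecting: error) : "\(error)"
}
```

A project could set the variable once in its test scheme or CI configuration so that contributors get detailed errors without extra setup.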
I have this code:
which worked up through fluent-postgres-driver 2.6.2 and postgres-kit 2.10.1. It lets us know when a user tries to change their email address to an email address that is already claimed by someone else, violating a unique constraint on the table. I updated to the latest everything (fluent-postgres-driver 2.8.0, postgres-kit 2.12.0, postgres-nio 1.17.0) and now the error is a PSQLError instead of a PostgresError, so It looks like I can resolve it by doing
as a way to detect the code for a unique violation, though it feels a bit off to look up a value in PSQLError in the enum cases of PostgresError. @gwynne I wonder why this change (throwing PSQLError instead of PostgresError) was not considered a semver major change, since it was a breaking change. Is there another way I can find out when breaking changes are introduced to packages I'm using? Should I generally not be relying on Error types remaining unchanged? |
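One way to avoid tying the check to a particular error type is to compare the raw SQLSTATE code, which PostgreSQL defines independently of any driver. The sketch below is self-contained and uses hypothetical names (`ConstraintViolation`, `ServerErrorInfo`, `classify`); how the SQLSTATE and constraint name are actually obtained from the driver's error is left as an assumption.

```swift
// Standard PostgreSQL SQLSTATE codes for two integrity constraint violations.
enum ConstraintViolation: String {
    case notNull = "23502"  // not_null_violation
    case unique = "23505"   // unique_violation
}

// Hypothetical stand-in for the server-provided fields of a driver error.
struct ServerErrorInfo {
    let sqlState: String?
    let constraintName: String?
}

// Maps the SQLSTATE (if any) to a known constraint violation, else nil.
func classify(_ info: ServerErrorInfo) -> ConstraintViolation? {
    info.sqlState.flatMap(ConstraintViolation.init(rawValue:))
}
```

Because the raw codes are preserved, they remain usable for web searches and log matching even if the surrounding error type changes again.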
@rausnitz So there are a lot of factors in play here:
Also, to specifically address this:
Swift does not consider the specific errors thrown by a given API part of that API's contract (notwithstanding APIs which use That being said, it's easily arguable that the overhaul of PostgresNIO should have been a new major version. We went back and forth on this several times during the development, and at the end of the day the fact of the matter is that I probably made the wrong call when I talked Fabian into doing it the way we did; it was a decision made based on the lack of any consensus in Vapor as a whole (at least at the time) on how to handle versioning, and concern about how to leverage the new APIs in higher layers (PostgresKit and FluentPostgresDriver). Since that time there has been considerably more discussion and benefit of experience, and if we were doing it now, I think we'd go the other way. I probably will do exactly that with the MySQLNIO updates - but even if I don't, I can promise right now there will not be a repeat of this problem 😰. As for solutions to what we have with PostgresNIO now, there's considerable reluctance to provide a way to change PostgresNIO's default behavior, but the current plan is actually better than that - we're going to make |
I want to raise, that I disagree with this statement. Code that is not in use is very much useless. If we had tagged a major for PostgresNIO, we would have been on the hook to maintain two versions of the library. This is for a very simple reason: Most PostgresNIO users don't use it directly but indirectly through Fluent or SQLKit. Those users would have been stuck on the previous version of PostgresNIO until Fluent would have made a major release. Looking at it from today's perspective, this state would have been ongoing for two and a half years by now. For us maintainers this would have meant, that we need to fix bugs in the PostgresNIO With the approach of slowly changing the underpinnings of PostgresNIO we have enabled huge performance and stability improvements for all users. All Fluent users benefit from those changes. This is especially important as we have fixed tons of edge cases, that are extremely hard to debug for users, that are not comfortable with the Postgres wire protocol. Thanks to the huge adoption that the rewrite has seen (thanks to us not tagging a major release), we are now extremely confident in which APIs work and which don't. Once we will tag the next major version, we will "just" remove all APIs that we don't want to maintain further. Very little actual code changes will be required besides this. Regarding the |
Thanks @gwynne and @fabianfett, that is helpful context. From my perspective, the most important thing is being prepared for changes like this going forward. It sounds like I should expect that error types may change. I can also try to keep up more with release notes, though Vapor has a lot of dependencies, so I don't know if that is feasible. |
@fabianfett asked me to write up a summary of the "ideal" error interface from our point of view. Here's what I've come up with 🙂 The original interface worked just fine for us, so I think in terms of an API that was pretty much what we needed. One thing that stood out (but I'm not sure how fixable that is), was that we were reaching beyond Fluent into a PG specific error. That's certainly fine and we're so tied to PG that this isn't really a problem. It's just slightly odd that for specific error handling you then need to import PG modules. A Fluent wrapper (like @gwynne suggested above) would be closer to what I'd have wished for. I haven't thought through how you'd pass along specific error codes but I think preserving any explicit codes as they are thrown by the underlying driver would be helpful. A "self-documenting" enum (or struct with static) that gives them labels would be great but getting at the underlying raw value also feels quite important for web searches etc when something goes wrong. Old My understanding is that new I'm sure you've considered all this but just for the sake of thinking out loud it seems the options are:
I'm not sure if 3. is feasible but I think it'd be the ideal version of this. Earlier, we discussed how an env variable could configure this and the suggestion was to tie this to debug mode. However, I can see that making trouble-shooting of live issues quite hard. I don't think that's a setting we'd want, having thought about it some more. I can see how bigger orgs / setups than ours cannot afford to treat their logs as secret and would have a need for this, maybe even as the default. But turning every error into I'm wondering: How would I actually get an informative error message in case something goes wrong in prod? Re-deploy with a new error handler somewhere that does the And if it's tied to an env variable with a debug build, that would mean a ~30min turnaround and running a live system with Also, the mere act of redeploying might make errors go away and now you might be constantly fiddling with the logging until you catch the error with the right setup. So I think if at all possible, only masking the actually sensitive errors is the best option (i.e. 3.). Does that description make sense and help? I hope it does, and thanks a lot for your time looking into this and for your help. |
I'm not sure if this is a fluent-postgres-driver issue or one in postgres-kit - but FWIW, I can avoid it by pinning to
.package(url: "https://github.com/vapor/fluent-postgres-driver.git", exact: "2.6.2")
We have a test that asserts on the error description when violating a not-null constraint:
With the latest version of fluent-postgres-driver this starts failing because the error message is now
The operation couldn’t be completed. (PostgresNIO.PSQLError error 1.)
Of course the exact error message isn't important, but the point of the test is to ensure that we have relevant information about an error in the logs.
We have another test that checks unique key violations:
That test is also failing due to that same new error message,
The operation couldn’t be completed. (PostgresNIO.PSQLError error 1.)
Ideally, both violations would still be reported as such (with some detail as to which kind of constraint was violated) and also include the name of the constraint that was violated (if available).
Is there a way to recover this error information with the latest version of fluent-postgres-driver/postgres-kit?