feat(smtp): distinguish TLS handshake failure from post-TLS application error #718

Merged
phillip-stephens merged 8 commits into zmap:master from apowis:adam/smtp-tls-error-distinction
May 29, 2026

Conversation

@apowis apowis commented May 18, 2026

Contributor

When scanning with TLS (SMTPS or STARTTLS), zgrab2 previously had no way to tell whether a failed scan was caused by the TLS handshake itself failing, or by the application-layer exchange failing after TLS was successfully established. Both cases returned a generic error status, and the `tls` JSON object gave no indication of whether the handshake completed.

This PR adds that distinction via two changes:

`TLSLog.HandshakeCompletedSuccessfully bool` (`json:"handshake_completed_successfully"`)
Set to `true` in `TLSConnection.Handshake()` only when the handshake returns `nil`. All modules that call `tlsConn.GetLog()` get this field in their JSON output automatically — no per-module changes required.

The `handshake_completed_successfully` field is also registered in the shared `tls_log` SubRecord in `zgrab2_schemas/zgrab2/zgrab2.py` so zschema validation passes for all modules.

`SCAN_POST_TLS_APPLICATION_ERROR` (`"tls-application-error"`)
A new scan status for "TLS handshake succeeded but the application-layer exchange failed." The existing `SCAN_HANDSHAKE_ERROR` already covers TLS failure; this fills the gap.

The SMTP module is updated as a proof of concept:

  • Both TLS paths (SMTPS and STARTTLS) now return `SCAN_HANDSHAKE_ERROR` explicitly on handshake failure, and capture `TLSLog` even on failure (partial handshake data is preserved)
  • Application-layer errors after a successful TLS handshake return `SCAN_POST_TLS_APPLICATION_ERROR`
  • Pre-TLS errors (e.g. a rejected STARTTLS command) continue to return `SCAN_APPLICATION_ERROR` unchanged

How to Test

```
go test github.com/zmap/zgrab2/...
go test github.com/zmap/zgrab2/modules/smtp -v -run 'TestScanSMTPS|TestScanSTARTTLS|TestHandshake'
```

Three new SMTP scanner tests cover the three states: SMTPS handshake failure, STARTTLS handshake failure, and TLS-success with a 500 error banner (uses a real in-process TLS handshake between stdlib server and zcrypto client over `net.Pipe`).
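
For readers unfamiliar with the `net.Pipe` approach, here is a rough, stdlib-only sketch of the TLS-success case. It is not the test from this PR: the real test uses the zcrypto client, whereas this uses `crypto/tls` on both ends, an ECDSA throwaway certificate, and a hypothetical `handshakeThenBanner` helper.

```go
package main

import (
	"bufio"
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net"
	"time"
)

// selfSignedCert builds a throwaway certificate for the in-process server.
func selfSignedCert() (tls.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return tls.Certificate{}, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "smtp.test"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return tls.Certificate{}, err
	}
	return tls.Certificate{Certificate: [][]byte{der}, PrivateKey: key}, nil
}

// handshakeThenBanner completes a real TLS handshake over net.Pipe, after
// which the server sends banner. It reports whether the client handshake
// succeeded and the first line the client read afterwards.
func handshakeThenBanner(banner string) (bool, string, error) {
	cert, err := selfSignedCert()
	if err != nil {
		return false, "", err
	}
	clientRaw, serverRaw := net.Pipe()
	defer clientRaw.Close()
	// Guard against hanging forever if either side stalls.
	clientRaw.SetDeadline(time.Now().Add(5 * time.Second))

	go func() {
		defer serverRaw.Close()
		srv := tls.Server(serverRaw, &tls.Config{Certificates: []tls.Certificate{cert}})
		if srv.Handshake() != nil {
			return
		}
		fmt.Fprint(srv, banner) // application-layer failure arrives after TLS
	}()

	// Pinned to TLS 1.2 so handshake flights alternate strictly on the
	// unbuffered pipe.
	client := tls.Client(clientRaw, &tls.Config{
		InsecureSkipVerify: true,
		MaxVersion:         tls.VersionTLS12,
	})
	if err := client.Handshake(); err != nil {
		return false, "", err
	}
	line, err := bufio.NewReader(client).ReadString('\n')
	return true, line, err
}

func main() {
	ok, line, err := handshakeThenBanner("500 internal error\r\n")
	fmt.Printf("handshake ok=%v banner=%q err=%v\n", ok, line, err)
}
```

A scanner under test would see a completed handshake followed by an error banner, which is the case that should map to `SCAN_POST_TLS_APPLICATION_ERROR`.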

Notes

The `handshake_completed_successfully` field is available to all other TLS-capable modules already — only the status-code handling needs updating per protocol. If this approach is accepted, IMAP, POP3, FTP, HTTP, and the remaining modules will follow in separate PRs.

@phillip-stephens phillip-stephens self-requested a review May 20, 2026 02:54

@phillip-stephens phillip-stephens left a comment

Contributor

Hello @apowis, really appreciate the work on this and the in-depth unit tests. Also, good catch on returning SCAN_HANDSHAKE_ERROR when we fail to establish a TLS connection.

To confirm, is this what you think the desired status should be for a failed SMTP scan (or any other protocol whose application-layer exchange occurs both before and after the TLS handshake)?

  • Failure with initial TCP connection - SCAN_CONNECTION_REFUSED/TIMEOUT/CLOSED
  • Failure with application protocol, pre-TLS - SCAN_APPLICATION_ERROR
  • Failure with TLS handshake - SCAN_HANDSHAKE_ERROR
  • Failure with application protocol, post-TLS - SCAN_TLS_APPLICATION_ERROR

Comment thread status.go Outdated
Comment thread tls.go Outdated
apowis added a commit to apowis/zgrab2 that referenced this pull request May 20, 2026
Rename SCAN_TLS_APPLICATION_ERROR -> SCAN_POST_TLS_APPLICATION_ERROR and
HandshakeComplete -> HandshakeCompletedSuccessfully (field, JSON tag, schema)
per reviewer suggestions for clarity.
apowis added a commit to apowis/zgrab2 that referenced this pull request May 20, 2026
Rename SCAN_TLS_APPLICATION_ERROR -> SCAN_POST_TLS_APPLICATION_ERROR and
HandshakeComplete -> HandshakeCompletedSuccessfully (field, JSON tag, schema)
per reviewer suggestions. Update all test names and references accordingly.
@apowis apowis force-pushed the adam/smtp-tls-error-distinction branch 3 times, most recently from 933b145 to 49aa8d5 Compare May 20, 2026 09:16
apowis commented May 20, 2026

Contributor Author

Thanks for the review @phillip-stephens!

I've addressed both points:

  • Renamed SCAN_TLS_APPLICATION_ERROR → SCAN_POST_TLS_APPLICATION_ERROR as suggested
  • Renamed HandshakeComplete → HandshakeCompletedSuccessfully to remove the ambiguity you flagged

I also took the opportunity to add a small feature while I was in this area: ServerHelloReceived bool on TLSLog. The motivation ties directly to your summary of the status hierarchy — when a TLS handshake fails, it's useful to know why. If ServerHelloReceived is true, the target definitively responded as a TLS server (it sent a ServerHello) even though the handshake didn't complete — which is meaningful signal for callers trying to distinguish "not a TLS server" from "TLS server, handshake failed for another reason" (cipher mismatch, cert rejection, etc.). As a bonus, zcrypto captures partial handshake data (ServerHello, certificates) before returning a handshake error, so HandshakeLog is populated in those cases too.

Happy to pull ServerHelloReceived out into a separate follow-up PR if you'd prefer to keep this one focused — just let me know.


Re: applying this to other protocols including HTTP

For simple protocols like IMAP, POP3, and FTP — where the module owns the TLS connection directly and calls Handshake() explicitly — extending this work is straightforward, mirroring what was done for SMTP here.

HTTP is more nuanced. The HTTP module delegates TLS entirely to lib/http/transport rather than managing the handshake itself, so the three new behaviours land differently:

  • HandshakeCompletedSuccessfully — already works for free. The transport calls TLSConnection.Handshake() via the TLS wrapper and attaches conn.GetLog() to the request, so the field is populated with no further changes needed.
  • SCAN_POST_TLS_APPLICATION_ERROR — achievable with a small change to modules/http/scanner.go. Because TLSLog is attached to the request object when TLS succeeds, we can detect the combination of a non-nil TLSLog and a failed http.Client.Do() and return SCAN_POST_TLS_APPLICATION_ERROR rather than a generic error.
  • ServerHelloReceived on a failed TLS handshake — this is the hard case. When DialTLSContext fails, the transport returns (nil, err) and the TLSConnection carrying the partial HandshakeLog is dropped. Unlike SMTP, there is no place to recover it. Fixing this properly would require either a custom error type that carries a *TLSLog alongside the error, or a hook in the transport that preserves the partial log on a failed dial — doable, but a separate piece of work.

Happy to pick that up as a follow-on once this lands.

@apowis apowis requested a review from phillip-stephens May 20, 2026 09:19
@apowis apowis force-pushed the adam/smtp-tls-error-distinction branch from 49aa8d5 to 1c72142 Compare May 20, 2026 09:22
@apowis apowis force-pushed the adam/smtp-tls-error-distinction branch from 1c72142 to 6fac127 Compare May 20, 2026 09:26
@phillip-stephens

Contributor

Let's pull ServerHelloReceived out into a separate PR. I'd want to see how the HTTP case would be handled, and be sure it can be done cleanly, before getting this in for SMTP. We can handle all modules at once in that PR.

Also would love to get a PR to add HandshakeCompletedSuccessfully to the other TLS-using modules.

Thanks for this and just @ me when you've pulled out the ServerHelloReceived and we can get this merged in.

apowis added a commit to apowis/zgrab2 that referenced this pull request May 21, 2026
Rename SCAN_TLS_APPLICATION_ERROR -> SCAN_POST_TLS_APPLICATION_ERROR and
HandshakeComplete -> HandshakeCompletedSuccessfully (field, JSON tag, schema)
per reviewer suggestions. Update all test names and references accordingly.
@apowis apowis force-pushed the adam/smtp-tls-error-distinction branch from 6fac127 to 823e723 Compare May 21, 2026 09:26
apowis commented May 21, 2026

Contributor Author

Done — I've pulled ServerHelloReceived out completely. The branch is rebased on master and pushed.

For the HandshakeCompletedSuccessfully rollout to other modules, I've broken it down into a staged plan:

  1. PR A — IMAP + POP3 (one PR): identical tlsActive / appErrStatus() pattern to what was done in SMTP here. Trivial lift-and-shift.
  2. PR B — FTP: GetFTPSCertificates return type changes to *ScanError; implicit-TLS path returns SCAN_HANDSHAKE_ERROR explicitly. No post-TLS application exchange, so SCAN_POST_TLS_APPLICATION_ERROR is out of scope for this module.
  3. PR C — ManageSieve: STARTTLS is unconditional. TLS wrapper failure → SCAN_HANDSHAKE_ERROR; post-TLS capability-read failure → SCAN_POST_TLS_APPLICATION_ERROR.
  4. PR D — MySQL: one-liner — swap TryGetScanStatus for an explicit SCAN_HANDSHAKE_ERROR at the tlsWrapper call. No post-TLS exchange.
  5. PR E — Postgres: DoSSL error → SCAN_HANDSHAKE_ERROR (currently SCAN_APPLICATION_ERROR). Needs care around the connectErr.Status == SCAN_APPLICATION_ERROR retry path — that's for RequestSSL rejection, not DoSSL, so they must stay separate.
  6. PR F — MSSQL: TLS lives inside TDS framing; most involved. Needs a careful read of the error paths in connection.go before touching anything.
  7. PR G — HTTP: two sub-tasks. (A) client.Do failures after TLS succeeds → SCAN_POST_TLS_APPLICATION_ERROR (straightforward). (B) Surfacing a partial TLSLog when DialTLSContext fails requires threading a custom error type through the transport — that's the harder case I described previously, kept as a follow-on.

We could maybe compress some of these into one PR, such as combining A-E and leaving F and G separate. Let us know what you think.

apowis added 3 commits May 21, 2026 10:29
Set to true only when the full TLS handshake completes without error,
covering both Handshake() and HandshakeContext(). Partial handshake data
(ServerHello, certificates) is still surfaced in HandshakeLog on failure
via the existing deferred GetHandshakeLog() capture.

Registers the new field in the shared tls_log zgrab2 schema SubRecord.
Adds unit tests covering success and failure cases.

Distinguishes a protocol-level failure that occurs after a successful TLS
handshake from a generic SCAN_APPLICATION_ERROR. Callers can now tell
whether a scan failure happened before or after TLS was established.

…on error

Track whether TLS is active in the scanner so that application-level
errors after a successful handshake return SCAN_POST_TLS_APPLICATION_ERROR
instead of SCAN_APPLICATION_ERROR. TLS handshake failures return
SCAN_HANDSHAKE_ERROR. TLSLog is populated from partial handshake data even
on failure, so callers always see what zcrypto captured.

Adds a test that exercises the post-TLS error path using a net.Pipe fake
server and the real TLS wrapper with InsecureSkipVerify.
@apowis apowis force-pushed the adam/smtp-tls-error-distinction branch from 823e723 to 656f1b9 Compare May 21, 2026 09:32
@phillip-stephens

Contributor

That plan sounds broadly fine with a few changes.

Personally, I think we should only use the new SCAN_POST_TLS_APPLICATION_ERROR for application protocols like SMTP that have app logic both pre- and post-handshake, and keep the status quo for the modules that follow the usual TCP -> TLS -> App Logic flow.

For something like HTTP, returning SCAN_POST_TLS_APPLICATION_ERROR implies there's a pre-TLS app error, which doesn't sit right.

Let's:

  1. Retain using SCAN_APPLICATION_ERROR for all modules except those with pre and post-handshake app logic (SMTP, ManageSieve(?))
  2. Document the variety of values for statuses in the README.md. (I'm good with this being a final PR after getting the modules in sync with this new status paradigm). I'll open an issue for this now to track.
  3. Combining A-E and leaving F and G separate sounds good to me

Thanks for this work, feels like an overdue alignment of our error statuses!

@phillip-stephens phillip-stephens left a comment

Contributor

Final nit and then looks good to me

Comment thread status.go Outdated
Co-authored-by: Phillip Stephens <pstephens9275@gmail.com>
@apowis apowis requested a review from phillip-stephens May 26, 2026 12:11
@apowis apowis force-pushed the adam/smtp-tls-error-distinction branch from f149816 to 45bdc97 Compare May 26, 2026 12:39

@phillip-stephens phillip-stephens left a comment

Contributor

LGTM, thanks again!

@phillip-stephens phillip-stephens enabled auto-merge (squash) May 29, 2026 02:52
@phillip-stephens phillip-stephens merged commit a1844a1 into zmap:master May 29, 2026
29 checks passed
apowis added a commit to apowis/zgrab2 that referenced this pull request Jun 2, 2026
…s for all rolled-out modules

Add a shared testhelpers package (MakeL4Dialer, GenerateTestCert,
RunTLSServer, MakeFailingTLSWrapper, MakeInsecureTLSWrapper) to avoid
repeating the net.Pipe/cert-gen/TLS-server boilerplate across six test files.

Each module gets the same three test shapes introduced for SMTP in PR zmap#718:
  - <Module>HandshakeError: TLS wrapper returns error -> SCAN_HANDSHAKE_ERROR
  - <Module>STARTTLSHandshakeError: pre-TLS exchange succeeds, wrapper fails
  - <Module>HandshakeCompletedSuccessfully: real stdlib TLS server completes
    handshake, result.TLSLog.HandshakeCompletedSuccessfully == true

ManageSieve additionally gets TestManageSievePostTLSCapabilitiesError which
verifies that a dropped connection after a successful handshake returns
SCAN_POST_TLS_APPLICATION_ERROR.

Also fixes a pre-existing bug in the FTP scanner where implicit-TLS
TLSLog was captured into a local results variable via a defer but the
returned value was &ftp.results (a copy made before the defer ran),
meaning TLSLog was always nil on implicit-TLS success. TLSLog is now
captured directly into results before ftp is initialised.
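
The FTP bug described above is easy to reproduce in isolation. The sketch below uses hypothetical, simplified types (`ScanResults`, `connection`, `buggyScan`, `fixedScan`) rather than the scanner's real ones; it only shows why a deferred write to a local variable never reaches a value copy taken earlier.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the scanner's types.
type TLSLog struct{ HandshakeCompletedSuccessfully bool }
type ScanResults struct{ TLSLog *TLSLog }
type connection struct{ results ScanResults }

// buggyScan mirrors the described bug: the deferred write updates the local
// results variable, but the connection already holds its own value copy,
// and that copy is what gets returned.
func buggyScan() *ScanResults {
	results := ScanResults{}
	defer func() {
		results.TLSLog = &TLSLog{HandshakeCompletedSuccessfully: true}
	}()
	ftp := connection{results: results} // value copy made before the defer runs
	return &ftp.results
}

// fixedScan captures the log into results before the connection is built,
// so the copy already contains it.
func fixedScan() *ScanResults {
	results := ScanResults{TLSLog: &TLSLog{HandshakeCompletedSuccessfully: true}}
	ftp := connection{results: results}
	return &ftp.results
}

func main() {
	fmt.Println("buggy TLSLog is nil:", buggyScan().TLSLog == nil)
	fmt.Println("fixed TLSLog is nil:", fixedScan().TLSLog == nil)
}
```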
apowis added a commit to apowis/zgrab2 that referenced this pull request Jun 2, 2026
…for all modules

Each module gets the same three test shapes introduced for SMTP in PR zmap#718:
  - Implicit TLS or STARTTLS handshake error -> SCAN_HANDSHAKE_ERROR
  - STARTTLS handshake error (where applicable)
  - Real stdlib TLS handshake -> TLSLog.HandshakeCompletedSuccessfully == true

ManageSieve additionally gets TestManageSievePostTLSCapabilitiesError:
server closes after a successful handshake without sending post-TLS
capabilities -> SCAN_POST_TLS_APPLICATION_ERROR.

MySQL uses a hand-crafted HandshakeV10 binary packet with CLIENT_SSL set
so SupportsTLS() returns true and the scanner reaches the TLS wrapper.

Postgres uses a minimal SSLRequest exchange (8 bytes client, 'S' server)
to get the scanner to DoSSL, then verifies TLSLog is populated even when
the post-TLS Postgres conversation fails (server closes after handshake).
phillip-stephens added a commit that referenced this pull request Jun 3, 2026
…MySQL, Postgres (#727)

* chore: add shared TLS test helpers for module scanner tests

Adds modules/testhelpers with five helpers used across the per-module TLS
scanner tests:
  - MakeL4Dialer: wraps a net.Conn as an L4Dialer
  - GenerateTestCert: throwaway self-signed RSA cert
  - RunTLSServer: stdlib TLS server goroutine with a post-handshake callback
  - MakeFailingTLSWrapper: always returns a handshake error
  - MakeInsecureTLSWrapper: real zcrypto wrapper with InsecureSkipVerify

* feat: roll out SCAN_HANDSHAKE_ERROR to IMAP, POP3, FTP, ManageSieve, MySQL, Postgres

Each module now returns SCAN_HANDSHAKE_ERROR (instead of TryGetScanStatus
or SCAN_APPLICATION_ERROR) when the TLS handshake itself fails, and captures
any partial TLSLog before returning the error.

ManageSieve also gets SCAN_POST_TLS_APPLICATION_ERROR for post-TLS capability
read failures, since it has application-layer exchanges both before and after
the STARTTLS handshake (matching the SMTP pattern).

IMAP, POP3, FTP, MySQL, and Postgres keep SCAN_APPLICATION_ERROR for
application-level errors — these are TCP-TLS-App protocols with no pre-TLS
app logic, per reviewer guidance.

FTP: GetFTPSCertificates return type changed from error to *zgrab2.ScanError
so the caller can propagate the exact status without a TryGetScanStatus call.

FTP (bugfix): implicit-TLS TLSLog was being written into a local results
variable via a defer, but the returned value was &ftp.results (a value copy
made before the defer ran), so TLSLog was silently nil on every implicit-TLS
success. Fixed by capturing it directly before initialising ftp.

Postgres: DoSSL now stores the TLS connection even on handshake failure so
GetTLSLog can surface partial handshake data. The newConnection caller is
updated from SCAN_APPLICATION_ERROR to SCAN_HANDSHAKE_ERROR, which also
fixes an unintended retry: the SSL-retry loop was incorrectly re-attempting
on TLS handshake failures (it should only retry on RequestSSL rejection).
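
The retry interaction can be sketched as follows. The shape of the loop is an assumption based on the description above (retry keyed on `SCAN_APPLICATION_ERROR`); `connect`, `attempt`, and the status string values are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

type ScanStatus string

// String values are assumed for illustration.
const (
	SCAN_APPLICATION_ERROR ScanStatus = "application-error"
	SCAN_HANDSHAKE_ERROR   ScanStatus = "handshake-error"
)

// ScanError is a reduced stand-in for zgrab2.ScanError.
type ScanError struct {
	Status ScanStatus
	Err    error
}

// connect tries with SSL first and retries in plaintext only when the first
// attempt ended in SCAN_APPLICATION_ERROR (the RequestSSL-rejection case).
// It returns how many attempts were made and the final error.
func connect(attempt func(useSSL bool) *ScanError) (int, *ScanError) {
	attempts := 1
	final := attempt(true)
	if final != nil && final.Status == SCAN_APPLICATION_ERROR {
		attempts++
		final = attempt(false)
	}
	return attempts, final
}

// failingWith returns an attempt function that always fails with status.
func failingWith(status ScanStatus) func(bool) *ScanError {
	return func(bool) *ScanError {
		return &ScanError{Status: status, Err: errors.New("failed")}
	}
}

func main() {
	// Old classification: a handshake failure looks like an app error and is retried.
	n, _ := connect(failingWith(SCAN_APPLICATION_ERROR))
	fmt.Println("attempts with SCAN_APPLICATION_ERROR:", n)
	// New classification: the handshake failure is returned without a retry.
	n, _ = connect(failingWith(SCAN_HANDSHAKE_ERROR))
	fmt.Println("attempts with SCAN_HANDSHAKE_ERROR:", n)
}
```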

* test: verify SCAN_HANDSHAKE_ERROR and HandshakeCompletedSuccessfully for all modules

Each module gets the same three test shapes introduced for SMTP in PR #718:
  - Implicit TLS or STARTTLS handshake error -> SCAN_HANDSHAKE_ERROR
  - STARTTLS handshake error (where applicable)
  - Real stdlib TLS handshake -> TLSLog.HandshakeCompletedSuccessfully == true

ManageSieve additionally gets TestManageSievePostTLSCapabilitiesError:
server closes after a successful handshake without sending post-TLS
capabilities -> SCAN_POST_TLS_APPLICATION_ERROR.

MySQL uses a hand-crafted HandshakeV10 binary packet with CLIENT_SSL set
so SupportsTLS() returns true and the scanner reaches the TLS wrapper.

Postgres uses a minimal SSLRequest exchange (8 bytes client, 'S' server)
to get the scanner to DoSSL, then verifies TLSLog is populated even when
the post-TLS Postgres conversation fails (server closes after handshake).

* move conn close defer right after conn creation in ftp/scanner

* added 2 new asserts to tests for mysql and postgres

---------

Co-authored-by: Phillip Stephens <phillip@cs.stanford.edu>
2 participants