
Replace routerify in file transfer server with axum #2461

Merged: 15 commits into thin-edge:main, Nov 20, 2023

Conversation

@jarhodes314 (Contributor) commented Nov 16, 2023

Proposed changes

This is a rewrite of the file transfer server to use axum instead of routerify, which forms the groundwork for adding HTTPS and certificate authentication support once #2430 is merged.

The basic migration to axum was pretty trivial (see 4051213), but once this was complete, I refactored the code considerably to be more idiomatic axum code. The highlights of these changes are:

  • Meaningful handler return values: the HTTP handlers no longer hand-build responses; they return ordinary Rust values (e.g. StatusCode where all we do is set the HTTP status code, or an error enum for handling any unsuccessful response). See the sketch after this list.
  • Local HTTP testing: the tests no longer listen on a port to test HTTP functionality; the services are now called directly
  • Extractors: the boilerplate called by the handlers to extract path information from the URI and convert it into an absolute path to read from/write to on the filesystem now runs before a handler is invoked, and the result is passed in as an argument to the handler
  • General security improvements: the API no longer exposes absolute file paths to the user (e.g. when we respond with 404 Not Found for /tedge/file-transfer/some/file, we say File not found: "some/file" rather than File not found: "/var/tedge/file-transfer/some/file"). It also fixes what appeared to be a (very limited in scope) path traversal vulnerability, where a user could access the data directory directly rather than only its file-transfer subdirectory
  • Removing race conditions: the previous code checked that files existed before reading them and handled errors based on that, but a file could be deleted between checking that it exists and attempting to read it, resulting in a spurious HTTP 500 error where a 404 should be returned
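
As a rough sketch of the handler shape described above (illustrative only: the route, error variant and messages here are hypothetical, not the PR's actual code):

use axum::extract::Path;
use axum::http::StatusCode;
use axum::response::{IntoResponse, Response};
use axum::routing::delete;
use axum::Router;

// Hypothetical error type: converting it into a response centralises
// status-code selection instead of hand-building responses in each handler.
enum Error {
    NotFound(String),
}

impl IntoResponse for Error {
    fn into_response(self) -> Response {
        match self {
            // Only the request-relative path is echoed back, never the
            // absolute path on disk.
            Error::NotFound(path) => {
                (StatusCode::NOT_FOUND, format!("File not found: {path:?}")).into_response()
            }
        }
    }
}

// The handler returns plain Rust values; axum turns them into responses.
async fn delete_file(Path(path): Path<String>) -> Result<StatusCode, Error> {
    tokio::fs::remove_file(format!("/var/tedge/file-transfer/{path}"))
        .await
        .map_err(|_| Error::NotFound(path))?;
    Ok(StatusCode::ACCEPTED)
}

fn router() -> Router {
    Router::new().route("/tedge/file-transfer/*path", delete(delete_file))
}

With this shape, tests can also call the Router as a tower service directly (e.g. with tower::ServiceExt::oneshot) instead of binding to a port.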

Types of changes

Intended to be an improvement, but some minor bugs were fixed along the way, so I'm checking both of those.

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue


Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s)
  • I ran cargo fmt as mentioned in CODING_GUIDELINES
  • I used cargo clippy as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

Signed-off-by: James Rhodes <jarhodes314@gmail.com>
- Add streaming support to download endpoint
- Use FileTransferPaths to avoid repetition
- Remove redundant code from upload handler
- Improve error messages for upload failures
- Modify remaining tests to use API instead of internal functions

Signed-off-by: James Rhodes <jarhodes314@gmail.com>
use super::error::FileTransferRequestError as Error;
use super::request_files::FileTransferPaths;

// TODO is this not just a repeat of code in tedge_config?
const HTTP_FILE_TRANSFER_PORT: u16 = 8000;
Contributor Author

Probably outside of the scope of these changes, but it feels like we shouldn't duplicate what tedge_config knows here. If someone wants the default values, they could surely achieve that with a default config?

Related to this, it feels like the API for HttpConfig is a bit flawed. We always construct default values and then override them with the builder API, which seems pointless (and leads to the multiple-sources-of-truth issue).


let relative_path = Utf8PathBuf::from(relative_path);
Some((relative_path, file_name.into()))
fn separate_path_and_file_name(input: &Utf8Path) -> Option<(&Utf8Path, &str)> {
Contributor Author

The semantics of this function are completely different to what was originally written (and very directly tested):

#[test_case("/tedge/some/dir/file", "/tedge/some/dir", "file")]
#[test_case("/tedge/some/dir/", "/tedge/some/dir", "")]
fn test_separate_path_and_file_name(
    input: &str,
    expected_path: &str,
    expected_file_name: &str,
) {
    let (actual_path, actual_file_name) =
        separate_path_and_file_name(Utf8Path::new(input)).unwrap();
    assert_eq!(actual_path, expected_path);
    assert_eq!(actual_file_name, expected_file_name);
}

However, these tests were not exercising anything the user could actually achieve: the API did path cleanup (which removed trailing slashes), and (I think) the code we call afterwards would have returned an error if we had been able to call this with a trailing slash, since we would have created the entire path as directories and then attempted to create a file with an empty name inside it.


codecov bot commented Nov 16, 2023

Codecov Report

Merging #2461 (fd5a907) into main (a030447) will increase coverage by 0.1%.
Report is 3 commits behind head on main.
The diff coverage is 87.7%.

Additional details and impacted files
Files Coverage Δ
crates/core/tedge_agent/src/agent.rs 0.0% <ø> (ø)
...core/tedge_agent/src/file_transfer_server/actor.rs 86.2% <88.8%> (+0.9%) ⬆️
...ge_agent/src/file_transfer_server/request_files.rs 88.2% <88.2%> (ø)
...core/tedge_agent/src/file_transfer_server/error.rs 47.6% <41.1%> (-27.4%) ⬇️
.../tedge_agent/src/file_transfer_server/http_rest.rs 89.6% <90.7%> (+24.2%) ⬆️

... and 8 files with indirect coverage changes


if let Some((relative_path, file_name)) = separate_path_and_file_name(&full_path) {
    let root_path = file_transfer.data_dir.clone();
    let directories_path = root_path.join(relative_path);
Contributor Author

This relative_path was very definitely an absolute path, so this code was entirely superfluous: the join just returned relative_path in any case.
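
(For reference, and independent of this codebase: joining an absolute path onto a base simply replaces the base, which is why the join above was a no-op. A minimal standalone illustration using camino:)

use camino::Utf8Path;

fn main() {
    let root = Utf8Path::new("/var/tedge/file-transfer");

    // Joining an absolute path discards the base entirely...
    assert_eq!(root.join("/var/tedge/other").as_str(), "/var/tedge/other");

    // ...whereas a genuinely relative path is appended to it.
    assert_eq!(root.join("some/file").as_str(), "/var/tedge/file-transfer/some/file");
}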

DeleteIoError { path, .. } => {
    tracing::error!("{self}");
    (
        StatusCode::FORBIDDEN,
Contributor Author

Do these really want to be 403 Forbidden (it's what the code did previously)? It feels like an encapsulation failure to acknowledge, for certain paths, that we can't delete them or upload to them.

@Bravo555 (Contributor) Nov 16, 2023

IMO "could not delete path due to IO error" could happen due e.g. path not existing, in which case we need to return 404, or due to server not having permissions to do filesystem delete, in which case we should return 500, so if I'm not wrong, then even using a single return code for all IO errors is kinda bad. As for 403 Forbidden, I also feel that it should be used when the client that made the request doesn't have permissions to do something, which doesn't seem to be the case here.

Contributor Author

Yeah, that was roughly what I was thinking. We currently handle the not found case by returning 202 Accepted, like the successful deletion case (which I assume is to make DELETE requests idempotent). Returning 404 Not Found may also make sense, or if the file previously existed, 410 Gone, although a success response tells the user the file definitely no longer exists, and the request definitely succeeded.

I think if we've hit the case handled by this code, it's some error accessing the file system, which in my mind should definitely be a 500 Internal Server Error. This code may also be caused by trying to delete a directory, which I talk about in a different comment.

Contributor

403 doesn't look bad, as the spec mentions the following: "The access is tied to the application logic, such as insufficient rights to a resource", which is the case here, right? I vaguely remember a similar discussion happening when the original author introduced this case in the first place, and we settled on 403 for the same reason.

Contributor Author

I don't understand what application logic is relevant here. With the current API at least, we have no control over permissions. If there's a permissions error on the file system, that implies some other process is interfering with /var/tedge/file-transfer, which is not an application-level error; it's an error from the environment. The question here mostly boils down to whether we want to assume it's some file that should never have been there (in which case encapsulating the error entirely and responding with 404 Not Found is appropriate), or whether we assume the file was uploaded correctly but something has changed the permissions so we can't access it, in which case a 500 Internal Server Error is appropriate.

Thinking more broadly, a useful litmus test to apply here would be "if one user receives a 403 for a resource, there must exist some other user that is able to access the resource in question". If not, and the 403 response is global, then I don't believe this can be considered an authorization issue.

Contributor Author

@didier-wenzek @reubenmiller do you have opinions on this point?

Contributor

Deleting a resource that doesn't exist is definitely not an error. Returning 202 in such a case makes the request idempotent.

Trying to delete a non-empty directory should be considered a user error, as we don't want to support recursive deletes. So 400 makes sense. I would do the same for empty directories.

If the error is due to missing access, I would say this is a 500 as the file-transfer service should own all the files under file transfer.

Contributor

403 doesn't look bad, as the spec mentions the following: "The access is tied to the application logic, such as insufficient rights to a resource."

From RFC 9110:

The 403 (Forbidden) status code indicates that the server understood the request but refuses to fulfill it.

If we get an IO error due to missing file permissions, I would say that the server tried to execute the request, but failed, due to a misconfiguration (e.g. not having write access to file-transfer directory), in which case 500 would be fitting.

Contributor Author

Trying to delete a non-empty directory, should be considered as a user error as we don't want to support recursive deletes. So 400 makes sense. I would do the same for empty-directories.

I think if it is a directory, 404 makes more sense, as the error is in the URL and the issue is that the resource in question cannot be interacted with (and essentially doesn't exist), not that the request is inherently invalid. I'll make sure the error message clarifies that the file we attempted to delete is a directory.

Other than that, it seems like there is now broad consensus that 500 is the appropriate response for an IO error, so I'll sort that out too.
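
A minimal sketch of the mapping agreed above (the variant names are hypothetical, not the actual enum in the PR): deleting a directory is reported as a missing resource with a clarifying message, any other IO failure becomes a 500 with the detail kept server-side, and deleting a file that doesn't exist stays a success (202 Accepted) in the handler itself and never reaches this error type.

use axum::http::StatusCode;
use axum::response::{IntoResponse, Response};

// Hypothetical error variants for the delete endpoint.
enum DeleteError {
    // The requested path is a directory, so there is no file to delete.
    IsDirectory(String),
    // Any other filesystem failure is the server's problem, not the client's.
    Io(std::io::Error),
}

impl IntoResponse for DeleteError {
    fn into_response(self) -> Response {
        match self {
            DeleteError::IsDirectory(path) => (
                StatusCode::NOT_FOUND,
                format!("File not found: {path:?} (it is a directory)"),
            )
                .into_response(),
            DeleteError::Io(err) => {
                // Log the underlying cause server-side, but hide it from the client.
                tracing::error!("Request to delete file failed: {err}");
                StatusCode::INTERNAL_SERVER_ERROR.into_response()
            }
        }
    }
}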

// One must check that once normalized (i.e. any `..` removed)
// the path is still under the file transfer dir
let clean_path = clean_utf8_path(&full_path);
if clean_path.starts_with(&self.data_dir) {
Contributor Author

This was only checking that the file was in e.g. /var/tedge, since all the manipulation in this function was based around file-transfer being part of the URL path (i.e. part of the path variable in this function), so it was possible for a user to read from/write to /var/tedge, not just /var/tedge/file-transfer. I'm assuming this wasn't intentional and I've fixed it by storing file_transfer_dir inside HttpConfig instead.
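
The containment rule being described, as a standalone sketch (the PR does this via an extractor with file_transfer_dir stored in HttpConfig; the resolve function below is hypothetical): normalise the request path and require the result to stay under the file-transfer directory itself, not merely under the data directory.

use camino::{Utf8Path, Utf8PathBuf};

// Resolve a request path against the file-transfer directory, rejecting
// anything that would escape it once `..` components are applied.
fn resolve(file_transfer_dir: &Utf8Path, request_path: &str) -> Option<Utf8PathBuf> {
    let mut resolved = file_transfer_dir.to_path_buf();
    for component in request_path.split('/') {
        match component {
            "" | "." => {}
            ".." => {
                // Refuse to pop above the file-transfer directory.
                if !resolved.pop() || !resolved.starts_with(file_transfer_dir) {
                    return None;
                }
            }
            name => resolved.push(name),
        }
    }
    Some(resolved)
}

fn main() {
    let dir = Utf8Path::new("/var/tedge/file-transfer");
    assert_eq!(resolve(dir, "some/file").unwrap(), dir.join("some/file"));
    // Previously only /var/tedge was enforced as the boundary, so a path
    // like this could reach the rest of the data directory.
    assert_eq!(resolve(dir, "../outside/file"), None);
}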

Signed-off-by: James Rhodes <jarhodes314@gmail.com>
Comment on lines 147 to 166
async fn check_server_does_not_panic_when_port_is_in_use() -> anyhow::Result<()> {
    let ttd = TempTedgeDir::new();

    let http_config = HttpConfig::default()
        .with_data_dir(ttd.utf8_path_buf().into())
        .with_port(3746);
    let config_clone = http_config.clone();

    // Spawn HTTP file transfer server
    // handle_one uses port 3000.
    let builder_one = FileTransferServerBuilder::new(http_config);
    let handle_one = tokio::spawn(async move { builder_one.build().run().await });

    // handle_two will not be able to bind to the same port.
    let builder_two = FileTransferServerBuilder::new(config_clone);
    let handle_two = tokio::spawn(async move { builder_two.build().run().await });

    // although the code inside handle_two throws an error it does not panic.
    tokio::time::sleep(std::time::Duration::from_millis(100)).await;
    // Both servers will attempt to bind to port 3746.
    let server_one = FileTransferServerBuilder::new(http_config).build().run();
    // This server will not be able to bind to the same port.
    let server_two = FileTransferServerBuilder::new(config_clone).build().run();

    // to check for the error, we assert that handle_one is still running
    // while handle_two is finished.
    assert!(!handle_one.is_finished());
    assert!(handle_two.is_finished());
    tokio::select! {
        // Ensure we bind server_one first by polling that future first
        biased;
        res = server_one => bail!("expected second server to finish first, but first finished with: {res:?}"),
        res = server_two => ensure!(res.is_err(), "expected server two to fail with port binding error, but no error was found"),
        _ = tokio::time::sleep(Duration::from_secs(5)) => bail!("timed out waiting for actor to stop running"),
    }
Contributor Author

This refactoring is incidental: I noticed the test both had a hard sleep (so it was comparatively slow) and was not terribly deterministic (no guarantee server one would attempt to bind first, and no guarantee the port binding would fail within 100ms). The timeout-based approach gives the servers a lot more leniency to fail, and polling server_one first ensures that it binds to a port first (assuming that is the first thing it attempts to do, which I think is a reasonable assumption).

Contributor Author

In response to #2461 (comment), I've now changed this test again, and as a result the remaining complexity around which port binds first (i.e. what I'm relying on the biased directive for) has disappeared entirely.

github-actions bot commented Nov 16, 2023

Robot Results

✅ Passed: 376   ❌ Failed: 0   ⏭️ Skipped: 3   Total: 376   Pass %: 100   ⏱️ Duration: 55m40.082s

crates/core/tedge_agent/src/file_transfer_server/error.rs (review comments resolved)
use FileTransferRequestError::*;
match &self {
    FromIo(_) | PathRejection(_) => {
        tracing::error!("{self}");
Contributor

suggestion: This will log an error not only when a request is handled, but every time .into_response() is called.

Axum has a tracing feature that makes it output traces when handling events. Could it be used for logging errors like we try to do here?

Contributor Author

I'm not sure axum's tracing is quite what we want here. Here, I want to determine whether I'm hiding information from the user (which is when we have an internal server error), and if so, log that information. This avoids polluting the server logs with errors arising from clients misusing the API, but means we don't expose implementation details to the user of the API. If we relied on some sort of middleware for logging, either it would take a FileTransferRequestError as input, in which case we might as well do everything here since we'd still need to log selectively, or it would take the converted response, by which point we know the status code (so it's easy to log selectively) but we've thrown away the pertinent information (e.g. an IO error).

Cargo.toml (review comments resolved)
crates/core/tedge_agent/src/file_transfer_server/actor.rs (review comments resolved)
crates/core/tedge_agent/src/file_transfer_server/error.rs (review comments resolved)
crates/core/tedge_agent/src/file_transfer_server/actor.rs (review comments resolved)
@albinsuresh (Contributor) left a comment

Happy to approve once the DeleteIoError response is changed from FORBIDDEN to INTERNAL_SERVER_ERROR as concluded.

@jarhodes314 (Contributor Author) commented Nov 17, 2023

Happy to approve once the DeleteIoError response is changed from FORBIDDEN to INTERNAL_SERVER_ERROR as concluded.

Done in 1ca42b2

Signed-off-by: James Rhodes <jarhodes314@gmail.com>
@didier-wenzek (Contributor) left a comment

Approved.

The next step will be to add HTTPS support to this file transfer service, introducing http.cert_path, http.key_path and http.ca_path, and considering reusing http.client.auth.cert_file and http.client.auth.key_file to authenticate tedge-agent when acting as a client (auth proxy) as well as a server (file transfer).

Co-authored-by: Albin Suresh <albinsuresh@hotmail.com>
@Bravo555 (Contributor) left a comment

I got answers to my question and most pressing feedback was addressed, so approving.

#[error("Cannot upload: {path:?} is a directory, not a file")]
CannotUploadDirectory { path: RequestPath },

#[error("Request to delete {path:?} failed: {source}")]
Contributor

note: We need to be careful about including the source error message in the parent error message.

I think the recommendation is not to include source error messages like this, because proper error reporting/diagnostics will themselves call .source() to print messages for the entire error chain. However, code that is not aware of this and doesn't depend on any error-reporting crates will most likely just grab the outer error message. It depends on where the error is used. For the FTS, I think the errors appear only in HTTP responses and logs, but 1) I'm not sure exactly how they are reported by axum, and 2) it's all internal, so it doesn't matter that much.

Contributor Author

I'm aware, but I knew exactly where these error messages were going (they only appear when we call .to_string()), and I didn't want to have to convert them to something like anyhow::Error just to get a backtrace. You're right though: a lot of thin-edge errors are in library code where they're eventually going to become anyhow::Error, and in that case it's really annoying (reqwest is the example of a public crate that does this that springs to mind).
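
For illustration (a generic thiserror example, not this crate's actual error type), the two options being weighed look like this; with the first variant, a reporter that also walks .source() will print the IO error twice, while a bare .to_string() on the second variant loses it.

use thiserror::Error;

#[derive(Debug, Error)]
enum DeleteError {
    // The cause is interpolated into the message, so `err.to_string()`
    // alone tells the whole story (but chain-walking reporters repeat it).
    #[error("Request to delete {path:?} failed: {source}")]
    WithCauseInMessage { path: String, source: std::io::Error },

    // The cause is only reachable through `Error::source()`; reporters such
    // as anyhow print the full chain, but `err.to_string()` on its own
    // drops the underlying IO error.
    #[error("Request to delete {path:?} failed")]
    WithCauseAsSource {
        path: String,
        #[source]
        source: std::io::Error,
    },
}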

@jarhodes314 jarhodes314 merged commit a67dba6 into thin-edge:main Nov 20, 2023
18 checks passed