Skip to content

GH-37891: [C++][Parquet] Refine several classes in Parquet encryption #46202

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jun 19, 2025

Conversation

kapoisu
Copy link
Contributor

@kapoisu kapoisu commented Apr 22, 2025

What changes are included in this PR?

  • Pass movable objects by value to functions that keep a copy of them.
  • Use emplace() instead of insert() when the arguments can be moved safely.
  • Remove unnecessary const qualifier on member variables that makes the class non-copyable and non-movable.
  • Remove unnecessary constructors and destructors that can be generated automatically by compilers.

Are these changes tested?

Yes, by existing tests.

Are there any user-facing changes?

No.

Copy link
Member

@mapleFU mapleFU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @wgtmac this lgtm but I don't know whether this affect py or other interfaces

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Apr 22, 2025
@kapoisu
Copy link
Contributor Author

kapoisu commented Apr 22, 2025

I've reverted the incorrect change within key_wrapping_test.cc. The file_system there shouldn't be moved.

@wgtmac
Copy link
Member

wgtmac commented Apr 23, 2025

Are these the only places that we can apply it? Any other similar place in the parquet subdirectory?

@kapoisu
Copy link
Contributor Author

kapoisu commented Apr 23, 2025

Are these the only places that we can apply it? Any other similar place in the parquet subdirectory?

I'm sure there are more (e.g. FileKeyWrapper). Further changes can be included in this PR if that is desirable.

@wgtmac
Copy link
Member

wgtmac commented Apr 23, 2025

Yes, I think we should not create similar PRs for a single issue.

@kapoisu kapoisu force-pushed the GH-37891 branch 3 times, most recently from b3597e8 to 591c82a Compare May 2, 2025 18:47
@kapoisu kapoisu changed the title GH-37891: [C++] Refine class parquet::encryption::FileSystemKeyMaterialStore GH-37891: [C++] Refine several classes in parquet namespace May 24, 2025
@mapleFU
Copy link
Member

mapleFU commented May 29, 2025

@wgtmac should we take an another round in this?

@pitrou pitrou changed the title GH-37891: [C++] Refine several classes in parquet namespace GH-37891: [C++][Parquet] Refine several classes in parquet namespace Jun 12, 2025
@pitrou pitrou changed the title GH-37891: [C++][Parquet] Refine several classes in parquet namespace GH-37891: [C++][Parquet] Refine several classes in Parquet encryption Jun 12, 2025
@pitrou
Copy link
Member

pitrou commented Jun 12, 2025

@kapoisu Were you planning to add more changes to this PR? Otherwise I think it can be merged.

@kapoisu
Copy link
Contributor Author

kapoisu commented Jun 12, 2025

@kapoisu Were you planning to add more changes to this PR? Otherwise I think it can be merged.

I shared some additional opinions in the original issue, but haven’t received a response from the original author yet. I’d like to get some feedback from others on that first.

Besides, the original issue isn’t limited to parquet namespace. It just happens that my current modifications are still within that scope. If we try to cover all the necessary changes for the entire project in this PR, it might result in at least hundreds of commits to review.

@kapoisu
Copy link
Contributor Author

kapoisu commented Jun 13, 2025

I noticed that the FileKeyUnwrapper class has public constructors that accept either owned or non-owned pointer to KeyToolkit, originating from PR #40329. Internally, the implementation only uses the non-owned pointer, but it keeps a copy of shared_ptr when the owned version is used, to avoid dropping the reference count to 0.

If this class is intended for internal use only, there doesn’t seem to be a strong reason to keep the owned versions. Removing them would make the semantics clearer, as users would then understand they are responsible for managing the lifetime of the object. It would also simplify future additions by avoiding the need to introduce two constructors for every new parameter combination.

@adamreeve any thoughts on this?

@adamreeve
Copy link
Contributor

@adamreeve any thoughts on this?

This sounds good to me, but given this is technically a public class, maybe we should first mark the constructors that accept an owned pointer as deprecated and then remove them in a later release? Although I doubt this change would affect many, if any, people so I wouldn't be against just removing them.

kapoisu added 3 commits June 19, 2025 09:43
* Pass std::string by value to constructors and Make().
* Pass std::shared_ptr<::arrow::fs::FileSystem> by value to
  constructors and Make().
* Use emplace() instead of insert() in AddKeyMaterial() because the
  parameters can be moved.
kapoisu added 10 commits June 19, 2025 09:43
* Pass movable objects by value to functions that keep a copy of them.
* Update the call sites accordingly.
* Pass movable objects by value to functions that keep a copy of them.
* Remove unnecessary const member variable that makes the class non-
  copyable and non-movable.
* Pass movable objects by value to functions that keep a copy of them.
* Replace const std::shared_ptr<T>& with const T& in cases where the
  function doesn't manage or share ownership.
* Update the call sites accordingly.
* Pass movable objects by value to functions that keep a copy of them.
* Remove unnecessary const member variables that make the class non-
  copyable and non-movable.
* Remove unnecessary constructors and destructors that can be generated
  automatically by compilers.
* Pass movable objects by value to functions that keep a copy of them.
* Remove unnecessary const member variables that make the class non-
  copyable and non-movable.
* Replace const std::shared_ptr<T>& with const T& in cases where the
  function doesn't manage or share ownership.
* Update the call sites accordingly.
* Pass movable objects to functions that keep a copy of them.
* Remove unnecessary const member variables that make the class non-
  copyable and non-movable.
* Remove unnecessary constructors and destructors that can be generated
  automatically by compilers.
* Pass movable objects by value to functions that keep a copy of them.
* Update the call sites accordingly.
* Pass movable objects to functions that keep a copy of them.
* Remove unnecessary destructors that can be generated automatically by
  compilers.
@pitrou
Copy link
Member

pitrou commented Jun 19, 2025

I've rebased on git main, will merge if CI passes

@pitrou pitrou merged commit 0bd4e73 into apache:main Jun 19, 2025
33 checks passed
@pitrou pitrou removed the awaiting committer review Awaiting committer review label Jun 19, 2025
@kapoisu kapoisu deleted the GH-37891 branch June 19, 2025 14:22
Copy link

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 0bd4e73.

There were 118 benchmark results with an error:

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants