Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework bundled compilation to support included extensions #127

Merged
merged 23 commits into from Mar 9, 2023

Conversation

ankrgyl
Copy link
Contributor

@ankrgyl ankrgyl commented Mar 7, 2023

This change reworks the bundled compilation process so that we can support built-in extensions, like httpfs, to the compilation process. There are a number of changes that support it:

  • Reuse package_build.build_package from the DuckDB repo to gather source files, including extensions. We no longer use the amalgamation, and end up pulling more than 3,000 files. To keep the number of files in the repo small, we tar.gz the duckdb directory (and untar it on demand). This is all managed from a new script called update_sources.py.
  • DuckDB is now a submodule of the repo, so that when we want to release a new version, we simply update the repo's submodule and run the script.
  • Copy the openssl-finding code from rust-openssl, and invoke it when including the httpfs feature
  • Add a set of new features (1 per duckdb extension) that control what gets bundled in. We can extend this to include all of the builtin plugins to duckdb.

@ankrgyl
Copy link
Contributor Author

ankrgyl commented Mar 7, 2023

CC @wangfenjin

@wangfenjin
Copy link
Collaborator

You can run sh add_rust_fmt_hook.sh in the folder to make sure code will be formatted before commit

https://github.com/wangfenjin/duckdb-rs/blob/main/add_rustfmt_hook.sh

@wangfenjin
Copy link
Collaborator

Do you think it's better to depends on the openssl crate directly? Instead of copy all the code to our repo

@ankrgyl
Copy link
Contributor Author

ankrgyl commented Mar 7, 2023

You can run sh add_rust_fmt_hook.sh in the folder to make sure code will be formatted before commit

Awesome, I've done that, but for some reason I think it's still producing slightly different results (like maybe the line lengths are different). Let's see if this most recent CI passes...

Do you think it's better to depends on the openssl crate directly? Instead of copy all the code to our repo

I don't think that's possible unfortunately — I had to make some modifications including returning some values out of the main function. We could potentially upstream some changes to that crate that make this logic importable, but I'd prefer to separate that from this effort.

@codecov
Copy link

codecov bot commented Mar 7, 2023

Codecov Report

Merging #127 (0024aff) into main (74cd7f0) will decrease coverage by 0.14%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main     #127      +/-   ##
==========================================
- Coverage   59.54%   59.40%   -0.14%     
==========================================
  Files          26       26              
  Lines        1483     1483              
==========================================
- Hits          883      881       -2     
- Misses        600      602       +2     
Impacted Files Coverage Δ
src/lib.rs 72.82% <0.00%> (-2.18%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@wangfenjin
Copy link
Collaborator

Let's add a feature maybe called extension-full to include all the extensions:
https://github.com/wangfenjin/duckdb-rs/blob/main/Cargo.toml#L27

And then add the flag in ci to make it build passed
https://github.com/wangfenjin/duckdb-rs/blob/main/.github/workflows/rust.yaml#L108-L131

Thanks @ankrgyl

@wangfenjin
Copy link
Collaborator

It will be better if you can also help add some tests to verify these extensions actually works, the tests will also be used as examples for other users.

The tests can be enabled by #[cfg(feature = "foo")]

@wangfenjin
Copy link
Collaborator

If you don't have the time to add tests it's fine, I'll do that later after this one merged

@wangfenjin
Copy link
Collaborator

Is the submodule and tar.gz v0.7.1 ?

@wangfenjin
Copy link
Collaborator

https://github.com/wangfenjin/duckdb-rs/blob/main/libduckdb-sys/upgrade.sh#L14-L16 we may also update this file to make update version easier in the future. But that fine I can do that after this one get merged

@snth
Copy link
Contributor

snth commented Mar 7, 2023

Thank you @ankrgyl . This looks great and I'll be your first happy user of this as I've been hoping for this functionality for some time.

I especially like what you did with tar.gz on the duckdb directory, nice!

@ankrgyl
Copy link
Contributor Author

ankrgyl commented Mar 8, 2023

If you don't have the time to add tests it's fine, I'll do that later after this one merged

Would prefer if you could do that in a separate change. Thank you for offering!

Is the submodule and tar.gz v0.7.1 ?

Yes although I noticed that yours are generated on an x86 linux machine and mine on an arm64 mac. Maybe we should generate them on one of your machines?

https://github.com/wangfenjin/duckdb-rs/blob/main/libduckdb-sys/upgrade.sh#L14-L16 we may also update this file to make update version easier in the future. But that fine I can do that after this one get merged

https://github.com/wangfenjin/duckdb-rs/blob/main/libduckdb-sys/upgrade.sh#L14-L16 we may also update this file to make update version easier in the future. But that fine I can do that after this one get merged

I think the new update_sources.py script should encompass all of the functionality from this script. Could you verify that?

@ankrgyl
Copy link
Contributor Author

ankrgyl commented Mar 8, 2023

Additionally, I discovered that the build scripts we pulled from OpenSSL were exit(0)-ing in certain cases (that I encountered on Linux in the CI for QueryScript).

I fixed those in this commit: a3c3ced

@wangfenjin
Copy link
Collaborator

Yes although I noticed that yours are generated on an x86 linux machine and mine on an arm64 mac. Maybe we should generate them on one of your machines?

I generated on a intel mac. Never notice the difference of this, maybe we should delete the file and always generate the file on the fly in the build process. I'll follow up this issue

@ankrgyl
Copy link
Contributor Author

ankrgyl commented Mar 8, 2023

I generated on a intel mac. Never notice the difference of this, maybe we should delete the file and always generate the file on the fly in the build process. I'll follow up this issue

Ah got it. I think technically it may be ok (?) at least according to CI, but I agree that generating it dynamically is safer.

@ankrgyl
Copy link
Contributor Author

ankrgyl commented Mar 8, 2023

@wangfenjin could you help me with windows? I'm not familiar nor have an environment of my own to test with.

I followed https://github.com/sfackler/rust-openssl/blob/master/.github/workflows/ci.yml#L93 to add VCPKG and hoped that would work...

@ankrgyl
Copy link
Contributor Author

ankrgyl commented Mar 8, 2023

If it still not work, we may try sfackler/rust-openssl#1062 (comment)

Just pushed a change to try this (although not sure I got the syntax correct).

ankrgyl added a commit to qslang/queryscript that referenced this pull request Mar 8, 2023
This change takes advantage of
duckdb/duckdb-rs#127 to directly bundle the
relevant extensions into DuckDB. With this change, we can be sure that
QueryScript will work across all platforms it compiles on, and startup
times are much (~0.5-1s) faster.
libduckdb-sys/openssl/find_normal.rs Show resolved Hide resolved
libduckdb-sys/build.rs Outdated Show resolved Hide resolved
.github/workflows/rust.yaml Outdated Show resolved Hide resolved
.github/workflows/rust.yaml Outdated Show resolved Hide resolved
Change-Id: I67dd345659e9957cac8590aad1d7dd8ac190cb53
This reverts commit 96bdc43.
Change-Id: Ifa042c7f4be521a210636c7870e1e225946910e8
@wangfenjin
Copy link
Collaborator

Some follow up:

  • Add tests
  • remove the generated bindings and always generate on the fly
  • Fix the upgrade.sh to use the Python script

@wangfenjin wangfenjin mentioned this pull request Mar 9, 2023
3 tasks
@wangfenjin
Copy link
Collaborator

Thanks @ankrgyl for your help! I’ll merge this PR now

@wangfenjin wangfenjin merged commit 0da3aef into duckdb:main Mar 9, 2023
@nickpresta
Copy link
Contributor

@wangfenjin Thanks for merging this change in. Any change a new release can be cut with these features available?

@wangfenjin
Copy link
Collaborator

We try to keep the version same with duckdb for easy maintenance, for now maybe you can use master branch, I always keep master branch works

@snth
Copy link
Contributor

snth commented Mar 15, 2023

Thank you @ankrgyl . Going to try this for my own project at the earliest opportunity.

I've been waiting for this for a while and looking at the changes in the PR, I certainly couldn't have done it without you so very grateful! 🙏🙌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants