Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter to string command line interface #12

Merged
merged 23 commits into from
Jul 17, 2023
Merged

Filter to string command line interface #12

merged 23 commits into from
Jul 17, 2023

Conversation

JiaT75
Copy link
Contributor

@JiaT75 JiaT75 commented Jan 5, 2023

Pull request checklist

Please check if your PR fulfills the following requirements:

  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been reviewed and added / updated if needed (for bug fixes / features)
  • Build was run locally and without warnings or errors
  • All previous and new tests pass

Pull request type

Please check the type of change your PR introduces:

  • Bugfix
  • Feature
  • Code style update (formatting, renaming, typo fix)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • Documentation content changes
  • Other (please describe):

What is the current behavior?

xz does not have an option to convert a single string to a full filter chain

Related Issue URL:

What is the new behavior?

  • New --filters command line option
  • --long-help update
  • man page update

Does this introduce a breaking change?

  • Yes
  • No

Other information

JiaT75 added 23 commits July 17, 2023 22:59
The --filters option uses the new lzma_str_to_filters() function
to convert a string into a full filter chain. Using this option
will reset all previous filters set by --preset, --[filter], or
--filters.
Converting from string to filter will also need to be done for block
specific filter chains.
This is a little cleaner than the previous implementation of
forget_filter_chain(). It is also more consistent since
lzma_str_to_filters() will always terminate the filter chain so there
is no need to terminate it later in coder_set_compression_settings().
The new command line options are meant to be combined with --block-list.
They work as an optional extension to --block-list to specify a custom
filter chain for each block listed. The new options allow the creation
of up to 9 reusable filter chains. For instance:

xz --block-list=1:10MiB,3:5MiB,,2:5MiB,1:0 --filters1=delta--lzma2 \
--filters2=x86--lzma2 --filters3=arm64--lzma2

Will create the following blocks:
1. A block of size 10 MiB with filter chain delta, lzma2.
2. A block of size 5 MiB with filter chain arm64, lzma2.
3. A block of size 5 MiB with filter chain arm64, lzma2.
4. A block of size 5 MiB with filter chain x86, lzma2.
5. A block containing the rest of the file contents with filter chain
   delta, lzma2.
--block-list is only supported with compression in xz format. This avoids
silently ignoring when --block-list is unused.
This will only free filter chains created with --filters1-9 since the
default filter chain may be set from a static function variable. The
complexity to free the default filter chain is not worth the burden on
code maintenance.
The block splitting logic and split_block() function are not needed if
encoders are disabled. This will help slightly reduce the binary size
when built without encoders and allow split_block() to use functions
that require encoders being enabled.
Previously, only the default filter chain could have its memory usage
adjusted. The filter chains specified with --filtersX were not checked
for memory usage. Now, all used filter chains will be adjusted if
necessary.
When opt_block_size is not used, the Block size for mt encoder is
derived from the minimum of the largest Block specified by
--block-list and the recommended Block size on all filter chains
calculated by lzma_mt_block_size(). This avoids using unnecessary
memory and ensures that all Blocks are large enough for the most memory
needy filter chain.
If a filter chain is set but not used in --block-list, it introduced
unexpected behavior such as requiring an unneeded amount of memory to
compress, reducing the number of threads in multi-threaded encoding, and
printing an incorrect amount of memory needed to decompress.

This also renames filters_init_mask => filters_used_mask. A filter is
assumed to be used if it is specified in --filtersX until
coder_set_compression_settings() determines which filters are referenced
in --block-list.
The --block-list option description needed updating since the new
--filtersX option changes how it can be used. The new entry for
--filters1=FILTERS ... --filter9=FILTERS was created right after
the --filters option.
The --filters-help can be used to help create filter chains with the
--filters and --filtersX options. The message in --long-help is too
short to fully explain the syntax to construct complex filter chains.

In --robot mode, xz will only print the output from liblzma function
lzma_str_list_filters.
The order is now consistent with the order the command line arguments
are documented earlier in the man page. The new order is:
1. --list
2. --info-memory
3. --version

Instead of the previous order:
1. --version
2. --info-memory
3. --list
Changed will print => prints in xz --robot --version description to
match --robot --info-memory description.
* Moved max_block_list_size from a global to local variable.
* Reworded error message in validate_block_list_filter().
* Removed helper function filter_chain_error().
* Changed 1 << X to 1U << X in many places
The Memory limit information section described three output
columns when it actually has six. This was reworded to
"multiple" to make it more future proof.
@JiaT75 JiaT75 merged commit f99e2e4 into master Jul 17, 2023
0 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5.5.0 Item earmarked for 5.5.0 release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant