Add script to mass convert images to webp. #6594
I was wondering about this. Were there, by any chance, any conversions that increased the file size (regardless of how large the increase was)?
Some, but not many, and not by much. They're included in conversion-bad.csv.
I see there are quite a few in "bad", mainly unit and terrain images. I wonder how much sense it would make to just state that unit and terrain images stay as PNG? I do notice a lot of unit and terrain images are also listed in "good" though, mostly with percentages above 70%, though there are also some outliers around 30% and I even spotted one as low as 9%, which seems quite weird.
@Pentarctagon Actually, there is a way. A few days ago, I was reading the WebP specifications to see if I could update wmlscope's
It looks like
That I don't know. I had been thinking that webp -> webp might yield further savings as libwebp/cwebp improves, but it might be that for whatever reason that doesn't work well.
That sounds like it wouldn't fully be able to check due to VP8X though?
Not quite. A VP8X file can contain various types of chunks (alpha, animation frames...) and it also contains at least a VP8 or VP8L chunk. I think that, for our purposes, we can safely assume that our files contain only one frame and no animations, so searching for the first
It looks like the webp images generated both by gimp and by cwebp say they are VP8X for lossy conversion. cwebp for lossless has VP8L. So that might work actually.
If we want to consider VP8X lossy by default, indeed this simplifies a lot of things.
Yeah. I assume checking whether a lossless version of a lossy webp image is smaller is pointless, and even if there are some size savings at some point, we shouldn't keep applying lossy encoding again and again to the same image.
You can start replacing the if check with this one (I'm still testing it, but I want to let you have a look anyway):
Some other random thoughts:
I don't follow this one - what does a comma being a valid character in a filename have to do with using commas in the contents of the file?
In your CSV files, you're using commas as field separators and no character as field qualifier. In this situation, if a file path contains a comma, that comma will be interpreted as a field separator rather than as a literal character, causing issues (values placed in the wrong columns) when you attempt to import the resulting CSV file into a spreadsheet.
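The comma problem described above can be sketched with Python's standard csv module, which quotes any field containing the separator. This is only an illustration, not the code used in the PR: the file path and sizes below are made up, and the column list is shortened.

```python
import csv
import io

# Hypothetical row: a file path that happens to contain a comma.
rows = [("images/units/orc, grunt.png", 2048, 1024)]

# Write to an in-memory buffer so the example has no side effects.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
writer.writerow(["filename", "old_size", "new_size"])
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back yields three fields per row; the comma inside the
# path survives because the writer wrapped that field in double quotes.
parsed = list(csv.reader(io.StringIO(csv_text)))
print(parsed[1])
```

With plain string concatenation the same row would be read back as four columns instead of three.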
50ebd6f to dc335ee
Alright, the above comments have been addressed.
dc335ee to 2fe7fe3
And it should be good to go. The only question I have is whether it really makes sense to try reconverting lossy webp images at all given the conversion is, well, lossy.
Generally I would avoid conversion of lossy images at all, but the earlier discussion seemed to suggest that it wouldn't be easy to tell if an image was using lossy or lossless compression.
I think we have a reasonable way of telling apart lossy from lossless. The script is assuming VP8X is lossy when it can contain either, but I'm not sure how to tell which it really is for that variation.
    print("cwebp executable not found in PATH, exiting.")
    sys.exit(1)

image_dirs = [
Here you should use os.path.join() instead of slashes.
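As a sketch of that suggestion: the directory names below are hypothetical, since the actual entries of image_dirs are not shown in this excerpt.

```python
import os

# Hypothetical directory components; os.path.join inserts the separator
# that is correct for the platform the script runs on.
image_dirs = [
    os.path.join("data", "core", "images"),
    os.path.join("data", "campaigns"),
    "images",  # a single component needs no join
]

for directory in image_dirs:
    print(directory)
```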
Done.
utils/optiwebp.py
Outdated
    else:
        print(filename, "is not a valid WebP file", file=sys.stderr)
        continue
elif filetype == ".jpg":
JPEG files can also have the .jpeg extension, so you should add or filetype == ".jpeg" to the condition (or you can use elif filetype in (".jpg", ".jpeg"): instead).
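A minimal sketch of the tuple-membership form follows. The helper name is hypothetical, and lowercasing the extension is an extra assumption that the review comment does not ask for.

```python
import os


def classify_extension(filename):
    """Map a file name to 'png', 'jpeg', 'webp' or None (hypothetical helper)."""
    # Assumption: extensions are compared case-insensitively.
    filetype = os.path.splitext(filename)[1].lower()
    if filetype == ".png":
        return "png"
    elif filetype in (".jpg", ".jpeg"):
        return "jpeg"
    elif filetype == ".webp":
        return "webp"
    return None


print(classify_extension("portrait.jpeg"))
```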
Done.
duration = time.time() - start

hours_duration = int(duration / 3600)
minutes_duration = int((duration - (hours_duration * 3600)) / 60)
Again, using the modulo operator is better: minutes_duration = int((duration % 3600) / 60).
If you want, you can also remove the int() casting and use the integer division operator // (example: hours_duration = duration // 3600).
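The same arithmetic can also be written with divmod, which returns the quotient and remainder in one step. This is a sketch rather than the code that ended up in the script, and the function name is hypothetical. Note that // applied to a float (such as a time.time() difference) returns a float like 1.0, so the value is truncated to an int first.

```python
def split_duration(duration):
    """Split a duration in seconds into whole hours, minutes and seconds."""
    # Truncate first so the results are ints rather than floats.
    hours, remainder = divmod(int(duration), 3600)
    minutes, seconds = divmod(remainder, 60)
    return hours, minutes, seconds


hours_duration, minutes_duration, seconds_duration = split_duration(3725.9)
print(hours_duration, minutes_duration, seconds_duration)
```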
Done.
utils/optiwebp.py
Outdated
if os.path.exists(good_conversion):
    os.remove(good_conversion)
with open(good_conversion, "a") as f:
    f.write("filename,old_size,new_size,change_in_percent,change_in_bytes\n")
You can move this line and line 134 inside the with ... as block at line 139.
Done.
utils/optiwebp.py
Outdated
initial_total_size = 0
final_total_size = 0

with open(good_conversion, "a") as good_f, open(bad_conversion, "a") as bad_f:
If you move lines 129 and 134 here, you can replace the opening mode "a" with "w" and overwrite the files even if they already exist (no need to remove a file then append to an empty file).
You should also wrap this block inside a try ... except OSError: block, in case the user cannot write to the files for any reason (like missing permissions).
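Put together, the two suggestions could look roughly like the sketch below. The function name and return value are hypothetical, and it is an assumption that both report files start with the same header line (only the header of the "good" file is visible in the diff).

```python
import sys

# Header taken from the diff above; assumed to be shared by both files.
HEADER = "filename,old_size,new_size,change_in_percent,change_in_bytes\n"


def write_reports(good_conversion, bad_conversion):
    """Create both report files, overwriting any previous run's output."""
    try:
        # Mode "w" truncates existing files, so no os.remove() is needed.
        with open(good_conversion, "w") as good_f, \
                open(bad_conversion, "w") as bad_f:
            good_f.write(HEADER)
            bad_f.write(HEADER)
            # The per-image conversion loop would write its rows here.
    except OSError as error:
        # Covers missing permissions, missing directories, full disks, etc.
        print("Could not write report files:", error, file=sys.stderr)
        return False
    return True
```

Calling write_reports twice with the same paths leaves a single header line in each file, which is the behaviour the remove-then-append code was emulating.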
Done.
With the file already open and read into a variable, it is possible to use the
At that point I kinda start feeling "if they don't clearly define in their spec how to tell them apart, then I don't want to try guessing at it unless I really have to".
2fe7fe3 to 6e0b34a
All review comments have been addressed, I believe.
This is a script that converts png to lossless webp and jpg to lossy webp (quality 90), as well as reconverts existing webp images using lossless conversion. This PR is not for actually converting any images.
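The conversion rules in this description can be sketched as a small command builder. This is not the script's actual code: the function name is hypothetical, and the exact cwebp options the script passes are an assumption. The sketch relies only on cwebp's -lossless, -q and -o options.

```python
import os


def build_cwebp_command(source, destination):
    """Return the cwebp argument list for one image (hypothetical helper)."""
    extension = os.path.splitext(source)[1].lower()
    if extension in (".jpg", ".jpeg"):
        # JPEG input is already lossy, so it is re-encoded lossy at quality 90.
        options = ["-q", "90"]
    else:
        # PNG input and existing WebP images are encoded losslessly.
        options = ["-lossless"]
    return ["cwebp", *options, source, "-o", destination]


print(build_cwebp_command("portrait.jpg", "portrait.webp"))
print(build_cwebp_command("unit.png", "unit.webp"))
```

The resulting list could be handed to subprocess.run(); building it separately keeps the rule easy to check without having cwebp installed.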
Remaining work:
conversion-good.csv has a list of the files that had their size reduced by more than 10%.
conversion-bad.csv has a list of files that did not have their size reduced by more than 10%.
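The split between the two files can be illustrated as follows. The function name, the sign convention (negative values mean the file shrank), and the assumption that old_size is greater than zero are all choices made for this sketch, not details confirmed by the script.

```python
def classify_conversion(old_size, new_size, threshold_percent=10):
    """Return (is_good, change_in_percent, change_in_bytes) for one image.

    Assumes old_size > 0. Negative changes mean the file got smaller.
    """
    change_in_bytes = new_size - old_size
    change_in_percent = change_in_bytes / old_size * 100
    # Integer comparison avoids floating-point rounding at exactly 10%.
    is_good = (old_size - new_size) * 100 > threshold_percent * old_size
    return is_good, change_in_percent, change_in_bytes


print(classify_conversion(1000, 500))   # shrank by half: "good"
print(classify_conversion(1000, 900))   # exactly 10% smaller: "bad"
print(classify_conversion(1000, 1100))  # grew: "bad"
```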
A quick summary: