Add a string encoding selector to the product importer #36819
Test Results Summary
Commit SHA: 6cd66c5
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## trunk #36819 +/- ##
==========================================
- Coverage 46.7% 46.7% -0.0%
- Complexity 17178 17182 +4
==========================================
Files 429 429
Lines 64779 64791 +12
==========================================
+ Hits 30242 30245 +3
- Misses 34537 34546 +9
@Konamiman things work as you described during the import preview stage (when explicitly selecting ISO-8859-1 as the encoding), but after import the product title looks wrong in both the product list table and the product editor:
P?t’ Nat’ P?pin Batch 4 – Domaine Achill?e *
Instead of:
Pét' Nat' Pépin Batch 4 - Domaine Achillée *
I first verified that the source file was correctly saved with the relevant encoding:
% file -I samplecsv8859-1.csv
samplecsv8859-1.csv: text/plain; charset=iso-8859-1
Do you see the same?
plugins/woocommerce/includes/admin/importers/views/html-product-csv-import-form.php
Co-authored-by: Barry Hughes <3594411+barryhughes@users.noreply.github.com>
@barryhughes Most likely you are testing with the old
I think you must be right (or, a cached version was being consumed). Working now, thanks!
Hi, what would be the code to set a default encoding?
Changes proposed in this Pull Request:
The product importer tries to guess the string encoding of the supplied CSV file and falls back to UTF-8 when it is unable to do so. This is unreliable: detection cannot reliably succeed for encodings such as ISO-8859-1 (used for European languages), which are incompatible with UTF-8 for non-ASCII characters. As a result, strings containing non-ASCII characters in these files are imported incorrectly.
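The failure mode described above can be reproduced outside the importer. The following is a minimal Python sketch, not the importer's PHP code; the sample title and the helper name `is_valid_utf8` are chosen only for illustration. It shows that ISO-8859-1 bytes are not valid UTF-8, that a UTF-8 fallback therefore loses the accented characters, and that decoding as ISO-8859-1 never fails, which is why validity checks alone cannot identify that encoding.

```python
# Illustration only: a Python stand-in for the behaviour described above,
# not the product importer's actual code.
title = "Pét' Nat' Pépin"

latin1_bytes = title.encode("iso-8859-1")  # "é" is the single byte 0xE9
utf8_bytes = title.encode("utf-8")         # "é" is the two bytes 0xC3 0xA9


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# ISO-8859-1 data is rejected by a strict UTF-8 decoder...
latin1_is_utf8 = is_valid_utf8(latin1_bytes)

# ...so a UTF-8 fallback can only substitute placeholders for "é".
fallback = latin1_bytes.decode("utf-8", errors="replace")

# Decoding with the encoding the file was actually saved in works.
explicit = latin1_bytes.decode("iso-8859-1")

# Every byte sequence is valid ISO-8859-1, so decoding UTF-8 data as
# ISO-8859-1 "succeeds" too, but yields mojibake instead of an error.
mojibake = utf8_bytes.decode("iso-8859-1")

print(latin1_is_utf8, fallback, explicit, mojibake, sep="\n")
```

Because the last decode raises no error, a detector has nothing to distinguish a genuine ISO-8859-1 file from a mis-decoded one, which is the motivation for letting the user state the encoding.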
This pull request adds a new dropdown to the first step of the product importer, under "advanced options", that lets the user manually select the character encoding of the input file from the list of encodings supported by the server. The default is "Autodetect", which relies on autodetection as before.
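The selection logic can be sketched as follows. This is a hedged Python illustration of the idea, not the PHP implementation in this pull request: the function name `decode_csv`, the `AUTODETECT` sentinel, and the simplistic UTF-8 fallback standing in for autodetection are all assumptions made for the example.

```python
import codecs

# Hypothetical sentinel standing in for the dropdown's "Autodetect" default.
AUTODETECT = "autodetect"


def decode_csv(data: bytes, encoding: str = AUTODETECT) -> str:
    """Decode raw CSV bytes, honouring an explicitly selected encoding.

    Assumption: "autodetect" is modelled here as a plain UTF-8 decode
    with replacement characters, which is cruder than real detection.
    """
    if encoding != AUTODETECT:
        # Reject names the runtime does not support (raises LookupError).
        codecs.lookup(encoding)
        return data.decode(encoding)
    return data.decode("utf-8", errors="replace")


# A header row saved as ISO-8859-1, as in the test file for this change.
raw = "Nom,Tarif régulier\n".encode("iso-8859-1")

autodetected = decode_csv(raw)             # accented character is lost
selected = decode_csv(raw, "iso-8859-1")   # header decodes correctly

print(repr(autodetected))
print(repr(selected))
```

Keeping autodetection as the default leaves existing imports unchanged, while an explicit choice bypasses detection entirely.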
Inspired by: #36416 (which can't be accepted because it uses utf8_encode, which always uses ISO-8859-1 and is deprecated).
How to test the changes in this Pull Request:
Go to the products page and click the "Import" button at the top of the page.
Notice the new "Character encoding of the file" dropdown, leave it as "Autodetect".
Import the file you just created (you need to manually map the "Nom" column to "Name")
Notice how in the "Column mapping" screen there's no sample value for the "Nom" (name) column, and no name for the last column ("Tarif régulier")
Notice how the name of the product has "?" characters instead of "é" in the products page.
Notice how the product name now appears as a sample for the name column in the "Column mapping" screen, and the last column has the proper "Tarif régulier" name.
Notice how the name of the product is now correct in the products page.