Add a string encoding selector to the product importer #36819

Merged
merged 5 commits into from Feb 24, 2023

Konamiman
Contributor

All Submissions:

Changes proposed in this Pull Request:

The product importer tries to guess the string encoding of the supplied CSV file and falls back to UTF-8 when it's unable to do so. This is unreliable: detection can't really succeed for encodings like ISO-8859-1 (used for European languages), which are incompatible with UTF-8, so strings containing non-ASCII characters in these files are imported incorrectly.
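
To illustrate the problem, here is a minimal Python sketch (Python stands in for the importer's PHP code, which is not reproduced here). It shows that ISO-8859-1 bytes are rejected by a UTF-8 decoder, while UTF-8 bytes are silently accepted by an ISO-8859-1 decoder, so a "does it decode?" check cannot tell the two apart reliably. The helper name `is_valid_utf8` is made up for this example.

```python
# Illustration only: Python stands in for the importer's PHP code.
sample = "Tarif régulier"

latin1_bytes = sample.encode("iso-8859-1")  # 'é' becomes the single byte 0xE9
utf8_bytes = sample.encode("utf-8")         # 'é' becomes the two bytes 0xC3 0xA9


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The ISO-8859-1 bytes are not valid UTF-8, so a UTF-8 fallback cannot read them.
print(is_valid_utf8(latin1_bytes))  # False
print(is_valid_utf8(utf8_bytes))    # True

# Every byte sequence is valid ISO-8859-1, so the UTF-8 bytes also decode
# without error, just to the wrong text.
print(utf8_bytes.decode("iso-8859-1"))  # Tarif rÃ©gulier
```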

This pull request adds a new dropdown to the first step of the product importer, under "advanced options", that lets you manually select the character encoding of the input file from the list of encodings supported by the server. The default is "Autodetect", which relies on autodetection as before.
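
The idea behind the selector can be sketched as follows. This is a simplified Python illustration, not the importer's actual code: the `AUTODETECT` sentinel and the `decode_csv_bytes` function are hypothetical names, and the autodetect branch is reduced to "assume UTF-8".

```python
AUTODETECT = "auto"  # hypothetical sentinel for the "Autodetect" option


def decode_csv_bytes(raw: bytes, encoding: str = AUTODETECT) -> str:
    """Decode raw CSV bytes with an explicit encoding, or guess if none is chosen."""
    if encoding != AUTODETECT:
        # The user picked an encoding: trust it and decode directly.
        return raw.decode(encoding)
    # Simplified stand-in for autodetection: assume UTF-8 and substitute
    # U+FFFD for any bytes that cannot be decoded.
    return raw.decode("utf-8", errors="replace")


raw = "Pét' Nat' Pépin".encode("iso-8859-1")

print(decode_csv_bytes(raw))                # accented letters are lost
print(decode_csv_bytes(raw, "iso-8859-1"))  # Pét' Nat' Pépin
```

With an explicit encoding the text survives intact; with the fallback, each "é" is replaced by a substitution character.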

Inspired by: #36416 (which can't be accepted because it uses utf8_encode, which always uses ISO-8859-1 and is deprecated).
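
A fixed ISO-8859-1 assumption is a problem even for closely related encodings. The following Python sketch (an illustration with an invented sample string, not code from either pull request) decodes the same bytes as ISO-8859-1 and as Windows-1252, which differ in the 0x80 to 0x9F range:

```python
# 0x80 is the euro sign in Windows-1252 but a C1 control character in ISO-8859-1.
raw = b"Prix: 10 \x80"  # invented sample bytes

as_latin1 = raw.decode("iso-8859-1")
as_cp1252 = raw.decode("cp1252")

print(repr(as_latin1))  # 'Prix: 10 \x80'
print(as_cp1252)        # Prix: 10 €
```

A conversion that always treats its input as ISO-8859-1 would therefore turn the euro sign into an invisible control character, which is why letting the user name the encoding is more robust.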

  • This PR is a very minor change/addition and does not require testing instructions (if checked you can ignore/remove the next section).

How to test the changes in this Pull Request:

  1. Create a product dump file that has non-ASCII characters both in the header row and in the product names, and save it using the ISO-8859-1 encoding. Example file contents, supplied in a comment in Fix: France product import issue as France main language #36416:
ID,Nom,Stock,Tarif régulier
555,Pét' Nat' Pépin Batch 4 - Domaine Achillée *,123
  2. Go to the products page and click the "Import" button at the top of the page.

  3. Notice the new "Character encoding of the file" dropdown; leave it as "Autodetect".

  4. Import the file you just created (you need to manually map the "Nom" column to "Name").

  • Notice how in the "Column mapping" screen there's no sample value for the "Nom" (name) column, and no name for the last column ("Tarif régulier").

  • Notice how the name of the product has "?" characters instead of "é" in the products page.

  5. Repeat the import, but this time select ISO-8859-1 (or Windows-1252) in the encoding selection dropdown.
  • Notice how the product name now appears as a sample for the name column in "Column mapping" and the last column has the proper "Tarif régulier" name in the column mappings screen.

  • Notice how the name of the product is now correct in the products page.

  6. Repeat once more, this time converting the file to UTF-8 and leaving the encoding as "Autodetect". UTF-8 was already supported, so it should still work.
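
If your editor makes it awkward to pick an encoding when saving, the two test files can be generated with a short script. This is a convenience sketch in Python: the file names and the `write_sample` helper are made up, and the rows are copied from the example above.

```python
import tempfile
from pathlib import Path

# Sample rows copied from the test steps above.
CSV_TEXT = (
    "ID,Nom,Stock,Tarif régulier\n"
    "555,Pét' Nat' Pépin Batch 4 - Domaine Achillée *,123\n"
)


def write_sample(directory: Path, encoding: str) -> Path:
    """Write the sample CSV in the given encoding and return its path."""
    path = directory / f"sample-{encoding}.csv"  # hypothetical file name
    path.write_bytes(CSV_TEXT.encode(encoding))
    return path


out_dir = Path(tempfile.mkdtemp())
latin1_file = write_sample(out_dir, "iso-8859-1")
utf8_file = write_sample(out_dir, "utf-8")

print(latin1_file, latin1_file.stat().st_size)
print(utf8_file, utf8_file.stat().st_size)
```

The UTF-8 file comes out four bytes larger, because each of the four "é" characters takes two bytes in UTF-8 and one byte in ISO-8859-1.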

Other information:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes, as applicable?
  • Have you created a changelog file for each project being changed, ie pnpm --filter=<project> changelog add?

FOR PR REVIEWER ONLY:

  • I have reviewed that everything is sanitized/escaped appropriately for any SQL or XSS injection possibilities. I made sure Linting is not ignored or disabled.

@Konamiman Konamiman self-assigned this Feb 13, 2023
@Konamiman Konamiman requested review from a team and barryhughes and removed request for a team February 13, 2023 16:43
@github-actions github-actions bot added the plugin: woocommerce Issues related to the WooCommerce Core plugin. label Feb 13, 2023
@github-actions
Contributor

github-actions bot commented Feb 13, 2023

Test Results Summary

Commit SHA: 6cd66c5

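| Test 🧪 | Passed ✅ | Failed 🚨 | Broken 🚧 | Skipped ⏭️ | Unknown ❔ | Total 📊 | Duration ⏱️ |
|---|---|---|---|---|---|---|---|
| API Tests | 259 | 0 | 0 | 2 | 0 | 261 | 0m 49s |
| E2E Tests | 189 | 0 | 0 | 6 | 0 | 195 | 16m 37s |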

To view the full API test report, click here.
To view the full E2E test report, click here.
To view all test reports, visit the WooCommerce Test Reports Dashboard.

@codecov
codecov bot commented Feb 14, 2023

Codecov Report

Merging #36819 (6cd66c5) into trunk (c8074a7) will decrease coverage by 0.0%.
The diff coverage is 25.6%.

Additional details and impacted files

@@            Coverage Diff             @@
##             trunk   #36819     +/-   ##
==========================================
- Coverage     46.7%    46.7%   -0.0%     
- Complexity   17178    17182      +4     
==========================================
  Files          429      429             
  Lines        64779    64791     +12     
==========================================
+ Hits         30242    30245      +3     
- Misses       34537    34546      +9     
| Impacted Files | Coverage Δ |
|---|---|
| ...mmerce/includes/admin/class-wc-admin-importers.php | 0.0% <0.0%> (ø) |
| ...rters/class-wc-product-csv-importer-controller.php | 36.1% <28.6%> (-0.1%) ⬇️ |
| .../includes/import/class-wc-product-csv-importer.php | 74.9% <44.4%> (-0.7%) ⬇️ |

Member

@barryhughes barryhughes left a comment

@Konamiman things work as you described during the import preview stage (when explicitly selecting ISO-8859-1 as the encoding), but after import the product title looks awry in both the product list table and the product editor:

P?t’ Nat’ P?pin Batch 4 – Domaine Achill?e *

Instead of:

Pét' Nat' Pépin Batch 4 - Domaine Achillée *

I did verify that the source file was correctly saved with the relevant encoding first of all, ie:

% file -I samplecsv8859-1.csv
samplecsv8859-1.csv: text/plain; charset=iso-8859-1

Do you see the same?

Co-authored-by: Barry Hughes <3594411+barryhughes@users.noreply.github.com>
@Konamiman
Contributor Author

@barryhughes Most likely you are testing with the old wc-product-import.js, please run pnpm run build and try again.

@barryhughes
Member

I think you must be right (or, a cached version was being consumed). Working now, thanks!

@barryhughes barryhughes merged commit af7c3f3 into trunk Feb 24, 2023
@barryhughes barryhughes deleted the add/encoding-selector-to-product-importer branch February 24, 2023 17:08
@github-actions github-actions bot added this to the 7.6.0 milestone Feb 24, 2023
@Flyerg
Flyerg commented Apr 13, 2023

Hi, what would be the code to set a default encoding?
