Illegal characters in datasets #16

Closed
col-panic opened this issue Apr 15, 2015 · 13 comments

@col-panic

There seem to be some problems with character encoding in the generated datasets.

We see log entries like

11:15:52.795 [main] WARN  java.lang.Throwable - java.sql.SQLException: Incorrect string value: '\xC2\x92s gl...' for column 'DSCR' at row 1
11:15:59.071 [main] WARN  java.lang.Throwable - java.sql.SQLException: Incorrect string value: '\xC2\x928 af...' for column 'DSCR' at row 1
11:16:11.601 [main] WARN  java.lang.Throwable - java.sql.SQLException: Incorrect string value: '\xC2\x9636 M...' for column 'DSCR' at row 1
11:16:12.961 [main] WARN  java.lang.Throwable - java.sql.SQLException: Incorrect string value: '\xC2\x96 Jog...' for column 'DSCR' at row 1
11:16:29.075 [main] WARN  java.lang.Throwable - java.sql.SQLException: Incorrect string value: '\xC2\x89 100...' for column 'DSCR' at row 1
11:17:15.998 [main] WARN  java.lang.Throwable - java.sql.SQLException: Incorrect string value: '\xE2\x89\xA4 30...' for column 'LIMITATION_TXT' at row 1
11:17:16.009 [main] WARN  java.lang.Throwable - java.sql.SQLException: Incorrect string value: '\xE2\x89\xA4 30...' for column 'LIMITATION_TXT' at row 1
11:17:16.023 [main] WARN  java.lang.Throwable - java.sql.SQLException: Incorrect string value: '\xE2\x89\xA4 30...' for column 'LIMITATION_TXT' at row 1
11:17:16.033 [main] WARN  java.lang.Throwable - java.sql.SQLException: Incorrect string value: '\xE2\x89\xA4 30...' for column 'LIMITATION_TXT' at row 1
11:17:25.245 [main] WARN  java.lang.Throwable - java.sql.SQLException: Incorrect string value: '\xE2\x89\xA5 30...' for column 'LIMITATION_TXT' at row 1
11:17:25.256 [main] WARN  java.lang.Throwable - java.sql.SQLException: Incorrect string value: '\xE2\x89\xA5 30...' for column 'LIMITATION_TXT' at row 1
11:17:26.249 [main] WARN  java.lang.Throwable - java.sql.SQLException: Incorrect string value: '\xE2\x89\xA5 16...' for column 'LIMITATION_TXT' at row 1
11:17:26.277 [main] WARN  java.lang.Throwable - java.sql.SQLException: Incorrect string value: '\xE2\x89\xA5 16...' for column 'LIMITATION_TXT' at row 1
11:17:30.666 [main] WARN  java.lang.Throwable - java.sql.SQLException: Incorrect string value: '\xCE\xB1) ni...' for column 'LIMITATION_TXT' at row 1

on certain MySQL databases. I could trace them to entries like

WARNING SCHAR SEMPER Cookie-O�s glutenfrei 150 g
WARNING SCHAR CER�8 after sting Roll-on 20 ml
WARNING SCHAR DermaSilk Set Body + Strumpfhöschen 24�36 Mon (98)
WARNING SCHAR Inkosport Activ Pro 80 Himbeer � Joghurt Ds 750g
WARNING SCHAR Ethacridin lactat 1� 100ml

vi shows the data like this, for example

3731928     <DSCRD>Ethacridin lactat 1<89> 100ml                        </DSCRD>
3731929     <DSCRF>Ethacridin lactat 1<89> 100ml                        </DSCRF>
3731930     <SORTD>ETHACRIDIN LACTAT 1<89> 100ML                        </SORTD>
3731931     <SORTF>ETHACRIDIN LACTAT 1<89> 100ML                        </SORTF>

where <89> is a non-printable character.

Could you please ensure that only valid characters are used in the XML files?
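
To see which characters the rejected byte sequences stand for, they can be decoded as UTF-8 and looked up in the Unicode database. The following is a minimal Python sketch, not part of oddb2xml; the byte strings are copied from the log above, and the helper name `describe` is hypothetical.

```python
import unicodedata

# Byte sequences copied from the SQLException messages in the log.
SAMPLES = [
    b"\xc2\x92",
    b"\xc2\x96",
    b"\xc2\x89",
    b"\xe2\x89\xa4",
    b"\xe2\x89\xa5",
    b"\xce\xb1",
]


def describe(raw: bytes) -> str:
    """Decode raw UTF-8 bytes to one character and describe it."""
    ch = raw.decode("utf-8")
    category = unicodedata.category(ch)  # "Cc" marks a control character
    # Control characters have no Unicode name, so supply a fallback label.
    name = unicodedata.name(ch, "<control character, no name>")
    return f"U+{ord(ch):04X} {category} {name}"


for raw in SAMPLES:
    print(raw, "->", describe(raw))
```

The first three sequences decode to the C1 control characters U+0092, U+0096 and U+0089, which matches the unprintable `<89>` seen in vi. The remaining three decode to the ordinary printable characters ≤, ≥ and α. That suggests, without proving it, that those rows may be rejected because the target column's character set cannot represent them rather than because the XML is malformed.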

@zdavatz
Owner

zdavatz commented Apr 24, 2015

What charset is your computer set to? All characters display correctly on my Mac and on Linux.

@zdavatz
Owner

zdavatz commented Apr 26, 2015

/tmp/oddb2xml> file -i oddb_article.xml
oddb_article.xml: application/xml; charset=utf-8
/tmp/oddb2xml> file -i oddb_product.xml
oddb_product.xml: application/xml; charset=utf-8
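
A `charset=utf-8` result from `file -i` does not rule out the reported problem: a control character such as U+0089 is stored as the valid UTF-8 byte pair C2 89, which is exactly what the SQL warnings show. A minimal Python sketch for locating such characters in already-decoded text follows; it is not part of oddb2xml, and the name `find_c1_controls` is hypothetical.

```python
import re

# C1 control characters occupy U+0080..U+009F. They are valid once encoded
# as UTF-8, but usually indicate text decoded with the wrong source charset.
C1_CONTROLS = re.compile("[\u0080-\u009f]")


def find_c1_controls(lines):
    """Yield (line_number, column, code_point) for every C1 control found."""
    for lineno, line in enumerate(lines, start=1):
        for match in C1_CONTROLS.finditer(line):
            yield lineno, match.start(), f"U+{ord(match.group()):04X}"


if __name__ == "__main__":
    # Sample modelled on the <DSCRD> line quoted in the report.
    sample = ["<DSCRD>Ethacridin lactat 1\u0089 100ml</DSCRD>"]
    for hit in find_c1_controls(sample):
        print(hit)
```

To check a real file, the same function can be given an open file handle, for example `open("oddb_article.xml", encoding="utf-8")`, since iterating over a text file yields its lines.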

@col-panic
Author

I am currently very busy, please give me some time for feedback!

@zdavatz
Owner

zdavatz commented Apr 26, 2015

ok, sure.

@ngiger
Contributor

ngiger commented Apr 27, 2015

I found the problem. I must convert each line from ISO-8859-9 (transfer.dat) to UTF-8 before extracting the name. Should be fixed soon.
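
The per-line conversion described here can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the oddb2xml fix itself (the gem is written in Ruby). The function names and the `source_encoding` parameter are hypothetical; the parameter is left explicit so that different legacy encodings can be tried. In Python's codec tables, byte 0x89 decodes to ‰ under `cp1252` but to the control character U+0089 under `iso-8859-9`, so the demo uses `cp1252` purely for illustration.

```python
def convert_line(raw_line: bytes, source_encoding: str) -> str:
    """Decode one raw line of the legacy file into a Unicode string."""
    # Strict decoding (the default) raises UnicodeDecodeError on bytes the
    # codec cannot map instead of silently passing them through.
    return raw_line.decode(source_encoding)


def convert_file(src_path: str, dst_path: str, source_encoding: str) -> None:
    """Rewrite src_path as UTF-8, converting line by line."""
    # newline="" keeps the original line endings untouched on output.
    with open(src_path, "rb") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for raw_line in src:
            dst.write(convert_line(raw_line, source_encoding))


if __name__ == "__main__":
    # Assumed sample: the product name from the report with raw byte 0x89.
    raw = b"Ethacridin lactat 1\x89 100ml"
    print(convert_line(raw, "cp1252"))
```

Converting before any field is extracted means that later steps only ever see correctly decoded text.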

@zdavatz
Owner

zdavatz commented Apr 27, 2015

OK, this will ship with version 2.0.6 by Thursday at the latest, in time for the event: http://hin.ch/anlass-mediupdate

@col-panic
Author

Thanks a lot @ngiger 👍

@zdavatz
Owner

zdavatz commented Apr 28, 2015

The 2.0.6 gem is out. Marco, can you please test whether everything works for you now? Thanks for your feedback.

@col-panic
Author

Okay, that looks better now!

<DSCR>Ethacridin lactat 1‰ 100ml</DSCR>

I will still test the import 👍

@zdavatz
Owner

zdavatz commented Apr 29, 2015

Thank you!

@zdavatz
Owner

zdavatz commented Apr 29, 2015

Version 2.0.8 is out. Please test.

zdavatz closed this as completed Apr 29, 2015

@col-panic
Author

Looks good, no more problems after the update! Thanks!

@zdavatz
Owner

zdavatz commented May 2, 2015

Don't forget to always delete the /downloads folder completely with "rm -r downloads" before running the job.
