Description
Apache Hop version?
2.12
Java version?
18
Operating system
Windows
What happened?
I used the "Get data from XML" transform to process a file that is encoded in Windows-1252 and contains a special character. The XML declaration in the file carries no encoding information. Whatever encoding I set in the transform, the error below occurred; the only way to avoid it was to specify the encoding in the XML file itself.
Error:
org.dom4j.DocumentException: Error on line 13 of document file:///C:/workspace/hop/windows-1252 : Invalid byte 1 of 1-byte UTF-8 sequence.
I looked at the source code and I think I found the root cause.
In the code linked below, the transform appears to call the read function of SAXReader incorrectly.
According to the documentation, the second parameter is the systemId, not the encoding.

Instead, it should call setEncoding on the SAXReader to specify the encoding of the input source before calling the read function.
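To illustrate why the encoding has to reach the input source, here is a minimal sketch that uses only the JDK's built-in parser rather than dom4j or Hop code. It assumes, as far as I understand dom4j, that SAXReader.setEncoding ends up applying the encoding to the SAX InputSource in the same way. The class name EncodingDemo, the parse helper, and the sample bytes are made up for this demo.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of Hop or dom4j.
final class EncodingDemo {

    // "<a>", then byte 0x80 (the euro sign in Windows-1252), then "</a>".
    // There is no XML declaration, so the file itself gives no encoding hint.
    static final byte[] SAMPLE = {'<', 'a', '>', (byte) 0x80, '<', '/', 'a', '>'};

    // Parses SAMPLE and returns the text of the root element.
    // A null encoding leaves the parser to detect the encoding on its own.
    static String parse(String encoding) {
        try {
            InputSource source = new InputSource(new ByteArrayInputStream(SAMPLE));
            if (encoding != null) {
                // Tell the parser how to decode the raw bytes.
                source.setEncoding(encoding);
            }
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(source);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers see a single failure type.
            throw new IllegalStateException("XML parsing failed", e);
        }
    }
}
```

With this sketch, EncodingDemo.parse("windows-1252") should return the euro sign, while EncodingDemo.parse(null) should fail because the parser falls back to UTF-8 and 0x80 is not a valid UTF-8 byte sequence.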
Please feel free to correct me if I got something wrong.
Note: the "XML input stream (StAX)" transform works correctly with the specified encoding.
Issue Priority
Priority: 2
Issue Component
Component: Transforms