StAEDI API: EDIStreamReader fails to parse if a segment element has accent marks #454

sharukhshaik126 · 2024-04-24T21:01:45Z

Describe the bug
In my X12 EDI file I have an NM1*IL segment that contains letters with accent marks, and the element is not parsed by the EDIStreamReader class.

To Reproduce
Parse any X12 EDI file that contains accent marks.
E.g.: NM1*IL*1*VíAK SéVAG*KIAZDEN****34*673459754~

Expected behavior
EDIStreamReader should parse elements that contain accent marks in both Linux and Windows environments.

MikeEdgar · 2024-04-24T21:05:39Z

Hi @sharukhshaik126 , what is the character encoding of the data you are reading? You can use one of the overloads of EDIInputFactory#createEDIStreamReader to provide the correct encoding for your input.

sharukhshaik126 · 2024-04-25T05:45:13Z

@MikeEdgar actually, when I tried to parse an EDI file with accent marks, it threw the exception below:
Unable to Stream EDI File : Error parsing input in segment NM1 at position 767, element 2
The same segment with accent marks works fine in a Windows environment but throws the above exception in a Linux environment. This happens without specifying a character set encoding when creating the EDIStreamReader.

My code block:

EDIInputFactory inputFactory = EDIInputFactory.newFactory();
inputFactory.setProperty(EDIInputFactory.EDI_IGNORE_EXTRANEOUS_CHARACTERS, true);
InputStream inputStream = new FileInputStream(sourceFile);
EDIStreamReader ediReader = inputFactory.createEDIStreamReader(inputStream);

MikeEdgar · 2024-04-25T11:16:17Z

@sharukhshaik126 you'll need to provide the name of the character encoding when you create the EDIStreamReader.

Something like this (I am only guessing on the encoding in this example):

EDIStreamReader ediReader = inputFactory.createEDIStreamReader(inputStream, "ISO-8859-1");
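
Rather than guessing, one option is to check whether the file's bytes are valid UTF-8 before choosing the encoding name. The following is a minimal sketch using only JDK classes. `EncodingProbe`, `isValidUtf8`, and `guessEncodingName` are hypothetical names, not part of StAEDI, and the fallback to ISO-8859-1 is an assumption that only suits data known to come from a Latin-1 style source.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the StAEDI API.
final class EncodingProbe {

    private EncodingProbe() {
    }

    /** Returns true only if the bytes decode as UTF-8 without any malformed input. */
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /**
     * Picks an encoding name to hand to the reader.
     * Assumption: anything that is not valid UTF-8 is treated as ISO-8859-1.
     */
    static String guessEncodingName(byte[] data) {
        return isValidUtf8(data) ? "UTF-8" : "ISO-8859-1";
    }
}
```

The bytes could come from `Files.readAllBytes(path)`, and the returned name could then be passed as the second argument of `createEDIStreamReader(inputStream, encodingName)` as in the line above. A check like this cannot tell ISO-8859-1 apart from other single-byte encodings such as windows-1252, so knowing the sender's actual encoding remains the more reliable route.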

sharukhshaik126 · 2024-04-25T11:28:56Z

Let me try with different encodings, UTF-8 and ISO-8859-1. Thanks @MikeEdgar

MikeEdgar · 2024-04-25T11:30:59Z

FYI that the default is UTF-8 if nothing is given.

sharukhshaik126 · 2024-04-25T11:53:01Z

Thanks, I will try to define the exact charset encoding to parse it and will update you here.

MikeEdgar · 2024-04-29T11:58:49Z

@sharukhshaik126 any luck?

sharukhshaik126 · 2024-04-29T19:43:12Z

@MikeEdgar No, on Linux it still throws an exception after setting the encoding to UTF-8:
fail to parse edi file : /tmp/test/Halin_C_frdsw.txt | UTF-8
io.xlate.edi.stream.EDIStreamException: Error parsing input in segment NM1 at position 767, element 2
at io.xlate.edi.internal.stream.StaEDIStreamReader.lambda$executeTask$1(StaEDIStreamReader.java:186)
at io.xlate.edi.internal.ThrowingRunnable.run(ThrowingRunnable.java:19)
at io.xlate.edi.internal.stream.StaEDIStreamReader.executeTask(StaEDIStreamReader.java:181)
at io.xlate.edi.internal.stream.StaEDIStreamReader.nextEvent(StaEDIStreamReader.java:212)
at io.xlate.edi.internal.stream.StaEDIStreamReader.next(StaEDIStreamReader.java:241)
at com.mage.edireader.EDIFileParser.main(EDIFileParser.java:79)
Caused by: java.nio.charset.MalformedInputException: Input length = 1
at java.base/java.nio.charset.CoderResult.throwException(CoderResult.java:274)
at io.xlate.edi.internal.stream.tokenization.Lexer.readCharacter(Lexer.java:339)
at io.xlate.edi.internal.stream.tokenization.Lexer.readCharacterUnchecked(Lexer.java:313)
at io.xlate.edi.internal.stream.tokenization.Lexer.parse(Lexer.java:192)
at io.xlate.edi.internal.stream.tokenization.Lexer.parse(Lexer.java:174)
at io.xlate.edi.internal.ThrowingRunnable.run(ThrowingRunnable.java:17)
... 4 more
Exception in thread "main" io.xlate.edi.stream.EDIStreamException: Exception flushing output stream in segment NM1 at position 767, element 2
at io.xlate.edi.internal.stream.StaEDIStreamWriter.flush(StaEDIStreamWriter.java:240)
at io.xlate.edi.internal.stream.StaEDIStreamWriter.close(StaEDIStreamWriter.java:230)
at com.mage.edireader.EDIFileParser.main(EDIFileParser.java:187)
Caused by: java.io.IOException: Stream Closed
at java.base/java.io.FileOutputStream.writeBytes(Native Method)
at java.base/java.io.FileOutputStream.write(FileOutputStream.java:354)
at java.base/sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:233)
at java.base/sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:312)
at java.base/sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:316)
at java.base/sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:153)
at java.base/java.io.OutputStreamWriter.flush(OutputStreamWriter.java:251)
at io.xlate.edi.internal.stream.StaEDIStreamWriter.flush(StaEDIStreamWriter.java:237)
... 2 more

MikeEdgar · 2024-04-29T20:26:39Z

Did you also try with ISO-8859-1 ? As far as I can tell it does include í and é characters.
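
The `MalformedInputException: Input length = 1` in the trace is consistent with a file that was written as ISO-8859-1 (or a similar single-byte encoding) and then read as UTF-8. The sketch below reproduces that effect with JDK classes only, using the name from the sample segment. `AccentDecodeDemo` and `malformedLengthAsUtf8` are hypothetical names for illustration, and whether the reporter's file was really encoded this way is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not use StAEDI at all.
final class AccentDecodeDemo {

    /** Strictly decodes the bytes as UTF-8; returns the malformed-input length, or 0 on success. */
    static int malformedLengthAsUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(data));
            return 0;
        } catch (MalformedInputException e) {
            return e.getInputLength();
        } catch (CharacterCodingException e) {
            return -1; // some other decoding problem
        }
    }

    public static void main(String[] args) {
        // "VíAK SéVAG" written with Unicode escapes so the source file's own encoding does not matter.
        String name = "V\u00edAK S\u00e9VAG";

        byte[] latin1 = name.getBytes(StandardCharsets.ISO_8859_1); // one byte per character
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);        // two bytes for each accented letter

        System.out.println("ISO-8859-1 byte count: " + latin1.length);
        System.out.println("UTF-8 byte count: " + utf8.length);

        // The Latin-1 byte 0xED for 'í' is a UTF-8 lead byte with no valid continuation after it.
        System.out.println("Malformed length when read as UTF-8: " + malformedLengthAsUtf8(latin1));

        // Decoding with the encoding that was used for writing restores the original text.
        System.out.println("Round trip OK: " + new String(latin1, StandardCharsets.ISO_8859_1).equals(name));
    }
}
```

If the file really is ISO-8859-1, passing that name to `createEDIStreamReader` should avoid the decoding error, because every byte value is a valid character in that encoding.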

MikeEdgar · 2024-05-03T20:10:55Z

@sharukhshaik126 can you possibly provide a test file without sensitive data that I can use to reproduce the issue? Using the sample text you gave originally I haven't been able to trigger any errors.

sharukhshaik126 · 2024-05-16T04:25:22Z

@MikeEdgar after setting the character encoding to ISO-8859-1, the EDI file parsed successfully.

MikeEdgar · 2024-05-16T10:37:17Z

Great news! Thanks for the update @sharukhshaik126 . I'll go ahead and close the issue, but please re-open if this still isn't resolved in your opinion and we'll discuss further.

MikeEdgar added the question label Apr 25, 2024

MikeEdgar closed this as completed May 16, 2024
