
staedi API: EDIStreamReader fails to parse when a segment element contains accent marks #454

Closed
sharukhshaik126 opened this issue Apr 24, 2024 · 12 comments

@sharukhshaik126

sharukhshaik126 commented Apr 24, 2024

Describe the bug
My X12 EDI file has an NM1*IL segment containing letters with accent marks, and the EDIStreamReader class fails to parse that element.

To Reproduce
Parse any X12 EDI file containing accent marks, e.g.:
NM1*IL*1*VíAK SéVAG*KIAZDEN****34*673459754~
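Why the encoding matters here: the same accented letters are stored as different bytes depending on the charset the file was saved in. The JDK-only sketch below (the `AccentBytes` class is illustrative, not part of StAEDI) counts the bytes of the name element from the sample segment in UTF-8 and in ISO-8859-1. The accented letters are written as Unicode escapes so the source compiles regardless of the compiler's default encoding.

```java
import java.nio.charset.StandardCharsets;

final class AccentBytes {
    // Name element from the sample NM1 segment: U+00ED is i-acute, U+00E9 is e-acute.
    static final String NAME = "V\u00EDAK S\u00E9VAG";

    // Bytes needed in UTF-8: ASCII letters take one byte, i-acute and e-acute take two each.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed in ISO-8859-1: every Latin-1 character takes exactly one byte.
    static int latin1Length(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        System.out.println("UTF-8 bytes:      " + utf8Length(NAME));   // 12
        System.out.println("ISO-8859-1 bytes: " + latin1Length(NAME)); // 10
    }
}
```

The element is 10 characters long: 12 bytes in UTF-8 but 10 bytes in ISO-8859-1, so a reader that assumes the wrong one sees a different byte pattern than the writer produced.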

Expected behavior
EDIStreamReader should parse elements containing accent marks in both Linux and Windows environments.


@MikeEdgar
Member

Hi @sharukhshaik126, what is the character encoding of the data you are reading? You can use one of the overloads of EDIInputFactory#createEDIStreamReader to provide the correct encoding for your input.

@sharukhshaik126
Author

sharukhshaik126 commented Apr 25, 2024

@MikeEdgar when I try to parse an EDI file containing accent marks, it throws the exception below:
Unable to Stream EDI File : Error parsing input in segment NM1 at position 767, element 2
The same segment parses fine on Windows, but on Linux it throws the exception above. I do not specify a charset encoding for the EDIStreamReader in either environment.

My code:

EDIInputFactory inputFactory = EDIInputFactory.newFactory();
inputFactory.setProperty(EDIInputFactory.EDI_IGNORE_EXTRANEOUS_CHARACTERS, true);
InputStream inputStream = new FileInputStream(sourceFile);
EDIStreamReader ediReader = inputFactory.createEDIStreamReader(inputStream);

@MikeEdgar
Member

@sharukhshaik126 you'll need to provide the name of the character encoding when you create the EDIStreamReader.

Something like this (I am only guessing at the encoding in this example):

EDIStreamReader ediReader = inputFactory.createEDIStreamReader(inputStream, "ISO-8859-1");

@sharukhshaik126
Author

Let me try the "UTF-8" and "ISO-8859-1" encodings. Thanks @MikeEdgar

@MikeEdgar
Member

FYI, the default is UTF-8 if no encoding is given.

@sharukhshaik126
Author

Thanks, I will try specifying the exact charset encoding and will update you here.

@MikeEdgar
Member

@sharukhshaik126 any luck?

@sharukhshaik126
Author

sharukhshaik126 commented Apr 29, 2024

@MikeEdgar No, on Linux it still throws an exception after setting the encoding to UTF-8:
fail to parse edi file : /tmp/test/Halin_C_frdsw.txt | UTF-8
io.xlate.edi.stream.EDIStreamException: Error parsing input in segment NM1 at position 767, element 2
at io.xlate.edi.internal.stream.StaEDIStreamReader.lambda$executeTask$1(StaEDIStreamReader.java:186)
at io.xlate.edi.internal.ThrowingRunnable.run(ThrowingRunnable.java:19)
at io.xlate.edi.internal.stream.StaEDIStreamReader.executeTask(StaEDIStreamReader.java:181)
at io.xlate.edi.internal.stream.StaEDIStreamReader.nextEvent(StaEDIStreamReader.java:212)
at io.xlate.edi.internal.stream.StaEDIStreamReader.next(StaEDIStreamReader.java:241)
at com.mage.edireader.EDIFileParser.main(EDIFileParser.java:79)
Caused by: java.nio.charset.MalformedInputException: Input length = 1
at java.base/java.nio.charset.CoderResult.throwException(CoderResult.java:274)
at io.xlate.edi.internal.stream.tokenization.Lexer.readCharacter(Lexer.java:339)
at io.xlate.edi.internal.stream.tokenization.Lexer.readCharacterUnchecked(Lexer.java:313)
at io.xlate.edi.internal.stream.tokenization.Lexer.parse(Lexer.java:192)
at io.xlate.edi.internal.stream.tokenization.Lexer.parse(Lexer.java:174)
at io.xlate.edi.internal.ThrowingRunnable.run(ThrowingRunnable.java:17)
... 4 more
Exception in thread "main" io.xlate.edi.stream.EDIStreamException: Exception flushing output stream in segment NM1 at position 767, element 2
at io.xlate.edi.internal.stream.StaEDIStreamWriter.flush(StaEDIStreamWriter.java:240)
at io.xlate.edi.internal.stream.StaEDIStreamWriter.close(StaEDIStreamWriter.java:230)
at com.mage.edireader.EDIFileParser.main(EDIFileParser.java:187)
Caused by: java.io.IOException: Stream Closed
at java.base/java.io.FileOutputStream.writeBytes(Native Method)
at java.base/java.io.FileOutputStream.write(FileOutputStream.java:354)
at java.base/sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:233)
at java.base/sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:312)
at java.base/sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:316)
at java.base/sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:153)
at java.base/java.io.OutputStreamWriter.flush(OutputStreamWriter.java:251)
at io.xlate.edi.internal.stream.StaEDIStreamWriter.flush(StaEDIStreamWriter.java:237)
... 2 more
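The `Caused by` line is the key part of this trace: `java.nio.charset.MalformedInputException: Input length = 1` is the JDK's signal that a byte sequence is not valid in the charset being used to decode it. The sketch below reproduces that class of error with the JDK alone, assuming (as the later ISO-8859-1 result suggests, though the thread never states the file's encoding outright) that the file was written in ISO-8859-1 and read as UTF-8. `StrictUtf8Demo` and `decodeStrictUtf8` are illustrative names; StAEDI's own Lexer is not involved.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

final class StrictUtf8Demo {
    // Decodes bytes as UTF-8 and throws on malformed input instead of substituting U+FFFD.
    static String decodeStrictUtf8(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) {
        // "VíAK" encoded as ISO-8859-1: bytes 0x56 0xED 0x41 0x4B (i-acute is U+00ED).
        byte[] latin1 = "V\u00EDAK".getBytes(StandardCharsets.ISO_8859_1);
        try {
            decodeStrictUtf8(latin1);
            System.out.println("decoded without error");
        } catch (MalformedInputException e) {
            System.out.println("MalformedInputException: " + e.getMessage());
        } catch (CharacterCodingException e) {
            System.out.println("other coding error: " + e);
        }
    }
}
```

In ISO-8859-1, í is the single byte 0xED. A UTF-8 decoder treats 0xED as the lead byte of a three-byte sequence, finds that the next byte (`A`, 0x41) is not a continuation byte, and reports malformed input.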

@MikeEdgar
Member

Did you also try ISO-8859-1? As far as I can tell, it does include the í and é characters.
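ISO-8859-1 (Latin-1) defines one character for each byte value from 0x00 to 0xFF, and í (U+00ED) and é (U+00E9) are among them, stored as the bytes 0xED and 0xE9. A small JDK check of that claim, using the illustrative class `Latin1Check`:

```java
import java.nio.charset.StandardCharsets;

final class Latin1Check {
    // True if every character in s has a mapping in ISO-8859-1.
    static boolean fitsLatin1(CharSequence s) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    // The single ISO-8859-1 byte for c as an unsigned value (0x00..0xFF).
    // Characters outside Latin-1 come back as '?' (0x3F), because getBytes substitutes them.
    static int latin1Byte(char c) {
        return String.valueOf(c).getBytes(StandardCharsets.ISO_8859_1)[0] & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(fitsLatin1("V\u00EDAK S\u00E9VAG")); // true
        System.out.printf("0x%02X%n", latin1Byte('\u00ED'));    // 0xED (i-acute)
        System.out.printf("0x%02X%n", latin1Byte('\u00E9'));    // 0xE9 (e-acute)
    }
}
```

`fitsLatin1` returns false for characters outside Latin-1, such as the euro sign (U+20AC), which is one way to spot data that ISO-8859-1 cannot represent.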

@MikeEdgar
Member

@sharukhshaik126 can you possibly provide a test file without sensitive data that I can use to reproduce the issue? Using the sample text you gave originally, I haven't been able to trigger any errors.
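While a shareable test file is being prepared, one way to locate the offending bytes locally is to scan the file with a strict UTF-8 decoder and report the first offset where decoding fails. This is a JDK-only diagnostic sketch; `FirstBadUtf8` and `firstInvalidOffset` are hypothetical names. The offset it prints is a byte count from the start of the file, which may not be the same unit as the "position 767" in StAEDI's error message.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class FirstBadUtf8 {
    /**
     * Returns the byte offset of the first sequence in data that is not
     * valid UTF-8, or -1 if the whole array decodes cleanly.
     */
    static int firstInvalidOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(1024);
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // On an error, the bad bytes start at the input buffer's current position.
                return in.position();
            }
            if (result.isOverflow()) {
                // Output buffer full: discard the decoded text (only the offset matters) and continue.
                out.clear();
                continue;
            }
            // Underflow with endOfInput=true: every byte was consumed without error.
            return -1;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        int offset = firstInvalidOffset(data);
        if (offset < 0) {
            System.out.println("File is valid UTF-8");
        } else {
            System.out.printf("First invalid UTF-8 byte at offset %d: 0x%02X%n",
                    offset, data[offset] & 0xFF);
        }
    }
}
```

Run it as `java FirstBadUtf8.java /tmp/test/Halin_C_frdsw.txt` (Java 11+ single-file launch). A reported byte such as 0xED or 0xE9 would hint that the file holds Latin-1 accented letters rather than UTF-8.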

@sharukhshaik126
Author

@MikeEdgar after setting the charset encoding to ISO-8859-1, the EDI file parsed successfully.
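A caveat that follows from how ISO-8859-1 works: because every byte value maps to some character, decoding with it never raises `MalformedInputException`. That is what you want when the file really is Latin-1, but if a file arrives in UTF-8, the same setting will not fail; it will silently turn each accented letter into two wrong characters. The JDK-only sketch below (illustrative names) shows both outcomes for the `SéVAG` element.

```java
import java.nio.charset.StandardCharsets;

final class Latin1Caveat {
    // Decodes bytes as ISO-8859-1. This cannot fail: all 256 byte values map to a character.
    static String asLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String element = "S\u00E9VAG"; // e-acute written as a Unicode escape
        byte[] latin1Bytes = element.getBytes(StandardCharsets.ISO_8859_1); // e-acute -> 0xE9
        byte[] utf8Bytes = element.getBytes(StandardCharsets.UTF_8);        // e-acute -> 0xC3 0xA9

        // Correct: Latin-1 bytes decoded as Latin-1 give back the original text.
        System.out.println(asLatin1(latin1Bytes).equals(element)); // true
        // Silent mojibake: 0xC3 0xA9 become U+00C3 (A-tilde) and U+00A9 (copyright sign).
        System.out.println(asLatin1(utf8Bytes).equals("S\u00C3\u00A9VAG")); // true
    }
}
```

So ISO-8859-1 fits this particular file, but it is worth confirming the encoding with whoever produces the EDI data rather than treating it as a universal fallback.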

@MikeEdgar
Member

Great news! Thanks for the update, @sharukhshaik126. I'll go ahead and close the issue, but please re-open it if you think this still isn't resolved and we'll discuss further.
