Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

performance of csvwriter.processRecord() with conversions #75

Closed
blackfoxoh opened this issue Feb 22, 2016 · 10 comments
Closed

performance of csvwriter.processRecord() with conversions #75

blackfoxoh opened this issue Feb 22, 2016 · 10 comments
Assignees
Milestone

Comments

@blackfoxoh
Copy link

Hi
I've finished an implementation where I use CsvWriter (brand new final version 2.0 of your great lib) with an OutputValueSwitch and then write out rows by providing them as a Map with the method processRecord().
One rowtype of the switch has 57 columns and 3 of them apply a TrimConversion. The Map I'm passing in has 41 entrys so even some columns remain empty.
The prcoessRecord() takes approximately 1700ms for writing one row which makes it very long-running.
A second rowtype of the switch has 16 columns without any conversions, the input map is nearly the same as for the former rowtype (so there are much more values than needed for this rowtype). This second rowtype takes at most one or two millicesonds.
If I comment out the conversions for the first rowtype it is ultrafast as well. Therefore I think the problem might not be specific to processing a map nor the fact of using an OutputValueSwitch but just in case it matters I described the whole use case...

If you need any further details just let me know

@jbax
Copy link
Member

jbax commented Feb 22, 2016

Thanks! Can you give a sample of code + input that simulates the problem so I can be sure I'm testing using a valid scenario?

@blackfoxoh
Copy link
Author

this nearly drove me crazy now...
Tryed to provide a testcase to you and couldn't reproduce it. Finally it turned out the problem was 30 centimeters in front of the monitor: I had updated my dependencies in the IDE but in the runtime there still was an snapshotbuild of version 2.0.0 which was a couple of days older which seems to be having the performanceproblem with conversions. With the final release in place everything works fine!
Sorry for opening an issue.

@blackfoxoh
Copy link
Author

By the way I have written a little Conversion that maybe is useful for others as well - feel free to add it to the existing built in Conversions for a future release if you think its worth it.
My use case for this is a CSV-export with the target application requiring a maximum length in some fileds...

public class TrimToLengthConversion extends TrimConversion {

   int length;

   /**
    * Constructor
    *
    * @param length
    *           the maximum length the value is allowed to have
    */
   public TrimToLengthConversion(int length) {
      this.length = length;
   }

   /**
    * cuts off from the end of the string if it doesn't fit into the required length after trim()
    *
    * @see com.univocity.parsers.conversions.TrimConversion#execute(java.lang.String)
    */
   @Override
   public String execute(String input) {
      String value = super.execute(input);
      if (value.length() > length) {
         value = value.substring(0, length).trim();
      }
      return value;
   }

}

and some unittests (junit :-)):

import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class TrimToLengthConversionTest {

   private static final String TEST_INPUT17 = "Dies ist ein Text";
   private static final String TEST_INPUT18 = "Dies ist ein Text ";
   private static final String TEST_INPUT34 = " Dies ist ein Text mit 34 Zeichen. ";
   private static final String EXPECTED_O8 = "Dies ist";
   private static final String EXPECTED_17 = "Dies ist ein Text";

   @Test
   public void testExecuteString1() {
      int length = 8;
      TrimToLengthConversion conv = new TrimToLengthConversion(length);
      String output = conv.execute(TEST_INPUT17);
      assertEquals(EXPECTED_O8, output);
      assertEquals(length, output.length());

      output = conv.execute(TEST_INPUT34);
      assertEquals(EXPECTED_O8, output);
      assertEquals(length, output.length());
   }

   @Test
   public void testExecuteString2() {
      TrimToLengthConversion conv = new TrimToLengthConversion(17);
      String output = conv.execute(TEST_INPUT17);
      assertEquals("Input has been cut off, although the required maximum length hasn't been exceeded", TEST_INPUT17, output);

      output = conv.execute(TEST_INPUT34);
      assertEquals(EXPECTED_17, output); // test the input gets trim()ed first and checked for cut off afterwards
   }

   @Test
   public void testExecuteString3() {
      TrimToLengthConversion conv = new TrimToLengthConversion(18);
      String output = conv.execute(TEST_INPUT34);
      assertEquals(EXPECTED_17, output); // first trim() input, then cut off and finally should be trim()ed again to remove whitespaces now at the end. the result is shorter than the specified maximum length of 18 now...

      output = conv.execute(TEST_INPUT18);
      assertEquals("Input didn't got trim()ed (without exceeding maximum length)", TEST_INPUT17, output);
   }

}

@jbax
Copy link
Member

jbax commented Feb 23, 2016

Thank you for the suggestion! I'll add the option to trim to a given length in the TrimConversion itself and also update the annotation class to take the maximum length as an optional argument.
Cheers!

@jbax jbax modified the milestones: 2.0.1, 2.1.0 Feb 23, 2016
@jbax jbax self-assigned this Feb 23, 2016
jbax added a commit that referenced this issue Mar 8, 2016
Adding trim-to-length support in Conversions and annotation.
@jbax
Copy link
Member

jbax commented Mar 8, 2016

Added support for trim to length. Now the existing TrimConversion and Trim annotation allow you to specify a maximum length which will limit the conversion result.

Available on version 2.1.0-SNAPSHOT

@jbax jbax modified the milestones: 2.0.2, 2.1.0 Apr 4, 2016
@jbax
Copy link
Member

jbax commented Apr 4, 2016

Trim to length has been included in version 2.0.2 already. No need to wait for 2.1.0 as this was a minimal change.

@blackfoxoh
Copy link
Author

Hi,
I've just tryed to migrate from my own TrimToLengthConversion to the now built-in feature and discovered one little issue:
The main goal of a trim-Conversion is to cut off whitespaces at the beginning/end of the input. When trimming to a specified length this can result in whitespaces at the end of the resulting string. Could be fixed with return input.substring(0, length).trim(); (add a second trim() in the inner if-block).
The unittests posted above should cover this usecase within the last of the tests...

jbax added a commit that referenced this issue Sep 7, 2017
…ving whitespaces at end of truncated value.
@jbax
Copy link
Member

jbax commented Sep 7, 2017

Thanks for that! I've just fixed this to behave as it should and released a 2.5.5-SNAPSHOT build to include this adjustment.

@akrymskiy
Copy link

Hi - I commented on the commit itself, but cross-posting here just in case. The change seems to have broken TrimConversion for empty (0-length strings). We are getting following when writing out file with empty strings in fields which have @Trim annotation applied:

"causeClass": "java.lang.StringIndexOutOfBoundsException",
"causeMessage": [
"String index out of range: 0"
],
"causeStackTrace": [
"java.lang.String.charAt(String.java:658)",
"com.univocity.parsers.conversions.TrimConversion.execute(TrimConversion.java:64)",
"com.univocity.parsers.conversions.TrimConversion.revert(TrimConversion.java:92)",
"com.univocity.parsers.conversions.TrimConversion.revert(TrimConversion.java:25)",

@jbax
Copy link
Member

jbax commented Sep 20, 2017

Thanks @Avksoft I've fixed and released a 2.5.6-SNAPSHOT build

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants