Unexpected content conversion for a hex string data #47

Closed
alexeygrigorev opened this issue Jan 3, 2017 · 3 comments

Comments

@alexeygrigorev

I'm reading the following csv file:

uuid,document_id,timestamp,platform,geo_location,traffic_source
1fd5f051fba643,120,31905835,1,RS,2
8557aa9004be3b,120,32053104,1,VN>44,2
c351b277a358f0,120,54013023,1,KR>12,1
8205775c5387f9,120,44196592,1,IN>16,2
9cb0ccd8458371,120,65817371,1,US>CA>807,2
2aa611f32875c7,120,71495491,1,CA>ON,2
f55a6eaf2b34ab,120,73309199,1,BR>27,2
cc01b582c8cbff,120,50033577,1,CA>BC,2
6c802978b8dd4d,120,66590306,1,CA>ON,2

But paratext reads it as follows:

[Screenshot: selection_167]

The uuid conversion is totally unexpected, and the issue persists even if I pass text_names=['uuid'].
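For comparison, here is a minimal sketch of the result I would expect, using Python's standard csv module rather than paratext. The SAMPLE string (the first rows of the file above) and the read_rows helper are illustrative names, not part of any library. The csv module does no type inference, so every field, including uuid, comes back as the original text.

```python
import csv
import io

# First rows of the file from the report, inlined so the sketch is self-contained.
SAMPLE = """uuid,document_id,timestamp,platform,geo_location,traffic_source
1fd5f051fba643,120,31905835,1,RS,2
8557aa9004be3b,120,32053104,1,VN>44,2
c351b277a358f0,120,54013023,1,KR>12,1
"""


def read_rows(text):
    """Parse CSV text into dicts; the csv module leaves every field as a string."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(SAMPLE)
uuids = [row["uuid"] for row in rows]
print(uuids)  # the hex-like identifiers, unchanged
```

The point is only that the uuid column should survive as text; this says nothing about why paratext converts it.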

@alexeygrigorev
Author

#31 seems to solve it

[Screenshot: selection_168]

(although there's an issue with parsing the last column)

@deads
Contributor

deads commented Feb 17, 2017

Thank you for reporting your issue. Indeed, #31 solves the issue, but we are waiting for the PR author to remerge so we can run the tests on the PR before merging it into master.

Most of the regression tests assume all data is double-quoted, because this is what I do for most of the data files I use in a production environment. paratext supports backslash-escape sequences, so in theory any arbitrary byte sequence can be represented.

If you have a very messy CSV file, you can use paratext.serial.write_frame, which writes the data out using a configurable backslash-escaping scheme (arbitrary 8-bit, printable ASCII, UTF-8, etc.). In fact, the regression tests generate arbitrary UTF-8 and byte data, save it in all possible formats, and read it back in. However, the key assumption for this to work is that all non-numeric data is backslash-escaped.
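As a rough illustration of the double-quoting convention mentioned above, the sketch below rewrites a CSV so that every non-numeric field is double-quoted. It uses only Python's standard csv module, not paratext.serial.write_frame, and it does no backslash escaping. The quote_non_numeric helper and its numeric_columns parameter are hypothetical names introduced for this example. Whether quoting alone avoids the conversion in a given paratext version is not verified here.

```python
import csv
import io


def quote_non_numeric(text, numeric_columns):
    """Rewrite CSV text so that all non-numeric fields are double-quoted.

    numeric_columns: names of columns whose values are integers. These are
    converted to int so csv.QUOTE_NONNUMERIC leaves them unquoted.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    numeric_idx = {i for i, name in enumerate(header) if name in numeric_columns}

    out = io.StringIO()
    # QUOTE_NONNUMERIC quotes every field that is not an int or float.
    writer = csv.writer(out, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerow(header)
    for row in reader:
        writer.writerow(
            [int(value) if i in numeric_idx else value for i, value in enumerate(row)]
        )
    return out.getvalue()


sample = "uuid,document_id,geo_location\n1fd5f051fba643,120,RS\n"
quoted = quote_non_numeric(sample, {"document_id"})
print(quoted)
```

With this convention, a hex-like identifier is written in quotes while document_id stays bare, so the file itself marks which fields are text.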

@deads
Contributor

deads commented Feb 20, 2017

This issue has been resolved in the latest master.

deads closed this as completed on Feb 20, 2017