
Similar to #104, Fields are missing on output in default mode #335

Closed
clalcorn opened this issue Nov 15, 2018 · 8 comments

Comments

@clalcorn

Similar to the previously reported #104, there are fields missing in the output for complex arrays. For example:

    C:\Users\Glyph>json2csv -V
    4.2.1

    C:\Users\Glyph>echo [{"a":1,"b":2}, {"a":3,"b":4},{"c":5}] | json2csv
    "a","b"
    1,2
    3,4
    ,

The expected behavior is:

    C:\Users\Glyph>echo [{"a":1,"b":2}, {"a":3,"b":4},{"c":5}] | json2csv
    "a","b","c"
    1,2,
    3,4,
    ,,5

@knownasilya
Collaborator

knownasilya commented Nov 15, 2018

How do you feel about submitting a failing test to show this doesn't work? Also, did you give v4.3.0 a try?

@clalcorn
Author

Sure, though you might need to help me; I'm not sure how to do that.

@knownasilya
Collaborator

knownasilya commented Nov 15, 2018

You can start here https://github.com/zemirco/json2csv/blob/master/test/JSON2CSVParser.js#L7 and create a test and fixture data based on your setup. Submit a pull request, and if you are stuck, I can guide you from there.

@clalcorn
Author

So, I'm self-admittedly the worst noob here, and I'm totally lost as to what to do with that link. I'll poke around here a bit more. But I think it's pretty obvious: if you run the above command, you will see the first output. If you downgrade to 3.6.1, or even 4.1.1, the output is what is listed under expected behavior.

But I will try and see if I can sort out the test.

@knownasilya
Collaborator

It's alright if things aren't coming together, feel free to ask questions 😃

@juanjoDiaz
Collaborator

Hi,

Nothing to be fixed; that's the expected behaviour. 😄

The json2csv CLI processes the elements one at a time (to keep the memory footprint low and performance high), so if you don't pass the fields option it uses the keys of the first item as the columns to process.

As an alternative, you can load the entire JSON object into memory using the --no-streaming flag, which reads all the keys from all the objects before doing the actual processing. You'll get all the keys, but it will be a bit slower and will consume a lot of memory for the duration of the processing (around 2x the size of the JSON).

    C:\Users\Glyph>echo [{"a":1,"b":2}, {"a":3,"b":4},{"c":5}] | json2csv --no-streaming
    "a","b","c"
    1,2,
    3,4,
    ,,5

@knownasilya
Collaborator

@juanjoDiaz I wasn't aware that was a change that was made, seems like a note in the README about that might be helpful for future users.

@clalcorn
Author

@juanjoDiaz @knownasilya Thanks to both of you. Makes sense. I did try --no-streaming previously and gave up because it had been running for about an hour :) We are working with some seriously huge JSON files. I tried it just now with a smaller file and saw what you described. So thanks!

For us, then, we will just have to work with a config file and try to prepopulate the fields. For reference, our JSON is shaped something like this:

    {
      "config_env": "master",
      "version": 1,
      "data": [
        { "field1": "data", "field2": "info", "field3": ["tag", "tag2", "tag3"], "field5": "string" },
        { "field1": "data", "field2": "info", "field2a": "string", "field3": ["tag", "tag2", "tag3"], "field5": "string", "field6": "string" }
      ],
      "some_info": "string",
      "time_stamp": "2018-08-11 21:00:00"
    }

So, as you can see, each node (or what have you) doesn't always have the same fields, though we do have an established schema, which rarely has additions. We are currently unwinding the data path: `json2csv -i jsonfile.json -o jsonfile.csv -F -S . -u data -c fieldsConfig.json` (I didn't see a way to unwind two different paths; is that possible?).

Since a normal file is 1.5 million lines, I think the --no-streaming option is out, and we'll just have to use the -c config file option and keep an up-to-date schema, or perhaps pre-scan the JSON file for a schema and then update the fieldsConfig.json file. Something like that. Any ideas or thoughts appreciated!
