
Similar to #104, Fields are missing on output in default mode #335

Closed
clalcorn opened this issue Nov 15, 2018 · 8 comments

Comments

@clalcorn

Similar to the previously reported #104, there are fields missing in the output for complex arrays. For example:

    C:\Users\Glyph>json2csv -V
    4.2.1

    C:\Users\Glyph>echo [{"a":1,"b":2}, {"a":3,"b":4},{"c":5}] | json2csv
    "a","b"
    1,2
    3,4
    ,

The expected behavior is:

    C:\Users\Glyph>echo [{"a":1,"b":2}, {"a":3,"b":4},{"c":5}] | json2csv
    "a","b","c"
    1,2,
    3,4,
    ,,5

@knownasilya
Collaborator

knownasilya commented Nov 15, 2018

How do you feel about submitting a failing test to show this doesn't work? Also, did you give v4.3.0 a try?

@clalcorn
Author

Sure, though you might need to help me; I'm not sure how to do that.

@knownasilya
Collaborator

knownasilya commented Nov 15, 2018

You can start here https://github.com/zemirco/json2csv/blob/master/test/JSON2CSVParser.js#L7 and create a test and fixture data based on your setup. Submit a pull request, and if you are stuck, I can guide you from there.

@clalcorn
Author

So, I'm self-admittedly the worst noob here, and I'm totally lost as to what to do with that link. I'll poke around here a bit more. But I think it's pretty obvious: if you run the above command, you will see the first output. If you downgrade to 3.6.1, or even 4.1.1, the output is what is listed under expected behavior.

But I will try and see if I can sort out the test.

@knownasilya
Collaborator

It's alright if things aren't coming together, feel free to ask questions 😃

@juanjoDiaz
Collaborator

Hi,

Nothing to be fixed; that's the expected behaviour. 😄

The json2csv CLI processes the elements one at a time (to keep the memory footprint low and performance high), so if you don't pass the fields option it uses the keys of the first item as the columns to process.

As an alternative, you can load the entire JSON object into memory using the --no-streaming flag, which reads all the keys from all the objects before doing the actual processing. You'll get all the keys, but it will be a bit slower and will consume a lot of memory for the duration of the processing (around 2x the size of the JSON).

    C:\Users\Glyph>echo [{"a":1,"b":2}, {"a":3,"b":4},{"c":5}] | json2csv --no-streaming
    "a","b","c"
    1,2,
    3,4,
    ,,5

@knownasilya
Collaborator

@juanjoDiaz I wasn't aware that was a change that was made, seems like a note in the README about that might be helpful for future users.

@clalcorn
Author

@juanjoDiaz @knownasilya Thanks to both of you. Makes sense. I did try --no-streaming previously and gave up because it had been running for about an hour :) We are working with some seriously huge JSON files. I tried it just now with a smaller file and saw what you described. So thanks!

For us, then, we will just have to work with a config file and try to prepopulate the fields. For reference, our JSON is shaped something like this:

    {
      "config_env": "master",
      "version": 1,
      "data": [
        { "field1": "data", "field2": "info", "field3": ["tag", "tag2", "tag3"], "field5": "string" },
        { "field1": "data", "field2": "info", "field2a": "string", "field3": ["tag", "tag2", "tag3"], "field5": "string", "field6": "string" }
      ],
      "some_info": "string",
      "time_stamp": "2018-08-11 21:00:00"
    }

So, as you can see, each node (or what have you) doesn't always have the same fields, though we do have an established schema, which rarely has additions. We are currently unwinding the data path: `json2csv -i jsonfile.json -o jsonfile.csv -F -S . -u data -c fieldsConfig.json` (I didn't see a way to unwind two different paths; is that possible?).

Since a normal file is 1.5 million lines, I think the --no-streaming option is out, and we'll just have to use the -c config file option and keep an up-to-date schema, or perhaps pre-scan the JSON file for a schema and then update the fieldsConfig.json file. Something like that. Any ideas or thoughts appreciated!
