when parsing a json array, [] object yielded at end #7

Closed
ilackarms opened this issue Sep 14, 2017 · 21 comments

@ilackarms
commented Sep 14, 2017

I've noticed that when I parse an array, an empty array object [] is yielded at the end, after each of the objects in the array has been yielded.

reproduce:

input:

{
    "kind": "PodList",
    "apiVersion": "v1",
    "metadata": {
        "selfLink": "/api/v1/pods",
        "resourceVersion": "1315"
    },
    "items": [
        {
            "metadata": {
                "name": "redis-master3",
                "namespace": "default",
                "selfLink": "/api/v1/pods/redis-master3?namespace=default",
                "uid": "1da148b4-cef5-11e4-ac24-3c970e4a436a",
                "resourceVersion": "1301",
                "creationTimestamp": "2015-03-20T13:34:48+02:00",
                "labels": {
                    "mylabel": "mylabelvalue",
                    "role": "pod"
                }
            },
            "spec": {
                "volumes": null,
                "containers": [
                    {
                        "name": "master",
                        "image": "dockerfile/redis",
                        "ports": [
                            {
                                "hostPort": 6379,
                                "containerPort": 6379,
                                "protocol": "TCP"
                            }
                        ],
                        "resources": {
                            "limits": {
                                "cpu": "100m"
                            }
                        },
                        "terminationMessagePath": "/dev/termination-log",
                        "imagePullPolicy": "IfNotPresent",
                        "securityContext": {
                            "capabilities": {}
                        }
                    },
                    {
                        "name": "php-redis",
                        "image": "kubernetes/example-guestbook-php-redis",
                        "ports": [
                            {
                                "hostPort": 8000,
                                "containerPort": 80,
                                "protocol": "TCP"
                            }
                        ],
                        "resources": {
                            "limits": {
                                "cpu": "100m",
                                "memory": "50000000"
                            }
                        },
                        "terminationMessagePath": "/dev/termination-log",
                        "imagePullPolicy": "IfNotPresent",
                        "securityContext": {
                            "capabilities": {}
                        }
                    }
                ],
                "restartPolicy": {
                    "always": {}
                },
                "dnsPolicy": "ClusterFirst"
            },
            "status": {
                "phase": "Pending"
            }
        }
    ]
}

parsing code:

streamer.get(key: 'items') do |object|
  p object
end

result:

{"metadata"=>{"name"=>"redis-master3", "namespace"=>"default", "selfLink"=>"/api/v1/pods/redis-master3?namespace=default", "uid"=>"1da148b4-cef5-11e4-ac24-3c970e4a436a", "resourceVersion"=>"1301", "creationTimestamp"=>"2015-03-20T13:34:48+02:00", "labels"=>{"mylabel"=>"mylabelvalue", "role"=>"pod"}}, "spec"=>{"volumes"=>nil, "containers"=>[{"name"=>"master", "image"=>"dockerfile/redis", "ports"=>[{"hostPort"=>6379, "containerPort"=>6379, "protocol"=>"TCP"}], "resources"=>{"limits"=>{"cpu"=>"100m"}}, "terminationMessagePath"=>"/dev/termination-log", "imagePullPolicy"=>"IfNotPresent", "securityContext"=>{"capabilities"=>{}}}, {"name"=>"php-redis", "image"=>"kubernetes/example-guestbook-php-redis", "ports"=>[{"hostPort"=>8000, "containerPort"=>80, "protocol"=>"TCP"}], "resources"=>{"limits"=>{"cpu"=>"100m", "memory"=>"50000000"}}, "terminationMessagePath"=>"/dev/termination-log", "imagePullPolicy"=>"IfNotPresent", "securityContext"=>{"capabilities"=>{}}}], "restartPolicy"=>{"always"=>{}}, "dnsPolicy"=>"ClusterFirst"}, "status"=>{"phase"=>"Pending"}}
[]
@thisismydesign

Owner

commented Sep 15, 2017

The issue is caused by a bug: values within an array were handled as if they had keys (namely, the previous key in the JSON object), while they should not have keys at all.
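To make the described cause concrete, here is a toy sketch (not json-streamer's actual code) of an event-driven pass over a small document. The event names, the `keyed_values` helper, and its `keys_inside_arrays` flag are all assumptions made up for illustration; the only point is to show how array elements can wrongly inherit the previous key.

```ruby
# Hypothetical SAX-like event list for {"a": 1, "list": [2, 3]}.
EVENTS = [
  [:start_object], [:key, 'a'], [:value, 1],
  [:key, 'list'], [:start_array], [:value, 2], [:value, 3], [:end_array],
  [:end_object]
].freeze

# Pairs every scalar value with the key it is attributed to.
# keys_inside_arrays: true mimics the bug described above, where array
# elements inherit the most recently seen key.
def keyed_values(events, keys_inside_arrays:)
  containers = []    # stack of :object / :array markers
  current_key = nil
  seen = []

  events.each do |type, payload|
    case type
    when :start_object then containers.push(:object)
    when :start_array then containers.push(:array)
    when :end_object, :end_array then containers.pop
    when :key then current_key = payload
    when :value
      in_array = (containers.last == :array)
      key = (in_array && !keys_inside_arrays) ? nil : current_key
      seen << [key, payload]
    end
  end

  seen
end

buggy = keyed_values(EVENTS, keys_inside_arrays: true)
fixed = keyed_values(EVENTS, keys_inside_arrays: false)
```

In the buggy mode the elements 2 and 3 are attributed to the key 'list'; in the fixed mode they carry no key at all.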

@thisismydesign

Owner

commented Sep 15, 2017

Issue is fixed in v1.1.1. Please verify and close this issue if you're satisfied.

@ilackarms

Author

commented Sep 15, 2017

now I'm only getting the first object in the array

@thisismydesign

Owner

commented Sep 15, 2017

What do you mean?

@thisismydesign

Owner

commented Sep 15, 2017

I see what you mean, hang on..

@ilackarms

Author

commented Sep 15, 2017

input:

{
    "kind": "ServiceList",
    "apiVersion": "v1",
    "metadata": {
        "selfLink": "/api/v1/services",
        "resourceVersion": "59"
    },
    "items": [
        {
            "metadata": {
                "name": "kubernetes",
                "namespace": "default",
                "selfLink": "/api/v1/services/kubernetes?namespace=default",
                "uid": "016e9dcd-ce39-11e4-ac24-3c970e4a436a",
                "resourceVersion": "6",
                "creationTimestamp": "2015-03-19T15:08:16+02:00",
                "labels": {
                    "component": "apiserver",
                    "provider": "kubernetes"
                }
            },
            "spec": {
                "port": 443,
                "protocol": "TCP",
                "selector": null,
                "clusterIP": "10.0.0.2",
                "containerPort": 0,
                "sessionAffinity": "None"
            },
            "status": {}
        },
        {
            "metadata": {
                "name": "kubernetes-ro",
                "namespace": "default",
                "selfLink": "/api/v1/services/kubernetes-ro?namespace=default",
                "uid": "015b78bf-ce39-11e4-ac24-3c970e4a436a",
                "resourceVersion": "5",
                "creationTimestamp": "2015-03-19T15:08:15+02:00",
                "labels": {
                    "component": "apiserver",
                    "provider": "kubernetes"
                }
            },
            "spec": {
                "port": 80,
                "protocol": "TCP",
                "selector": null,
                "clusterIP": "10.0.0.1",
                "containerPort": 0,
                "sessionAffinity": "None"
            },
            "status": {}
        }
    ]
}

output:

{"metadata"=>{"name"=>"kubernetes", "namespace"=>"default", "selfLink"=>"/api/v1/services/kubernetes?namespace=default", "uid"=>"016e9dcd-ce39-11e4-ac24-3c970e4a436a", "resourceVersion"=>"6", "creationTimestamp"=>"2015-03-19T15:08:16+02:00", "labels"=>{"component"=>"apiserver", "provider"=>"kubernetes"}}, "spec"=>{"port"=>443, "protocol"=>"TCP", "selector"=>nil, "clusterIP"=>"10.0.0.2", "containerPort"=>0, "sessionAffinity"=>"None"}, "status"=>{}}

only the first object is yielded

@ilackarms

Author

commented Sep 15, 2017

i am also experiencing another new bug: certain keys are rendered as nil

input:

{
    "kind": "PodList",
    "apiVersion": "v1",
    "metadata": {
        "selfLink": "/api/v1/pods",
        "resourceVersion": "1315"
    },
    "items": [
        {
            "metadata": {
                "name": "redis-master3",
                "namespace": "default",
                "selfLink": "/api/v1/pods/redis-master3?namespace=default",
                "uid": "1da148b4-cef5-11e4-ac24-3c970e4a436a",
                "resourceVersion": "1301",
                "creationTimestamp": "2015-03-20T13:34:48+02:00",
                "labels": {
                    "mylabel": "mylabelvalue",
                    "role": "pod"
                }
            },
            "spec": {
                "volumes": null,
                "containers": [
                    {
                        "name": "master",
                        "image": "dockerfile/redis",
                        "ports": [
                            {
                                "hostPort": 6379,
                                "containerPort": 6379,
                                "protocol": "TCP"
                            }
                        ],
                        "resources": {
                            "limits": {
                                "cpu": "100m"
                            }
                        },
                        "terminationMessagePath": "/dev/termination-log",
                        "imagePullPolicy": "IfNotPresent",
                        "securityContext": {
                            "capabilities": {}
                        }
                    },
                    {
                        "name": "php-redis",
                        "image": "kubernetes/example-guestbook-php-redis",
                        "ports": [
                            {
                                "hostPort": 8000,
                                "containerPort": 80,
                                "protocol": "TCP"
                            }
                        ],
                        "resources": {
                            "limits": {
                                "cpu": "100m",
                                "memory": "50000000"
                            }
                        },
                        "terminationMessagePath": "/dev/termination-log",
                        "imagePullPolicy": "IfNotPresent",
                        "securityContext": {
                            "capabilities": {}
                        }
                    }
                ],
                "restartPolicy": {
                    "always": {}
                },
                "dnsPolicy": "ClusterFirst"
            },
            "status": {
                "phase": "Pending"
            }
        }
    ]
}

output:

{"metadata"=>{"name"=>"redis-master3", "namespace"=>"default", "selfLink"=>"/api/v1/pods/redis-master3?namespace=default", "uid"=>"1da148b4-cef5-11e4-ac24-3c970e4a436a", "resourceVersion"=>"1301", "creationTimestamp"=>"2015-03-20T13:34:48+02:00", "labels"=>{"mylabel"=>"mylabelvalue", "role"=>"pod"}}, "spec"=>{"volumes"=>nil, "labels"=>[{"name"=>"master", "image"=>"dockerfile/redis", nil=>[{"hostPort"=>6379, "containerPort"=>6379, "protocol"=>"TCP"}], "resources"=>{"limits"=>{"cpu"=>"100m"}}, "terminationMessagePath"=>"/dev/termination-log", "imagePullPolicy"=>"IfNotPresent", "securityContext"=>{"capabilities"=>{}}}, {"name"=>"php-redis", "image"=>"kubernetes/example-guestbook-php-redis", "securityContext"=>{"capabilities"=>{}}, "resources"=>{"limits"=>{"cpu"=>"100m", "memory"=>"50000000"}}, "terminationMessagePath"=>"/dev/termination-log", "imagePullPolicy"=>"IfNotPresent"}], "restartPolicy"=>{"always"=>{}}, "dnsPolicy"=>"ClusterFirst"}, "status"=>{"phase"=>"Pending"}}

(notice the entry nil=>[{"hostPort"=>6379, "containerPort"=>6379, "protocol"=>"TCP"}], which was parsed from the "ports" key above)

thisismydesign added a commit that referenced this issue Sep 15, 2017
@thisismydesign

Owner

commented Sep 15, 2017

Turns out the cause was identified correctly (values within an array were handled as if they had keys), but I made wrong assumptions when fixing it. v1.1.2 should be fine; I also added more tests covering the handling of arrays. Please verify again.

@ilackarms

Author

commented Sep 18, 2017

Now I'm getting the whole object back as a single array. What I want to do is yield each object within the array one-by-one. Is this possible with json-streamer? I've tried playing with combinations of key: 'items' and nesting_level: X, but so far nothing has worked.

@thisismydesign

Owner

commented Sep 18, 2017

Since the items key points to an array, it will return an array; there's no way around that using the key matcher.

However..

Input: #7 (comment)
Parameters: {nesting_level: 2, yield_values: false}
Result:

{"metadata"=>{"name"=>"kubernetes", "namespace"=>"default", "selfLink"=>"/api/v1/services/kubernetes?namespace=default", "uid"=>"016e9dcd-ce39-11e4-ac24-3c970e4a436a", "resourceVersion"=>"6", "creationTimestamp"=>"2015-03-19T15:08:16+02:00", "labels"=>{"component"=>"apiserver", "provider"=>"kubernetes"}}, "spec"=>{"port"=>443, "protocol"=>"TCP", "selector"=>"null", "clusterIP"=>"10.0.0.2", "containerPort"=>0, "sessionAffinity"=>"None"}, "status"=>{}}

{"metadata"=>{"name"=>"kubernetes-ro", "namespace"=>"default", "selfLink"=>"/api/v1/services/kubernetes-ro?namespace=default", "uid"=>"015b78bf-ce39-11e4-ac24-3c970e4a436a", "resourceVersion"=>"5", "creationTimestamp"=>"2015-03-19T15:08:15+02:00", "labels"=>{"component"=>"apiserver", "provider"=>"kubernetes"}}, "spec"=>{"port"=>80, "protocol"=>"TCP", "selector"=>"null", "clusterIP"=>"10.0.0.1", "containerPort"=>0, "sessionAffinity"=>"None"}, "status"=>{}}

Is this what you're looking for?

Using v1.3.0 (latest).
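As a rough illustration of what "nesting level 2" selects, the sketch below walks an already-parsed document in memory and collects every object or array found at a given depth, with the root at level 0. It is not streaming and does not use json-streamer at all; `each_container_at` and the tiny sample document are hypothetical names and data made up for this example.

```ruby
require 'json'

# Yields every Hash or Array located exactly target_level steps below the
# root (root = level 0), together with the chain of enclosing keys.
# Array elements have no key of their own, so nil is recorded for them.
def each_container_at(node, target_level, level = 0, keys = [], &block)
  return unless node.is_a?(Hash) || node.is_a?(Array)

  if level == target_level
    block.call(node, keys)
    return
  end

  children = node.is_a?(Hash) ? node.to_a : node.map { |child| [nil, child] }
  children.each do |key, child|
    each_container_at(child, target_level, level + 1, keys + [key], &block)
  end
end

doc = JSON.parse(<<~JSON)
  {
    "metadata": {"selfLink": "/api/v1/services"},
    "items1": [{"name": "a"}, {"name": "b"}],
    "items2": [{"name": "c"}]
  }
JSON

all_level_two = []
only_items1 = []
each_container_at(doc, 2) do |object, keys|
  all_level_two << object
  only_items1 << object if keys.first == 'items1'
end
```

Selecting by level alone picks up the elements of every top-level array, while additionally checking the enclosing key narrows the result to a single array.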

@ilackarms

Author

commented Sep 18, 2017

@thisismydesign I'm looking for the same output, but with the parameters {key: 'items', nesting_level: 2}. However, I notice that everything at nesting_level 2 gets printed. I'd only like to access the contents of the array "items". Is that possible?

@ilackarms

Author

commented Sep 18, 2017

e.g. with json body

{
    "kind": "ServiceList",
    "apiVersion": "v1",
    "metadata": {
        "selfLink": "/api/v1/services",
        "resourceVersion": "59"
    },
    "items1": [
        {
            "metadata": {
                "name": "kubernetes",
                "namespace": "default",
                "selfLink": "/api/v1/services/kubernetes?namespace=default",
                "uid": "016e9dcd-ce39-11e4-ac24-3c970e4a436a",
                "resourceVersion": "6",
                "creationTimestamp": "2015-03-19T15:08:16+02:00",
                "labels": {
                    "component": "apiserver",
                    "provider": "kubernetes"
                }
            },
            "spec": {
                "port": 443,
                "protocol": "TCP",
                "selector": "null",
                "clusterIP": "10.0.0.2",
                "containerPort": 0,
                "sessionAffinity": "None"
            },
            "status": {}
        },
        {
            "metadata": {
                "name": "kubernetes-ro",
                "namespace": "default",
                "selfLink": "/api/v1/services/kubernetes-ro?namespace=default",
                "uid": "015b78bf-ce39-11e4-ac24-3c970e4a436a",
                "resourceVersion": "5",
                "creationTimestamp": "2015-03-19T15:08:15+02:00",
                "labels": {
                    "component": "apiserver",
                    "provider": "kubernetes"
                }
            },
            "spec": {
                "port": 80,
                "protocol": "TCP",
                "selector": "null",
                "clusterIP": "10.0.0.1",
                "containerPort": 0,
                "sessionAffinity": "None"
            },
            "status": {}
        }
    ],
    "items2": [
        {
            "metadata": {
                "name": "kubernetes",
                "namespace": "default",
                "selfLink": "/api/v1/services/kubernetes?namespace=default",
                "uid": "016e9dcd-ce39-11e4-ac24-3c970e4a436a",
                "resourceVersion": "6",
                "creationTimestamp": "2015-03-19T15:08:16+02:00",
                "labels": {
                    "component": "apiserver",
                    "provider": "kubernetes"
                }
            },
            "spec": {
                "port": 443,
                "protocol": "TCP",
                "selector": "null",
                "clusterIP": "10.0.0.2",
                "containerPort": 0,
                "sessionAffinity": "None"
            },
            "status": {}
        },
        {
            "metadata": {
                "name": "kubernetes-ro",
                "namespace": "default",
                "selfLink": "/api/v1/services/kubernetes-ro?namespace=default",
                "uid": "015b78bf-ce39-11e4-ac24-3c970e4a436a",
                "resourceVersion": "5",
                "creationTimestamp": "2015-03-19T15:08:15+02:00",
                "labels": {
                    "component": "apiserver",
                    "provider": "kubernetes"
                }
            },
            "spec": {
                "port": 80,
                "protocol": "TCP",
                "selector": "null",
                "clusterIP": "10.0.0.1",
                "containerPort": 0,
                "sessionAffinity": "None"
            },
            "status": {}
        }
    ]
}

i'd like to yield the contents of items1, item-by-item

@thisismydesign

Owner

commented Sep 18, 2017

Not possible, unfortunately. The closest you can get is what you already did: get the items1 array and iterate over it. Is that acceptable for your use case?

I think this would be possible with JSONPath though. As I mentioned in your PR, I'll think about supporting it, but I'm not sure about the effort yet.

@ilackarms

Author

commented Sep 18, 2017

that would defeat the purpose of using json-streamer; what we want to do is yield items one-by-one from a very large array rather than having to load the whole thing into memory

@thisismydesign

Owner

commented Sep 18, 2017

Just wanted to point out that, depending on how the data is distributed across the separate itemsN keys, it may still be an improvement.

In any case: how important is this for you? Would you consider implementing your own solution or is it rather just nice to have?

@thisismydesign

Owner

commented Sep 18, 2017

Actually.. since the aggregator is exposed you can technically do this:

nesting_level = 2
key = 'items1'
streamer.get(nesting_level: nesting_level, yield_values: false) do |object|
  if streamer.aggregator[nesting_level-2]&.dig(:key) == key
    p "ensured that #{object} is within #{key}"
  end
end

Probably easier than reinventing the wheel, at least for now.

I plan to keep aggregator exposed for exactly such cases.

@thisismydesign

Owner

commented Sep 19, 2017

I was planning to do some refactoring and now was a great time to do it as it solves your issue.

In v2.0.0 I created abstraction layers for the callback handler and conditions. This allows:

  • substituting json-stream with any other parser that provides SAX-like events
  • custom conditions that can handle virtually any scenario

For your use case:

conditions = Json::Streamer::Conditions.new
conditions.yield_object = lambda do |aggregator:, object:|
  aggregator.level.eql?(2) && aggregator.key_for_level(1).eql?('items1')
end

streamer.get_with_conditions(conditions) do |object|
  p object
end

See also the new section of the README and this test case.
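The condition above can be exercised without the gem to see which positions it accepts. In the sketch below, `FakeAggregator` is a hypothetical stand-in that only answers the two calls the lambda makes (`level` and `key_for_level`); it is an assumption for illustration, not the library's aggregator.

```ruby
# Hypothetical stand-in for the aggregator: it knows the current nesting
# level and which key encloses each level.
FakeAggregator = Struct.new(:level, :keys_by_level) do
  def key_for_level(level)
    keys_by_level[level]
  end
end

# Same predicate as in the comment above: accept objects at level 2 whose
# enclosing level-1 key is 'items1'.
yield_object = lambda do |aggregator:, object:|
  aggregator.level.eql?(2) && aggregator.key_for_level(1).eql?('items1')
end

inside_items1 = FakeAggregator.new(2, { 1 => 'items1' })
inside_items2 = FakeAggregator.new(2, { 1 => 'items2' })
too_shallow   = FakeAggregator.new(1, { 1 => 'items1' })

results = [inside_items1, inside_items2, too_shallow].map do |aggregator|
  yield_object.call(aggregator: aggregator, object: {})
end
```

Only the first case satisfies both parts of the predicate, which is why elements of items2 and the items1 array itself would not be yielded.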

@ilackarms

Author

commented Sep 19, 2017

cool! will try it out now

@thisismydesign

Owner

commented Sep 21, 2017

@ilackarms any update?

@ilackarms

Author

commented Sep 25, 2017

perfect! works well. thank you so much for the support

@ilackarms ilackarms closed this Sep 25, 2017
@thisismydesign

Owner

commented Sep 26, 2017

Glad to hear that it's working. My pleasure. :)
