Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better key matching? #16

Closed
simlu opened this issue Apr 26, 2019 · 28 comments
Closed

Better key matching? #16

simlu opened this issue Apr 26, 2019 · 28 comments
Labels
⭐⭐⭐ SSS ⭐⭐⭐ Super Star Support Plan - put a star on project to have such an awesome label on your issue. wontfix This will not be worked on

Comments

@simlu
Copy link

simlu commented Apr 26, 2019

I was thinking about writing a library like this myself :) I'm the author of object-scan and was thinking it might be a great fit to improve key matching and efficiency of this library (I'm not a big fan of regex matching).

Take a look and let me know what you think! It would certainly be more in line with the existing lodash selector syntax.

Would be very happy to help with any questions! Cheers, L~

@YuriGor
Copy link
Owner

YuriGor commented Apr 26, 2019

Hi @simlu, welcome to my tree :)

Yes, I have such thoughts, but I still had no chance to discuss them, so sorry if I will overload you with ideas below, you inspired this)

It would be nice to have some wildcard match method and let user use masks as some shorthand instead of iteratee/predicate.

Right now I use my own in the omitDeep and pickDeep methods. It currently works as endsWith and/or regexp,

So instead of endsWith criteria it may accept wildcard mask.

As I see object-scan has no dependencies,
so do you use your own wildcard mask matching implementation?

Can you expose it (or better publish as a esm package) separately?

Or I'd better steal a piece of code?
I will need something like isMatch(path, mask): boolean - to check given path
and possibly getMatch(path, mask): array - to get a list of matching substrings.
Do you use regexps internally?

It would certainly be more in line with the existing lodash selector syntax.

I am not sure what are you talking about, does Lodash have some selectors support?

All I saw somehow related to Lodash selectors is separate lib spotlight also by @jdalton.

Did you try it? Looks promising, but I had no time for testing.

Combining spotlight and, suddenly, CSS selector syntax may unleash a power I afraid even think about:

  • datatypes ==> tag names
  • field names | indexes ==> classes
  • value matching ==> attribute selectors
  • same hierarchy selectors
    • with any nested child by space,
    • immediate child by greaterthan,
    • immediate sibling by plus,
    • nth child and nth child of type

BTW I used to implement relative JS object paths using colon : symbol to refer one level up (like .. in filesystem relative paths ) and slash / to refer a root, so any other paths treated as starting from the current level of iteration/declaration.

All the iteratees/predicates in deepdash already receive a context object as an argument, with full parents stack, so relative paths will also be easy to apply.

Sounds like a great separate project) Maybe it already exists somewhere in the npm)

@simlu
Copy link
Author

simlu commented Apr 26, 2019

There is a lot in here. Let me try to reply to it in order :)

The reason why I dislike regex matching against path is that this requires the whole object to be traversed. Using glob-style syntax (which is what object-scan aims to support as per blackflux/object-scan#279), you don't need to traverse the entire object, which can result in massive performance improvements. It further is much more intuitive and accessible than using regex. I would encourage you not to support regex matching for entire paths and go with glob matching instead. Is there a good reason that you need to support it? Are there any use cases you can think of that are not supported?

Yes, object-scan uses it's own wildcard matching. The reason is that it is smart about what parts of the input it needs to traverse. Globbing against all paths would be possible, but very bad performance wise. Wildcard/glob style matching for path segments is done here. So yes, regex are used internally, but only for path segments(!)

Regarding spotlight: The functionality would be very easy to recreate with object-scan. The idea of object-scan is that it is very generic and very powerful, so that it can support use-cases like deepdash easily. All your use cases and what you are thinking of is supporting becomes very easy with object-scan.

Take a good look at how object-scan operates and the more complex test examples that use callbacks. I think it would simplify and generify your library significantly. I've scanned over all your functions and most of them would be less than 15 lines to implement with object scan. I might create some examples tonight if you're interested!

Unrelated little note on pathToString: I'm pretty sure the escape convention is to use backslash. Not quoting. Trying to be lodash compatible that is.

@YuriGor
Copy link
Owner

YuriGor commented Apr 26, 2019

About pathToString, it actually uses back slashes
If you see some bug, please report

About regexp: I already dropped it in the 'tree' iteration mode so for now only constant "children" paths are supported - it allowed me to not iterate over all the possible ways just for testing them with regex if it's a children collection or not, so I know that feel, bro.

I agree - using some glob as a pre-filter before actual iteration can add a lot of benefits,
if you made it work per level, not by full paths matching.
I see it more as a separate method, so it can be used in the chain/flow:

_(obj).glob('['a.*.f']').filterDeep((v,k,c,p)=>{ /* do some extra filterig */ }).eachDeep((v,k,p,c)=>{/* do final job */});

I'll think about it.
If you have some time for wasting such way - I'll be happy to see your examples 👍

@simlu
Copy link
Author

simlu commented Apr 27, 2019

Oh wow, I had no idea "." is the same as ["\\""].["\\""] using { '"': { '"': 'a' } }.

That's up to you. I like doing a single iteration, but chaining is certainly also possible. The idea is that object-scan does a single pass: On the "top-down" part of the iteration it allows for "breaking" the traversal and on the "bottom-up" part the results can be filtered. But you can call it multiple times. It's extremely versatile and powerful.

Here is a basic example of how you'd implement pickDeep:

const set = require('lodash.set');
const objectScan = require('object-scan');

const pickDeep = (haystack, needles) => {
  const result = {};
  objectScan(needles, {
    filterFn: (key, value) => set(result, key, value)
  })(haystack);
  return result;
}


const input = {
  a: { b: 'c', d: 'e' },
  f: { g: 'h' }
};

console.log(pickDeep(input, ['a.b']));

Demo: https://frontarm.com/demoboard/?id=1c33ad8a-36e1-4b54-adf0-1fcea40acbca

You do have a lot of options, but as far as I can tell, they're all easily supported. I'm really just trying to demonstrate the power and ease of object-scan here. If you have questions on how you'd achieve a specific use case I'd be very happy to give input.

@YuriGor
Copy link
Owner

YuriGor commented Apr 27, 2019

Your example works more like _.get implementation, not _.pickDeep
Take a look at this fork:
https://frontarm.com/demoboard/?id=d90dc3cc-22a1-4399-8da5-567e629dc025
to compare with deepdash alongside.
I changed your wildcard mask to make it behave like pickDeep (['a.b','*.a.b']),
but it still works differently
To make it work the same I will need double star ** glob pattern, but it looks like doesn't work in this case.
Maybe I don't know the correct syntax yet, could you please check.

@YuriGor YuriGor added the wontfix This will not be worked on label Apr 27, 2019
@YuriGor
Copy link
Owner

YuriGor commented Apr 27, 2019

Finally, I think I will not add wildcard support:

  • the best place for glob is pickDeep/omitDeep, but the main idea of these methods is to extract a subset from some regular data (tree-like collection of objects). Same as its flat analog, but instead of absolute keys/paths, it expects relative ones. So I think endsWith is still the best way for default simplest string criteria handling.
  • I have no resources to reimplement level-by-level wildcard pattern matching for now, maybe later as a separate method, but it already exists as a separate package, so why not to use it explicitly, if someone needs it?
  • And just replacing regexp with glob patterns and still use regexp internally to match full path doesn't make sense. Regexp is a native part of Javascript, it's a very powerful tool and I have nothing against letting use it for matching paths. If someone prefers wildcards - minimatch package is a way to go. No need to include it as a dependency.

So to recap - I think using endsWith for simple cases + Regexp for complicated ones will be enough to cover all the possible needs with minimal overhead.
Wildcard matching is a specific search case, and I personally never needed it in my wide practice of processing huge nested data, so let it be out of the scope of this library.

@simlu
Copy link
Author

simlu commented Apr 27, 2019

As to your test case. I've adjusted it here https://frontarm.com/demoboard/?id=9a39401c-5b82-4669-9e10-d5d3b1a508e9

@simlu
Copy link
Author

simlu commented Apr 27, 2019

Finally, I think I will not add wildcard support:

  • the best place for glob is pickDeep/omitDeep, but the main idea of these methods is to extract a subset from some regular data (tree-like collection of objects). Same as its flat analog, but instead of absolute keys/paths, it expects relative ones. So I think endsWith is still the best way for default simplest string criteria handling.

Right. And I'm saying object scan would work great for that (see example above). I also believe that all your methods would be easily implemented using object scan. Which one do you think is not a good fit and I'll take a stab at it ;)

  • I have no resources to reimplement level-by-level wildcard pattern matching for now, maybe later as a separate method, but it already exists as a separate package, so why not to use it explicitly, if someone needs it?

That's exactly my point. You don't need to. Just use object scan.

  • And just replacing regexp with glob patterns and still use regexp internally to match full path doesn't make sense. Regexp is a native part of Javascript, it's a very powerful tool and I have nothing against letting use it for matching paths. If someone prefers wildcards - minimatch package is a way to go. No need to include it as a dependency.

Not sure I follow on this one. How would you use minimatch with this library?

So to recap - I think using endsWith for simple cases + Regexp for complicated ones will be enough to cover all the possible needs with minimal overhead.
Wildcard matching is a specific search case, and I personally never needed it in my wide practice of processing huge nested data, so let it be out of the scope of this library.

That's like saying "I never needed react for front end dev, good old vanilla js always worked fine for me" :) I'm a huge fan of regex and it does have its place. It's also very powerful and complicated and a lot of users don't understand it. That's why no one uses it for path matching. Glob is the golden standard for path matching. Regex is not.

Having said all that, it's your library and there is a lot of work put into it (never be scared to "kill your darlings" though). Are there other libraries like yours that use regex for path matching like you do? I've never seen it before.

@YuriGor
Copy link
Owner

YuriGor commented Apr 27, 2019

Deepdash actually doesn't offer a path matching.
pathMatches method is not documented, it's exposed as API for historical reasons and doesn't provide any real value for end-user.

The core method is forEachDeep - it tries to conform his flat older brother Array.forEach but in the context of nested objects and the most often specific case tree-like structures with predefined "children" collection field name.
Another big stone - filterDeep - it also tries to mimic native filter by the same way.
All the rests are related utilities or kind of shorthands based on these two.

So regexp here is a cheap backdoor for complex stuff. It may help someone in some cases, others will ignore it.

Your final pickDeep implementation does not look as sexy as "get me some data by this pattern" anymore.
Also, I see no performance advantage with your implementation
https://frontarm.com/demoboard/?id=64bb97bd-f3e7-4c8a-8e1b-72927e98bf59

Maybe it will be more obvious with another use case, but I don't expect a huge difference in the most real-world examples.

@simlu
Copy link
Author

simlu commented Apr 27, 2019

Deepdash actually doesn't offer a path matching.
pathMatches method is not documented, it's exposed as API for historical reasons and doesn't provide any real value for end-user.

The core method is forEachDeep - it tries to conform his flat older brother Array.forEach but in the context of nested objects and the most often specific case tree-like structures with predefined "children" collection field name.
Another big stone - filterDeep - it also tries to mimic native filter by the same way.
All the rests are related utilities or kind of shorthands based on these two.

I see, so it would really be a matter of supporting those two. Seems easy enough. I might give an example later.

So regexp here is a cheap backdoor for complex stuff. It may help someone in some cases, others will ignore it.

It sounds like regex is here just because that's how you implemented it. Not really because it's helping with any major use case. Good argument to get rid of it :)

Your final pickDeep implementation does not look as sexy as "get me some data by this pattern" anymore.
Also, I see no performance advantage with your implementation
https://frontarm.com/demoboard/?id=64bb97bd-f3e7-4c8a-8e1b-72927e98bf59

Sorry, I just hacked it in. This is sexy again :)
https://frontarm.com/demoboard/?id=5950523e-2567-4bc1-88bb-f0b285d24dda

I see a performance improvement of ~40 percent? I consider that significant. This is a weird case because it requires a full tree traversal since you are looking for "endsWith". In other cases the difference will be much bigger.

Maybe it will be more obvious with another use case, but I don't expect a huge difference in the most real-world examples.

I'm seeing a significant improvement. Can you please double check?

@simlu
Copy link
Author

simlu commented Apr 27, 2019

Just playing around with forEachDeep. The signature for objectScan filterFn can be found here. It's very similar to your iterator signature, hence this should be an easy transition as far as I can tell (minus the regex part).

Here is what I did. It's only a PoC: https://frontarm.com/demoboard/?id=62019249-4e0b-4207-b0c5-1bc2ef7049fe

I don't think the goal would be to translate the signatures, but rather to see if the use cases are solvable using objectScan and filterFn or breakFn directly.


tl;dr Taking a step back. You've written an extension to lodash around your own forEachDeep. I've basically written forEachDeep as a separate package (single purpose). Conceptually it would make sense for you to use it. It would be a good amount work, but ultimately the separation would make things a lot cleaner.

There are a lot of reasons why you might decide against doing it: E.g. too much work, don't trust my code, etc. But I think the regex reason is a very weak one.

Anyways, I've said everything I wanted to say here I think. I'd be happy to help with any questions though.

@YuriGor
Copy link
Owner

YuriGor commented Apr 27, 2019

Maybe performance was so for hacked version only, but I saw my code was slightly faster.

I am quite busy right now, Ill take a look in few days, and will start our horses again :)

Thank you for your attention, I really appreciate it, you already made me to look at the problem from the different point.
It's not so easy to find motivated and skilled reviewer, who is ready to spend a time for this.
So be ready I'll steal some ideas from your code in the future ;)

Definitely, some wild regexp bitten you, when you was a kid: there is no regexp in the forEachDeep nor in the filterDeep, it used in the pick\omitDeep only

@YuriGor
Copy link
Owner

YuriGor commented Apr 27, 2019

In your most recent fiddle I see the same picture:

  • a very first run shows your method is 40% faster
  • but subsequent page reloads show mine is 14% ahead

Comparing deepDash.pickDeep with objectScan.pickDeep >>>>> ok <<<<<
VM40 8058525ee5952d818c1c0e294c9d4365.js:1 Method objectScan.pickDeep (827.23ms) is 40.66% faster then deepDash.pickDeep (1393.93ms)
VM290 8058525ee5952d818c1c0e294c9d4365.js:1 Comparing deepDash.pickDeep with objectScan.pickDeep >>>>> ok <<<<<
VM290 8058525ee5952d818c1c0e294c9d4365.js:1 Method deepDash.pickDeep (1404.07ms) is 15.6% faster then objectScan.pickDeep (1663.58ms)
VM538 8058525ee5952d818c1c0e294c9d4365.js:1 Comparing deepDash.pickDeep with objectScan.pickDeep >>>>> ok <<<<<
VM538 8058525ee5952d818c1c0e294c9d4365.js:1 Method deepDash.pickDeep (1406.29ms) is 14.94% faster then objectScan.pickDeep (1653.37ms)
8058525ee5952d818c1c0e294c9d4365.js:1 Comparing deepDash.pickDeep with objectScan.pickDeep >>>>> ok <<<<<
8058525ee5952d818c1c0e294c9d4365.js:1 Method deepDash.pickDeep (1443.75ms) is 14.55% faster then objectScan.pickDeep (1689.64ms)

I don't know what is it, some cache/init/whatever. I've never focused on performance in the Deepdash before.

@YuriGor
Copy link
Owner

YuriGor commented Apr 27, 2019

I tested the same script in the local nodejs at my laptop and got stable result:
your method is always 20% faster.

const set = require('lodash/set');
const times = require('lodash/times');
const isEqual = require('lodash/isEqual');
const minBy = require('lodash/minBy');
const objectScan = require('object-scan');
const pickDeepdash = require('deepdash/pickDeep');
const { performance } = require('perf_hooks');

const pickDeep = (haystack, needles) => {
  const result = {};
  objectScan(needles, {
    filterFn: (key, value) => set(result, key, value),
  })(haystack);
  return result;
};

const input = {
  a: { b: 'c', d: 'e', i: { a: { b: 'c again' } } },
  f: { g: 'h', a: { b: 'c', d: 'e', i: { a: { b: 'c again' } } } },
};

const fn = Object.entries({
  'deepDash.pickDeep': () => pickDeepdash(input, ['a.b']),
  'objectScan.pickDeep': () => pickDeep(input, ['**.a.b']),
});

// --------------------
const round = (v) => Math.round(v * 100) / 100;

for (let idx1 = 0; idx1 < fn.length; idx1 += 1) {
  for (let idx2 = idx1 + 1; idx2 < fn.length; idx2 += 1) {
    console.log(
      'Comparing',
      fn[idx1][0],
      'with',
      fn[idx2][0],
      '>>>>>',
      isEqual(fn[idx1][1](), fn[idx2][1]()) ? 'ok' : 'ERROR',
      '<<<<<'
    );
  }
}

for (let idx = 0; idx < fn.length; idx += 1) {
  let start = performance.now();
  times(100000, fn[idx][1]);
  fn[idx].push(performance.now() - start);
}

for (let idx1 = 0; idx1 < fn.length; idx1 += 1) {
  for (let idx2 = idx1 + 1; idx2 < fn.length; idx2 += 1) {
    const input = [fn[idx1], fn[idx2]];
    const first = minBy(input, (e) => e[2]);
    const second = minBy(input, (e) => -e[2]);
    console.log(
      `Method ${first[0]} (${round(first[2])}ms) is ${round(
        (100 * (second[2] - first[2])) / second[2]
      )}% faster then ${second[0]}  (${round(second[2])}ms)`
    );
  }
}

So never trust frontarm to measure performance anymore.

@simlu
Copy link
Author

simlu commented Apr 27, 2019

Makes sense. The browser probably has all kinds of freezing protections built in.

I'd be interesting to see performance for traversal where only specific keys are of interest. That one could be big

@YuriGor
Copy link
Owner

YuriGor commented Apr 27, 2019

Example of 'specific key' in terms of a wildcard mask?
If you mean full paths starting from the root, then you need to compete with Lodash _.pick since it supports fixed deep paths.

if you mean any path ending with a single field, then

deepdash.pickDeep(input,['b']) vs objectScan.pickDeep(input,['**.b'])
 --> objectScan ~35% faster

When picking a lot of fields, performance tends to be almost equal:

deepdash.pickDeep(input,['b','d','g']) vs objectScan.pickDeep(input,['**.{b,d,g}'])
 --> deepdash ~5% faster

So yes, as expected, iterating over each possible path and check if we need this path or not is less productive.

I think I need to stop visiting each node and testing if current paths matched given criteria,
but iterate over objects/arrays only and test if they _.has something by the path we need.
As a better candidate filter, I can guess who will be next in the path chain, an array or an object, and skip some extra useless probing.

@simlu
Copy link
Author

simlu commented Apr 27, 2019

Performance aside, isn't your pickDeep very restrictive? Is there a reason you are only focusing on path endings?

Given the following input:

Show
[
  {
    "_id": "5cc4bc4b015742a0d80c8b1a",
    "name": "Dalton Hull",
    "friends": [
      {
        "id": 0,
        "name": "Hillary Perry"
      },
      {
        "id": 1,
        "name": "Cote Kim"
      },
      {
        "id": 2,
        "name": "Dolly Curtis"
      }
    ],
    "children": [
      {
        "id": 0,
        "name": "Nicole Mcdaniel"
      }
    ],
    "parents": [
      {
        "id": 0,
        "name": "Francis Landry"
      },
      {
        "id": 1,
        "name": "Alexandra Byrd"
      }
    ]
  },
  {
    "_id": "5cc4bc4bfd19ec41acfe1a43",
    "name": "Dale Booth",
    "friends": [
      {
        "id": 0,
        "name": "Mcdowell Reyes"
      },
      {
        "id": 1,
        "name": "Moses Mcguire"
      },
      {
        "id": 2,
        "name": "Selena Phelps"
      }
    ],
    "children": [
      {
        "id": 0,
        "name": "Golden Vasquez"
      }
    ],
    "parents": [
      {
        "id": 0,
        "name": "Peck Herman"
      },
      {
        "id": 1,
        "name": "Ada Crosby"
      }
    ]
  },
  {
    "_id": "5cc4bc4b598e629b835fc2ff",
    "name": "Cameron Stout",
    "friends": [
      {
        "id": 0,
        "name": "Burgess York"
      },
      {
        "id": 1,
        "name": "Martina Valentine"
      },
      {
        "id": 2,
        "name": "Anita Browning"
      }
    ],
    "children": [
      {
        "id": 0,
        "name": "Young Marshall"
      },
      {
        "id": 1,
        "name": "Bryant Bright"
      }
    ],
    "parents": [
      {
        "id": 0,
        "name": "Stewart Hays"
      },
      {
        "id": 1,
        "name": "Saunders Craig"
      }
    ]
  }
]

How would you find all top level names?

  • ["[*].name"] (very performant)

How would you find all the names that are not friends?

  • ["!**.friends.name", "**.name"] (works in arbitrary nesting location)

Or how would you find all the first friend names?

  • ["[*].friends[0].name"] (very performant)

This is where objectScan really shines since only the relevant branches of the input are traversed.

It is very easy to build on top of the abstraction that object-scan provides. That didn't just happen - that's due to lots and lots of iterations and refactoring, reevaluating, ripping out things and implementing others. It's now finally in a really great state where most use cases are just super easy to handle. ---- as you can probably tell I'm on love with the project. It feels very complete and I don't have a lot of those projects :)

But yeah, I'd love to hear the reasoning here. Is this something that lodash does somewhere?

@YuriGor
Copy link
Owner

YuriGor commented Apr 27, 2019

Yes, it's beautiful.

Here is how deepdash supposed to be used in such cases:

// How would you find all top-level names?
data.map(p => p.name); // It's not mine! I only published an ad!

// How would you find all the names that are not friends?
_.reduceDeep(data,(res,v)=>{
  res.push(v.name);
  return res;
},[],{childrenPath:['parents','children']});

// Or how would you find all the first friend names?
_.reduceDeep(data,(res,v,k,p,c)=>{
  if(c.childrenPath =='friends' && k==0) res.push(v.name);
  return res;
},[],{childrenPath:'friends'});
// we need c.childrenPath check 
// because it's undefined for top-level and we dont need top-level

It looks not so elegant, but it's a javascript.
No new syntax, familiar API, do everything you need.
That's why I love Lodash and why I try to follow the same philosophy in Deepdash.

(now I know why mapDeep should exist, to make examples above shorter a bit!)

@simlu
Copy link
Author

simlu commented Apr 27, 2019

The problem with your examples is that they are not very flexible. I'm dealing with dynamic data structures a lot where I don't know the exact format. Consider form data for example: https://github.com/blackflux/object-scan/blob/master/test/integration/form.json

When you have more complex data the "JS" approach you are suggesting gets very cumbersome and error prone. I'm speaking from experience.

No new syntax, familiar API, do everything you need.

objectScan uses glob. It's very established. This is not a new syntax. It was invented exactly for this very reason: To make it easier to filter object hierarchies (originally for file structures). I very much disagree with your reasoning of "need to use javascript".

@simlu
Copy link
Author

simlu commented Apr 27, 2019

Let me give you another real world use case. We are managing huge configuration files. These are build from templates to make them manageable. In arbitrary locations resources can be defined. The only rule is that they are nested inside a resource object.

some:
  path:
    resources:
      myResource:
        type: 'A'
      yourResource:
        type: 'B'

We now need to check that all resources of type B have backups configured. We do that by running

const badResources = objectScan(['**.resource.*'], {
  filterFn: (key, value) => value.type === 'B' && value.backup !== true
})(configuration);

Now with your approach you would have to target resources and then iterate over them. Not bad, but you still doing one additional iteration. The idea is to keep the callback function very simple. Moving complexity into the "pre-select" is a good thing here.

@YuriGor
Copy link
Owner

YuriGor commented Apr 28, 2019

Now with your approach you would have to target resources and then iterate over them.

Why?

_.eachDeep(files,(value,key,parentValue,ctx)=>{
  if(ctx.parent.key=='resources' && value.type=='B' && value.backup !== true){
    // do your job
  }
},
{leavesOnly:false,onFalse:{skipChildren:false}})

or if you need the only tree with bad resources:

let badResources = _.filterDeep(data,(value,key,parentValue,ctx) => 
    ctx.parent.key=='resources'
    && value.backup !== true
    && value.type=='B',
{leavesOnly:false,onFalse:{skipChildren:false}});

Take a look here
it's a full JSON scheme for echarts.

They build their docs using this JSON.
And I build proprietary options editor and also extend echarts as customer needs.
So this JSON is an ocean where Deepdash was born and survived initially.
I need to normalize, fix(it's incorrect) update with new versions, split into smaller chunks, to make it a bit more loadable into the browser, build chart editor UI, generate compatible echarts options, hide some options, extend some options, decorate/alias/substitute some options, suggest autocompletion when possible, inject our own self-implemented stuff and so on.

Where was your lib 3-4 years ago, when I so needed it? :)

Moving complexity into the "pre-select" is a good thing here.

Yes, I agree, as I said before

using some glob as a pre-filter before actual iteration can add a lot of benefits.

I need to meditate on this.
I believe it should be a separate method with a kind of lazy callback evaluation support.
Lodash has such things, via _.chain and _.flow, but chain is not a very good option, because of it requires whole the lodash instance to be loaded, so it doesn't work with cherry-pick.
And I don't think it can be reused in our deep case.
I still don't know how it works in lodash. Iteration and callback invocation should be decoupled I suppose, then all the subsequent callbacks will be collected before actual iteration.
Right now I have two iteration modes:

  • 'objects' - field by field on each nested level
  • 'tree' - iteration over specific children collections only, set by the childrenPath option.
    I wanted to separate them initially, but they are still pretty same so it will be code duplication.
  • now you are hypnotizing me with yet another iterator 'glob'.

If I'll add it, 'tree' will be just a specific case of 'glob', and all the methods will sit on top of some abstract deep iterator and handle callbacks some way.
Main issues:

  • consistent parents stack will be lost, or maybe not.
  • circular dependency check will work not so clear without parents stack.
  • heavier implementation - can be pluggable.
  • a lot of re-work and tests rewriting

Finally, it will be beautiful to have well-integrated glob iterator as an option in the deepdash, but it's enough to use it as a standalone tool in case of need.

@simlu
Copy link
Author

simlu commented Apr 28, 2019

Now with your approach you would have to target resources and then iterate over them.

Why?

_.eachDeep(files,(value,key,parentValue,ctx)=>{
  if(ctx.parent.key=='resources' && value.type=='B' && value.backup !== true){
    // do your job
  }
},
{leavesOnly:false,onFalse:{skipChildren:false}})

or if you need the only tree with bad resources:

let badResources = _.filterDeep(data,(value,key,parentValue,ctx) => 
    ctx.parent.key=='resources'
    && value.backup !== true
    && value.type=='B',
{leavesOnly:false,onFalse:{skipChildren:false}});

Fair enough. Lot's of options to be passed in though. I'm assuming the onFalse:{skipChildren:false} allows further iteration even when false is returned from the iterator.

In objectScan there are two callback functions. One that allows you to break the iteration (breakFn) and one that allows you to skip a specific matched entry from the result, but doesn't break the iteration (filterFn). It feels cleaner and gives you more flexibility.

Is it possible to differentiate between arrays and objects when you are iterating?

Take a look here
it's a full JSON scheme for echarts.

That's a big json file. Great example for some performance testing :)

They build their docs using this JSON.
And I build proprietary options editor and also extend echarts as customer needs.
So this JSON is an ocean where Deepdash was born and survived initially.
I need to normalize, fix(it's incorrect) update with new versions, split into smaller chunks, to make it a bit more loadable into the browser, build chart editor UI, generate compatible echarts options, hide some options, extend some options, decorate/alias/substitute some options, suggest autocompletion when possible, inject our own self-implemented stuff and so on.

That sounds like a fun project. Would have been a good use for objectRewrite.

Where was your lib 3-4 years ago, when I so needed it? :)

Still swimming in that ancient old ocean 😆

Moving complexity into the "pre-select" is a good thing here.

Yes, I agree, as I said before

👍

using some glob as a pre-filter before actual iteration can add a lot of benefits.

I need to meditate on this.
I believe it should be a separate method with a kind of lazy callback evaluation support.
Lodash has such things, via _.chain and _.flow, but chain is not a very good option, because of it requires whole the lodash instance to be loaded, so it doesn't work with cherry-pick.

Honestly no idea how those work either. Never used them.

And I don't think it can be reused in our deep case.
I still don't know how it works in lodash. Iteration and callback invocation should be decoupled I suppose, then all the subsequent callbacks will be collected before actual iteration.

I guess from a performance perspective you don't want to decouple them though. Ideally the iterator is "smart" enough that you don't need to separate.

Right now I have two iteration modes:

  • 'objects' - field by field on each nested level
  • 'tree' - iteration over specific children collections only, set by the childrenPath option.
    I wanted to separate them initially, but they are still pretty same so it will be code duplication.
  • now you are hypnotizing me with yet another iterator 'glob'.

Haha, just throw them out and use objectScan for everything instead 😉

If I'll add it, 'tree' will be just a specific case of 'glob', and all the methods will sit on top of some abstract deep iterator and handle callbacks some way.
Main issues:

  • consistent parents stack will be lost, or maybe not.

Please take a look at the function signature that I linked to above. Parents are contained already. That's why I am saying our iterators are very similar. They just have different signatures. Or are you talking about something else?

  • circular dependency check will work not so clear without parents stack.

Not a problem since you have access to the parents.

  • heavier implementation - can be pluggable.

Not sure what you mean?

  • a lot of re-work and tests rewriting

Yeah, very much agreed. That's the main issue.

Finally, it will be beautiful to have well-integrated glob iterator as an option in the deepdash, but it's enough to use it as a standalone tool in case of need.

I would be an opportunity to clean up some of your options.

With objectScan I made a very conscious decision to remove options whenever possible and handle things the correct and good way by default. For example: You might want to iterate over an object and delete entries while you do so. I used to have a "sort" option with a second pass, but removed it. Now the iteration order allows you to do removal "on the fly" already. I honestly think that you have a lot of options and I'm sure some of them are not necessary.

But, having said all that it would be a big undertaking. I'd honestly love to see what cases are lurking in this library and how objectScan could support them cleanly! As I said, I always thought about writing a "deepdash" style library backed by objectScan before you published this one.

@YuriGor
Copy link
Owner

YuriGor commented Apr 28, 2019

I'm assuming the onFalse:{skipChildren:false} allows further iteration even when false is returned from the iterator.

Yes, filter deep expects three possible results from the predicate: true, undefined or false.

  • true == yes, grab this value
  • undefined == No, but let's see if need some children
  • false = No-no-no.
    But in general, true, undefined and false are just keys to choose one of predefined behavior described by three aspects:
{
  // should value be cloned deeply or primitives only should be copied(objects array will be empty).
  cloneDeep, 
  // do we need iterate over children?
  skipChildren,
  // should 'empty' node remain if no children passed the filter?
  keepIfEmpty
}

each default preset for true, undefined and false can be redefined in options.
also, a predicate can return such object directly

actually eachDeep has no such options, and filterDeep defaults just works, no need to overconfigurate here.
4 am is not best time for my brains after a loong day :)
So examples above better to write like this:

var files = {some: {path: {resources: {
      myResource: {type: 'A'},
      yourResource:{type: 'B'}
    } } } };

_.eachDeep(files,(value,key,parentValue,ctx)=>{
  if(ctx.parent.key=='resources' && value.type=='B' && value.backup !== true){
    console.log(ctx.path);
  }
});

let badResources = _.filterDeep(files,(value,key,parentValue,ctx) => 
{
  if(ctx.parent.key=='resources'  && value.backup !== true  && value.type=='B')
    return true;
},
{leavesOnly:false});
console.log(badResources);

codepen

Ok, looks like I have to think about all this once this again :)

@simlu
Copy link
Author

simlu commented Apr 28, 2019

Sweet man. Had a lot of fun chatting with ya :)

@YuriGor YuriGor added the ⭐⭐⭐ SSS ⭐⭐⭐ Super Star Support Plan - put a star on project to have such an awesome label on your issue. label Nov 18, 2019
@simlu
Copy link
Author

simlu commented May 15, 2020

I still think https://github.com/YuriGor/deepdash/blob/master/src/private/getIterate.js would benefit from using object-scan (perf wise and for readability). Have to better understand what that file does... Any chance we can refactor it a bit into separate functions to make it more readable?

@YuriGor
Copy link
Owner

YuriGor commented May 15, 2020

Hi! How are you doing? Is it going easy in Canada?

Yes, I have plans to refactor this:

  • I want to avoid recursion - do everything in the plain loop instead, by pushing and popping current context into some array(=stack) depending on the iteration depth.
  • also now there is an internal creation of function heavily coupled with surrounding closure - that's also subject of future refactoring.
  • And I agree it's not readable currently :)

But in general, it still will be one-by-one iteration over all the keys/indexes or children
(unless iteratee will tell to skip something explicitly) because I need to provide the solid stack of parents as a predicate's argument.

So the only way of significant performance improvement I see here is trying to split object tree into branches and iterate over them in parallel.
Last time I analyzed performance - I found bottleneck not in this file, but in pathToString converter method.

@simlu
Copy link
Author

simlu commented May 15, 2020

Hi! How are you doing? Is it going easy in Canada?

Hey! Doing good with the great outdoors nearby! Hope you're doing well yourself! I'll ping ya!

Yes, I have plans to refactor this:

  • I want to avoid recursion - do everything in the plain loop instead, by pushing and popping current context into some array(=stack) depending on the iteration depth.

I think that's a great idea. Still very happy with going iterative instead of recursive in object scan. It makes code a bit less readable, but significantly improved perf

  • also now there is an internal creation of function heavily coupled with surrounding closure - that's also subject of future refactoring.
  • And I agree it's not readable currently :)

Yeah I saw a few things that looked a bit ugly. Refactoring will take work... But will also be fun!

But in general, it still will be one-by-one iteration over all the keys/indexes or children
(unless iteratee will tell to skip something explicitly) because I need to provide the solid stack of parents as a predicate's argument.

So the only way of significant performance improvement I see here is trying to split object tree into branches and iterate over them in parallel.

Doing parallel traversal was a huge perf improvement in object scan and made many use cases possible

Last time I analyzed performance - I found bottleneck not in this file, but in pathToString converter method.

That's what I found too! That's why object scan now has the joined flag set to false by default. In most cases returning and using the array syntax (that lodash understands as well) is very usable.

Keep me posted when you get to some refactoring!

@YuriGor
Copy link
Owner

YuriGor commented May 15, 2020

Just created an issue for this: #55
Subscribe and put a thumb up!
Maybe I will find some time this weekend for this, but not sure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
⭐⭐⭐ SSS ⭐⭐⭐ Super Star Support Plan - put a star on project to have such an awesome label on your issue. wontfix This will not be worked on
Projects
None yet
Development

No branches or pull requests

2 participants