Better key matching? #16
Hi @simlu, welcome to my tree :) Yes, I have had such thoughts, but I haven't had a chance to discuss them yet, so sorry if I overload you with the ideas below; you inspired this :) It would be nice to have a wildcard match method and let users pass masks as a shorthand instead of an iteratee/predicate. Right now I use my own in the So instead of As I see Can you expose it separately (or, better, publish it as an ESM package)? Or had I better steal a piece of the code?
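The "masks as a shorthand for a predicate" idea could look roughly like the sketch below. `maskToPredicate` and `segmentToRegExp` are hypothetical helpers (not part of deepdash or object-scan), and the sketch assumes keys never contain `.` and that `*` matches any run of characters within a single path segment.

```javascript
// Hypothetical helpers, not part of deepdash or object-scan.
// Assumption: keys never contain '.', and '*' never spans more than one segment.

// Turn one mask segment such as 'user*' into an anchored RegExp.
function segmentToRegExp(segment) {
  // Escape every regex metacharacter except '*', then let '*' match any characters.
  const escaped = segment.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp(`^${escaped.replace(/\*/g, '.*')}$`);
}

// Turn a mask such as 'a.*.f' into a predicate over path arrays.
function maskToPredicate(mask) {
  const segments = mask.split('.').map(segmentToRegExp);
  return (path) =>
    path.length === segments.length &&
    segments.every((re, i) => re.test(String(path[i])));
}

const isMatch = maskToPredicate('a.*.f');
console.log(isMatch(['a', 'b', 'f'])); // true
console.log(isMatch(['a', 'b', 'c', 'f'])); // false: '*' covers exactly one segment
console.log(isMatch(['a', 0, 'f'])); // true: indexes are compared as strings
```

Note that this predicate still has to be called on every visited path, so on its own it does not reduce how much of the object gets traversed.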
I am not sure what you are talking about; does Lodash have selector support? The only thing I've seen related to Lodash selectors is a separate lib, spotlight, also by @jdalton. Have you tried it? It looks promising, but I haven't had time to test it. Combining spotlight with, of all things, CSS selector syntax may unleash a power I'm afraid to even think about:
BTW, I used to implement relative JS object paths using a colon. All the iteratees/predicates in deepdash already receive a context object as an argument, with the full parents stack, so relative paths would also be easy to apply. Sounds like a great separate project :) Maybe it already exists somewhere on npm :)
There is a lot in here. Let me try to reply to it in order :) The reason I dislike regex matching against paths is that it requires the whole object to be traversed. With glob-style syntax (which is what object-scan aims to support, as per blackflux/object-scan#279), you don't need to traverse the entire object, which can result in massive performance improvements. It is also much more intuitive and accessible than regex. I would encourage you not to support regex matching for entire paths and to go with glob matching instead. Is there a good reason you need to support it? Are there any use cases you can think of that are not supported? Yes, object-scan uses its own wildcard matching. The reason is that it is smart about which parts of the input it needs to traverse. Globbing against all paths would be possible, but very bad performance-wise. Wildcard/glob-style matching for path segments is done here. So yes, regexes are used internally, but only for path segments(!) Regarding Take a good look at how object-scan operates and at the more complex test examples that use callbacks. I think it would simplify and generalize your library significantly. I've scanned over all your functions, and most of them would take fewer than 15 lines to implement with object-scan. I might create some examples tonight if you're interested! Unrelated little note on
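To make the "no need to traverse the entire object" point concrete, here is a minimal sketch (not object-scan's actual algorithm) of matching a glob segment by segment while walking: the walker only descends into keys that match the current segment, so unrelated subtrees are never visited. `findByGlob` is a hypothetical name; it supports only `*` inside single segments (no `**`) and assumes keys without `.`.

```javascript
// Hypothetical sketch, not object-scan's implementation.
// Supports '*' inside a single segment only; assumes keys never contain '.'.
function segmentToRegExp(segment) {
  const escaped = segment.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp(`^${escaped.replace(/\*/g, '.*')}$`);
}

function findByGlob(haystack, glob) {
  const segments = glob.split('.').map(segmentToRegExp);
  const results = [];
  let visited = 0; // counts nodes actually entered, to show the pruning
  const walk = (node, depth, path) => {
    visited += 1;
    if (depth === segments.length) {
      results.push({ path, value: node });
      return;
    }
    if (node === null || typeof node !== 'object') return;
    for (const key of Object.keys(node)) {
      // Only descend into keys that can still lead to a match.
      if (segments[depth].test(key)) walk(node[key], depth + 1, [...path, key]);
    }
  };
  walk(haystack, 0, []);
  return { results, visited };
}

const input = { a: { x: { f: 1, g: 2 }, y: { f: 3 } }, big: { deeply: { nested: {} } } };
const { results, visited } = findByGlob(input, 'a.*.f');
console.log(results.map((r) => r.path.join('.'))); // [ 'a.x.f', 'a.y.f' ]
console.log(visited); // 6: the 'big' subtree and 'a.x.g' are never entered
```

With a regex over full paths, every node would have to be visited to build its path before the test could even run; here the pattern itself decides where the walk may go.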
About About regexp: I already dropped it in the 'tree' iteration mode, so for now only constant "children" paths are supported. That allowed me to avoid iterating over every possible path just to test with a regex whether it's a children collection or not, so I know that feel, bro. I agree that using a glob as a pre-filter before the actual iteration can add a lot of benefits, something like:

```javascript
_(obj)
  .glob(['a.*.f'])
  .filterDeep((v, k, p, c) => { /* do some extra filtering */ })
  .eachDeep((v, k, p, c) => { /* do the final job */ });
```

I'll think about it.
Oh wow, I had no idea That's up to you. I like doing a single iteration, but chaining is certainly also possible. The idea is that Here is a basic example of how you'd implement pickDeep:

```javascript
const set = require('lodash.set');
const objectScan = require('object-scan');

const pickDeep = (haystack, needles) => {
  const result = {};
  objectScan(needles, {
    // copy every matched value into `result` at the same path
    filterFn: (key, value) => set(result, key, value)
  })(haystack);
  return result;
};

const input = {
  a: { b: 'c', d: 'e' },
  f: { g: 'h' }
};
console.log(pickDeep(input, ['a.b']));
```

Demo: https://frontarm.com/demoboard/?id=1c33ad8a-36e1-4b54-adf0-1fcea40acbca You do have a lot of options, but as far as I can tell, they're all easily supported. I'm really just trying to demonstrate the power and ease of object-scan here. If you have questions on how you'd achieve a specific use case, I'd be very happy to give input.
Your example works more like
Finally, I think I will not add wildcard support:
So to recap, I think using
As for your test case, I've adjusted it here: https://frontarm.com/demoboard/?id=9a39401c-5b82-4669-9e10-d5d3b1a508e9
Right. And I'm saying object scan would work great for that (see the example above). I also believe all your methods could easily be implemented using object scan. Tell me which one you think is not a good fit, and I'll take a stab at it ;)
That's exactly my point. You don't need to. Just use object scan.
Not sure I follow on this one. How would you use minimatch with this library?
That's like saying "I never needed React for front-end dev, good old vanilla JS always worked fine for me" :) I'm a huge fan of regex and it does have its place. It's also very powerful and complicated, and a lot of users don't understand it. That's why no one uses it for path matching. Glob is the gold standard for path matching. Regex is not. Having said all that, it's your library and there is a lot of work put into it (never be scared to "kill your darlings" though). Are there other libraries like yours that use regex for path matching like you do? I've never seen it before.
Deepdash actually doesn't offer path matching. The core method is So regexp here is a cheap backdoor for complex stuff. It may help someone in some cases; others will ignore it. Your final pickDeep implementation doesn't look as sexy as "get me some data by this pattern" anymore. Maybe it will be more obvious with another use case, but I don't expect a huge difference in most real-world examples.
I see, so it would really be a matter of supporting those two. Seems easy enough. I might give an example later.
It sounds like regex is here just because that's how you implemented it. Not really because it's helping with any major use case. Good argument to get rid of it :)
Sorry, I just hacked it in. This is sexy again :) I see a performance improvement of ~40 percent? I consider that significant. This is a weird case because it requires a full tree traversal since you are looking for "endsWith". In other cases the difference will be much bigger.
I'm seeing a significant improvement. Can you please double-check?
Just playing around with forEachDeep. The signature for objectScan Here is what I did. It's only a PoC: https://frontarm.com/demoboard/?id=62019249-4e0b-4207-b0c5-1bc2ef7049fe I don't think the goal would be to translate the signatures, but rather to see if the use cases are solvable using tl;dr Taking a step back. You've written an extension to lodash around your own There are a lot of reasons why you might decide against doing it, e.g. too much work, not trusting my code, etc. But I think the regex reason is a very weak one. Anyway, I think I've said everything I wanted to say here. I'd be happy to help with any questions, though.
Maybe that performance result was only for the hacked version, but I saw that my code was slightly faster. I am quite busy right now; I'll take a look in a few days and we'll get our horses running again :) Thank you for your attention, I really appreciate it; you've already made me look at the problem from a different angle. Definitely some wild regexp bit you when you were a kid: there is no regexp in forEachDeep or in filterDeep; it's used in pickDeep/omitDeep only.
In your most recent fiddle I see the same picture:
I don't know what it is, some cache/init/whatever. I've never focused on performance in Deepdash before.
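One plausible explanation for unstable numbers like these is warm-up: the first calls of a function may include JIT compilation and lazy initialisation. A hedged sketch of a fairer micro-benchmark for Node.js follows; the `benchmark` helper and its default counts are arbitrary assumptions. It warms each function up first and reports the median of several timed rounds.

```javascript
// Hypothetical micro-benchmark helper for Node.js; the default counts are arbitrary.
// Returns the median duration in milliseconds of `rounds` timed batches.
function benchmark(fn, { warmup = 1000, iterations = 10000, rounds = 5 } = {}) {
  // Untimed warm-up so JIT compilation and lazy initialisation are not measured.
  for (let i = 0; i < warmup; i += 1) fn();
  const samples = [];
  for (let r = 0; r < rounds; r += 1) {
    const start = process.hrtime.bigint();
    for (let i = 0; i < iterations; i += 1) fn();
    // hrtime.bigint() is in nanoseconds; convert the difference to milliseconds.
    samples.push(Number(process.hrtime.bigint() - start) / 1e6);
  }
  samples.sort((a, b) => a - b);
  // The median is less sensitive to one slow round (e.g. a GC pause) than the mean.
  return samples[Math.floor(samples.length / 2)];
}

const ms = benchmark(() => JSON.parse('{"a":{"b":"c"}}'));
console.log(`median: ${ms.toFixed(2)}ms per 10000 calls`);
```

Running each candidate through the same helper, in the same process and after its own warm-up, makes the comparison less dependent on which function happens to run first.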
I tested the same script with Node.js locally on my laptop and got a stable result:

```javascript
const set = require('lodash/set');
const times = require('lodash/times');
const isEqual = require('lodash/isEqual');
const minBy = require('lodash/minBy');
const maxBy = require('lodash/maxBy');
const objectScan = require('object-scan');
const pickDeepdash = require('deepdash/pickDeep');
const { performance } = require('perf_hooks');

const pickDeep = (haystack, needles) => {
  const result = {};
  objectScan(needles, {
    filterFn: (key, value) => set(result, key, value),
  })(haystack);
  return result;
};

const input = {
  a: { b: 'c', d: 'e', i: { a: { b: 'c again' } } },
  f: { g: 'h', a: { b: 'c', d: 'e', i: { a: { b: 'c again' } } } },
};

// [name, fn] pairs; the measured time is pushed as a third element below.
const fn = Object.entries({
  'deepDash.pickDeep': () => pickDeepdash(input, ['a.b']),
  'objectScan.pickDeep': () => pickDeep(input, ['**.a.b']),
});

// --------------------
const round = (v) => Math.round(v * 100) / 100;

// Sanity check: every pair of implementations must return the same result.
for (let idx1 = 0; idx1 < fn.length; idx1 += 1) {
  for (let idx2 = idx1 + 1; idx2 < fn.length; idx2 += 1) {
    console.log(
      'Comparing',
      fn[idx1][0],
      'with',
      fn[idx2][0],
      '>>>>>',
      isEqual(fn[idx1][1](), fn[idx2][1]()) ? 'ok' : 'ERROR',
      '<<<<<'
    );
  }
}

// Time 100000 calls of each implementation.
for (let idx = 0; idx < fn.length; idx += 1) {
  const start = performance.now();
  times(100000, fn[idx][1]);
  fn[idx].push(performance.now() - start);
}

// Report the relative speed of every pair.
for (let idx1 = 0; idx1 < fn.length; idx1 += 1) {
  for (let idx2 = idx1 + 1; idx2 < fn.length; idx2 += 1) {
    const pair = [fn[idx1], fn[idx2]];
    const first = minBy(pair, (e) => e[2]);
    const second = maxBy(pair, (e) => e[2]);
    console.log(
      `Method ${first[0]} (${round(first[2])}ms) is ${round(
        (100 * (second[2] - first[2])) / second[2]
      )}% faster than ${second[0]} (${round(second[2])}ms)`
    );
  }
}
```

So never trust
Makes sense. The browser probably has all kinds of freezing protections built in. It'd be interesting to see performance for a traversal where only specific keys are of interest. That one could be big.
Example of 'specific key' in terms of a wildcard mask? If you mean any path ending with a single field, then
When picking a lot of fields, performance tends to be almost equal:
So yes, as expected, iterating over every possible path and checking whether we need it or not is less efficient. I think I need to stop visiting each node and testing whether the current path matches the given criteria,
Performance aside, isn't your Given the following input:
How would you find all top level names?
How would you find all the names that are not friends?
Or how would you find all the first friend names?
This is where objectScan really shines, since only the relevant branches of the input are traversed. It is very easy to build on top of the abstraction that object-scan provides. That didn't just happen: it's due to lots and lots of iterations and refactoring, reevaluating, ripping out things and implementing others. It's now finally in a really great state where most use cases are just super easy to handle. As you can probably tell, I'm in love with the project. It feels very complete and I don't have a lot of those projects :) But yeah, I'd love to hear the reasoning here. Is this something that lodash does somewhere?
Yes, it's beautiful. Here is how deepdash is supposed to be used in such cases:

```javascript
// How would you find all top-level names?
data.map(p => p.name); // It's not mine! I only published an ad!

// How would you find all the names that are not friends?
_.reduceDeep(data, (res, v) => {
  res.push(v.name);
  return res;
}, [], { childrenPath: ['parents', 'children'] });

// Or how would you find all the first friend names?
_.reduceDeep(data, (res, v, k, p, c) => {
  if (c.childrenPath == 'friends' && k == 0) res.push(v.name);
  return res;
}, [], { childrenPath: 'friends' });
// we need the c.childrenPath check
// because it's undefined for the top level, and we don't need the top level
```

It doesn't look as elegant, but it's JavaScript. (now I know why
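The `childrenPath` idea (only constant child collections are followed) can be sketched without any library. `reduceTree`, the sample `data` and its fields are hypothetical, since the original input was not preserved; this is not deepdash's implementation, just an illustration of why a fixed list of child keys keeps both the walk and the queries short.

```javascript
// Hypothetical sketch (not deepdash's implementation): reduce over a tree in
// which only the arrays stored under `childrenKeys` count as children.
function reduceTree(roots, childrenKeys, reducer, initial) {
  let acc = initial;
  const visit = (node, meta) => {
    acc = reducer(acc, node, meta);
    for (const key of childrenKeys) {
      const children = node[key];
      if (!Array.isArray(children)) continue; // other fields are never walked
      children.forEach((child, index) =>
        visit(child, { key, index, depth: meta.depth + 1 }));
    }
  };
  roots.forEach((root, index) => visit(root, { key: undefined, index, depth: 0 }));
  return acc;
}

// Hypothetical sample data.
const data = [
  {
    name: 'Ann',
    friends: [{ name: 'Bob' }, { name: 'Cid' }],
    children: [{ name: 'Dan', friends: [{ name: 'Eve' }] }],
  },
  { name: 'Fay', parents: [{ name: 'Gus' }] },
];

// Names reachable through family links only (friends are never visited).
const family = reduceTree(data, ['parents', 'children'], (acc, node) => {
  acc.push(node.name);
  return acc;
}, []);
console.log(family); // [ 'Ann', 'Dan', 'Fay', 'Gus' ]

// First friend of everyone reachable through friends or children.
const firstFriends = reduceTree(data, ['friends', 'children'], (acc, node, meta) => {
  if (meta.key === 'friends' && meta.index === 0) acc.push(node.name);
  return acc;
}, []);
console.log(firstFriends); // [ 'Bob', 'Eve' ]
```

The trade-off raised in the reply that follows still applies: this only works when the child keys are known up front, whereas a glob such as `**.friends[0].name` does not need that knowledge.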
The problem with your examples is that they are not very flexible. I'm dealing with dynamic data structures a lot, where I don't know the exact format. Consider form data, for example: https://github.com/blackflux/object-scan/blob/master/test/integration/form.json When you have more complex data, the "JS" approach you are suggesting gets very cumbersome and error-prone. I'm speaking from experience.
Let me give you another real-world use case. We are managing huge configuration files. These are built from templates to keep them manageable. Resources can be defined in arbitrary locations; the only rule is that they are nested inside a resources object.

```yaml
some:
  path:
    resources:
      myResource:
        type: 'A'
      yourResource:
        type: 'B'
```

We now need to check that all resources of type B have backups configured. We do that by running:

```javascript
const badResources = objectScan(['**.resources.*'], {
  filterFn: (key, value) => value.type === 'B' && value.backup !== true
})(configuration);
```

Now with your approach you would have to target
Why?

```javascript
_.eachDeep(files, (value, key, parentValue, ctx) => {
  if (ctx.parent.key == 'resources' && value.type == 'B' && value.backup !== true) {
    // do your job
  }
},
{ leavesOnly: false, onFalse: { skipChildren: false } });
```

or, if you need only the tree with bad resources:

```javascript
let badResources = _.filterDeep(files, (value, key, parentValue, ctx) =>
  ctx.parent.key == 'resources'
  && value.backup !== true
  && value.type == 'B',
{ leavesOnly: false, onFalse: { skipChildren: false } });
```

Take a look here They build their docs using this JSON. Where was your lib 3-4 years ago, when I needed it so much? :)
Yes, I agree, as I said before
I need to meditate on this.
If I add it, 'tree' will be just a special case of 'glob', and all the methods will sit on top of some abstract deep iterator and handle callbacks in some way.
Finally, it would be beautiful to have a well-integrated glob iterator as an option in deepdash, but using it as a standalone tool when needed is enough.
Fair enough. Lots of options to be passed in, though. I'm assuming the In Is it possible to differentiate between arrays and objects when you are iterating?
That's a big json file. Great example for some performance testing :)
That sounds like a fun project. Would have been a good use for objectRewrite.
Still swimming in that ancient old ocean 😆
👍
Honestly no idea how those work either. Never used them.
I guess from a performance perspective you don't want to decouple them though. Ideally the iterator is "smart" enough that you don't need to separate.
Haha, just throw them out and use
Please take a look at the function signature that I linked to above. Parents are contained already. That's why I am saying our iterators are very similar. They just have different signatures. Or are you talking about something else?
Not a problem since you have access to the parents.
Not sure what you mean?
Yeah, very much agreed. That's the main issue.
It would be an opportunity to clean up some of your options. With But, having said all that, it would be a big undertaking. I'd honestly love to see what cases are lurking in this library and how
Yes, filterDeep expects three possible results from the predicate: true, undefined or false.

```javascript
{
  // should the value be cloned deeply, or should only primitives be copied (objects/arrays will be empty)?
  cloneDeep,
  // do we need to iterate over children?
  skipChildren,
  // should an 'empty' node remain if none of its children passed the filter?
  keepIfEmpty
}
```

each default preset for actually

```javascript
var files = { some: { path: { resources: {
  myResource: { type: 'A' },
  yourResource: { type: 'B' }
} } } };

_.eachDeep(files, (value, key, parentValue, ctx) => {
  if (ctx.parent.key == 'resources' && value.type == 'B' && value.backup !== true) {
    console.log(ctx.path);
  }
});

let badResources = _.filterDeep(files, (value, key, parentValue, ctx) => {
  if (ctx.parent.key == 'resources' && value.backup !== true && value.type == 'B')
    return true;
},
{ leavesOnly: false });

console.log(badResources);
```

OK, looks like I have to think about all this once again :)
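The three-way predicate result can be illustrated with a small sketch. The presets below are hypothetical, chosen for the example and not deepdash's actual defaults; for simplicity, `cloneDeep: false` here means "start from an empty object and keep only the surviving children", and the sketch handles plain objects only (arrays would end up sparse or converted). `filterDeepSketch` is likewise a hypothetical name.

```javascript
// Hypothetical presets: NOT deepdash's real defaults.
const PRESETS = {
  onTrue: { cloneDeep: true, skipChildren: true, keepIfEmpty: true },
  onUndefined: { cloneDeep: false, skipChildren: false, keepIfEmpty: false },
  onFalse: { cloneDeep: false, skipChildren: true, keepIfEmpty: false },
};

const isObject = (v) => v !== null && typeof v === 'object';
// Plain-object deep clone (arrays are not preserved as arrays).
const deepClone = (v) =>
  isObject(v)
    ? Object.fromEntries(Object.entries(v).map(([k, x]) => [k, deepClone(x)]))
    : v;

// Sketch of a tri-state deep filter over plain objects.
// predicate(value, key, path) may return true, false or undefined.
function filterDeepSketch(root, predicate) {
  const walk = (node, path) => {
    const out = {};
    let kept = 0;
    for (const [key, child] of Object.entries(node)) {
      const childPath = [...path, key];
      const reply = predicate(child, key, childPath);
      const preset =
        reply === true ? PRESETS.onTrue : reply === false ? PRESETS.onFalse : PRESETS.onUndefined;
      let copy = isObject(child) ? (preset.cloneDeep ? deepClone(child) : {}) : child;
      let keptChildren = 0;
      if (!preset.skipChildren && isObject(child)) {
        const sub = walk(child, childPath);
        keptChildren = sub.kept;
        if (!preset.cloneDeep) copy = sub.value; // only the surviving children
      }
      if (reply === true || keptChildren > 0 || preset.keepIfEmpty) {
        out[key] = copy;
        kept += 1;
      }
    }
    return { value: out, kept };
  };
  return walk(root, []).value;
}

const files = { some: { path: { resources: {
  myResource: { type: 'A' },
  yourResource: { type: 'B' },
} } } };

// true keeps the node; undefined means "decide based on its children".
const badResources = filterDeepSketch(files, (value, key, path) =>
  (path[path.length - 2] === 'resources' && value.type === 'B' && value.backup !== true
    ? true
    : undefined));
console.log(JSON.stringify(badResources));
// {"some":{"path":{"resources":{"yourResource":{"type":"B"}}}}}
```

Returning `undefined` for every non-matching node is what lets the ancestors of a match survive while unrelated siblings such as `myResource` are dropped.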
Sweet, man. Had a lot of fun chatting with ya :)
I still think https://github.com/YuriGor/deepdash/blob/master/src/private/getIterate.js would benefit from using object-scan (perf-wise and for readability). I have to better understand what that file does... Any chance we can refactor it a bit into separate functions to make it more readable?
Hi! How are you doing? Is everything going OK in Canada? Yes, I do plan to refactor this:
But in general, it will still be a one-by-one iteration over all the keys/indexes or children. So the only way to get a significant performance improvement that I see here is to split the object tree into branches and iterate over them in parallel.
Hey! Doing good with the great outdoors nearby! Hope you're doing well yourself! I'll ping ya!
I think that's a great idea. I'm still very happy with going iterative instead of recursive in object scan. It makes the code a bit less readable, but it significantly improved perf.
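Going iterative instead of recursive can be sketched with an explicit stack. This is a generic illustration, not object-scan's code, and `forEachLeaf` is a hypothetical name. Each stack entry links to its parent entry, so a full path is only built when a leaf (a non-object value) is reported, and deeply nested input cannot overflow the call stack.

```javascript
// Hypothetical sketch: depth-first traversal with an explicit stack.
// Reports every non-object value together with its path (array of keys).
function forEachLeaf(root, callback) {
  const stack = [{ value: root, key: undefined, parent: null }];
  while (stack.length > 0) {
    const entry = stack.pop();
    const { value } = entry;
    if (value !== null && typeof value === 'object') {
      // Push children in reverse so they are popped in key order.
      const keys = Object.keys(value);
      for (let i = keys.length - 1; i >= 0; i -= 1) {
        stack.push({ value: value[keys[i]], key: keys[i], parent: entry });
      }
    } else {
      // Build the path lazily by following the parent links.
      const path = [];
      for (let e = entry; e.parent !== null; e = e.parent) path.push(e.key);
      callback(value, path.reverse());
    }
  }
}

forEachLeaf({ a: { b: 1, c: [2, 3] } }, (value, path) => console.log(path.join('.'), value));
// a.b 1
// a.c.0 2
// a.c.1 3

// A 100000-level chain: the explicit stack only grows the heap.
let deep = 'leaf';
for (let i = 0; i < 100000; i += 1) deep = { next: deep };
let leafDepth = -1;
forEachLeaf(deep, (value, path) => { leafDepth = path.length; });
console.log(leafDepth); // 100000
```

A naive recursive walker would typically throw `RangeError: Maximum call stack size exceeded` on that chain with Node's default stack size, which is one reason (besides raw speed) to prefer the iterative form.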
Yeah I saw a few things that looked a bit ugly. Refactoring will take work... But will also be fun!
Doing parallel traversal was a huge perf improvement in object scan and made many use cases possible
That's what I found too! That's why object scan now has the joined flag set to false by default. In most cases, returning and using the array syntax (which lodash understands as well) works very well. Keep me posted when you get to some refactoring!
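A small illustration of why array paths are the safer default: once a path is joined into a string, a key that itself contains a dot becomes indistinguishable from two nested keys. `joinPath` and `setIn` are hypothetical helpers written for this example; lodash's `set` also accepts an array path, which avoids the same ambiguity.

```javascript
// Hypothetical helper: join a path array lodash-style, e.g. ['a', 0, 'b'] -> 'a[0].b'.
const joinPath = (path) =>
  path.reduce((acc, seg) => {
    if (typeof seg === 'number') return `${acc}[${seg}]`;
    return acc === '' ? seg : `${acc}.${seg}`;
  }, '');

// Hypothetical helper: set a value using an array path, creating
// arrays for numeric segments and plain objects otherwise.
function setIn(target, path, value) {
  let node = target;
  for (let i = 0; i < path.length - 1; i += 1) {
    if (node[path[i]] === undefined) {
      node[path[i]] = typeof path[i + 1] === 'number' ? [] : {};
    }
    node = node[path[i]];
  }
  node[path[path.length - 1]] = value;
  return target;
}

console.log(joinPath(['a', 0, 'b'])); // a[0].b
// Two different paths collapse into the same string...
console.log(joinPath(['a.b']) === joinPath(['a', 'b'])); // true
// ...while the array form keeps them apart.
console.log(JSON.stringify(setIn({}, ['a.b'], 1))); // {"a.b":1}
console.log(JSON.stringify(setIn({}, ['a', 'b'], 1))); // {"a":{"b":1}}
```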
Just created an issue for this: #55
I was thinking about writing a library like this myself :) I'm the author of object-scan and was thinking it might be a great fit to improve key matching and efficiency of this library (I'm not a big fan of regex matching).
Take a look and let me know what you think! It would certainly be more in line with the existing lodash selector syntax.
Would be very happy to help with any questions! Cheers, L~