Breaking/abstract out examine attempt 2 #14495

bielu · 2023-07-04T07:47:02Z

Retargeted pr, please read convs here:
#13851 (comment)

See also #13620

github-actions · 2023-07-04T07:47:17Z

Hi there @bielu, thank you for this contribution! 👍

While we wait for one of the Core Collaborators team to have a look at your work, we wanted to let you know about that we have a checklist for some of the things we will consider during review:

It's clear what problem this is solving, there's a connected issue or a description of what the changes do and how to test them
The automated tests all pass (see "Checks" tab on this PR)
The level of security for this contribution is the same or improved
The level of performance for this contribution is the same or improved
Avoids creating breaking changes; note that behavioral changes might also be perceived as breaking
If this is a new feature, Umbraco HQ provided guidance on the implementation beforehand
💡 The contribution looks original and the contributor is presumably allowed to share it

Don't worry if you got something wrong. We like to think of a pull request as the start of a conversation, we're happy to provide guidance on improving your contribution.

If you realize that you might want to make some changes then you can do that by adding new commits to the branch you created for this work and pushing new commits. They should then automatically show up as updates to this pull request.

Thanks, from your friendly Umbraco GitHub bot 🤖 🙂

… removing examine

JasonElkin · 2023-07-17T14:15:34Z

Hey @bielu, I know that you'll need some more input from HQ to get this over the line, just want to flag a couple of things in the mean time.

Draft PRs don't necessarily alert anyone, so once you're ready for some input be sure to tag someone to take a look (I don't know if simply changing it from a draft will alert anyone either, and as you've already started the conversation on #13851 it's best to tag Bjarke and/or Kenn when you're ready).

Core Collaborators don't have the power to merge to v13/dev so you'll either need to get someone from HQ to do that or retarget it to /contrib.

Thanks, and good luck!

bielu · 2023-07-18T08:56:55Z

@JasonElkin It is draft on purpose, as agreed with Umbraco HQ :) as it is one of PRs which we want avoid merging by incident.
I know I need get Umbraco Employers to merge to dev branches, it is not first cms core pr which I am doing :)

JasonElkin · 2023-07-18T09:09:34Z

Totally, @bielu. TBH my reply was more to stop the Core Collabs automated alert from complaining to to us that nobody had gotten back to you yet (no reply for 201 business hours, it tells me 😱).

But also, on the grounds that this isn't the usual contribution path, and we're doing things in the open, it's helpful to make the process clear for the benefit of any others that come across this.

bielu · 2023-07-23T09:30:58Z

@kjac when you have time, I need some back and forward about unit tests.
There is my thinking:

as we now rely on provider not directly on examine, we should rewrite test based on provider in way so we dont know what is underhood, so instead of indexing directly to examine in unit tests as we doing now, we should have unit tests which basically use provider first to index and later retrieve items from provider, and we should be able to switch providers in this tests :)
we should have now integration test for providers including checking Headless API responses
I think I might miss some bits around here, but for sure we need come with test strategy which covers less examine and more abstraction.

kjac · 2023-07-24T05:09:54Z

Hi @bielu 👋

we should rewrite test based on provider in way so we dont know what is underhood, ... , and we should be able to switch providers in this tests

Yes, definitively 💯

we should have now integration test for providers including checking Headless API responses

I'm not entirely sure what you mean here. If the Delivery API query providers are rewritten to use the search abstractions you're creating here, wouldn't the above mentioned provider based test be sufficient? Don't get me wrong, I'm all for more tests 😄 just making sure you don't scope creep yourself here.

bielu · 2023-07-24T09:25:25Z

@kjac I meant that we now have special abstraction for filters, which I would love to cover with examples from current requests to headless api, to ensure swap from examine to search providers doesnt change outcome :)
As umbraco responses are pretty basic and use really simple requests, headless use filtering which makes it little harder to test just in provider as I can write test but without context they might not cover correctly parts which headless is using

bielu · 2023-08-02T17:16:36Z

@kjac I started implementing unit test, so to ensure the abstraction and idea works as we wanted, I introduced Lifti provider, which is basically in memory search provider :)
also I noticed CodeScene complains about a lot stuff, because I didnt move it but duplicated by obsoleting them, any tips how can I make it less unhappy without refactoring bits which imho shouldn't be touch as part of this PR?

bielu · 2023-08-28T13:24:03Z

@kjac managed to correct indexing of Node Alias! btw when you requested from api can you share test db + request which you doing as I cannot reproduce issue with api not returning content 😢

…-out-examine-attempt-2

kjac · 2023-08-29T11:29:21Z

@bielu I see the alias now 😄

Still don't get anything from the Delivery API though. DB attached here - admin login is test@test / Test123456

I'm still using the "default" Lucene directory factory, but hopefully that is not the issue:

  "Umbraco": {
    "CMS": {
      "Examine": {
        "LuceneDirectoryFactory": "Default"
      },
      ...

I can query single items just fine (/umbraco/delivery/api/v1.0/content/item/{key}) but that endpoint does not use the Examine index, so that makes sense.

Querying multiple items (/umbraco/delivery/api/v1.0/content/, /umbraco/delivery/api/v1.0/content/?filter=contentType:page, ...) all yield zero results.

bielu · 2023-08-29T15:52:01Z

@kjac will have a look over next few days, but going for holidays today so will be less responsive

kjac · 2023-08-30T06:29:57Z

Don't fret - enjoy holidays 😄

bielu · 2023-09-16T08:26:08Z

@kjac I am back from holiday, I am looking into issue now, btw I was trying to update branch with changes from v13/dev but I cannot get v13/dev to built at all, do you know if there are any issues with this branch or maybe it is git issue and time for fresh pull? 😕
Edit: fresh pull fixed issues

bielu · 2023-09-16T19:55:53Z

@kjac just pulled and play around with your db,

so it has to be one of settings somewhere 😕 but now is question which one!
Do you have maybe time following week to catch up on call and debug together to see why it fails on your machine? :)
As I am not able to reproduce it at all. I am working on .net 8 and windows 11, and I am assuming you are using same OS?

…-out-examine-attempt-2

bielu · 2023-09-17T13:16:31Z

Also added small quality of live change in dashboard:
for indexes:

for searchers:

so it is clear which native query syntax people should use

kjac · 2023-09-18T05:03:20Z

Hi @bielu - welcome back.

I'm pressed for time this week... the community teams are visiting, so a fair bit of my time is reserved for that. I'll try to find time to pull and test again during the week, but it will likely be next week.

bielu · 2023-09-27T08:09:40Z

@kjac Lets find time next week to sitdown and see why it doesnt work if it still doesnt work for you?

kjac · 2023-09-27T09:55:53Z

Hi @bielu,

Coincidentally I did debug the PR this morning.

The concrete problem has to do with the published field. When indexing for the Delivery API, the field is stored as __Published, meaning any query for published:y explicitly filters out all results.

I'm seeing a few other issues with the query when querying the Delivery API index. The Lucene query looks like this: +*:* +published:y +(+__IndexType:content) +(+((culture:en culture:us) culture:none))

However, the corresponding Lucene query in v12 and v13 looks like this: +__IndexType:content +(culture:en-us culture:none) +published:y

The culture part seems wrong in the Lucene first query. The culture field should be indexed and queried as raw text.
I'm not sure I feel comfortable with the +*:* in the beginning of the first Lucene query. Won't that have a negative performance impact?

On top of that, I can't see any difference in the generated Lucene queries when appending filters from my Delivery API query - i.e. appending filter=contentType:[alias]. The resulting Lucene query is exactly the same as the one above. To compare, in v12 and v13 it would be appended like this: +__IndexType:content +(culture:en-us culture:none) +published:y +contentType:page

I haven't tested appending sorting to the Delivery API queries yet, so I can't tell if the same issue is present for sorting.

On a side note, the solution cannot build because Umbraco.Tests.Integration.csproj is malformed - likely due to a bad merge. See this line:

Umbraco-CMS/tests/Umbraco.Tests.Integration/Umbraco.Tests.Integration.csproj

Line 65 in 78358c6

    
           <ProjectReference Include="..\Umbraco.Tests.Common\Umbraco.Tests.Common.csproj" />

Now, we've been discussing things internally and there are a few more things to consider.

Testing

When it comes to Examine we're basically in a bad place test coverage wise. You also addressed this earlier. We've currently started looking into building a test suite for Examine, which will help ensure that we do not break things unintentionally while abstracting the search functionality.

These tests will eventually cover both search and indexing for "regular" Examine indexes (like against the internal and external index) and Delivery API index.

Tracking changes to the existing code

In an effort to cut Examine from the core, you have copied out a bunch of code to new class libraries and obsoleted the old code. I like it, but I'm afraid it might be a bit premature.

We are continuously making changes to that code. Particularly the bits that involve the Delivery API change frequently, but we might also be bugfixing things for "regular" Examine as well. I fear your approach will make it quite hard to track the ongoing changes and ensure that they're ported over to the new libraries.

Maybe it would be better to do all these changes in-place to start with and split things out later? It would of course mean that Examine would still be a first-class citizen, but the upsides in form of change tracking may outweigh that.

Thoughts?

Backwards compatability

It seems I'm still able to use ExamineManager etc. directly to perform searches. That's good for backwards compatability. We must be able to, otherwise the upgrade path will be a nightmare.

What I can't entirely grasp is... how will all these changes affect community search packages and generally the ability to perform advanced stuff like faceting and spatial search?

bielu · 2023-09-27T23:16:32Z

about published field it is related to inconsistency between indexing, we should make decision if we going to be consistent with other indexes or with v12 delivery api
"+*:* " has negilable performance impact, the most used way of searching sadly is managed query which generates only 4 queries per index field... I wish we could skip match all query but AFAIK we can't if we want reuse this query with correct way of doing it
let me check Umbraco.Tests.Integration.csproj but I think it is related to change in same file and wrong git decision aroudn which are master chanes
Tracking changes to the existing code - orginal pr was moving classes around without obsoleting them! which would be cleaner and easier to track changes, but because of Umbraco breaking changes policy I moved it to use obsoleted classes and new duplicated classes. we can work around it if we do v12 pr to obselete classes and remove obseleting classes in this pr! it would make it much easier to track!
Backwards compatability - TBH you never should speak directly with ExamineManager but umbraco never abstracted it out earlier... in this code we do keep ExamineManager in container which means people can access it, but they shouldn't all code should use new abstraction code, which makes code search engine agnostic :) but yeah because Examine is so long not correctly abstract out it will be quite a road!
culture field - just readed again and I will fix it asap.
for faceting and spatial search we agreed with Bjarke to not think about it right now, as Examine is not supporting it yet, but we can introduce abstraction to handle it as well :)

kjac · 2023-09-29T14:52:43Z

about published field it is related to inconsistency between indexing, we should make decision if we going to be consistent with other indexes or with v12 delivery api

I'm all for consistency. But we can't really change it at this point, because it'd require everyone to rebuild their indexes. We also can't assume what the field name is for custom indexes, or if it's even present (InternalIndex for example must be able to query unpublished content).

I wish we could skip match all query but AFAIK we can't if we want reuse this query with correct way of doing it

I believe +__IndexType:content does the trick here? That part of the query basically ensures that we select everything in the index, because everything has type content.

In terms of change tracking and backwards compat I'm mostly expressing concerns here. At this point it's way more important to keep up with core changes than to worry about obsoletion strategies in my opinion.

Also, I understand all the changes correctly, the obsoleted code won't actually be run (used in the DI) anyway? If that's correct we need to treat this whole thing as one big breaking change. That is, we still need to ensure that ExamineManager works for at least one major, as people have been using it for ages. But ... well, generally speaking, the obsoletion likely won't make a lot of sense, if the obsoleted code doesn't run.

Ideally we'll have sufficient test coverage for indexing and searching to ensure that any functionally breakage is discovered by said tests 😄

bielu · 2023-09-29T16:15:07Z

I'm all for consistency. But we can't really change it at this point, because it'd require everyone to rebuild their indexes. We also can't assume what the field name is for custom indexes, or if it's even present (InternalIndex for example must be able to query unpublished content).

We can duplicate property for now, that will have slight impact, it will just make marginally slower indexes.

I believe +__IndexType:content does the trick here? That part of the query basically ensures that we select everything in the index, because everything has type content.

Are we sure indexType is correct query? abstraction will be used between all types of indexes 🤔 maybe we should add dummy field and search for it?

In terms of change tracking and backwards compat I'm mostly expressing concerns here. At this point it's way more important to keep up with core changes than to worry about obsoletion strategies in my opinion.

i can remove obsoleting part and do just breaking changes, than we can track changes and git should try help use find what change and where 🤔 (we can add not breaking obsolete tags in separate prs than we dont have to worrying about introducing obsoletes in this pr?

Also, I understand all the changes correctly, the obsoleted code won't actually be run (used in the DI) anyway? If that's correct we need to treat this whole thing as one big breaking change. That is, we still need to ensure that ExamineManager works for at least one major, as people have been using it for ages. But ... well, generally speaking, the obsoletion likely won't make a lot of sense, if the obsoleted code doesn't run.

Yes, you understood correctly, we still use ExamineManager in there so you can access it. We can do path with introducing obseleting in v12.x, but yeah it doesnt change classes would have to be in v13 still present, but yeah we have a lot code which doesnt run in umbraco and tagged and obsolete (logger injecting helpers etc)

I think it would be beneficial to have call and include Bjarke to this and make full strategy on it, as right now we looking into it without any.

kjac · 2023-10-02T04:55:19Z

I agree. We should talk this through soon.

Bjarke is gone all week for the US summit. I'll get back to you next week, we'll see how the calendars might align 👍

bielu · 2023-10-11T09:09:06Z

Hi @kjac,
I just had though about it and came with this POC strategy:

we introduce obsolete information in next patch for v12
we keep them in v13
I will readdress this pr to v14, so we can have clean transition without having maintaining both approaches in v14
as keeping both namespaces in DI will cause us troubles.
Anyway I still think we should sit down on call and prepare full strategy for it. :)

bielu · 2023-10-16T12:33:42Z

Hey @kjac any updates on your side?

bielu · 2023-10-24T14:09:02Z

@kjac Hello mate, I have some time on my plate and wanted do some work related to this pr, but not sure if I should continue or should open new retargated to v14?

bergmania · 2023-11-09T13:58:30Z

Hi @bielu

I apologize many times that we have been so slow to respond. It's just been really busy lately.

In the team we have discussed how we can handle this PR and generally how we want to abstract the search in a good way.

I'm afraid we'll have to go back into the thinking box a bit, so we make sure we get the right approach to all the use cases we see. I am very sorry for that, as I know you have already put in a lot of effort, which fortunately I believe we can get a lot out of. Sadly it will certainly mean that Umbraco 13 is not realistic to reach.

The search abstraction is very central for us to get right. I think the next step is a RFC to be published with our thoughts, so that we can also involve a larger part of the community.

Best Regards

bielu · 2023-11-09T14:29:36Z

@bergmania that is fine, I am just little annoyed about lack of communication.
It is okay to bin this pr, as I already would have to do it after latest changes in umbraco related to some bits. About RFC yeah might a way to go. I am happy to help with implementation or giving some opinion points about it as package developer for search providers.

bielu · 2023-12-13T16:41:37Z

Hey @bergmania,
Any ETA for RFC? :)
Best,
Arkadiusz

bergmania · 2023-12-15T13:36:30Z

We haven't startet yet, but the aim in next year

bielu · 2024-04-17T09:56:17Z

Hey @bergmania any updates on RFC?

bergmania · 2024-04-17T11:52:39Z

No.. It's currently not a priority. Goal is still some time this year.

bielu1 added 2 commits July 4, 2023 09:42

chore: obsolete everything related to examine in infrastructure project

12c3578

chore: obsolete umbraco examine project

2f386a7

bielu changed the base branch from contrib to v13/dev July 4, 2023 07:47

bielu1 added 8 commits July 4, 2023 09:47

chore: reintroduced search abstractions

12ba01c

chore: obsoleted lefft over files

f90cfe0

chore: move search related test to separate test library

cf36457

chore: moved changes from other pr to resolve based issues related to…

9781a90

… removing examine

chore: started replacing examine services again in core services

b571e33

chore: get most projects to built

54dc5e0

fix: adjusted DI for tests

9e07817

chore: get umbraco to boot

765e82f

bielu1 and others added 9 commits July 25, 2023 00:21

feat: start implementing in memory dummy provider

a5542a5

feat: partial implemented in memory provider

9c38f3a

feat: introduced umbraco valuesets and index related nofications

a99a8ea

feat: used lifti for inmemory provider

035b72e

chore: implemented based for in memory provider

a816219

feat: introduced base tests for providers

57db2e9

fix: corrected code based on unit tests

65cf41f

fix: resolved issues with tests

e47dcb0

refactor: simplified code with switch

54efe2c

bielu added 2 commits August 5, 2023 09:24

refactor: split methods into smaller methods

de70d43

refactor: codescene-delta suggestions

1b992d9

fix: ensured to pass item type to examine valueset

185b572

Merge remote-tracking branch 'origin2/v13/dev' into breaking/abstract…

02891cc

…-out-examine-attempt-2

bielu mentioned this pull request Aug 29, 2023

Novicell.Examine.ElasticSearch for Umbraco 11? Novicell/Novicell.Examine.ElasticSearch#57

Open

bielu added 3 commits September 16, 2023 21:03

fix: corrected references

df5d822

Merge remote-tracking branch 'origin2/v13/dev' into breaking/abstract…

c27edaa

…-out-examine-attempt-2

feat: added search engine details to search dashboard

78358c6

nul800sebastiaan closed this Jan 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Breaking/abstract out examine attempt 2 #14495

Breaking/abstract out examine attempt 2 #14495

bielu commented Jul 4, 2023 •

edited by JasonElkin

Loading

github-actions bot commented Jul 4, 2023 •

edited by JasonElkin

Loading

JasonElkin commented Jul 17, 2023

bielu commented Jul 18, 2023

JasonElkin commented Jul 18, 2023

bielu commented Jul 23, 2023

kjac commented Jul 24, 2023

bielu commented Jul 24, 2023

bielu commented Aug 2, 2023

bielu commented Aug 28, 2023

kjac commented Aug 29, 2023

bielu commented Aug 29, 2023

kjac commented Aug 30, 2023

bielu commented Sep 16, 2023 •

edited

Loading

bielu commented Sep 16, 2023

bielu commented Sep 17, 2023

kjac commented Sep 18, 2023

bielu commented Sep 27, 2023

kjac commented Sep 27, 2023

bielu commented Sep 27, 2023 •

edited

Loading

kjac commented Sep 29, 2023

bielu commented Sep 29, 2023 •

edited

Loading

kjac commented Oct 2, 2023

bielu commented Oct 11, 2023

bielu commented Oct 16, 2023

bielu commented Oct 24, 2023

bergmania commented Nov 9, 2023

bielu commented Nov 9, 2023 •

edited

Loading

bielu commented Dec 13, 2023

bergmania commented Dec 15, 2023

bielu commented Apr 17, 2024

bergmania commented Apr 17, 2024

Breaking/abstract out examine attempt 2 #14495

Breaking/abstract out examine attempt 2 #14495

Conversation

bielu commented Jul 4, 2023 • edited by JasonElkin Loading

github-actions bot commented Jul 4, 2023 • edited by JasonElkin Loading

JasonElkin commented Jul 17, 2023

bielu commented Jul 18, 2023

JasonElkin commented Jul 18, 2023

bielu commented Jul 23, 2023

kjac commented Jul 24, 2023

bielu commented Jul 24, 2023

bielu commented Aug 2, 2023

bielu commented Aug 28, 2023

kjac commented Aug 29, 2023

bielu commented Aug 29, 2023

kjac commented Aug 30, 2023

bielu commented Sep 16, 2023 • edited Loading

bielu commented Sep 16, 2023

bielu commented Sep 17, 2023

kjac commented Sep 18, 2023

bielu commented Sep 27, 2023

kjac commented Sep 27, 2023

Testing

Tracking changes to the existing code

Backwards compatability

bielu commented Sep 27, 2023 • edited Loading

kjac commented Sep 29, 2023

bielu commented Sep 29, 2023 • edited Loading

kjac commented Oct 2, 2023

bielu commented Oct 11, 2023

bielu commented Oct 16, 2023

bielu commented Oct 24, 2023

bergmania commented Nov 9, 2023

bielu commented Nov 9, 2023 • edited Loading

bielu commented Dec 13, 2023

bergmania commented Dec 15, 2023

bielu commented Apr 17, 2024

bergmania commented Apr 17, 2024

bielu commented Jul 4, 2023 •

edited by JasonElkin

Loading

github-actions bot commented Jul 4, 2023 •

edited by JasonElkin

Loading

bielu commented Sep 16, 2023 •

edited

Loading

bielu commented Sep 27, 2023 •

edited

Loading

bielu commented Sep 29, 2023 •

edited

Loading

bielu commented Nov 9, 2023 •

edited

Loading