Add ability for the main Umbraco search to include media file names #6579

Merged
merged 12 commits into from Nov 14, 2019

Jeavon
Contributor

@Jeavon Jeavon commented Oct 4, 2019

This PR adds a new field to the MediaValueSet called "__umbracoFile". It is prefixed with "__" because it is intended for internal use only.
The new field is populated either by extracting the Src property from the Image Cropper JSON or, for files such as PDFs, directly from the property value.
We have also updated the UmbracoTreeSearcher to include the new field and modified the mapper to show the file name in parentheses in the search results.
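
To make the two cases concrete, here is a minimal sketch of how a stored property value might be reduced to a file path. It is not the code from this PR: the class and method names are hypothetical, the lowercase "src" JSON key is an assumption about the Image Cropper format, and System.Text.Json is used only to keep the example self-contained.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper, not part of Umbraco.
public static class UmbracoFileValue
{
    // Returns the file path held in a media file property value.
    // Assumption: the value is either Image Cropper JSON such as
    // {"src":"/media/1234/photo.jpg","crops":[]} or a plain path such as
    // /media/1234/report.pdf. Returns an empty string when no path is found.
    public static string GetFilePath(string propertyValue)
    {
        if (string.IsNullOrWhiteSpace(propertyValue)) return string.Empty;

        var trimmed = propertyValue.Trim();

        // Plain paths (for example PDFs) are stored as-is.
        if (!trimmed.StartsWith("{")) return trimmed;

        try
        {
            using (var doc = JsonDocument.Parse(trimmed))
            {
                // Assumed key name: "src" (case-sensitive lookup).
                return doc.RootElement.TryGetProperty("src", out var src)
                       && src.ValueKind == JsonValueKind.String
                    ? src.GetString()
                    : string.Empty;
            }
        }
        catch (JsonException)
        {
            // Not valid JSON after all, so treat the value as a plain path.
            return trimmed;
        }
    }
}
```

Calling `UmbracoFileValue.GetFilePath` with either form of value gives a path that can then be indexed in the new field.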

[Image: 2019-10-03_17-06-06]

@bjarnef
Contributor

bjarnef commented Oct 4, 2019

@Jeavon should we add this as a constant here?
https://github.com/umbraco/Umbraco-CMS/blob/v8/dev/src/Umbraco.Examine/UmbracoExamineIndex.cs

and then use it like this, similar to UmbracoExamineIndex.NodeKeyFieldName:

{UmbracoExamineIndex.UmbracoFileFieldName, new object[] {umbracoFile}}

if (source.Values.ContainsKey(UmbracoExamineIndex.UmbracoFileFieldName))
{
    var umbracoFile = source.Values[UmbracoExamineIndex.UmbracoFileFieldName];
    if (umbracoFile != null)
    {
        target.Name = $"{target.Name} ({umbracoFile})";
    }
}

@Jeavon
Contributor Author

Jeavon commented Oct 4, 2019

@bjarnef yes we should, will add it

Contributor

@Shazwazza Shazwazza left a comment

I don't think the field needs to be prefixed with __; generally that indicates it's a system field that isn't analyzed and can only be exact-matched. I have left a couple of other comments too.
Cheers!

src/Umbraco.Examine/MediaValueSetBuilder.cs (outdated, resolved)
src/Umbraco.Web/Search/UmbracoTreeSearcher.cs (outdated, resolved)
@Jeavon
Contributor Author

Jeavon commented Oct 4, 2019

@Shazwazza I'm trying to come up with a Lucene query that allows a trailing wildcard but also allows symbols (hyphens, full stops, underscores). Any ideas?
For example, for a file called "jancaps-copy.jpg", this query doesn't work: +(umbracoFileSrc:jancaps\-copy* )

@Shazwazza
Contributor

@Jeavon don't worry about searching on Key; I've added that in a different PR, see #6530

As for your question about escaping chars and allowing wildcards, can you make anything work in Luke? Does the wildcard query work when you aren't escaping chars? Is it because the hyphens get stripped out by the standard analyzer, and this field should use the whitespace analyzer instead?

@Jeavon
Contributor Author

Jeavon commented Oct 9, 2019

@Shazwazza I spent quite a while in Luke. The only thing I could make work was replacing the hyphens with spaces: +(umbracoFileSrc:jancaps copy.jpg* ). This then works perfectly.
I believe the internal index is using the CultureInvariantWhitespaceAnalyzer, so I don't think that's the issue...

@Jeavon
Contributor Author

Jeavon commented Oct 9, 2019

I've added a workaround that replaces the hyphens with spaces when searching this field. It now works as expected, as shown in this GIF; however, maybe there is a better way that I have not found...
[GIF: 2019-10-09_16-33-54]
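
The hyphen-to-space workaround can be sketched roughly as follows. This is an illustration only, not the code in UmbracoTreeSearcher: the class and method names are hypothetical, the field name is taken from the queries quoted in this thread, and no other Lucene special characters are escaped.

```csharp
using System;

// Hypothetical helper, not part of Umbraco or Examine.
public static class FileNameQuery
{
    // Builds a required, trailing-wildcard clause for the given field,
    // replacing hyphens in the search term with spaces first.
    // Returns an empty string when there is nothing to search for.
    public static string Build(string fieldName, string searchTerm)
    {
        if (string.IsNullOrWhiteSpace(searchTerm)) return string.Empty;

        // "jancaps-copy.jpg" becomes "jancaps copy.jpg"
        var term = searchTerm.Trim().Replace('-', ' ');

        return $"+({fieldName}:{term}*)";
    }
}
```

For example, `FileNameQuery.Build("umbracoFileSrc", "jancaps-copy.jpg")` produces `+(umbracoFileSrc:jancaps copy.jpg*)`, which matches the shape of the query reported to work in Luke, apart from the space before the closing parenthesis.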

@Jeavon Jeavon requested a review from Shazwazza October 14, 2019 09:27
Contributor

@Shazwazza Shazwazza left a comment

Hi @Jeavon, I found a few other potential changes/questions/comments. Regarding searching with hyphens and wildcards, I found this SO thread: https://stackoverflow.com/questions/16858880/java-lucene-search-query-hyphens-with-wildcards

This could be another, more effective workaround: https://stackoverflow.com/a/36320124/694494

Otherwise, the correct way to do it would be to use a custom analyzer for this field that strips out hyphens.


if (!string.IsNullOrEmpty(umbracoFilePath))
{
    var uri = new Uri(_runtimeState.ApplicationUrl.GetLeftPart(UriPartial.Authority) + umbracoFilePath);
Contributor

Do we need to use the application URL here? I mean, all this is really doing is constructing a Uri just so that it parses. You could just as easily have a static URI with a dummy hostname, etc., so you aren't allocating a new object for this each time.

Contributor Author

I could use a dummy URI, but that felt somehow wrong (I could also split the string). I can do that, though. What would you suggest for a dummy URI? Is there a precedent?

Contributor

I think a dummy URL would be better, and then we don't need to inject the IRuntimeState. At the end of the day it won't make any difference, and we don't need to change as much.
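
A rough sketch of the dummy-URL idea, assuming the goal is only to get the file name out of a relative media path. The class name, method name, and the "http://localhost" placeholder host are assumptions for illustration, not what was merged.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Umbraco.
public static class MediaFileName
{
    // Placeholder base address (an assumption). The host is never contacted;
    // it only gives System.Uri something to resolve a relative path against,
    // so no IRuntimeState is needed.
    private static readonly Uri DummyBaseUri = new Uri("http://localhost");

    // Returns the file name portion of a media path such as
    // /media/1234/jancaps-copy.jpg, or an empty string for empty input.
    public static string FromPath(string umbracoFilePath)
    {
        if (string.IsNullOrEmpty(umbracoFilePath)) return string.Empty;

        // Resolves the path against the static base; an already absolute URL
        // is used as given.
        var uri = new Uri(DummyBaseUri, umbracoFilePath);

        // AbsolutePath excludes any query string or fragment.
        return Path.GetFileName(uri.AbsolutePath);
    }
}
```

With this shape only the base Uri is shared; a Uri for the path itself is still created on each call.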

Contributor

@Jeavon this will be the last thing we need to clean up, I think, and then we can merge.

Contributor Author

@Shazwazza updated, thanks!

src/Umbraco.Tests/UmbracoExamine/IndexInitializer.cs (outdated, resolved)
src/Umbraco.Web/Search/UmbracoTreeSearcher.cs (outdated, resolved)
@Jeavon
Contributor Author

Jeavon commented Oct 17, 2019

I like the ? idea, I'll try that also

@Jeavon
Contributor Author

Jeavon commented Oct 29, 2019

@Shazwazza
I switched to using a List as suggested.
I have left you a comment about the Url here https://github.com/umbraco/Umbraco-CMS/pull/6579/files#r336117246
I tried the ? suggestion for the wildcard search but it didn't work (in Luke either). A space seems to work perfectly, so I'm leaving it as that...?

Contributor

@Shazwazza Shazwazza left a comment

Couple questions left inline


src/Umbraco.Web/Search/UmbracoTreeSearcher.cs (resolved)
Contributor

@Shazwazza Shazwazza left a comment

Sorry @Jeavon just a couple small tweaks, sorry for being such a pain ;)

src/Umbraco.Examine/MediaValueSetBuilder.cs (outdated, resolved)
src/Umbraco.Web/Models/Mapping/EntityMapDefinition.cs (outdated, resolved)
…zation issues and therefore needing to mock the logger for the tests
@Shazwazza
Contributor

Awesome stuff @Jeavon all merged in 🎉

6 participants