Elastic query to search MongoDB data ignoring spaces (C#)

I wanted to know if there is a way to search MongoDB data while ignoring spaces. My MongoDB data looks like this:
"Element":"Ele1"
"Element":"Ele2"
"Element":"Ele 3"
My search strings contain no spaces, i.e. "Ele1", "Ele2", "Ele3". So when I pass "Ele3" to my search query, it returns no results. I am using the Elasticsearch.Net NuGet package. My code looks something like this:
var elResp = await data.MultiSearchAsync(m => m
    .Query(q => q
        .Nested(n => n
            .Path("Element")
            .Query(qq => qq
                .Bool(b => b
                    .Must(m1 => m1
                        .Term("Element.keyword", "Ele1")))))));
Is there anything I can do, or do I have to create an extra field in MongoDB that stores the Element value without spaces, so that I can match against that? Thanks

You'd get the best performance by creating an extra field without spaces; however, in MongoDB you can also use $regex to match values both with and without spaces:
db.collection.find({ "Element": { $regex: /Ele\s*3/ } })
where \s matches a whitespace character and * means zero or more occurrences. The more spaces you expect, the more you should consider adding a field without any spaces (for performance), but /E\s*l\s*e\s*3/ will still work in this case.
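To make both options concrete, here is a minimal C# sketch that uses only System.Text.RegularExpressions. BuildPattern turns a term such as "Ele3" into a whitespace-tolerant pattern (the same idea as /E\s*l\s*e\s*3/ above), and RemoveWhitespace produces the space-free value you would store in the extra field. The class and method names are hypothetical. Unlike the answer's regex, the pattern is anchored with ^ and $; that is an assumption made so that "Ele3" does not also match "Ele 30".

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helpers (not part of any MongoDB or Elasticsearch API).
public static class SpaceInsensitiveSearch
{
    // Turns "Ele3" into ^E\s*l\s*e\s*3$ : every non-space character of the
    // term is escaped, and optional whitespace is allowed between characters.
    public static string BuildPattern(string term)
    {
        var parts = term.Where(c => !char.IsWhiteSpace(c))
                        .Select(c => Regex.Escape(c.ToString()));
        return "^" + string.Join(@"\s*", parts) + "$";
    }

    // Value for an extra "no spaces" field: the same text with all whitespace removed.
    public static string RemoveWhitespace(string value) =>
        new string(value.Where(c => !char.IsWhiteSpace(c)).ToArray());
}
```

With the official MongoDB .NET driver, the pattern could presumably be used as Builders<BsonDocument>.Filter.Regex("Element", new BsonRegularExpression(SpaceInsensitiveSearch.BuildPattern("Ele3"))). If you go the extra-field route instead, apply RemoveWhitespace both when saving the document and to the search term, then compare with a plain equality.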

Related

How to get a value from JSON with just the index?

I'm making an app which needs to loop through Steam games.
Reading libraryfolders.vdf, I need to loop through it, find the first value of each pair, and save it as a string.
"libraryfolders"
{
"0"
{
"path" "D:\\Steam"
"label" ""
"contentid" "-1387328137801257092942"
"totalsize" "0"
"update_clean_bytes_tally" "42563526469"
"time_last_update_corruption" "1663765126"
"apps"
{
"730" "31892201109"
"4560" "9665045969"
"9200" "22815860246"
"11020" "776953234"
"34010" "11967809445"
"34270" "1583765638"
For example, it would record:
730
4560
9200
11020
34010
34270
I'm already using System.Text.Json in the program. Is there any way I could loop through and just get the first value using System.Text.Json, or would I need to do something different, since VDF doesn't separate the values with colons or commas?
That is not JSON; it is the KeyValues format developed by Valve. You can read more about the format here:
https://developer.valvesoftware.com/wiki/KeyValues
There are existing Stack Overflow questions about converting a VDF file to JSON, and they mention libraries already developed for reading VDF that can help you out.
VDF to JSON in C#
If you want a very quick and dirty way to read the file without any external library, I would probably use a regex and do something like this:
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
string pattern = "\"apps\"\\s+{\\s+(\"(\\d+)\"\\s+\"\\d+\"\\s+)+\\s+}";
string libraryPath = @"C:\Program Files (x86)\Steam\steamapps\libraryfolders.vdf";
string input = File.ReadAllText(libraryPath);
// Group 2 is captured once per "key" "value" pair inside "apps" { };
// Captures holds every repetition, not just the last one.
List<string> indexes = Regex.Matches(input, pattern, RegexOptions.Singleline)
    .Cast<Match>()
    .SelectMany(m => m.Groups[2].Captures.Cast<Capture>())
    .Select(c => c.Value)
    .ToList();
foreach (string s in indexes)
{
    Debug.WriteLine(s);
}
See the regular expression explanation here:
https://regex101.com/r/bQSt79/1
It captures every occurrence of "apps" { ... } in group 0 (the whole match), does a repeating capture of each pair of numbers between the curly braces in group 1, and also captures the left-hand number of each pair in group 2. Normally a repeated capture group keeps only its last occurrence, but in .NET every repetition is still available through the group's Captures collection.
The rest of the code takes each match, the second group of each match, the captures of that group, and the values of those captures, and puts them in a list of strings. A foreach loop then writes each string to the debug output.
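If you would rather avoid the regex, a small hand-written reader for the subset of KeyValues used in libraryfolders.vdf is another option. The sketch below is a hypothetical minimal parser (MiniVdf is not a real library): it handles only quoted keys and values, backslash escapes inside quotes, and { } blocks, and it ignores the comments, unquoted tokens and conditionals that the full format allows. Children are kept in a list of key/value pairs, which preserves file order and any duplicate keys.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical minimal reader for a subset of Valve's KeyValues (VDF) format:
// quoted keys/values, backslash escapes in quotes, and { } blocks only.
// A value is either a string or a List<KeyValuePair<string, object>>.
public sealed class MiniVdf
{
    private readonly string _text;
    private int _pos;

    private MiniVdf(string text) => _text = text;

    public static List<KeyValuePair<string, object>> Parse(string text) =>
        new MiniVdf(text).ParseBlock(topLevel: true);

    // Returns the keys of every "apps" block inside "libraryfolders", in file order.
    public static List<string> AppIds(string text)
    {
        var ids = new List<string>();
        foreach (var root in Parse(text))
        {
            if (root.Key != "libraryfolders" || root.Value is not List<KeyValuePair<string, object>> folders)
                continue;
            foreach (var folder in folders)
            {
                if (folder.Value is not List<KeyValuePair<string, object>> entries) continue;
                foreach (var entry in entries)
                {
                    if (entry.Key == "apps" && entry.Value is List<KeyValuePair<string, object>> apps)
                        foreach (var app in apps) ids.Add(app.Key);
                }
            }
        }
        return ids;
    }

    private List<KeyValuePair<string, object>> ParseBlock(bool topLevel)
    {
        var items = new List<KeyValuePair<string, object>>();
        while (true)
        {
            SkipWhitespace();
            if (_pos >= _text.Length)
            {
                if (topLevel) return items;
                throw new FormatException("Missing closing '}'.");
            }
            if (_text[_pos] == '}')
            {
                if (topLevel) throw new FormatException("Unexpected '}'.");
                _pos++;
                return items;
            }
            string key = ReadQuoted();
            SkipWhitespace();
            if (_pos < _text.Length && _text[_pos] == '{')
            {
                _pos++; // nested block
                items.Add(new KeyValuePair<string, object>(key, ParseBlock(topLevel: false)));
            }
            else
            {
                items.Add(new KeyValuePair<string, object>(key, ReadQuoted()));
            }
        }
    }

    private void SkipWhitespace()
    {
        while (_pos < _text.Length && char.IsWhiteSpace(_text[_pos])) _pos++;
    }

    private string ReadQuoted()
    {
        if (_pos >= _text.Length || _text[_pos] != '"')
            throw new FormatException($"Expected '\"' at position {_pos}.");
        _pos++; // opening quote
        var sb = new StringBuilder();
        while (_pos < _text.Length && _text[_pos] != '"')
        {
            // Drop the backslash and keep the next character literally, so \\
            // becomes \ and \" becomes " (other escapes are simplified this way too).
            if (_text[_pos] == '\\' && _pos + 1 < _text.Length) _pos++;
            sb.Append(_text[_pos++]);
        }
        if (_pos >= _text.Length) throw new FormatException("Unterminated string.");
        _pos++; // closing quote
        return sb.ToString();
    }
}
```

For example, MiniVdf.AppIds(File.ReadAllText(libraryPath)) would return 730, 4560, and so on from every library folder. For anything beyond this one file, one of the existing VDF libraries mentioned above is likely the more robust choice.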

Trying to compare two strings inside a LINQ query

I am trying to check two strings for equality using the EF Functions Like method, but it fails and I get null for spaces.
The two strings are shown below; the only difference is the case of SPACE.
The displayName is L1-008-5 SPACE, and I have stored the DisplayName in the space identity object as L1-008-5 Space:
L1-008-5 SPACE and L1-008-5 Space
Here is the code
var space = dbContext.Spaces.SingleOrDefault(a => EF.Functions.Like(a.SpaceIdentity.DisplayName, displayName));
I tried the options below as well:
dbContext.Spaces.SingleOrDefault(s => s.SpaceIdentity.DisplayName.Equals(displayName, StringComparison.OrdinalIgnoreCase));
dbContext.Spaces.SingleOrDefault(s => string.Equals(s.SpaceIdentity.DisplayName,displayName, StringComparison.OrdinalIgnoreCase));
None of the above work; I still get null for spaces.
Could anyone please point out what I am doing wrong with the above comparison?
Many thanks in advance!
Assuming you are using the Npgsql.EntityFrameworkCore.PostgreSQL provider for EF Core, you should have access to the ILike method, which is a case-insensitive LIKE (plain LIKE is case-sensitive in PostgreSQL, which is why "SPACE" does not match "Space"). That means you can use this code:
var space = dbContext.Spaces
.SingleOrDefault(a =>
EF.Functions.ILike(a.SpaceIdentity.DisplayName, displayName));
// ^^^^^
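One caveat with ILike (and Like): the second argument is a pattern, so any % or _ in displayName acts as a wildcard. The hypothetical helper below escapes those characters, plus the backslash itself, so the value is matched literally. It assumes backslash is the escape character, which is PostgreSQL's default when the generated SQL has no ESCAPE clause; if your Npgsql version offers an ILike overload that takes an explicit escape character, passing "\\" there removes that assumption.

```csharp
using System.Text;

// Hypothetical helper: escapes the LIKE/ILIKE wildcards % and _ (and the
// backslash escape character itself) so user input is matched literally.
// Assumes backslash is the escape character (PostgreSQL's default).
public static class LikePattern
{
    public static string Escape(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (c == '\\' || c == '%' || c == '_') sb.Append('\\');
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Compute the escaped value before the query, e.g. var pattern = LikePattern.Escape(displayName); and then use EF.Functions.ILike(a.SpaceIdentity.DisplayName, pattern) inside SingleOrDefault.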

Sitecore Lucene index search term with space match same word without space

This seems so simple that I'm convinced I must be overlooking something. I cannot establish how to do the following in Lucene:
The problem
I'm searching for place names.
I have a field called Name
It is using Lucene.Net.Analysis.Standard.StandardAnalyzer
It is TOKENIZED
The value of Name contains 1 space in the value: halong bay.
The search term may or may not contain an extra space due to culturally different spellings or genuine spelling mistakes. E.g. ha long bay instead of halong bay.
If I use the term halong bay I get a hit.
If I use the term ha long bay I do not get a hit.
The attempted solution
Here's the code I'm using to build my predicate using LINQ to Lucene from Sitecore:
var searchContext = ContentSearchManager.GetIndex("my_index").CreateSearchContext();
var term = "ha long bay";
var predicate = PredicateBuilder.Create<MySearchResultItemClass>(sri => sri.Name == term);
var results = searchContext.GetQueryable<MySearchResultItemClass>().Where(predicate);
I have also tried a fuzzy match using the .Like() extension:
var predicate = PredicateBuilder.Create<MySearchResultItemClass>(sri => sri.Like(term));
This also yields no results for ha long bay.
How do I configure Lucene in Sitecore to return a hit for both halong bay and ha long bay search terms, ideally without having to do anything fancy with the input term (e.g. stripping space, adding wildcards, etc)?
Note: I recognise that this would also allow the term h a l o n g b a y to produce a hit, but I don't think I have a problem with this.
A TOKENIZED field means that the field value is split into tokens (on whitespace in this case) and the resulting terms are added to the index dictionary. If you index "halong bay" in such a field, it will create the "halong" and "bay" terms.
It's normal for the search engine to fail to retrieve this result for the "ha long" search query, because no indexed document contains the term "ha" or "long".
A manual approach would be to define all the other ways to write the place name in another multi-value computed index field named AlternateNames. Then you could issue this kind of query: Name==query OR AlternateNames==query.
An automatic approach would be to also index the place names without spaces in a separate computed index field named CompactName. Then you could issue this kind of query: Name==query OR CompactName==compactedQueryWithoutSpaces.
I hope this helps
Jeff
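The CompactName idea comes down to applying one normalisation function at index time (to the item's Name) and again at query time (to the user's term). Below is a minimal sketch of that function plus an in-memory version of the Name==query OR CompactName==compactedQuery check. PlaceNameNormalizer is a hypothetical name; in Sitecore, the index-time half would presumably live in a computed field class (an IComputedIndexField implementation registered in the index configuration). Lower-casing is included because StandardAnalyzer also lower-cases tokens.

```csharp
using System;
using System.Linq;

// Hypothetical normalisation for a "CompactName" computed field. The same
// function must be used when indexing and when querying.
public static class PlaceNameNormalizer
{
    // "Ha Long Bay" -> "halongbay": remove all whitespace, then lower-case.
    public static string Compact(string value) =>
        new string(value.Where(c => !char.IsWhiteSpace(c)).ToArray()).ToLowerInvariant();

    // In-memory equivalent of: Name == query OR CompactName == Compact(query).
    public static bool Matches(string name, string query) =>
        string.Equals(name, query, StringComparison.OrdinalIgnoreCase)
        || Compact(name) == Compact(query);
}
```

As the question notes, this also lets "h a l o n g b a y" match, and a query for only part of the name ("ha long") still does not match.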
Something like this might do the trick:
var predicate = PredicateBuilder.False<MySearchResultItemClass>();
foreach (var t in term.Split(' '))
{
var tempTerm = t;
predicate = predicate.Or(p => p.Name.Contains(tempTerm));
}
var results = searchContext.GetQueryable<MySearchResultItemClass>().Where(predicate);
It does split your input string, but I guess that is not 'fancy' ;)

Regex word boundaries capturing wrong words

I am having some difficulties trying to get my simple Regex statement in C# working the way I want it to.
If I have a long string and I want to find the word "executive" but NOT "executives" I thought my regex would look something like this:
Regex.IsMatch(input, string.Format(@"\b{0}\b", "executive"));
This, however, still matches inputs that contain only executives and not executive (singular).
I thought word boundaries in regex, when used at the beginning and end of the pattern, would specify that you only want to match that word and not any other form of it?
Edit: To clarify what's happening, I am trying to find all of the Notes among Students that contain the word executive, ignoring words that merely contain "executive", as follows:
var studentMatches =
    Students.SelectMany(o => o.Notes)
        .Where(c => Regex.Match(c.NoteText, string.Format(@"\b{0}\b", query)).Success).ToList();
where query would be "executive" in this case.
What's strange is that while the above code matches executives even though I don't want it to, the following code does not (i.e. it does what I expect):
foreach (var stu in Students)
{
foreach (var note in stu.Notes)
{
if (Regex.IsMatch(note.NoteText, string.Format(@"\b{0}\b", query)))
Console.WriteLine(stu.LastName);
}
}
Why would a nested for loop with the same regex code produce accurate matches while a linq expression seems to want to return anything that contains the word I am searching for?
Your linq query produces the correct result. What you see is what you have written.
Let's give things proper names to make it clear:
var noteMatches = Students.SelectMany(student => student.Notes)
    .Where(note => Regex.Match(note.NoteText, string.Format(@"\b{0}\b", query)).Success)
    .ToList();
After SelectMany executes in this query, we have a flattened list of all notes, so the information about which note belonged to which student is lost.
Meanwhile, in the sample code with foreach loops, you output information about the student.
I assume you need a query like the following:
var studentMatches = Students.Where(student => student.Notes
        .Any(note => Regex.IsMatch(note.NoteText, string.Format(@"\b{0}\b", query))))
    .ToList();
However, it is not clear what result you want if the same student has notes containing both executive and executives.
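Here is a self-contained sketch of the difference, using hypothetical minimal Student and Note records in place of the real classes: MatchingNotes mirrors the flattened SelectMany query, and MatchingStudents mirrors the Where/Any query above. It also passes the query through Regex.Escape, an addition not in the original code, so that a search term containing characters such as . or + is treated literally inside \b...\b.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Minimal stand-ins for the question's types (hypothetical shapes).
public record Note(string NoteText);
public record Student(string LastName, List<Note> Notes);

public static class NoteSearch
{
    // \b...\b matches the whole word only, so "executive" does not match "executives".
    public static bool ContainsWord(string text, string word) =>
        Regex.IsMatch(text, $@"\b{Regex.Escape(word)}\b");

    // Flattened: returns the matching notes, with no link back to the student.
    public static List<Note> MatchingNotes(IEnumerable<Student> students, string word) =>
        students.SelectMany(s => s.Notes).Where(n => ContainsWord(n.NoteText, word)).ToList();

    // Keeps the student: the LINQ equivalent of the nested foreach loops.
    public static List<Student> MatchingStudents(IEnumerable<Student> students, string word) =>
        students.Where(s => s.Notes.Any(n => ContainsWord(n.NoteText, word))).ToList();
}
```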

How to check if a returned array has at least two elements?

Here is my code:
Repository.DB.Table01Repository.AsQueryable().Where(item => (item.Name.Split(' ')[1] == null)).ForEach(items => _VerifyList.Add(items.Name.Trim()));
I split 'Name' by ' ', and if it does not have a second element, I need those records.
Thanks.
Since Split will produce two or more elements if the Name contains at least one space, you can write it as follows:
Repository.DB.Table01Repository.AsQueryable()
.Where(item => !item.Name.Contains(" "))
.ForEach(items => _VerifyList.Add(items.Name.Trim()));
There's no need to actually perform the Split.
In addition, Contains can be translated to SQL (it is one of the CLR methods that map to canonical LINQ functions), which means the query will execute successfully against your database. Other methods (like Split itself) cannot be translated when querying a database through IQueryable and would cause an exception at runtime.
You want to be wary of whitespace in this scenario: "abcd ".Split().Count() will return 2 because of the whitespace character at the end. To avoid this, use the Trim() method to remove leading and trailing whitespace before splitting, like this:
if (item.Name.Trim().Split().Length > 1)
{
// Do stuff
}
or in a Where clause:
.Where(item => (item.Name.Trim().Contains(" ")))
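The trim-then-check logic can be sketched in memory as below; NameFilter and its methods are hypothetical names. IsSingleWord trims the name and then checks that no whitespace remains, which is equivalent to Trim().Split().Length == 1 without allocating an array. This version is for illustration on in-memory strings only: when querying the database through IQueryable, the Contains form above is the one to use, since it can be translated to SQL.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class NameFilter
{
    // True when the trimmed name contains no whitespace, i.e. splitting it
    // would not yield a second element. "abcd " therefore counts as one word.
    public static bool IsSingleWord(string name) =>
        !name.Trim().Any(char.IsWhiteSpace);

    // In-memory version of the question's filter: keep single-word names, trimmed.
    public static List<string> SingleWordNames(IEnumerable<string> names) =>
        names.Where(IsSingleWord).Select(n => n.Trim()).ToList();
}
```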
