I'm writing a search algorithm. For the last portion of it, I want to split the user's search text into individual words and then find any results that contain at least one of those words. Is there any function that would work something like "ContainsAny" below? Otherwise, how can I make that happen?
string[] splitStr = text.Split();
result = db.Table.Where(x => x.Name.ContainsAny(splitStr)).FirstOrDefault();
For example, if they search for "Metal Spoon" both "Metal Chair" and "spoon book" would be valid results because each contains at least one of the search terms.
There is no ContainsAny, but you can use a combination of Any and Contains like this:
var results = db.Table.Where(x => splitStr.Any(s => x.Name.Contains(s)));
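As a minimal in-memory sketch of the same idea: the names collection below is a made-up stand-in for db.Table, and SearchSketch and MatchAny are hypothetical names. The case-insensitive comparison is my addition, so that "Metal Spoon" also finds "spoon book"; against a real database, case sensitivity would normally depend on the provider and collation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SearchSketch
{
    // Returns every name that contains at least one of the search words.
    public static List<string> MatchAny(IEnumerable<string> names, string text)
    {
        // A null separator splits on whitespace; empty entries from repeated spaces are dropped.
        string[] words = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        return names
            .Where(name => words.Any(w =>
                name.IndexOf(w, StringComparison.OrdinalIgnoreCase) >= 0))
            .ToList();
    }
}
```

With the names "Metal Chair", "spoon book" and "Wooden Table", calling MatchAny(names, "Metal Spoon") keeps the first two and drops the third.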
I'm working on filtering comments. I'd like to replace a string like this:
llllolllllllllllooooooooooooouuuuuuuuuuuddddddddddddddllllollllllllllllloooooooooooooooooouuuuuuuuuuuuuuuuuuddddddddddddddllllollllllllllllloooooooooooooooooouuuuuuuuuuuuuuuuuuddddddddddddddllllollllllllllllloooooooooooouuuuuuuuuuuuuuuuudddddddddddddd
with two words: lol loud
a string like this:
cuytwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww
with: cuytw
And string like this:
hyyuyuyuyuyuyuyuyuyuyuyuyuyu
with: hyu
but not modify strings like look or geek.
Is there any way to achieve this with a single regular expression in C#?
I think I can answer this categorically.
This definitely can't be done with a regex, or even with ordinary code, given your input and output requirements. At a minimum you would need a dictionary and an algorithm that tries reducing the doubled letters in each permutation and checks the result against legitimate words.
The result (at best) would be a list of possible, non-mutually-exclusive combinations of nonsense words and legitimate words with doubles.
In fact, I'd go as far as to say that, with your current requirements and no more specific rules, mapping your input to your output is impossible in general and can only be taken at face value for the cases you have given.
I'm not sure how to use RegEx for this problem, but here is an alternative which is arguably easier to read.*
Assuming you just want to return a string comprising the distinct letters of the input in order, you can use GroupBy:
private static string filterString(string input)
{
    var groups = input.GroupBy(c => c);
    var output = new string(groups.Select(g => g.Key).ToArray());
    return output;
}
Passes:
Returns loud for llllolllllllllllooooooooooooouuuuuuuuuuuddddddddddddddllllollllllllllllloooooooooooooooooouuuuuuuuuuuuuuuuuuddddddddddddddllllollllllllllllloooooooooooooooooouuuuuuuuuuuuuuuuuuddddddddddddddllllollllllllllllloooooooooooouuuuuuuuuuuuuuuuudddddddddddddd
Returns cuytw for cuytwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww
Returns hyu for hyyuyuyuyuyuyuyuyuyuyuyuyuyu
Failures:
Returns lok for look
Returns gek for geek
* On second read you want to leave words like look and geek alone; this is a partial answer.
This seems so simple that I'm convinced I must be overlooking something. I cannot establish how to do the following in Lucene:
The problem
I'm searching for place names.
I have a field called Name
It is using Lucene.Net.Analysis.Standard.StandardAnalyzer
It is TOKENIZED
The value of Name contains 1 space in the value: halong bay.
The search term may or may not contain an extra space due to culturally different spellings or genuine spelling mistakes. E.g. ha long bay instead of halong bay.
If I use the term halong bay I get a hit.
If I use the term ha long bay I do not get a hit.
The attempted solution
Here's the code I'm using to build my predicate using LINQ to Lucene from Sitecore:
var searchContext = ContentSearchManager.GetIndex("my_index").CreateSearchContext();
var term = "ha long bay";
var predicate = PredicateBuilder.Create<MySearchResultItemClass>(sri => sri.Name == term);
var results = searchContext.GetQueryable<MySearchResultItemClass>().Where(predicate);
I have also tried a fuzzy match using the .Like() extension:
var predicate = PredicateBuilder.Create<MySearchResultItemClass>(sri => sri.Like(term));
This also yields no results for ha long bay.
How do I configure Lucene in Sitecore to return a hit for both halong bay and ha long bay search terms, ideally without having to do anything fancy with the input term (e.g. stripping space, adding wildcards, etc)?
Note: I recognise that this would also allow the term h a l o n g b a y to produce a hit, but I don't think I have a problem with this.
A TOKENIZED field means that the field value is split into tokens (on the space, in this case) and the resulting terms are added to the index dictionary. If you index "halong bay" in such a field, it will create the "halong" and "bay" terms.
It's normal for the search engine to fail to retrieve this result for the "ha long" search query, because the index contains no result with the "ha" or "long" terms.
A manual approach would be to define all the other ways to write the place name in another multi-value computed index field named AlternateNames. Then you could issue this kind of query: Name==query OR AlternateNames==query.
An automatic approach would be to also index the place names without spaces in a separate computed index field named CompactName. Then you could issue this kind of query: Name==query OR CompactName==compactedQueryWithoutSpaces
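A minimal sketch of the normalization behind the CompactName idea, in plain C#. NameCompactor and Compact are hypothetical names, and the lower-casing is my assumption. How the computed field is registered in Sitecore is not shown here. The same function would be applied both when indexing the name and when preparing the query.

```csharp
using System;
using System.Linq;

public static class NameCompactor
{
    // Removes all whitespace and lower-cases the value, so that
    // "halong bay" and "ha long bay" both become "halongbay".
    public static string Compact(string value)
    {
        var withoutSpaces = value.Where(c => !char.IsWhiteSpace(c)).ToArray();
        return new string(withoutSpaces).ToLowerInvariant();
    }
}
```

Because both spellings compact to the same string, a CompactName==compactedQueryWithoutSpaces comparison can succeed where the tokenized Name field does not.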
I hope this helps
Jeff
Something like this might do the trick:
var predicate = PredicateBuilder.False<MySearchResultItemClass>();
foreach (var t in term.Split(' '))
{
    var tempTerm = t;
    predicate = predicate.Or(p => p.Name.Contains(tempTerm));
}
var results = searchContext.GetQueryable<MySearchResultItemClass>().Where(predicate);
It does split your input string, but I guess that is not 'fancy' ;)
I am having some difficulties trying to get my simple Regex statement in C# working the way I want it to.
If I have a long string and I want to find the word "executive" but NOT "executives" I thought my regex would look something like this:
Regex.IsMatch(input, string.Format(@"\b{0}\b", "executive"));
This, however, is still matching on inputs that contain only executives and not executive (singular).
I thought word boundaries in regex, when used at the beginning and end of your regex text, would specify that you only want to match that word and not any other form of that word?
Edit: To clarify what's happening, I am trying to find all of the Notes among Students that contain the word executive, ignoring words that merely contain "executive". As follows:
var studentMatches =
    Students.SelectMany(o => o.Notes)
            .Where(c => Regex.Match(c.NoteText, string.Format(@"\b{0}\b", query)).Success).ToList();
where query would be "executive" in this case.
What's strange is that while the above code will match on executives even though I don't want it to, the following code will not (i.e. it does what I expect):
foreach (var stu in Students)
{
    foreach (var note in stu.Notes)
    {
        if (Regex.IsMatch(note.NoteText, string.Format(@"\b{0}\b", query)))
            Console.WriteLine(stu.LastName);
    }
}
Why would a nested for loop with the same regex code produce accurate matches while a linq expression seems to want to return anything that contains the word I am searching for?
Your linq query produces the correct result. What you see is what you have written.
Let's use clearer names to make this obvious:
var noteMatches = Students.SelectMany(student => student.Notes)
                          .Where(note => Regex.Match(note.NoteText, string.Format(@"\b{0}\b", query)).Success)
                          .ToList();
After SelectMany executes, this query works on a flattened list of all notes, so the information about which note belonged to which student is lost.
Meanwhile, in the sample code with foreach loops you output information about the student.
I assume that you need a query like the following:
var studentMatches = Students.Where(student => student.Notes
                                 .Any(note => Regex.IsMatch(note.NoteText, string.Format(@"\b{0}\b", query))))
                             .ToList();
However, it is not clear what result you want if the same student has notes containing both executive and executives.
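Here is a self-contained sketch of the student-level query, under some assumptions: the Student and Note classes are minimal stand-ins for the asker's types, NoteSearch and StudentsMentioning are hypothetical names, and wrapping the query in Regex.Escape is my addition to guard against regex metacharacters in the search term.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class Note
{
    public string NoteText { get; set; }
}

public class Student
{
    public string LastName { get; set; }
    public List<Note> Notes { get; set; } = new List<Note>();
}

public static class NoteSearch
{
    // Returns the last names of students who have at least one note
    // containing the query as a whole word.
    public static List<string> StudentsMentioning(IEnumerable<Student> students, string query)
    {
        // \b on both sides means "executive" does not match inside "executives".
        var pattern = new Regex(@"\b" + Regex.Escape(query) + @"\b");

        return students
            .Where(s => s.Notes.Any(n => pattern.IsMatch(n.NoteText)))
            .Select(s => s.LastName)
            .ToList();
    }
}
```

A student whose only note mentions "executives" is excluded, while a student with both kinds of note is included once, because Any only asks whether at least one note matches.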
I have a StringBuilder which stores many words. For example, I did:
StringBuilder builder = new StringBuilder();
builder.Append(reader.Value);
Now builder contains a string such as
" india is a great great country and it has many states and territories".. it contains many paragraphs.
I want each unique word listed along with its word count. For example:
india: 1
great: 2
country: 1
and: 2
Also, this result should be saved in an Excel file. But I am not getting the result.
I searched on Google, but I am only finding solutions that use LINQ or that write out the words themselves. Can you please help me out? I am a beginner.
You can use LINQ to achieve it. Try something like this:
// StringBuilder has no Split method, so convert it to a string first.
var result = from word in builder.ToString().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
             group word by word into g
             select new { Word = g.Key, Count = g.Count() };
You can also convert this result into a Dictionary object like this:
Dictionary<string, int> output = result.ToDictionary(a => a.Word, a => a.Count);
So here each item in output contains the Word as its key and its Count as its value.
Well, this is one way to get the words:
IEnumerable<string> words = builder.ToString().Split(' ');
Look into using the String.Split() function to break up your string into words. You can then use a Dictionary<string, int> to keep track of unique words and their counts.
You don't really need a StringBuilder for this, though. A StringBuilder is useful when you concatenate strings together a lot. You only have a single input string here and you won't add to it; you'll split it up.
Once you finish processing all the words in the input string, you can write the code to export the results to Excel. The simplest way to do that is to create a comma-separated text file - search for that phrase and look into using a StreamWriter to save the output. Excel has built-in converters for CSV files.
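A rough sketch of the Dictionary and StreamWriter approach described above. WordCounter, Count and WriteCsv are hypothetical names, and the code makes simplifying assumptions: words are separated only by whitespace, punctuation is not stripped, and counting ignores case.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class WordCounter
{
    // Counts how often each whitespace-separated word occurs.
    public static Dictionary<string, int> Count(string text)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        var separators = new[] { ' ', '\t', '\r', '\n' };

        foreach (var word in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            int current;
            counts.TryGetValue(word, out current); // current stays 0 for a new word
            counts[word] = current + 1;
        }
        return counts;
    }

    // Writes "Word,Count" rows to a CSV file that Excel can open.
    public static void WriteCsv(Dictionary<string, int> counts, string path)
    {
        using (var writer = new StreamWriter(path))
        {
            writer.WriteLine("Word,Count");
            foreach (var pair in counts)
            {
                // Quote the word and double any embedded quotes, as CSV expects.
                string word = "\"" + pair.Key.Replace("\"", "\"\"") + "\"";
                writer.WriteLine(word + "," + pair.Value);
            }
        }
    }
}
```

Usage would be along the lines of WordCounter.WriteCsv(WordCounter.Count(builder.ToString()), "counts.csv"), where the file name is just a placeholder.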
I need to somehow detect if there is a parent OU value, and if there is retrieve it.
For example, here there is no parent:
LDAP://servera/OU=Santa Cruz,DC=contoso,DC=com
But here, there is a parent:
LDAP://servera/OU=Ventas,OU=Santa Cruz,DC=contoso,DC=com
So I would need to retrieve that "Ventas" string.
Another example:
LDAP://servera/OU=Contabilidad,OU=Ventas,OU=Santa Cruz,DC=contoso,DC=com
I would need to retrieve that "Ventas" string as well.
Any suggestions on how to tackle this?
string ldap = "LDAP://servera/OU=Ventas,OU=Santa Cruz,DC=contoso,DC=com";
Match match = Regex.Match(ldap, @"LDAP://\w+/OU=(?<toplevelou>\w+?),OU=");
if (match.Success)
{
    Console.WriteLine(match.Result("${toplevelou}"));
}
I'd find the first occurrence of OU=... and get its value. Then I'd check if there was another occurrence after it. If so, return the value I've got. If not, return whatever it is you want if there's no parent (String.Empty, null, or whatever).
You could also use a regular expression like this:
var regex = new Regex(@"OU=(.*?),");
var matches = regex.Matches(ldapString);
Then check how many matches there are. If >1 return the captured value from the first match.
Update
The regex above needs to be improved to allow the case where there's an escaped comma (\,) in the LDAP string. Maybe something like:
var regex = new Regex(@"OU=((.*?(\\\,)+?)+?),");
That may be broken, and there may be a simpler way to do the same thing. I'm not a regex wizard.
Another Update
Per Kimberly's comment below the regex should be @"OU=((?:.*?(?:\\\,)*?)+?),".
Call me crazy, but I'd do it this way (hey ma, look, a one-liner!):
var str = "LDAP://servera/OU=Ventas,OU=Santa Cruz,DC=contoso,DC=com";
var result = str.Substring(str.LastIndexOf('/') + 1).Split(',')
.Select(s => s.Split('='))
.Where(a => a[0] == "OU")
.Select(a => a[1])
.Reverse().Skip(1).FirstOrDefault();
result is either null or has the string you want. This will work no matter how many OUs are in there and return the second-to-last one, as long as the format of the string is valid to begin with.
Update: possible improvements:
The above will not work correctly if your DN contains an escaped forward slash or an escaped comma.
To fix both of these you need to use regular expressions. Change:
str.Substring(str.LastIndexOf('/') + 1).Split(',')
to:
Regex.Split(Regex.Split(str, "(?<!\\\\)/").Last(), "(?<!\\\\),")
What this does is isolate the DN by taking the last part of str after splitting on forward slashes, and then split the DN into parts on commas. In both cases, negative lookbehind is used to make sure that the slashes/commas are not escaped.
Not as pretty, I know. But it's still a one-liner (yay!) and it still allows you to use LINQ further down to handle multiple OUs any way you choose to.
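For reference, here is the escape-aware variant wrapped in a method, as a sketch only. LdapPath and ParentOu are hypothetical names, and I've split each part on the first '=' only, which is an addition to tolerate '=' inside a value. The lookbehind treats any comma or slash preceded by a backslash as escaped, so a value ending in an escaped backslash would still be mishandled.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LdapPath
{
    // Returns the second-to-last OU in the path, or null when the
    // path contains fewer than two OUs.
    public static string ParentOu(string path)
    {
        // Keep the part after the last unescaped '/', i.e. the DN.
        string dn = Regex.Split(path, @"(?<!\\)/").Last();

        return Regex.Split(dn, @"(?<!\\),")                   // split on unescaped commas
            .Select(part => part.Split(new[] { '=' }, 2))     // "OU=Ventas" -> ["OU", "Ventas"]
            .Where(kv => kv.Length == 2 && kv[0].Trim() == "OU")
            .Select(kv => kv[1])
            .Reverse()
            .Skip(1)
            .FirstOrDefault();
    }
}
```

For the three paths in the question this gives null, "Ventas" and "Ventas" respectively.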