MongoDB substring product search order by highest match - c#

I'm new to MongoDB; this is my first project using it. I'm working on a shop website. We're using C# and the newest C# driver for MongoDB. I need an idea for how to implement an algorithm that filters products based on the user's input. Here's how my algorithm works (or should work).
The user types something like "Blue adidas men spiderman cotton t-shirt".
I split the whole phrase into single words, so I have "blue", "adidas", "men", "spiderman", "cotton", "t-shirt".
I check each word against the retailer, color and category names in my db. Here's how it goes:
"Blue" - it's a color, so I filter on the corresponding field of my MongoDB document.
"adidas" - it's a retailer. Do the same as in the point above.
"men" - it's a category, so I search only in this category.
"cotton" - I don't filter by textile, so I can't filter on this.
"spiderman" - definitely not a retailer, color etc. I can't filter on this either.
"t-shirt" - it's a category name that appears as a child category for kids, men and women, but since I already have "men" as my category I just go one level deeper into my category structure. I update the categoryId I want to filter against, and it's now the t-shirt category under the men category.
So I have parsed a few filter parameters from the user's input, but I still have the words "cotton" and "spiderman". That's where I'm lost.
I don't want to ignore "spiderman" and "cotton". I would like to get ALL items from the database ordered by how many of these words the product name contains, so my expected results (with all retailer, color etc. filters applied first) are:
products whose names contain both "spiderman" and "cotton"
products whose names contain either "spiderman" or "cotton"
How can I do that?

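First of all, you can combine multiple filters using the & operator, like this:
var builder = Builders<Product>.Filter;
FilterDefinition<Product> filter = builder.Empty;
filter &= builder.Eq("Color", "blue");
filter &= builder.Eq("Retailer", "adidas");
filter &= builder.Eq("Category", "men");
Then you can use a regex to filter for products whose names contain any, or all, of the remaining words. Regex.Escape keeps characters such as "+" or "." in the user's input from being read as regex syntax, and RegexOptions.IgnoreCase lets "spiderman" match "Spiderman". Note that neither query orders the results by the number of matched words.
OR search (the name contains "cotton" OR "spiderman")
// Needs using System.Linq; run inside an async method.
var restWords = new string[] { "cotton", "spiderman" };
var orReg = new System.Text.RegularExpressions.Regex(
    string.Join("|", restWords.Select(w => System.Text.RegularExpressions.Regex.Escape(w))),
    System.Text.RegularExpressions.RegexOptions.IgnoreCase);
filter &= builder.Regex("Name", new BsonRegularExpression(orReg));
List<Product> filteredList = await products.Find(filter).ToListAsync();
AND search (the name contains "cotton" AND "spiderman")
// Use this instead of the OR regex above, starting from the color/retailer/category filter.
foreach (var word in restWords)
{
    var wordReg = new System.Text.RegularExpressions.Regex(
        System.Text.RegularExpressions.Regex.Escape(word),
        System.Text.RegularExpressions.RegexOptions.IgnoreCase);
    filter &= builder.Regex("Name", new BsonRegularExpression(wordReg));
}
List<Product> filteredList = await products.Find(filter).ToListAsync();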
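The queries above return matching products but do not order them by how many words matched. One possible way to get that ordering, sketched here under the assumption that the OR query's result set is small enough to sort in memory, is to score each name after fetching. The WordCoverage class and its method names are hypothetical helpers for illustration, not part of the MongoDB driver.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: ranks already-fetched items by word coverage of their names.
public static class WordCoverage
{
    // Counts how many of the words occur in the name (case-insensitive substring match).
    public static int Score(string name, IEnumerable<string> words) =>
        words.Count(w => name.IndexOf(w, StringComparison.OrdinalIgnoreCase) >= 0);

    // Puts items whose names cover more words first.
    // OrderByDescending is a stable sort, so ties keep their original order.
    public static List<T> Rank<T>(
        IEnumerable<T> items, Func<T, string> nameOf, IReadOnlyCollection<string> words) =>
        items.OrderByDescending(item => Score(nameOf(item), words)).ToList();
}
```

With the OR search result from above, usage would look like WordCoverage.Rank(filteredList, p => p.Name, restWords), assuming Product exposes a non-null Name property. Products containing both words then come before products containing only one.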

Related

How to construct a LINQ Query to test a list against another list where elements start with the elements from the other

Basically, I am constructing an autocomplete textbox to search for name fragments. (Yes, I could have used Lucene or similar, but for many non-technical reasons I am not using it.)
public IEnumerable<ContactAutoComplete> SelectActiveContactsAutoCompleteForMailingList(string fullName)
{
    // The search query fullName, e.g. James Francis Cameron, is decomposed
    // into a list comprising James, Francis, Cameron
    IEnumerable<string> fragment = fullName.Trim().Split();
    return _db.Contacts.Where(contact => contact.Status == Statuses.Activated &&
        fragment.All(c => contact.FullName.Trim().Split().Any(frag =>
            frag.StartsWith(c))));
}
What I need in the above context is a clause to
Apply the .Trim() and .Split() to the FullName field of each contact
Test the obtained list of text fragments (contact.FullName.Trim.Split) against the text fragments (fragment) obtained from the search query
Check if each text fragment (fragment) will appear at the start of each of fragments obtained from contact.FullName.Trim.Split
Examples:
In the database, a contact has the FullName, James Francis Cameron
Searching for
"Fra Cam" - OK
"Cam Fra" - OK (because in Asia, name ordering convention is inconsistent)
"Cis Ron" - not OK
Many thanks!
LINQ can't translate methods like string.Split() to SQL, so you can't do it in a fully general way like this, and you can't easily split the full name into two fields while running a query. If you store the name in two fields, one for the first name and one for the last name, you can do this:
var fragments = fullName.Trim().Split(' ');
// Fall back to an empty string when a fragment is missing, so Trim() is never called on null.
var first = (fragments.FirstOrDefault() ?? "").Trim();
var last = (fragments.Skip(1).FirstOrDefault() ?? "").Trim();
var r = db.Contacts.Where(x => (x.First.StartsWith(first) && x.Last.StartsWith(last))
    || (x.First.StartsWith(last) && x.Last.StartsWith(first))
);
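If the name stays in a single FullName column, one option is to let the database apply only the translatable part of the filter (for example the status check) and test the fragments in memory afterwards. The sketch below shows just that in-memory test; NameFragmentMatcher is a hypothetical helper, and the case-insensitive comparison is an assumption, since the question does not say how case should be handled.

```csharp
using System;
using System.Linq;

// Hypothetical helper: checks that every query fragment is a prefix of some word in the name.
public static class NameFragmentMatcher
{
    // Splits on any whitespace and drops empty entries.
    private static string[] Words(string text) =>
        text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

    // True when each query fragment starts at least one word of fullName, in any order.
    // An empty query matches every name, because All() is true for an empty sequence.
    public static bool Matches(string fullName, string query)
    {
        var nameParts = Words(fullName);
        return Words(query).All(q =>
            nameParts.Any(part => part.StartsWith(q, StringComparison.OrdinalIgnoreCase)));
    }
}
```

For the examples in the question, "Fra Cam" and "Cam Fra" match "James Francis Cameron", while "Cis Ron" does not. In a query this could be applied after materializing the rows, e.g. after AsEnumerable(), at the cost of loading every active contact into memory.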

Entity Framework: Ignore tags/special characters when searching items

I have a database table which contains text with BBCodes:
QuestionId | QuestionText
1 | What is your [u]name[/u]?
2 | [i]How[/i] old are you?
Using EF, I need to find all records in the table that contain a search string, regardless of BBCodes.
The following code returns nothing:
var searchTerm = "your name".ToLower();
var query = _repository.GetAll().Where(question => question.QuestionText.ToLower().Contains(searchTerm));
but I need to get the first record.
What is the best approach to ignoring special tags when searching records?
You can split the search phrase on white space to get separate words to search for, and then use some more LINQ to find the matching items. Here is a simple example using a List of strings instead of your EF class:
List<string> Questions = new List<string>();
Questions.Add("What is your [u]name[/u]?");
Questions.Add("[i]How[/i] old are you?");
String searchTerm = "your name".ToLower();
// The cast selects the char[] overload; a null separator means "split on white space".
String[] searchPhrases = searchTerm.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
var matching = (from question in Questions
                where searchPhrases.Any(searchPhrase => question.ToLower().Contains(searchPhrase))
                select question);
This matches a question that contains any one of the words; use All instead of Any to require every word. It should work as long as none of the words you search for have BBCodes inside them.
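Another option, sketched here for in-memory data only, is to strip the tags before comparing, so the whole phrase can be matched even when a tag sits between or inside the words. BbCodeSearch and its regex are assumptions made for illustration: the pattern treats any bracketed run that starts with a letter as a tag, so ordinary text such as "[see below]" would be stripped too, and EF will not translate this Regex call to SQL.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: phrase search that ignores simple BBCode tags.
public static class BbCodeSearch
{
    // Matches tags such as [u], [/u] or [url=http://example.com].
    private static readonly Regex Tag = new Regex(@"\[/?[A-Za-z][^\[\]]*\]");

    // Removes every tag and keeps the text between the tags.
    public static string StripTags(string text) => Tag.Replace(text, string.Empty);

    // Case-insensitive phrase search on the text with tags removed.
    public static bool ContainsIgnoringTags(string text, string searchTerm) =>
        StripTags(text).IndexOf(searchTerm, StringComparison.OrdinalIgnoreCase) >= 0;
}
```

With the sample rows, searching for "your name" matches the first question and not the second. For large tables, a design that avoids loading every row would be to store a tag-free copy of the text in an extra column when the record is saved and search that column with Contains.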

Linq with Regex

I have the matches of a regex pattern and I'm having some difficulties designing the Linq around it to produce the desired output.
The data is fixed lengths: 1231234512341234567
Lengths in this case are: 3, 5, 4, 7
The regex pattern used is: (.{3})(.{5})(.{4})(.{7})
This all works perfectly fine and the matched results of the pattern are as expected, however, the desired output is proving to be somewhat difficult. In fact, I'm not even certain what it would be called in SQL terms - except maybe a pivot query. The desired output is to take all the values from each of the groups at a given position and concatenate them so for example:
field1:value1;value2;value3;valueN;field2:value2;value3;valueN;
Using the below Linq expression, I was able to get field1-value1, field2-value2, etc...
var matches = Regex.Matches(data, re).Cast<Match>();
var xmlResults = from m in matches
from e in elements
select string.Format("<{0}>{1}</{0}>", e.Name, m.Groups[e.Ordinal].Value);
but I can't seem to figure out how to get all the values at position 1 from "Groups" using the element's Ordinal, then all the values at position 2 and so on.
The "elements" in this example is a collection of field names and ordinal positions (starting at 1). So, it would look like this:
public class Element
{
public string Name { get; set; }
public int Ordinal { get; set; }
}
var elements = new List<Element>{
new Element { Name="Field1", Ordinal=1 },
new Element { Name="Field2", Ordinal=2 }
};
I've reviewed a bunch of various Linq expressions and dug into some pivot type Linq expressions, but none of them get me close - they all use the join operator which I don't think is possible.
Does anyone have any idea how to make this Linq?
You should be able to do this by changing the query to select from elements only, and bringing in the matches through string.Join, like this:
// Use ToList to avoid iterating matches multiple times
var matches = Regex.Matches(data, re).Cast<Match>().ToList();
// For each element, join the values of group e.Ordinal across all matches
var xmlResults = elements.Select(e =>
    string.Format(
        "<{0}>{1}</{0}>"
    ,   e.Name
    ,   string.Join(";", matches.Select(m => m.Groups[e.Ordinal].Value))
    ));
Note: this is not the best way of formatting XML, because the values are not escaped. You would be better off using one of .NET's libraries for building XML, such as LINQ to XML.
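As a rough illustration of the LINQ to XML route, the sketch below builds one XElement per field, which takes care of escaping the values. FixedWidthPivot and the tuple-based field list are hypothetical stand-ins for the Element class from the question.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper: pivots regex groups into one XML element per field.
public static class FixedWidthPivot
{
    public static List<XElement> ToElements(
        string data, string pattern, IEnumerable<(string Name, int Ordinal)> fields)
    {
        // Materialize the matches once; they are reused for every field.
        var matches = Regex.Matches(data, pattern).Cast<Match>().ToList();

        // For each field, join the value of its group across all matches.
        return fields
            .Select(f => new XElement(f.Name,
                string.Join(";", matches.Select(m => m.Groups[f.Ordinal].Value))))
            .ToList();
    }
}
```

Calling ToString() on each returned element gives the markup, and wrapping the list in a parent XElement would produce a single document.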

Lucene.net Umbraco order by multiple fields

I am trying to search on multiple fields in Umbraco and order the results by two factors:
If the item is of a specific type, place it on top
Then order by stock
I am using Umbraco 4.9.1.
var Searcher = ExamineManager.Instance.SearchProviderCollection["MySearcher"];
var searchCriteria = Searcher.CreateSearchCriteria();
var results = Searcher.Search(searchCriteria.OrderByDescending("IsAccesoires", "stock").Compile().RawQuery(search));
If I order only on IsAccesoires it works; if I order on both (IsAccesoires and stock), the ordering by stock ignores IsAccesoires.

Linq question about grouping something that can change?

I have a list of strings and I need to operate on them according to their suffix. The only thing that does not change is the beginning of the string (they will always be ManifestXXX.txt, FileNameItems1XXX...). The suffix at the end of the string is different every time. Here is what I have so far (LINQPad):
var filesName = new[] { "ManifestSUFFIX.txt",
"FileNameItems1SUFFIX.txt",
"FileNameItems2SUFFIX.txt",
"FileNameItems3SUFFIX.txt",
"FileNameItems4SUFFIX.txt",
"ManifestWOOT.txt",
"FileNameItems1WOOT.txt",
"FileNameItems2WOOT.txt",
"FileNameItems3WOOT.txt",
"FileNameItems4WOOT.txt",
}.AsQueryable();
var query =
from n in filesName
group n by n.EndsWith("SUFFIX.txt") into ere
select new{ere} ;
query.Dump();
The condition in the group clause is not right. I am thinking of trying to get all possible suffixes with a nested select in the group, but I can't find a way to do it.
How can I have 3 different groups, grouped by their suffix, with LINQ? Is it possible?
*Jimmy's answer is great but still doesn't work the way I want. Any fix?
Group by the suffix itself rather than by whether it matches one particular suffix.
...
group n by GetSuffix(n) into ere
...
string GetSuffix(string n) {
    // Strip the fixed prefix; whatever is left is the suffix.
    return Regex.Replace(n, "^Manifest|^FileNameItems[0-9]+", "");
}
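Put together, the grouping could look like the following sketch. SuffixGrouping is a hypothetical wrapper, and it assumes the suffix never starts with a digit, because the [0-9]+ part of the pattern would otherwise swallow the leading digits of the suffix.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: groups file names by whatever follows the fixed prefix.
public static class SuffixGrouping
{
    // Removes the fixed prefix ("Manifest" or "FileNameItems" plus digits).
    public static string GetSuffix(string fileName) =>
        Regex.Replace(fileName, "^(Manifest|FileNameItems[0-9]+)", "");

    // One dictionary entry per distinct suffix, holding the file names that share it.
    public static Dictionary<string, List<string>> GroupBySuffix(IEnumerable<string> fileNames) =>
        fileNames
            .GroupBy(n => GetSuffix(n))
            .ToDictionary(g => g.Key, g => g.ToList());
}
```

For the sample list this yields one group keyed "SUFFIX.txt" and one keyed "WOOT.txt", each with five file names; a third suffix in the input would produce a third group without any code change.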
