LINQ get the best match from the list comparing strings - c#

I have code where I try to find a QProduct by productName in a List<QProduct> (lqp) using LINQ. The variable productNameInaccurate comes from a file name and often contains some other text, usually at the end of the string. For example, productName = '0/2' and the productNameInaccurate that I get from fileName is '0/2 new', etc.
I have this code:
//Get inaccurate product name from filename
productNameInaccurate = fileName.Substring(ind + 1, ind2 - ind - 1).Replace("-", "/");
//Get filtered list of products
List<QProduct> filtered = lqp.Where(x=>productNameInaccurate.StartsWith(x.Name, StringComparison.InvariantCultureIgnoreCase)).ToList();
//Some more filtering - here I need to get best match by productName
if (isDop)
qProduct = filtered.Where(x => x.normy.StartsWith("ČSN EN")).FirstOrDefault();
else
qProduct = filtered.Where(x => !x.normy.StartsWith("ČSN EN")).FirstOrDefault();
It works ok, but I have also productName = '32/63' and productName = '32/63 B I'. This code finds QProduct that has productName == '32/63' even if productNameInaccurate == '32/63 BI'.
What I need is to find best match from the list of QProduct, so that for productNameInaccurate='0/2 new' I get QProduct.Name = '0/2' and for productNameInaccurate='32/63 Bi' I get QProduct.Name = '32/63 B II' instead of QProduct.Name = '32/63'. Ideally get the filtered list sorted by count of matching characters.

"Ideally get the filtered list sorted by count of matching characters."
// Get the filtered list and sort by count of matching characters
IEnumerable<QProduct> filtered = lqp
.Where(x=>productNameInaccurate.StartsWith(x.Name, StringComparison.InvariantCultureIgnoreCase))
.OrderByDescending(x => Fitness(x.Name, productNameInaccurate));
static int Fitness(string individual, string target) {
return Enumerable.Range(0, Math.Min(individual.Length, target.Length))
.Count(i => individual[i] == target[i]);
}
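As a variation on the ranking idea above, here is a minimal sketch that scores each candidate by the length of the prefix it shares with the inaccurate name, ignoring case, and does not pre-filter with StartsWith. The names BestMatch, CommonPrefixLength and Find are hypothetical and not part of the original code, and plain strings stand in for QProduct.Name. It is one possible scoring rule, not necessarily the right one for every product list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BestMatch
{
    // Number of leading characters the two strings share, ignoring case.
    public static int CommonPrefixLength(string a, string b)
    {
        int max = Math.Min(a.Length, b.Length);
        int i = 0;
        while (i < max && char.ToUpperInvariant(a[i]) == char.ToUpperInvariant(b[i]))
        {
            i++;
        }
        return i;
    }

    // Returns the candidate with the longest shared prefix;
    // ties go to the shorter candidate name. Returns null for an empty list.
    public static string Find(IEnumerable<string> names, string inaccurate)
    {
        return names
            .OrderByDescending(n => CommonPrefixLength(n, inaccurate))
            .ThenBy(n => n.Length)
            .FirstOrDefault();
    }
}
```

With the candidates "32/63" and "32/63 B II", the input "32/63 Bi" shares 5 characters with the first and 7 with the second, so the longer name is chosen.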

Related

LINQ: select specific value in a datatable column

In a table I have 4 columns: GroupName, Display, Value and ID.
How can I show only specific data? I only want to show the data for some of the group names,
for example only the rows where GroupName = company and Display = Forbes.
Here's my LINQ:
sample = (from c in smsDashboardDBContext.CodeDefinitions
orderby c.Display ascending
select new CodeDefinitionDTO
{
GroupName = c.GroupName,
Display = c.Display,
Value = c.Value,
Id = c.Id
}).ToList();
You can add a where statement in the query.
where c.GroupName == "company" && c.Display == "Forbes"
I only want to show some of the groupNames Data for example I only want to show Groupname = company and display = Forbes
Before the ToList, use a Where to keep only those items that you want to show:
var company = ...
var forbes = ...
var result = smsDashboardDBContext.CodeDefinitions
.OrderBy(codeDefinition => codeDefinition.Display)
.Select(codeDefinition => new CodeDefinitionDTO
{
Id = codeDefinition.Id,
GroupName = codeDefinition.GroupName,
Display = codeDefinition.Display,
Value = codeDefinition.Value,
})
.Where(codeDefinition => codeDefinition.GroupName == company
&& codeDefinition.Display == forbes);
In words:
Order all codeDefinitions that are in the table of CodeDefinitions by ascending value of property codeDefinition.Display.
From every codeDefinition in this ordered sequence make one new CodeDefinitionDTO with the following properties filled: Id, GroupName, Display, Value.
From every codeDefinition in this sequence of CodeDefinitionDTOs, keep only those codeDefinitions that have a value for property GroupName that equals company and a value for property Display that equals forbes.
There is room for improvement!
Suppose your table has one million elements, and after the Where, only five elements are left. Then you will have sorted almost one million elements for nothing. Consider doing the Where first, then the OrderBy, and finally the Select.
In LINQ, try to do a Where as soon as possible: all following statements will have to work on fewer items.
In LINQ, try to do a Select as late as possible, preferably just before the ToList / FirstOrDefault / ... This way the Select has to be done for as few elements as possible.
So first the Where, then the OrderBy, then the Select, and finally the ToList / FirstOrDefault, etc:
var result = smsDashboardDBContext.CodeDefinitions
.Where(codeDefinition => ...)
.OrderBy(codeDefinition => codeDefinition.Display)
.Select(codeDefinition => new CodeDefinitionDTO
{
...
});
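The Where, OrderBy, Select, ToList ordering can be tried out on an in-memory list. This is only a sketch: the CodeDefinition and CodeDefinitionDto classes and the WhereFirstDemo helper below are hypothetical stand-ins, not the types from the original DbContext.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the entity and the DTO.
public class CodeDefinition
{
    public int Id { get; set; }
    public string GroupName { get; set; }
    public string Display { get; set; }
}

public class CodeDefinitionDto
{
    public int Id { get; set; }
    public string Display { get; set; }
}

public static class WhereFirstDemo
{
    public static List<CodeDefinitionDto> Query(
        IEnumerable<CodeDefinition> source, string groupName, string display)
    {
        return source
            // Filter first, so the later steps see as few items as possible.
            .Where(c => c.GroupName == groupName && c.Display == display)
            // Sort only what survived the filter.
            .OrderBy(c => c.Display)
            // Project last, just before materializing.
            .Select(c => new CodeDefinitionDto { Id = c.Id, Display = c.Display })
            .ToList();
    }
}
```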

Linq join collection on collection

The Series object contains a property called Skus and it is IEnumerable
If this sku is in the allowed list of skus then I need that series.
In my example below, I'm joining on s.SeriesId which is not correct.
I believe it needs to be the collection s.Skus
I only want to return a series that has the contained sku in the collection.
IEnumerable<Data.Models.Series> series = await _seriesRepository.GetSeriesAsync(Properties.Settings.Default.Channel, page, limit);
string[] skusInSeries = series?.SelectMany(x => x.Skus).Distinct().ToArray();
IEnumerable<string> itemNumbers = GetAllowedSkus(Customer, Shipto, EnvironmentCode, AcceptLanguage, skusInSeries, Warehouse);
var selected = from s in series
join i in itemNumbers
on s.SeriesId equals i //s.Skus IEnumerable<string>
select s;
var selected = from s in series
where itemNumbers.Any(i => s.Skus.Contains(i))
select s;
Or the other way:
var selected = from s in series
where s.Skus.Any(sku => itemNumbers.Contains(sku))
select s;
I am guessing there are typically more Skus than itemNumbers, so the first choice is better. It may also be better to change itemNumbers to a list that can be passed to the database:
var itemNumbers = GetAllowedSkus(Customer, Shipto, EnvironmentCode, AcceptLanguage, skusInSeries, Warehouse).ToList();
var selected = from s in series
where itemNumbers.Any(i => s.Skus.Contains(i))
select s;
If a SQL (or other) database isn't involved, you would convert itemNumbers to a HashSet for efficient lookup:
var itemNumbers = new HashSet<string>(GetAllowedSkus(Customer, Shipto, EnvironmentCode, AcceptLanguage, skusInSeries, Warehouse));
var selected = from s in series
where s.Skus.Any(sku => itemNumbers.Contains(sku))
select s;

c# Linq get exact value

I need to compare exact string values with the database,
e.g. string vals = "bicycle_store,clothing_store".
In the database I have other values containing the word "store", e.g. electronics_store.
When I execute the LINQ below, it finds every value that contains the word "store". How can I update the LINQ so it only sets "selected = true" for the values that have been sent?
return (from x in _ctx.Category
select new CategoryVM
{
Text = x.Text,
Value = x.Value,
Selected = vals.Contains(x.Value) == true ? true : false
}).ToList();
You should split the values first:
string vals = "bicycle_store,clothing_store";
string[] values = vals.Split(',');
return (from x in _ctx.Category
select new CategoryVM
{
Text = x.Text,
Value = x.Value,
Selected = values.Contains(x.Value)
}).ToList();
This will translate into a SQL IN statement.
Use where:
return (from x in _ctx.Category
where vals.Contains(x.Value)
select new CategoryVM
{
Text = x.Text,
Value = x.Value
}).ToList();
If you want to limit the output to the given input(s) exactly, do not use a single string. String.Contains will return true if a given value is within the string at all, so "bicycle_store".Contains("store") will return true, since the word "store" exists within the word "bicycle_store".
Instead, use a string array. A Contains on an array will only return true if the string matches one of the elements exactly.
string[] valsArray = vals.Split(',');
return (from x in _ctx.Category
where valsArray.Contains(x.Value)
select new CategoryVM
{
Text = x.Text,
Value = x.Value
}).ToList();
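The difference between the two Contains calls can be seen without a database. In this small sketch the ContainsDemo class and its method names are made up for illustration: the first method is a substring test on the whole string, the second an exact-element test on the split array.

```csharp
using System;
using System.Linq;

public static class ContainsDemo
{
    // string.Contains: true if value occurs anywhere inside vals.
    public static bool SubstringMatch(string vals, string value)
    {
        return vals.Contains(value);
    }

    // Enumerable.Contains on the split array: true only if value
    // equals one of the comma-separated elements exactly.
    public static bool ExactMatch(string vals, string value)
    {
        string[] values = vals.Split(',');
        return values.Contains(value);
    }
}
```

For vals = "bicycle_store,clothing_store", SubstringMatch(vals, "store") is true while ExactMatch(vals, "store") is false; both are true for "bicycle_store".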

Sorting a list of objects with OrderByDescending

I have an object (KS), which holds ID and Title (which has a number as part of the Title).
All I'm trying to do is sort it into descending order. The object has:
ID Title
1 1 Outlook VPN
2 2 Outlook Access
3 4 Access VBA
4 3 Excel Automation
So when order by Title, it should read:
ID Title
3 4 Access VBA
4 3 Excel Automation
2 2 Outlook Access
1 1 Outlook VPN
The code I'm using to sort it is:
IEnumerable<KS> query = results.OrderByDescending(x => x.Title);
However, query still has the objects in the original order!
Is there something to do with having numbers at the start of Title that I'm missing?
EDIT
I've added the code from the controller for clarity:
[HttpPost]
// [ValidateAntiForgeryToken]
// id is a string of words eg: "outlook access vpn"
// I split the words and want to check the Title to see how many words appear
// Then sort by the most words found
public JsonResult Lookup(string id)
{
List<string> listOfSearch = id.Split(' ').ToList();
var results = db.KS.Where(x => listOfSearch.Any(item => x.Title.Contains(item)));
// search each result, and count how many of the search words in id are found
// then add the count to the start of Title
foreach (KS result in results)
{
result.KSId = 0;
foreach (string li in listOfSearch)
{
if (result.Title.ToLower().Contains(li.ToLower()))
{
result.KSId += 1;
}
}
result.Title = result.KSId.ToString() + " " + result.Title;
}
// sort the results based on the Title - which has number of words at the start
IEnumerable<KS> query = results.OrderByDescending(x => x.Title).ToList();
return Json(query, JsonRequestBehavior.AllowGet);
}
Here is a screenshot after query has been populated showing Titles in the order: 1, 2, 1, 1:
Model for the object if it helps is:
public class KS
{
public int KSId { get; set; }
public string KSSol { get; set; }
public string Title { get; set; }
public string Fix { get; set; }
}
As I said in a comment, put a .ToList() where you declare your results variable. That is:
var results = db.KS.Where(x => listOfSearch.Any(item => x.Title.Contains(item)))
.ToList();
If you don't do that, the foreach loop will modify objects that might not be the same as the objects you sort later, because the database query is run again each time you enumerate your IQueryable<>.
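The effect can be reproduced in memory. In the sketch below, a Select that creates new objects stands in for the database query; the Item class and DeferredDemo helper are hypothetical. Without ToList the second enumeration builds fresh objects, so the changes made in the foreach are lost.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Item
{
    public string Title { get; set; }
}

public static class DeferredDemo
{
    // Stand-in for a database query: every enumeration creates new Item objects.
    public static IEnumerable<Item> Source()
    {
        return new[] { "Outlook VPN", "Access VBA" }
            .Select(t => new Item { Title = t });
    }

    // Prefixes each Title in a foreach, then checks whether the prefix
    // is still there when the sequence is enumerated again.
    public static bool MutationSurvives(bool materialize)
    {
        IEnumerable<Item> results = Source();
        if (materialize)
        {
            results = results.ToList(); // run the query once and keep the objects
        }
        foreach (Item item in results)
        {
            item.Title = "1 " + item.Title;
        }
        return results.All(item => item.Title.StartsWith("1 ", StringComparison.Ordinal));
    }
}
```

MutationSurvives(false) returns false because the query runs again and yields unmodified objects; MutationSurvives(true) returns true.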
You can always just ignore the strange behavior and go the safe way:
List<KS> query = results.ToList();
query.Sort((a, b) => a.Whatever.CompareTo(b.Whatever));
return Json(query, blah);
I simply did this and it worked for me:
var sortedOrder = Query.OrderBy(b => b.Title.Substring(b.Title.IndexOf(" ")));
All I have done is take the Substring of the Title starting at the index of the blank space when ordering the objects in the sequence. That way, the OrderBy looks at the text of the title rather than the number at the beginning.
Old question, but maybe this will help someone using C#. I used the following expressions to sort a list of objects based on their quantity parameter in ascending or descending order. You can modify it to compare text, as the original question was concerned with.
Ascending Order:
locationMaterials.Sort((x, y) => x.Quantity.CompareTo(y.Quantity));
Descending Order:
locationMaterials.Sort((x, y) => y.Quantity.CompareTo(x.Quantity));
You are missing .ToList()
IEnumerable<KS> query = results.OrderByDescending(x => x.Title).ToList();
results.OrderByDescending(x => x.Title) is a query, and it has no data.
ToList() forces the query to be executed.
[EDIT]
My answer assumes that your results have not actually been materialized, and that this is the source of your problem.

search the database for the words within a string

Imagine that a user entered a sentence and I need to search for the subjects that consist of words within the entered sentence. This is the code that I thought could solve the case.
var result = from x in dataBase.tableName
select x;
string[] words = enteredString.Split();
foreach(string word in words)
{
result = result.Where(x => x.subject.Contains(word));
}
It shows only the search result for the last word in the sentence, but I thought the result would be narrowed down each time a word is used in the Where line.
Try this:
foreach(string word in words)
{
var temp = word;
result = result.Where(x => x.subject.Contains(temp));
}
This is called (by ReSharper at least) "access to modified closure" - lambda expressions don't capture the value, they capture the entire variable. And the value of the variable word is changing with each iteration of the loop. So, since the Where() method is lazy-evaluated, by the time this sequence is consumed, the value of word is the last one in the sequence.
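A small in-memory sketch of the capture behaviour follows. Since C# 5 the foreach loop variable is a fresh variable on each iteration, so this sketch uses a for loop, where the single loop variable is still shared by all lambdas. The ClosureDemo class is hypothetical and only illustrates the mechanism.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClosureDemo
{
    // Builds three lambdas inside a for loop and invokes them after the loop ends.
    public static List<int> Captured(bool copy)
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            if (copy)
            {
                int temp = i;          // a new variable per iteration
                funcs.Add(() => temp); // captures that iteration's copy
            }
            else
            {
                funcs.Add(() => i);    // captures the one shared loop variable
            }
        }
        // The lambdas only run here, after i has reached its final value.
        return funcs.Select(f => f()).ToList();
    }
}
```

Captured(false) yields 3, 3, 3 because every lambda reads the same variable after the loop has finished, while Captured(true) yields 0, 1, 2.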
I had some success by inverting the logic like this:
string[] words = enteredString.Split();
var results = from x in database.TableName
where words.Any(w => x.subject.Contains(w))
select x;
-- Edit
A more generic approach, for this kind of queries, would be:
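class SearchQuery
{
public SearchQuery()
{
// Create the collections so that the collection initializers below have something to add to
Include = new List<string>();
Exclude = new List<string>();
}
public ICollection<string> Include { get; private set; }
public ICollection<string> Exclude { get; private set; }
}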
[...]
SearchQuery query = new SearchQuery
{
Include = { "Foo" }, Exclude = { "Bar" }
};
var results = from x in database.Table
where query.Include.All(i => x.Subject.Contains(i)) &&
query.Exclude.All(i => !x.Subject.Contains(i))
select x;
This assumes that all words in query.Include must occur in Subject. If you want to find any subjects that have at least one of the words, query.Include.All should be query.Include.Any.
I've tested this with Entity Framework 4, which will create a SQL query that applies all criteria in the database rather than in memory.
Here you go:
var result = from x in dataBase.tableName
select x;
string[] words = enteredString.Split();
result = result.Where(r => words.Any(w => r.Subject.Contains(w)));
Your loop can't do this, since with every word you are overwriting the previous result. You need to do something similar to:
List<object> AllResults = new List<object>();
foreach(string word in words)
{
var temp = word;
AllResults.AddRange (result.Where(x => x.subject.Contains(temp)).ToList());
}
Not sure what type your result type is hence the List<object>...
