Finding the most specific matching item

User input will look like 'BY1 2PX', which is split and stored in a list like this:
var items = new List<string> {"BY1 2PX", "BY12", "BY1", "BY"};
I have a source list of products:
public class Product
{
public Product(string id, string name) { Id = id; Name = name; }
public string Name {get;set;}
public string Id {get;set;}
}
Below is a sample product list. There is no guarantee of ordering; it could be in any order.
var products = new List<Product>{
new Product("1", "BY1 2PX"),
new Product("2", "BY12"),
new Product("3", "BY1"),
new Product("4", "BY"),
new Product("5", "AA2 B2X"),
//...etc
};
My output should fetch Id 1, because it is the most specific match. If Id 1 is not there, it should fetch Id 2, and so on. Could anyone help me write a LINQ query? I have tried something like the following; is this fine?
var result = items.Select(x => products.FirstOrDefault(p =>
string.Equals(p.Name.Trim(), x, StringComparison.OrdinalIgnoreCase)))
.FirstOrDefault();

Well, you can use a dictionary for its fast lookups:
var productsDict = products.ToDictionary(p => p.Name, p => p);
var key = items.FirstOrDefault(i => productsDict.ContainsKey(i));
Product result = key != null ? productsDict[key] : null;
Or, as Tim suggested, if you have multiple elements with the same name, you can use a Lookup:
var productsDict = products.ToLookup(p => p.Name, p => p);
var key = items.FirstOrDefault(i => productsDict.Contains(i));
Product result = key != null ? productsDict[key].First() : null;

If you want to select the best-matching product, you need to select from the product list, not the string list. You could use the following LINQ approach, which uses List.FindIndex:
Product bestProduct = products
.Select(p => new {
Product = p,
Index = items.FindIndex(s => String.Equals(p.Name, s, StringComparison.OrdinalIgnoreCase))
})
.Where(x => x.Index != -1)
.OrderBy(x => x.Index) // ensures the best-match logic
.Select(x => x.Product)
.FirstOrDefault();
The Where ensures that you won't get an arbitrary product if there is no matching one.
Update:
A more efficient solution is this query:
Product bestProduct = items
.Select(item => products.FirstOrDefault(p => String.Equals(p.Name, item, StringComparison.OrdinalIgnoreCase)))
.FirstOrDefault(p => p != null); // ensures the best-match logic
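To check the ordering logic end to end, here is a small runnable sketch. It assumes products can be represented as (Id, Name) tuples instead of the Product class, and it builds one case-insensitive dictionary up front (like the dictionary answer) so each item check is a constant-time lookup rather than a scan of the product list. The names byName and FindBestId are only illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Items are ordered from most to least specific, as in the question.
var items = new List<string> { "BY1 2PX", "BY12", "BY1", "BY" };

// Hypothetical (Id, Name) tuples standing in for the Product class; any order.
var products = new List<(string Id, string Name)>
{
    ("4", "BY"),
    ("5", "AA2 B2X"),
    ("3", "BY1"),
    ("1", "BY1 2PX"),
    ("2", "BY12"),
};

// One case-insensitive lookup built up front; duplicate names keep the first product.
var byName = products
    .GroupBy(p => p.Name.Trim(), StringComparer.OrdinalIgnoreCase)
    .ToDictionary(g => g.Key, g => g.First(), StringComparer.OrdinalIgnoreCase);

// Walk the candidates in order; the first one that has a product is the most specific match.
string? FindBestId(IEnumerable<string> candidates) =>
    candidates
        .Where(byName.ContainsKey)
        .Select(c => byName[c].Id)
        .FirstOrDefault();

Console.WriteLine(FindBestId(items));         // 1
Console.WriteLine(FindBestId(items.Skip(1))); // 2
```

Because items is already ordered from most to least specific, the first hit wins and the order of the product list does not matter; if nothing matches, the result is null.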

You can try to measure how similar words are by using the Levenshtein distance algorithm, which is commonly used for "Did you mean ...?" suggestions on search websites.
This solution can be found here:
https://stackoverflow.com/a/9453762/1372750
Once you have computed the distances, the word or phrase with the smallest distance is the one most "like" the searched one.
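For reference, here is a minimal sketch of the standard dynamic-programming form of the Levenshtein distance (not the code from the linked answer); the function name Levenshtein is only illustrative. Keep in mind that it ranks by edit similarity, which is a different criterion from the exact "most specific" match the question asks for.

```csharp
using System;

// Minimum number of single-character insertions, deletions and substitutions turning a into b.
static int Levenshtein(string a, string b)
{
    var d = new int[a.Length + 1, b.Length + 1];
    for (int i = 0; i <= a.Length; i++) d[i, 0] = i; // delete the first i chars of a
    for (int j = 0; j <= b.Length; j++) d[0, j] = j; // insert the first j chars of b

    for (int i = 1; i <= a.Length; i++)
    {
        for (int j = 1; j <= b.Length; j++)
        {
            int cost = a[i - 1] == b[j - 1] ? 0 : 1;
            d[i, j] = Math.Min(
                Math.Min(d[i - 1, j] + 1,    // deletion
                         d[i, j - 1] + 1),   // insertion
                d[i - 1, j - 1] + cost);     // substitution or match
        }
    }
    return d[a.Length, b.Length];
}

Console.WriteLine(Levenshtein("kitten", "sitting")); // 3
Console.WriteLine(Levenshtein("BY1 2PX", "BY12"));   // 3
```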

This finds, for each product, the "most specific" (longest) match in items, and returns the product with the longest match, regardless of the order of either collection:
var result = products
.Select(p => new
{
Product = p,
MostSpecific = items.Where(item => p.Name.Contains(item))
.OrderByDescending(match => match.Length)
.FirstOrDefault()
})
.Where(x => x.MostSpecific != null)
.OrderByDescending(x => x.MostSpecific.Length)
.Select(x => x.Product)
.FirstOrDefault();

Related

convert dictionary to list model

var entity = await _abcRepository.get(Id);
var X = entity.GroupBy(c => c.number).Where(grp => grp.Count() == 1).Take(10).ToList();
In the images you can see [0], and inside it another [0].
How can I get that model value?
X[0][0] is not working.
X.Value is not working.
I need to convert that dictionary to a model.

Use .Select to shape each group into whatever form you need:
var X = entity.GroupBy(c => c.number).Where(grp => grp.Count() == 1)
.Select(group => new { GroupKey = group.Key, Items = group.ToList() })
.Take(10).ToList();

You could try something like this:
var entity = await _abcRepository.get(Id);
var results = entity.GroupBy(c => c.number)
.Where(grp => grp.Count() == 1)
.Take(10)
.ToDictionary(grp => grp.Key, grp => grp.First());
Essentially, the lambda you pass to the Where method ensures that every remaining group contains exactly one item. That being so, you can call First on each group to fetch that one element.
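Here is a small in-memory sketch of the same idea, assuming the entities can be modeled as (Number, Name) tuples (the real entity type is not shown in the question). It also shows a flat-list variant: Single() unwraps each one-element group, which is what X[0][0] was trying to do.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows standing in for the repository entities.
var entity = new List<(int Number, string Name)>
{
    (1, "a"), (2, "b"), (2, "c"), (3, "d"),
};

// Keep numbers that occur exactly once, then unwrap each one-element group.
Dictionary<int, (int Number, string Name)> results = entity
    .GroupBy(c => c.Number)
    .Where(grp => grp.Count() == 1)
    .Take(10)
    .ToDictionary(grp => grp.Key, grp => grp.First());

// Flat list variant: Single() returns the only element of each group.
List<(int Number, string Name)> models = entity
    .GroupBy(c => c.Number)
    .Where(grp => grp.Count() == 1)
    .Take(10)
    .Select(grp => grp.Single())
    .ToList();

foreach (var m in models)
    Console.WriteLine($"{m.Number}: {m.Name}"); // 1: a, then 3: d
```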

How to avoid two embedded cycles in linq query C#

var listOfIds = new List<string>();
IEnumerable<Info> allItems = new List<Info>(); // populated elsewhere
foreach (var id in collectionIds)
{
listOfIds.AddRange(allItems
.Where(p => p.Data.FirstOrDefault(m => m.Key == "myId").Value == id)
.Select(x => x.Id));
}
I would like to avoid AddRange and use only Add here, and maybe use FirstOrDefault in place of Where to avoid the final Select.
Is this possible and if yes how?

Assuming your original code is giving you the correct data, specifically that you are OK with:
only the first "myId" entry in p.Data being checked for a matching value; and
p.Data always containing at least one "myId" entry (First throws otherwise),
then this code will give you the same output:
var listOfIds = allItems
.Where(p => collectionIds.Contains(p.Data.First(m => m.Key == "myId").Value))
.Select(p => p.Id)
.ToList();
However, if you really do care that any value in p.Data matches, then this would be more appropriate:
var listOfIds = allItems
.Where(p => p.Data.Any(m => m.Key == "myId" &&
collectionIds.Contains(m.Value)))
.Select(p => p.Id)
.ToList();

How about this approach:
var listOfIds = new List<string>();
IEnumerable<Info> allItems = new List<Info>(); // populated elsewhere
var groupedAllItems = allItems.GroupBy(x => x.Data.FirstOrDefault(m => m.Key == "myId")?.Value ?? "MyIdNotFound");
// collectionIds should be a HashSet<string> so that Contains is fast
listOfIds.AddRange(groupedAllItems.Where(x => collectionIds.Contains(x.Key)).SelectMany(x => x).Select(x => x.Id));

Group By Select New Object

I want to retrieve a list of games from my database, count the number of games that a specified team won and lost, and put the counts into an object with Win and Loss properties. I was trying this, but it doesn't seem to be correct:
var winLoss = _teamService.GetGames()
.Where(x => x.Result != "Tie")
.GroupBy(x => x.Result)
.Select(x => new
{
Wins = x.Count(a => a.Result == "Hello"),
Losses = x.Count(a => a.Result != "Hello")
});
The return type for this is an IQueryable whereas I want it to just be a single object with a Win and Loss property.
Doing a GroupBy on Result puts all of the current team's wins into one group, and each team they lost to into its own separate group.

Using a LINQ query you're going to end up with a collection, but what you care about is essentially a list of keys and values. I believe this will supply you the information you're looking for:
var winLoss = _teamService.GetGames()
.Where(x => x.Result != "Tie").GroupBy(x => x.Result)
.ToDictionary(e => e.Key, e => e.Count());
int wins = 0;
int losses = 0;
winLoss.TryGetValue("WIN", out wins);
winLoss.TryGetValue("LOSS", out losses);

I just went with two simple count calls to the SQL database.
Wins = _teamService.GetGames().Count(x => x.Result == "Name");
Loses = _teamService.GetGames().IsNotTie().Count(x => x.Result != "Name");
It's not 100% what I wanted but to do it in one call involved more complicated LINQ and therefore more complicated SQL.
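For comparison, a single-query variant can group on a constant key, so every non-tie game lands in one group and both counts are taken from that group. This sketch runs in memory over hypothetical (Id, Result) tuples, assumes GetGames() already returns only the games involving the team (as the question's code implies), and uses "Hello" as the team name, like the question. Whether a particular LINQ provider translates this into one SQL statement depends on the provider, so it is worth checking the generated SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical game rows: Result holds the winning team's name, or "Tie".
var games = new List<(int Id, string Result)>
{
    (1, "Hello"), (2, "Other"), (3, "Tie"), (4, "Hello"), (5, "Third"),
};
const string team = "Hello";

// Grouping on a constant key puts all non-tie games in one group,
// so both counts come out of one query as a single object.
var winLoss = games
    .Where(g => g.Result != "Tie")
    .GroupBy(g => 1)
    .Select(grp => new
    {
        Wins = grp.Count(g => g.Result == team),
        Losses = grp.Count(g => g.Result != team)
    })
    .FirstOrDefault() ?? new { Wins = 0, Losses = 0 }; // no games at all

Console.WriteLine($"Wins: {winLoss.Wins}, Losses: {winLoss.Losses}"); // Wins: 2, Losses: 2
```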

You need to count the wins and losses for each team:
var winLoss = _teamService.GetGames()
.GroupBy(x => x.Team)
.Where(gg => gg.Key == "Hello")
.Select(gg => new
{
Wins = gg.Count(g => g.Result == "Hello"),
Losses = gg.Count(g => g.Result != "Hello")
});

Add FirstOrDefault() at the end of your linq query so you will get only the first element:
var winLoss = _teamService.GetGames()
.Where(x => x.Result != "Tie").GroupBy(x => x.Result);
var win = winLoss.Select(x => x.Count(a => a.Result == "Hello")).FirstOrDefault();
var losses = winLoss.Select(x => x.Count(a => a.Result != "Hello")).FirstOrDefault();

Can this query about finding missing keys be improved? (either SQL or LINQ)

I am developing an ASP.NET MVC website and am looking for a way to improve this routine. It can be improved at either the LINQ level or the SQL Server level. Ideally, we could do it within a single query call.
Here are the tables involved and some example data:
We have no constraint that every Key has to have each LanguageId value, and indeed the business logic does not allow such a constraint. However, at the application level, we want to warn the admin when a key is missing one or more language values. So I have this class and query:
public class LocalizationKeyWithMissingCodes
{
public string Key { get; set; }
public IEnumerable<string> MissingCodes { get; set; }
}
This method gets the key list, along with any missing codes (for example, if we have en + jp + ch language codes and the key only has values for en + ch, the list will contain jp):
public IEnumerable<LocalizationKeyWithMissingCodes> GetAllKeysWithMissingCodes()
{
var languageList = Utils.ResolveDependency<ILanguageRepository>().GetActive();
var languageIdList = languageList.Select(q => q.Id);
var languageIdDictionary = languageList.ToDictionary(q => q.Id);
var keyList = this.GetActive()
.Select(q => q.Key)
.Distinct();
var result = new List<LocalizationKeyWithMissingCodes>();
foreach (var key in keyList)
{
// Get missing codes
var existingCodes = this.Get(q => q.Active && q.Key == key)
.Select(q => q.LanguageId);
// ToList to make sure it is processed at application
var missingLangId = languageList.Where(q => !existingCodes.Contains(q.Id))
.ToList();
result.Add(new LocalizationKeyWithMissingCodes()
{
Key = key,
MissingCodes = missingLangId
.Select(q => languageIdDictionary[q.Id].Code),
});
}
result = result.OrderByDescending(q => q.MissingCodes.Count() > 0)
.ThenBy(q => q.Key)
.ToList();
return result;
}
I think my current solution is not good, because it makes a query call for each key. Is there a way to improve it, either by making it faster or by packing it into one query call?
EDIT: This is the final query of the answer:
public IQueryable<LocalizationKeyWithMissingCodes> GetAllKeysWithMissingCodes()
{
var languageList = Utils.ResolveDependency<ILanguageRepository>().GetActive();
var localizationList = this.GetActive();
return localizationList
.GroupBy(q => q.Key, (key, items) => new LocalizationKeyWithMissingCodes()
{
Key = key,
MissingCodes = languageList
.GroupJoin(
items,
lang => lang.Id,
loc => loc.LanguageId,
(lang, loc) => loc.Any() ? null : lang)
.Where(q => q != null)
.Select(q => q.Code)
}).OrderByDescending(q => q.MissingCodes.Count() > 0) // Show the missing keys on the top
.ThenBy(q => q.Key);
}

Another possibility, using LINQ:
public IEnumerable<LocalizationKeyWithMissingCodes> GetAllKeysWithMissingCodes(
List<Language> languages,
List<Localization> localizations)
{
return localizations
.GroupBy(x => x.Key, (key, items) => new LocalizationKeyWithMissingCodes
{
Key = key,
MissingCodes = languages
.GroupJoin( // check if there is one or more match for each language
items,
x => x.Id,
y => y.LanguageId,
(x, ys) => ys.Any() ? null : x)
.Where(x => x != null) // eliminate all languages with a match
.Select(x => x.Code) // grab the code
})
.Where(x => x.MissingCodes.Any()); // eliminate all complete keys
}

Here is the SQL logic to identify the keys that are missing "complete" language assignments:
SELECT
pairs.[Key],
pairs.LanguageId
FROM
(
SELECT DISTINCT
loc.[Key],
lang.LanguageId
FROM
Language lang
CROSS JOIN
Localization loc
WHERE
lang.Active = 1
) pairs
LEFT JOIN
Localization loc
ON (loc.[Key] = pairs.[Key])
AND (loc.LanguageId = pairs.LanguageId)
WHERE
loc.[Key] IS NULL;
To see all keys (instead of filtering):
SELECT
pairs.[Key],
pairs.LanguageId,
CASE WHEN loc.[Key] IS NULL THEN 1 ELSE 0 END AS Flagged
FROM
(
SELECT DISTINCT
loc.[Key],
lang.LanguageId
FROM
Language lang
CROSS JOIN
Localization loc
WHERE
lang.Active = 1
) pairs
LEFT JOIN
Localization loc
ON (loc.[Key] = pairs.[Key])
AND (loc.LanguageId = pairs.LanguageId);

Your code seems to be doing a lot of database queries and materialization.
In terms of LINQ, the single query would look like this.
We take the Cartesian product of the language and localization tables to get all combinations of (key, code), and then subtract the (key, code) tuples that exist in the relationship. This gives us the (key, code) combinations that don't exist.
var result = context.Languages.Join(context.Localizations, lang => true,
loc => true, (lang, loc) => new { Key = loc.Key, Code = lang.Code })
.Except(context.Languages.Join(context.Localizations, lang => lang.Id,
loc => loc.LanguageId, (lang, loc) => new { Key = loc.Key, Code = lang.Code }))
.GroupBy(r => r.Key).Select(r => new LocalizationKeyWithMissingCodes
{
Key = r.Key,
MissingCodes = r.Select(kc => kc.Code).ToList()
})
.ToList()
.OrderByDescending(lkmc => lkmc.MissingCodes.Count())
.ThenBy(lkmc => lkmc.Key).ToList();
P.S. I typed this LINQ query on the go, so let me know if it has syntax issues.
The gist of the query is that we take a Cartesian product and subtract the matching rows.
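To illustrate the Cartesian-product-minus-existing idea without a database, here is an in-memory sketch. The (Id, Code) and (Key, LanguageId) tuples are hypothetical stand-ins for the Language and Localization tables, and the cross join is written as a second from clause (SelectMany) instead of a Join on a constant key. As with the query above, keys that already have every language do not appear in the result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory tables.
var languages = new List<(int Id, string Code)> { (1, "en"), (2, "jp"), (3, "ch") };
var localizations = new List<(string Key, int LanguageId)>
{
    ("Hello", 1), ("Hello", 3),
    ("Bye", 1), ("Bye", 2), ("Bye", 3),
};

// Every (Key, Code) combination: distinct keys x all languages.
var allPairs =
    from key in localizations.Select(l => l.Key).Distinct()
    from lang in languages
    select (Key: key, Code: lang.Code);

// The (Key, Code) pairs that actually exist.
var existingPairs =
    from loc in localizations
    join lang in languages on loc.LanguageId equals lang.Id
    select (Key: loc.Key, Code: lang.Code);

// Subtract (tuples compare element by element), then gather missing codes per key.
var missing = allPairs
    .Except(existingPairs)
    .GroupBy(p => p.Key)
    .Select(g => (Key: g.Key, MissingCodes: g.Select(p => p.Code).ToList()))
    .OrderByDescending(x => x.MissingCodes.Count)
    .ThenBy(x => x.Key)
    .ToList();

foreach (var m in missing)
    Console.WriteLine($"{m.Key}: {string.Join(", ", m.MissingCodes)}"); // Hello: jp
```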

Join an array of string with the result of an existing linq statement

As a follow-up to my last question here:
Filtering a list of HtmlElements based on a list of partial ids
I need to take this statement:
doc.All.Cast<HtmlElement>()
.Where(x => x.Id != null)
.Where(x => ids
.Any(id => x.Id.Contains(id))).ToList();
and join it with an array of strings called fields. The array and the list will have the same number of elements and line up correctly. I tried using Zip(), but thought I might need an additional LINQ statement to make it work.

Assuming that fieldList[0] and IdList[0] correspond to each other, you can do the following:
var IdList = doc.All.Cast<HtmlElement>()
.Where(x => x.Id != null)
.Where(x => ids
.Any(id => x.Id.Contains(id))).ToList();
var resultList = fieldList
.Select( (item, index) => new { Field = item, Id = IdList[index] })
.ToDictionary(x => x.Id, x => x.Field);

As you already mentioned, you can use Enumerable.Join:
var joined = from id in fields
join ele in elements on id equals ele.Id
select new { Element = ele, ID = id };
var dict = joined.ToDictionary(x => x.ID, x => x.Element);
I've presumed that you want to join them via ID. I've also presumed that the string[] contains only unique IDs; otherwise you need to use Distinct.
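Since the question mentions Zip(), here is a sketch of the positional approach as well: Zip pairs the two sequences element by element and stops at the end of the shorter one, so no index bookkeeping is needed. Plain id strings stand in for the HtmlElement objects here, and the sample values are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins: filtered element ids and field names, lined up by position.
var idList = new List<string> { "txtName_1", "txtEmail_2", "txtPhone_3" };
string[] fields = { "Name", "Email", "Phone" };

// Zip pairs items by position; extra items in the longer sequence are ignored.
var byField = fields
    .Zip(idList, (field, id) => new { Field = field, Id = id })
    .ToDictionary(x => x.Field, x => x.Id);

foreach (var kv in byField)
    Console.WriteLine($"{kv.Key} -> {kv.Value}");
```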

We Keep Coding