Optimizing a value based search algorithm with LINQ - c#

I want to build a value-based search algorithm: given a list of words, I search the database for entries that contain those words, and the score of each result depends on which column/property the word matched.
Here is a naive algorithm that achieves that, but it is very slow.
//search only active entries
var query = (from a in db.Jobs where a.StatusId == 7 select a);
List<SearchResult> baseResult = new List<SearchResult>();
foreach (var item in search)
{
//if the company title is matched, results are worth 5 points
var companyMatches = (from a in query where a.Company.Name.ToLower().Contains(item.ToLower()) select new SearchResult() { ID = a.ID, Value = 5 });
//if the title is matched results are worth 3 points
var titleMatches = (from a in query where a.Title.ToLower().Contains(item.ToLower()) select new SearchResult() { ID = a.ID, Value = 3 });
//if text within the body is matched results are worth 2 points
var bodyMatches = (from a in query where a.FullDescription.ToLower().Contains(item.ToLower()) select new SearchResult() { ID = a.ID, Value = 2 });
//all results are then added
baseResult = baseResult.Concat(companyMatches.Concat(titleMatches).Concat(bodyMatches)).ToList();
}
// the value gained for each entry is then added and sorted by highest to lowest
List<SearchResult> result = baseResult.GroupBy(x => x.ID).Select(p => new SearchResult() { ID = p.First().ID, Value = p.Sum(i => i.Value) }).OrderByDescending(a => a.Value).ToList<SearchResult>();
//the query for the complete result set is built based on the sorted id value of result
query = (from id in result join jbs in db.Jobs on id.ID equals jbs.ID select jbs).AsQueryable();
I'm looking for ways to optimize this. I am new to LINQ, so I was hoping I could get some help. It would be great if there were a way to write one LINQ query that does all of this in one go, instead of checking the company name, then the title, then the body text, combining the matches into a sorted list, and then running that list against the database again to get the full listing.

It's best if I study the problem first. My previous answer was optimizing the wrong thing. The primary problem here is going over the results list multiple times. We can change that:
foreach (var a in query)
{
    foreach (var item in search)
    {
        // lower-case the search term once per iteration
        var itemLower = item.ToLower();
        if (a.Company.Name.ToLower().Contains(itemLower))
            baseResult.Add(new SearchResult { ID = a.ID, Value = 5 });
        if (a.Title.ToLower().Contains(itemLower))
            baseResult.Add(new SearchResult { ID = a.ID, Value = 3 });
        if (a.FullDescription.ToLower().Contains(itemLower))
            baseResult.Add(new SearchResult { ID = a.ID, Value = 2 });
    }
}
After that, you have your base result and you can continue with your processing.
That reduces it to a single query rather than three queries for each search item.
I wasn't sure if you wanted unique items in your baseResult, or if there was some reason you allowed duplicates and then used the sum of the values to order them. If you want unique items, you could make baseResult a Dictionary, with the ID as the key.
Edit after comment
You could reduce the number of items in the list by doing:
int val = 0;
if (a.Company.Name.ToLower().Contains(itemLower))
    val += 5;
if (a.Title.ToLower().Contains(itemLower))
    val += 3;
if (a.FullDescription.ToLower().Contains(itemLower))
    val += 2;
if (val > 0)
{
    baseResult.Add(new SearchResult { ID = a.ID, Value = val });
}
That won't eliminate duplicates altogether, though, because the company name could match one search term, and the title might match another search term. But it would reduce the list somewhat.
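The Dictionary idea mentioned above could look roughly like the following. This is only a sketch under assumptions: the `jobs` tuples and their contents are hypothetical in-memory stand-ins for the Job entities, and the `string.Contains(string, StringComparison)` overload requires .NET Core 2.1 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the Job rows (not the real entities).
var jobs = new[]
{
    (ID: 1, Company: "Acme Software", Title: "C# Developer", Description: "Build LINQ queries"),
    (ID: 2, Company: "Globex", Title: "Acme Integrator", Description: "Support developer teams"),
    (ID: 3, Company: "Initech", Title: "Tester", Description: "Manual testing"),
};
var search = new List<string> { "acme", "developer" };

// One accumulated score per job ID instead of a list that contains duplicates.
var scores = new Dictionary<int, int>();
foreach (var job in jobs)
{
    foreach (var term in search)
    {
        int val = 0;
        if (job.Company.Contains(term, StringComparison.OrdinalIgnoreCase)) val += 5;
        if (job.Title.Contains(term, StringComparison.OrdinalIgnoreCase)) val += 3;
        if (job.Description.Contains(term, StringComparison.OrdinalIgnoreCase)) val += 2;
        if (val > 0)
        {
            // current is 0 when the ID has not been seen yet
            scores.TryGetValue(job.ID, out int current);
            scores[job.ID] = current + val;
        }
    }
}

// Highest score first.
foreach (var pair in scores.OrderByDescending(p => p.Value))
    Console.WriteLine($"{pair.Key}: {pair.Value}");
```

With this shape the later GroupBy/Sum step is no longer needed, because the summing happens while the dictionary is filled.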

Thanks to Jim's answer and some tweaking on my side, I managed to reduce the time it takes to complete the search by 80%.
Here is the final solution:
//establish initial query
var nquery = (from a in db.Jobs where a.StatusId == 7 select a);
//Instead of scoring every entity, first keep only the possible candidates: jobs whose columns, taken together, contain all of the search terms. This is the one and only query that will be run against the database.
if (search.Count > 0)
{
    //assumes the search terms are already lower-case, since the columns are lower-cased before comparing
    nquery = nquery.Where(job => search.All(y =>
        (job.Title.ToLower() + " " + job.FullDescription.ToLower() + " " + job.Company.Name.ToLower() + " "
        + job.NormalLocation.ToLower() + " " + job.MainCategory.Name.ToLower() + " " + job.JobType.Type.ToLower())
        .Contains(y)));
}
//run the query and grab a list of baseJobs
List<Job> baseJobs = nquery.ToList<Job>();
//A list of SearchResult objects (these objects act as containers for job IDs and their search values)
List<SearchResult> baseResult = new List<SearchResult>();
//from here on Jim's algorithm comes to play where it assigns points depending on where the search term is located and added to a list of id/value pair list
foreach (var a in baseJobs)
{
foreach (var item in search)
{
var itemLower = item.ToLower();
if (a.Company.Name.ToLower().Contains(itemLower))
baseResult.Add(new SearchResult { ID = a.ID, Value = 5 });
if (a.Title.ToLower().Contains(itemLower))
baseResult.Add(new SearchResult { ID = a.ID, Value = 3 });
if (a.FullDescription.ToLower().Contains(itemLower))
baseResult.Add(new SearchResult { ID = a.ID, Value = 2 });
}
}
List<SearchResult> result = baseResult.GroupBy(x => x.ID).Select(p => new SearchResult() { ID = p.First().ID, Value = p.Sum(i => i.Value) }).OrderByDescending(a => a.Value).ToList<SearchResult>();
//the data generated through the id/value pair list are then used to reorder the initial jobs.
var NewQuery = (from id in result join jbs in baseJobs on id.ID equals jbs.ID select jbs).AsQueryable();
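Once the candidate jobs are in memory, the scoring, summing and ordering could also be written as one LINQ expression. The sketch below assumes hypothetical in-memory `jobs` tuples in place of the real Job entities and .NET Core 2.1 or later for the `Contains` overload; it is not claimed to be translatable to SQL by any particular provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the already-loaded baseJobs.
var jobs = new[]
{
    (ID: 1, Company: "Globex", Title: "Acme Integrator", Description: "Support developer teams"),
    (ID: 2, Company: "Acme Software", Title: "C# Developer", Description: "Build LINQ queries"),
    (ID: 3, Company: "Initech", Title: "Tester", Description: "Manual testing"),
};
var terms = new List<string> { "acme", "developer" };
var cmp = StringComparison.OrdinalIgnoreCase;

// Score each job once: sum the points over all search terms, then sort.
var ranked = jobs
    .Select(job => (Job: job, Score: terms.Sum(term =>
        (job.Company.Contains(term, cmp) ? 5 : 0) +
        (job.Title.Contains(term, cmp) ? 3 : 0) +
        (job.Description.Contains(term, cmp) ? 2 : 0))))
    .Where(x => x.Score > 0)
    .OrderByDescending(x => x.Score)
    .ToList();

foreach (var x in ranked)
    Console.WriteLine($"{x.Job.ID}: {x.Score}");
```

Because every job carries its own score, no intermediate id/value list, GroupBy or join is required to put the jobs back in order.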

Related

LINQ Distinct Count returns 1

I'm creating a Func delegate inside my method to check whether a schedualCode still fits in a certain place in a list, where the limit is 3.
I want to count the distinct values of schedualCode in my list. My problem is that schedualCodeCount returns 1 when it should return 2.
This is my code:
Func<string, bool> CheckTimeLimit = delegate (string schedualCode)
{
// check enrolled period count (where limit is 3)
//int periodCount = currentEnrollments.GroupBy(t => t.Times)
//.Select(t => t.Key.Select(key => key.PeriodCode == time.PeriodCode).Distinct()).Count();
var allTimes = currentEnrollments.SelectMany(key => key.Times).ToList();
List<string> schedualCodes = allTimes.Where(key => key.SchedualCode == schedualCode && key.ViewOnSchedual)
.Select(key => key.SchedualCode).ToList();
//schedualCodes List returns a list of count = 2 , and 2 strings exactly the same of value = "A1"
// Getting the distinct count of "A1"
int schedualCodeCount = schedualCodes.Distinct().Count();
// schedualCodeCount gets the value = 1, where it should be 2
// time fits if true
return schedualCodeCount < 3;
};
You are misunderstanding what Distinct does. You have two identical items; Distinct removes the duplicates, leaving you with 1. What you probably want to do is group the items and then get the count of each group.
For example:
var list = new List<string>() { "A1", "A1" };
Console.WriteLine(list.Count); // 2, obviously
var distinct = list.Distinct(); // select only the *distinct* values
Console.WriteLine(distinct.Count()); // 1 - because there is only 1 distinct value
var groups = list.GroupBy(s => s); // group your list (there will only be one
// in this case)
foreach (var g in groups) // for each group
{
// Display the number of items with the same key
Console.WriteLine(g.Key + ":" + g.Count());
}
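If the goal is simply how many times one code occurs, counting with a predicate may be enough. The following sketch uses a hypothetical list of codes (the extra "B2" entry is made up for illustration) to contrast the three counts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; "B2" is added only to make the difference visible.
var codes = new List<string> { "A1", "A1", "B2" };

// Number of different values in the list.
int distinctCodes = codes.Distinct().Count();

// Number of times one particular value occurs.
int occurrencesOfA1 = codes.Count(c => c == "A1");

// Occurrences of every value, keyed by the value.
Dictionary<string, int> countsByCode = codes
    .GroupBy(c => c)
    .ToDictionary(g => g.Key, g => g.Count());

Console.WriteLine($"distinct: {distinctCodes}, A1: {occurrencesOfA1}, B2: {countsByCode["B2"]}");
```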

Listing after implementing ranking skipping numbers

I am trying to achieve ranking functionality as below:
Name Points rank
ram 9 1
kamal 9 1
preet 8 2
lucky 7 3
kishan 6.5 4
devansh 6 5
neha 6 5
I have used below code to achieve this:
finalResult = finalResult.OrderByDescending(i => i.points).ThenBy(i => i.academy).ToList();
finalResult = finalResult.AsEnumerable() // Client-side from here on
.Select((player, index) => new RankingEntity()
{
competitorid = player.competitorid,
firstname = player.firstname,
lastname = player.lastname,
academy = player.academy,
points = player.points,
place = player.place,
eventId = player.eventId,
eventname = player.eventname,
categoryname = player.categoryname,
Rank = index + 1
}).ToList();
var t = (from i in finalResult
let rank = finalResult.First(x => x.points == i.points)
select new
{
Col1 = i,
Rank = rank.Rank
}).ToList();
List<RankingEntity> ttt = new List<RankingEntity>();
foreach (var item in t)
{
var a = item.Col1;
var row = new RankingEntity();
row.competitorid = a.competitorid;
row.firstname = a.firstname;
row.lastname = a.lastname;
row.academy = a.academy;
row.points = a.points;
row.place = a.place;
row.eventId = a.eventId;
row.eventname = a.eventname;
row.categoryname = a.categoryname;
row.Rank = item.Rank;
ttt.Add(row);
}
And I am getting a result like the one below:
Please help me understand what I am doing wrong.
What you are trying to achieve is a ranking of a "group" so group the results by the points and then order the groups. For each item in the group give the same rank.
finalResult.GroupBy(item => item.points) // Group by points
    .OrderByDescending(g => g.Key) // Order the groups
    .Select((g, index) => new { Data = g, GroupRank = index + 1 }) // Rank each group
    .SelectMany(g => g.Data.Select(item => new RankingEntity
    {
        /* properties of each item */
        Rank = g.GroupRank
    })); // Flatten groups and set for each item the group's ranking
The problem in your method is that you rank individual items and not groups. Then, when you retrieve the rank for the group (from i in finalResult let rank = finalResult.First(x => x.points == i.points)...), you give every item in the group the rank of one of its elements: whichever one First returns. Because those ranks come from item positions rather than group positions, the rank that follows a tie skips numbers.
Also notice that the first line of your code already calls ToList. Therefore there is no need to use AsEnumerable in the line under it: finalResult is already a materialized in-memory collection.
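As a runnable illustration of ranking by group, here is a minimal sketch that uses the names and points from the question's table. The tuple shape is an assumption standing in for RankingEntity, and it uses the indexed SelectMany overload instead of a separate Select step.

```csharp
using System;
using System.Linq;

// Names and points taken from the table in the question; the tuple stands in for RankingEntity.
var players = new[]
{
    (Name: "ram", Points: 9m), (Name: "kamal", Points: 9m), (Name: "preet", Points: 8m),
    (Name: "lucky", Points: 7m), (Name: "kishan", Points: 6.5m),
    (Name: "devansh", Points: 6m), (Name: "neha", Points: 6m),
};

var ranked = players
    .GroupBy(p => p.Points)                 // one group per distinct score
    .OrderByDescending(g => g.Key)          // best score first
    .SelectMany((g, index) =>               // index is the group's position
        g.Select(p => (p.Name, p.Points, Rank: index + 1)))
    .ToList();

foreach (var r in ranked)
    Console.WriteLine($"{r.Name} {r.Points} {r.Rank}");
```

Tied players share a rank, and the next group continues with the following number, which matches the 1, 1, 2, 3, 4, 5, 5 sequence asked for.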

Get database row from selected item in Combobox

I have a Combobox which gets its data from my database.
var people = (from x in db.Person select new { Value = x.Id, Names = x.Namn + " " + x.EfterNamn }).ToList();
cbpeople.DataSource = people;
cbpeople.DisplayMember = "Names";
cbpeople.ValueMember = "Value";
cbpeople.SelectedIndex = -1;
And I have the SelectedIndex function
int id = cbpeople.SelectedIndex + 1;
string namn = (from x in db.Person where x.Id == id select x.Namn).ToString();
lblNamn.Text = namn;
So as you can see, I'm trying to have it select the information from the same row in the database and put them in labels. (The "cbpeople.SelectedIndex + 1;" is because I had no other way to get the ID from the SelectedValue).
But all it prints on the label is this long string instead of the name:
"SELECT \r\n [Extent1].[Namn] AS [Namn]\r\n FROM [dbo].[Person] AS [Extent1]\r\n WHERE [Extent1].[Id] = @p__linq__0"
What am I doing wrong?
You are calling ToString() on an IQueryable object, so it returns the query's SQL representation instead of a result. To execute the query you can do this:
string namn = (from x in db.Person where x.Id == id select x.Namn).Single();
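The difference between a query object and its result can be seen without a database. The sketch below assumes a hypothetical in-memory array wrapped with AsQueryable in place of db.Person; the names in it are made up.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory replacement for db.Person.
var people = new[]
{
    new { Id = 1, Namn = "Anna" },
    new { Id = 2, Namn = "Erik" },
}.AsQueryable();

int id = 2;

// This only describes the query; nothing has been executed yet.
IQueryable<string> query = from x in people where x.Id == id select x.Namn;

// Single() runs the query and returns the one matching value
// (it throws if there are zero or several matches).
string namn = query.Single();

// SingleOrDefault() returns null instead of throwing when nothing matches.
var missing = (from x in people where x.Id == 99 select x.Namn).SingleOrDefault();

Console.WriteLine(namn);
Console.WriteLine(missing == null ? "(no match)" : missing);
```

Since the ComboBox in the question sets ValueMember to "Value", reading the ID from cbpeople.SelectedValue would probably be more robust than SelectedIndex + 1, which only works while the IDs are contiguous and start at 1.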

How to Group the elements within group in Linq

I have an Employee Collection and i want to filter in such a way that first 2 columns only should be filtered and the third column values should be appended and the final result should be in a single row
The below is my code
List<Employe> Employeecollection = new List<Employe>();
Employeecollection.Add(new Employe("Employee1", "Dept1","Language1"));
Employeecollection.Add(new Employe("Employee2", "Dept2", "Language2"));
Employeecollection.Add(new Employe("Employee3", "Dept3", "Language3"));
Employeecollection.Add(new Employe("Employee3", "Dept3", "Language3"));
Employeecollection.Add(new Employe("Employee1", "Dept1", "Language2"));
foreach (Employe item in Employeecollection.GroupBy(x => new { fName = x.EmpName, lName = x.EmpDept, mName = x.KnownLanguages }).Select(g => g.First()))
{
Console.WriteLine(item.EmpName + " " + item.EmpDept + " " + item.KnownLanguages);
}
but i would like to display the results like below
Employee1 Dept1 Language1,Language2
Employee2 Dept2 Language2
Employee3 Dept3 Language3
Group employees by name and department, then select joined string of known languages for each employee:
from e in Employeecollection
group e by new { e.EmpName, e.EmpDept } into g
select new {
    g.Key.EmpName,
    g.Key.EmpDept,
    Languages = String.Join(",", g.Select(x => x.KnownLanguages).Distinct())
}
If you want each result as a single line, use the following projection instead:
select String.Format("{0} {1} {2}",
    g.Key.EmpName, g.Key.EmpDept, String.Join(",", g.Select(x => x.KnownLanguages).Distinct()))
BTW, I think KnownLanguages is a weird name for a property that holds a single language.
You don't want to group on KnownLanguages. It shouldn't be included in your group selector. The group selector should select everything that you want to be the same for all items in a group.
You also need to change how you print your results. Get the common values for each of the items in a group through the Key, and the other values by iterating the group itself.
var query = Employeecollection.GroupBy(x => new
{
    x.EmpName,
    x.EmpDept
});
foreach (var group in query)
{
    string languages = string.Join(", ",
        group.Select(employee => employee.KnownLanguages)
        .Distinct());
    Console.WriteLine(group.Key.EmpName + " " + group.Key.EmpDept + " "
        + languages);
}
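For a self-contained version of this grouping, the following sketch uses tuples as a hypothetical replacement for the Employe class, filled with the same five rows as in the question.

```csharp
using System;
using System.Linq;

// Tuples stand in for the Employe class; the rows are the ones from the question.
var employees = new[]
{
    (EmpName: "Employee1", EmpDept: "Dept1", KnownLanguages: "Language1"),
    (EmpName: "Employee2", EmpDept: "Dept2", KnownLanguages: "Language2"),
    (EmpName: "Employee3", EmpDept: "Dept3", KnownLanguages: "Language3"),
    (EmpName: "Employee3", EmpDept: "Dept3", KnownLanguages: "Language3"),
    (EmpName: "Employee1", EmpDept: "Dept1", KnownLanguages: "Language2"),
};

var lines = employees
    .GroupBy(e => (e.EmpName, e.EmpDept))   // key: only the columns that must match
    .Select(g =>
    {
        // Distinct() drops the repeated Language3 entry.
        string languages = string.Join(",", g.Select(e => e.KnownLanguages).Distinct());
        return $"{g.Key.EmpName} {g.Key.EmpDept} {languages}";
    })
    .ToList();

foreach (var line in lines)
    Console.WriteLine(line);
// Employee1 Dept1 Language1,Language2
// Employee2 Dept2 Language2
// Employee3 Dept3 Language3
```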

Nested Group by LINQ

I am unable to solve this problem with the LINQ Query.
So we have the table structure as follows:
Id || bug_category || bug_name || bug_details || bug_priority
I want to group by bug_category first. Then, within each bug_category, I want to group by bug_priority.
So basically I want something like :
bug_category = AUDIO :: No of BUGS --> Critical = 3, Medium = 2 and Low = 7 bugs.
bug_category = VIDEO :: No of BUGS --> Critical = 5, Medium = 1 and Low = 9 bugs.
The query below returns all unique combinations of bug_category AND bug_priority:
(where RawDataList is simply a List of data which has the above mentioned structure )
var ProceesedData = from d in RawDataList
group d by new { d.bug_category, d.bug_priority } into g
select new
{
g.Key.bug_category,
g.Key.bug_priority
};
The below query returns the category followed by a list of records in that category:
var ProceesedData = from d in RawDataList
group d by d.bug_category into g
select new { g.Key, records = g
};
But I am unable to proceed further, as ProceesedData (the return variable) is of an unknown type. Any thoughts on this?
This is an easier way to accomplish nested groupings. I've tested it on in-memory collections; whether your particular DB provider handles it, or performs well with it, may vary.
Assuming you had two properties and wanted to group by both State and Country:
var grouped = People
    .GroupBy(l => new { l.State, l.Country }) //group by two things
    .GroupBy(l => l.Key.Country); //this will become the outer grouping
foreach (var country in grouped)
{
    foreach (var state in country)
    {
        foreach (var personInState in state)
        {
            string description = $"Name: {personInState.Name}, State: {state.Key.State}, Country: {country.Key}";
            ...
        }
    }
}
I suspect you want (names changed to be more idiomatic):
var query = from bug in RawDataList
group bug by new { bug.Category, bug.Priority } into grouped
select new {
Category = grouped.Key.Category,
Priority = grouped.Key.Priority,
Count = grouped.Count()
};
Then:
foreach (var result in query)
{
Console.WriteLine("{0} - {1} - {2}",
result.Category, result.Priority, result.Count);
}
Alternatively (but see later):
var query = from bug in RawDataList
            group bug by bug.Category into grouped
            select new {
                Category = grouped.Key,
                Counts = from bug in grouped
                         group bug by bug.Priority into g2
                         select new { Priority = g2.Key, Count = g2.Count() }
            };
foreach (var result in query)
{
Console.WriteLine("{0}: ", result.Category);
foreach (var subresult in result.Counts)
{
Console.WriteLine(" {0}: {1}", subresult.Priority, subresult.Count);
}
}
EDIT: As noted in comments, this will result in multiple SQL queries. To obtain a similar result structure but more efficiently you could use:
var dbQuery = from bug in RawDataList
              group bug by new { bug.Category, bug.Priority } into grouped
              select new {
                  Category = grouped.Key.Category,
                  Priority = grouped.Key.Priority,
                  Count = grouped.Count()
              };
var query = dbQuery.ToLookup(result => result.Category,
                             result => new { result.Priority, result.Count });
foreach (var result in query)
{
Console.WriteLine("{0}: ", result.Key);
foreach (var subresult in result)
{
Console.WriteLine(" {0}: {1}", subresult.Priority, subresult.Count);
}
}
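The flat-grouping-then-ToLookup approach can be tried on in-memory data. In the sketch below the `bugs` tuples are hypothetical sample rows (the counts are made up, not those from the question), and the property names follow the idiomatic Category/Priority naming used above.

```csharp
using System;
using System.Linq;

// Hypothetical sample rows; only the two columns needed for counting are modelled.
var bugs = new[]
{
    (Category: "AUDIO", Priority: "Critical"),
    (Category: "AUDIO", Priority: "Low"),
    (Category: "AUDIO", Priority: "Low"),
    (Category: "VIDEO", Priority: "Critical"),
    (Category: "VIDEO", Priority: "Medium"),
};

// Step 1: one flat grouping by (Category, Priority) with a count per pair.
var counts = bugs
    .GroupBy(b => (b.Category, b.Priority))
    .Select(g => (g.Key.Category, g.Key.Priority, Count: g.Count()));

// Step 2: regroup the small result set in memory by category.
var byCategory = counts.ToLookup(c => c.Category, c => (c.Priority, c.Count));

foreach (var category in byCategory)
{
    var parts = category.Select(c => $"{c.Priority} = {c.Count}");
    string summary = string.Join(", ", parts);
    Console.WriteLine($"{category.Key}: {summary}");
}
// AUDIO: Critical = 1, Low = 2
// VIDEO: Critical = 1, Medium = 1
```

When the first step runs against a database, only the already-aggregated rows are brought into memory before ToLookup regroups them.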
I think you're looking for something like this:
var processedData =
rawData.GroupBy(bugs => bugs.bug_category,
(category, elements) =>
new
{
Category = category,
Bugs = elements.GroupBy(bugs => bugs.bug_priority,
(priority, realbugs) =>
new
{
Priority = priority,
Count = realbugs.Count()
})
});
foreach (var data in processedData)
{
Console.WriteLine(data.Category);
foreach (var element in data.Bugs)
Console.WriteLine(" " + element.Priority + " = " + element.Count);
}
You can do it like this
var retList = (from dbc in db.Companies
where dbc.IsVerified && dbc.SellsPCBs && !dbc.IsDeleted && !dbc.IsSpam && dbc.IsApproved
select new
{
name = dbc.CompanyName,
compID = dbc.CompanyID,
state = dbc.State,
city = dbc.City,
businessType = dbc.BusinessType
}).GroupBy(k => k.state).ToList();
List<dynamic> finalList = new List<dynamic>();
foreach (var item in retList)
{
finalList.Add(item.GroupBy(i => i.city));
}
