Orderby C# string record - c#

I have the following code, which orders site codes read from the database and then builds a string from them.
The code works fine, but I know it can be improved. Any suggestions are highly appreciated.
result.Sites.ForEach(x =>
{
result.SiteDetails +=
string.Concat(ICMSRepository.Instance.GetSiteInformationById(x.SiteInformationId).SiteCode,
",");
});
//Sort(Orderby) sites by string value NOT by numerical order
result.SiteDetails = result.SiteDetails.Trim(',');
List<string> siteCodes = result.SiteDetails.Split(',').ToList();
var siteCodesOrder = siteCodes.OrderBy(x => x).ToArray();
string siteCodesSorted = string.Join(", ", siteCodesOrder);
result.SiteDetails = siteCodesSorted;

That's a little convoluted, yeah.
All we need to do is select the SiteCode as a string, sort with OrderBy, then join the results. Since String.Join has an overload that accepts IEnumerable<string>, we don't need to convert to an array in the middle.
What we end up with is a single statement for assigning to your SiteDetails member:
result.SiteDetails = string.Join(", ",
result.Sites
.Select(x => $"{ICMSRepository.Instance.GetSiteInformationById(x.SiteInformationId).SiteCode}")
.OrderBy(x => x)
);
(Or you could use .ToString() instead of $"{...}")
This is the general process for most transforms in LINQ. Figure out what your inputs are, what you need to do with them, and how the outputs should look.
If you're using LINQ it's uncommon that you will have to build and manipulate intermediary lists unless you're doing something quite complex. For simple tasks like sorting a sequence of values there is almost never a reason to put them into transitional collections, since the framework handles all of that for you.
And the best part is it enumerates the collection one time to get the full set of data. No more loops to pull the data out, then process, then rebuild.
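As a rough, self-contained illustration of the select, sort, join pipeline described above, here is a minimal sketch. The dictionary is a hypothetical stand-in for the repository lookup, and StringComparer.Ordinal is an assumption made to keep the string (not numeric) ordering predictable; the original code uses the default comparer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SiteCodeDemo
{
    // Hypothetical stand-in for ICMSRepository.Instance.GetSiteInformationById(...).SiteCode.
    private static readonly Dictionary<int, string> SiteCodesById = new Dictionary<int, string>
    {
        { 1, "S10" }, { 2, "S9" }, { 3, "A2" }
    };

    public static string BuildSiteDetails(IEnumerable<int> siteInformationIds)
    {
        // Project each id to its code, sort the codes as strings, then join them.
        // No intermediate list or array is built by hand.
        return string.Join(", ",
            siteInformationIds
                .Select(id => SiteCodesById[id])
                .OrderBy(code => code, StringComparer.Ordinal));
    }

    public static void Main()
    {
        // String ordering puts "S10" before "S9" because '1' sorts before '9'.
        Console.WriteLine(BuildSiteDetails(new[] { 1, 2, 3 }));
    }
}
```

If the codes should instead sort numerically, the key passed to OrderBy would have to parse the number out of the code first.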

One thing that will improve performance is to get rid of the .ToList() and the .ToString(). Neither is necessary, and both just take up extra processing time and memory.

Go with Corey's answer, which this is a variant of, but I thought I'd offer a slightly clearer way to express the query:
result.SiteDetails =
String.Join(", ",
from x in result.Sites
let sc = ICMSRepository.Instance.GetSiteInformationById(x.SiteInformationId).SiteCode
orderby sc
select sc);

Related

Linq performance when diffing two lists using inner Contains

EDIT 01: I seem to have found a solution (see my answer below) that works for me, going from an hour to mere seconds by pre-computing and then applying the .Except() extension method. I'm leaving this open in case anyone else encounters this problem or finds a better solution.
ORIGINAL QUESTION
I have the following set of queries for different kinds of objects that I'm staging from a source system, so that I can keep them in sync and produce a delta stamp myself, as the source system doesn't provide one, nor can we build or touch it.
I get all the data in memory and then, for example, perform this query, where I look for objects that no longer exist in the source system but are present in the staging database, and thus have to be marked "deleted". The bottleneck is the first part of the LINQ query, the .Contains(). How can I improve its performance? Maybe with .Except() and a custom comparer?
Or would it be best to put them in a hash-based collection and then perform the comparison?
The problem is that I need the staged objects afterwards to do some property transforms on them. This seemed the simplest solution, but unfortunately it's very slow on 20k objects:
stagedSystemObjects.Where(stagedSystemObject =>
!sourceSystemObjects.Select(sourceSystemObject => sourceSystemObject.Code)
.Contains(stagedSystemObject.Code)
)
.Select(x =>
{
x.ActiveStatus = ActiveStatuses.Disabled;
x.ChangeReason = ChangeReasons.Edited;
return x;
})
.ToList();
Based on Yves Schelpe's answer, I made a few tweaks to make it faster.
The basic idea is to drop the first two ToList calls and use PLINQ. See if this helps:
var stagedSystemCodes = stagedSystemObjects.Select(x => x.Code);
var sourceSystemCodes = sourceSystemObjects.Select(x => x.Code);
var codesThatNoLongerExistInSourceSystem = stagedSystemCodes.Except(sourceSystemCodes).ToArray();
var y = stagedSystemObjects.AsParallel()
.Where(stagedSystemObject =>
codesThatNoLongerExistInSourceSystem.Contains(stagedSystemObject.Code))
.Select(x =>
{
x.ActiveStatus = ActiveStatuses.Disabled;
x.ChangeReason = ChangeReasons.Edited;
return x;
}).ToArray();
Note that PLINQ may only work well for CPU-bound tasks on a multi-core CPU. It could make things worse in other scenarios.
I have found a solution for this problem, which brought it down to mere seconds instead of an hour for 200k objects.
It's done by pre-computing and then applying the .Except() extension method.
So no more "chaining" LINQ queries or calling .Select(...).Contains(...) inside the predicate. Instead, make it "simpler" by first projecting both collections to lists of strings, so that the inner projection doesn't have to be recomputed for every element, as happens in the original question's example code.
Here is my solution, which is satisfactory for now. However, I'm leaving this open in case anyone comes up with a refined or better solution!
var stagedSystemCodes = stagedSystemObjects.Select(x => x.Code).ToList();
var sourceSystemCodes = sourceSystemObjects.Select(x => x.Code).ToList();
var codesThatNoLongerExistInSourceSystem = stagedSystemCodes.Except(sourceSystemCodes).ToList();
return stagedSystemObjects
.Where(stagedSystemObject =>
codesThatNoLongerExistInSourceSystem.Contains(stagedSystemObject.Code))
.Select(x =>
{
x.ActiveStatus = ActiveStatuses.Disabled;
x.ChangeReason = ChangeReasons.Edited;
return x;
})
.ToList();
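On the question's idea of using a hash-based collection: the sketch below builds a HashSet of the source codes once, so each membership test is a hash lookup rather than a scan of a list. The SystemObject class and its Disabled flag are hypothetical stand-ins for the real staged types and the ActiveStatus/ChangeReason assignments; this is one possible variant, not a measured improvement over the answers above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the staged/source object types in the question.
public class SystemObject
{
    public string Code { get; set; } = "";
    public bool Disabled { get; set; }
}

public static class StagingDiff
{
    public static List<SystemObject> MarkDeleted(
        IEnumerable<SystemObject> staged, IEnumerable<SystemObject> source)
    {
        // Build the lookup once; HashSet<T>.Contains is a hash lookup, not a linear scan.
        var sourceCodes = new HashSet<string>(source.Select(s => s.Code));

        // Keep the staged objects themselves so they can be transformed afterwards.
        var deleted = staged.Where(s => !sourceCodes.Contains(s.Code)).ToList();

        // Mutate in a plain loop instead of inside Select, so the side effect is explicit.
        foreach (var item in deleted)
        {
            item.Disabled = true;
        }
        return deleted;
    }
}
```

Because the staged objects are filtered directly, there is no need to compute the list of missing codes first and then look the objects up again.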

Formatting What's returned from LINQ

So, I currently have a LINQ query
BStops.JPPlatforms.Platform
.Where(Stop => Stop.Name.ToLower().Contains(SearchBox.Text.ToLower()))
.Select(Stop => new { Stop.Name, Stop.PlatformNo })
.ToList();
This returns the data I expect. The Platform property contains a list of stops, each an instance of another class with properties I want to access, such as Name, PlatformNo and PlatforTag. The killer for me is two things; one is less important at the moment, but if you can help with it too, that would be great!
First, I want to format this output so that the search results don't have all this garbled stuff around them. I would prefer them to look like:
Annex Rd near Railway (50643)
I've tried adjusting my query to be like
BStops.JPPlatforms.Platform
.Where(Stop => Stop.Name.ToLower().ToString().Contains(SearchBox.Text.ToLower().ToString()))
.Select(Stop => String.Format("{0} ({1})",new { Stop.Name, Stop.PlatformNo }))
.ToList();
But that causes it to crash with an unhandled exception, and for the life of me I can't figure out why. As for the second part: I'd also like my LINQ query to search both the Name and PlatformNo properties. I've already tried the logical ||, but that also crashes with an unhandled exception, and I don't know enough about LINQ to figure out why. Any help at this point would be great :).
Changing your LINQ query to this would solve the problem.
BStops.JPPlatforms.Platform.Where(Stop => Stop.Name.ToLower()
.Contains(SearchBox.Text.ToLower()))
.Select(Stop => new
{
StopAddress = $"{Stop.Name} ({Stop.PlatformNo})"
})
.ToList();
The Where clause is not performant: SearchBox.Text.ToLower() should be evaluated once, outside of the LINQ query. Also, ToLower returns a string, so there is no need to call ToString.
The Select does not need to create a new object; it can return the formatted string directly.
var text = SearchBox.Text.ToLower();
BStops.JPPlatforms.Platform
.Where(stop => stop.Name.ToLower().Contains(text))
.Select(stop => String.Format("{0} ({1})", stop.Name, stop.PlatformNo))
.ToList();
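For the second part of the question (searching both Name and PlatformNo), here is a minimal sketch. The Platform class is a hypothetical stand-in, PlatformNo is assumed to be a string, and the null check is only a guess at the cause of the exception, since the question does not show it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the stop class in the question; PlatformNo is assumed to be a string.
public class Platform
{
    public string Name { get; set; }
    public string PlatformNo { get; set; }
}

public static class PlatformSearch
{
    public static List<string> Search(IEnumerable<Platform> platforms, string text)
    {
        // Null-safe, case-insensitive "contains" that avoids allocating lowered copies.
        bool Matches(string value) =>
            value != null && value.IndexOf(text, StringComparison.OrdinalIgnoreCase) >= 0;

        return platforms
            .Where(p => Matches(p.Name) || Matches(p.PlatformNo))
            .Select(p => string.Format("{0} ({1})", p.Name, p.PlatformNo))
            .ToList();
    }
}
```

If either property can be null in the real data, calling ToLower() on it directly would throw a NullReferenceException, which is one possible explanation for the crash when the || was added.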

Case-insensitive "contains" in Linq

I have an MVC project in which I use LINQ.
In my database there are some records, for example "Someth ing","SOmeTH ing","someTh ing","SOMETH ING","someTH ING"
I want to do this:
SELECT * FROM dbo.doc_dt_records WHERE name LIKE '%' + @records.Name + '%'
However, if I run this code, lists.Count returns 0. What should I do?
records.Name = "someth ing"; //for example
var rec = db.Records.ToList();
var lists = rec.Where(p => p.Name.Contains(records.Name)).ToList();
if (lists.Count > 0)
{
// do sthng
}
Thanks for your help...
The easy way is to use the ToLower() method:
var lists = rec.Where(p => p.Name.ToLower().Contains(records.Name.ToLower())).ToList();
A better solution (based on this post: Case insensitive 'Contains(string)'):
var lists = rec.Where(p =>
CultureInfo.CurrentCulture.CompareInfo.IndexOf
(p.Name, records.Name, CompareOptions.IgnoreCase) >= 0).ToList();
That is totally not a LINQ issue.
Case sensitivity of the generated SQL depends on the collation relevant for the table, which in your case is likely case-insensitive.
You would get the same result from any SQL you emit.
Use IndexOf and StringComparison.OrdinalIgnoreCase:
p.Name.IndexOf(records.Name, StringComparison.OrdinalIgnoreCase) >= 0;
You can create an extension method like this:
public static bool Contains(this string src, string toCheck, StringComparison comp)
{
return src.IndexOf(toCheck, comp) >= 0;
}
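An extension method has to live in a static class, so a fuller sketch of the snippet above might look as follows. The class names StringExtensions and ExtensionDemo are hypothetical, and the null checks are an addition. Note that newer .NET versions (.NET Core 2.1 and later) already ship an instance method string.Contains(string, StringComparison), which the compiler prefers over the extension.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StringExtensions
{
    // Returns true when src contains toCheck under the given comparison rules.
    public static bool Contains(this string src, string toCheck, StringComparison comp)
    {
        return src != null && toCheck != null && src.IndexOf(toCheck, comp) >= 0;
    }
}

public static class ExtensionDemo
{
    // Filters an in-memory sequence of names, ignoring case.
    public static List<string> Filter(IEnumerable<string> names, string term)
    {
        return names
            .Where(n => n.Contains(term, StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}
```

This only applies to in-memory sequences such as the rec list in the question; a database provider would not know how to translate a custom extension method.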
To my understanding, this question does not have an unambiguous answer, because the best approach depends on details that aren't provided in the question: for instance, which ORM you use and which database server you are connected to. For example, if you use Entity Framework against MS SQL Server, you had better not touch your LINQ expression at all. All you need to do is set a case-insensitive collation on the database, table or column you compare your string with. That will do the trick much better than any change to your LINQ expression. When LINQ is translated to SQL, a straight comparison between your string and a column with a case-insensitive collation is preferable to anything else, because it is usually faster and is the natural way to do it.
You do not want the final query to be something like:
SELECT *
FROM AspNetUsers U
WHERE UPPER(U.Name) LIKE '%SOMETHING%';
It is much better to come up with something like:
SELECT *
FROM AspNetUsers U
WHERE U.Name LIKE '%SOMETHING%';
But with a case-insensitive collation on the [Name] column. The difference is that if you have, say, an index containing the [Name] column, the second query might use it, whereas the first one would do a full scan of the table anyway.
So if, say, records refers to a DbSet<T> and record is just one object of type T, your code would look like this:
var lists = records.Where(p => p.Name.Contains(record.Name)).ToList();
And you do the rest on the SQL server. Or, if all you need to know is whether there is any matching value and you don't need the values themselves, it would be even better to do this:
if (records.Any(p => p.Name.Contains(record.Name)))
{
// do something
}
Generally speaking, if you use any sort of ORM connected to any sort of SQL server, you had better handle case-insensitivity by setting the appropriate parameters of your server, database, table or column. Only if that is impossible or far too expensive should you consider other possibilities. Otherwise, you might run into some unexpected and very unpleasant behaviour. For instance, when Entity Framework Core 2.x cannot translate your LINQ expression directly into a SQL query, it falls back to replacing server-side operations with client-side ones. So you can end up with a solution that fetches all the data from the table to the client and filters it there. That can be quite a problem if your table is big enough.
As for the situation where the LINQ query is processed locally, there are a lot of ways to do the trick. My favourite is the following:
var lists = records.Where(p => p.Name
.Contains(record.Name, StringComparison.InvariantCultureIgnoreCase))
.ToList();
Try this:
var lists = rec.Where(p => String.Equals(p.Name,records.Name,StringComparison.OrdinalIgnoreCase)).ToList();
refer here for documentation

linq: separate orderby and thenby statements

I'm coding through the 101 Linq tutorials from here:
http://code.msdn.microsoft.com/101-LINQ-Samples-3fb9811b
Most of the examples are simple, but this one threw me for a loop:
[Category("Ordering Operators")]
[Description("The first query in this sample uses method syntax to call OrderBy and ThenBy with a custom comparer to " +
"sort first by word length and then by a case-insensitive sort of the words in an array. " +
"The second two queries show another way to perform the same task.")]
public void Linq36()
{
string[] words = { "aPPLE", "AbAcUs", "bRaNcH", "BlUeBeRrY", "ClOvEr", "cHeRry", "b1" };
var sortedWords =
words.OrderBy(a => a.Length)
.ThenBy(a => a, new CaseInsensitiveComparer());
// Another way. TODO is this use of ThenBy correct? It seems to work on this sample array.
var sortedWords2 =
from word in words
orderby word.Length
select word;
var sortedWords3 = sortedWords2.ThenBy(a => a, new CaseInsensitiveComparer());
No matter which combination of words I throw at it, the length is always the first ordering criterion, even though I don't know how the second statement (with no orderby!) knows what the original orderby clause was.
Am I going crazy? Can anyone explain how Linq "remembers" what the original ordering was?
The return type of OrderBy is not IEnumerable<T>. It's IOrderedEnumerable<T>. This is an object that "remembers" all of the orderings it's been given, and as long as you don't call another method that turns the variable back into an IEnumerable it will retain that knowledge.
See Jon Skeet's wonderful blog series Edulinq, in which he re-implements LINQ to Objects, for more info. The key entries on OrderBy/ThenBy are:
IOrderedEnumerable
OrderBy, OrderByDescending, ThenBy, ThenByDescending
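The role of IOrderedEnumerable<T> can be made visible by writing the types out explicitly instead of using var. The sketch below is a minimal illustration; it uses StringComparer.OrdinalIgnoreCase as a stand-in for the sample's CaseInsensitiveComparer, which is an assumption made to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderingDemo
{
    public static List<string> SortInTwoSteps(string[] words)
    {
        // OrderBy returns IOrderedEnumerable<string>, which carries the primary ordering.
        IOrderedEnumerable<string> byLength = words.OrderBy(w => w.Length);

        // ThenBy is only defined on IOrderedEnumerable<T>; it adds a secondary key
        // to the existing ordering rather than starting a new sort.
        IOrderedEnumerable<string> byLengthThenText =
            byLength.ThenBy(w => w, StringComparer.OrdinalIgnoreCase);

        // If byLength were stored as a plain IEnumerable<string>, calling ThenBy
        // on it would not compile, because the ordering information would be hidden.
        return byLengthThenText.ToList();
    }
}
```

This is why the query-syntax variable in the question can still accept ThenBy: the orderby clause compiles to OrderBy, and the trivial select leaves the IOrderedEnumerable<string> type intact.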
This is because LINQ is lazy: all of the evaluation happens only when you enumerate the sequence, at which point the query that has been constructed gets executed.
Your question doesn't make much sense on the surface because you're not considering the nature of deferred execution. Truthfully, it doesn't "remember" in either case; it simply isn't executed until it's really needed. If you step through your examples in the debugger, you will find that they generate identical (structurally, anyway) statements. Consider:
var sortedWords =
words.OrderBy(a => a.Length)
.ThenBy(a => a, new CaseInsensitiveComparer());
You've explicitly told it to OrderBy, ThenBy. Each statement is stacked on until they're all complete, and the final query is constructed to look like (pseudo):
Select from sorted words, order by length, order by comparer
Then once that is all ready to go it is executed and placed into sortedWords. Now consider:
var sortedWords2 =
from word in words
orderby word.Length // You're telling it to sort here
select word;
// Now you're telling it to ThenBy here
var sortedWords3 = sortedWords2.ThenBy(a => a, new CaseInsensitiveComparer());
And then, once those queries are stacked up, they will be executed. However, they WON'T be executed until you NEED them. sortedWords3 won't really have any value until you act on it, because its evaluation is deferred. So in both cases, you're basically saying to the compiler:
Wait until I'm done building my query
Select from source
Order by length
Then by comparer
Ok do your stuff.
Note: To sum up, LINQ doesn't "remember", it simply doesn't execute until you're done giving it instructions to execute. Then it stacks them up into a query and runs them all at once when they're needed.
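The deferred execution described above can be observed directly by changing the source after the query has been defined but before it is enumerated. This is a minimal sketch with made-up data; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    public static List<string> Run()
    {
        var words = new List<string> { "pear", "fig" };

        // Defining the query does not sort anything yet.
        var query = words.OrderBy(w => w.Length);

        // These items are added after the query was defined...
        words.Add("kiwi");
        words.Add("a");

        // ...but they still show up, because sorting happens here, on enumeration.
        return query.ToList();
    }
}
```

The returned list contains all four words, which would not be possible if OrderBy had sorted the list at the point where the query was declared.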

How to use LINQ to SQL to create ranked search results?

I am looking for a way to use LINQ to SQL (L2S) to return ranked results based on keywords.
I would like to take a keyword and search the table for it using .Contains(). The trick that I haven't been able to figure out is how to get a count of how many times that keyword appears, and then .OrderByDescending() based on that count.
So if I had something like:
string keyword = "SomeKeyword";
IQueryable<Article> searchResults = from a in GenesisRepository.Article
where a.Body.Contains(keyword)
select a;
What is the best way to order searchResults based on the number of times keyword appears in a.Body?
Thanks for any help.
Try inserting orderby a.Body.Split(' ').Count(w => w == keyword). That should allow you to see that the concept works. However, I STRONGLY recommend that the final version include this as part of the select projection, possibly using a key-value pair, and order by the property name:
string keyword = "SomeKeyword";
//EDIT: restructured query to force the ordering to be done on the projection,
//not the source.
// var: the query yields KeyValuePair<int, Article>, not Article; highest count first.
var searchResults = (from a in GenesisRepository.Article
where a.Body.Contains(keyword)
select new KeyValuePair<int, Article>(
a.Body.Split(' ').Count(w=>w == keyword), a))
.OrderByDescending(kvp=>kvp.Key);
The reason is performance; the Split().Count() method chain is linear-complexity, and will be evaluated for every comparison of two values, making the overall sort N^2logN complexity (slow).
EDIT: Also, understand that a.Body.Contains(keyword) will not search by whole words, and so will return articles that contain "SomeKeywordLongerThanSearch" and "ThisIsSomeKeyword" as well as "SomeKeyword". You can avoid this with a Regex match on the pattern "\bSomeKeyword\b", which will only match instances of SomeKeyword with a word boundary immediately before and after.
This is a little hack I came up with, pretty simple but definitely not a "best practices" one.
IQueryable<Article> searchResults = from a in GenesisRepository.Article
where a.Body.Contains(keyword)
orderby a.Body.Split(new string[] { keyword }, StringSplitOptions.RemoveEmptyEntries).Count() descending
select a;
Maybe this will work...
IQueryable<Article> searchResults = from a in GenesisRepository.Article
where a.Body.Contains(keyword)
select a;
searchResults = searchResults.OrderByDescending(s => Regex.Matches(s.Body, keyword).Count);
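As a sketch of the ranking idea that works on data already in memory, the following counts whole-word occurrences with a regular expression and sorts by that count. The Article class here is a hypothetical stand-in, and running it against LINQ to SQL would likely require materializing the filtered rows first (for example with AsEnumerable()), since the provider is not expected to translate Regex calls into SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the Article entity in the question.
public class Article
{
    public string Title { get; set; }
    public string Body { get; set; }
}

public static class KeywordRanking
{
    public static List<Article> Rank(IEnumerable<Article> articles, string keyword)
    {
        // Escape the keyword so regex metacharacters in it are matched literally,
        // and use \b so only whole-word occurrences are counted.
        var pattern = new Regex(@"\b" + Regex.Escape(keyword) + @"\b");

        return articles
            // Compute the hit count once per article, not once per comparison.
            .Select(a => new { Article = a, Hits = pattern.Matches(a.Body ?? "").Count })
            .Where(x => x.Hits > 0)
            .OrderByDescending(x => x.Hits)
            .Select(x => x.Article)
            .ToList();
    }
}
```

Projecting the count before sorting follows the recommendation in the first answer: the count is computed once per article and the sort only compares integers.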
