Projection: filling 2 arrays at once - c#

I thought I would be clever and write something like this code sample. It also seemed like a clean and efficient way to fill a second list without enumerating a second time.
int i = 0;
var tickers = new List<string>();
var resultTable = results.Select(result => new Company
{
    Ticker = tickers[i++] = result.CompanyTicker,
});
I'm not really looking for an alternative way to do this, because I can obviously accomplish this easily with a for loop. I'm more interested in why this snippet doesn't work, i.e. tickers.Count is 0 after the code runs, despite there being 100+ results. Can anyone tell me why I'm getting this unexpected behavior?

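You need to iterate your query, for example by calling .ToArray() or .ToList() at the end. Currently you have only created a query; it hasn't been executed yet.
See: LINQ and Deferred Execution
Plus, once the query is executed, your code will throw an exception (an ArgumentOutOfRangeException rather than an IndexOutOfRangeException), since the List<T> indexer can only replace existing items and your list doesn't have any. Use tickers.Add(...) to append instead.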
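To see the deferred execution described in this answer, here is a minimal, self-contained sketch. It assumes the elements of `results` expose a `CompanyTicker` string, as in the question; the sample data is hypothetical, an anonymous type stands in for `Company`, and `tickers.Add` replaces the indexer, which cannot append.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; the question's real `results` type is not shown.
var results = new[]
{
    new { CompanyTicker = "AAPL" },
    new { CompanyTicker = "MSFT" },
    new { CompanyTicker = "GOOG" },
};

var tickers = new List<string>();

// Building the query does not run the lambda: Select is deferred.
var query = results.Select(result =>
{
    tickers.Add(result.CompanyTicker);             // Add appends; the indexer cannot grow the list
    return new { Ticker = result.CompanyTicker };  // stand-in for `new Company { ... }`
});

Console.WriteLine(tickers.Count); // 0: nothing has been enumerated yet

// ToList() enumerates the query, so the lambda runs once per element.
var resultTable = query.ToList();
Console.WriteLine(tickers.Count); // 3
```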

This is due to LINQ's lazy (deferred) execution. Once the query is executed (i.e. when you iterate over it), the list should have your results. An easy way to force that is to call ToArray or ToList.

LINQ queries should ideally not have side effects.
I don't see what would prevent this from being a two-step process:
var tickers = results.Select(r => r.CompanyTicker).ToList();
var resultTable = tickers.Select(t => new Company { Ticker = t }).ToList();
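The side-effect warning matters because a deferred query runs its lambda again every time it is enumerated. This sketch uses hypothetical sample data and an anonymous type in place of `Company`; it first shows the duplicated `Add` calls, then the two-step version from this answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data standing in for the question's `results`.
var results = new[]
{
    new { CompanyTicker = "AAPL" },
    new { CompanyTicker = "MSFT" },
};

// Side-effecting projection: every enumeration repeats the Add calls.
var tickers = new List<string>();
var query = results.Select(r =>
{
    tickers.Add(r.CompanyTicker);
    return new { Ticker = r.CompanyTicker };
});
var firstPass = query.ToList();
var secondPass = query.ToList();
Console.WriteLine(tickers.Count); // 4: each ticker was added twice

// Two-step version: no side effects, each list is built exactly once.
var tickerList = results.Select(r => r.CompanyTicker).ToList();
var resultTable = tickerList.Select(t => new { Ticker = t }).ToList();
Console.WriteLine(string.Join(", ", resultTable.Select(c => c.Ticker))); // AAPL, MSFT
```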


Accessing info from CommonPart is extremely slow?

I'm new to Orchard, and this must have something to do with how the underlying data is stored.
Joining with CommonPart seems fast enough, like this:
var items = _contentManager.Query<MyUserPart, MyUserPartRecord>("someTypeName")
.ForVersion(VersionOptions.Published)
.Join<CommonPartRecord>().List().ToList();
That runs fairly fast. But whenever I try to access some field in CommonPart, like this, it runs extremely slowly:
var items = _contentManager.Query<MyUserPart, MyUserPartRecord>("someTypeName")
.ForVersion(VersionOptions.Published)
.Join<CommonPartRecord>().List()
//access some field from commonpart
.Select(e => new {
User = e.As<CommonPart>().Owner.UserName
}).ToList();
There are only about 1200 items in total, yet it takes about 5 seconds; it shouldn't be that slow. A simple SQL query run in the background should take about 0.5 seconds or even less.
I've tried investigating Orchard's source code but found nothing that could explain it. Everything seems to disappear into a black box at the point where IContent is accessed. I hope someone here can give me some suggestions to diagnose and solve this tricky issue. Thanks!
Update:
I've tried debugging a bit and seen that the following method is hit inside the DefaultContentManager:
ContentItem New(string contentType) { ... }
That's really interesting: the query only reads data, without modifying, inserting or updating anything. Yet that method being hit suggests something's wrong here.
Update:
Following @Bertrand Le Roy's comment, I've tried the following code with QueryHints, but it doesn't seem to change anything:
var items = _contentManager.Query<MyUserPart, MyUserPartRecord>("someTypeName")
.ForVersion(VersionOptions.Published)
.Join<CommonPartRecord>()
.WithQueryHints(new QueryHints().ExpandParts<CommonPart>())
.List()
//access some field from commonpart
.Select(e => new {
User = e.As<CommonPart>().Owner.UserName
}).ToList();
and this (without .Join)
var items = _contentManager.Query<MyUserPart, MyUserPartRecord>("someTypeName")
.ForVersion(VersionOptions.Published)
.WithQueryHints(new QueryHints().ExpandParts<CommonPart>())
.List()
//access some field from commonpart
.Select(e => new {
User = e.As<CommonPart>().Owner.UserName
}).ToList();
Accessing the Owner property from your Select causes the lazy loader in CommonPartHandler to ask the content manager to load the user content item: _contentManager.Get<IUser>(part.Record.OwnerId). This happens once per content item returned by your query, so it results in a select N+1 problem where N = 1200 according to your question.
There are at least two ways of avoiding that:
You can use HQL and craft a query that gives you everything you need up front in 1 operation.
You can make a first content manager query to get the set of owner ids, and then a second content manager query for those ids, getting everything you need with a total of 2 queries instead of 1201.
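The cost difference between these approaches can be illustrated without Orchard. This is a generic simulation, not Orchard's API: hypothetical dictionaries stand in for the database, hypothetical loader functions count round trips, and the item/owner distribution is made up. It compares one owner lookup per item (select N+1) with one batched lookup for all distinct owner ids.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical "database": content item id -> owner id, and owner id -> user name.
var itemOwners = Enumerable.Range(1, 1200).ToDictionary(id => id, id => id % 10);
var userNames = Enumerable.Range(0, 10).ToDictionary(id => id, id => $"user{id}");

int roundTrips = 0;

// Hypothetical per-item loader: one round trip per call (the lazy-load pattern).
string LoadUserName(int ownerId)
{
    roundTrips++;
    return userNames[ownerId];
}

// Hypothetical batch loader: one round trip for any number of ids.
Dictionary<int, string> LoadUserNames(IEnumerable<int> ownerIds)
{
    roundTrips++;
    return ownerIds.Distinct().ToDictionary(id => id, id => userNames[id]);
}

// Select N+1: one query for the items (counted up front), then one per item.
roundTrips = 1;
var slow = itemOwners.Select(kv => LoadUserName(kv.Value)).ToList();
Console.WriteLine(roundTrips); // 1201

// Two queries: the items, then all distinct owners in a single batch.
roundTrips = 1;
var owners = LoadUserNames(itemOwners.Values);
var fast = itemOwners.Select(kv => owners[kv.Value]).ToList();
Console.WriteLine(roundTrips); // 2
```

Both versions produce the same user names; only the number of round trips changes.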

Orderby C# string record

I have the following code that reads records from the DB, builds a string of site codes from them, and then sorts it with OrderBy.
The code works fine, but I know it can be improved; any suggestions are highly appreciated.
result.Sites.ForEach(x =>
{
    result.SiteDetails +=
        string.Concat(ICMSRepository.Instance.GetSiteInformationById(x.SiteInformationId).SiteCode,
            ",");
});
//Sort(Orderby) sites by string value NOT by numerical order
result.SiteDetails = result.SiteDetails.Trim(',');
List<string> siteCodes = result.SiteDetails.Split(',').ToList();
var siteCodesOrder = siteCodes.OrderBy(x => x).ToArray();
string siteCodesSorted = string.Join(", ", siteCodesOrder);
result.SiteDetails = siteCodesSorted;
That's a little convoluted, yeah.
All we need to do is select the SiteCode as a string, sort with OrderBy, then join the results. Since String.Join has an overload that takes an IEnumerable<string>, we don't need to convert to an array in the middle.
What we end up with is a single statement for assigning to your SiteDetails member:
result.SiteDetails = string.Join(", ",
result.Sites
.Select(x => $"{ICMSRepository.Instance.GetSiteInformationById(x.SiteInformationId).SiteCode}")
.OrderBy(x => x)
);
(Or you could use .ToString() instead of $"{...}")
This is the general process for most transforms in LINQ. Figure out what your inputs are, what you need to do with them, and how the outputs should look.
If you're using LINQ it's uncommon that you will have to build and manipulate intermediary lists unless you're doing something quite complex. For simple tasks like sorting a sequence of values there is almost never a reason to put them into transitional collections, since the framework handles all of that for you.
And the best part is it enumerates the collection one time to get the full set of data. No more loops to pull the data out, then process, then rebuild.
One thing that will improve performance is to get rid of the .ToList() and the .ToString. Neither is necessary; they just take up extra processing time and memory.
Go with Corey's answer, which this is a variant of, but I thought I'd offer a slightly clearer way to express the query:
result.SiteDetails =
String.Join(", ",
from x in result.Sites
let sc = ICMSRepository.Instance.GetSiteInformationById(x.SiteInformationId).SiteCode
orderby sc
select sc);
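A runnable sketch of the `String.Join` plus `OrderBy` pipeline from these answers follows. The repository call and `result.Sites` are replaced by hypothetical in-memory data, and `StringComparer.Ordinal` is an added choice (the answers use the default, culture-sensitive comparer) that makes the string-not-numeric ordering explicit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for ICMSRepository.Instance.GetSiteInformationById(id).SiteCode.
var siteCodeById = new Dictionary<int, string> { [1] = "S10", [2] = "S9", [3] = "A2" };

// Hypothetical stand-in for result.Sites.
var sites = new[]
{
    new { SiteInformationId = 1 },
    new { SiteInformationId = 2 },
    new { SiteInformationId = 3 },
};

// Project to the code, sort as strings, then join: no intermediate list or array.
string siteDetails = string.Join(", ",
    sites.Select(x => siteCodeById[x.SiteInformationId])
         .OrderBy(code => code, StringComparer.Ordinal));

Console.WriteLine(siteDetails); // A2, S10, S9 ("S10" sorts before "S9" as a string)
```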

Linq performance when diffing two lists using inner Contains

EDIT 01: I seem to have found a solution that works for me (see my answer). It went from an hour to mere seconds by pre-computing the codes and then applying the .Except() extension method. I'm leaving this open in case anyone else encounters this problem or finds a better solution.
ORIGINAL QUESTION
I have the following set of queries for different kinds of objects that I'm staging from a source system, so I can keep it in sync and compute a delta stamp myself, since the source system doesn't provide one and we can't build or touch it.
I load all the data into memory and then, for example, perform this query, where I look for objects that no longer exist in the source system but are still present in the staging database, and thus have to be marked "deleted". The bottleneck is the first part of the LINQ query, the .Contains(). How can I improve its performance? Maybe with .Except() and a custom comparer?
Or would it be best to put them in a hash set and then perform the comparison?
The catch is that I need the staged objects afterwards to apply some property transforms to them. This seemed the simplest solution, but unfortunately it's very slow on 20k objects:
stagedSystemObjects.Where(stagedSystemObject =>
        !sourceSystemObjects.Select(sourceSystemObject => sourceSystemObject.Code)
            .Contains(stagedSystemObject.Code)
    )
    .Select(x =>
    {
        x.ActiveStatus = ActiveStatuses.Disabled;
        x.ChangeReason = ChangeReasons.Edited;
        return x;
    })
    .ToList();
Based on Yves Schelpe's answer, I made a few tweaks to make it faster.
The basic idea is to drop the first two ToList calls and use PLINQ. See if this helps:
var stagedSystemCodes = stagedSystemObjects.Select(x => x.Code);
var sourceSystemCodes = sourceSystemObjects.Select(x => x.Code);
var codesThatNoLongerExistInSourceSystem = stagedSystemCodes.Except(sourceSystemCodes).ToArray();
var y = stagedSystemObjects.AsParallel()
    .Where(stagedSystemObject =>
        codesThatNoLongerExistInSourceSystem.Contains(stagedSystemObject.Code))
    .Select(x =>
    {
        x.ActiveStatus = ActiveStatuses.Disabled;
        x.ChangeReason = ChangeReasons.Edited;
        return x;
    }).ToArray();
Note that PLINQ tends to help only with compute-bound tasks on a multi-core CPU; in other scenarios it can make things worse.
I have found a solution to this problem, which brought it down to mere seconds instead of an hour for 200k objects.
It's done by pre-computing and then applying the .Except() extension method.
So instead of "chaining" LINQ queries or calling .Contains inside the predicate, I make it "simpler" by first projecting both collections to lists of strings, so that the inner projection doesn't have to be recomputed over and over again, as it was in the original question's example code.
Here is my solution, that for now is satisfactory. However I'm leaving this open if anyone comes up with a refined/better solution!
var stagedSystemCodes = stagedSystemObjects.Select(x => x.Code).ToList();
var sourceSystemCodes = sourceSystemObjects.Select(x => x.Code).ToList();
var codesThatNoLongerExistInSourceSystem = stagedSystemCodes.Except(sourceSystemCodes).ToList();
return stagedSystemObjects
    .Where(stagedSystemObject =>
        codesThatNoLongerExistInSourceSystem.Contains(stagedSystemObject.Code))
    .Select(x =>
    {
        x.ActiveStatus = ActiveStatuses.Disabled;
        x.ChangeReason = ChangeReasons.Edited;
        return x;
    })
    .ToList();
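The question also asks whether hashing the codes would help. Here is a minimal sketch of that variant, which is a swapped-in technique rather than the one used in the answers: build a `HashSet<string>` of source codes once, then filter the staged items in a single pass. The tuple records and sample data are hypothetical stand-ins for the real staged and source types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical records: only the Code matters for the diff.
var staged = new List<(string Code, string Name)> { ("A", "alpha"), ("B", "beta"), ("C", "gamma") };
var source = new List<(string Code, string Name)> { ("A", "alpha"), ("C", "gamma") };

// Build the lookup once; HashSet<T>.Contains is O(1) on average,
// whereas List<T>.Contains and array Contains scan linearly.
var sourceCodes = new HashSet<string>(source.Select(s => s.Code));

// Single pass over the staged items: roughly O(staged + source) instead of O(staged * source).
var noLongerInSource = staged.Where(s => !sourceCodes.Contains(s.Code)).ToList();

Console.WriteLine(string.Join(", ", noLongerInSource.Select(s => s.Code))); // B
```

With the real (mutable) objects, the ActiveStatus and ChangeReason assignments could then be applied in a plain foreach over the filtered list, which also keeps side effects out of Select.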

Removing from a collection while in a foreach with linq

From what I understand, this seems to not be a safe practice...
I have a foreach loop on my list object that I am stepping through. Inside that foreach loop I am looking up records by an Id. Once I have that new list of records returned by that Id I do some parsing and add them to a new list.
What I would like to do is not step through the same Id more than once. So my thought process would be to remove it from the original list. However, this causes an error... and I understand why.
My question is: is there a safe way to go about this, or should I restructure my thought process a bit? Does anyone have experience or thoughts on how to solve this issue?
Here is a little pseudocode:
_myList.ForEach(x =>
{
    List<MyModel> newMyList = _myList.FindAll(y => y.SomeId == x.SomeId).ToList();
    //Here is where I would do some work with newMyList
    //Now I am done... time to remove all records with x.SomeId
    _myList.RemoveAll(y => y.SomeId == x.SomeId);
});
I know that _myList.RemoveAll(y => y.SomeId == x.SomeId); is wrong here, but in theory that's roughly what I'm looking for.
I have also toyed around with the idea of pushing the used SomeId to an idList and then have it check each time, but that seems cumbersome and was wondering if there was a nicer way to handle what I am looking to do.
Sorry if I didn't explain this well. If there are any questions, please feel free to comment and I will answer or make edits where needed.
First off, using ForEach in your example isn't a great idea for these reasons.
You're right to think there are performance downsides to iterating through the full list for each remaining SomeId, but even making the list smaller every time would still require another full iteration of that subset (if it even worked).
As was pointed out in the comments, GroupBy on SomeId organizes the elements into groupings for you, and allows you to efficiently step through each subset for a given SomeId, like so:
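foreach (var group in _myList.GroupBy(x => x.SomeId))
{
    // A bare .Select(g => ...) is lazily evaluated and would never run on its own,
    // so iterate the groups explicitly.
    DoSomethingWithGroupedElements(group);
}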
Jon Skeet has an excellent set of articles about how the Linq extensions could be implemented. I highly recommend checking it out for a better understanding of why this would be more efficient.
First of all, you can't modify a List<T> while iterating over it with foreach: adding, removing or replacing an element invalidates the enumerator, and the next iteration throws an InvalidOperationException. There are a few ways you could handle this situation:
GroupBy
This is the method I would use. You can group your list by the property you want, and iterate through the resulting IGrouping objects:
var groups = list.GroupBy(x => x.yourProperty);
foreach(var group in groups)
{
    //your code
}
Distinct properties list
You could also collect the distinct property values in another sequence, and loop through that instead of the original list:
var propsList = list.Select(x=>x.yourProperty).Distinct();
foreach(var prop in propsList)
{
    var tmpList = list.Where(x=>x.yourProperty == prop);
    //your code
}
While loop
This will actually do what you originally wanted, but performance may not be optimal:
while(list.Any())
{
    var prop = list.First().yourProperty;
    var tmpList = list.Where(x=>x.yourProperty == prop);
    //your code
    list.RemoveAll(x=>x.yourProperty == prop);
}
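The GroupBy option can be sketched as a small runnable program, with hypothetical `(SomeId, Value)` tuples standing in for `MyModel`. Each `SomeId` is processed exactly once, and the list is never modified while it is being enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical records standing in for MyModel: (SomeId, Value).
var myList = new List<(int SomeId, string Value)>
{
    (1, "a"), (2, "b"), (1, "c"), (3, "d"), (2, "e"),
};

var processedIds = new List<int>();

// GroupBy buckets the items in one pass, so each SomeId is visited exactly once
// and the source list is never modified while it is being enumerated.
foreach (var group in myList.GroupBy(x => x.SomeId))
{
    processedIds.Add(group.Key);
    Console.WriteLine($"{group.Key}: {string.Join(",", group.Select(x => x.Value))}");
}
// Prints:
// 1: a,c
// 2: b,e
// 3: d
```

GroupBy yields groups in the order their keys first appear, and keeps the original element order within each group.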

When Where clause is used inside Linq statement produces different results than when used outside

I have the following statement:
List<string> tracks = new List<string> { "ABC", "DEF" };
var items = (from i in Agenda.AgendaSessions
select i).Where(p => p.Tracks.Any(s => tracks.Contains(s.Code)));
This returns all sessions whose tracks contain either ABC or DEF. But when I rewrite the statement as follows, it returns all sessions regardless, as if the clause always evaluates to true. Can anyone shed any light on this, please?
var items = from i in Agenda.AgendaSessions
where i.Tracks.Any(s=> tracks.Contains(s.Code))
select i;
Update
If there are other clauses within the where, does that affect the results?
The two code snippets are equivalent, i.e. they should always produce the same results under all circumstances. Of course, that assumes that AgendaSessions, Tracks and .Contains() are what we expect them to be; if they are property getters/methods which have curious side-effects such as modifying the contents of tracks, then anything could happen.
In other words, without knowing what the rest of your code looks like, we cannot help you, because there is no semantic difference between the two code snippets.
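To illustrate that the two forms are interchangeable, here is a sketch against hypothetical in-memory data (plain LINQ to Objects, not the asker's actual `Agenda` source). The compiler translates the query-syntax `where` into the same `Where(...)` call used in the first snippet.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory sessions; each track has a Code, as in the question.
var sessions = new[]
{
    new { Title = "S1", Tracks = new[] { new { Code = "ABC" } } },
    new { Title = "S2", Tracks = new[] { new { Code = "XYZ" } } },
    new { Title = "S3", Tracks = new[] { new { Code = "DEF" }, new { Code = "XYZ" } } },
};
var tracks = new List<string> { "ABC", "DEF" };

// First form: query expression followed by a method-syntax Where.
var viaMethod = (from i in sessions select i).Where(p => p.Tracks.Any(s => tracks.Contains(s.Code)));

// Second form: `where` is translated into an equivalent Where(...) call.
var viaQuery = from i in sessions
               where i.Tracks.Any(s => tracks.Contains(s.Code))
               select i;

Console.WriteLine(string.Join(",", viaMethod.Select(x => x.Title))); // S1,S3
Console.WriteLine(string.Join(",", viaQuery.Select(x => x.Title)));  // S1,S3
```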
