Say I needed to do a whole bunch of queries from various tables like so
var weights = db.weights.Where(x => ids.Contains(x.ItemId)).Select(x => x.weight).ToList();
var heights = db.heights.Where(x => ids.Contains(x.ItemId)).Select(x => x.height).ToList();
var lengths = db.lengths.Where(x => ids.Contains(x.ItemId)).Select(x => x.length).ToList();
var widths = db.widths.Where(x => ids.Contains(x.ItemId)).Select(x => x.width).ToList();
Okay, it's really not that stupid in reality, but it's just to illustrate the question. Basically, that array "ids" gets sent to the database 4 times in this example. I was thinking I could save some bandwidth by sending it just once. Is it possible to do that? Sorta like
db.SetTempVariable("ids", ids);
var weights = db.weights.Where(x => db.TempVariable["ids"].Contains(x.ItemId)).Select(x => x.weight).ToList();
var heights = db.heights.Where(x => db.TempVariable["ids"].Contains(x.ItemId)).Select(x => x.height).ToList();
var lengths = db.lengths.Where(x => db.TempVariable["ids"].Contains(x.ItemId)).Select(x => x.length).ToList();
var widths = db.widths.Where(x => db.TempVariable["ids"].Contains(x.ItemId)).Select(x => x.width).ToList();
db.DeleteTempVariable("ids");
I'm just imagining the possible syntax here. In essence, SetTempVariable would send the data to the database, and db.TempVariable["ids"] would be just a dummy object to use in expressions that really only contains a reference to the previously sent data; the database would then magically understand this and reuse the list of ids I sent it instead of me sending it again and again.
So, how can I do that?
Well, this is more a database design problem than anything. A properly designed database would have one table that contains weights, heights, lengths and widths for every item (or "id" as you call it), so one query on the item returns everything at once.
I'm reluctant to suggest band-aid fixes for the broken database design you're using, because you really should just fix that, but you'll find a large improvement in performance if you open a transaction first and run all 4 of your queries inside it. Or just join the tables on their id (they seem to be the same?), and then your four queries become one query.
To answer your actual question (and again, you're barking up the wrong tree here): that's what temp tables are for. You can upload your data to a temp table and then join it against your other table(s).
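To illustrate the join suggestion: a minimal sketch, assuming EF-style DbSets and that all four tables share an ItemId column (table and column names come from the question; the combined result shape is hypothetical).
// One round trip: join the four tables on ItemId, so "ids" is sent only once.
var dimensions = (from w in db.weights
                  join h in db.heights on w.ItemId equals h.ItemId
                  join l in db.lengths on w.ItemId equals l.ItemId
                  join wd in db.widths on w.ItemId equals wd.ItemId
                  where ids.Contains(w.ItemId)
                  select new { w.ItemId, w.weight, h.height, l.length, wd.width })
                 .ToList();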
I am trying to get one row per id from a DataTable, and I do not care which row I take. The same id can exist on several rows in the table.
Here's the expression that's giving me trouble:
dt.AsEnumerable().GroupBy(i => i.Field<int>("id")).Select(i => i.First())
Running just this section, dt.AsEnumerable().GroupBy(i => i.Field<int>("id")), correctly gives me a result of 22 groupings for my DataTable. (I have 22 ids with data in this table.)
However, when adding on the .Select(i => i.First()), I am only seeing 10 data rows.
To me this doesn't seem to make any sense. If the GroupBy function managed to find 22 distinct id values, I would expect this logic to grab one of each.
My only other thought is that maybe it's just a weird side effect of viewing this data through a watch in Visual Studio rather than assigning to a variable.
If you think it's just weird side effects of viewing the data in a watch, which can happen with LINQ statements, then split it out into
var groups = dt.AsEnumerable().GroupBy(i => i.Field<int>("id")).ToList();
var firstOfGroups = groups.Select(i => i.First()).ToList();
and then look at groups and firstOfGroups in the debugger. Temporarily evaluating items with .ToList() can help a lot with viewing things in the debugger.
It is possible; you can double-check the count of items in each group:
.Select(g => new { k = g.Key, c = g.Count() })
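For example, a quick sketch (reusing dt from the question) to check whether any id actually has multiple rows:
// 22 entries expected; even if some c > 1, .Select(g => g.First()) still returns one row per group.
var groupSizes = dt.AsEnumerable()
    .GroupBy(i => i.Field<int>("id"))
    .Select(g => new { k = g.Key, c = g.Count() })
    .ToList();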
EDIT 01: I seem to have found a solution (see my answer below) that works for me, going from an hour to merely seconds by pre-computing and then applying the .Except() extension method; but I'm leaving this open if anyone else encounters this problem or if anyone finds a better solution.
ORIGINAL QUESTION
I have the following set of queries for different kinds of objects that I'm staging from a source system so I can keep it in sync and make a delta stamp myself, as the source system doesn't provide one, nor can we build or touch it.
I get all the data in memory and then, for example, perform this query, where I look for objects that no longer exist in the source system but are present in the staging database, and thus have to be marked "deleted". The bottleneck is the first part of the LINQ query, the .Contains(); how can I improve its performance? Maybe with .Except() and a custom comparer?
Or would it be best to put them in a hashing list and then perform the compare? (See the sketch after the code below.)
The problem is that I have to keep the staged objects afterwards to do some property transforms on them. This seemed the simplest solution, but unfortunately it's very slow on 20k objects:
stagedSystemObjects.Where(stagedSystemObject =>
!sourceSystemObjects.Select(sourceSystemObject => sourceSystemObject.Code)
.Contains(stagedSystemObject.Code)
)
.Select(x =>
{
x.ActiveStatus = ActiveStatuses.Disabled;
x.ChangeReason = ChangeReasons.Edited;
return x;
})
.ToList();
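Regarding the "hashing list" idea in the question: a minimal sketch using a HashSet<string>, assuming Code is a string (as in the answers below). HashSet<T>.Contains is O(1), so the source codes are no longer re-enumerated for every staged object.
// Build the set of source codes once; each lookup is then constant time.
var sourceSystemCodes = new HashSet<string>(
    sourceSystemObjects.Select(sourceSystemObject => sourceSystemObject.Code));

var deleted = stagedSystemObjects
    .Where(stagedSystemObject => !sourceSystemCodes.Contains(stagedSystemObject.Code))
    .Select(x =>
    {
        x.ActiveStatus = ActiveStatuses.Disabled;
        x.ChangeReason = ChangeReasons.Edited;
        return x;
    })
    .ToList();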
Based on Yves Schelpe's answer, I made a few tweaks to make it faster.
The basic idea is to drop the first two ToList calls and use PLINQ. See if this helps:
var stagedSystemCodes = stagedSystemObjects.Select(x => x.Code);
var sourceSystemCodes = sourceSystemObjects.Select(x => x.Code);
var codesThatNoLongerExistInSourceSystem = stagedSystemCodes.Except(sourceSystemCodes).ToArray();
var y = stagedSystemObjects.AsParallel()
.Where(stagedSystemObject =>
codesThatNoLongerExistInSourceSystem.Contains(stagedSystemObject.Code))
.Select(x =>
{
x.ActiveStatus = ActiveStatuses.Disabled;
x.ChangeReason = ChangeReasons.Edited;
return x;
}).ToArray();
Note that PLINQ tends to pay off only for computation-bound work on a multi-core CPU. It can make things worse in other scenarios.
I have found a solution for this problem, which brought it down to mere seconds instead of an hour for 200k objects.
It's done by pre-computing and then applying the .Except() extension method.
So no more "chaining" LINQ queries or doing .Contains inside a method; instead, make it "simpler" by first projecting both sides to a list of strings, so that the inner projection doesn't have to happen over and over again as in the original question's example code.
Here is my solution, which for now is satisfactory. However, I'm leaving this open in case anyone comes up with a refined/better solution!
var stagedSystemCodes = stagedSystemObjects.Select(x => x.Code).ToList();
var sourceSystemCodes = sourceSystemObjects.Select(x => x.Code).ToList();
var codesThatNoLongerExistInSourceSystem = stagedSystemCodes.Except(sourceSystemCodes).ToList();
return stagedSystemObjects
.Where(stagedSystemObject =>
codesThatNoLongerExistInSourceSystem.Contains(stagedSystemObject.Code))
.Select(x =>
{
x.ActiveStatus = ActiveStatuses.Disabled;
x.ChangeReason = ChangeReasons.Edited;
return x;
})
.ToList();
I have a database table with 20 columns but I only need to work with 8 of the columns.
Is it more efficient to return the whole table
_context.products.Where(x => x.active);
Or is it better to do this
_context.products.Where(x => x.active).Select(x => new SubModels.ProductItem { id = x.id, name = x.name, category = x.category /* etc etc */ });
Thanks
You can pretty much refer to the question "Why is SELECT * considered harmful?".
It seems rather relevant to your case and has a lot of points.
So, from the performance point of view of a single query, it's always better to select only what you need. But you should also consider other things, like the likelihood of issuing additional queries to get data you could have returned already.
The select is more efficient because it queries just the data you need. In your first example, it will query all columns from the db.
I have a database of strings that contain IDs. I need to pass that list into a LINQ query so I can pull the correct records.
model.SelectedDealers = db.dealers.Any(a => a.sdealer_name.Contains(UserToEdit.UserViewAccesses.Select(s => s.ViewReferenceNumber)));
SelectedDealers is of type dealers
ViewReferenceNumber should be a list of strings which should match sdealer_name
So essentially I am trying to find all of the dealers whose sdealer_name matches the list of sdealer_names I have in my UserToEdit.UserViewAccesses
I've tried moving parts around and switching them in different spots and can't seem to figure this out.
Any() is just a boolean indicating if there are any results. It doesn't actually return the results.
If I understand what you are after correctly, then this might work:
var dealerNames = UserToEdit.UserViewAccesses.Select(s => s.ViewReferenceNumber).ToList();
model.SelectedDealers = db.dealers.Where(a => dealerNames.Contains(a.sdealer_name));
So essentially I am trying to find all of the dealers whose sdealer_name matches the list of sdealer_names I have in my UserToEdit.UserViewAccesses
var dealersThatMatched = (from d in UserToEdit.UserViewAccesses.sdealer_names
                          where d == sdealer_name
                          select d).ToList();
Wish I could have made a comment instead, but I don't have enough rep. I wish I understood the requirement better, but you seem ready and able to try stuff, so perhaps you'll find this useful.
We have an object with nested properties which we want to make easily searchable. This has been simple enough to achieve, but we also want to aggregate information based on multiple fields. In terms of the domain, we have multiple deals that have the same details with the exception of the seller. We need to consolidate these into a single result and show the seller options on the following page. However, we still need to be able to filter on the seller on the initial page.
We attempted something like the below to try to collect multiple sellers onto a single row, but it contains duplicates and the creation of the index takes forever.
Map = deals => deals.Select(deal => new
{
Id = deal.ProductId,
deal.ContractLength,
Provider = deal.Provider.Id,
Amount = deal.Amount
});
Reduce = deals => deals.GroupBy(result => new
{
result.ProductId,
result.ContractLength,
result.Amount
}).Select(result => new
{
result.Key.ProductId,
result.Key.ContractLength,
Provider = result.Select(x => x.Provider).Distinct(),
result.Key.Amount
});
I'm not sure this is the best way to handle the problem, but I'm fairly new to Raven and struggling for ideas. If we keep the index simple and group on the client side, then we can't keep paging consistent.
Any ideas?
You are grouping on the document id (deal.Id), so you'll never actually generate a reduction across multiple documents.
I don't think that is intended.
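For what it's worth, a minimal sketch of a corrected index, assuming the intent is to reduce by product: key the map output on ProductId (not the document id) and give the map and reduce the same output shape, so RavenDB can actually fold deals for the same product together. The class and field names below are illustrative.
public class Deals_ByProduct : AbstractIndexCreationTask<Deal, Deals_ByProduct.Result>
{
    public class Result
    {
        public string ProductId { get; set; }
        public int ContractLength { get; set; }
        public string[] Providers { get; set; }
        public decimal Amount { get; set; }
    }

    public Deals_ByProduct()
    {
        // Map and reduce emit the same shape; Providers is a collection from the start.
        Map = deals => from deal in deals
                       select new
                       {
                           deal.ProductId,
                           deal.ContractLength,
                           Providers = new[] { deal.Provider.Id },
                           deal.Amount
                       };

        Reduce = results => from result in results
                            group result by new { result.ProductId, result.ContractLength, result.Amount }
                            into g
                            select new
                            {
                                g.Key.ProductId,
                                g.Key.ContractLength,
                                Providers = g.SelectMany(x => x.Providers).Distinct().ToArray(),
                                g.Key.Amount
                            };
    }
}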