Saving a group of items from a list - C#

What I am trying to accomplish is to group my list by supplier id, iterate over each group, calculate totals and so on, and when finished pass that group's items on to be saved, then clear the list and start over again. From googling, I'm just summing, then passing that group off to save, then clearing the list, rinse and repeat. I thought the Clear would empty the list and allow it to be reused.
Error - "Collection was modified; enumeration operation may not execute."
The group variable is grouping the list that is passed in to this method.
I created the count to see how many groups I had (6).
ProcessSpecialOrders receives a grouped list, the PO number, and the total.
What happens here is that this will process the first group once and save it to the DB, then it will throw the error. If I refresh my page or check the DB I can see group one has saved.
List<xxxxx> specialOrderList = new List<xxxxx>();

var group = from supplier in list
            group supplier by supplier.SupplierId;
var count = group.Count();

foreach (var grp in group.ToList())
{
    specialOrderList.Clear();
    foreach (var g in grp.ToList())
    {
        var orderItemList = new MOrderItem();
        if (g.ItemType == "SD")
        {
            orderItemList.Id = 0;
            orderItemList.ItemId = g.Id;
            orderItemList.OnHandQty = g.OnHand;
            orderItemList.ItemNo = g.ItemNumber;
            orderItemList.ThisQty = 0;
            orderItemList.NetCost = itemPrice.Where(x => x.ItemId == g.Id).Sum(l => (decimal?)l.NetCost * 1) ?? 0.0M;
            orderItemList.ExtendedCost = 123.99M;
            orderItemList.OnHandWhenBuildQty = 0;
            orderItemList.OnOrderQty = matcoOrderItem.Where(x => x.ItemId == g.Id).Sum(x => (int?)x.ThisQty) ?? 0;
            specialOrderList.Add(orderItemList);
        }
    }
    ProcessSpecialOrders(specialOrderList, poNo, specialOrderTotal);
}
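A likely culprit, given those symptoms: ProcessSpecialOrders (or the save it triggers) still holds a reference to the same specialOrderList instance, so the Clear() on the next pass mutates a collection that is being enumerated elsewhere. A minimal sketch of the usual fix, under that assumption and reusing the names from the question (List<MOrderItem> stands in for the anonymized list type): allocate a fresh list per group instead of clearing and reusing one.

foreach (var grp in group.ToList())
{
    // A new list every iteration: whatever ProcessSpecialOrders keeps a
    // reference to is never mutated by a later pass of this loop.
    var specialOrderList = new List<MOrderItem>();
    foreach (var g in grp.ToList())
    {
        if (g.ItemType != "SD") continue;
        // ... build orderItemList exactly as in the question ...
        specialOrderList.Add(orderItemList);
    }
    ProcessSpecialOrders(specialOrderList, poNo, specialOrderTotal);
}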

Related

LINQ List overwrites previous value instead of setting new one

I have a LINQ query which outputs to a list with ToList. It works fine, other than the fact that each time it's run it updates the original record instead of creating a new one.
On every run through this code data.EventID changes, so I'd like every record to appear in the list.
The code:
foreach (var data in vehicleqry)
{
    bool inUK = boundryChecker.IsLongLatInUK((double)data.declatfloat, (double)data.declongfloat);
    if (inUK == true)
    {
        var qryevent = (from e in db.events
                        where e.eventID == data.EventID
                        select new
                        {
                            e.eventID,
                            e.sysdatetime,
                            e.vehicleID
                        }).ToList();
    }
}
I also have a list with the eventIDs in it, if I can use this to query the list?
I think what you actually want is to only run a single query instead of looping around. You can do this by making use of the Contains method:
var vehicleqry = ...;

// Get all of the individual event IDs for entries that are "inUK"
var vehicleEventIds = vehicleqry
    .Where(ve => boundryChecker
        .IsLongLatInUK((double)ve.declatfloat, (double)ve.declongfloat))
    .Select(ve => ve.EventID);

// Get all the matching events
var qryevent = (from e in db.events
                where vehicleEventIds.Contains(e.eventID)
                select new
                {
                    e.eventID,
                    e.sysdatetime,
                    e.vehicleID
                }).ToList();
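One caveat, assuming vehicleqry is itself a database query (the question doesn't say): a .NET method like IsLongLatInUK can't be translated to SQL, so the boundary check has to run client-side before the IDs are handed to the second query, where Contains then translates to an IN clause. A sketch of that variation:

// Pull the rows into memory first so the boundary check can run,
// then materialize just the IDs for the server-side IN clause.
var vehicleEventIds = vehicleqry
    .AsEnumerable()
    .Where(ve => boundryChecker.IsLongLatInUK((double)ve.declatfloat, (double)ve.declongfloat))
    .Select(ve => ve.EventID)
    .ToList();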

Most efficient way to search enumerable

I am writing a small program that takes a .csv file with about 45k rows as input. I am trying to compare the contents of this file with the contents of a table in a database (SQL Server through Dynamics CRM using Xrm.Sdk, if it makes a difference).
In my current program (which takes about 25 minutes to do the comparison; the file and database are exactly the same here, both 45k rows with no differences), I have all existing records from the database in a DataCollection<Entity>, which inherits Collection<T> and IEnumerable<T>.
In my code below I am filtering using the Where method and then branching based on the count of matches. The Where seems to be the bottleneck here. Is there a more efficient approach than this? I am by no means a LINQ expert.
foreach (var record in inputDataLines)
{
    var fields = record.Split(',');
    var fund = fields[0];
    var bps = Convert.ToDecimal(fields[1]);
    var withdrawalPct = Convert.ToDecimal(fields[2]);
    var percentile = Convert.ToInt32(fields[3]);
    var age = Convert.ToInt32(fields[4]);
    var bombOutTerm = Convert.ToDecimal(fields[5]);

    var matchingRows = existingRecords.Entities.Where(r => r["field_1"].ToString() == fund
        && Convert.ToDecimal(r["field_2"]) == bps
        && Convert.ToDecimal(r["field_3"]) == withdrawalPct
        && Convert.ToDecimal(r["field_4"]) == percentile
        && Convert.ToDecimal(r["field_5"]) == age);

    entitiesFound.AddRange(matchingRows);

    if (matchingRows.Count() == 0)
    {
        rowsToAdd.Add(record);
    }
    else if (matchingRows.Count() == 1)
    {
        if (Convert.ToDecimal(matchingRows.First()["field_6"]) != bombOutTerm)
        {
            rowsToUpdate.Add(record);
            entitiesToUpdate.Add(matchingRows.First());
        }
    }
    else
    {
        entitiesToDelete.AddRange(matchingRows);
        rowsToAdd.Add(record);
    }
}
EDIT: I can confirm that all existingRecords are in memory before this code is executed. There is no IO or DB access in the above loop.
Himbrombeere is right: you should execute the query first and put the result into a collection before you use Any, Count, AddRange or whatever other method will execute the query again. In your code it's possible that the query is executed five times in every loop iteration.
Watch out for the term deferred execution in the documentation. If a method is implemented in that way, it means that the method can be used to construct a LINQ query (so you can chain it with other methods, and at the end you have a query). But only methods that don't use deferred execution, like Count, Any, and ToList (or a plain foreach), will actually execute it. If you don't want the whole query executed every time, and you have to access it multiple times, it's better to store the result in a collection (e.g. with ToList).
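To make that concrete, here is a minimal sketch with a hypothetical counter that makes each re-execution visible:

var numbers = new List<int> { 1, 2, 3 };
int evaluations = 0;
var query = numbers.Where(n => { evaluations++; return n > 1; });

var c = query.Count();             // executes the filter: evaluations is now 3
var a = query.Any();               // executes it again (stops at the first match)
var materialized = query.ToList(); // execute once more and store the result
var c2 = materialized.Count;       // a stored property now; no re-evaluation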
However, you could use a different approach which should be much more efficient: a Lookup<TKey, TElement>, which is similar to a dictionary and can be used with an anonymous type as the key:
var lookup = existingRecords.Entities.ToLookup(r => new
{
    fund = r["field_1"].ToString(),
    bps = Convert.ToDecimal(r["field_2"]),
    withdrawalPct = Convert.ToDecimal(r["field_3"]),
    percentile = Convert.ToInt32(r["field_4"]), // int, to match the key built in the loop below
    age = Convert.ToInt32(r["field_5"])         // anonymous-type keys must match on member types too
});
Now you can access this lookup in the loop very efficiently.
foreach (var record in inputDataLines)
{
    var fields = record.Split(',');
    var fund = fields[0];
    var bps = Convert.ToDecimal(fields[1]);
    var withdrawalPct = Convert.ToDecimal(fields[2]);
    var percentile = Convert.ToInt32(fields[3]);
    var age = Convert.ToInt32(fields[4]);
    var bombOutTerm = Convert.ToDecimal(fields[5]);

    var matchingRows = lookup[new { fund, bps, withdrawalPct, percentile, age }].ToList();

    entitiesFound.AddRange(matchingRows);

    if (matchingRows.Count == 0)
    {
        rowsToAdd.Add(record);
    }
    else if (matchingRows.Count == 1)
    {
        if (Convert.ToDecimal(matchingRows.First()["field_6"]) != bombOutTerm)
        {
            rowsToUpdate.Add(record);
            entitiesToUpdate.Add(matchingRows.First());
        }
    }
    else
    {
        entitiesToDelete.AddRange(matchingRows);
        rowsToAdd.Add(record);
    }
}
Note that this will work even if the key does not exist (an empty sequence is returned).
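For illustration, with hypothetical data: a lookup's indexer never throws for an unknown key, unlike a dictionary's.

var byLength = new[] { "a", "bb" }.ToLookup(s => s.Length);
Console.WriteLine(byLength[3].Count()); // prints 0: an empty sequence, not a KeyNotFoundException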
Add a ToList after your Convert.ToDecimal(r["field_5"]) == age) line to force an immediate execution of the query.
var matchingRows = existingRecords.Entities
    .Where(r => r["field_1"].ToString() == fund
             && Convert.ToDecimal(r["field_2"]) == bps
             && Convert.ToDecimal(r["field_3"]) == withdrawalPct
             && Convert.ToDecimal(r["field_4"]) == percentile
             && Convert.ToDecimal(r["field_5"]) == age)
    .ToList();
The Where doesn't actually execute your query, it just prepares it. The actual execution happens later, in a deferred way. In your case that happens when calling Count, which itself will iterate the entire collection of items. And if the first condition fails, the second one is checked, leading to a second iteration of the complete collection when Count is called again. In this case you actually execute that query a third time when calling matchingRows.First().
When you force an immediate execution, you're executing the query only once, and thus iterating the entire collection only once, which will decrease your overall time.
Another option, which is basically along the same lines as the other answers, is to prepare your data first, so that you're not repeatedly calling things like r["field_2"] (which are relatively slow to look up).
This is a (1) clean your data, (2) query/join your data, (3) process your data approach.
Do this:
(1)
var inputs =
    inputDataLines
        .Select(record =>
        {
            var fields = record.Split(',');
            return new
            {
                fund = fields[0],
                bps = Convert.ToDecimal(fields[1]),
                withdrawalPct = Convert.ToDecimal(fields[2]),
                percentile = Convert.ToInt32(fields[3]),
                age = Convert.ToInt32(fields[4]),
                bombOutTerm = Convert.ToDecimal(fields[5]),
                record
            };
        })
        .ToArray();
var entities =
    existingRecords
        .Entities
        .Select(entity => new
        {
            fund = entity["field_1"].ToString(),
            bps = Convert.ToDecimal(entity["field_2"]),
            withdrawalPct = Convert.ToDecimal(entity["field_3"]),
            percentile = Convert.ToInt32(entity["field_4"]),
            age = Convert.ToInt32(entity["field_5"]),
            bombOutTerm = Convert.ToDecimal(entity["field_6"]),
            entity
        })
        .ToArray()
        .GroupBy(x => new
        {
            x.fund,
            x.bps,
            x.withdrawalPct,
            x.percentile,
            x.age
        }, x => new
        {
            x.bombOutTerm,
            x.entity,
        });
(2)
var query =
    from i in inputs
    join e in entities
        on new { i.fund, i.bps, i.withdrawalPct, i.percentile, i.age } equals e.Key
        into matches
    // A group join (join ... into) keeps inputs that have no matching entities,
    // which the Count == 0 branch below relies on; a plain join would drop them.
    select new { input = i, matchingRows = matches.SelectMany(m => m).ToList() };
(3)
foreach (var x in query)
{
    entitiesFound.AddRange(x.matchingRows.Select(y => y.entity));

    if (x.matchingRows.Count == 0)
    {
        rowsToAdd.Add(x.input.record);
    }
    else if (x.matchingRows.Count == 1)
    {
        if (x.matchingRows.First().bombOutTerm != x.input.bombOutTerm)
        {
            rowsToUpdate.Add(x.input.record);
            entitiesToUpdate.Add(x.matchingRows.First().entity);
        }
    }
    else
    {
        entitiesToDelete.AddRange(x.matchingRows.Select(y => y.entity));
        rowsToAdd.Add(x.input.record);
    }
}
I would suspect that this will be among the fastest approaches presented.

C# Filter List to remove any double object

I have searched and tested many examples in this forum but can't get a fully working method.
I am using LINQ to bulk insert a list of entity classes (RemoteReadings).
Due to unique constraints I need to filter out any items already inserted.
Uniqueness is composed of two columns, meterid and datetime, in the RemoteReadings table.
// approx 5000 records (I need to do this in batches of 2000 due to a
// constraint in L2S, but can do this after I get this working)
List<RemoteReading> lst = createListFromCSV();

// Option 1:
// This does not work, as I am comparing a memory list to a db list. I need to use the Contains() method.
// Actually I am trying to accomplish this in the following examples.
List<RemoteReading> myLst = (from ri in db.RemoteReadings
                             from l in lst
                             where l.meterid == ri.meterid
                                && l.date == ri.date
                             select ri).ToList();

// Option 2:
// Get the list from the DB that are in the in-memory lst
List<RemoteReading> myLst = (from ri in db.RemoteReadings
                             where
                                 // where in this list, comparing meterid and date measured
                                 (from l in lst
                                  select
                                  /// help here !
                                  ///
                             select ri).ToList<RemoteInterconnectorReading>();

// Option 3:
// Get the list from lst that are not in the database
// I am a bit confused here!
// Tried also to remove any duplicates from the list:
List<RemoteReading> result = myLst.Except(lst).ToList();

// Ultimately
db.RemoteReading.InsertAllOnSubmit(result);
db.SubmitChanges();

Any help please?
Due to limitations in EF, we can't join a DB query with an in-memory list. Also, Contains can only be used with a list of primitives. So we need to make some effort to find the duplicates on the two columns.
var newItems = createListFromCSV();
var meterIds = newItems.Select(n => n.meterid).Distinct().ToList();
var dates = newItems.Select(n => n.date).Distinct().ToList();

var probableMatches = (from ri in db.RemoteReadings
                       where meterIds.Contains(ri.meterid)
                          || dates.Contains(ri.date)
                       select new { ri.meterid, ri.date }).ToList();

var duplicates = (from existingRi in probableMatches
                  join newRi in newItems
                      on new { existingRi.meterid, existingRi.date }
                      equals new { newRi.meterid, newRi.date }
                  select newRi).ToList();

var insertList = newItems.Except(duplicates).ToList();
db.RemoteReadings.InsertAllOnSubmit(insertList); // or whatever
With the great help of aSharma and some other tweaks, I finally got a working and tested method. As my lists contain over 5000 items, I had to execute in batches to stay under the 2100-parameter SQL RPC limitation. Added some comments and credits :)
/// List<RemoteReadings> contains a list of database entity classes (RemoteReadings)
public List<RemoteReadings> removeDublicatesFirst(List<RemoteReadings> lst)
{
    try
    {
        DataClasses1DataContext db = new DataClasses1DataContext();

        var meterIds = lst.Select(n => n.meterId).Distinct().ToList();
        var dates = lst.Select(n => n.mydate).Distinct().ToList();
        var myfLst = new List<RemoteReadings>();

        // To avoid the following SqlException, the LINQ query should be executed in batches:
        // {System.Data.SqlClient.SqlException
        //  The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect.
        //  Too many parameters were provided in this RPC request. The maximum is 2100.}
        foreach (var batch in dates.Batch(2000))
        {
            // Gets a list of possible matches from the DB.
            var probableMatches = (from ri in db.RemoteReadings
                                   where meterIds.Contains(ri.meterId)
                                      && batch.Contains(ri.mydate)
                                   select new { ri.meterId, ri.mydate }).ToList();

            // Join the probableMatches with lst in memory on the unique
            // constraint (meterId, mydate) to find any duplicates.
            var duplicates = (from existingRi in probableMatches
                              join newRi in lst
                                  on new { existingRi.meterId, existingRi.mydate }
                                  equals new { newRi.meterId, newRi.mydate }
                              select newRi).ToList();

            // Collect duplicates in a new list across the batch executions.
            foreach (var s in duplicates)
            {
                myfLst.Add(s);
            }
        }

        // Remove the duplicates found in myfLst from lst.
        var insertList = lst.Except(myfLst).ToList();
        return insertList;
    }
    catch (Exception ex)
    {
        return null;
    }
}
// Found this extension class to divide an IEnumerable into batches:
// http://stackoverflow.com/a/13731854/288865
public static class MyExtensions
{
    public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> items, int maxItems)
    {
        return items.Select((item, inx) => new { item, inx })
                    .GroupBy(x => x.inx / maxItems)
                    .Select(g => g.Select(x => x.item));
    }
}
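A quick sanity check of how that Batch extension behaves, using hypothetical values:

var chunks = Enumerable.Range(1, 5).Batch(2);
foreach (var chunk in chunks)
    Console.WriteLine(string.Join(",", chunk));
// Output:
// 1,2
// 3,4
// 5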

Find a dupe in a List with Linq

I am building a list of users. Each user has a FullName.
I'm comparing users on FullName.
I'm taking a DataTable with the users from the old DB, parsing them into a 'User' object, and adding them to a List<Users>, which in the code is a List<Deelnemer>.
It goes like this:
List<Deelnemer> tempDeeln = new List<Deelnemer>();
bool dupes = false;
foreach (DataRow rij in deeln.Rows)
{
    Deelnemer dln = new Deelnemer();
    dln.Dln_Creatiedatum = DateTime.Now;
    dln.Dln_Email = rij["Ler_Email"].ToString();
    dln.Dln_Inst_ID = inst.Inst_ID;
    dln.Dln_Naam = rij["Ler_Naam"].ToString();
    dln.Dln_Username = rij["LerLog_Username"].ToString();
    dln.Dln_Voornaam = rij["Ler_Voornaam"].ToString();
    dln.Dln_Update = (DateTime)rij["Ler_Update"];

    if (!dupes && tempDeeln.Count(q => q.FullName.ToLower() == dln.FullName.ToLower()) > 0)
        dupes = true;

    tempDeeln.Add(dln);
}
Then, when the foreach is done, I look if the bool is true, check which ones are the doubles, and remove the oldest ones.
Now, I think this part of the code is very bad:
if (!dupes && tempDeeln.Count(q => q.FullName.ToLower() == dln.FullName.ToLower()) > 0)
It runs for every user added, and runs over all the already created users.
My question: how would I optimize this?
You can use a set such as a HashSet<T> to track the unique names observed so far. A hash set supports constant-time insertion and lookup, so a full linear search will not be required for every new item, unlike in your existing solution.
var uniqueNames = new HashSet<string>(StringComparer.CurrentCultureIgnoreCase);
...
foreach(...)
{
...
if(!dupes)
{
// Expression is true only if the set already contained the string.
dupes = !uniqueNames.Add(dln.FullName);
}
}
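For clarity, the return value of HashSet<T>.Add is what drives the flag above; a tiny illustration with hypothetical values:

var seen = new HashSet<string>(StringComparer.CurrentCultureIgnoreCase);
Console.WriteLine(seen.Add("John Doe")); // True: newly added
Console.WriteLine(seen.Add("JOHN DOE")); // False: already present (case-insensitive)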
If you want to "remove" dupes (i.e. produce one representative element for each name) after you have assembled the list (without using a hash-set), you can do:
var distinctItems = tempDeeln.GroupBy(dln => dln.FullName,
StringComparer.CurrentCultureIgnoreCase)
.Select(g => g.First());
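Since the question says the oldest duplicates are the ones to drop, a small variation (a sketch, assuming Dln_Update reflects how recent a record is) keeps the newest entry per name rather than the first one encountered:

var newestPerName = tempDeeln
    .GroupBy(dln => dln.FullName, StringComparer.CurrentCultureIgnoreCase)
    .Select(g => g.OrderByDescending(dln => dln.Dln_Update).First())
    .ToList();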
Try this out--
http://blogs.msdn.com/b/ericwhite/archive/2008/08/19/find-duplicates-using-linq.aspx
Count will go through the whole set of items. Try to use Any; that way it will only check until the first occurrence of the item.
if (!dupes && tempDeeln.Any(q => q.FullName.ToLower() == dln.FullName.ToLower()))
    dupes = true;

Filling in missing dates using a linq group by date query

I have a LINQ query that basically counts how many entries were created on a particular day, which is done by grouping by year, month, and day. The problem is that because some days won't have any entries, I need to backfill those missing "calendar days" with an entry of 0 count.
My guess is that this can probably be done with a Union or something, or maybe even some simple for loop to process the records after the query.
Here is the query:
from l in context.LoginToken
where l.CreatedOn >= start && l.CreatedOn <= finish
group l by new { l.CreatedOn.Year, l.CreatedOn.Month, l.CreatedOn.Day } into groups
orderby groups.Key.Year, groups.Key.Month, groups.Key.Day
select new StatsDateWithCount
{
    Count = groups.Count(),
    Year = groups.Key.Year,
    Month = groups.Key.Month,
    Day = groups.Key.Day
};
If I have data for 12/1 - 12/4/2009 like (simplified):
12/1/2009 20
12/2/2009 15
12/4/2009 16
I want an entry with 12/3/2009 0 added by code.
I know that in general this should be done in the DB using a denormalized table that you either populate with data or join to a calendar table, but my question is how would I accomplish this in code?
Can it be done in Linq? Should it be done in Linq?
I just did this today. I gathered the complete data from the database and then generated a "sample empty" table. Finally, I did an outer join of the empty table with the real data and used the DefaultIfEmpty() construct to detect when a row was missing from the database and fill it in with defaults.
Here's my code:
int days = 30;

// Gather the data we have in the database, which will be incomplete for the graph (i.e. missing dates/subsystems).
var dataQuery =
    from tr in SourceDataTable
    where (DateTime.UtcNow - tr.CreatedTime).Days < days
    group tr by new { tr.CreatedTime.Date, tr.SubSystem } into g
    orderby g.Key.Date ascending, g.Key.SubSystem ascending
    select new MyResults()
    {
        Date = g.Key.Date,
        SubSystem = g.Key.SubSystem,
        Count = g.Count()
    };
// Generate the list of subsystems we want.
var subsystems = new[] { SubSystem.Foo, SubSystem.Bar }.AsQueryable();
// Generate the list of Dates we want.
var datetimes = new List<DateTime>();
for (int i = 0; i < days; i++)
{
    datetimes.Add(DateTime.UtcNow.AddDays(-i).Date);
}

// Generate the empty table, which is the shape of the output we want but without counts.
var emptyTableQuery =
    from dt in datetimes
    from subsys in subsystems
    select new MyResults()
    {
        Date = dt.Date,
        SubSystem = subsys,
        Count = 0
    };
// Perform an outer join of the empty table with the real data and use the magic DefaultIfEmpty
// to handle the "there's no data from the database" case.
var finalQuery =
    from e in emptyTableQuery
    join realData in dataQuery on
        new { e.Date, e.SubSystem } equals
        new { realData.Date, realData.SubSystem } into g
    from realDataJoin in g.DefaultIfEmpty()
    select new MyResults()
    {
        Date = e.Date,
        SubSystem = e.SubSystem,
        Count = realDataJoin == null ? 0 : realDataJoin.Count
    };

return finalQuery.OrderBy(x => x.Date).AsEnumerable();
I made a helper function which is designed to be used with anonymous types, and to be reused in as generic a way as possible.
Let's say this is your query to get a list of orders for each date.
var orders = db.Orders
    .GroupBy(o => o.OrderDate)
    .Select(o => new
    {
        OrderDate = o.Key,
        OrderCount = o.Count(),
        Sales = o.Sum(i => i.SubTotal)
    })
    .OrderBy(o => o.OrderDate);
For my function to work, please note this list must be ordered by date. If we had a day with no sales, there would be a hole in the list.
Now for the function that will fill in the blanks with a default value (an instance of the anonymous type).
private static IEnumerable<T> FillInEmptyDates<T>(IEnumerable<DateTime> allDates, IEnumerable<T> sourceData, Func<T, DateTime> dateSelector, Func<DateTime, T> defaultItemFactory)
{
    // iterate through the source collection, tracking whether an item is available
    // (checking MoveNext's return value also works when T is a value type)
    using (var iterator = sourceData.GetEnumerator())
    {
        bool hasCurrent = iterator.MoveNext();

        // for each date in the desired list
        foreach (var desiredDate in allDates)
        {
            // check if the current item exists and is the 'desired' date
            if (hasCurrent && dateSelector(iterator.Current) == desiredDate)
            {
                // if so then return it and move to the next item
                yield return iterator.Current;
                hasCurrent = iterator.MoveNext();

                // if source data is now exhausted then continue
                if (!hasCurrent)
                {
                    continue;
                }

                // ensure the next item is not a duplicate
                if (dateSelector(iterator.Current) == desiredDate)
                {
                    throw new Exception("More than one item found in source collection with date " + desiredDate);
                }
            }
            else
            {
                // if the current 'desired' item doesn't exist then
                // create a dummy item using the provided factory
                yield return defaultItemFactory(desiredDate);
            }
        }
    }
}
The usage is as follows:
// first you must determine your desired list of dates, which must be in order
// determine this however you want
var desiredDates = ....;

// fill in any holes
var ordersByDate = FillInEmptyDates(desiredDates,
    // Source list (with holes)
    orders,
    // How do we get a date from an order
    (order) => order.OrderDate,
    // How do we create an 'empty' item
    (date) => new
    {
        OrderDate = date,
        OrderCount = 0,
        Sales = 0m // decimal literal so the anonymous type matches the query above
    });
You must make sure there are no duplicates in the desired dates list.
Both desiredDates and sourceData must be in order.
Because the method is generic, if you are using an anonymous type the compiler will automatically tell you if your 'default' item is not the same 'shape' as a regular item.
Right now I include a check for duplicate items in sourceData, but there is no such check for desiredDates.
If you want to ensure the lists are ordered by date, you will need to add extra code.
Essentially what I ended up doing here is creating a list of the same type with all the dates in the range and a 0 value for the count, then unioning the results from my original query with this list. The major hurdle was simply creating a custom IEqualityComparer.
You can generate the list of dates starting from "start" and ending at "finish", and then, step by step, check the count for each date separately; a sketch follows.
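A minimal sketch of that idea, assuming the start/finish bounds and the StatsDateWithCount shape from the question: index the query results by calendar day, then walk every day in the range, defaulting missing days to 0.

// Hypothetical helper built on the question's StatsDateWithCount results.
IEnumerable<StatsDateWithCount> FillMissingDays(DateTime start, DateTime finish, List<StatsDateWithCount> results)
{
    // index the existing counts by calendar day
    var byDay = results.ToDictionary(r => new DateTime(r.Year, r.Month, r.Day), r => r.Count);

    for (var day = start.Date; day <= finish.Date; day = day.AddDays(1))
    {
        byDay.TryGetValue(day, out var count); // count stays 0 for missing days
        yield return new StatsDateWithCount { Year = day.Year, Month = day.Month, Day = day.Day, Count = count };
    }
}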
