Issue regarding C# collection performance

My application contains two observable collections, and I have two performance problems with them.
1.
int count = collection.Count();
This takes nearly 30 milliseconds. Is there a faster way to get the count?
2.
I am comparing collection_1 with collection_2 on a specified value, like this:
var common = collection_2.FirstOrDefault(i => i.name == collection_1.name);
This takes more than 6 milliseconds when collection_1 contains more than 35,0000 records and collection_2 contains more than 1,00,000 records. What is the best way to do this?
My code:
foreach (var singleItem in StartWindow.omsReqRes.Where(i => i.Tag.Contains("Request")))
{
    AddFileData(singleItem);
}
public void AddFileData(LogClass singleItem)
{
    var responses = StartWindow.omsReqRes.Where(i => i.Clordid == singleItem.Clordid && i.Tag.Contains("Response"));
    foreach (var response in responses)
    {
        LogClass obj = new LogClass();
        obj.AlgoName = singleItem.AlgoName;
        obj.RealTime = singleItem.RealTime;
        obj.TimeStamp = singleItem.TimeStamp;
        var request = StartWindow.gatewayReqRes.FirstOrDefault(i => i.Tag == singleItem.Tag && i.Clordid == singleItem.Clordid);
        if (request != null)
            obj.v = request.e;
    }
}

The collection contains more than 1 lakh (100,000) records, so a linear search through it is slow. I split the collection by one value into a key/value dictionary, and lookups now take much less time.
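As a rough sketch of the dictionary idea described above: the entries are grouped by their Clordid once, and each later search is a keyed lookup instead of a scan of the whole collection. The LogEntry class and the LogIndex helper are hypothetical stand-ins for the LogClass used in the question, and Clordid is assumed to be a string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for LogClass; only the members used here are modelled.
public class LogEntry
{
    public string Clordid { get; set; } = "";
    public string Tag { get; set; } = "";
}

public static class LogIndex
{
    // Groups the entries by Clordid in a single pass over the collection.
    public static Dictionary<string, List<LogEntry>> ByClordid(IEnumerable<LogEntry> entries)
    {
        var index = new Dictionary<string, List<LogEntry>>();
        foreach (var entry in entries)
        {
            if (!index.TryGetValue(entry.Clordid, out var bucket))
            {
                bucket = new List<LogEntry>();
                index[entry.Clordid] = bucket;
            }
            bucket.Add(entry);
        }
        return index;
    }

    // Only the (usually small) bucket for one Clordid is scanned for responses.
    public static List<LogEntry> ResponsesFor(Dictionary<string, List<LogEntry>> index, string clordid)
    {
        return index.TryGetValue(clordid, out var bucket)
            ? bucket.Where(e => e.Tag.Contains("Response")).ToList()
            : new List<LogEntry>();
    }
}
```

The index has to be rebuilt or updated whenever the underlying collection changes, so this pays off mainly when many searches run against the same data.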

Related

ToList() takes so long to convert, how can I improve this?

I have the following code, in which I use the ToList method to convert my data from the database to a list. I convert the whole data set to a list because I then have to perform search operations on it using Where with lambda expressions, and I assumed these need a list.
Is there any alternative to this?
// This takes less than 2 seconds to execute
var wdata = (from s in db.VIEW_ADDED_LOT
select new LotModel
{
CREATION_DATE = s.CREATION_DATE,
LOT_NO_SPL = s.LOT_NO_SPL,
LOT_TYPE = s.LOT_TYPE,
ITEM = s.ITEM,
BUSINESS_UNIT = s.BUSINESS_UNIT,
INSPECTOR = s.INSPECTOR,
NCRNO = s.NCRNO,
BUILDING_NO = s.BUILDING_NO,
CELL = s.CELL,
NCR_DT = s.NCR_DT,
INVENTORY_ROUTER = s.INVENTORY_ROUTER,
DOC_ISSUE = s.DOC_ISSUE,
COMMENTS = s.COMMENTS,
AGING = s.AGING,
ARCHIVAL_DATE = s.ARCHIVAL_DATE,
NCR_COMPLETION_STATUS = s.NCR_COMPLETION_STATUS,
FLAG_LINK = s.FLAG_LINK,
P_KEY = s.P_KEY
});
// This takes around 1 minute to convert to list as there is 500 000 rows
var data = wdata.ToList();
// The reason why I am converting to list is that I have to perform n number of
// search on the basis of the filter chosen by user
if (NCR != null && NCR != "")
{
data = data.Where(a => a.NCRNO == NCR).ToList();
}
if (LOT != null && LOT != "")
{
data = data.Where(a => a.LOT_NO_SPL == LOT).ToList();
}

In your example, the wdata.ToList() call evaluates the query and loads the entirety of wdata into memory. Your initial assignment of wdata only creates an IQueryable object; it does not actually query the database.
To avoid the slow performance, apply all your filters to the IQueryable and call ToList() only at the end, for example:
var data = wdata; // at this point it's a queryable of your initial LINQ query
if (NCR != null && NCR != "")
{
data = data.Where(a => a.NCRNO == NCR); // appends one filter condition
}
if (LOT != null && LOT != "")
{
data = data.Where(a => a.LOT_NO_SPL == LOT); // appends another filter condition
}
var finalResult = data.ToList();
This appends your conditions to the IQueryable, which is only resolved once you call .ToList(). You no longer have to load all your entities into memory, because the filters are evaluated in the database.
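The composition pattern can be tried without a database. The following is a minimal sketch in which an in-memory list wrapped with AsQueryable() stands in for the database view; the Lot class and the LotSearch helper are hypothetical names, not part of the original code. With a real LINQ provider such as Entity Framework, the chained Where calls would be translated into the SQL query instead of being run in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, reduced version of LotModel.
public class Lot
{
    public string NcrNo { get; set; } = "";
    public string LotNo { get; set; } = "";
}

public static class LotSearch
{
    // Chains the optional filters onto the query. Nothing is evaluated here.
    public static IQueryable<Lot> ApplyFilters(IQueryable<Lot> query, string ncr, string lot)
    {
        if (!string.IsNullOrEmpty(ncr))
            query = query.Where(a => a.NcrNo == ncr);
        if (!string.IsNullOrEmpty(lot))
            query = query.Where(a => a.LotNo == lot);
        return query;
    }

    // In-memory stand-in for the database view (assumption for this sketch).
    public static IQueryable<Lot> SampleSource()
    {
        return new List<Lot>
        {
            new Lot { NcrNo = "N1", LotNo = "L1" },
            new Lot { NcrNo = "N1", LotNo = "L2" },
            new Lot { NcrNo = "N2", LotNo = "L1" }
        }.AsQueryable();
    }

    // The query is materialized exactly once, after all filters are attached.
    public static List<Lot> Search(string ncr, string lot)
    {
        return ApplyFilters(SampleSource(), ncr, lot).ToList();
    }
}
```

An empty string means "no filter", mirroring the null/empty checks in the answer's code.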

Filter c# list using timestamp, Take first record of each 5 seconds

I need to filter records based on their timings: I want the first record in each range of 5 seconds.
Example :
Input data :
data timings
1452 10:00:11
1455 10:00:11
1252 10:00:13
1952 10:00:15
1454 10:00:17
1451 10:00:19
1425 10:00:20
1425 10:00:21
1459 10:00:23
1422 10:00:24
Expected output
1452 10:00:11
1454 10:00:17
1459 10:00:23
I have tried to group the data based on timings like below
listSpacialRecords=listSpacialRecords.GroupBy(x => x.timings).Select(x => x.FirstOrDefault()).ToList();
But using this I can only group records that have exactly the same time.
I hope someone can help me resolve this.
The list contains a huge amount of data, so is there any way to do this other than looping through the list?

This works for me:
var results =
source
.Skip(1)
.Aggregate(
source.Take(1).ToList(),
(a, x) =>
{
if (x.timings.Subtract(a.Last().timings).TotalSeconds >= 5.0)
{
a.Add(x);
}
return a;
});
I get your desired output.

This should do it (assuming listSpacialRecords is in order):
var result = new List<DateTime>();
var distance = TimeSpan.FromSeconds(5);
var pivot = default(DateTime);
foreach (var record in listSpacialRecords)
{
    if (record.timings > pivot)
    {
        result.Add(record.timings); // or "yield return record.timings;" if you need deferred execution
        pivot = record.timings + distance;
    }
}
If the list is not in order, the easiest (but maybe not the most efficient) way is to sort it in the foreach:
foreach (var record in listSpacialRecords.OrderBy(r => r.timings))
Doing this using only LINQ is possible, but it won't benefit readability.
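For reference, here is a self-contained sketch of the pivot loop above, run against the sample data from the question. The Reading class and the TimeWindowFilter helper are hypothetical names, and the timings are assumed to be TimeSpan values; the loop keeps whole records rather than only their timestamps.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type: a data value plus its time of day.
public class Reading
{
    public int Data { get; }
    public TimeSpan Timings { get; }

    public Reading(int data, TimeSpan timings)
    {
        Data = data;
        Timings = timings;
    }
}

public static class TimeWindowFilter
{
    // Keeps the first record, then each record later than (last kept + window).
    // Assumes the input is already ordered by Timings.
    public static List<Reading> FirstPerWindow(IEnumerable<Reading> ordered, TimeSpan window)
    {
        var result = new List<Reading>();
        TimeSpan? pivot = null;
        foreach (var reading in ordered)
        {
            if (pivot == null || reading.Timings > pivot.Value)
            {
                result.Add(reading);
                pivot = reading.Timings + window;
            }
        }
        return result;
    }

    // The sample input from the question.
    public static List<Reading> Sample()
    {
        return new List<Reading>
        {
            new Reading(1452, new TimeSpan(10, 0, 11)),
            new Reading(1455, new TimeSpan(10, 0, 11)),
            new Reading(1252, new TimeSpan(10, 0, 13)),
            new Reading(1952, new TimeSpan(10, 0, 15)),
            new Reading(1454, new TimeSpan(10, 0, 17)),
            new Reading(1451, new TimeSpan(10, 0, 19)),
            new Reading(1425, new TimeSpan(10, 0, 20)),
            new Reading(1425, new TimeSpan(10, 0, 21)),
            new Reading(1459, new TimeSpan(10, 0, 23)),
            new Reading(1422, new TimeSpan(10, 0, 24))
        };
    }
}
```

Calling TimeWindowFilter.FirstPerWindow(TimeWindowFilter.Sample(), TimeSpan.FromSeconds(5)) keeps the records 1452, 1454 and 1459, which is the expected output from the question. The list is still walked once; a single pass is hard to avoid for this kind of filter, since each decision depends on the previously kept record.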

assuming your class looks something like this:
public class DataNode
{
public int Data { get; set; }
public TimeSpan Timings { get; set; }
}
I wrote an extension method:
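public static class DataNodeExtensions
{
    // Yields the first node, then every node more than timeDifference seconds after the last yielded one.
    public static IEnumerable<DataNode> TimeFilter(this IEnumerable<DataNode> list, int timeDifference)
    {
        DataNode LastFound = null;
        // Ascending order, so that each item is compared against an earlier one.
        foreach (var item in list.OrderBy(p => p.Timings))
        {
            // The explicit null check lets the first item through; a comparison against a null TimeSpan? is always false.
            if (LastFound == null || item.Timings > LastFound.Timings.Add(new TimeSpan(0, 0, timeDifference)))
            {
                LastFound = item;
                yield return item;
            }
        }
    }
}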
This can then be used like this
var list = new List<DataNode>();
var result = list.TimeFilter(5);

Something like this approach may work, using the % operator (modulo).
Assumptions:
The list is in order
You don't care if it skips missing seconds
There is always a first element
This is only within a 24-hour period
Note: totally untested. The filter keeps every record whose offset from the first record is an exact multiple of 5 seconds, so for the sample data above it would keep both 10:00:11 records and the 10:00:21 record rather than the expected output.
var seconds = listSpacialRecords
    .First() // get the first element
    .timings
    .TimeOfDay // convert it to TimeSpan
    .TotalSeconds; // get the total seconds of the day
var result = listSpacialRecords
    .Where(x => (x.timings
        .TimeOfDay
        .TotalSeconds - seconds) % 5 == 0)
    // get the difference and mod 5
    .ToList();

Most efficient way to search enumerable

I am writing a small program that takes in a .csv file as input with about 45k rows. I am trying to compare the contents of this file with the contents of a table on a database (SQL Server through dynamics CRM using Xrm.Sdk if it makes a difference).
In my current program (which takes about 25 minutes to compare; the file and the database are exactly the same here, both 45k rows with no differences), I have all existing records from the database in a DataCollection<Entity>, which inherits Collection<T> and implements IEnumerable<T>.
In my code below I am filtering using the Where method and then branching on the number of matches. The Where seems to be the bottleneck here. Is there a more efficient approach than this? I am by no means a LINQ expert.
foreach (var record in inputDataLines)
{
var fields = record.Split(',');
var fund = fields[0];
var bps = Convert.ToDecimal(fields[1]);
var withdrawalPct = Convert.ToDecimal(fields[2]);
var percentile = Convert.ToInt32(fields[3]);
var age = Convert.ToInt32(fields[4]);
var bombOutTerm = Convert.ToDecimal(fields[5]);
var matchingRows = existingRecords.Entities.Where(r => r["field_1"].ToString() == fund
&& Convert.ToDecimal(r["field_2"]) == bps
&& Convert.ToDecimal(r["field_3"]) == withdrawalPct
&& Convert.ToDecimal(r["field_4"]) == percentile
&& Convert.ToDecimal(r["field_5"]) == age);
entitiesFound.AddRange(matchingRows);
if (matchingRows.Count() == 0)
{
rowsToAdd.Add(record);
}
else if (matchingRows.Count() == 1)
{
if (Convert.ToDecimal(matchingRows.First()["field_6"]) != bombOutTerm)
{
rowsToUpdate.Add(record);
entitiesToUpdate.Add(matchingRows.First());
}
}
else
{
entitiesToDelete.AddRange(matchingRows);
rowsToAdd.Add(record);
}
}
EDIT: I can confirm that all existingRecords are in memory before this code is executed. There is no IO or DB access in the above loop.

Himbrombeere is right: you should execute the query first and store the result in a collection before you use Any, Count, AddRange or any other method that would execute the query again. In your code the query may be executed up to 5 times in every loop iteration.
Watch out for the term deferred execution in the documentation. A method implemented that way can be used to construct a LINQ query (so you can chain it with other methods, and at the end you have a query). Only methods that don't use deferred execution, like Count, Any, ToList (or a plain foreach), actually execute it. If you don't want the whole query to be executed every time and you have to access it multiple times, it's better to store the result in a collection (for example with ToList).
However, you could use a different approach which should be much more efficient: a Lookup<TKey, TElement>, which is similar to a dictionary and can be used with an anonymous type as key:
var lookup = existingRecords.Entities.ToLookup(r => new
{
    fund = r["field_1"].ToString(),
    bps = Convert.ToDecimal(r["field_2"]),
    withdrawalPct = Convert.ToDecimal(r["field_3"]),
    // int, not decimal: the key built in the loop below uses int for these two properties,
    // and the anonymous types only match if names, order and types are identical
    percentile = Convert.ToInt32(r["field_4"]),
    age = Convert.ToInt32(r["field_5"])
});
Now you can access this lookup in the loop very efficiently.
foreach (var record in inputDataLines)
{
var fields = record.Split(',');
var fund = fields[0];
var bps = Convert.ToDecimal(fields[1]);
var withdrawalPct = Convert.ToDecimal(fields[2]);
var percentile = Convert.ToInt32(fields[3]);
var age = Convert.ToInt32(fields[4]);
var bombOutTerm = Convert.ToDecimal(fields[5]);
var matchingRows = lookup[new {fund, bps, withdrawalPct, percentile, age}].ToList();
entitiesFound.AddRange(matchingRows);
if (matchingRows.Count() == 0)
{
rowsToAdd.Add(record);
}
else if (matchingRows.Count() == 1)
{
if (Convert.ToDecimal(matchingRows.First()["field_6"]) != bombOutTerm)
{
rowsToUpdate.Add(record);
entitiesToUpdate.Add(matchingRows.First());
}
}
else
{
entitiesToDelete.AddRange(matchingRows);
rowsToAdd.Add(record);
}
}
Note that this will work even if the key does not exist (the lookup returns an empty sequence, so ToList gives an empty list).
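A stripped-down, runnable sketch of the same lookup technique is shown here, with a value tuple as the composite key instead of an anonymous type. The Row class and the RowIndex helper are hypothetical and carry fewer fields than the entities in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, reduced record with three key fields and one payload field.
public class Row
{
    public string Fund { get; set; } = "";
    public decimal Bps { get; set; }
    public int Age { get; set; }
    public decimal Term { get; set; }
}

public static class RowIndex
{
    // Builds the lookup once. Value tuples compare field by field,
    // so they can serve as a composite key.
    public static ILookup<(string Fund, decimal Bps, int Age), Row> Build(IEnumerable<Row> rows)
    {
        return rows.ToLookup(r => (r.Fund, r.Bps, r.Age));
    }

    // Indexing a lookup with an unknown key yields an empty sequence, not an exception.
    public static List<Row> Find(ILookup<(string Fund, decimal Bps, int Age), Row> index,
                                 string fund, decimal bps, int age)
    {
        return index[(fund, bps, age)].ToList();
    }
}
```

As with the anonymous type, the element types of the tuple used for the search must match those used when building the lookup, otherwise the code does not compile.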

Add a ToList after your Convert.ToDecimal(r["field_5"]) == age) line to force immediate execution of the query:
var matchingRows = existingRecords.Entities.Where(r => r["field_1"].ToString() == fund
&& Convert.ToDecimal(r["field_2"]) == bps
&& Convert.ToDecimal(r["field_3"]) == withdrawalPct
&& Convert.ToDecimal(r["field_4"]) == percentile
&& Convert.ToDecimal(r["field_5"]) == age)
.ToList();
The Where doesn't actually execute your query; it just prepares it. The actual execution is deferred until the results are needed. In your case that happens when calling Count, which iterates the entire collection of items. If the first condition fails, the second one is checked, leading to a second iteration of the complete collection when calling Count again. After that, you execute the query a third time when calling matchingRows.First().
By forcing immediate execution you execute the query only once, and thus iterate the entire collection only once, which will decrease your overall time.
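The effect of deferred execution can be made visible by counting how often the predicate runs. This is a small sketch on an integer range rather than the entities from the question; the DeferredDemo class is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Consumes a deferred query twice and reports how often the predicate ran.
    public static int PredicateCallsWithoutToList()
    {
        int calls = 0;
        var query = Enumerable.Range(1, 100).Where(n => { calls++; return n % 10 == 0; });
        _ = query.Count(); // runs the predicate for all 100 items
        _ = query.Count(); // runs the predicate for all 100 items again
        return calls;
    }

    // Materializes the query once; later accesses only touch the stored list.
    public static int PredicateCallsWithToList()
    {
        int calls = 0;
        var list = Enumerable.Range(1, 100).Where(n => { calls++; return n % 10 == 0; }).ToList();
        _ = list.Count; // property on the list, no re-evaluation
        _ = list.Count;
        return calls;
    }
}
```

The first method reports 200 predicate calls, the second 100. In the question the source has about 45k rows and the loop runs about 45k times, so every avoided re-enumeration matters.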

Another option, which is basically along the same lines as the other answers, is to prepare your data first, so that you're not repeatedly calling things like r["field_2"] (which are relatively slow to look up).
This is a (1) clean your data, (2) query/join your data, (3) process your data approach.
Do this:
(1)
var inputs =
inputDataLines
.Select(record =>
{
var fields = record.Split(',');
return new
{
fund = fields[0],
bps = Convert.ToDecimal(fields[1]),
withdrawalPct = Convert.ToDecimal(fields[2]),
percentile = Convert.ToInt32(fields[3]),
age = Convert.ToInt32(fields[4]),
bombOutTerm = Convert.ToDecimal(fields[5]),
record
};
})
.ToArray();
var entities =
existingRecords
.Entities
.Select(entity => new
{
fund = entity["field_1"].ToString(),
bps = Convert.ToDecimal(entity["field_2"]),
withdrawalPct = Convert.ToDecimal(entity["field_3"]),
percentile = Convert.ToInt32(entity["field_4"]),
age = Convert.ToInt32(entity["field_5"]),
bombOutTerm = Convert.ToDecimal(entity["field_6"]),
entity
})
.ToArray()
.GroupBy(x => new
{
x.fund,
x.bps,
x.withdrawalPct,
x.percentile,
x.age
}, x => new
{
x.bombOutTerm,
x.entity,
});
(2)
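// Group join (left outer join): inputs without a matching entity group are kept
// with an empty matchingRows list, so the Count() == 0 branch in step (3) is reachable.
var query =
    from i in inputs
    join e in entities on new { i.fund, i.bps, i.withdrawalPct, i.percentile, i.age } equals e.Key into groups
    select new { input = i, matchingRows = groups.SelectMany(g => g).ToList() };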
(3)
foreach (var x in query)
{
entitiesFound.AddRange(x.matchingRows.Select(y => y.entity));
if (x.matchingRows.Count() == 0)
{
rowsToAdd.Add(x.input.record);
}
else if (x.matchingRows.Count() == 1)
{
if (x.matchingRows.First().bombOutTerm != x.input.bombOutTerm)
{
rowsToUpdate.Add(x.input.record);
entitiesToUpdate.Add(x.matchingRows.First().entity);
}
}
else
{
entitiesToDelete.AddRange(x.matchingRows.Select(y => y.entity));
rowsToAdd.Add(x.input.record);
}
}
I would suspect that this will be among the fastest approaches presented.

Increase c# list performance

I have a C# class with 20 fields. I am looping through list "A" (a list of class objects) and adding all of its values to list "B" (also a list of class objects). At any given time, list "A" will not contain more than 160 records.
The operation (looping through list "A" and adding the items to list "B") takes 25 seconds to complete.
I tried changing the list to a HashSet, which reduced the time to 19 seconds. What can I do to improve the performance significantly, i.e. to get it down to 2-3 seconds? Any suggestions?
var products = new List<ProductDto>();
using (var _userEntities = new UserEntities())
{
UserDto user = GetUserDto(userEmail, _userEntities);
if (user != null)
{
var users = _userEntities.Where(x.User.userId == user.Id);
foreach (User user in users)
{
if (_userEntities.Products.FirstOrDefault(y => y.userId == user.Id) != null)
{
var Product = new ProductDto()
{
Id = user.Id.ToString(),
ProductId = user.Product != null ? user.ProductId : string.Empty,
Name = user.Product != null ? user.Product.Name : string.Empty,
SalePrice = user.SalePrice == null ? string.Empty : user.SalePrice.ToString(),
OrderId = user.OrderID,
CreatedDate = user.CreatedDate,
HasChildItems = user.Product != null && user.Product.HasChildItems != null && user.Product.HasChildItems ? true : false,
OrderNumber = user.OrderNumber,
};
products.Add(Product);
}
}
}
}

The problem is not the length of the list but the number of SQL queries. If you have 160 elements, you issue at least 161 SQL statements, which seems to be very slow in your case (use a profiler to analyze). You can try to modify your LINQ query to join the required products with the users. I don't know your data model, but it could be something like this:
var users = dbContext.Users.Include(u => u.Product);
or
var products = dbContext.Products.Include(p => p.User);
That is, use eager loading instead of lazy loading.

Use LINQ to group a sequence by date with no gaps

I'm trying to select a subgroup of a list where items have contiguous dates, e.g.
ID StaffID Title ActivityDate
-- ------- ----------------- ------------
1 41 Meeting with John 03/06/2010
2 41 Meeting with John 08/06/2010
3 41 Meeting Continues 09/06/2010
4 41 Meeting Continues 10/06/2010
5 41 Meeting with Kay 14/06/2010
6 41 Meeting Continues 15/06/2010
I'm using a pivot point each time, so take the example pivot item as 3, I'd like to get the following resulting contiguous events around the pivot:
ID StaffID Title ActivityDate
-- ------- ----------------- ------------
2 41 Meeting with John 08/06/2010
3 41 Meeting Continues 09/06/2010
4 41 Meeting Continues 10/06/2010
My current implementation is a laborious "walk" into the past, then into the future, to build the list:
var activity = // item number 3: Meeting Continues (09/06/2010)
var orderedEvents = activities.OrderBy(a => a.ActivityDate).ToArray();
// Walk into the past until a gap is found
var preceedingEvents = orderedEvents.TakeWhile(a => a.ID != activity.ID);
DateTime dayBefore;
var previousEvent = activity;
while (previousEvent != null)
{
dayBefore = previousEvent.ActivityDate.AddDays(-1).Date;
previousEvent = preceedingEvents.TakeWhile(a => a.ID != previousEvent.ID).LastOrDefault();
if (previousEvent != null)
{
if (previousEvent.ActivityDate.Date == dayBefore)
relatedActivities.Insert(0, previousEvent);
else
previousEvent = null;
}
}
// Walk into the future until a gap is found
var followingEvents = orderedEvents.SkipWhile(a => a.ID != activity.ID);
DateTime dayAfter;
var nextEvent = activity;
while (nextEvent != null)
{
dayAfter = nextEvent.ActivityDate.AddDays(1).Date;
nextEvent = followingEvents.SkipWhile(a => a.ID != nextEvent.ID).Skip(1).FirstOrDefault();
if (nextEvent != null)
{
if (nextEvent.ActivityDate.Date == dayAfter)
relatedActivities.Add(nextEvent);
else
nextEvent = null;
}
}
The list relatedActivities should then contain the contiguous events, in order.
Is there a better way (maybe using LINQ) for this?
I had an idea of using .Aggregate() but couldn't think how to get the aggregate to break out when it finds a gap in the sequence.

Here's an implementation:
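// Must be declared inside a static class, like any extension method.
public static IEnumerable<IGrouping<int, T>> GroupByContiguous<T>(
    this IEnumerable<T> source,
    Func<T, int> keySelector
)
{
    int keyGroup = Int32.MinValue;
    int currentGroupValue = Int32.MinValue;
    return source
        .Select(t => new { obj = t, key = keySelector(t) })
        .OrderBy(x => x.key)
        // Stateful key selector: a new group starts whenever the key jumps by more than 1.
        // The captured state is not reset, so enumerate the result only once (or call ToList on it).
        .GroupBy(x => {
            if (currentGroupValue + 1 < x.key)
            {
                keyGroup = x.key;
            }
            currentGroupValue = x.key;
            return keyGroup;
        }, x => x.obj);
}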
You can either convert the dates to ints by means of subtraction, or imagine a DateTime version (easily).
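To show the grouping idea end to end on the question's data, here is a sketch that uses a plain loop instead of the stateful GroupBy selector: it splits the activities into runs of contiguous days and then picks the run containing the pivot. The Activity class is a hypothetical reduction of the question's data, and ContiguousRuns is a made-up helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical reduction of the activity record from the question.
public class Activity
{
    public int ID { get; set; }
    public DateTime ActivityDate { get; set; }
}

public static class ContiguousRuns
{
    // Splits the activities into runs in which consecutive dates are at most one day apart.
    public static List<List<Activity>> Split(IEnumerable<Activity> activities)
    {
        var runs = new List<List<Activity>>();
        foreach (var activity in activities.OrderBy(a => a.ActivityDate))
        {
            var current = runs.Count > 0 ? runs[runs.Count - 1] : null;
            if (current == null ||
                (activity.ActivityDate.Date - current[current.Count - 1].ActivityDate.Date).Days > 1)
            {
                current = new List<Activity>();
                runs.Add(current);
            }
            current.Add(activity);
        }
        return runs;
    }

    // Returns the run that contains the pivot, or an empty list if the pivot is unknown.
    public static List<Activity> Around(IEnumerable<Activity> activities, int pivotId)
    {
        return Split(activities).FirstOrDefault(run => run.Any(a => a.ID == pivotId))
               ?? new List<Activity>();
    }

    // The sample data from the question (dates are day/month/year there).
    public static List<Activity> Sample()
    {
        return new List<Activity>
        {
            new Activity { ID = 1, ActivityDate = new DateTime(2010, 6, 3) },
            new Activity { ID = 2, ActivityDate = new DateTime(2010, 6, 8) },
            new Activity { ID = 3, ActivityDate = new DateTime(2010, 6, 9) },
            new Activity { ID = 4, ActivityDate = new DateTime(2010, 6, 10) },
            new Activity { ID = 5, ActivityDate = new DateTime(2010, 6, 14) },
            new Activity { ID = 6, ActivityDate = new DateTime(2010, 6, 15) }
        };
    }
}
```

ContiguousRuns.Around(ContiguousRuns.Sample(), 3) returns the activities with IDs 2, 3 and 4. Unlike the GroupBy version, two activities on the same day stay in the same run here, which may or may not be what you want.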

In this case I think that a standard foreach loop is probably more readable than a LINQ query:
var relatedActivities = new List<TActivity>();
bool found = false;
foreach (var item in activities.OrderBy(a => a.ActivityDate))
{
int count = relatedActivities.Count;
if ((count > 0) && (relatedActivities[count - 1].ActivityDate.Date.AddDays(1) != item.ActivityDate.Date))
{
if (found)
break;
relatedActivities.Clear();
}
relatedActivities.Add(item);
if (item.ID == activity.ID)
found = true;
}
if (!found)
relatedActivities.Clear();
For what it's worth, here's a roughly equivalent -- and far less readable -- LINQ query:
var relatedActivities = activities
.OrderBy(x => x.ActivityDate)
.Aggregate
(
new { List = new List<TActivity>(), Found = false, ShortCircuit = false },
(a, x) =>
{
if (a.ShortCircuit)
return a;
int count = a.List.Count;
if ((count > 0) && (a.List[count - 1].ActivityDate.Date.AddDays(1) != x.ActivityDate.Date))
{
if (a.Found)
return new { a.List, a.Found, ShortCircuit = true };
a.List.Clear();
}
a.List.Add(x);
return new { a.List, Found = a.Found || (x.ID == activity.ID), a.ShortCircuit };
},
a => a.Found ? a.List : new List<TActivity>()
);

Somehow, I don't think LINQ was truly meant to be used for bidirectional one-dimensional depth-first searches, but I constructed a working LINQ query using Aggregate. For this example I'm going to use a List instead of an array. Also, I'm going to use Activity to refer to whatever class you are storing the data in. Replace it with whatever is appropriate for your code.
Before we even start, we need a small helper function. List.Add(T) returns void, but we want to be able to accumulate in a list and return that list from the aggregate function. So all you need is a simple function like the following.
private List<T> ListWithAdd<T>(List<T> src, T obj)
{
src.Add(obj);
return src;
}
First, we get the sorted list of all activities, and then initialize the list of related activities. This initial list will contain the target activity only, to start.
List<Activity> orderedEvents = activities.OrderBy(a => a.ActivityDate).ToList();
List<Activity> relatedActivities = new List<Activity>();
relatedActivities.Add(activity);
We have to break this into two lists, the past and the future just like you currently do it.
We'll start with the past, the construction should look mostly familiar. Then we'll aggregate all of it into relatedActivities. This uses the ListWithAdd function we wrote earlier. You could condense it into one line and skip declaring previousEvents as its own variable, but I kept it separate for this example.
var previousEvents = orderedEvents.TakeWhile(a => a.ID != activity.ID).Reverse();
relatedActivities = previousEvents.Aggregate<Activity, List<Activity>>(relatedActivities, (items, prevItem) => items.OrderBy(a => a.ActivityDate).First().ActivityDate.Subtract(prevItem.ActivityDate).Days.Equals(1) ? ListWithAdd(items, prevItem) : items).ToList();
Next, we'll build the following events in a similar fashion, and likewise aggregate it.
var nextEvents = orderedEvents.SkipWhile(a => a.ID != activity.ID);
relatedActivities = nextEvents.Aggregate<Activity, List<Activity>>(relatedActivities, (items, nextItem) => nextItem.ActivityDate.Subtract(items.OrderBy(a => a.ActivityDate).Last().ActivityDate).Days.Equals(1) ? ListWithAdd(items, nextItem) : items).ToList();
You can sort the result properly afterwards, as relatedActivities should now contain all activities with no gaps. It won't break immediately when it hits the first gap, but I don't think you can literally break out of a LINQ query. So instead it just ignores anything it finds past a gap.
Note that this example code only operates on the actual difference in time. Your example output seems to imply that you need some other comparison factors, but this should be enough to get you started. Just add the necessary logic to the date subtraction comparison in both entries.
