C# List Iteration Performance

I have a for loop that does 24 total iterations each representing a single hour of the day and then checks each 15 minute interval in another nested for loop. An additional nest checks a List for the hour and minute value and then aggregates some of the items in my List if they meet my time requirement. The issue is that my List can contain up to 1 million records which means that I traverse 1 million records 24*4 times.
How can I optimize my code for faster performance in this case? I know this could probably be simplified with LINQ statements but I'm not sure it would make it faster. Here's an example of what I am doing.
List<SummaryData> Aggregates = new List<SummaryData>();
for (int startHour = 0; startHour < 24; startHour++)
{
    for (int startMin = 0; startMin < 60; startMin += 15)
    {
        int aggregateData = 0;
        // My ItemList can have up to 1 million records.
        foreach (ListItem item in ItemList)
        {
            if ((item.time.Hour == startHour) && (item.time.Minute == startMin))
            {
                aggregateData += item.number;
            }
        }
        // Note: item is out of scope here, so this line does not compile as posted.
        SummaryData aggregate = new SummaryData { SummaryId = item.id, TotalNumber = aggregateData };
        Aggregates.Add(aggregate);
    }
}
class SummaryData
{
public int SummaryId {get; set;}
public int TotalNumber {get; set;}
}

Instead of looking for each Hour and Minute in every single item, iterate over ItemList just once and act based on each item.time.Hour and item.time.Minute.

Given your logic above, you should only have to iterate the list one time. You can nest your for loops within the foreach and likely achieve better performance. I would also use a Dictionary to hold your aggregate data, and base its key on the total minute (meaning hour * 60 + minute).
Dictionary<int, AggregateData> aggregate = new Dictionary<int, AggregateData>();
foreach (ListItem item in ItemList)
{
    // Key is the minute of the day: hour * 60 + minute.
    int key = item.time.Hour * 60 + item.time.Minute;
    AggregateData data;
    if (!aggregate.TryGetValue(key, out data))
    {
        aggregate.Add(key, data = new AggregateData());
    }
    data.Number += item.number;
}
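Since there are only 24 * 4 possible quarter-hour slots, the single-pass idea can also be sketched with a plain array instead of a dictionary. This is only a sketch under assumptions: the ListItem class below is a stand-in that reuses the field names from the question, and QuarterHourAggregator and SumBySlot are hypothetical names. Like the question's loop, it only counts items whose minute is exactly 0, 15, 30 or 45.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the question's ListItem; field names follow the question.
public class ListItem
{
    public DateTime time;
    public int number;
}

public static class QuarterHourAggregator
{
    // Returns 96 totals, one per quarter-hour slot (slot = hour * 4 + minute / 15).
    // The list is traversed exactly once.
    public static int[] SumBySlot(IEnumerable<ListItem> items)
    {
        var totals = new int[24 * 4];
        foreach (ListItem item in items)
        {
            // Skip items that do not sit on a quarter-hour minute, as the question does.
            if (item.time.Minute % 15 != 0)
                continue;
            totals[item.time.Hour * 4 + item.time.Minute / 15] += item.number;
        }
        return totals;
    }
}
```

The total for 19:15 would then be read as totals[19 * 4 + 1]. An array avoids hashing altogether, at the cost of always allocating all 96 slots.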

I'd be organizing the data roughly like this:
(see also: http://ideone.com/dyfoD)
using System;
using System.Linq;
using System.Collections.Generic;
public class P
{
    struct DataItem
    {
        public System.DateTime time;
        public int number;
    }
    public static void Main(string[] args)
    {
        var ItemList = new DataItem[] {} ;
        // Key is HHMM rounded down to the quarter hour, e.g. 19:20 -> 1915.
        var groups = ItemList
            .GroupBy(item => item.time.Hour * 100 + (item.time.Minute/15)*15 );
        var sums = groups
            .ToDictionary(g => g.Key, g => g.Sum(item => item.number));
        // lookups now become trivially easy
        // (TryGetValue leaves 0 in the out variable when a slot has no items):
        sums.TryGetValue(1900, out int slot1900);
        sums.TryGetValue(1915, out int slot1915);
        sums.TryGetValue(1930, out int slot1930);
    }
}

What is the result of this algorithm? Apologies if I'm being daft for not getting it.
It seems to identify all items in itemList whose minute value is evenly divisible by 15, then add each item's number value into a running counter, and then add that running counter into this Aggregates object.
Because I'm not clear on the types of some of these objects, I'm a little fuzzy on what's actually happening here. You seem to aggregate once with "aggregateData += item.number" and then aggregate AGAIN with "Aggregates.Add(aggregateData)". Are you sure you're not double-summing these things? I'm not even clear if you're trying to sum values of qualified items or create a list of them.
That aside, it's definitely not required or optimal to go over the entire list of 1 million items 24*4 times, but I can't be sure what is correct without a clearer understanding of the goal.
As suggested in the other answers, the correct approach is likely to iterate over itemList exactly once and operate on every single item, rather than iterating ~100 times and discarding each item in the list ~99 times (since you know it can only qualify for one of the ~100 iterations).

Your problem statement is a wee bit fuzzy. It looks like you want a summary, by item id, giving you the sum of all item numbers where the timestamp falls on a integral quarter-hour boundary.
The following should do the trick, I think.
It makes one pass through the list, and the datastore is a SortedDictionary (a balanced binary search tree), so lookup, insertion and removal are O(log N).
Here's the code:
using System;
using System.Collections.Generic;
public class SummaryData
{
    public SummaryData( int id )
    {
        this.SummaryId = id ;
        this.TotalNumber = 0 ;
    }
    public int SummaryId { get; set; }
    public int TotalNumber { get; set; }
}
public class ListItem
{
    public int Id ;
    public int Number ;
    public DateTime Time ;
}
public IEnumerable<SummaryData> Summarize( IEnumerable<ListItem> ItemList )
{
    const long TICKS_PER_QUARTER_HOUR = TimeSpan.TicksPerMinute * 15;
    SortedDictionary<int,SummaryData> summary = new SortedDictionary<int , SummaryData>();
    foreach ( ListItem item in ItemList )
    {
        long TimeOfDayTicks = item.Time.TimeOfDay.Ticks;
        // True only when the time is exactly on a quarter hour (seconds and fractions are zero too).
        bool on15MinuteBoundary = ( 0 == TimeOfDayTicks % TICKS_PER_QUARTER_HOUR );
        if ( on15MinuteBoundary )
        {
            // Quarter-hour slot of the day, 0..95.
            int key = (int)( TimeOfDayTicks / TICKS_PER_QUARTER_HOUR );
            SummaryData value;
            bool hasValue = summary.TryGetValue( key , out value );
            if ( !hasValue )
            {
                value = new SummaryData( item.Id );
                // Store under the same key used for the lookup above.
                summary.Add( key , value ) ;
            }
            value.TotalNumber += item.Number;
        }
    }
    return summary.Values;
}

Related

how do you create a loop that cycles through more than one List in C#

I do not want the answer, I would just like how to go about this or some examples please !
Requirements
Create a custom method called CombineTheLists and use the 2 lists you
created as arguments in the function call.
i. This method should have 2 parameters and catch the incoming Lists.
ii. It should not return anything.
iii. Inside of the function, create a loop that cycles through both
lists at the same time.
iv. Each time the loop runs, pull an item from the 1st List and the
matching price from the 2nd List and combine them into one text string
using the format of “The X costs $Y.” Where X is the item to be bought
and Y is the cost of the item. Make sure to use a $ and format to 2
decimal places
My Current Code
class Program
{
    static void Main(string[] args)
    {
        // list for items
        List<string> items = new List<string>() {
            "laptop", "book", "backpack", "cellphone", "pencils", "notebook", "pens" };
        // list for prices
        List<double> prices = new List<double>() {
            900.54, 40.20, 21.00, 600.00, 4.25, 10.50, 5.00 };
    }
    public static void CombineTheLists( string item, double prices)
    {
        for (int i = 0; i < item.Length; i++)
        {
        }
    }
}

It seems that you are looking for Zip:
var data = items
.Zip(prices, (item, price) => new {
item,
price });
...
foreach (var value in data) {
// value.item for item
// value.price for price
}
Edit: In case of good old for loop:
namespace MySolution {
class Program {
static void Main(string[] args) {
...
CombineTheLists(items, prices);
}
public static void CombineTheLists(List<string> items, List<double> prices) {
for (int i = 0; i < Math.Min(items.Count, prices.Count); ++i)
Console.WriteLine($"item {items[i]} costs {prices[i]:f2}"); // f2 - format string
}
}
}
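For reference, the Zip approach combined with the required "The X costs $Y." format could look roughly like the sketch below. PriceLines and Build are hypothetical names, and the method returns the strings rather than printing them only so the result is easy to inspect; the assignment itself asks for a method that returns nothing. The invariant culture is an assumption made here so the decimal separator is always a dot.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class PriceLines
{
    // Pairs each item with the price at the same position and formats one line per pair.
    // Zip stops at the end of the shorter list, so mismatched lengths do not throw.
    public static List<string> Build(List<string> items, List<double> prices)
    {
        return items
            .Zip(prices, (item, price) => string.Format(
                CultureInfo.InvariantCulture, "The {0} costs ${1:F2}.", item, price))
            .ToList();
    }
}
```

Each returned string can then be passed to Console.WriteLine inside CombineTheLists.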

You are on (one of) the right tracks, but first, you should change your method declaration to accept two lists, instead of a string and double:
public static void CombineTheLists(List<string> items, List<double> prices)
{
for (int i = 0; i < items.Count; i++)
{
}
}
You already have written a for loop with the looping variable i. As you know, i will increase every iteration. This means that items[i] will be the corresponding item in that iteration and prices[i] will be the corresponding price. For example, here's some code to print the item and price on one line:
// in the for loop
Console.WriteLine($"{items[i]} - {prices[i]}");
// try to do the formatting to 2 d.p. yourself

Why don't you create an object with two attributes and keep a list of those objects?
class Item {
    public string Name { get; set; }
    public double Price { get; set; }
}
class main {
    public static void CombineTheLists()
    {
        List<Item> items = new List<Item>() { item1, item2, item3,.... };
    }
}

Trouble getting total count using .Take() and .Skip()

I'm having some trouble implementing some paging using Linq and I've read the various questions (this and this for example) on here but I'm still getting the error;
System.InvalidOperationException: The result of a query cannot be enumerated more than once.
My (slightly obfuscated) code is;
public List<Thing> GetThings(ObjectParameter[] params, int count, int pageIndex)
{
    var things = from t in Context.ExecuteFunction<Something>("function", params)
                 select new Thing
                 {
                     ID = t.ID
                 };
    var pagedThings = things;
    if (pageIndex == 0)
        pagedThings = things.Take(count);
    else if (pageIndex > 0)
        pagedThings = things.Skip(count * pageIndex).Take(count);
    var countOfThings = things.Count();
    return pagedThings.ToList();
}
As soon as the final .ToList() is called, the error is thrown but I can't see why - are the calls to things.Count() and pagedThings.ToList() enumerating the same thing?
Edit: I'm using Entity Framework if that makes any difference

ExecuteFunction actually returns an ObjectResult if I'm not mistaken, which is... more complicated. You may get different results if you make the function composable (which would execute a separate query when you Count()), but it's been a while since I worked with low level EF so I'm not 100% sure that would work.
Since you can't get out of executing what are effectively two queries, the safest bet is to make a completely separate one for counting - and by completely separate I mean a separate function or stored procedure, which just does the count, otherwise you may end up (depending on your function) returning rows to EF and counting them in memory. Or rewrite the function as a view if at all possible, which may make it more straightforward.

You are setting pagedThings = things. So you are working on the same object. You would need to copy things to a new collection if you want to do what you're trying above, but I would recommend refactoring this code in general.
You can check out this SO post to get some ideas on how to get the count without enumerating the list:
How to COUNT rows within EntityFramework without loading contents?

In general, Linq is able to do that. In LinqPad, I wrote the following code and successfully executed it:
void Main()
{
    var sampleList = new List<int>();
    for (int i = 0; i < 100; i++){
        sampleList.Add(i);
    }
    // Skip first, then Take; Take(3).Skip(4) would always be empty.
    var furtherQuery = sampleList.Skip(4).Take(3);
    var count = furtherQuery.Count();
    var cache = furtherQuery.ToList();
}
Note, as your error mentions, this will execute the query twice. Once for Count() and once for ToList().
It must be that the Linq provider that you are representing as Context.ExecuteFunction<Something>("function", params) is protecting you from making multiple expensive calls. You should look for a way to iterate over the results only once. As written, for example, you could .Count() on the List that you had already generated.
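A minimal sketch of the "iterate only once" idea, assuming it is acceptable to hold the whole result in memory: materialize the source a single time, then take both the count and the page from that copy. Paging and GetPage are hypothetical names, and the source here is any IEnumerable<T> rather than the Entity Framework result from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Paging
{
    // Enumerates source exactly once, then counts and pages the in-memory copy.
    public static (List<T> Page, int Total) GetPage<T>(
        IEnumerable<T> source, int pageIndex, int pageSize)
    {
        List<T> all = source.ToList();   // the only enumeration of source
        List<T> page = all.Skip(pageIndex * pageSize).Take(pageSize).ToList();
        return (page, all.Count);
    }
}
```

This trades memory for simplicity: every row is loaded even though only one page is returned, so for large results a separate count query on the database side is likely the better choice.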

Normally, we call them pageIndex and pageSize.
Check whether pageIndex should start at 0 or at 1, depending on your requirement.
public List<Thing> GetThings(ObjectParameter[] params, int pageIndex, int pageSize)
{
    if (pageSize <= 0)
        pageSize = 1;
    if (pageIndex < 0)
        pageIndex = 0;
    // Materialize once: the function result cannot be enumerated a second time.
    var source = Context.ExecuteFunction<Something>("function", params).ToList();
    var total = source.Count;
    var things = (from t in source select new Thing { ID = t.ID })
        .Skip(pageIndex * pageSize).Take(pageSize).ToList();
    return things;
}

Here is my implementation of your code. A few things to notice:
1. You can handle Skip in one statement.
2. The main method shows how to pass multiple pages into the method.
using System;
using System.Collections.Generic;
using System.Linq;
public class Program
{
public static void Main()
{
List<Thing> thingList = new List<Thing>();
for (int i = 0; i < 99; i++)
{
thingList.Add(new Thing(i));
}
int count = 20;
int pageIndex = 0;
int numberPages = (int)Math.Ceiling(thingList.Count * 1.0/ (count ));
for( ; pageIndex < numberPages; pageIndex ++)
{
var myPagedThings = GetThings(thingList, count, pageIndex);
foreach( var item in myPagedThings)
{
Console.WriteLine(item.ID );
}
}
}
public static IEnumerable<Thing> GetThings(List<Thing> myList, int count, int pageIndex)
{
var things = (
from t in myList
select new Thing{ID = t.ID}).ToList();
return things.Skip(count * pageIndex).Take(count);
}
}
public class Thing
{
public int ID
{ get; set; }
public Thing (){}
public Thing(int id)
{ this.ID = id; }
}

As it happens, ExecuteFunction causes the enumeration to occur immediately, which means the code could be reordered and copying the list was not required. It now looks like this:
public ThingObjects GetThings(ObjectParameter[] params, int count, int pageIndex)
{
    var things = (from t in Context.ExecuteFunction<Something>("function", params)
                  select new Thing
                  {
                      ID = t.ID
                  }).ToList();
    var countOfThings = things.Count;
    if (pageIndex >= 0)
        things = things.Skip(count * pageIndex).Take(count).ToList();
    return new ThingObjects(things, countOfThings);
}

Iterate over unknown nested lists

I need to iterate over nested lists of unknown depth and size (subcategories.subcategories.subcategories etc.) and check whether any values in my array match values in the nested lists. I might need a recursive function. How could I make this possible?
Here is my code so far (it does not check deeper than 2 levels):
for (int counter = 0; counter < filteredList[0].subcategories.Count; counter++)
{
var item = filteredList[0].subcategories[counter].questionanswer;
for (int i = 0; i < item.Count; i++)
{
var results = Array.FindAll(questionIDs, s => s.Equals(item[i].id.ToString()));
if (results.Length > 0)
{
QuestionViewModel question = new QuestionViewModel();
question.formattedtext = item[i].formattedtext;
question.id = item[i].id;
question.sortorder = item[i].sortorder;
question.breadCrum.AddRange(breadCrumCategoryId);
filteredQuestions.Add(question);
}
}
}

Been writing recursive functions for 40 years. If I can't recurse, nobody can. The best way is to define classes. It is normally done like the code below. Add properties to the class like Name, Rank, Serial Number.
public class Category
{
    public List<Category> children { get; set; }
}
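A recursive walk over such a self-referencing class might look like the sketch below. The Question and Category types are assumptions modelled on the member names in the question (subcategories, questionanswer, id); CategoryWalker and CollectMatches are hypothetical names. The method checks the questions of the current category, then calls itself for every subcategory, so the depth of nesting does not matter.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shapes based on the member names used in the question.
public class Question
{
    public int id;
    public string formattedtext;
}

public class Category
{
    public List<Question> questionanswer = new List<Question>();
    public List<Category> subcategories = new List<Category>();
}

public static class CategoryWalker
{
    // Adds every question whose id (as a string) is in questionIDs to matches,
    // visiting the given category and all of its descendants.
    public static void CollectMatches(
        Category category, HashSet<string> questionIDs, List<Question> matches)
    {
        foreach (Question q in category.questionanswer)
        {
            if (questionIDs.Contains(q.id.ToString()))
                matches.Add(q);
        }
        foreach (Category child in category.subcategories)
        {
            CollectMatches(child, questionIDs, matches); // recurse one level deeper
        }
    }
}
```

Putting the ids in a HashSet<string> replaces the Array.FindAll scan per question with a constant-time lookup. For extremely deep hierarchies an explicit Stack<Category> would avoid deep call chains, though category trees are rarely that deep.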

Parallel loop in c#, accessing the same variable

I have an Item object with a property called generator_list (hashset of strings). I have 8000 objects, and for each object, I'd like to see how its generator_list intersects with every other generator_list, and then I'd like to store the intersection number in a List<int>, which will have 8000 elements, logically.
The process takes about 8 minutes, but only a few minutes with parallel processing, but I don't think I'm doing the parallel part right, hence the question. Can anyone please tell me if and how I need to modify my code to take advantage of the parallel loops?
The code for my Item object is:
public class Item
{
public int index { get; set; }
public HashSet<string> generator_list = new HashSet<string>();
}
I stored all my Item objects in a List<Item> items (8000 elements). I created a method that takes in items (the list I want to compare) and 1 Item (what I want to compare to), and it's like this:
public void Relatedness2(List<Item> compare, Item compare_to)
{
int compare_to_length = compare_to.generator_list.Count;
foreach (Item block in compare)
{
int block_length = block.generator_list.Count;
int both = 0; //this counts the intersection number
if (compare_to_length < block_length) //to make sure I'm looping
//over the smaller set
{
foreach (string word in compare_to.generator_list)
{
if (block.generator_list.Contains(word))
{
both = both + 1;
}
}
}
else
{
foreach (string word in block.generator_list)
{
if (compare_to.generator_list.Contains(word))
{
both = both + 1;
}
}
}
// I'd like to store the intersection number, both,
// somewhere so I can effectively use parallel loops
}
}
And finally, my Parallel.ForEach loop is:
Parallel.ForEach(items, (kk, state, index) => Relatedness2(items, kk));
Any suggestions?

Maybe something like this
public Dictionary<int, int> Relatedness2(IList<Item> compare, Item compare_to)
{
int compare_to_length = compare_to.generator_list.Count;
var intersectionData = new Dictionary<int, int>();
foreach (Item block in compare)
{
int block_length = block.generator_list.Count;
int both = 0;
if (compare_to_length < block_length)
{
foreach (string word in compare_to.generator_list)
{
if (block.generator_list.Contains(word))
{
both = both + 1;
}
}
}
else
{
foreach (string word in block.generator_list)
{
if (compare_to.generator_list.Contains(word))
{
both = both + 1;
}
}
}
intersectionData[block.index] = both;
}
return intersectionData;
}
And
List<Item> items = new List<Item>(8000);
//add to list
var dictionary = new ConcurrentDictionary<int, Dictionary<int, int>>();//thread-safe dictionary
var readOnlyItems = items.AsReadOnly();// if you sure you wouldn't modify collection, feel free use items directly
Parallel.ForEach(readOnlyItems, item =>
{
dictionary[item.index] = Relatedness2(readOnlyItems, item);
});
I assumed that index is unique.
I used dictionaries, but you may want to use your own classes.
In my example you can access the data in the following manner:
var intersectionData = dictionary[1]; // dictionary of intersections for the item with index 1
var countOfIntersectionItemIndex1AndItemIndex3 = dictionary[1][3];
var countOfIntersectionItemIndex3AndItemIndex7 = dictionary[3][7];
Don't forget that dictionary[i] throws a KeyNotFoundException if no entry exists for i; use TryGetValue when that is possible.

Thread safe collections is probably what you are looking for http://msdn.microsoft.com/en-us/library/dd997305(v=vs.110).aspx.
When working in multithreaded environment, you need to make sure that
you are not manipulating shared data at the same time without
synchronizing access.
the .NET Framework offers some collection classes that are created
specifically for use in concurrent environments, which is what you
have when you're using multithreading. These collections are
thread-safe, which means that they internally use synchronization to
make sure that they can be accessed by multiple threads at the same
time.
Source: Programming in C# Exam Ref 70-483, Objective 1.1: Implement multithreading and asynchronous processing, Using Concurrent collections
These are the following collections:
BlockingCollection<T>
ConcurrentBag<T>
ConcurrentDictionary<TKey, TValue>
ConcurrentQueue<T>
ConcurrentStack<T>
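When every parallel iteration produces exactly one result with a known index, a preallocated array can be an alternative to a concurrent collection: each iteration writes only to its own slot, so no two threads touch the same element. The sketch below assumes the sets are not modified while the loop runs (concurrent reads of a HashSet are fine, concurrent writes are not). Relatedness and IntersectionCounts are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class Relatedness
{
    // counts[i][j] = number of strings shared by sets[i] and sets[j].
    public static int[][] IntersectionCounts(IReadOnlyList<HashSet<string>> sets)
    {
        var counts = new int[sets.Count][];
        Parallel.For(0, sets.Count, i =>
        {
            var row = new int[sets.Count];
            for (int j = 0; j < sets.Count; j++)
            {
                // Iterate over the smaller set and probe the larger one.
                HashSet<string> small = sets[i];
                HashSet<string> large = sets[j];
                if (small.Count > large.Count)
                {
                    HashSet<string> tmp = small;
                    small = large;
                    large = tmp;
                }
                row[j] = small.Count(word => large.Contains(word));
            }
            counts[i] = row; // each iteration writes only its own slot
        });
        return counts;
    }
}
```

With a List<Item> as in the question, the input could be built with items.Select(x => x.generator_list).ToList(), provided the list position is what identifies each item.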

If your Item's index is contiguous and starts at 0, you don't need the Item class at all. Just use a List<HashSet<string>>; it'll take care of indexes for you. This solution finds the intersect count between one item and the others in a parallel LINQ query. It then runs that on all items of your collection in another parallel LINQ query. AsOrdered() is needed so that the results keep the order of the source list. Like so:
var items = new List<HashSet<string>>
{
    new HashSet<string> {"1", "2"},
    new HashSet<string> {"2", "3"},
    new HashSet<string> {"3", "4"},
    new HashSet<string>{"1", "4"}
};
var intersects = items.AsParallel().AsOrdered().Select( //Outer loop to run on all items
    item => items.AsParallel().AsOrdered().Select( //Inner loop to calculate intersects
        item2 => item.Intersect(item2).Count())
        //This ToList will create a single List<int>
        //with the intersects for that item
        .ToList()
    //This ToList will create the final List<List<int>>
    //that contains all intersects.
    ).ToList();

Why Doesn't My Anonymous Method Work in a Loop?

This function is supposed to set descending order numbers on an IEnumerable<Order>, but it doesn't work. Can anyone tell me what's wrong with it?
private void orderNumberSetter(IEnumerable<Order> orders)
{
var i = 0;
Action<Order, int> setOrderNumber = (Order o, int count) =>
{
o.orderNumber = i--;
};
var orderArray = orders.ToArray();
for (i = 0; i < orders.Count(); i++)
{
var order = orderArray[i];
setOrderNumber(order, i);
}
}

You are reusing i as the loop variable, and i also gets modified inside your setOrderNumber lambda. Don't modify i there. It's unclear what you meant to do; maybe the following:
Action<Order, int> setOrderNumber = (Order o, int count) =>
{
o.orderNumber = count;
};
If the above is the case, you could have achieved that much more easily; your code seems unnecessarily complex. For example:
for (i = 0; i < orderArray.Length; i++)
{
orderArray[i].orderNumber = i;
}
or even simpler without having to create an array at all:
int orderNum = 0;
foreach(var order in orders)
{
order.orderNumber = orderNum++;
}
Edit:
To set descending order numbers, you can determine the number of orders first then go backwards from there:
int orderNum = orders.Count();
foreach(var order in orders)
{
order.orderNumber = orderNum--;
}
The above would produce one-based order numbers in descending order. Another approach, more intuitive and probably easier to maintain, is to just walk the enumeration in reverse order:
int orderNum = 0;
foreach(var order in orders.Reverse())
{
order.orderNumber = orderNum++;
}
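To see why the original loop never terminates, it may help to look at the capture in isolation. A lambda captures the variable itself, not a copy of its value, so a decrement inside the lambda cancels the loop's own increment. The sketch below is only an illustration: CaptureDemo and Run are hypothetical names, and a guard counter is added so the demonstration stops after 10 passes instead of running forever.

```csharp
using System;

public static class CaptureDemo
{
    // Returns how many times the loop body ran before the guard stopped it.
    public static int Run()
    {
        int i = 0;
        Action decrement = () => i--;   // captures the variable i, not its current value
        int iterations = 0;
        // Without the guard on iterations this loop would never end:
        // decrement() takes i to -1, then i++ brings it back to 0.
        for (i = 0; i < 3 && iterations < 10; i++)
        {
            decrement();
            iterations++;
        }
        return iterations;
    }
}
```

The loop was written to run 3 times, yet only the guard ends it. Passing the value in as a parameter, or using a separate counter variable, avoids the shared state.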

I agree with BrokenGlass, you are running into an infinite loop.
You could achieve the same thing using foreach:
private void orderNumberSetter(IEnumerable<Order> orders)
{
var count = orders.Count();
orders.ToList().ForEach(o =>
{
o.orderNumber = count--;
});
}

I would try this code instead that decrements i while it enumerates through the array
private void orderNumberSetter(IEnumerable<Order> orders)
{
int i = orders.Count();
foreach (Order order in orders.ToArray())
{
order.orderNumber = --i;
}
}

Though it's hard to tell what you're trying to do, it's a good bet that you didn't mean to keep referring to the same variable i, which is what's causing an infinite loop.
Here's another example of what I believe you wanted:
// Reverse() returns a new sequence and leaves the original untouched.
IEnumerable<Order> reversed = orders.Reverse();
int orderNumber = 0;
foreach (Order order in reversed)
{
    order.orderNumber = orderNumber++;
}
I suggest editing your title. Your title describes your question, and I'm sure you didn't want a Broken C# function, since you already had one :P. It's also good to describe thoroughly in the post what your code should do, including what your expected results are and how your current example doesn't meet them. Don't let your non-working example alone explain what you want; it only showed us an example of what you didn't want.
