Is Linq Faster, Slower or the same?


Is this:
Box boxToFind = AllBoxes.FirstOrDefault(box => box.BoxNumber == boxToMatchTo.BoxNumber);
faster or slower than this:
Box boxToFind = null;
foreach (Box box in AllBoxes)
{
    if (box.BoxNumber == boxToMatchTo.BoxNumber)
    {
        boxToFind = box;
    }
}
Both give me the result I am looking for (boxToFind). This is going to run on a mobile device, so I need to be conscious of performance.

It should be about the same, except that you need to call First (or, to match your code, Last), not Where.
Calling Where will give you a set of matching items (an IEnumerable<Box>); you only want one matching item.
In general, when using LINQ, you need to be aware of deferred execution. In your particular case, it's irrelevant, since you're getting a single item.
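To make the First/Last distinction above concrete, here is a minimal sketch. The Box class and the BoxSearch helper are hypothetical stand-ins, not the asker's real types. FirstOrDefault stops at the first match, while a loop without a break keeps going and ends up holding the last match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Box type in the question.
public sealed class Box
{
    public int BoxNumber { get; set; }
    public string Label { get; set; }
}

public static class BoxSearch
{
    // Returns the first match and stops enumerating there (null if none).
    public static Box FindFirst(IEnumerable<Box> boxes, int number) =>
        boxes.FirstOrDefault(box => box.BoxNumber == number);

    // Mirrors the question's loop: there is no break, so the last match wins
    // and the whole sequence is always walked.
    public static Box FindLastWithLoop(IEnumerable<Box> boxes, int number)
    {
        Box found = null;
        foreach (Box box in boxes)
        {
            if (box.BoxNumber == number)
            {
                found = box;
            }
        }
        return found;
    }
}
```

If box numbers are unique the two return the same object; adding a break to the loop would make it behave like FirstOrDefault.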

The difference is not important unless you've identified this particular loop as a performance bottleneck through profiling.
If profiling does find it to be a problem, then you'll want to look into alternate storage. Store the data in a dictionary which provides faster lookup than looping through an array.
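As a rough sketch of the dictionary idea, assuming box numbers are unique (BoxLookup, BuildIndex and FindOrDefault are illustrative names, not part of any library): build the index once, then each lookup is a hash probe instead of a scan.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BoxLookup
{
    // One O(n) pass to build the index. Assumes keys are unique:
    // ToDictionary throws ArgumentException on a duplicate key.
    public static Dictionary<int, TBox> BuildIndex<TBox>(
        IEnumerable<TBox> boxes, Func<TBox, int> numberOf) =>
        boxes.ToDictionary(numberOf);

    // Average O(1) lookup; returns default(TBox) when the number is absent.
    public static TBox FindOrDefault<TBox>(Dictionary<int, TBox> index, int number) =>
        index.TryGetValue(number, out TBox box) ? box : default;
}
```

This only pays off when the same collection is searched many times; for a single lookup, building the index costs more than the loop it replaces.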

If micro-optimization is your thing, LINQ performs worse. This is just one article; there are a lot of other posts you can find.

Micro optimization will kill you.
First, finish the whole class, then, if you have performance problems, run a profiler and check for the hotspots of the application.
Make sure you're using the best algorithms you can, then turn to micro optimizations like this.
In case you already did :
Slow -> Fast
LINQ < foreach < for < unsafe for (The last option is not recommended).
Abstractions will make your code slower, 95% of the time.

The fastest option is a for loop, but the difference is so small that you can ignore it. It will only matter if you are building a real-time application, and for those applications C# may not be the best choice anyway!

If AllBoxes is an IQueryable, it can be faster than the loop, because the queryable could have an optimized implementation of the Where-operation (for example an indexed access).

LINQ is absolutely 100% slower
Depends on what you are trying to accomplish in your program, but for the most part this is most certainly what I would call LAZY PROGRAMMER CODE...
You are going to essentially "stall-out" if you are performing any complex queries, joins etc... total p.o.s for those types of functions/methods- just don't use it. If you do this the hard/long way you will be much happier in the long run...and performance will be a world apart.
NOTE:
I would definitely not recommend LINQ for any program built for speed/synchronization tasks/computation
(i.e. HFT trading &/or AT trading i-0-i for starters).
TESTED:
It took nearly 10 seconds to complete a join in "LINQ" vs. < 1 millisecond.

LINQ vs Loop – A performance test
LINQ: 00:00:04.1052060, avg. 00:00:00.0041052
Loop: 00:00:00.0790965, avg. 00:00:00.0000790
References:
http://ox.no/posts/linq-vs-loop-a-performance-test
http://www.schnieds.com/2009/03/linq-vs-foreach-vs-for-loop-performance.html

Related

Performance and Memory Consumption in C#

I have two questions:
1) I need some expert views on writing code that is sound in terms of performance and memory consumption.
2) In terms of performance and memory consumption, how good or bad is the following piece of code, and why?
Need to increment the counter that could go maximum by 100 and writing code like this:
Some Sample Code is as follows:
for(int i=0;i=100;i++)
{
Some Code
}
for(long i=0;i=1000;i++)
{
Some Code
}
how good is to use Int16 or anything else instead of int, long if the requirement is same.

Need to increment the counter that could go maximum by 100 and writing code like this:
Options given:
for(int i=0;i=100;i++)
for(long i=0;i=1000;i++)
EDIT: As noted, neither of these would even actually compile, due to the middle expression being an assignment rather than an expression of type bool.
This demonstrates a hugely important point: get your code working before you make it fast. Your two loops don't do the same thing - one has an upper bound of 1000, the other has an upper bound of 100. If you have to choose between "fast" and "correct", you almost always want to pick "correct". (There are exceptions to this, of course - but that's usually in terms of absolute correctness of results across large amounts of data, not code correctness.)
Changing between the variable types here is unlikely to make any measurable difference. That's often the case with micro-optimizations. When it comes to performance, architecture is usually much more important than in-method optimizations - and it's also a lot harder to change later on. In general, you should:
Write the cleanest code you can, using types that represent your data most correctly and simply
Determine reasonable performance requirements
Measure your clean implementation
If it doesn't perform well enough, use profiling etc to work out how to improve it

DateTime dtStart = DateTime.Now;
for (int i = 0; i < 10000; i++)
{
    // Some Code
}
Response.Write((DateTime.Now - dtStart).TotalMilliseconds.ToString());
Do the same for long as well and you will know which one is better... ;)
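A variant of the measurement above, sketched with System.Diagnostics.Stopwatch, which has finer resolution than DateTime.Now. LoopTiming and its method names are made up for illustration, and the loop bodies just accumulate a sum so that there is something to time.

```csharp
using System;
using System.Diagnostics;

public static class LoopTiming
{
    // Sums 0..iterations-1 using an int counter.
    public static long SumWithIntCounter(int iterations)
    {
        long sum = 0;
        for (int i = 0; i < iterations; i++) sum += i;
        return sum;
    }

    // Same work, but with a long counter.
    public static long SumWithLongCounter(long iterations)
    {
        long sum = 0;
        for (long i = 0; i < iterations; i++) sum += i;
        return sum;
    }

    // Runs the action once as a warm-up (so JIT compilation is not timed),
    // then times a second run.
    public static TimeSpan Measure(Action action)
    {
        action();
        Stopwatch watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.Elapsed;
    }
}
```

Usage would look like LoopTiming.Measure(() => LoopTiming.SumWithIntCounter(100000000)) and the same for the long version. Run it in a Release build and repeat a few times, since single measurements are noisy.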

When you are doing things that require a number representing iterations, or the quantity of something, you should always use int unless you have a good semantic reason to use a different type (i.e. the data can never be negative, or it could be bigger than 2^31). Additionally, worrying about this sort of nano-optimization will basically never matter when writing C# code.
That being said, if you are wondering about the differences between things like this (incrementing a 4-byte register versus incrementing 8 bytes), you can always consult Agner Fog's wonderful instruction tables.
On an Amd64 machine, incrementing long takes the same amount of time as incrementing int.**
On a 32 bit x86 machine, incrementing int will take less time.
** The same is true for almost all logic and math operations, as long as the value is not both memory bound and unaligned. In .NET a long will always be aligned, so the two will always be the same.

Sort or RemoveAll first on an IEnumerable that needs both?

When an IEnumerable needs both to be sorted and for elements to be removed, are there advantages/drawback of performing the stages in a particular order? My performance tests appear to indicate that it's irrelevant.
A simplified (and somewhat contrived) example of what I mean is shown below:
public IEnumerable<DataItem> GetDataItems(int maximum, IComparer<DataItem> sortOrder)
{
    // Sort and RemoveAll are List<T> methods, so materialize the sequence first.
    List<DataItem> result = this.GetDataItems().ToList();
    result.Sort(sortOrder);
    result.RemoveAll(item => !item.Display);
    return result.Take(maximum);
}

If your tests indicate it's irrelevant, then why worry about it? Don't optimize before you need to, only when it becomes a problem. If you find a problem with performance, and have used a profiler, and have found that that method is the hotspot, then you can worry more about it.
On second thought, have you considered using LINQ? Those calls could be replaced with a call to Where and OrderBy, both of which are deferred, and then calling Take, like you have in your example. The LINQ libraries should find the best way of doing this for you, and if your data size expands to the point where it takes a noticeable amount of time to process, you can use PLINQ with a simple call to AsParallel.

You might as well RemoveAll before sorting so that you'll have fewer elements to sort.

I think that Sort() method would usually have complexity of O(n*log(n)), and RemoveAll() just O(n), so in general it is probably better to remove items first.

You'd want something like this:
public IEnumerable<DataItem> GetDataItems(int maximum, IComparer<DataItem> sortOrder)
{
    IEnumerable<DataItem> result = this.GetDataItems();
    return result
        .Where(item => item.Display)
        .OrderBy(item => item, sortOrder) // OrderBy needs a key selector alongside the comparer
        .Take(maximum);
}

There are two answers that are correct, but won't teach you anything:
It doesn't matter.
You should probably do RemoveAll first.
The first is correct because you said your performance tests showed it's irrelevant. The second is correct because it will have an effect on larger datasets.
There's a third answer that also isn't very useful: Sometimes it's faster to do removals afterwards.
Again, it doesn't actually tell you anything, but "sometimes" always means there is more to learn.
There's also only so much value in saying "profile first". What if profiling shows that 90% of the time is spent doing x.Foo(), which it does in a loop? Is the problem with Foo(), with the loop or with both? Obviously if we can make both more efficient we should, but how do we reason about that without knowledge outside of what a profiler tells us?
When something happens over multiple items (which is true of both RemoveAll and Sort) there are five things (I'm sure there are more I'm not thinking of now) that will affect the performance impact:
The per-set constant costs (both time and memory). How much it costs to do things like calling the function that we pass a collection to, etc. These are almost always negligible, but there could be some nasty high cost hidden there (often because of a mistake).
The per-item constant costs (both time and memory). How much it costs to do something that we do on some or all of the items. Because this happens multiple times, there can be an appreciable win in improving them.
The number of items. As a rule the more items, the more the performance impact. There are exceptions (next item), but unless those exceptions apply (and we need to consider the next item to know when this is the case), then this will be important.
The complexity of the operation. Again, this is a matter of both time-complexity and memory-complexity, but here there's a chance that we might choose to improve one at the cost of the other. I'll talk about this more below.
The number of simultaneous operations. This can be a big difference between "works on my machine" and "works on the live system". If a super time-efficient approach that uses .5GB of memory is tested on a machine with 2GB of memory available, it'll work wonderfully, but when you move it to a machine with 8GB of memory available and have multiple concurrent users, it'll hit a bottleneck at 16 simultaneous operations, and suddenly what was beating other approaches in your performance measurements becomes the application's hotspot.
To talk about complexity a bit more. The time complexity is a measure of how the time taken to do something relates to the number of items it is done with, while memory complexity is a measure of how the memory used relates to that same number of items. Obtaining an item from a dictionary is O(1) or constant because it takes the same amount of time however large the dictionary is (not strictly true, strictly it "approaches" O(1), but it's close enough for most thinking). Finding something in an already sorted list can be O(log2 n) or logarithmic. Filtering through a list will be linear or O(n). Sorting something using a quicksort (which is what Sort uses) tends to be linearithmic or O(n log2 n) but in its worst case - against a list already sorted - will be quadratic, O(n^2).
Considering these, with a set of 8 items, an O(1) operation will take 1k seconds to do something, where k is a constant amount of time, O(log2 n) means 3k seconds, O(n) means 8k, O(n log2 n) means 24k and O(n^2) means 64k. These are the most commonly found, though there are plenty of others like O(nm), which is affected by two different sizes, or O(n!), which would be 40320k.
Obviously, we want as low a complexity as possible, though since k will be different in each case, sometimes the best solution for a small set has a high complexity (but low k constant) though a lower-complexity case will beat it with larger input.
So. Let's go back to the cases you are considering, viz filtering followed by sorting vs. sorting followed by filtering.
Per-set constants. Since we are moving two operations around but still doing both, this will be the same either way.
Per-item constants. Again, we're still doing the same things per item in either case, so no effect.
Number of items. Filtering reduces the number of items. Therefore the sooner we filter items, the more efficient the rest of the operation. Therefore doing RemoveAll first wins in this regard.
Complexity of the operation. It's either an O(n) followed by an average-case-O(n log2 n)-worst-case-O(n^2), or it's an average-case-O(n log2 n)-worst-case-O(n^2) followed by an O(n). Same either way.
Number of simultaneous cases. Total memory pressure will be relieved the sooner we remove some items, (slight win for RemoveAll first).
So, we've got two reasons to consider RemoveAll first as likely to be more efficient and none to consider it likely to be less efficient.
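The "number of items" argument can be checked with a small experiment. The sketch below makes its own assumptions: plain ints stand in for DataItem, "odd" stands in for "not displayed", and CountingComparer and FilterSortOrder are made-up names. It counts how many comparisons List<T>.Sort performs in each order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Wraps the default int ordering and counts how often it is consulted.
public sealed class CountingComparer : IComparer<int>
{
    public long Comparisons { get; private set; }

    public int Compare(int x, int y)
    {
        Comparisons++;
        return x.CompareTo(y);
    }
}

public static class FilterSortOrder
{
    // Hypothetical stand-in for "!item.Display": odd values are removed.
    private static bool IsHidden(int value) => value % 2 != 0;

    // 0..count-1 in a shuffled order (Fisher-Yates with a seeded Random).
    public static List<int> ShuffledRange(int count, int seed)
    {
        List<int> items = Enumerable.Range(0, count).ToList();
        var random = new Random(seed);
        for (int i = items.Count - 1; i > 0; i--)
        {
            int j = random.Next(i + 1);
            (items[i], items[j]) = (items[j], items[i]);
        }
        return items;
    }

    public static (List<int> Items, long Comparisons) RemoveThenSort(IEnumerable<int> source)
    {
        List<int> list = source.ToList();
        var comparer = new CountingComparer();
        list.RemoveAll(IsHidden); // O(n), shrinks the input to the sort
        list.Sort(comparer);
        return (list, comparer.Comparisons);
    }

    public static (List<int> Items, long Comparisons) SortThenRemove(IEnumerable<int> source)
    {
        List<int> list = source.ToList();
        var comparer = new CountingComparer();
        list.Sort(comparer);      // sorts items that are about to be discarded
        list.RemoveAll(IsHidden);
        return (list, comparer.Comparisons);
    }
}
```

Both orders produce the same list; on shuffled input where half the items are removed, the remove-first version should report noticeably fewer comparisons. Comparison counts are not wall-clock time, so this supports the reasoning rather than replacing profiling.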
We would not assume that we were 100% guaranteed to be correct here. For a start we could simply have made a mistake in our reasoning. For another, there could be other factors we've dismissed as irrelevant that were actually pertinent. It is still true that we should profile before optimising, but reasoning about the sort of things I've mentioned above will both make us more likely to write performant code in the first place (not the same as optimising; but a matter of picking between options when readability, clarity and correctness is equal either way) and makes it easier to find likely ways to improve those things that profiling has found to be troublesome.
For a slightly different but relevant case, consider if the criteria sorted on matched those removed on. E.g. if we were to sort by date and remove all items after a given date.
In this case, if the list deallocates on all removals, it'll still be O(n), but with a much smaller constant. Alternatively, if it just moved the "last-item" pointer*, it becomes O(1). Finding the pointer is O(log2 n), so here there's both reasons to consider that filtering first will be faster (the reasons given above) and that sorting first will be faster (that removal can be made a much faster operation than it was before). With this sort of case it becomes only possible to tell by extending our profiling. It is also true that the performance will be affected by the type of data sent, so we need to profile with realistic data, rather than artificial test data, and we may even find that what was the more performant choice becomes the less performant choice months later when the dataset it is used on changes. Here the ability to reason becomes even more important, because we should note the possibility that changes in real-world use may make this change in this regard, and know that it is something we need to keep an eye on throughout the project's life.
(*Note, List<T> does not just move a last-item pointer for a RemoveRange that covers the last item, but another collection could.)

It would probably be better to do the RemoveAll first, although it would only make much of a difference if your sorting comparison was intensive to calculate.

Why can LINQ operations be faster than a normal loop?

A friend and I were a bit perplexed during a programming discussion today. As an example, we created a fictive problem of having a List<int> of n random integers (typically 1.000.000) and wanted to create a function that returned the set of all integers that there were more than one of. Pretty straightforward stuff. We created one LINQ statement to solve this problem, and a plain insertion sort based algorithm.
Now, as we tested the speed the code ran at (using System.Diagnostics.Stopwatch), the results were confusing. Not only did the LINQ code outperform the simple sort, but it ran faster than a single foreach/for that only did a single loop of the list, and that had no operations within (which, on a side track, I thought the compiler was supposed to discover and remove altogether).
If we generated a new List<int> of random numbers in the same execution of the program and ran the LINQ code again, the performance would increase by orders of magnitude (typically thousandfold). The performance of the empty loops were of course the same.
So, what is going on here? Is LINQ using parallelism to outperform normal loops? How are these results even possible? LINQ uses quicksort which runs at n*log(n), which per definition is already slower than n.
And what is happening at the performance leap on the second run?
We were both baffled and intrigued at these results and were hoping for some clarifying insights from the community, just to satisfy our own curiosity.

Undoubtedly you haven't actually performed the query, you've merely defined it. LINQ builds a deferred query (in LINQ to Objects, a chain of lazy iterators rather than an expression tree) that isn't actually evaluated until you perform an operation that requires the enumeration to be iterated. Try adding a ToList() or Count() call to the LINQ query to force it to be evaluated.
Based on your comment I expect this is similar to what you've done. Note: I haven't spent any time figuring out if the query is as efficient as possible; I just want some query to illustrate how the code may be structured.
var dataset = ...
var watch = Stopwatch.StartNew();
var query = dataset.Where( d => dataset.Count( i => i == d ) > 1 );
watch.Stop(); // timer stops here
foreach (var item in query) // query is actually evaluated here
{
... print out the item...
}
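The effect described above can be observed directly. This is a small sketch (DeferredExecutionDemo is an illustrative name) that counts predicate calls: none happen when the query is defined, and all of them happen when it is enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Returns how many times the Where predicate had run
    // (a) right after defining the query and (b) after enumerating it.
    public static (int BeforeEnumeration, int AfterEnumeration) CountPredicateCalls(
        IReadOnlyList<int> data)
    {
        int calls = 0;
        IEnumerable<int> query = data.Where(d =>
        {
            calls++;
            return d % 2 == 0;
        });

        int before = calls; // nothing has been evaluated yet
        query.ToList();     // forces the query to run over every element
        return (before, calls);
    }
}
```

A Stopwatch stopped before the ToList() call would therefore measure only the cost of constructing the query object, which is why the LINQ version can appear faster than an empty loop.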

I would suggest that LINQ is only faster than a 'normal loop' when your algorithm is less than perfect (or you have some problem in your code). So LINQ will be faster at sorting than you are if you don't write an efficient sorting algorithm, etc.
LINQ is usually 'as fast as' or 'close enough to' the speed of a normal loop, and can be faster (and simpler) to code / debug / read. That's its benefit - not execution speed.
If it's performing faster than an empty loop, you are doing something wrong. Most likely, as suggested in comments, you aren't considering deferred execution and the LINQ statement is not actually executing.

If you did not compile with "Optimize Code" enabled, you would probably see this behaviour. (It would certainly explain why the empty loop was not removed.)
The code underlying LINQ, however, is part of already-compiled code, which will certainly have been optimised (by the JIT, NGen or similar).

Where to draw the line - is it possible to love LINQ too much? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I recently found LINQ and love it. I find lots of occasions where use of it is so much more expressive than the longhand version but a colleague passed a comment about me abusing this technology which now has me second guessing myself. It is my perspective that if a technology works efficiently and the code is elegant then why not use it? Is that wrong? I could spend extra time writing out processes "longhand" and while the resulting code may be a few ms faster, it's 2-3 times more code and therefore 2-3 times more chance that there may be bugs.
Is my view wrong? Should I be writing my code out longhand rather than using LINQ? Isn't this what LINQ was designed for?
Edit: I was speaking about LINQ to objects, I don't use LINQ to XML so much and I have used LINQ to SQL but I'm not so enamoured with those flavours as LINQ to objects.

I have to agree with your view - if it's more efficient to write and elegant then what's a few milliseconds. Writing extra code gives more room for bugs to creep in and it's extra code that needs to be tested and most of all it's extra code to maintain. Think about the guy who's going to come in behind you and maintain your code - they'll thank you for writing elegant easy to read code long before they thank you for writing code that's a few ms faster!
Beware though, this cost of a few ms could be significant when you take the bigger picture into account. If that few milliseconds is part of a loop of thousands of repetitions, then the milliseconds add up fast.

Yes you can love LINQ too much - Single Statement LINQ RayTracer
Where do you draw the line? I'd say use LINQ as much as it makes the code simpler and easier to read.
The moment the LINQ version becomes more difficult to understand than the non-LINQ version it's time to swap, and vice versa. EDIT: This mainly applies to LINQ-To-Objects as the other LINQ flavours have their own benefits.

It's not possible to love Linq to Objects too much, it's a freaking awesome technology!
But seriously, anything that makes your code simple to read, simple to maintain and does the job it was intended for, then you would be silly not to use it as much as you can.

LINQ's supposed to be used to make filtering, sorting, aggregating and manipulating data from various sources as intuitive and expressive as possible. I'd say, use it wherever you feel it's the tidiest, most expressive and most natural syntax for doing what it is you're trying to do, and don't feel guilty about it.
If you start humping the documentation, then it may be time to reconsider your position.

It's cases like these where it's important to remember the golden rules of optimization:
Don't Do It
For Experts: Don't do it yet
You should absolutely not worry about "abusing" LINQ unless you can identify it explicitly as the cause of a performance problem.

Like anything, it can be abused. As long as you stay away from obvious poor decisions such as
var v = List.Where(...);
for(int i = 0; i < v.Count(); i++)
{...}
and understand how deferred execution works (in the loop above, v.Count() re-runs the Where query on every iteration), then it is most likely not going to be much slower than the longhand way. According to Anders Hejlsberg (C# architect), the C# compiler is not particularly good at optimizing loops, however it is getting much better at optimizing and parallelizing expression trees. In time, it may be more effective than a loop. The List<>'s ForEach version is actually as fast as a for loop, although I can't find the link that proves that.
P.S. My personal favorite is ForEach<>'s lesser known cousin IndexedForEach (utilizing extension methods)
List.IndexedForEach((p, i) =>
{
    if (i != 3)
        p.DoSomething(i);
});
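IndexedForEach is not part of the framework, so it has to be supplied as an extension method. One possible shape for it, as a guess at what the answer has in mind (the class name EnumerableExtensions is an assumption):

```csharp
using System;
using System.Collections.Generic;

public static class EnumerableExtensions
{
    // Invokes the action for each element together with its zero-based index.
    public static void IndexedForEach<T>(this IEnumerable<T> source, Action<T, int> action)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (action == null) throw new ArgumentNullException(nameof(action));

        int index = 0;
        foreach (T item in source)
        {
            action(item, index);
            index++;
        }
    }
}
```

Because it enumerates the source exactly once, it avoids the repeated Count() evaluation shown in the "poor decision" loop above.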

LINQ can be like art. Keep using it to make the code beautiful.

You're answering your own question by talking about writing 2-3 times more code for a few ms of performance. I mean, if your problem domain requires that speedup then yes, if not probably not. However, is it really only a few ms of performance or is it > 5% or > 10%. This is a value judgement based on the individual case.

Where to draw the line?
Well, we already know that it is a bad idea to implement your own quicksort in linq, at least compared to just using linq's orderby.

I've found that using LINQ has sped up my development and made it easier to avoid stupid mistakes that loops can introduce. I have had instances where the performance of LINQ was poor, but that was when I was using it to do things like fetch data for an Excel file from a tree structure that had millions of nodes.

While I see how there is a point of view that LINQ might make a statement harder to read, I think it is far outweighed by the fact that my methods are now strictly related to the problems that they are solving and not spending time either including lookup loops or cluttering classes with dedicated lookup functions.
It took a little while to get used to doing things with LINQ, since looping lookups, and the like, have been the main option for so long. I look at LINQ as just being another type of syntactic sugar that can do the same task in a more elegant way. Right now, I am still avoiding it in processing-heavy mission critical code - but that is just until the performance improves as LINQ evolves.

My only concern about LINQ is with its implementation of joins.
As I determined when trying to answer this question (and it's confirmed here), the code LINQ generates to perform joins is (necessarily, I guess) naive: for each item in the list, the join performs a linear search through the joined list to find matches.
Adding a join to a LINQ query essentially turns a linear-time algorithm into a quadratic-time algorithm. Even if you think premature optimization is the root of all evil, the jump from O(n) to O(n^2) should give you pause. (It's O(n^3) if you join through a joined item to another collection, too.)
It's relatively easy to work around this. For instance, this query:
var list = from pr in parentTable.AsEnumerable()
           // the outer range variable (pr) must appear on the left of "equals"
           join cr in childTable.AsEnumerable() on pr.Field<int>("ID") equals cr.Field<int>("ParentID")
           where pr.Field<string>("Value") == "foo"
           select cr;
is analogous to how you'd join two tables in SQL Server. But it's terribly inefficient in LINQ: for every parent row that the where clause returns, the query scans the entire child table. (Even if you're joining on an unindexed field, SQL Server will build a hashtable to speed up the join if it can. That's a little outside LINQ's pay grade.)
This query, however:
string fk = "FK_ChildTable_ParentTable";
var list = from cr in childTable.AsEnumerable()
where cr.GetParentRow(fk).Field<string>("Value") == "foo"
select cr;
produces the same result, but it scans the child table once only.
If you're using LINQ to objects, the same issues apply: if you want to join two collections of any significant size, you're probably going to need to consider implementing a more efficient method to find the joined object, e.g.:
Dictionary<Foo, Bar> map = buildMap(foos, bars);
var list = from Foo f in foos
where map[f].baz == "bat"
select f;
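The answer leaves buildMap undefined. As a sketch of the same idea under different, explicitly hypothetical types (ParentRow, ChildRow and JoinSketch are invented for this example, and parent Ids are assumed unique), the lookup can be built with ToDictionary and probed once per child:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row types standing in for the parent and child tables.
public sealed class ParentRow
{
    public int Id { get; set; }
    public string Value { get; set; }
}

public sealed class ChildRow
{
    public int ParentId { get; set; }
    public string Name { get; set; }
}

public static class JoinSketch
{
    public static List<ChildRow> ChildrenOfParentsWithValue(
        IEnumerable<ParentRow> parents, IEnumerable<ChildRow> children, string value)
    {
        // One pass over the parents to build the index (assumes unique Ids).
        Dictionary<int, ParentRow> parentsById = parents.ToDictionary(p => p.Id);

        // One pass over the children; each lookup is a hash probe, not a scan.
        // Children whose parent is missing are skipped.
        return children
            .Where(c => parentsById.TryGetValue(c.ParentId, out ParentRow parent)
                        && parent.Value == value)
            .ToList();
    }
}
```

The total work is one pass over each collection, so it grows with the sum of the two sizes rather than their product.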

Back to basics; for-loops, arrays/vectors/lists, and optimization

I was working on some code recently and came across a method that had 3 for-loops that worked on 2 different arrays.
Basically, what was happening was a foreach loop would walk through a vector and convert a DateTime from an object, and then another foreach loop would convert a long value from an object. Each of these loops would store the converted value into lists.
The final loop would go through these two lists and store those values into yet another list because one final conversion needed to be done for the date.
Then after all that is said and done, The final two lists are converted to an array using ToArray().
Ok, bear with me, I'm finally getting to my question.
So, I decided to make a single for loop to replace the first two foreach loops and convert the values in one fell swoop (the third loop is quasi-necessary, although, I'm sure with some working I could also put it into the single loop).
But then I read the article "What your computer does while you wait" by Gustavo Duarte and started thinking about memory management and what the data was doing while it's being accessed in the for-loop where two lists are being accessed simultaneously.
So my question is, what is the best approach for something like this? Try to condense the for-loops so it happens in as few loops as possible, causing multiple data access for the different lists. Or, allow the multiple loops and let the system bring in data it's anticipating. These lists and arrays can be potentially large and looping through 3 lists, perhaps 4 depending on how ToArray() is implemented, can get very costly (O(n^3) ??). But from what I understood in said article and from my CS classes, having to fetch data can be expensive too.
Would anyone like to provide any insight? Or have I completely gone off my rocker and need to relearn what I have unlearned?
Thank you

The best approach? Write the most readable code, work out its complexity, and work out if that's actually a problem.
If each of your loops is O(n), then you've still only got an O(n) operation.
Having said that, it does sound like a LINQ approach would be more readable... and quite possibly more efficient as well. Admittedly we haven't seen the code, but I suspect it's the kind of thing which is ideal for LINQ.
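To illustrate the complexity point in this answer: loops that run one after another add their costs, while nested loops multiply them. A tiny sketch that just counts loop-body executions (LoopCost is an illustrative name):

```csharp
using System;

public static class LoopCost
{
    // Three loops in sequence: n + n + n = 3n steps, which is still O(n).
    public static long ThreeSequentialLoops(int n)
    {
        long steps = 0;
        for (int i = 0; i < n; i++) steps++;
        for (int i = 0; i < n; i++) steps++;
        for (int i = 0; i < n; i++) steps++;
        return steps;
    }

    // One loop nested inside another: n * n steps, which is O(n^2).
    public static long NestedLoops(int n)
    {
        long steps = 0;
        for (int i = 0; i < n; i++)
        {
            for (int j = 0; j < n; j++) steps++;
        }
        return steps;
    }
}
```

So three or four passes over the lists in the question cost a constant factor more than one pass; they do not make the method O(n^3).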

For reference, the article is "What your computer does while you wait" by Gustavo Duarte.
Also there's a guide to big-O notation.
It's impossible to answer the question without being able to see code/pseudocode. The only reliable answer is "use a profiler". Assuming what your loops are doing is a disservice to you and anyone who reads this question.

Well, you've got complications if the two vectors are of different sizes. As has already been pointed out, this doesn't increase the overall complexity of the issue, so I'd stick with the simplest code - which is probably 2 loops, rather than 1 loop with complicated test conditions re the two different lengths.
Actually, these length tests could easily make the two loops quicker than a single loop. You might also get better memory fetch performance with 2 loops - i.e. you are looking at contiguous memory - i.e. A[0],A[1],A[2]... B[0],B[1],B[2]..., rather than A[0],B[0],A[1],B[1],A[2],B[2]...
So in every way, I'd go with 2 separate loops ;-p

Am I understanding you correctly in this?
You have these loops:
for (...){
// Do A
}
for (...){
// Do B
}
for (...){
// Do C
}
And you converted it into
for (...){
// Do A
// Do B
}
for (...){
// Do C
}
and you're wondering which is faster?
If not, some pseudocode would be nice, so we could see what you meant. :)
Impossible to say. It could go either way. You're right, fetching data is expensive, but locality is also important. The first version may be better for data locality, but on the other hand, the second has bigger blocks with no branches, allowing more efficient instruction scheduling.
If the extra performance really matters (as Jon Skeet says, it probably doesn't, and you should pick whatever is most readable), you really need to measure both options, to see which is fastest.
My gut feeling says the second, with more work being done between jump instructions, would be more efficient, but it's just a hunch, and it can easily be wrong.

Aside from cache thrashing on large functions, there may be benefits on tiny functions as well. This applies on any auto-vectorizing compiler (not sure if Java JIT will do this yet, but you can count on it eventually).
Suppose this is your code:
// if this compiles down to a raw memory copy with a bitmask...
Date morningOf(Date d) { return Date(d.year, d.month, d.day, 0, 0, 0); }
Date timestamps[N];
Date mornings[N];
// ... then this can be parallelized using SSE or other SIMD instructions
for (int i = 0; i != N; ++i)
mornings[i] = morningOf(timestamps[i]);
// ... and this will just run like normal
for (int i = 0; i != N; ++i)
doOtherCrap(mornings[i]);
For large data sets, splitting the vectorizable code out into a separate loop can be a big win (provided caching doesn't become a problem). If it was all left as a single loop, no vectorization would occur.
This is something that Intel recommends in their C/C++ optimization manual, and it really can make a big difference.

... working on one piece of data but with two functions can sometimes make it so that code to act on that data doesn't fit in the processor's low level caches.
for (i = 0; i < 10; i++) {
    MyObject obj = array[i];
    obj.functionreallybig1(); // pushes functionreallybig2 out of cache
    obj.functionreallybig2(); // pushes functionreallybig1 out of cache
}
vs
for (i = 0; i < 10; i++) {
    MyObject obj = array[i];
    obj.functionreallybig1(); // this stays in the cache next time through loop
}
for (i = 0; i < 10; i++) {
    MyObject obj = array[i];
    obj.functionreallybig2(); // this stays in the cache next time through loop
}
But it was probably a mistake (usually this type of trick is commented).
When data is cyclically loaded and unloaded like this, it is called cache thrashing, btw.
This is a separate issue from the data these functions are working on, as typically the processor caches that separately.

I apologize for not responding sooner and providing any kind of code. I got sidetracked on my project and had to work on something else.
To answer anyone still monitoring this question;
Yes, like jalf said, the function is something like:
PrepareData(vectorA, vectorB, xArray, yArray):
listA
listB
foreach(value in vectorA)
convert values insert in listA
foreach(value in vectorB)
convert values insert in listB
listC
listD
for(int i = 0; i < listB.count; i++)
listC[i] = listB[i] converted to something
listD[i] = listA[i]
xArray = listC.ToArray()
yArray = listD.ToArray()
I changed it to:
PrepareData(vectorA, vectorB, ref xArray, ref yArray):
listA
listB
for(int i = 0; i < vectorA.count && i < vectorB.count; i++)
convert values insert in listA
convert values insert in listB
listC
listD
for(int i = 0; i < listB.count; i++)
listC[i] = listB[i] converted to something
listD[i] = listA[i]
xArray = listC.ToArray()
yArray = listD.ToArray()
Keeping in mind that the vectors can potentially have a large number of items. I figured the second one would be better, so that the program wouldn't have to loop n times 2 or 3 different times. But then I started to wonder about the effects of memory fetching, or prefetching, or what have you.
So, I hope this helps to clear up the question, although a good number of you have provided excellent answers.

Thank you everyone for the information. Thinking in terms of Big-O and how to optimize has never been my strong point. I believe I am going to put the code back to the way it was; I should have trusted the way it was written before instead of jumping on my novice instincts. Also, in the future I will include more references so everyone can understand what the heck I'm talking about (clarity is also not a strong point of mine :-/).
Thank you again.
